Hi; First of all I do not want to be in the position of defending Pegasos algorithm. I am interested in implementing SGD/ASGD or Pegosos or even both. Of course what I want is to have useful code for the project (not for only me). I do not have a preference over algorithms if they are fast enough to solve problems.
Based on Leon Bottou's results, I would recommend a simple SGD > implementation of SVM rather than Pegasos. > > http://leon.bottou.org/projects/sgd > http://leon.bottou.org/publications/pdf/compstat-2010.pdf > http://arxiv.org/abs/1107.2490 > > > > However, I think there are two issues we > > need to clarify: > > > > 1) In general SGD like ideas are used for online learning (of course they > > can be converted to batch learning) and Pegasos is used for batch > learning. > > > > I see no need for batch learning unless there is a net training benefit. > Point taken. I do not have enough large scale experience. > > > > Therefore may be we need to two similar but different enough software > > architecture (I am not sure). If my intuition is right then it makes > sense > > to implement Pegasos and SGD independently. Further, especially Pegasus > is > > a state of the art method (in terms of speed) for text classification, > > structured data prediction and these kind of problems, may be this is > also > > a point we need to take into account because there thousands of people > who > > are dealing with web scale text data for search engines, recommender > > systems (I am not one of them therefore may be I am wrong here). > > > > Pegasos is nice, but I don't necessarily see it as state of the art. > > For large-scale problems, in fact, I don't even see SVM as state of the > art. Most (not all) large-scale problems tend to be sparse and very high > dimension. This makes simple linear classifiers with L1 regularization > very effective and often more effective than L2 regularization as with SVM. > Point taken. I completely agree that L1 regularization is very effective for most (not all) of the large scale data sets (At least I see this from the published papers). > > > > > 2) Pegasos will be faster for than any other SVM solver for only linear > > kernels. > > > I don't see this in the literature. See Xu's paper, referenced above. > Your comments are very interesting and I will definitely read Xu's paper (thanks a lot) . However, when I look Table 2 of Shai Shalev-Shwartz paper (reference given in my previous email), it seems that Pegosas is not faster than Svm-Light when used with a nonlinear kernel. I believe this is caused by kernel evaluations which are used in kernelized Pegasos (algorithm is given Figure 3 of the same paper). > > > In the past there was belief that Pegasos can be applied to > > nonlinear kernels(gaussian kernel, string kernel, HMM kernel etc. ) and > it > > will be still faster than other SVM solvers/SMO like solvers. > > > I am not hearing a huge need for non-linear kernels in large scale > learning. Perhaps with image processing, but not with much else. Also, I > haven't heard that there isn't an SGD-like learning method for non-linear > kernels. > > Point taken. > > > ... It is also known fact that, with a appropriate model selection, > > nonlinear kernels give better classification accuracy then linear > kernels. > > > > Actually, not. I think that the situations where non-linear kernels are > better are more limited than most suppose, particularly for large-scale > applications. > > > > Exactly at this point, we need online learning (SGS/AGSD based method), > we > > can still use nonlinear kernels, parallelize the algorithm and we can > have > > a online SVM method for large/web scale datasets. > > > > Now this begins to sound right. > > Honestly I am so much into SVM and kernel machines and I fear that I am > > making big fuss out of small problems. > > > My key question is whether you have problems that need solving. Or do you > have an itch to do an implementation for the sake of having the > implementation? > > Either one is a reasonable motive, but the first is preferable. > As I mentioned above, I am really interested in applying machine learning algorithms to large/web scale data. This does not mean that I am a fan of a particular algorithm. Therefore I started to this discussion and titled it "What to Implement/Improve/Document?". From your comments also from the comments of other people I started to learn a lot (thanks a lot). I do not have an enough practical experience for large/web scale data problems so I am open to comments from everyone. As you have stated that "I see no need for batch learning unless there is a net training benefit" then SGD/AGSD based algorithms should have a priority over "batch learning" methods. I am looking forward to have more comments and I hope we will have a decision on what to implement. Cheers Ürün
