Hi;

First of all I do not want to be in the position of defending Pegasos
algorithm. I am interested in implementing SGD/ASGD or Pegosos or even
both. Of course what I want is to have useful code for the project (not for
only me). I do not have a preference over algorithms if they are fast
enough to solve problems.

Based on Leon Bottou's results, I would recommend a simple SGD
> implementation of SVM rather than Pegasos.
>
> http://leon.bottou.org/projects/sgd
> http://leon.bottou.org/publications/pdf/compstat-2010.pdf
> http://arxiv.org/abs/1107.2490
>
>
> > However, I think there are two issues we
> > need to clarify:
> >
> > 1) In general SGD like ideas are used for online learning (of course they
> > can be converted to batch learning) and Pegasos is used for batch
> learning.
> >
>
> I see no need for batch learning unless there is a net training benefit.
>

Point taken. I do not have enough large scale experience.


>
>
> >  Therefore may be we need to two similar but different enough software
> > architecture (I am not sure). If my intuition is right then it makes
> sense
> > to implement Pegasos and SGD independently. Further, especially Pegasus
> is
> > a state of the art method (in terms of speed) for text classification,
> > structured data prediction and these kind of problems, may be this is
> also
> > a point we need to take into account because there thousands of people
> who
> > are dealing with web scale text data for search engines, recommender
> > systems (I am not one of them therefore may be I am wrong here).
> >
>
> Pegasos is nice, but I don't necessarily see it as state of the art.
>
> For large-scale problems, in fact, I don't even see SVM as state of the
> art.  Most (not all) large-scale problems tend to be sparse and very high
> dimension.  This makes simple linear classifiers with L1 regularization
> very effective and often more effective than L2 regularization as with SVM.
>

Point taken. I completely agree that L1 regularization is very effective
for most (not all)
of the large scale data sets (At least I see this from the published
papers).

>
>
>
> > 2)  Pegasos will be faster for than any other SVM solver for only linear
> > kernels.
>
>
> I don't see this in the literature.  See Xu's paper, referenced above.
>

Your comments are very interesting and I will definitely read Xu's paper
(thanks a lot) . However, when I look Table 2 of  Shai Shalev-Shwartz paper
(reference given in my previous email), it seems that Pegosas is not faster
than Svm-Light when used with a nonlinear kernel. I believe this is caused
by kernel evaluations which are used in kernelized Pegasos (algorithm is
given Figure 3 of the same paper).


>
> > In the past there was belief that Pegasos can be applied to
> > nonlinear kernels(gaussian kernel, string kernel, HMM kernel etc. ) and
> it
> > will be still faster than other SVM solvers/SMO like solvers.
>
>
> I am not hearing a huge need for non-linear kernels in large scale
> learning.  Perhaps with image processing, but not with much else.  Also, I
> haven't heard that there isn't an SGD-like learning method for non-linear
> kernels.
>
> Point taken.

>
> > ... It is also known fact that, with a appropriate model selection,
> > nonlinear kernels give better classification accuracy then linear
> kernels.
> >
>
> Actually, not.  I think that the situations where non-linear kernels are
> better are more limited than most suppose, particularly for large-scale
> applications.
>
>
> > Exactly at this point, we need online learning (SGS/AGSD based method),
> we
> > can still use nonlinear kernels, parallelize the algorithm and we can
> have
> > a online SVM method for large/web scale datasets.
> >
>
> Now this begins to sound right.
>
> Honestly I am so much into SVM and kernel machines and I fear that I am
> > making big fuss out of small problems.
>
>
> My key question is whether you have problems that need solving.  Or do you
> have an itch to do an implementation for the sake of having the
> implementation?
>
> Either one is a reasonable motive, but the first is preferable.
>

As I mentioned above, I am really interested in applying machine learning
algorithms to large/web scale data. This does not mean
that I am a fan of a particular algorithm. Therefore I started to this
discussion and titled it "What to Implement/Improve/Document?".   From your
comments also from the comments of other people I started to learn a lot
(thanks a lot). I do not have an enough practical experience for large/web
scale data problems so I am open to comments from everyone.  As you have
stated that "I see no need for batch learning unless there is a net
training benefit" then SGD/AGSD based algorithms should have a priority
over "batch learning" methods. I am looking forward to have more comments
and I hope we will have a decision on what to implement.

Cheers
Ürün

Reply via email to