Re: About SVM Stochastic Gradient Descent (SGD) on BSP model

Zach Richardson Fri, 02 Sep 2011 21:53:44 -0700

Hi Joe,

I would be careful trying to implement SGD for solving an SVM in
map-reduce or BSP, SGD tends to take a very large number of iterations
in order to optimize to a good solution.  Syncing in any distributed
system, be it map-reduce or BSP, is just not fast (compared to
iterations on a single machine).  This makes SGD a very poor choice
for a distributed system, especially since it is inherently a
sequential algorithm, unlike batch gradient descent, which could be
parallelized (although still poorly in a paradigm like map-reduce and
doesn't tend to converge as fast as SGD.  (This is not to say that SGD
can't be used to optimize fairly large SVM's)  If you want to build a
distributed  SVM I would recommend looking at using an IPM or ADMM
since they require far fewer global syncs before they are able to
converge.  Stephen Boyd has some good research on using distributed
systems (including BSP) with an ADMM.


Zach


On Fri, Sep 2, 2011 at 10:35 AM, Zhiyong Xie <[email protected]> wrote:
> cool! I will check them out!
> thanks!
> Joe
>
> On Fri, Sep 2, 2011 at 8:07 AM, Thomas Jungblut <
> [email protected]> wrote:
>
>> Hi Joe,
>>
>> great thank you very much for clarification.
>> I love classification algorithms, so I'll be very interested in how you
>> develop this.
>> "Per se" you can translate every MapReduce algorithm to BSP, since BSP is
>> an
>> abstraction to MapReduce.
>> E.G: Map Phase is a local computation phase, merging and sorting are the
>> synchronization barrier (needs finish of all map tasks) and reducing is a
>> computational phase again.
>> On the english wikipedia is a good schema that shows how the workflow is.
>>
>> Actually you can make your map phase as well in BSP, but for the latest
>> release 0.3.0 you have to write data sharding and partitioning for
>> yourself.
>> There are examples and blog posts that shows how code them.
>> Your reduce step is depending on your implementation. Is there a single
>> reducer which updates the whole classifier?
>>
>> Actually I wanted to implement a k-means clustering in BSP, but sadly I was
>> very busy and have not too much time for it. It is quite similar to your
>> algorithm. The Map step is calculating the distance between the current
>> point and the centers and the reducer is going to update the centers.
>>
>> To provide you with a bit of information, I already rewritten an MapReduce
>> graph algorithm to BSP. [1][2]
>> These examples are without partitioning, I recently did an improvement to
>> the partitioning algorithm. So it makes sense to checkout the current trunk
>> and browse through the graph package and examples package. It contains
>> improved partitioning as well as graph examples.
>>
>> HF and GL.
>> If you need help, I'll be glad to help you.
>>
>> [1]
>>
>> http://codingwiththomas.blogspot.com/2011/04/graph-exploration-with-hadoop-mapreduce.html
>> [2]
>>
>> http://codingwiththomas.blogspot.com/2011/04/graph-exploration-using-apache-hama-and.html
>>
>> 2011/9/2 Zhiyong Xie <[email protected]>
>>
>> > Thank you Thomas! In short, SVM (
>> > http://en.wikipedia.org/wiki/Support_vector_machine) is a supervised
>> > learning classifier described as optimization problem and solved by
>> > gradient
>> > descent approach (
>> http://en.wikipedia.org/wiki/Stochastic_gradient_descent
>> > ).
>> > It is a iterative process, and kinda run a map/reduce pair per iteration.
>> > Map to calculate the gradient value for each point, and reduce phase to
>> > optimize the classifier. BSP model seems native for scientific and graph
>> > processing in my mind, not figure out or find much info online for this
>> > type
>> > of application so far .
>> >
>> > Best,
>> > Joe
>> >
>> > On Thu, Sep 1, 2011 at 10:36 AM, Thomas Jungblut <
>> > [email protected]> wrote:
>> >
>> > > Hi Joe,
>> > >
>> > > for non-insiders, would you please clarify what SGD and SVM are?
>> > > Then we could give you some tips how to implement them in BSP.
>> > >
>> > > Greetz,
>> > > Thomas
>> > >
>> > > 2011/9/1 Zhiyong Xie <[email protected]>
>> > >
>> > > > Hi there,
>> > > >
>> > > > May I ask whether anyone else have look into the SGD mapping on BSP
>> > model
>> > > > too? I'm investigating whether BSP model is a good candidate for
>> > > > implementing distributed version of SVM SGD.
>> > > >
>> > > > Thanks!
>> > > > Joe
>> > > > --
>> > > > Joe (Zhiyong) Xie
>> > > > Graduate Student
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Thomas Jungblut
>> > > Berlin
>> > >
>> > > mobile: 0170-3081070
>> > >
>> > > business: [email protected]
>> > > private: [email protected]
>> > >
>> >
>> >
>> >
>> > --
>> > Joe (Zhiyong) Xie
>> > Graduate Student
>> >
>>
>>
>>
>> --
>> Thomas Jungblut
>> Berlin
>>
>> mobile: 0170-3081070
>>
>> business: [email protected]
>> private: [email protected]
>>
>
>
>
> --
> Joe (Zhiyong) Xie
> Graduate Student
> Electrical Engineering Department
> University of Washington, Seattle, WA
> LinkedIn: http://www.linkedin.com/in/zhiyongxie
>

Re: About SVM Stochastic Gradient Descent (SGD) on BSP model

Reply via email to