Re: About SVM Stochastic Gradient Descent (SGD) on BSP model

Zhiyong Xie Sat, 03 Sep 2011 10:29:57 -0700

Thank you Zach! I will do more investigation before implementation for
sure:)
Joe


On Fri, Sep 2, 2011 at 9:52 PM, Zach Richardson <[email protected]
> wrote:

> Hi Joe,
>
> I would be careful trying to implement SGD for solving an SVM in
> map-reduce or BSP, SGD tends to take a very large number of iterations
> in order to optimize to a good solution.  Syncing in any distributed
> system, be it map-reduce or BSP, is just not fast (compared to
> iterations on a single machine).  This makes SGD a very poor choice
> for a distributed system, especially since it is inherently a
> sequential algorithm, unlike batch gradient descent, which could be
> parallelized (although still poorly in a paradigm like map-reduce and
> doesn't tend to converge as fast as SGD.  (This is not to say that SGD
> can't be used to optimize fairly large SVM's)  If you want to build a
> distributed  SVM I would recommend looking at using an IPM or ADMM
> since they require far fewer global syncs before they are able to
> converge.  Stephen Boyd has some good research on using distributed
> systems (including BSP) with an ADMM.
>
> Zach
>
>
> On Fri, Sep 2, 2011 at 10:35 AM, Zhiyong Xie <[email protected]> wrote:
> > cool! I will check them out!
> > thanks!
> > Joe
> >
> > On Fri, Sep 2, 2011 at 8:07 AM, Thomas Jungblut <
> > [email protected]> wrote:
> >
> >> Hi Joe,
> >>
> >> great thank you very much for clarification.
> >> I love classification algorithms, so I'll be very interested in how you
> >> develop this.
> >> "Per se" you can translate every MapReduce algorithm to BSP, since BSP
> is
> >> an
> >> abstraction to MapReduce.
> >> E.G: Map Phase is a local computation phase, merging and sorting are the
> >> synchronization barrier (needs finish of all map tasks) and reducing is
> a
> >> computational phase again.
> >> On the english wikipedia is a good schema that shows how the workflow
> is.
> >>
> >> Actually you can make your map phase as well in BSP, but for the latest
> >> release 0.3.0 you have to write data sharding and partitioning for
> >> yourself.
> >> There are examples and blog posts that shows how code them.
> >> Your reduce step is depending on your implementation. Is there a single
> >> reducer which updates the whole classifier?
> >>
> >> Actually I wanted to implement a k-means clustering in BSP, but sadly I
> was
> >> very busy and have not too much time for it. It is quite similar to your
> >> algorithm. The Map step is calculating the distance between the current
> >> point and the centers and the reducer is going to update the centers.
> >>
> >> To provide you with a bit of information, I already rewritten an
> MapReduce
> >> graph algorithm to BSP. [1][2]
> >> These examples are without partitioning, I recently did an improvement
> to
> >> the partitioning algorithm. So it makes sense to checkout the current
> trunk
> >> and browse through the graph package and examples package. It contains
> >> improved partitioning as well as graph examples.
> >>
> >> HF and GL.
> >> If you need help, I'll be glad to help you.
> >>
> >> [1]
> >>
> >>
> http://codingwiththomas.blogspot.com/2011/04/graph-exploration-with-hadoop-mapreduce.html
> >> [2]
> >>
> >>
> http://codingwiththomas.blogspot.com/2011/04/graph-exploration-using-apache-hama-and.html
> >>
> >> 2011/9/2 Zhiyong Xie <[email protected]>
> >>
> >> > Thank you Thomas! In short, SVM (
> >> > http://en.wikipedia.org/wiki/Support_vector_machine) is a supervised
> >> > learning classifier described as optimization problem and solved by
> >> > gradient
> >> > descent approach (
> >> http://en.wikipedia.org/wiki/Stochastic_gradient_descent
> >> > ).
> >> > It is a iterative process, and kinda run a map/reduce pair per
> iteration.
> >> > Map to calculate the gradient value for each point, and reduce phase
> to
> >> > optimize the classifier. BSP model seems native for scientific and
> graph
> >> > processing in my mind, not figure out or find much info online for
> this
> >> > type
> >> > of application so far .
> >> >
> >> > Best,
> >> > Joe
> >> >
> >> > On Thu, Sep 1, 2011 at 10:36 AM, Thomas Jungblut <
> >> > [email protected]> wrote:
> >> >
> >> > > Hi Joe,
> >> > >
> >> > > for non-insiders, would you please clarify what SGD and SVM are?
> >> > > Then we could give you some tips how to implement them in BSP.
> >> > >
> >> > > Greetz,
> >> > > Thomas
> >> > >
> >> > > 2011/9/1 Zhiyong Xie <[email protected]>
> >> > >
> >> > > > Hi there,
> >> > > >
> >> > > > May I ask whether anyone else have look into the SGD mapping on
> BSP
> >> > model
> >> > > > too? I'm investigating whether BSP model is a good candidate for
> >> > > > implementing distributed version of SVM SGD.
> >> > > >
> >> > > > Thanks!
> >> > > > Joe
> >> > > > --
> >> > > > Joe (Zhiyong) Xie
> >> > > > Graduate Student
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Thomas Jungblut
> >> > > Berlin
> >> > >
> >> > > mobile: 0170-3081070
> >> > >
> >> > > business: [email protected]
> >> > > private: [email protected]
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Joe (Zhiyong) Xie
> >> > Graduate Student
> >> >
> >>
> >>
> >>
> >> --
> >> Thomas Jungblut
> >> Berlin
> >>
> >> mobile: 0170-3081070
> >>
> >> business: [email protected]
> >> private: [email protected]
> >>
> >
> >
> >
> > --
> > Joe (Zhiyong) Xie
> > Graduate Student
> > Electrical Engineering Department
> > University of Washington, Seattle, WA
> > LinkedIn: http://www.linkedin.com/in/zhiyongxie
> >
>



-- 
Joe (Zhiyong) Xie
Graduate Student
Electrical Engineering Department
University of Washington, Seattle, WA
LinkedIn: http://www.linkedin.com/in/zhiyongxie

Re: About SVM Stochastic Gradient Descent (SGD) on BSP model

Reply via email to