[ https://issues.apache.org/jira/browse/SPARK-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ahmed Mahran updated SPARK-14880:
---------------------------------
    External issue URL: https://github.com/mashin-io/rich-spark/blob/master/main/src/main/scala/org/apache/spark/mllib/optimization/ParallelSGD.scala
                  (was: https://github.com/mashin-io/rich-spark/blob/master/src/main/scala/org/apache/spark/mllib/optimization/ParallelSGD.scala)

> Parallel Gradient Descent with less map-reduce shuffle overhead
> ---------------------------------------------------------------
>
>                 Key: SPARK-14880
>                 URL: https://issues.apache.org/jira/browse/SPARK-14880
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Ahmed Mahran
>              Labels: performance
>
> The current implementation of (Stochastic) Gradient Descent performs one map-reduce shuffle per iteration. Moreover, as the sampling fraction gets smaller, the algorithm becomes shuffle-bound rather than CPU-bound.
> {code}
> (1 to numIterations or convergence) {
>   rdd
>     .sample(fraction)
>     .map(Gradient)
>     .reduce(Update)  // one full map-reduce shuffle per iteration
> }
> {code}
> A more performant variation requires only one map-reduce pass regardless of the number of iterations: run a local mini-batch SGD on each partition, then average the per-partition results. This is based on Zinkevich, Martin, Markus Weimer, Lihong Li, and Alex J. Smola, "Parallelized stochastic gradient descent," Advances in Neural Information Processing Systems, 2010 (http://www.research.rutgers.edu/~lihong/pub/Zinkevich11Parallelized.pdf).
> {code}
> rdd
>   .repartition(numPartitions)  // RDD has no shuffle(); repartition forces the single shuffle
>   .mapPartitions { iter =>
>     (1 to numIterations or convergence) {
>       iter.sample(fraction).map(Gradient).reduce(Update)
>     }
>   }
>   .reduce(Average)
> {code}
> A higher-level iteration could enclose the above variation, reshuffling the data before the local mini-batches and feeding the average weights from the previous iteration back in as the starting point. This allows more variability in the sampling of the mini-batches, with the possibility of covering the whole dataset. Here is a Spark-based implementation: https://github.com/mashin-io/rich-spark/blob/master/main/src/main/scala/org/apache/spark/mllib/optimization/ParallelSGD.scala
> {code}
> (1 to numIterations1 or convergence) {
>   rdd
>     .repartition(numPartitions)  // reshuffle between outer iterations
>     .mapPartitions { iter =>
>       (1 to numIterations2 or convergence) {
>         iter.sample(fraction).map(Gradient).reduce(Update)
>       }
>     }
>     .reduce(Average)  // the averaged weights seed the next outer iteration
> }
> {code}
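> For concreteness, here is a minimal, self-contained Scala sketch of the full scheme above. It is not the linked ParallelSGD implementation: the squared-loss objective, the fixed step size, the seeds, and helper names such as localSgd are illustrative assumptions, and convergence checks are omitted for brevity.
> {code}
> import scala.util.Random
>
> import org.apache.spark.TaskContext
> import org.apache.spark.rdd.RDD
> import org.apache.spark.sql.SparkSession
>
> object ParallelSgdSketch {
>
>   // Local mini-batch SGD over one partition, starting from `init`.
>   // Squared-loss gradient: (w . x - y) * x. Assumes the partition fits
>   // in memory, which the mapPartitions scheme assumes anyway.
>   def localSgd(
>       init: Array[Double],
>       data: Iterator[(Double, Array[Double])],
>       iterations: Int,
>       fraction: Double,
>       step: Double,
>       seed: Long): Array[Double] = {
>     val points = data.toArray
>     val rng = new Random(seed)
>     val w = init.clone()
>     for (_ <- 1 to iterations) {
>       // sample a mini-batch locally, no shuffle involved
>       val batch = points.filter(_ => rng.nextDouble() < fraction)
>       batch.foreach { case (y, x) =>
>         val err = w.zip(x).map { case (wj, xj) => wj * xj }.sum - y
>         for (j <- w.indices) w(j) -= step * err * x(j)
>       }
>     }
>     w
>   }
>
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession.builder()
>       .appName("parallel-sgd-sketch").master("local[4]").getOrCreate()
>     val sc = spark.sparkContext
>
>     // Toy regression data: y = 2*x0 + 3*x1 + noise.
>     val dataRng = new Random(0)
>     val data: RDD[(Double, Array[Double])] = sc.parallelize(Seq.fill(10000) {
>       val x = Array(dataRng.nextDouble(), dataRng.nextDouble())
>       (2 * x(0) + 3 * x(1) + 0.01 * dataRng.nextGaussian(), x)
>     }, numSlices = 4).cache()
>
>     val numPartitions = 4
>     var weights = Array.fill(2)(0.0)
>
>     // Outer loop: one shuffle + one reduce per outer iteration, instead
>     // of one shuffle per SGD step. The averaged weights from the previous
>     // iteration seed the next round of local SGD.
>     for (outer <- 1 to 5) {
>       val current = weights // stable copy captured by the closure
>       weights = data
>         .repartition(numPartitions)
>         .mapPartitions { it =>
>           val seed = outer * 1000L + TaskContext.getPartitionId()
>           Iterator.single(
>             localSgd(current, it, iterations = 50, fraction = 0.1, step = 0.1, seed))
>         }
>         .reduce((a, b) => a.zip(b).map { case (u, v) => u + v })
>         .map(_ / numPartitions)
>     }
>
>     println(weights.mkString("weights ~ [", ", ", "]")) // roughly [2.0, 3.0]
>     spark.stop()
>   }
> }
> {code}
> Averaging plain per-partition weight vectors is the Zinkevich et al. scheme; with many partitions, the final reduce could be swapped for treeReduce to ease pressure on the driver.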