[ https://issues.apache.org/jira/browse/FLINK-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15585097#comment-15585097 ]

Gábor Hermann commented on FLINK-1807:
--------------------------------------

Hi all,

I have a workaround in mind for a "real" SGD. The main idea is to use a 
minibatch approach instead of random sampling.
We would split the data into minibatches randomly, then collect every partition 
into a single object containing all the data corresponding to that partition.
I.e. we would have something like a {{DataSet[Array[(MiniBatchId, 
Array[Array[Double]])]]}},
where every element of this DataSet (i.e. every array) would contain the data 
for one partition, and every element of that array would correspond to one 
minibatch's slice of that partition. (Actually an {{Array[Array[Array[Double]]]}} 
is sufficient to represent a partition.) Then we would have a static DataSet 
that represents the data at every iteration, so we would avoid the problem of 
using a dynamic DataSet inside an iteration.
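
To make the layout concrete, here is a rough sketch of how such a per-partition 
structure could be built with the DataSet API. This is only a sketch of my 
assumption, not existing Flink ML code: {{samples}}, {{numMiniBatches}} and the 
random tagging are just illustrative.

{code:scala}
import org.apache.flink.api.scala._

import scala.collection.mutable.ArrayBuffer
import scala.util.Random

val env = ExecutionEnvironment.getExecutionEnvironment
val numMiniBatches = 4

// Raw training data: one Array[Double] per sample (illustrative input).
val samples: DataSet[Array[Double]] =
  env.fromCollection(Seq.fill(100)(Array.fill(3)(Random.nextDouble())))

// 1) Tag every sample with a random minibatch id.
val tagged: DataSet[(Int, Array[Double])] =
  samples.map(s => (Random.nextInt(numMiniBatches), s))

// 2) Collapse each physical partition into a single element: the outer array
//    is indexed by minibatch id, the inner arrays hold the samples of that
//    minibatch that live on this partition.
val perPartition: DataSet[Array[Array[Array[Double]]]] =
  tagged.mapPartition { elements: Iterator[(Int, Array[Double])] =>
    val buckets = Array.fill(numMiniBatches)(ArrayBuffer.empty[Array[Double]])
    elements.foreach { case (id, sample) => buckets(id) += sample }
    Iterator.single(buckets.map(_.toArray))
  }
{code}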

At every iteration we would broadcast the vector model, choose a minibatch 
(e.g. iteration number modulo number of minibatches), and calculate the 
gradient at every partition based on that minibatch. Then we would aggregate 
these gradients and update the vector model.
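
Roughly, one round could look like the sketch below, using a bulk iteration and 
the {{perPartition}} DataSet from the previous sketch. Again this is only an 
assumption of mine: I use a simple squared-loss gradient, treat the last entry 
of every sample as the label, and {{learningRate}}, {{dim}} and 
{{numIterations}} are illustrative parameters.

{code:scala}
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

val dim = 2
val learningRate = 0.1
val numIterations = 20

val initialWeights: DataSet[Array[Double]] = env.fromElements(Array.fill(dim)(0.0))

val trainedWeights = initialWeights.iterate(numIterations) { weights =>
  // Per-partition gradient of the minibatch chosen for this superstep.
  val partialGradients = perPartition
    .map(new RichMapFunction[Array[Array[Array[Double]]], Array[Double]] {
      private var w: Array[Double] = _

      override def open(parameters: Configuration): Unit = {
        // The broadcast partial solution is the current weight vector.
        w = getRuntimeContext.getBroadcastVariable[Array[Double]]("weights").get(0)
      }

      override def map(partition: Array[Array[Array[Double]]]): Array[Double] = {
        // Minibatch id = iteration number modulo number of minibatches.
        val batch = partition(
          (getIterationRuntimeContext.getSuperstepNumber - 1) % partition.length)
        val grad = Array.fill(w.length)(0.0)
        for (sample <- batch) {
          val features = sample.init
          val label = sample.last
          val error = features.zip(w).map { case (x, wi) => x * wi }.sum - label
          for (i <- grad.indices) grad(i) += error * features(i)
        }
        grad
      }
    })
    .withBroadcastSet(weights, "weights")

  // Aggregate the partial gradients and take one gradient step on the model.
  val gradientSum = partialGradients
    .reduce { (a, b) => a.zip(b).map { case (x, y) => x + y } }

  weights.cross(gradientSum) { (w, g) =>
    w.zip(g).map { case (wi, gi) => wi - learningRate * gi }
  }
}
{code}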

The main drawback of this approach is that we would have to keep all the data 
in memory. If that's tolerable, we could make this improvement. What do you 
think? Do you see any other disadvantages?

> Stochastic gradient descent optimizer for ML library
> ----------------------------------------------------
>
>                 Key: FLINK-1807
>                 URL: https://issues.apache.org/jira/browse/FLINK-1807
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Theodore Vasiloudis
>              Labels: ML
>
> Stochastic gradient descent (SGD) is a widely used optimization technique in 
> different ML algorithms. Thus, it would be helpful to provide a generalized 
> SGD implementation which can be instantiated with the respective gradient 
> computation. Such a building block would make the development of future 
> algorithms easier.



