[ https://issues.apache.org/jira/browse/FLINK-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691024#comment-15691024 ]

ASF GitHub Bot commented on FLINK-4961:
---------------------------------------

Github user gaborhermann commented on the issue:

    https://github.com/apache/flink/pull/2819
  
    Hello Theodore,
    
    Thanks for taking a look at this PR!
    
    1. No problem. Some performance evaluations might help us in this case too. 
I don't think it should have much effect on performance anyway, as the 3-way 
join is only used outside the iteration.
    2. I really like the idea of doing a performance evaluation! I'm not 
exactly sure how to do this with a `join` instead of a `coGroup`, so let me 
sketch an implementation before elaborating on the pros/cons. (It was more 
straightforward to use `coGroup`; see the rough sketch at the end of this 
comment.)
    3. The benefit of using more blocks is lower memory usage, just as in the 
ALS algorithm, at the cost of more network traffic. However, more blocks here 
mean much more network traffic and bookkeeping than in ALS, because there are 
more Flink iterations. I've investigated this a bit more and found that the 
main "bottleneck" is the maximum size of the sort-buffer, which was around 
100 MB with around 40 GB of TaskManager memory, so large matrix blocks do not 
fit into it. Thus even given enough memory, we cannot use large blocks. 
Unfortunately, the maximum size of the sort-buffer is not configurable, and I 
would not like to change the underlying code in the core or runtime, but I 
tried a workaround which might work (splitting the factor blocks into 
subblocks). I just need to run some measurements. If the workaround turns out 
fine, then it should be okay to go on with this approach and give instructions 
in the docs.
    4. Okay, I agree.
    
    I hope we can generalize this non-overlapping blocking to other 
optimization problems :) I'll take a look at the paper you linked!
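
    Just to make point 2 above concrete, here is a rough, hypothetical sketch 
of what the `coGroup`-based matching of factor blocks and rating blocks could 
look like with the DataSet API. The case classes and `localSgdUpdate` below 
are made-up placeholders, not the actual code in this PR:

    ```scala
    import org.apache.flink.api.scala._
    import org.apache.flink.util.Collector

    // Hypothetical block representations, only for illustration.
    case class FactorBlock(blockId: Int, ratingBlockId: Int, factors: Array[Array[Double]])
    case class RatingBlock(ratingBlockId: Int, ratings: Array[(Int, Int, Double)])

    // Placeholder for the local SGD update on one (user block, item block, rating block) triple.
    def localSgdUpdate(users: FactorBlock, items: FactorBlock,
                       ratings: RatingBlock): (FactorBlock, FactorBlock) =
      (users, items)

    def sgdStep(userBlocks: DataSet[FactorBlock],
                itemBlocks: DataSet[FactorBlock],
                ratingBlocks: DataSet[RatingBlock]): DataSet[(FactorBlock, FactorBlock)] = {
      // Pair the user block and item block that currently carry the same rating block ID...
      userBlocks.join(itemBlocks)
        .where(_.ratingBlockId).equalTo(_.ratingBlockId)
        // ...and bring in the corresponding rating block with a coGroup, so that
        // rating blocks not selected in this step simply produce no output.
        .coGroup(ratingBlocks)
        .where(_._1.ratingBlockId).equalTo(_.ratingBlockId)
        .apply { (pairs: Iterator[(FactorBlock, FactorBlock)],
                  ratings: Iterator[RatingBlock],
                  out: Collector[(FactorBlock, FactorBlock)]) =>
          if (pairs.hasNext) {
            val (users, items) = pairs.next()
            out.collect(
              if (ratings.hasNext) localSgdUpdate(users, items, ratings.next())
              else (users, items))
          }
        }
    }
    ```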


> SGD for Matrix Factorization
> ----------------------------
>
>                 Key: FLINK-4961
>                 URL: https://issues.apache.org/jira/browse/FLINK-4961
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Gábor Hermann
>            Assignee: Gábor Hermann
>
> We have started an implementation of distributed stochastic gradient descent 
> for matrix factorization based on Gemulla et al. [1].
> The main problem with distributed SGD in general is the conflicting updates 
> of the model variable. In the case of matrix factorization we can avoid 
> conflicting updates by carefully deciding, in each iteration step, which blocks 
> of the rating matrix we should use to update the corresponding blocks of the 
> user and item matrices (see Figure 1 in the paper).
> Although a general SGD solver might seem relevant for this issue, we can do 
> much better in the special case of matrix factorization. E.g. in the case of a 
> linear regression model, the model is broadcast in every iteration. As the 
> model is typically small in that case, we can only avoid conflicts by keeping 
> a "global" model. For this reason, the general SGD solver is a separate issue.
> In more detail, the algorithm works as follows.
> We randomly create user and item vectors, then randomly partition them into 
> {{k}} user and {{k}} item blocks. Based on these factor blocks we partition 
> the rating matrix into {{k * k}} blocks correspondingly.
> In one iteration step we choose {{k}} non-conflicting rating blocks, i.e. no 
> two chosen rating blocks may share a user or item block. This is done by 
> assigning a rating block ID to every user and item block. We match the user, 
> item, and rating blocks by the current rating block ID, and update the user 
> and item factors locally using the ratings. We also update the rating block 
> ID of the factor blocks, so in the next iteration step other rating blocks 
> are used to update the factors.
> In {{k}} iteration steps we sweep through the whole rating matrix of {{k * k}} 
> blocks (so instead of {{numberOfIterationSteps}} iterations we should do 
> {{k * numberOfIterationSteps}} iterations).
> [1] [http://people.mpi-inf.mpg.de/~rgemulla/publications/gemulla11dsgd.pdf]
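> For illustration, one simple way to realize the non-conflicting choice is a 
> "diagonal" shift of the block assignment, as in Gemulla et al.; this is only a 
> minimal sketch of the scheme, and the actual implementation may assign the 
> rating block IDs differently:
> {code:scala}
> // Sketch of the diagonal scheme: in step s, user block u is matched with
> // item block (u + s) % k. This is not the PR code, just an illustration.
> val k = 4
>
> // ID of the rating block holding ratings between user block u and item block i.
> def ratingBlockId(u: Int, i: Int): Int = u * k + i
>
> // The k rating blocks chosen in step s share no user block and no item block.
> def blocksForStep(s: Int): Seq[Int] =
>   (0 until k).map(u => ratingBlockId(u, (u + s) % k))
>
> // Sweeping s = 0 until k touches each of the k * k rating blocks exactly once.
> assert((0 until k).flatMap(blocksForStep).distinct.size == k * k)
> {code}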



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
