[ https://issues.apache.org/jira/browse/FLINK-4961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15670658#comment-15670658 ]

ASF GitHub Bot commented on FLINK-4961:
---------------------------------------

Github user gaborhermann commented on the issue:

    https://github.com/apache/flink/pull/2819
  
    There are some open questions:
    
    1. Should we optimize the 3-way join? For now the join order is hard-coded; 
we might also be able to give hints for the join strategies.
    2. How should we handle empty blocks? When matching a rating block with the 
current factor blocks, there might be no rating block or no factor blocks with 
that ID, because a rating block corresponds to a different user and item block 
at every iteration. For now we join the blocks with a `coGroup` and effectively 
do a full outer join, because we need to change the rating block ID for every 
factor block at each iteration. This might not be the optimal solution (see the 
comments at `coGroup`), but I don't see a better one right now.
    3. The number of blocks also determines the number of iterations, so a 
higher number of blocks degrades performance. We conducted experiments on a 
cluster that show this:
    see the [plot for movielens 
data](https://s18.postimg.org/txap3x9o9/movielens_blocks.png) and [for lfm_1b 
data](https://s11.postimg.org/ysnonuer7/lfm1b_blocks.png). Based on these 
results, we would recommend setting the number of blocks to the smallest value 
such that the blocks fit into memory (and at least the parallelism of the 
execution). There might be some way to avoid this and break the computation 
into more blocks while doing the same number of iterations, but it's not 
trivial because of the possibly conflicting user-item blocks (which is why the 
paper uses this blocking in the first place). Should we investigate this 
further? With the recommended settings (and given enough memory) the algorithm 
performs well (see the plots).
    4. The test data is constructed by hand to ensure that changes to the code 
do not change the algorithm. The algorithm produces good results on real data. 
The question is whether we should build a more thorough testing mechanism for 
matrix factorization (as proposed in the [PR for 
iALS](https://github.com/apache/flink/pull/2542)), or whether this kind of 
testing is sufficient.
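    To make point 3 concrete, here is a small, hypothetical Python sketch (not 
the PR's Flink/Scala code) of the block rotation behind the blocking scheme: in 
sub-step `s`, user block `i` is matched with item block `(i + s) % k`, so no 
two chosen rating blocks share a user or item block, and sweeping all `k * k` 
rating blocks takes exactly `k` sub-steps. The names `strata`/`k` are 
illustrative, not the PR's API.

```python
def strata(k):
    """Yield k non-conflicting sets of (user block, item block) coordinates."""
    for s in range(k):
        # Rotating the item-block index by s guarantees each user block and
        # each item block appears exactly once per sub-step (no conflicts).
        yield [(i, (i + s) % k) for i in range(k)]

k = 4
sweep = list(strata(k))

# Within each sub-step, every user block and every item block is used once.
for stratum in sweep:
    assert len({u for u, _ in stratum}) == k
    assert len({v for _, v in stratum}) == k

# Across the k sub-steps, all k * k rating blocks are visited exactly once,
# which is why more blocks means proportionally more (sub-)iterations.
visited = {blk for stratum in sweep for blk in stratum}
assert len(visited) == k * k
```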


> SGD for Matrix Factorization
> ----------------------------
>
>                 Key: FLINK-4961
>                 URL: https://issues.apache.org/jira/browse/FLINK-4961
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Gábor Hermann
>            Assignee: Gábor Hermann
>
> We have started an implementation of distributed stochastic gradient descent 
> for matrix factorization based on Gemulla et al. [1].
> The main problem with distributed SGD in general is the conflicting updates 
> of the model variable. In case of matrix factorization we can avoid 
> conflicting updates by carefully deciding in each iteration step which blocks 
> of the rating matrix we should use to update the corresponding blocks of the 
> user and item matrices (see Figure 1. in paper).
> Although a general SGD solver might seem relevant for this issue, we can do 
> much better in the special case of matrix factorization. E.g. in the case of 
> a linear regression model, the model is broadcast in every iteration; as the 
> model is typically small in that case, the only way to avoid conflicts is to 
> keep a "global" model. Based on this, the general SGD solver is a different 
> issue.
> To give more details, the algorithm works as follows.
> We randomly create user and item vectors, then randomly partition them into 
> {{k}} user and {{k}} item blocks. Based on these factor blocks we partition 
> the rating matrix to {{k * k}} blocks correspondingly.
> In one iteration step we choose {{k}} non-conflicting rating blocks, i.e. we 
> should not choose two rating blocks simultaneously with the same user or item 
> block. This is done by assigning a rating block ID to every user and item 
> block. We match the user, item, and rating blocks by the current rating block 
> ID, and update the user and item factors by the ratings locally. We also 
> update the rating block ID for the factor blocks, thus in the next iteration 
> we use other rating blocks to update the factors.
> In {{k}} iteration steps we sweep through the whole rating matrix of {{k * 
> k}} blocks (so instead of {{numberOfIterationSteps}} iterations we should do 
> {{k * numberOfIterationSteps}} iterations).
> [1] [http://people.mpi-inf.mpg.de/~rgemulla/publications/gemulla11dsgd.pdf]
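The local update applied inside each matched rating block can be sketched as 
follows. This is a toy, pure-Python illustration under assumed hyperparameter 
names (`lr`, `reg`), not the Flink implementation from the PR: for each rating 
in the block, it takes one SGD step on the regularized squared error with the 
current user and item factors.

```python
def sgd_block_update(ratings, user_factors, item_factors, lr=0.01, reg=0.05):
    """One local pass over a rating block.

    ratings: list of (user_id, item_id, rating) local to one block.
    user_factors / item_factors: dicts mapping id -> mutable factor list.
    lr, reg: illustrative learning-rate and L2-regularization constants.
    """
    for u, i, r in ratings:
        p, q = user_factors[u], item_factors[i]
        # Prediction error for this rating with the current factors.
        err = r - sum(pf * qf for pf, qf in zip(p, q))
        for f in range(len(p)):
            pu, qi = p[f], q[f]  # read old values before updating either
            p[f] += lr * (err * qi - reg * pu)  # user factor step
            q[f] += lr * (err * pu - reg * qi)  # item factor step

# Tiny usage: repeated passes over a single rating shrink the prediction error.
uf = {0: [0.1, 0.1]}
itf = {0: [0.1, 0.1]}
for _ in range(1000):
    sgd_block_update([(0, 0, 1.0)], uf, itf)
pred = sum(a * b for a, b in zip(uf[0], itf[0]))
assert abs(pred - 1.0) < 0.1
```

Because each sub-step processes only non-conflicting blocks, these updates can 
run on all {{k}} matched blocks in parallel without conflicting writes.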



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)