[ 
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941517#comment-13941517
 ] 

Dmitriy Lyubimov commented on MAHOUT-1464:
------------------------------------------

I have a non-slim A'A. Of course, the slim operator implementation computes 
only the upper triangular part, which cuts the outer product computation cost 
in half by comparison. A significantly wide A'A, on the other hand, cannot 
really apply the same cut, since it needs to form rows in a distributed way.
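
To illustrate the upper triangular cut, here is a minimal single-machine sketch in plain Python. It is not the Mahout Spark DSL code; the function name `ata_slim` and the list-of-rows input are assumptions made only for this example. It accumulates A'A as a sum of row outer products, touching only entries with j >= i, and mirrors the result at the end.

```python
def ata_slim(rows):
    """Compute A'A for a small dense matrix given as a list of rows.

    Illustrative only: accumulates each row's outer product into the
    upper triangle (j >= i), then mirrors it into the lower triangle.
    """
    n = len(rows[0])
    acc = [[0.0] * n for _ in range(n)]
    for row in rows:
        for i in range(n):
            if row[i] == 0:
                continue  # a zero entry contributes nothing to row i
            for j in range(i, n):  # upper triangle only: about half the work
                acc[i][j] += row[i] * row[j]
    # A'A is symmetric, so the lower triangle is a copy of the upper one.
    for i in range(n):
        for j in range(i):
            acc[i][j] = acc[j][i]
    return acc


if __name__ == "__main__":
    print(ata_slim([[1, 2], [3, 4]]))
```

The final mirroring step is cheap here because the whole accumulator sits in one place; that is the part that does not carry over directly when the rows of the result are spread across a cluster.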

Not surprisingly, the slim test takes 17 seconds and the "fat" one takes 21 
seconds on my fairly ancient computer for squaring a 400x550 matrix (single 
thread). Actually, I expected a somewhat more significant gap.

I wonder if there's a more interesting way to do this other than forming 
vertical blocks of the outer product.

Maybe I need to use square blocks. In that case I can reuse roughly half of 
them, but then there will be significantly more objects (albeit smaller in 
size), and I will still need an extra shuffle operation to form the lower 
triangular part of the matrix.
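
One way to read the square-block idea is sketched below in plain Python. This is only an assumption about what such a scheme could look like, not what was committed; `ata_square_blocks`, `block_product`, and `block_size` are hypothetical names. Only blocks (p, q) with p <= q are computed, and each off-diagonal block is reused as its transpose for position (q, p). In a distributed setting, that reuse would correspond to the extra shuffle mentioned above.

```python
def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def block_product(rows, cols_p, cols_q):
    """Compute one block of A'A: A[:, cols_p]' times A[:, cols_q]."""
    return [[sum(r[i] * r[j] for r in rows) for j in cols_q] for i in cols_p]


def ata_square_blocks(rows, block_size):
    """Compute A'A as a dict of square-ish blocks keyed by (p, q).

    Only blocks with p <= q are computed directly; the block at (q, p)
    is filled in as the transpose of the block at (p, q).
    Returns the block dict and the number of blocks actually computed.
    """
    n = len(rows[0])
    ranges = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    blocks = {}
    computed = 0
    for p, cols_p in enumerate(ranges):
        for q in range(p, len(ranges)):
            blocks[(p, q)] = block_product(rows, cols_p, ranges[q])
            computed += 1
            if q != p:
                # Reuse: the lower-triangular block is just a transpose.
                blocks[(q, p)] = transpose(blocks[(p, q)])
    return blocks, computed


if __name__ == "__main__":
    blocks, computed = ata_square_blocks([[1, 2, 3], [4, 5, 6]], 2)
    print(computed, "of", len(blocks), "blocks computed directly")
```

With b column ranges, this computes b(b+1)/2 blocks out of b*b, which is the "roughly half" reuse, at the price of many more, smaller objects.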

Anyway, I think I will commit what I have.

> RowSimilarityJob on Spark
> -------------------------
>
>                 Key: MAHOUT-1464
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>    Affects Versions: 0.9
>         Environment: hadoop, spark
>            Reporter: Pat Ferrel
>              Labels: performance
>             Fix For: 0.9
>
>         Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch
>
>
> Create a version of RowSimilarityJob that runs on Spark. Ssc has a prototype 
> here: https://gist.github.com/sscdotopen/8314254. This should be compatible 
> with Mahout Spark DRM DSL so a DRM can be used as input. 
> Ideally this would extend to cover MAHOUT-1422 which is a feature request for 
> RSJ on two inputs to calculate the similarity of rows of one DRM with those 
> of another. This cross-similarity has several applications including 
> cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
