[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941517#comment-13941517 ]
Dmitriy Lyubimov commented on MAHOUT-1464:
------------------------------------------

I have a non-slim A'A now. Of course, the slim operator implementation is upper-triangular, which cuts the outer-product computation cost in half by comparison. A significantly wide A'A, on the other hand, cannot really apply the same cut, since it needs to form rows in a distributed way.

Not surprisingly, the slim test takes 17 seconds and the "fat" one takes 21 seconds on my fairly ancient computer for squaring a 400x550 matrix (single thread). Actually, I expected a somewhat larger gap.

I wonder if there's a more interesting way to do this than forming vertical blocks of outer products. Maybe I need to use square blocks. In that case I could reuse roughly half of them, but there would be significantly more objects (albeit smaller ones), and I would still need an extra shuffle operation to form the lower-triangular part of the matrix.

Anyway, I think I will commit what I have.

> RowSimilarityJob on Spark
> -------------------------
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Affects Versions: 0.9
> Environment: hadoop, spark
> Reporter: Pat Ferrel
> Labels: performance
> Fix For: 0.9
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch
>
>
> Create a version of RowSimilarityJob that runs on Spark. Ssc has a prototype
> here: https://gist.github.com/sscdotopen/8314254. This should be compatible
> with Mahout Spark DRM DSL so a DRM can be used as input.
> Ideally this would extend to cover MAHOUT-1422 which is a feature request for
> RSJ on two inputs to calculate the similarity of rows of one DRM with those
> of another. This cross-similarity has several applications including
> cross-action recommendations.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
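A minimal, hypothetical sketch of the symmetry argument in the comment, written in plain single-process Python. It is not Mahout's implementation, which is Scala on Spark. A'A equals the sum over rows a_i of the outer products a_i a_i', so accumulating only the upper triangle of each outer product does about half the multiplications of a full accumulation. The sketch also shows the square-block alternative: only column blocks on or above the block diagonal are computed, and the lower blocks are filled by transposition, which is the step that would need an extra shuffle in a distributed setting. The names `ata_full`, `ata_upper` and `ata_square_blocks` are illustrative assumptions, and the loops stand in for distributed operations.

```python
# Hypothetical illustration, not Mahout code: compute the Gram matrix A'A of a
# dense row-major matrix (list of rows) and count scalar multiplications.

def ata_full(a):
    """Naive A'A: sum of full outer products a_i a_i' over all rows."""
    n = len(a[0])
    g = [[0.0] * n for _ in range(n)]
    mults = 0
    for row in a:
        for j in range(n):
            for k in range(n):
                g[j][k] += row[j] * row[k]
                mults += 1
    return g, mults


def ata_upper(a):
    """Slim-style A'A: accumulate only the upper triangle (k >= j) of each
    outer product, then mirror it, since A'A is symmetric."""
    n = len(a[0])
    g = [[0.0] * n for _ in range(n)]
    mults = 0
    for row in a:
        for j in range(n):
            for k in range(j, n):
                g[j][k] += row[j] * row[k]
                mults += 1
    for j in range(n):
        for k in range(j):
            g[j][k] = g[k][j]  # copy from the upper triangle, no multiplication
    return g, mults


def ata_square_blocks(a, bs):
    """Square-block A'A: compute column blocks (I, J) with I <= J as
    A[:, I]' A[:, J]; each lower block (J, I) is a transposed copy."""
    n = len(a[0])
    starts = list(range(0, n, bs))
    g = [[0.0] * n for _ in range(n)]
    mults = 0
    for bi, i0 in enumerate(starts):
        for j0 in starts[bi:]:  # only blocks on or above the block diagonal
            for j in range(i0, min(i0 + bs, n)):
                for k in range(j0, min(j0 + bs, n)):
                    s = 0.0
                    for row in a:
                        s += row[j] * row[k]
                        mults += 1
                    g[j][k] = s
                    g[k][j] = s  # fill the mirrored entry in block (J, I)
    return g, mults
```

On the 3x4 matrix `[[1, 2, 0, 3], [0, 1, 4, 1], [2, 0, 1, 1]]` (as floats), all three functions return the same matrix, using 48, 30 and 36 multiplications respectively (block size 2 for the blocked variant). With n columns, m rows and block size b dividing n, the blocked variant computes m(n²/2 + nb/2) products instead of m·n(n+1)/2, because diagonal blocks are computed in full. That is the "roughly half" reuse, traded against a larger number of smaller block objects.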