[ https://issues.apache.org/jira/browse/MAHOUT-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037270#comment-13037270 ]

Shannon Quinn commented on MAHOUT-537:
--------------------------------------

The patch has been ready to go since I posted it, but our original consensus, 
based on the limitations of 0.20 (which haven't changed), is what kept this 
patch in limbo: namely, 0.20 conveniently leaves out a crucial data type. 
Without it, the matrix-matrix multiplication takes 3 M/R passes, whereas in 
0.18 and 0.21, where this type is present, it takes only 1 pass. 
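For readers following along, here is a minimal in-memory sketch of why a join makes a single pass possible. It assumes the product is computed in the row-wise outer-product form C = A^T B = sum over i of (row i of A)^T (row i of B), so that joining the two matrices on row index hands each map call everything it needs. It uses plain Java collections rather than Hadoop, does not reproduce the missing data type or the patch, and the names `JoinedOuterProductSketch` and `transposeTimes` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * In-memory sketch (plain Java, not Hadoop) of a join-based, single-pass product
 * C = A^T * B. Assumption: both matrices are stored row-wise and keyed by row index,
 * and the product uses the outer-product form C = sum_i a_i^T b_i.
 * Class and method names are hypothetical.
 */
public class JoinedOuterProductSketch {

    /** Computes A^T * B for A (n x p) and B (n x q), each given as rowIndex -> row. */
    static double[][] transposeTimes(Map<Integer, double[]> a, Map<Integer, double[]> b, int p, int q) {
        double[][] c = new double[p][q];
        for (Map.Entry<Integer, double[]> entry : a.entrySet()) {
            // Join step: pair row i of A with row i of B.
            double[] rowB = b.get(entry.getKey());
            if (rowB == null) {
                continue; // inner join: an index missing from B contributes nothing
            }
            double[] rowA = entry.getValue();
            // Map step: the outer product rowA^T * rowB is one partial result.
            // Reduce step (folded in here): accumulate the partials into C.
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < q; k++) {
                    c[j][k] += rowA[j] * rowB[k];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        Map<Integer, double[]> a = new HashMap<>();
        a.put(0, new double[] {1, 2});
        a.put(1, new double[] {3, 4});
        Map<Integer, double[]> b = new HashMap<>();
        b.put(0, new double[] {5, 6});
        b.put(1, new double[] {7, 8});
        System.out.println(Arrays.deepToString(transposeTimes(a, b, 2, 2)));
    }
}
```

For A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]] the sketch prints [[26.0, 30.0], [38.0, 44.0]], which matches A^T B computed directly. The point of the join is that every partial product needs only one row from each matrix, so no extra pass is required to bring matching rows together.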

In your last post, however, you alluded to some cleverness involving joins and 
a customized partitioner, but I never got the details. Would you mind 
expounding on that? I scoured every 0.20 format type and type manager I could 
find and didn't see anything promising, so your more experienced perspective 
would be most helpful. 
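The specific approach alluded to was never spelled out, so the following is only a sketch of one standard technique that combines joins with a custom partitioner: a reduce-side join on composite keys. It assumes each record is keyed by (row index, source tag); the custom rule hashes the row index alone, mirroring the `(hashCode & Integer.MAX_VALUE) % numReducers` rule of Hadoop's default HashPartitioner but applied to the join key only, so row i of both matrices meets on one reducer. A TreeMap keyed on row index stands in for a grouping comparator. No Hadoop classes are used, and all names (`CompositeKeyJoinSketch`, `TaggedKey`, `joinPartition`, and so on) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/**
 * In-memory sketch (no Hadoop classes) of a reduce-side join with a custom partitioner.
 * Assumption: each map-output key is composite, (rowIndex, source tag).
 * All names are hypothetical.
 */
public class CompositeKeyJoinSketch {

    /** Composite key: the join key plus a tag naming the input matrix ('A' or 'B'). */
    static final class TaggedKey {
        final int rowIndex;
        final char source;

        TaggedKey(int rowIndex, char source) {
            this.rowIndex = rowIndex;
            this.source = source;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TaggedKey)) {
                return false;
            }
            TaggedKey other = (TaggedKey) o;
            return rowIndex == other.rowIndex && source == other.source;
        }

        @Override
        public int hashCode() {
            return Objects.hash(rowIndex, source); // includes the tag
        }
    }

    /** Whole-key rule: hashes the tag too, so A and B rows may be split apart. */
    static int wholeKeyPartition(TaggedKey key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Custom rule: hashes only the join key, so both tags of a row land together. */
    static int joinPartition(TaggedKey key, int numReducers) {
        return (Integer.hashCode(key.rowIndex) & Integer.MAX_VALUE) % numReducers;
    }

    /** Routes keys to reducers; each reducer groups tags by row index (grouping comparator stand-in). */
    static List<Map<Integer, String>> shuffle(List<TaggedKey> keys, int numReducers, boolean custom) {
        List<Map<Integer, String>> reducers = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) {
            reducers.add(new TreeMap<>());
        }
        for (TaggedKey key : keys) {
            int r = custom ? joinPartition(key, numReducers) : wholeKeyPartition(key, numReducers);
            reducers.get(r).merge(key.rowIndex, String.valueOf(key.source), String::concat);
        }
        return reducers;
    }

    /** Counts row indices whose A and B records met on the same reducer. */
    static int joinedRows(List<Map<Integer, String>> reducers) {
        int count = 0;
        for (Map<Integer, String> groups : reducers) {
            for (String tags : groups.values()) {
                if (tags.indexOf('A') >= 0 && tags.indexOf('B') >= 0) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<TaggedKey> keys = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            keys.add(new TaggedKey(i, 'A'));
            keys.add(new TaggedKey(i, 'B'));
        }
        System.out.println("custom: " + joinedRows(shuffle(keys, 3, true)) + " of 8 rows joined");
        System.out.println("whole key: " + joinedRows(shuffle(keys, 3, false)) + " of 8 rows joined");
    }
}
```

With 8 rows and 3 reducers, the custom rule joins all 8 rows, while hashing the whole composite key separates every A/B pair in this particular example. That contrast is why a join on tagged keys typically needs its partitioner (and grouping) to look at the join key only.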

> Bring DistributedRowMatrix into compliance with Hadoop 0.20.2
> -------------------------------------------------------------
>
>                 Key: MAHOUT-537
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-537
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.4, 0.5
>            Reporter: Shannon Quinn
>            Assignee: Shannon Quinn
>             Fix For: 0.6
>
>         Attachments: MAHOUT-537.patch, MAHOUT-537.patch, MAHOUT-537.patch, 
> MAHOUT-537.patch
>
>
> Convert the current DistributedRowMatrix to use the newer Hadoop 0.20.2 API, 
> in particular by eliminating the dependence on the deprecated JobConf in 
> favor of the separate Job and Configuration objects.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
