[ https://issues.apache.org/jira/browse/MAHOUT-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037270#comment-13037270 ]
Shannon Quinn commented on MAHOUT-537:
--------------------------------------
The patch has been ready to go since I posted it, but our original consensus,
based on the limitations of 0.20 (which haven't changed), is what kept this
patch in limbo. Namely, 0.20 conveniently leaves out a crucial data type, and
without it the matrix-matrix multiplication requires 3 M/R passes, whereas
0.18 and 0.21, where this type is present, require only 1 pass.
In your last post, however, you alluded to some cleverness in doing joins and
customizing the partitioner that I never did get the details on. Would you mind
expounding on that? I scoured every 0.20 format type and type manager I
could find and didn't see anything promising, so your more experienced
perspective would be most helpful.
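
To make the question concrete, here is a minimal in-memory sketch of what a
reduce-side join could look like. It is not the patch and it uses no Hadoop
classes. It assumes the product being computed is A^T * B, joined on row index,
and every name in it (ReduceSideJoinSketch, TaggedRow, shuffleByRowIndex,
transposeTimes) is hypothetical. Rows from both matrices are tagged with their
source and grouped by row index, which is roughly the role a custom partitioner
and shuffle would play; each matched pair then contributes an outer product,
and the partial products are summed per output row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: simulates a reduce-side join in memory.
public class ReduceSideJoinSketch {

    /** A row tagged with the matrix it came from, as a mapper might emit it. */
    static final class TaggedRow {
        final boolean fromLeft;
        final double[] values;

        TaggedRow(boolean fromLeft, double[] values) {
            this.fromLeft = fromLeft;
            this.values = values;
        }
    }

    /** "Map + shuffle": key every row by its row index so matching rows meet. */
    public static Map<Integer, List<TaggedRow>> shuffleByRowIndex(double[][] a, double[][] b) {
        Map<Integer, List<TaggedRow>> groups = new TreeMap<>();
        for (int i = 0; i < a.length; i++) {
            groups.computeIfAbsent(i, k -> new ArrayList<>()).add(new TaggedRow(true, a[i]));
        }
        for (int i = 0; i < b.length; i++) {
            groups.computeIfAbsent(i, k -> new ArrayList<>()).add(new TaggedRow(false, b[i]));
        }
        return groups;
    }

    /** "Reduce + sum": outer product of each joined pair, accumulated into A^T * B. */
    public static double[][] transposeTimes(double[][] a, double[][] b) {
        double[][] c = new double[a[0].length][b[0].length];
        for (List<TaggedRow> group : shuffleByRowIndex(a, b).values()) {
            double[] left = null;
            double[] right = null;
            for (TaggedRow row : group) {
                if (row.fromLeft) {
                    left = row.values;
                } else {
                    right = row.values;
                }
            }
            if (left == null || right == null) {
                continue; // a row present in only one matrix contributes nothing
            }
            for (int j = 0; j < left.length; j++) {
                for (int k = 0; k < right.length; k++) {
                    c[j][k] += left[j] * right[k];
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        // prints [[26.0, 30.0], [38.0, 44.0]]
        System.out.println(Arrays.deepToString(transposeTimes(a, b)));
    }
}
```

In a real job the grouping and the final summation would be separate M/R
passes, so this only illustrates the data flow, not the pass count discussed
above.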
> Bring DistributedRowMatrix into compliance with Hadoop 0.20.2
> -------------------------------------------------------------
>
> Key: MAHOUT-537
> URL: https://issues.apache.org/jira/browse/MAHOUT-537
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Affects Versions: 0.4, 0.5
> Reporter: Shannon Quinn
> Assignee: Shannon Quinn
> Fix For: 0.6
>
> Attachments: MAHOUT-537.patch, MAHOUT-537.patch, MAHOUT-537.patch,
> MAHOUT-537.patch
>
>
> Convert the current DistributedRowMatrix to use the newer Hadoop 0.20.2 API,
> in particular eliminate dependence on the deprecated JobConf, using instead
> the separate Job and Configuration objects.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira