[jira] Resolved: (MAHOUT-314) DistributedRowMatrix needs a sparse DistributedRowMatrix times(DistributedRowMatrix other) implementation

Jake Mannix (JIRA) Fri, 05 Mar 2010 23:01:52 -0800

     [ 
https://issues.apache.org/jira/browse/MAHOUT-314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jake Mannix resolved MAHOUT-314.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.3

Committed.  

The current implementation is a map-side join, so it carries all of the 
limitations of such operations (in particular, both matrices must have the 
same number of input splits).  The first of the two matrices is treated as 
transposed, and the second is not:

{code}
DistributedRowMatrix a = new DistributedRowMatrix(pathToA, tempPathA, numRowsA, numColsA);
// Note: numRowsA == numRowsB is required for this.
DistributedRowMatrix b = new DistributedRowMatrix(pathToB, tempPathB, numRowsA, numColsB);

// If a and b were non-distributed matrices, we'd be computing
// a.transpose().times(b) to get this result.
DistributedRowMatrix aTransposeb = a.times(b);
{code}
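
To see why a.times(b) yields what would be a.transpose().times(b) for 
in-memory matrices, here is a minimal non-distributed sketch using plain Java 
arrays.  It is not Mahout code: the class name TransposeTimesSketch and the 
method transposeTimes are made up for illustration, and dense double[][] 
arrays stand in for the SequenceFile-backed rows.  It pairs row i of a with 
row i of b (roughly the job of the join) and sums the outer products of those 
pairs (roughly the job of the reducer):

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of Mahout.
class TransposeTimesSketch {

  /** Computes A^T * B by summing the outer product of row r of a with row r of b. */
  static double[][] transposeTimes(double[][] a, double[][] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "row counts differ: " + a.length + " vs " + b.length);
    }
    int numColsA = a.length == 0 ? 0 : a[0].length;
    int numColsB = b.length == 0 ? 0 : b[0].length;
    double[][] result = new double[numColsA][numColsB];
    for (int r = 0; r < a.length; r++) {
      // Rows with the same index are paired up, as a join on the row key would do.
      for (int i = 0; i < numColsA; i++) {
        if (a[r][i] == 0.0) {
          continue; // zero entries contribute nothing to the outer product
        }
        for (int j = 0; j < numColsB; j++) {
          // Accumulate the outer product of the row pair into the running sum.
          result[i][j] += a[r][i] * b[r][j];
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    double[][] a = {{1, 2}, {3, 4}, {5, 6}}; // 3 x 2
    double[][] b = {{1, 0}, {0, 1}, {1, 1}}; // 3 x 2, same number of rows as a
    System.out.println(Arrays.deepToString(transposeTimes(a, b)));
  }
}
```

For the 3 x 2 inputs in main, this prints [[6.0, 8.0], [8.0, 10.0]]: a result 
with numColsA rows and numColsB columns, which is the shape of 
a.transpose().times(b).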
 
There should be a better way of denoting this, perhaps a separate 
DistributedColumnMatrix class with this method:
{code}
public class DistributedColumnMatrix {
  public DistributedRowMatrix times(DistributedRowMatrix rowMatrix);
}
{code}

Well... that's not quite it, but something like that. 

> DistributedRowMatrix needs a sparse DistributedRowMatrix 
> times(DistributedRowMatrix other) implementation
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-314
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-314
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.3
>            Reporter: Jake Mannix
>            Assignee: Jake Mannix
>             Fix For: 0.3
>
>
> If the matrix we are multiplying by has been transformed into a 
> column-sparse matrix backed by a SequenceFile<IntWritable,VectorWritable>, 
> then doing a simple map-side join on the two, taking the (sparse) outer 
> product of each row-pair, and then running a matrix-summing reducer (probably 
> row-at-a-time, for memory constraints) would implement sparse matrix 
> multiplication in one pass over the data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
