Re: [jira] Commented: (MAHOUT-322) DistributedRowMatrix should live in SequenceFile instead of SequenceFile

Jake Mannix Thu, 04 Mar 2010 09:34:33 -0800

On Thu, Mar 4, 2010 at 9:10 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:


> To be conformable, the dictionaries must be the identical object.
>
> At that point, you do what we do now.  The labels are irrelevant to the dot
> product and are only used during input and output.
>
>       a.times(b)    // this checks that a.d == b.d, then does
>                     // a.rawMatrix.times(b.rawMatrix)
>
> make sense?
>

Ok, so you're just saying that you can *have* labels without loss of
performance, but the calculations are all still done with the integers?
What purpose do the labels actually serve here?  Just
mapping back to user-space at the end of the day?  If so, this
can be done distributed too, as long as the row and column dictionaries
can be given some sort of "UID" to identify themselves well
enough (even just doing string equality on the Path URI to the
dictionary should be fine).

  -jake


>
> On Thu, Mar 4, 2010 at 8:59 AM, Jake Mannix <jake.man...@gmail.com> wrote:
>
> > On Thu, Mar 4, 2010 at 8:54 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >
> > > I haven't examined the out-of-core scenarios at all, but in-memory, it is
> > > possible to have labels with no performance cost if you add the
> > > constraint that labeled matrices are only conformable if they share the
> > > identical label dictionary.  That implies that you can use the internal row
> > > and column indexes for all internal operations.
> >
> >
> > Care to elaborate?  If you're multiplying a matrix by a vector, both
> > labeled by Map<Integer,String> and reverse Map<String,Integer> for both
> > the rows and columns (and they match in the right way), what is the fast
> > way to do the individual dot products, one that performs comparably to
> > walking the sparse int[] / double[] parallel arrays?
> >
> >  -jake
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
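A minimal sketch of why, under the shared-dictionary constraint described in this thread, labels need not slow down the dot products Jake asks about: once the dictionaries are known to be identical, the hot loop is just the usual merge over sorted int[] / double[] parallel arrays. `SparseDot`, `SparseVector`, `dot` and `times` are hypothetical names for illustration, not Mahout's vector classes, and the sketch assumes indices are stored strictly increasing.

```java
public class SparseDot {

  /** Hypothetical sparse vector: sorted indices with parallel values. */
  static final class SparseVector {
    final int[] indices;   // assumed strictly increasing
    final double[] values; // values[p] belongs to indices[p]

    SparseVector(int[] indices, double[] values) {
      if (indices.length != values.length) {
        throw new IllegalArgumentException("length mismatch");
      }
      this.indices = indices;
      this.values = values;
    }
  }

  /** Merge-walk both index arrays; no label lookups happen in this loop. */
  static double dot(SparseVector a, SparseVector b) {
    double sum = 0.0;
    int p = 0, q = 0;
    while (p < a.indices.length && q < b.indices.length) {
      int ia = a.indices[p], ib = b.indices[q];
      if (ia == ib) {
        sum += a.values[p++] * b.values[q++];
      } else if (ia < ib) {
        p++;
      } else {
        q++;
      }
    }
    return sum;
  }

  /** Matrix-times-vector as one sparse dot product per row, all on int indices. */
  static double[] times(SparseVector[] rows, SparseVector v) {
    double[] out = new double[rows.length];
    for (int r = 0; r < rows.length; r++) {
      out[r] = dot(rows[r], v);
    }
    return out;
  }
}
```

Any label-to-index translation (the Map<String,Integer> lookups) would happen once when the vectors are built or read back, outside this loop.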
