Re: [jira] Commented: (MAHOUT-322) DistributedRowMatrix should live in SequenceFile instead of SequenceFile

Ted Dunning Thu, 04 Mar 2010 08:55:06 -0800

I haven't examined the out-of-core scenarios at all, but in-memory, it is
possible to have labels with no performance cost if you add the
constraint that labeled matrices are only conformable if they share the
identical label dictionary.  That implies that you can use the internal row
and column indexes for all internal operations.  This is pretty easy to
enforce if the persistent form of the matrix has only labels and not indexes
since you can simply augment a shared dictionary as you read or generate the
matrix.  For distributed operations, I am considerably more dubious of this
approach.
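
A minimal sketch of the in-memory idea, in Python for brevity rather than
Mahout's Java.  Every name here (LabelDictionary, LabeledMatrix, read, times,
get) is hypothetical and not part of any Mahout API; the sparse dict-of-dicts
storage is also just an assumption to keep the example short.  The point is
only that labels are translated to ints once, on read, and the inner loop of
the multiply never touches a label:

```python
class LabelDictionary:
    """Maps labels to dense integer indexes, growing as new labels appear."""

    def __init__(self):
        self._index_of = {}
        self._labels = []

    def index(self, label):
        # Assign the next free index the first time a label is seen.
        idx = self._index_of.get(label)
        if idx is None:
            idx = len(self._labels)
            self._index_of[label] = idx
            self._labels.append(label)
        return idx

    def find(self, label):
        # Look up a label without augmenting the dictionary.
        return self._index_of.get(label)


class LabeledMatrix:
    """Sparse matrix stored by integer index; labels appear only at the edges."""

    def __init__(self, row_labels, col_labels):
        self.row_labels = row_labels
        self.col_labels = col_labels
        self.rows = {}  # row index -> {column index -> value}

    @classmethod
    def read(cls, triples, row_labels, col_labels):
        # The persistent form holds only labels; indexes are assigned here,
        # augmenting the shared dictionaries as entries are read.
        m = cls(row_labels, col_labels)
        for r, c, v in triples:
            m.rows.setdefault(row_labels.index(r), {})[col_labels.index(c)] = v
        return m

    def times(self, other):
        # Conformable only if both sides share the identical dictionary object.
        if self.col_labels is not other.row_labels:
            raise ValueError("matrices do not share a label dictionary")
        out = LabeledMatrix(self.row_labels, other.col_labels)
        for i, row in self.rows.items():
            acc = {}
            for k, a in row.items():
                # Pure integer-index work: no label hashing in this loop.
                for j, b in other.rows.get(k, {}).items():
                    acc[j] = acc.get(j, 0.0) + a * b
            if acc:
                out.rows[i] = acc
        return out

    def get(self, row_label, col_label):
        i = self.row_labels.find(row_label)
        j = self.col_labels.find(col_label)
        return self.rows.get(i, {}).get(j, 0.0)


users, items, tags = LabelDictionary(), LabelDictionary(), LabelDictionary()
a = LabeledMatrix.read([("alice", "book", 2.0), ("bob", "pen", 3.0)],
                       users, items)
b = LabeledMatrix.read([("book", "fiction", 1.0), ("pen", "office", 4.0),
                        ("book", "office", 0.5)], items, tags)
c = a.times(b)
```

Because a and b were both read against the same items dictionary, the
identity check in times() is a single pointer comparison and the product can
be formed on raw indexes.  A matrix read against a different dictionary would
be rejected rather than silently misaligned.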


On Thu, Mar 4, 2010 at 8:37 AM, Jake Mannix (JIRA) <j...@apache.org> wrote:

>  Having keys for row be objects is one thing, but doing this all the time
> for the keys for the Vector indexes will seriously slow down inner loops,
> due to the translation time from object to int (via a multitude of
> hashCode() calls), and treating the rows and columns on equal footing is
> pretty much required.
>



-- 
Ted Dunning, CTO
DeepDyve
