This may be a late or ill-informed comment but -- I don't believe there's any issue with VectorWritable per se, no. Hadoop most certainly assumes that one Writable can fit into RAM. A 1GB Writable is just completely incompatible with how Hadoop works. The algorithm would have to be parallelized further, then.
Yes, that may mean rewriting the code to deal with small Vectors and such. That probably doesn't imply a change to VectorWritable, but to the jobs using it.

On Tue, Dec 14, 2010 at 12:54 AM, Ted Dunning <[email protected]> wrote:
> We should talk about whether it isn't better to just serialize chunks of
> vectors instead.
>
> This would not require fancy footwork and perversion of the contracts that
> WritableVector thinks it has now. It would also be bound to be a better
> match for the problem domain.
>
> So what about just using VW as it stands for vectors that are at most 100K
> elements and are keyed by row and left-most column? Why isn't that the
> better option?
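
For what it's worth, here's a rough sketch of what keying chunks "by row and left-most column" could look like at the Writable level. The class and field names (VectorChunkKey, row, startCol) are hypothetical, not anything in Mahout today; the point is only that each slice stays small enough to be an ordinary Writable.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

/**
 * Hypothetical composite key for a slice of a large row vector:
 * (row index, left-most column of the slice). Each slice would be
 * emitted as its own small VectorWritable (e.g. at most 100K
 * elements), so no single Writable ever has to hold a 1GB vector.
 */
public class VectorChunkKey implements WritableComparable<VectorChunkKey> {

  private long row;        // which matrix row this chunk belongs to
  private long startCol;   // left-most column covered by this chunk

  public VectorChunkKey() {}   // no-arg constructor required by Hadoop

  public VectorChunkKey(long row, long startCol) {
    this.row = row;
    this.startCol = startCol;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeLong(row);
    out.writeLong(startCol);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    row = in.readLong();
    startCol = in.readLong();
  }

  @Override
  public int compareTo(VectorChunkKey other) {
    // Sort by row first, then by left-most column, so a reducer sees a
    // row's chunks in column order and can walk them sequentially
    // without ever materializing the full vector in memory.
    int byRow = Long.compare(row, other.row);
    return byRow != 0 ? byRow : Long.compare(startCol, other.startCol);
  }

  @Override
  public int hashCode() {
    return (int) (31 * row + startCol);
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof VectorChunkKey)) {
      return false;
    }
    VectorChunkKey k = (VectorChunkKey) o;
    return row == k.row && startCol == k.startCol;
  }
}

A mapper would then emit (VectorChunkKey, VectorWritable) pairs whose values each carry at most ~100K elements, which is the option Ted is describing; the jobs change, VectorWritable itself doesn't.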
