This may be a late or ill-informed comment but -- I don't believe there's any issue with VectorWritable per se, no. Hadoop most certainly assumes that one Writable can fit into RAM. A 1GB Writable is just completely incompatible with how Hadoop works. The algorithm would have to be parallelized further, then.
Yes, that may mean rewriting the code to deal with small Vectors and such. That probably doesn't imply a change to VectorWritable, but to the jobs using it.

On Tue, Dec 14, 2010 at 12:54 AM, Ted Dunning <[email protected]> wrote:
> We should talk about whether it isn't better to just serialize chunks of
> vectors instead.
>
> This would not require fancy footwork and perversion of the contracts that
> WritableVector thinks it has now. It would also be bound to be a better
> match for the problem domain.
>
> So what about just using VW as it stands for vectors that are at most 100K
> elements and are keyed by row and left-most column? Why isn't that the
> better option?
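
For what it's worth, here's a rough sketch of what keying chunks "by row and left-most column" could look like at the Writable level. The class and field names (VectorChunkKey, row, startCol) are hypothetical, not anything in Mahout today; the point is only that each slice stays small enough to be an ordinary Writable.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

/**
 * Hypothetical composite key for a slice of a large row vector:
 * (row index, left-most column of the slice). Each slice would be
 * emitted as its own small VectorWritable (e.g. at most 100K
 * elements), so no single Writable ever has to hold a 1GB vector.
 */
public class VectorChunkKey implements WritableComparable<VectorChunkKey> {

  private long row;        // which matrix row this chunk belongs to
  private long startCol;   // left-most column covered by this chunk

  public VectorChunkKey() {}   // no-arg constructor required by Hadoop

  public VectorChunkKey(long row, long startCol) {
    this.row = row;
    this.startCol = startCol;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeLong(row);
    out.writeLong(startCol);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    row = in.readLong();
    startCol = in.readLong();
  }

  @Override
  public int compareTo(VectorChunkKey other) {
    // Sort by row first, then by left-most column, so a reducer sees a
    // row's chunks in column order and can walk them sequentially
    // without ever materializing the full vector in memory.
    int byRow = Long.compare(row, other.row);
    return byRow != 0 ? byRow : Long.compare(startCol, other.startCol);
  }

  @Override
  public int hashCode() {
    return (int) (31 * row + startCol);
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof VectorChunkKey)) {
      return false;
    }
    VectorChunkKey k = (VectorChunkKey) o;
    return row == k.row && startCol == k.startCol;
  }
}

A mapper would then emit (VectorChunkKey, VectorWritable) pairs whose values each carry at most ~100K elements, which is the option Ted is describing; the jobs change, VectorWritable itself doesn't.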
