Hey Dmitriy, I've also been playing around with a VectorWritable format which is backed by a SequenceFile, but I've been focussed on the case where it's essentially the entire matrix, and the rows don't fit into memory. This seems different than your current use case, however - you just want (relatively) small vectors to load faster, right?
-jake

On Mon, Dec 13, 2010 at 10:18 AM, Ted Dunning <[email protected]> wrote:

> Interesting idea.
>
> Would this introduce a new vector type that only allows iterating through
> the elements once?
>
> On Mon, Dec 13, 2010 at 9:49 AM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
> > Hi all,
> >
> > I would like to submit a patch to VectorWritable that allows for streaming
> > access to vector elements without having to prebuffer all of them first.
> > (current code allows for the latter only).
> >
> > That patch would allow to strike down one of the memory usage issues in
> > current Stochastic SVD implementation and effectively open memory bound for
> > n of the SVD work. (The value I see is not to open up the bound though
> > but just be more efficient in memory use, thus essentially speeding up the
> > computation.)
> >
> > If it's ok, I would like to create a JIRA issue and provide a patch for it.
> >
> > Another issue is to provide an SSVD patch that depends on that patch for
> > VectorWritable.
> >
> > Thank you.
> > -Dmitriy
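
For readers following the thread, the difference between prebuffered and streaming access can be sketched roughly as below. This is only an illustration under stated assumptions: the class StreamingVectorSketch, its methods, and the toy byte layout (an int count followed by doubles) are all hypothetical and are not Mahout's VectorWritable API or wire format. It also shows the single-pass limitation Ted asks about: once an element has been read from the stream, it cannot be revisited.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical illustration only; not Mahout's VectorWritable or its format. */
public class StreamingVectorSketch {

    /** Assumed toy layout: an int element count, then that many doubles. */
    static byte[] write(double[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(values.length);
        for (double v : values) {
            out.writeDouble(v);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Prebuffering read: materializes the whole vector before returning. */
    static double[] readAll(DataInputStream in) throws IOException {
        double[] values = new double[in.readInt()];
        for (int i = 0; i < values.length; i++) {
            values[i] = in.readDouble();
        }
        return values;
    }

    /** Streaming read: decodes one element per next() call; single pass only. */
    static Iterator<Double> stream(DataInputStream in) throws IOException {
        final int size = in.readInt();
        return new Iterator<Double>() {
            private int consumed = 0;

            @Override
            public boolean hasNext() {
                return consumed < size;
            }

            @Override
            public Double next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                try {
                    consumed++;
                    return in.readDouble();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
    }

    /** Sums the elements in one pass, keeping only a running total in memory. */
    static double streamingSum(byte[] serialized) throws IOException {
        Iterator<Double> it =
            stream(new DataInputStream(new ByteArrayInputStream(serialized)));
        double sum = 0.0;
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write(new double[] {1.0, 2.0, 3.5});
        System.out.println(streamingSum(data));
    }
}
```

In this sketch readAll needs memory proportional to the vector length, while streamingSum needs only constant memory, which is the kind of saving the proposed patch is after.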
