How common is it that a row won't fit in memory? My experience is that essentially all rows I am interested in will fit in very modest amounts of memory, but that row-by-row handling is imperative.
Is this just gilding the lily?

On Mon, Dec 13, 2010 at 10:24 AM, Jake Mannix <[email protected]> wrote:
> Hey Dmitriy,
>
> I've also been playing around with a VectorWritable format which is backed
> by a SequenceFile, but I've been focused on the case where it's essentially
> the entire matrix, and the rows don't fit into memory. This seems different
> from your current use case, however - you just want (relatively) small
> vectors to load faster, right?
>
> -jake
>
> On Mon, Dec 13, 2010 at 10:18 AM, Ted Dunning <[email protected]> wrote:
>
> > Interesting idea.
> >
> > Would this introduce a new vector type that only allows iterating through
> > the elements once?
> >
> > On Mon, Dec 13, 2010 at 9:49 AM, Dmitriy Lyubimov <[email protected]>
> > wrote:
> >
> > > Hi all,
> > >
> > > I would like to submit a patch to VectorWritable that allows for
> > > streaming access to vector elements without having to prebuffer all of
> > > them first. (The current code allows for the latter only.)
> > >
> > > That patch would strike down one of the memory usage issues in the
> > > current Stochastic SVD implementation and effectively open the memory
> > > bound for n of the SVD work. (The value I see is not to open up the
> > > bound, though, but just to be more efficient in memory use, thus
> > > essentially speeding up the computation.)
> > >
> > > If it's OK, I would like to create a JIRA issue and provide a patch for
> > > it.
> > >
> > > Another issue is to provide an SSVD patch that depends on that patch
> > > for VectorWritable.
> > >
> > > Thank you.
> > > -Dmitriy
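
The "iterate once" idea Ted raises could look roughly like the sketch below: a vector whose (index, value) element pairs are read lazily from a `DataInput` as the iterator advances, with a second call to `iterator()` rejected since the stream cannot be rewound. This is a hypothetical, self-contained illustration; the class and field names (`StreamingVectorReader`, `Element`, `numNonDefault`) are invented for the example and are not Mahout's actual VectorWritable API or wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Iterator;

// Hypothetical sketch of a one-pass streaming vector: elements are decoded
// from the underlying DataInput only as the consumer asks for them, so the
// full element list is never buffered in memory.
public class StreamingVectorReader implements Iterable<StreamingVectorReader.Element> {

  public static final class Element {
    public final int index;
    public final double value;
    Element(int index, double value) { this.index = index; this.value = value; }
  }

  private final DataInput in;
  private final int numNonDefault; // element count, assumed written in a header
  private boolean consumed = false;

  public StreamingVectorReader(DataInput in, int numNonDefault) {
    this.in = in;
    this.numNonDefault = numNonDefault;
  }

  @Override
  public Iterator<Element> iterator() {
    if (consumed) {
      // The backing stream cannot be rewound, so only a single pass is allowed.
      throw new IllegalStateException("streaming vector supports a single pass only");
    }
    consumed = true;
    return new Iterator<Element>() {
      private int read = 0;
      @Override public boolean hasNext() { return read < numNonDefault; }
      @Override public Element next() {
        try {
          int index = in.readInt();      // decode one element lazily
          double value = in.readDouble();
          read++;
          return new Element(index, value);
        } catch (IOException e) {
          throw new IllegalStateException(e);
        }
      }
      @Override public void remove() { throw new UnsupportedOperationException(); }
    };
  }

  public static void main(String[] args) throws IOException {
    // Serialize three sparse elements, then stream them back one at a time.
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    int[] indices = {0, 3, 7};
    double[] values = {1.5, -2.0, 0.25};
    for (int i = 0; i < indices.length; i++) {
      out.writeInt(indices[i]);
      out.writeDouble(values[i]);
    }
    StreamingVectorReader vec = new StreamingVectorReader(
        new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())),
        indices.length);
    double sumOfSquares = 0;
    for (Element e : vec) {
      sumOfSquares += e.value * e.value; // e.g. accumulate a norm in one pass
    }
    System.out.println(sumOfSquares); // 1.5^2 + 2.0^2 + 0.25^2 = 6.3125
  }
}
```

The one-pass restriction is the price of the memory saving: anything that needs two sweeps over the elements (e.g. computing a norm and then normalizing) would have to either cache the elements itself or re-read the vector from the SequenceFile.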
