Initial work on vector preprocessing is done in my git repository, on the "ssvd-vw-hack" branch of my SSVD work (doc is here: https://github.com/dlyubimov/ssvd-doc). The heavy lifting is done through the VectorPreprocessor interface and seems to work like a charm. I have not tested it extensively, but when complete, it should be able to cope with occasional spikes in data density without crippling the SVD mapper's memory.
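
To illustrate the idea, here is a rough sketch of per-element streaming versus prebuffering. It is not the actual VectorPreprocessor code from the branch, and it does not reproduce VectorWritable's real wire format: the class StreamingVectorSketch, the ElementHandler callback, and the simple "count, then (index, value) pairs" layout are all made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch only: not Mahout's VectorWritable and not the
// VectorPreprocessor interface from the "ssvd-vw-hack" branch.
final class StreamingVectorSketch {

  /** Hypothetical callback, invoked once per element as it is decoded. */
  interface ElementHandler {
    void onElement(int index, double value);
  }

  /** Assumed layout: element count, then (int index, double value) pairs. */
  static void writeSparse(DataOutput out, int[] indices, double[] values)
      throws IOException {
    out.writeInt(indices.length);
    for (int i = 0; i < indices.length; i++) {
      out.writeInt(indices[i]);
      out.writeDouble(values[i]);
    }
  }

  /**
   * Decodes one element at a time and hands it straight to the handler.
   * No per-vector array is allocated, so a dense spike in one row costs
   * time but not memory. Returns the number of elements seen.
   */
  static int streamElements(DataInput in, ElementHandler handler)
      throws IOException {
    int count = in.readInt();
    for (int i = 0; i < count; i++) {
      int index = in.readInt();
      double value = in.readDouble();
      handler.onElement(index, value);
    }
    return count;
  }

  /** Helper: serializes a sparse row into a byte array using the layout above. */
  static byte[] serialize(int[] indices, double[] values) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      writeSparse(new DataOutputStream(bytes), indices, values);
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Example consumer: dot product of a serialized sparse row with a dense vector. */
  static double dotWithDense(byte[] serialized, double[] dense) {
    try {
      double[] acc = new double[1]; // single running sum, the only state kept
      DataInput in = new DataInputStream(new ByteArrayInputStream(serialized));
      streamElements(in, (index, value) -> acc[0] += value * dense[index]);
      return acc[0];
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

The consumer only ever holds one element plus its own accumulator, which is the property that matters for the mapper: memory use stops depending on how many non-zero elements a given row happens to have.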

thanks.
-d

On Mon, Dec 13, 2010 at 9:49 AM, Dmitriy Lyubimov <[email protected]> wrote:

> Hi all,
>
> I would like to submit a patch to VectorWritable that allows for streaming
> access to vector elements without having to prebuffer all of them first.
> (current code allows for the latter only).
>
> That patch would allow to strike down one of the memory usage issues in
> current Stochastic SVD implementation and effectively open memory bound for
> n of the SVD work. (The value i see is not to open up the bound though
> but just be more efficient in memory use, thus essentially speeding up the
> computation. )
>
> If it's ok, i would like to create a JIRA issue and provide a patch for it.
>
>
> Another issue is to provide an SSVD patch that depends on that patch for
> VectorWritable.
>
> Thank you.
> -Dmitriy
>
