Jake, are you saying that each vector element = one sequence file record?
This is also a common problem, and there are huge I/O benefits to be had in the stochastic SVD implementation (I reviewed that in the issues section there). I was thinking in that direction too, but I thought this problem might be better served by a block-oriented record format rather than an element-oriented one. Block-wise formats are, I think, quite common in scientific libraries like ScaLAPACK. It's quite clear, though, that some algorithms would be averse to pre-blocking but not to row splicing (i.e. just splitting a long row into subfragments). That's also an interesting initiative, although it's not going to work out of the box with the stochastic SVD code. The fundamental issue with the existing approach is that it has to create MR splits on vector boundaries, not block boundaries (although a 100% block-wise algorithm seems quite plausible and promising there too).

On Mon, Dec 13, 2010 at 1:06 PM, Jake Mannix wrote:

> I'm not sure that Dmitriy's use-case has an easy solution. As you
> say, Writable loads the whole thing into memory, independently of
> whether or not you try to do buffering on iteration.
>
> My situation (monstrous vectors) is easier in some respects: if
> the matrices are essentially
> SequenceFile<IntWritable,Pair<IntWritable,DoubleWritable>>, then
> much bigger vectors can be handled in MR jobs, but
> they no longer really look like "vectors" in the interface sense.
>
> -jake
>
> On Mon, Dec 13, 2010 at 12:52 PM, Ted Dunning wrote:
>
> > OK.
> >
> > Let's assume that this is needed.
> >
> > I think that an iterable interface on VectorWritable that throws
> > UnsupportedOperationException or similar if you try to get the
> > iterator twice is much more transparent than a watcher structure
> > and much easier for a user to discover/re-invent.
> >
> > Another (evil) thought is a parallel class to VectorWritable which is
> > essentially a SequentialAccessVectorWritable that supports reading and
> > writing. It seems to me that the Writable isn't really compatible with
> > this interface in any case. How will that be resolved?
> >
> > On Mon, Dec 13, 2010 at 11:36 AM, Dmitriy Lyubimov wrote:
> >
> > > Absent this solution, I realistically don't see how I can go without
> > > a push technique in accessing the vectors.
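A minimal sketch of the row-splicing idea in plain Java, for illustration only. RowFragment and RowSplicer are hypothetical names, not Mahout or Hadoop classes, and the row is assumed dense (a double[]) to keep the example short. Each fragment carries its row index and starting column offset, so it could in principle be written as its own SequenceFile record, and a single long row never has to be held by one Writable. With fragmentSize = 1 the layout degenerates to one element per record, roughly the SequenceFile<IntWritable,Pair<IntWritable,DoubleWritable>> shape Jake describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical record for one fragment of a long row: the row index, the
 * column offset where the fragment starts, and the fragment's values.
 */
final class RowFragment {
    final int row;
    final int offset;
    final double[] values;

    RowFragment(int row, int offset, double[] values) {
        this.row = row;
        this.offset = offset;
        this.values = values;
    }
}

/** Hypothetical helper that splices a row into fragments and back. */
final class RowSplicer {
    private RowSplicer() {}

    /** Splits a dense row into consecutive fragments of at most fragmentSize elements. */
    static List<RowFragment> splice(int row, double[] values, int fragmentSize) {
        if (fragmentSize <= 0) {
            throw new IllegalArgumentException("fragmentSize must be positive");
        }
        List<RowFragment> fragments = new ArrayList<>();
        for (int offset = 0; offset < values.length; offset += fragmentSize) {
            int end = Math.min(offset + fragmentSize, values.length);
            // Each fragment copies its own slice so it can be serialized independently.
            fragments.add(new RowFragment(row, offset, Arrays.copyOfRange(values, offset, end)));
        }
        return fragments;
    }

    /** Rebuilds a row of the given length from its fragments, in any order. */
    static double[] reassemble(int length, List<RowFragment> fragments) {
        double[] row = new double[length];
        for (RowFragment f : fragments) {
            // The stored offset, not arrival order, decides where the slice goes.
            System.arraycopy(f.values, 0, row, f.offset, f.values.length);
        }
        return row;
    }
}
```

Because each fragment records its own offset, reassembly does not depend on the order in which fragments arrive. A block-oriented format would instead key records by (blockRow, blockCol) tiles, which suits block-wise algorithms but not the algorithms that are averse to pre-blocking.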

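The two access styles discussed in the thread can be contrasted in a short, self-contained Java sketch. OneShotIterable follows Ted's suggestion of an Iterable that throws UnsupportedOperationException on a second iterator() call; PushReader is one possible reading of the "push technique" Dmitriy mentions, where the reader drives the loop and hands each element to a callback. ElementSource and ArraySource are hypothetical stand-ins for whatever decodes elements from a record; none of this is the actual VectorWritable API.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.DoubleConsumer;

/** Hypothetical stand-in for a stream of vector elements decoded from a record. */
interface ElementSource {
    boolean hasNext();
    double next();
}

/** Hypothetical source that simply walks an in-memory array. */
final class ArraySource implements ElementSource {
    private final double[] data;
    private int pos;

    ArraySource(double[] data) { this.data = data; }

    public boolean hasNext() { return pos < data.length; }

    public double next() {
        if (pos >= data.length) throw new NoSuchElementException();
        return data[pos++];
    }
}

/** Pull style: an Iterable whose elements can be walked only once. */
final class OneShotIterable implements Iterable<Double> {
    private final ElementSource source;
    private boolean consumed;

    OneShotIterable(ElementSource source) { this.source = source; }

    @Override
    public Iterator<Double> iterator() {
        // The underlying stream cannot be rewound, so a second iterator is refused.
        if (consumed) {
            throw new UnsupportedOperationException("elements can only be iterated once");
        }
        consumed = true;
        return new Iterator<Double>() {
            @Override public boolean hasNext() { return source.hasNext(); }
            @Override public Double next() { return source.next(); }
        };
    }
}

/** Push style: the reader owns the loop and pushes each element to a callback. */
final class PushReader {
    private PushReader() {}

    static void readAll(ElementSource source, DoubleConsumer consumer) {
        while (source.hasNext()) {
            consumer.accept(source.next());
        }
    }
}
```

The pull version makes the single-pass restriction visible at runtime instead of silently returning an empty second iteration; the push version sidesteps the question by never handing out an iterator, at the cost of inverting control so the caller must express its work as a callback.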