Ted,

I suspect there will still be problems if you have the pull ("Iterator")
technique but not the push ("Watcher") technique with MR.

Ted,

I already thought a little bit about that.

The problem here is that one fundamentally wants to do some input
preprocessing, whatever that is (in the case of SSVD, sequential matrix
multiplication or dot-product accumulation), during execution of
Writable.readFields(), which makes preallocating memory for the entire
vector unnecessary.

The framework will not make the Writable available to the reducer until the
input format's record reader is done reading it. Hence I think one can't
request an iterator after the record is read. So, like you said, Writable is
indeed not really compatible with Iterator. It is compatible with
push-parsing, though. That is part of what I am saying: Writable imposes
fundamental constraints on how you can do any sequential data preprocessing
(on either dense or sparse sequential data) in a map.

I will have to look at the details of VectorWritable to make sure all cases
are covered (I have only taken a very brief look so far). But as long as it
is able to produce elements in order of increasing index, the push technique
will certainly work for most algorithms (and in some cases, notably with
SSVD, it would work even if it produces the data in a non-sequential way).

BTW, the absence of a sequential data access requirement in the stochastic
SVD technique is one of the reasons why e.g. a blocked format would work
with SSVD too.





On Mon, Dec 13, 2010 at 12:52 PM, Ted Dunning <[email protected]> wrote:

> OK.
>
> Let's assume that this is needed.
>
> I think that an iterable interface on VectorWritable that throws
> UnsupportedOperationException or similar if
> you try to get the iterator twice is much more transparent than a watcher
> structure and much easier for a user
> to discover/re-invent.
>
> Another (evil) thought is a parallel class to VectorWritable which is
> essentially SequentialAccessVectorWritable that supports reading and
> writing.  It seems to me that the Writable isn't real compatible with this
> interface in any case.  How will that be resolved?
>
>
> On Mon, Dec 13, 2010 at 11:36 AM, Dmitriy Lyubimov <[email protected]
> >wrote:
>
> > Absent of this solution, i realistically don't see how i can go without a
> > push technique in accessing the vectors.
> >
>
