Check the source for VectorWritable. I'm pretty sure it serializes
elements in the order returned by nonDefaultIterator(), which for
SASVectors is index order. So while SASVectors are indeed non-optimal
for random access and mutating operations, that is the tradeoff you
have to make when picking your vector impl.

  -jake
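
A minimal sketch of the idea discussed in this thread, not Mahout's actual
VectorWritable code: if the writer emits non-default elements in ascending
index order, a reader can push each element to a consumer as it is decoded,
with no prebuffering of the whole vector. The class name, the ElementSink
interface, and the on-disk layout (a count followed by index/value pairs) are
all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only; none of these names come from Mahout.
final class SequentialSparseSketch {

    /** Hypothetical callback that receives one (index, value) pair at a time. */
    interface ElementSink {
        void accept(int index, double value);
    }

    /** Writes a count, then (index, value) pairs in the order they are given. */
    static byte[] write(int[] indices, double[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(indices.length);
            for (int i = 0; i < indices.length; i++) {
                out.writeInt(indices[i]);
                out.writeDouble(values[i]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /**
     * Decodes the pairs and pushes each one to the sink immediately.
     * Returns true if the (non-negative) indices arrived in strictly
     * increasing order, which is what a sequential-access layout would give.
     */
    static boolean push(byte[] data, ElementSink sink) {
        boolean ordered = true;
        int previous = -1;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                int index = in.readInt();
                double value = in.readDouble();
                if (index <= previous) {
                    ordered = false;
                }
                previous = index;
                sink.accept(index, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return ordered;
    }
}
```

A consumer that tolerates any order (as mentioned for SSVD further down the
thread) can ignore the return value; one that needs sequential input can use
it to detect a vector that was saved in a non-sequential format.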

On Mon, Dec 13, 2010 at 4:30 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Yes, it should be. I thought Ted implied VectorWritable does it only this
> way and none other.
>
> If we can differentiate, I'd rather do it. That implies that if you save in a
> non-sequential format, we'd support it with the caveat that it's subpar in
> certain cases, whereas if you format the input sequentially, we'd
> eliminate the vector prebuffering stage. Yes, that will work. Thank you, Jake.
>
> -d
>
>
> On Mon, Dec 13, 2010 at 4:26 PM, Jake Mannix <[email protected]> wrote:
>
> > Dmitriy,
> >
> >  You should be able to specify that your matrices be stored in
> > SequentialAccessSparseVector format if you need to.  This is
> > almost always the right thing for HDFS-backed matrices, because
> > HDFS is write-once, and SASVectors are optimized for read-only
> > sequential access, which is your exact use case, right?
> >
> >  -jake
> >
> > On Mon, Dec 13, 2010 at 4:21 PM, Dmitriy Lyubimov <[email protected]> wrote:
> >
> > > I don't think sequentiality is a requirement in the case I am working on.
> > > However, let me peek at the code first. I am guessing it is some form of a
> > > near-perfect hash, in which case it may not be possible to read it in parts
> > > at all. Which would be bad, indeed. I would need to find a completely
> > > alternative input format then to handle my case.
> > >
> > > On Mon, Dec 13, 2010 at 4:01 PM, Ted Dunning <[email protected]> wrote:
> > >
> > > > I don't think that sequentiality is part of the contract.
> > > > RandomAccessSparseVectors are likely to produce disordered values
> > > > when serialized, I think.
> > > >
> > > > On Mon, Dec 13, 2010 at 1:48 PM, Dmitriy Lyubimov <[email protected]> wrote:
> > > >
> > > > > I will have to look at the details of VectorWritable to make sure all
> > > > > cases are covered (I only took a very brief look so far). But as long
> > > > > as it is able to produce elements in order of increasing index, the
> > > > > push technique will certainly work for most algorithms (and in some
> > > > > cases, notably with SSVD, it would work even if it produces the data
> > > > > in a non-sequential way).
> > > > >
> > > >
> > >
> >
>
