> We should talk about whether it isn't better to just serialize chunks of
> vectors instead.

That actually is the better option; I mentioned it as the preferable
route. Just splice the rows. But that brings up the issue of data
prep utils, so it would require more coding than adding a capability to
VW.
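
To make the row-splicing idea concrete, here is a minimal sketch of what such
a data prep util might do, assuming plain Java arrays stand in for a sparse
row whose indices are already in ascending order. RowChunker, Chunk, split
and chunkWidth are hypothetical names for illustration only, not anything in
Mahout. Each chunk covers a fixed column range and is keyed by row and
left-most column, so each piece could then be written as an ordinary vector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not Mahout API): splits one sparse row into column-range chunks. */
class RowChunker {

    /** One slice of a row, keyed by (row, leftCol), where leftCol is the first column the slice covers. */
    static final class Chunk {
        final int row;
        final int leftCol;
        final int[] indices;    // absolute column indices, ascending
        final double[] values;  // values parallel to indices

        Chunk(int row, int leftCol, int[] indices, double[] values) {
            this.row = row;
            this.leftCol = leftCol;
            this.indices = indices;
            this.values = values;
        }
    }

    /**
     * Splits a row into chunks that each cover columns [leftCol, leftCol + chunkWidth).
     * Assumes indices are non-negative and sorted ascending; empty column ranges produce no chunk.
     */
    static List<Chunk> split(int row, int[] indices, double[] values, int chunkWidth) {
        if (chunkWidth <= 0) {
            throw new IllegalArgumentException("chunkWidth must be positive");
        }
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have the same length");
        }
        List<Chunk> chunks = new ArrayList<>();
        int start = 0;
        while (start < indices.length) {
            // Align the key to the chunk grid so the same column always lands in the same chunk.
            int leftCol = (indices[start] / chunkWidth) * chunkWidth;
            long limit = (long) leftCol + chunkWidth; // long avoids overflow near Integer.MAX_VALUE
            int end = start;
            while (end < indices.length && indices[end] < limit) {
                end++;
            }
            chunks.add(new Chunk(row, leftCol,
                    Arrays.copyOfRange(indices, start, end),
                    Arrays.copyOfRange(values, start, end)));
            start = end;
        }
        return chunks;
    }
}
```

With chunkWidth = 100000, a row with entries at columns 0, 5, 99999, 100000
and 250000 would come out as three chunks keyed by left-most columns 0,
100000 and 200000. The sketch relies on the input being in index order, which
is exactly the property in question for VW below.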



On Mon, Dec 13, 2010 at 4:54 PM, Ted Dunning <[email protected]> wrote:

> We should talk about whether it isn't better to just serialize chunks of
> vectors instead.
>
> This would not require fancy footwork and perversion of the contracts that
> WritableVector thinks it has now.  It would also be bound to be a better
> match for the problem domain.
>
> So what about just using VW as it stands for vectors that are at most 100K
> elements and are keyed by row and left-most column?  Why isn't that the
> better option?
>
> On Mon, Dec 13, 2010 at 4:41 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
> > Thank you, Jake.
> >
> > Yes, I will do that tonight. I did do a brief assessment of that code
> > before, but not very thoroughly, which is why I thought it must be
> > low-hanging fruit; but Ted and you certainly should know better, so I need
> > to look at it a second time to be sure. Thank you, sir.
> >
> > -Dima
> >
> > On Mon, Dec 13, 2010 at 4:37 PM, Jake Mannix <[email protected]>
> > wrote:
> >
> > > Check the source for VectorWritable, I'm pretty sure it serializes
> > > in the order of the nonDefaultIterator(), which for SASVectors is in
> > > order, so while these are indeed non-optimal for random access and
> > > mutating operations, that is indeed the tradeoff you have to make when
> > > picking your vector impl.
> > >
> > >  -jake
> > >
> > > On Mon, Dec 13, 2010 at 4:30 PM, Dmitriy Lyubimov <[email protected]>
> > > wrote:
> > >
> > > > Yes, it should be. I thought Ted implied VectorWritable does it only
> > > > this way and none other.
> > > >
> > > > If we can differentiate, I'd rather do it. That implies that if you save
> > > > in one format (non-sequential), we'd support it with the caveat that it's
> > > > subpar in certain cases, whereas where you format the input sequentially,
> > > > we'd eliminate the vector prebuffering stage. Yes, that will work. Thank
> > > > you, Jake.
> > > >
> > > > -d
> > > >
> > > >
> > > > On Mon, Dec 13, 2010 at 4:26 PM, Jake Mannix <[email protected]>
> > > > wrote:
> > > >
> > > > > Dmitriy,
> > > > >
> > > > >  You should be able to specify that your matrices be stored in
> > > > > SequentialAccessSparseVector format if you need to.  This is
> > > > > almost always the right thing for HDFS-backed matrices, because
> > > > > HDFS is write-once, and SASVectors are optimized for read-only
> > > > > sequential access, which is your exact use case, right?
> > > > >
> > > > >  -jake
> > > > >
> > > > > > On Mon, Dec 13, 2010 at 4:21 PM, Dmitriy Lyubimov <[email protected]>
> > > > > > wrote:
> > > > >
> > > > > > > I don't think sequentiality is a requirement in the case I am
> > > > > > > working on. However, let me peek at the code first. I am guessing
> > > > > > > it is some form of a near-perfect hash, in which case it may not be
> > > > > > > possible to read it in parts at all. Which would be bad, indeed. I
> > > > > > > would need to find a completely alternative input format then to
> > > > > > > handle my case.
> > > > > >
> > > > > > > On Mon, Dec 13, 2010 at 4:01 PM, Ted Dunning <[email protected]>
> > > > > > > wrote:
> > > > > >
> > > > > > > > I don't think that sequentiality is part of the contract.
> > > > > > > > RandomAccessSparseVectors are likely to produce disordered values
> > > > > > > > when serialized, I think.
> > > > > > >
> > > > > > > > On Mon, Dec 13, 2010 at 1:48 PM, Dmitriy Lyubimov <[email protected]>
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > > I will have to look at the details of VectorWritable to make sure
> > > > > > > > > all cases are covered (I only took a very brief look so far). But
> > > > > > > > > as long as it is able to produce elements in order of increasing
> > > > > > > > > index, the push technique will certainly work for most algorithms
> > > > > > > > > (and in some cases, notably with SSVD, it would work even if it
> > > > > > > > > produces the data in a non-sequential way).
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
