Hey Sean,

On Thu, Apr 15, 2010 at 7:16 AM, Sean Owen <sro...@gmail.com> wrote:

> Along the way to a patch for MAHOUT-379, I'm having some trouble
> figuring out SequentialAccessSparseVector.DenseVector. I think it can
> be simplified, but unless I'm misunderstanding there are several bugs
> here. I'd like to find my mistake or else simplify/fix this along the
> way.
>

  You mean SequentialAccessSparseVector.DenseElement, right?

  I apologize for some of those iterator element crazy loops.  They may
be simplifyable.  I'd be surprised if the logic is really *wrong*, because
there are too many places where we'd break if so.  Although, we very
rarely iterateAll() on sparse vectors, so I guess it's possible this has
bugs in it still.

get() uses offset and index, both initialized to the 'ind' argument,
> which is an index into the vector, not an offset into the mapping
> array. But then "cur = indices[index]" is compared against offset,
> which seems incorrect.
>

Oof.  That code is hard to read.  I'll have to look at it closer and
see what the hell I was thinking.


> On a related note in the Iterator's next() method we have
>
>      element.offset = offset++;
>
> But this makes the two values forever inequal by one, which doesn't
> sound right? they start off equal at 0.
>

That seems pretty clear.  If that line can at all be correct it would
have to be ++offset, not offset++.


> I could be mistaken, but I also wonder whether this can be
> dramatically simplified by a single Element implementation which both
> understands its logical index into the vector, and offset into the
> mapping array if applicable.
>

If you can dramatically simplify it in such a way that it works with
both dense and sparse iteration, and can handle the ugly case
of Element.set(double) (which completely changes the index and
value arrays by possibly shifting them over one), by all means.

Is there a simple unit test we can add to see if it's truly broken
currently?  It seems like it should be easy to see, if we
exercise dense iteration on SequentialAccessSparseVector impls,
possibly ones which have a nonzero value for get(0).

Speaking of which, I notice that indeed, we don't ever mess with
iterateAll().  A really simple unit test should show that is totally
broken.

Hmm... I tried changing TestSparseVector.testIterator() to test
SeqAcc instead of RandAcc, and it still passed, so it's not
a strict enough test, I guess...

  -jake

Reply via email to