On Mon, Apr 19, 2010 at 9:13 AM, Sean Owen <sro...@gmail.com> wrote:

> More on Vector, as I'm browsing through it:
>
> AbstractVector.minus(Vector) says:
>

//snip


> The stanza after the instanceof checks can just become the body of an
> overriding method in these two subclasses right?
>

Yep, sure.


> Since we're computing "this - that", makes sense to only look at
> "that" where it is nonzero. But the first version iterates over
> indices where "this" is nonzero, so it's wrong. (Yeah just checked
> with a test.)
>

Yikes.  Maybe MAHOUT-303 should really be higher prioritized, huh?


> Was the intent to compute "that - this" in this case, so to be able to
> iterate over nonzero elements of "this", and then invert it at the
> end? This works:
>
>  @Override
>  public Vector minus(Vector that) {
>    if (this.size() != that.size()) {
>      throw new CardinalityException();
>    }
>    Vector result = that.clone();
>    Iterator<Element> iter = this.iterateNonZero();
>    while (iter.hasNext()) {
>      Element thisElement = iter.next();
>      int index = thisElement.index();
>      result.setQuick(index, that.getQuick(index) -
> thisElement.get()); // this is "-(that - this)"
>    }
>    return result.times(-1.0);
>  }
>
> It's nice but involves another Vector allocation at the end.
>

You can avoid the Vector allocation by replacing

result.times(-1.0)
with
result.assign(Functions.negate)


> But then I'm also confused since this is the version intended for
> DenseVector and RandomAccessSparseVector. I'd imagine it's best used
> with SequentialAccessSparseVector, where "iterateNonZero()" has the
> most benefit?
>

If "that" is a DenseVector or RandomAccessSparseVector, it's easily
randomly accessible, and so iterating over "this" and operating on a copy
of "that" is efficient.  For that instanceof SequentialAccessSparseVector,
iterating over that is done in the else clause (which is efficient).  This
makes sense, right?


> But that's only an issue to optimize if you iterate over "this", like
> in my update above. Could we just have one implementation that
> iterates over "that"? It's more straightforward. The issue isn't
> really implementation but number of non-zero elements in "this" versus
> "that", as the TODO comments point out.
>

The efficiency points are twofold: number of nonzero elements, and
the impl: you don't want to iterate over a vector of any type while
continually calling setQuick() on a SequentialAccessSparseVector -
that is horribly inefficient, as it causes a binary search each time,
followed by a possible copy/shift of most of the index and value
arrays.

   -jake

Reply via email to