On Mon, Apr 19, 2010 at 9:13 AM, Sean Owen <[email protected]> wrote:
> More on Vector, as I'm browsing through it:
>
> AbstractVector.minus(Vector) says:
>
//snip
> The stanza after the instanceof checks can just become the body of an
> overriding method in these two subclasses right?
>
Yep, sure.
> Since we're computing "this - that", makes sense to only look at
> "that" where it is nonzero. But the first version iterates over
> indices where "this" is nonzero, so it's wrong. (Yeah just checked
> with a test.)
>
Yikes. Maybe MAHOUT-303 should really be higher prioritized, huh?
> Was the intent to compute "that - this" in this case, so to be able to
> iterate over nonzero elements of "this", and then invert it at the
> end? This works:
>
> @Override
> public Vector minus(Vector that) {
> if (this.size() != that.size()) {
> throw new CardinalityException();
> }
> Vector result = that.clone();
> Iterator<Element> iter = this.iterateNonZero();
> while (iter.hasNext()) {
> Element thisElement = iter.next();
> int index = thisElement.index();
> result.setQuick(index, that.getQuick(index) -
> thisElement.get()); // this is "-(that - this)"
> }
> return result.times(-1.0);
> }
>
> It's nice but involves another Vector allocation at the end.
>
You can avoid the Vector allocation by replacing
result.times(-1.0)
with
result.assign(Functions.negate)
> But then I'm also confused since this is the version intended for
> DenseVector and RandomAccessSparseVector. I'd imagine it's best used
> with SequentialAccessSparseVector, where "iterateNonZero()" has the
> most benefit?
>
If "that" is a DenseVector or RandomAccessSparseVector, it's easily
randomly accessible, and so iterating over "this" and operating on a copy
of "that" is efficient. For that instanceof SequentialAccessSparseVector,
iterating over that is done in the else clause (which is efficient). This
makes sense, right?
> But that's only an issue to optimize if you iterate over "this", like
> in my update above. Could we just have one implementation that
> iterates over "that"? It's more straightforward. The issue isn't
> really implementation but number of non-zero elements in "this" versus
> "that", as the TODO comments point out.
>
The efficiency points are twofold: number of nonzero elements, and
the impl: you don't want to iterate over a vector of any type while
continually calling setQuick() on a SequentialAccessSparseVector -
that is horribly inefficient, as it causes a binary search each time,
followed by a possible copy/shift of most of the index and value
arrays.
-jake