On Mon, Apr 19, 2010 at 9:13 AM, Sean Owen <sro...@gmail.com> wrote: > More on Vector, as I'm browsing through it: > > AbstractVector.minus(Vector) says: >
//snip > The stanza after the instanceof checks can just become the body of an > overriding method in these two subclasses right? > Yep, sure. > Since we're computing "this - that", makes sense to only look at > "that" where it is nonzero. But the first version iterates over > indices where "this" is nonzero, so it's wrong. (Yeah just checked > with a test.) > Yikes. Maybe MAHOUT-303 should really be higher prioritized, huh? > Was the intent to compute "that - this" in this case, so to be able to > iterate over nonzero elements of "this", and then invert it at the > end? This works: > > @Override > public Vector minus(Vector that) { > if (this.size() != that.size()) { > throw new CardinalityException(); > } > Vector result = that.clone(); > Iterator<Element> iter = this.iterateNonZero(); > while (iter.hasNext()) { > Element thisElement = iter.next(); > int index = thisElement.index(); > result.setQuick(index, that.getQuick(index) - > thisElement.get()); // this is "-(that - this)" > } > return result.times(-1.0); > } > > It's nice but involves another Vector allocation at the end. > You can avoid the Vector allocation by replacing result.times(-1.0) with result.assign(Functions.negate) > But then I'm also confused since this is the version intended for > DenseVector and RandomAccessSparseVector. I'd imagine it's best used > with SequentialAccessSparseVector, where "iterateNonZero()" has the > most benefit? > If "that" is a DenseVector or RandomAccessSparseVector, it's easily randomly accessible, and so iterating over "this" and operating on a copy of "that" is efficient. For that instanceof SequentialAccessSparseVector, iterating over that is done in the else clause (which is efficient). This makes sense, right? > But that's only an issue to optimize if you iterate over "this", like > in my update above. Could we just have one implementation that > iterates over "that"? It's more straightforward. The issue isn't > really implementation but number of non-zero elements in "this" versus > "that", as the TODO comments point out. > The efficiency points are twofold: number of nonzero elements, and the impl: you don't want to iterate over a vector of any type while continually calling setQuick() on a SequentialAccessSparseVector - that is horribly inefficient, as it causes a binary search each time, followed by a possible copy/shift of most of the index and value arrays. -jake