Robin, I addressed the 7 points you mentioned, except for the one about
empty vectors.
It's unclear what aggregate should return gor empty vectors (either through
an exception or 0) and so I opted for 0 everywhere.

The new tests are VectorBinaryAssignTest, VectorBinaryAssignCostTest,
VectorBinaryAggregateTest, VectorBinaryAggregateCostTest and FunctionTest.

The Times benchmark should run more quickly now – there was a tiny bug in
createOptimizedCopy that asked if "isDense()" rather than
"vector.isDense()" which made it not create the right kind of vector.

Comments-wise, I would *really* like this do be over, I feel like I've
explained the whole iteration to death by now in tons of places.



On Thu, May 2, 2013 at 8:07 PM, Dan Filimon <dangeorge.fili...@gmail.com>wrote:

>    This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10669/
>   Review request for mahout, Ted Dunning, Sebastian Schelter, and Robin
> Anil.
> By Dan Filimon.
>
> *Updated May 2, 2013, 5:07 p.m.*
> Changes
>
> This is the nearly-done version of the code.
>
> The bulk of the changes are in AbstractVector, VectorBinaryAssign, 
> VectorBinaryAggregate and Functions.
>
> As for comments in the new code, I felt the names are fairly self explanatory 
> so didn't comment every class.
> I did however write comments on top describing the changes as well as write 
> tests and link to the design doc.
>
>   Description
>
> This patch contains code cleaning up AbstractVector and making the operations 
> as fast as possible while still having a high level interface.
>
> The main changes are in AbstractVector as well as new methods in 
> DoubleDoubleFunction.
>
>   Testing
>
> The vectors test pass but it's likely that the patch in it's current state is 
> broken as other, unrelated tests (BallKMeans...) are failing.
> Also, my Hadoop conf is broken so I didn't run all the core tests. Anyone?
>
> I can't seem to find the bug, so _please_ have a closer look. It's still a 
> work in progress.
>
> The benchmarks seem comparable (although there are some jarring diferences – 
> Minkowski distance seems a lot slower in new-dan-1 than old-trunk-2). It may 
> be however that this is just variance due to the load of the machine at the 
> time. I'm having trouble interpreting the benchmarks in general, so anyone 
> who could give me a hand is more than welcome.
>
>   Diffs (updated)
>
>    - 
> core/src/main/java/org/apache/mahout/common/distance/ChebyshevDistanceMeasure.java
>    (0a064c9)
>    - 
> core/src/main/java/org/apache/mahout/common/distance/CosineDistanceMeasure.java
>    (0c51591)
>    - 
> core/src/main/java/org/apache/mahout/common/distance/ManhattanDistanceMeasure.java
>    (c98e5f7)
>    - 
> core/src/main/java/org/apache/mahout/common/distance/MinkowskiDistanceMeasure.java
>    (b650a8d)
>    - 
> core/src/main/java/org/apache/mahout/common/distance/TanimotoDistanceMeasure.java
>    (d32c42a)
>    - core/src/main/java/org/apache/mahout/ep/Mapping.java (900a0b8)
>    - 
> core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/qr/GivensThinSolver.java
>    (7c391ef)
>    - 
> core/src/test/java/org/apache/mahout/clustering/iterator/TestClusterClassifier.java
>    (62c5acf)
>    - 
> core/src/test/java/org/apache/mahout/common/distance/CosineDistanceMeasureTest.java
>    (50b03f0)
>    - math/src/main/java/org/apache/mahout/math/AbstractMatrix.java
>    (e12aa38)
>    - math/src/main/java/org/apache/mahout/math/AbstractVector.java
>    (090aa7a)
>    - math/src/main/java/org/apache/mahout/math/Centroid.java (0c42196)
>    - math/src/main/java/org/apache/mahout/math/ConstantVector.java
>    (51d67d4)
>    - math/src/main/java/org/apache/mahout/math/DelegatingVector.java
>    (12220d4)
>    - math/src/main/java/org/apache/mahout/math/DenseVector.java (41c356b)
>    - 
> math/src/main/java/org/apache/mahout/math/FileBasedSparseBinaryMatrix.java
>    (094003b)
>    - math/src/main/java/org/apache/mahout/math/MatrixSlice.java (7f79c96)
>    - math/src/main/java/org/apache/mahout/math/MatrixVectorView.java
>    (af70727)
>    - math/src/main/java/org/apache/mahout/math/NamedVector.java (4b7a41d)
>    - math/src/main/java/org/apache/mahout/math/OrderedIntDoubleMapping.java
>    (650d82d)
>    - math/src/main/java/org/apache/mahout/math/PermutedVectorView.java
>    (d1ea93a)
>    - math/src/main/java/org/apache/mahout/math/RandomAccessSparseVector.java
>    (6f85692)
>    - 
> math/src/main/java/org/apache/mahout/math/SequentialAccessSparseVector.java
>    (21982f9)
>    - math/src/main/java/org/apache/mahout/math/Vector.java (2f8b417)
>    - math/src/main/java/org/apache/mahout/math/VectorBinaryAggregate.java
>    (PRE-CREATION)
>    - math/src/main/java/org/apache/mahout/math/VectorBinaryAssign.java
>    (PRE-CREATION)
>    - math/src/main/java/org/apache/mahout/math/VectorView.java (add2a60)
>    - math/src/main/java/org/apache/mahout/math/WeightedVector.java
>    (06fbd05)
>    - 
> math/src/main/java/org/apache/mahout/math/decomposer/lanczos/LanczosSolver.java
>    (893047a)
>    - 
> math/src/main/java/org/apache/mahout/math/function/DoubleDoubleFunction.java
>    (82b350a)
>    - math/src/main/java/org/apache/mahout/math/function/DoubleFunction.java
>    (9eb98a3)
>    - math/src/main/java/org/apache/mahout/math/function/Functions.java
>    (eb2e42f)
>    - math/src/main/java/org/apache/mahout/math/function/Mult.java
>    (6801a94)
>    - math/src/main/java/org/apache/mahout/math/function/PlusMult.java
>    (60587b1)
>    - 
> math/src/main/java/org/apache/mahout/math/function/SquareRootFunction.java
>    (0b6f4b0)
>    - math/src/main/java/org/apache/mahout/math/function/TimesFunction.java
>    (8ab0bb1)
>    - math/src/main/java/org/apache/mahout/math/jet/math/Constants.java
>    (53535d6)
>    - 
> math/src/main/java/org/apache/mahout/math/jet/random/AbstractDistribution.java
>    (9728fe0)
>    - 
> math/src/main/java/org/apache/mahout/math/jet/random/engine/RandomEngine.java
>    (e31d862)
>    - 
> math/src/main/java/org/apache/mahout/math/random/AbstractSamplerFunction.java
>    (f98a5b7)
>    - math/src/main/java/org/apache/mahout/math/random/Multinomial.java
>    (41f2712)
>    - math/src/main/java/org/apache/mahout/math/random/WeightedThing.java
>    (38a1cde)
>    - math/src/test/java/org/apache/mahout/math/FunctionTest.java
>    (PRE-CREATION)
>    - 
> math/src/test/java/org/apache/mahout/math/VectorBinaryAggregateCostTest.java
>    (PRE-CREATION)
>    - math/src/test/java/org/apache/mahout/math/VectorBinaryAggregateTest.java
>    (PRE-CREATION)
>    - math/src/test/java/org/apache/mahout/math/VectorBinaryAssignCostTest.java
>    (PRE-CREATION)
>    - math/src/test/java/org/apache/mahout/math/VectorBinaryAssignTest.java
>    (PRE-CREATION)
>    - math/src/test/java/org/apache/mahout/math/VectorTest.java (d6d554b)
>    - math/src/test/java/org/apache/mahout/math/random/MultinomialTest.java
>    (8981f8b)
>
> View Diff <https://reviews.apache.org/r/10669/diff/>
>

Reply via email to