Robin, I addressed the 7 points you mentioned, except for the one about empty vectors. It's unclear what aggregate should return gor empty vectors (either through an exception or 0) and so I opted for 0 everywhere.
The new tests are VectorBinaryAssignTest, VectorBinaryAssignCostTest, VectorBinaryAggregateTest, VectorBinaryAggregateCostTest and FunctionTest. The Times benchmark should run more quickly now – there was a tiny bug in createOptimizedCopy that asked if "isDense()" rather than "vector.isDense()" which made it not create the right kind of vector. Comments-wise, I would *really* like this do be over, I feel like I've explained the whole iteration to death by now in tons of places. On Thu, May 2, 2013 at 8:07 PM, Dan Filimon <dangeorge.fili...@gmail.com>wrote: > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/10669/ > Review request for mahout, Ted Dunning, Sebastian Schelter, and Robin > Anil. > By Dan Filimon. > > *Updated May 2, 2013, 5:07 p.m.* > Changes > > This is the nearly-done version of the code. > > The bulk of the changes are in AbstractVector, VectorBinaryAssign, > VectorBinaryAggregate and Functions. > > As for comments in the new code, I felt the names are fairly self explanatory > so didn't comment every class. > I did however write comments on top describing the changes as well as write > tests and link to the design doc. > > Description > > This patch contains code cleaning up AbstractVector and making the operations > as fast as possible while still having a high level interface. > > The main changes are in AbstractVector as well as new methods in > DoubleDoubleFunction. > > Testing > > The vectors test pass but it's likely that the patch in it's current state is > broken as other, unrelated tests (BallKMeans...) are failing. > Also, my Hadoop conf is broken so I didn't run all the core tests. Anyone? > > I can't seem to find the bug, so _please_ have a closer look. It's still a > work in progress. > > The benchmarks seem comparable (although there are some jarring diferences – > Minkowski distance seems a lot slower in new-dan-1 than old-trunk-2). It may > be however that this is just variance due to the load of the machine at the > time. I'm having trouble interpreting the benchmarks in general, so anyone > who could give me a hand is more than welcome. > > Diffs (updated) > > - > core/src/main/java/org/apache/mahout/common/distance/ChebyshevDistanceMeasure.java > (0a064c9) > - > core/src/main/java/org/apache/mahout/common/distance/CosineDistanceMeasure.java > (0c51591) > - > core/src/main/java/org/apache/mahout/common/distance/ManhattanDistanceMeasure.java > (c98e5f7) > - > core/src/main/java/org/apache/mahout/common/distance/MinkowskiDistanceMeasure.java > (b650a8d) > - > core/src/main/java/org/apache/mahout/common/distance/TanimotoDistanceMeasure.java > (d32c42a) > - core/src/main/java/org/apache/mahout/ep/Mapping.java (900a0b8) > - > core/src/main/java/org/apache/mahout/math/hadoop/stochasticsvd/qr/GivensThinSolver.java > (7c391ef) > - > core/src/test/java/org/apache/mahout/clustering/iterator/TestClusterClassifier.java > (62c5acf) > - > core/src/test/java/org/apache/mahout/common/distance/CosineDistanceMeasureTest.java > (50b03f0) > - math/src/main/java/org/apache/mahout/math/AbstractMatrix.java > (e12aa38) > - math/src/main/java/org/apache/mahout/math/AbstractVector.java > (090aa7a) > - math/src/main/java/org/apache/mahout/math/Centroid.java (0c42196) > - math/src/main/java/org/apache/mahout/math/ConstantVector.java > (51d67d4) > - math/src/main/java/org/apache/mahout/math/DelegatingVector.java > (12220d4) > - math/src/main/java/org/apache/mahout/math/DenseVector.java (41c356b) > - > math/src/main/java/org/apache/mahout/math/FileBasedSparseBinaryMatrix.java > (094003b) > - math/src/main/java/org/apache/mahout/math/MatrixSlice.java (7f79c96) > - math/src/main/java/org/apache/mahout/math/MatrixVectorView.java > (af70727) > - math/src/main/java/org/apache/mahout/math/NamedVector.java (4b7a41d) > - math/src/main/java/org/apache/mahout/math/OrderedIntDoubleMapping.java > (650d82d) > - math/src/main/java/org/apache/mahout/math/PermutedVectorView.java > (d1ea93a) > - math/src/main/java/org/apache/mahout/math/RandomAccessSparseVector.java > (6f85692) > - > math/src/main/java/org/apache/mahout/math/SequentialAccessSparseVector.java > (21982f9) > - math/src/main/java/org/apache/mahout/math/Vector.java (2f8b417) > - math/src/main/java/org/apache/mahout/math/VectorBinaryAggregate.java > (PRE-CREATION) > - math/src/main/java/org/apache/mahout/math/VectorBinaryAssign.java > (PRE-CREATION) > - math/src/main/java/org/apache/mahout/math/VectorView.java (add2a60) > - math/src/main/java/org/apache/mahout/math/WeightedVector.java > (06fbd05) > - > math/src/main/java/org/apache/mahout/math/decomposer/lanczos/LanczosSolver.java > (893047a) > - > math/src/main/java/org/apache/mahout/math/function/DoubleDoubleFunction.java > (82b350a) > - math/src/main/java/org/apache/mahout/math/function/DoubleFunction.java > (9eb98a3) > - math/src/main/java/org/apache/mahout/math/function/Functions.java > (eb2e42f) > - math/src/main/java/org/apache/mahout/math/function/Mult.java > (6801a94) > - math/src/main/java/org/apache/mahout/math/function/PlusMult.java > (60587b1) > - > math/src/main/java/org/apache/mahout/math/function/SquareRootFunction.java > (0b6f4b0) > - math/src/main/java/org/apache/mahout/math/function/TimesFunction.java > (8ab0bb1) > - math/src/main/java/org/apache/mahout/math/jet/math/Constants.java > (53535d6) > - > math/src/main/java/org/apache/mahout/math/jet/random/AbstractDistribution.java > (9728fe0) > - > math/src/main/java/org/apache/mahout/math/jet/random/engine/RandomEngine.java > (e31d862) > - > math/src/main/java/org/apache/mahout/math/random/AbstractSamplerFunction.java > (f98a5b7) > - math/src/main/java/org/apache/mahout/math/random/Multinomial.java > (41f2712) > - math/src/main/java/org/apache/mahout/math/random/WeightedThing.java > (38a1cde) > - math/src/test/java/org/apache/mahout/math/FunctionTest.java > (PRE-CREATION) > - > math/src/test/java/org/apache/mahout/math/VectorBinaryAggregateCostTest.java > (PRE-CREATION) > - math/src/test/java/org/apache/mahout/math/VectorBinaryAggregateTest.java > (PRE-CREATION) > - math/src/test/java/org/apache/mahout/math/VectorBinaryAssignCostTest.java > (PRE-CREATION) > - math/src/test/java/org/apache/mahout/math/VectorBinaryAssignTest.java > (PRE-CREATION) > - math/src/test/java/org/apache/mahout/math/VectorTest.java (d6d554b) > - math/src/test/java/org/apache/mahout/math/random/MultinomialTest.java > (8981f8b) > > View Diff <https://reviews.apache.org/r/10669/diff/> >