-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10193/#review18511
-----------------------------------------------------------


core/src/test/java/org/apache/mahout/clustering/streaming/mapreduce/StreamingKMeansTestMR.java
<https://reviews.apache.org/r/10193/#comment38797>

    This is commented out because FastProjectionSearch is currently broken.


- Dan Filimon


On March 29, 2013, 1:27 p.m., Dan Filimon wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10193/
> -----------------------------------------------------------
>
> (Updated March 29, 2013, 1:27 p.m.)
>
>
> Review request for mahout, Ted Dunning and Sebastian Schelter.
>
>
> Description
> -------
>
> This patch implements the MapReduce version of StreamingKMeans for
> MAHOUT-1154.
>
> It adds 5 new classes:
> - CentroidWritable: class representing a centroid that can be written to a
>   SeqFile
> - StreamingKMeansDriver: class extending AbstractJob that is the entry
>   point to the MapReduce job
> - StreamingKMeansMapper: mapper, running StreamingKMeans (see MAHOUT-1162)
>   and clustering the points one by one
> - StreamingKMeansReducer: reducer, running BallKMeans (see MAHOUT-1162) a
>   number of times and picking the clustering with the lowest total
>   clustering cost.
>   The cost is determined by randomly splitting the incoming centroids into a
>   "training" and a "test" set, computing the centroids on the training set
>   and the cost on the test set. The intent is to see whether the centroids
>   actually describe the distribution of the points.
> - StreamingKMeansUtilsMR: helper class with a method to instantiate a
>   searcher from a Configuration.
>
> Additionally, there is a test class, StreamingKMeansTestMR, that uses MRUnit
> to test the mapper, the reducer, and the two combined.
>
> !!!
> Since the tests use MRUnit, the core pom.xml file adds MRUnit as a
> dependency. We depend on the 1.0 snapshot, which is not yet released (it
> will be very soon), so the updated pom.xml is not provided for now.
> !!!
>
>
> Diffs
> -----
>
>   core/src/main/java/org/apache/mahout/clustering/streaming/mapreduce/CentroidWritable.java PRE-CREATION
>   core/src/main/java/org/apache/mahout/clustering/streaming/mapreduce/StreamingKMeansDriver.java PRE-CREATION
>   core/src/main/java/org/apache/mahout/clustering/streaming/mapreduce/StreamingKMeansMapper.java PRE-CREATION
>   core/src/main/java/org/apache/mahout/clustering/streaming/mapreduce/StreamingKMeansReducer.java PRE-CREATION
>   core/src/main/java/org/apache/mahout/clustering/streaming/mapreduce/StreamingKMeansUtilsMR.java PRE-CREATION
>   core/src/test/java/org/apache/mahout/clustering/streaming/mapreduce/StreamingKMeansTestMR.java PRE-CREATION
>   src/conf/driver.classes.default.props ac45eef
>
> Diff: https://reviews.apache.org/r/10193/diff/
>
>
> Testing
> -------
>
> See StreamingKMeansTestMR for the tests. These are all performed on data
> sampled from a "hypercube" distribution (there is a multinormal distribution
> at each vertex of the cube).
> Additionally, there are ongoing tests on the 20 newsgroups data set (and
> some more are on the way).
>
>
> Thanks,
>
> Dan Filimon
>
>
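
For readers following the review, the selection step described for StreamingKMeansReducer (split the incoming centroids into a training and a test set, cluster the training set several times, keep the clustering with the lowest cost on the test set) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the patch: plain Lloyd's k-means with random starts stands in for BallKMeans, the split is made once and reused for every run, points are unweighted double arrays, and the names SplitCostSelection, selectBest, testFraction and the fixed 10 iterations are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; none of these names come from the Mahout patch.
class SplitCostSelection {

  // Squared Euclidean distance between two points of equal dimension.
  static double squaredDistance(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  // Index of the center closest to p.
  static int nearest(double[] p, double[][] centers) {
    int best = 0;
    for (int c = 1; c < centers.length; c++) {
      if (squaredDistance(p, centers[c]) < squaredDistance(p, centers[best])) {
        best = c;
      }
    }
    return best;
  }

  // Total cost: sum of squared distances from each point to its closest center.
  static double cost(List<double[]> points, double[][] centers) {
    double total = 0;
    for (double[] p : points) {
      total += squaredDistance(p, centers[nearest(p, centers)]);
    }
    return total;
  }

  // Plain Lloyd's k-means with random initial centers (assumed stand-in for BallKMeans).
  static double[][] kMeans(List<double[]> points, int k, int iterations, Random rng) {
    if (points.size() < k) {
      throw new IllegalArgumentException("need at least k points");
    }
    List<double[]> shuffled = new ArrayList<>(points);
    Collections.shuffle(shuffled, rng);
    double[][] centers = new double[k][];
    for (int c = 0; c < k; c++) {
      centers[c] = shuffled.get(c).clone();
    }
    int dim = centers[0].length;
    for (int it = 0; it < iterations; it++) {
      double[][] sums = new double[k][dim];
      int[] counts = new int[k];
      for (double[] p : points) {
        int c = nearest(p, centers);
        counts[c]++;
        for (int d = 0; d < dim; d++) {
          sums[c][d] += p[d];
        }
      }
      for (int c = 0; c < k; c++) {
        if (counts[c] == 0) {
          continue; // an empty cluster keeps its previous center
        }
        for (int d = 0; d < dim; d++) {
          centers[c][d] = sums[c][d] / counts[c];
        }
      }
    }
    return centers;
  }

  // Splits the points once into test and training sets, runs k-means `runs` times
  // on the training set, and returns the centers with the lowest test-set cost.
  static double[][] selectBest(List<double[]> points, int k, int runs,
                               double testFraction, long seed) {
    Random rng = new Random(seed);
    List<double[]> shuffled = new ArrayList<>(points);
    Collections.shuffle(shuffled, rng);
    int testSize = (int) Math.round(testFraction * shuffled.size());
    List<double[]> test = shuffled.subList(0, testSize);
    List<double[]> train = shuffled.subList(testSize, shuffled.size());
    double[][] best = null;
    double bestCost = Double.POSITIVE_INFINITY;
    for (int r = 0; r < runs; r++) {
      double[][] centers = kMeans(train, k, 10, rng);
      double c = cost(test, centers);
      if (c < bestCost) {
        bestCost = c;
        best = centers;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    List<double[]> points = new ArrayList<>();
    for (int i = 0; i < 20; i++) {
      points.add(new double[] {i * 0.01, i * 0.01});
      points.add(new double[] {10 + i * 0.01, 10 + i * 0.01});
    }
    System.out.println(Arrays.deepToString(selectBest(points, 2, 5, 0.2, 42L)));
  }
}
```

Scoring on held-out points rather than on the training set is what lets the cost indicate whether the centers describe the underlying distribution, which is the intent stated in the description; whether the patch re-splits per run or splits once is not stated there, so the single split here is a simplification.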
