Mahout 0.8 compilation issue on hadoop 1.0.3

2013-09-18 Thread Mehant Baid
I was trying to compile Mahout 0.8 against Hadoop 1.0.3. In 
DummyStatusReporter.java the @Override annotation is applied to the method 
getProgress(). That method does not exist in the base class 
(StatusReporter.java) in Hadoop 1.0.3 and 1.0.4; it is only 
included from 1.1.1 onwards, so I am getting a compilation error.
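
The failure can be reproduced without Hadoop. The sketch below uses hypothetical stand-in classes (ReporterWithoutProgress and DummyReporterSketch are illustrative names, not Hadoop or Mahout code) to show one possible workaround: declaring getProgress() without @Override, so the same source compiles whether or not the base class declares the method.

```java
// Hypothetical stand-in for a base class that lacks getProgress(),
// as StatusReporter is reported to in Hadoop 1.0.3 and 1.0.4.
abstract class ReporterWithoutProgress {
  abstract void setStatus(String status);
}

// Hypothetical stand-in for a subclass such as DummyStatusReporter.
final class DummyReporterSketch extends ReporterWithoutProgress {
  private String status = "";

  @Override
  void setStatus(String status) {
    this.status = status;
  }

  String getStatus() {
    return status;
  }

  // No @Override here. With the annotation, javac rejects this method when
  // the base class does not declare getProgress(). Without it, this is an
  // ordinary new method against an older base class, and it still overrides
  // the base method when a newer base class declares it.
  public float getProgress() {
    return 0.0f;
  }
}
```

The trade-off is that the compiler no longer checks the override on newer Hadoop versions, so a signature typo would go unnoticed.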


Are these Hadoop versions not supported by Mahout 0.8, or should I open a 
JIRA for this issue?


Thanks
Mehant


Re: Why Kahan summation was not used anywhere?

2013-09-18 Thread Ted Dunning
This has come up before.

As background, if you add up lots of numbers using a straightforward
loop, you can lose precision.  In the worst case the loss is O(n \epsilon),
but in virtually all real examples the lossage is O(\epsilon \sqrt(n)).  If
we are summing a billion numbers, the square root is ~3 x 10^4, so we can
potentially lose 4 to 5 sig figs (out of 17 available with double precision).

Kahan summation increases the number of floating point operations by 4x,
but uses a clever trick to retain most of the bits that would
otherwise be lost.  Shewchuk summation uses divide and conquer to limit the
lossage with O(log n) storage and no increase in the number of flops.
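
As a rough illustration of the divide and conquer idea, here is a minimal sketch of what is commonly called pairwise (cascade) summation. Treating this as the technique the paragraph above has in mind is an assumption, and the class name, method names and LEAF cutoff are hypothetical. The recursion depth is O(log n), and the number of additions is the same as in a plain loop.

```java
final class PairwiseSumSketch {
  // Below this many elements, fall back to a plain loop (arbitrary cutoff).
  private static final int LEAF = 8;

  // Sums x[from, to) by splitting the range in half and adding the two
  // partial sums, so rounding errors grow with the depth of the recursion
  // rather than with the number of elements.
  static double sum(double[] x, int from, int to) {
    int n = to - from;
    if (n <= LEAF) {
      double s = 0;
      for (int i = from; i < to; i++) {
        s += x[i];
      }
      return s;
    }
    int mid = from + n / 2;
    return sum(x, from, mid) + sum(x, mid, to);
  }

  static double sum(double[] x) {
    return sum(x, 0, x.length);
  }
}
```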

There are several cases to consider:

1) online algorithms such as OnlineSummarizer.

2) dot product and friends.

3) general matrix decompositions

In the first case, we can often have millions or even billions of numbers
to analyze.  That said, the input data is typically quite noisy, and
signal to noise ratios > 100 are actually kind of rare in Mahout
applications.  Modified Shewchuk estimation (see below for details) could
decrease summation error from a few parts in 10^12 to less than 1 part in
10^12 at minimal cost.  These errors are 10^10 smaller than the noise in
our data so this seems not useful.

In the second case, we almost always are summing products of sparse
vectors.  Having thousands of non-zero elements is common but millions of
non-zeros are quite rare.  Billions of non-zeros are unheard of.  This
means that the errors are going to be trivial.
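
One way to check this on your own vectors is to compare a plain dot product with a compensated one. The sketch below is only that: it works on plain double arrays rather than Mahout's vector classes, and the class and method names are hypothetical.

```java
final class CompensatedDotSketch {
  // Plain accumulation of a[i] * b[i].
  static double naiveDot(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // The same products, accumulated with Kahan compensation.
  static double kahanDot(double[] a, double[] b) {
    double sum = 0;
    double c = 0; // running estimate of the low order bits lost so far
    for (int i = 0; i < a.length; i++) {
      double y = a[i] * b[i] - c;
      double t = sum + y;
      c = (t - sum) - y;
      sum = t;
    }
    return sum;
  }
}
```

If the two results agree to within the noise level of the data for the vector sizes you actually see, compensation buys nothing for that workload.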

In the third case, we often have dense matrices, but the sizes are
typically on the order of 100 x 100 or less.  This makes the errors even
smaller than our common dot products.

To me, this seems to say that this isn't worth doing.  I am happy to be
corrected if you have counter evidence.

Note that BLAS does naive summation and none of the Mahout operations are
implemented using anything except double precision floating point.


Here is an experiment that tests to see how big the problem really is:

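// Needs java.util.Random, JUnit's @Test and Mahout's RandomUtils.
@Test
public void runKahanSum() {
  Random gen = RandomUtils.getRandom();

  double ksum = 0;                 // Kahan sum
  double c = 0;                    // low order bits for Kahan sum
  double sum = 0;                  // naive sum
  double[] vsum = new double[16];  // 16 way decomposed sum (use 8 for the 8 way case)
  for (int i = 0; i < 1e9; i++) {
    double x = gen.nextDouble();

    // Kahan update: fold the previously lost bits back in, then re-estimate them
    double y = x - c;
    double t = ksum + y;
    c = (t - ksum) - y;
    ksum = t;

    // naive update
    sum += x;

    // decomposed update: each slot accumulates every 16th element
    vsum[i % 16] += x;
  }

  // now add up the decomposed pieces
  double zsum = 0;
  for (int i = 0; i < vsum.length; i++) {
    zsum += vsum[i];
  }

  // print the Kahan sum, then the naive and decomposed errors in parts per 10^12
  System.out.printf("%.4f %.4f %.4f\n",
      ksum, 1e12 * (sum - ksum) / ksum, 1e12 * (zsum - ksum) / ksum);
}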

A typical result here is that naive summation gives results that are
accurate to within 1 part in 10^12, 8 way summation manages < 0.05 parts in
10^12 and 16 way summation is only slightly better than 8 way summation.

If the random numbers being summed are changed to have a mean of zero, then
the relative error increases to 1.7 parts in 10^12 and 0.3 parts in 10^12,
but the absolute error is much smaller.

Generally, it doesn't make sense to do the accumulation in floats because
these operations are almost always memory channel bound rather than CPU
bound.  Changing to single precision arithmetic in spite of this decreases
the accuracy to about 500 parts per million and 200 parts per million
for naive summation and 8 way summation, respectively.
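
The effect of accumulator width is easy to see on a small scale. The following sketch sums the same float inputs once into a float accumulator and once into a double accumulator. The class and method names are hypothetical, and the size of the error depends strongly on the input, so it will not match the parts per million figures quoted for the random data.

```java
final class AccumulatorWidthSketch {
  // Accumulate in single precision: every addition rounds to about 24 bits.
  static float sumInFloat(float[] x) {
    float s = 0f;
    for (float v : x) {
      s += v;
    }
    return s;
  }

  // Same inputs, but each value is widened to double before it is added.
  static double sumInDouble(float[] x) {
    double s = 0;
    for (float v : x) {
      s += v;
    }
    return s;
  }
}
```

The inputs are read from memory at the same width in both cases, which fits the point that the wider accumulator costs little when the loop is memory bound.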






On Wed, Sep 18, 2013 at 2:16 PM, Peng Cheng  wrote:

> For a large scale computational engine this seems unwashed. Most
> summation/average and dot product of vectors still use naive summation
> despite its O(n) error.
>
> Is there a reason?
>
> All the best,
> Yours Peng
>
>


Why Kahan summation was not used anywhere?

2013-09-18 Thread Peng Cheng
For a large scale computational engine this seems unwashed. Most 
summation/average and dot product of vectors still use naive summation 
despite its O(n) error.


Is there a reason?

All the best,
Yours Peng



[jira] [Commented] (MAHOUT-1322) TestDistributedRowMatrix.testTranspose is unstable

2013-09-18 Thread Stevo Slavic (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770930#comment-13770930
 ] 

Stevo Slavic commented on MAHOUT-1322:
--

We have to check if this one is, like MAHOUT-1325, caused by MAPREDUCE-5367.

> TestDistributedRowMatrix.testTranspose is unstable
> --
>
> Key: MAHOUT-1322
> URL: https://issues.apache.org/jira/browse/MAHOUT-1322
> Project: Mahout
>  Issue Type: Bug
>  Components: Math
>Affects Versions: 0.8
> Environment: ubuntu3 Apache Jenkins CI node
>Reporter: Stevo Slavic
>Assignee: Stevo Slavic
>Priority: Minor
> Fix For: 0.9
>
>
> Mahout-Quality build job execution #2217 failed:
> {noformat}
> [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ mahout-core ---
> [INFO] Surefire report directory: 
> /x1/jenkins/jenkins-slave/workspace/Mahout-Quality/trunk/core/target/surefire-reports
> ---
>  T E S T S
> ---
> ---
>  T E S T S
> ---
> Running org.apache.mahout.cf.taste.impl.model.GenericItemPreferenceArrayTest
> Running org.apache.mahout.math.VectorWritableTest
> Running org.apache.mahout.cf.taste.hadoop.item.RecommenderJobTest
> Running org.apache.mahout.cf.taste.impl.model.BooleanUserPreferenceArrayTest
> Running org.apache.mahout.cf.taste.impl.model.file.FileIDMigratorTest
> Running org.apache.mahout.math.MatrixWritableTest
> Running org.apache.mahout.cf.taste.impl.model.GenericDataModelTest
> Running 
> org.apache.mahout.math.hadoop.similarity.TestVectorDistanceSimilarityJob
> Running org.apache.mahout.cf.taste.impl.model.MemoryIDMigratorTest
> Running org.apache.mahout.math.hadoop.stochasticsvd.LocalSSVDPCASparseTest
> Running org.apache.mahout.ep.EvolutionaryProcessTest
> Running 
> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJobTest
> Running org.apache.mahout.math.hadoop.TestDistributedRowMatrix
> Running org.apache.mahout.math.stats.SamplerTest
> Running 
> org.apache.mahout.math.hadoop.solver.TestDistributedConjugateGradientSolver
> Running org.apache.mahout.math.neighborhood.SearchSanityTest
> Running 
> org.apache.mahout.cf.taste.impl.model.PlusAnonymousConcurrentUserDataModelTest
> Running org.apache.mahout.math.hadoop.stats.BasicStatsTest
> Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.245 sec - 
> in org.apache.mahout.cf.taste.impl.model.GenericItemPreferenceArrayTest
> Running org.apache.mahout.cf.taste.hadoop.TopItemsQueueTest
> Running 
> org.apache.mahout.math.hadoop.stochasticsvd.LocalSSVDSolverSparseSequentialTest
> Running org.apache.mahout.math.neighborhood.LocalitySensitiveHashSearchTest
> Running org.apache.mahout.math.hadoop.stochasticsvd.SSVDCommonTest
> Running org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducerTest
> Running org.apache.mahout.math.ssvd.SequentialOutOfCoreSvdTest
> Running 
> org.apache.mahout.math.hadoop.solver.TestDistributedConjugateGradientSolverCLI
> Running org.apache.mahout.math.VarintTest
> Running 
> org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.VectorSimilarityMeasuresTest
> Running 
> org.apache.mahout.math.hadoop.decomposer.TestDistributedLanczosSolverCLI
> Running 
> org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJobTest
> Running org.apache.mahout.math.neighborhood.SearchQualityTest
> Running org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJobTest
> Running org.apache.mahout.cf.taste.impl.model.file.FileDataModelTest
> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.33 sec - in 
> org.apache.mahout.cf.taste.impl.model.BooleanUserPreferenceArrayTest
> Running org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArrayTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.307 sec - 
> in org.apache.mahout.cf.taste.impl.model.MemoryIDMigratorTest
> Running org.apache.mahout.cf.taste.impl.model.BooleanItemPreferenceArrayTest
> Running org.apache.mahout.math.hadoop.stochasticsvd.LocalSSVDSolverDenseTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.383 sec - 
> in org.apache.mahout.math.MatrixWritableTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.383 sec - 
> in org.apache.mahout.cf.taste.impl.model.GenericDataModelTest
> Running org.apache.mahout.math.stats.OnlineAucTest
> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.244 sec - 
> in org.apache.mahout.math.VarintTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.373 sec - 
> in org.apache.mahout.math.stats.SamplerTest
> Tests run: 1, Failures: 

[jira] [Commented] (MAHOUT-1323) ParallelALSFactorizationJobTest.completeJobToyExample is unstable

2013-09-18 Thread Stevo Slavic (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770931#comment-13770931
 ] 

Stevo Slavic commented on MAHOUT-1323:
--

We have to check if this one is, like MAHOUT-1325, caused by MAPREDUCE-5367.

> ParallelALSFactorizationJobTest.completeJobToyExample is unstable
> -
>
> Key: MAHOUT-1323
> URL: https://issues.apache.org/jira/browse/MAHOUT-1323
> Project: Mahout
>  Issue Type: Bug
>  Components: Collaborative Filtering
>Affects Versions: 0.8
> Environment: ubuntu3 Apache Jenkins CI node
>Reporter: Stevo Slavic
>Assignee: Stevo Slavic
>Priority: Minor
> Fix For: 0.9
>
>
> Mahout-Quality build #2224 failed because of this test.
> Relevant build log:
> {noformat}
> [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ mahout-core ---
> [INFO] Surefire report directory: 
> /x1/jenkins/jenkins-slave/workspace/Mahout-Quality/trunk/core/target/surefire-reports
> ---
>  T E S T S
> ---
> ---
>  T E S T S
> ---
> Running org.apache.mahout.math.VarintTest
> Running 
> org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJobTest
> Running 
> org.apache.mahout.math.hadoop.solver.TestDistributedConjugateGradientSolver
> Running 
> org.apache.mahout.math.hadoop.decomposer.TestDistributedLanczosSolverCLI
> Running 
> org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.VectorSimilarityMeasuresTest
> Running org.apache.mahout.math.hadoop.TestDistributedRowMatrix
> Running org.apache.mahout.math.hadoop.stochasticsvd.LocalSSVDSolverDenseTest
> Running org.apache.mahout.math.VectorWritableTest
> Running org.apache.mahout.math.neighborhood.LocalitySensitiveHashSearchTest
> Running org.apache.mahout.cf.taste.impl.model.file.FileIDMigratorTest
> Running org.apache.mahout.math.neighborhood.SearchQualityTest
> Running 
> org.apache.mahout.math.hadoop.stochasticsvd.LocalSSVDSolverSparseSequentialTest
> Running org.apache.mahout.cf.taste.impl.model.MemoryIDMigratorTest
> Running org.apache.mahout.math.hadoop.stats.BasicStatsTest
> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.09 sec - in 
> org.apache.mahout.math.VarintTest
> Running org.apache.mahout.math.ssvd.SequentialOutOfCoreSvdTest
> Running org.apache.mahout.ep.EvolutionaryProcessTest
> Running org.apache.mahout.cf.taste.hadoop.item.RecommenderJobTest
> Running 
> org.apache.mahout.math.hadoop.similarity.TestVectorDistanceSimilarityJob
> Running org.apache.mahout.math.neighborhood.SearchSanityTest
> Running 
> org.apache.mahout.math.hadoop.solver.TestDistributedConjugateGradientSolverCLI
> Running org.apache.mahout.math.stats.SamplerTest
> Running org.apache.mahout.math.MatrixWritableTest
> Running org.apache.mahout.math.hadoop.stochasticsvd.SSVDCommonTest
> Running org.apache.mahout.math.hadoop.stochasticsvd.LocalSSVDPCASparseTest
> Running org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducerTest
> Running 
> org.apache.mahout.cf.taste.impl.model.PlusAnonymousConcurrentUserDataModelTest
> Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.211 sec - 
> in 
> org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.VectorSimilarityMeasuresTest
> Running org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJobTest
> Running org.apache.mahout.cf.taste.impl.model.GenericItemPreferenceArrayTest
> Running org.apache.mahout.cf.taste.impl.model.BooleanItemPreferenceArrayTest
> Running org.apache.mahout.math.stats.OnlineAucTest
> Running 
> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJobTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.317 sec - 
> in org.apache.mahout.cf.taste.impl.model.MemoryIDMigratorTest
> Running org.apache.mahout.cf.taste.impl.model.GenericDataModelTest
> Running org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArrayTest
> Running org.apache.mahout.cf.taste.impl.model.file.FileDataModelTest
> Tests run: 15, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.297 sec - 
> in 
> org.apache.mahout.cf.taste.impl.model.PlusAnonymousConcurrentUserDataModelTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.429 sec - 
> in org.apache.mahout.math.stats.SamplerTest
> Running org.apache.mahout.cf.taste.hadoop.TopItemsQueueTest
> Running org.apache.mahout.cf.taste.impl.model.BooleanUserPreferenceArrayTest
> Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.315 sec - 
> in org.apache.mahout.cf.taste.impl.model.BooleanItemPreferenceArrayTest
> Tests run: 2, Failures

[jira] [Commented] (MAHOUT-1325) MapReduce unit tests cannot be run in parallel

2013-09-18 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770917#comment-13770917
 ] 

Suneel Marthi commented on MAHOUT-1325:
---

That pretty much explains the mysterious astral rays that were failing random 
MR tests (during parallel execution) in the run-up to the 0.8 release; I didn't 
have time then to get to the bottom of these failures.

> MapReduce unit tests cannot be run in parallel
> --
>
> Key: MAHOUT-1325
> URL: https://issues.apache.org/jira/browse/MAHOUT-1325
> Project: Mahout
>  Issue Type: Bug
>  Components: Integration
>Affects Versions: 0.8
> Environment: ubuntu1 Apache Jenkins CI node
>Reporter: Stevo Slavic
>Assignee: Stevo Slavic
>Priority: Minor
> Fix For: 0.9
>
> Attachments: 
> org.apache.mahout.text.SequenceFilesFromMailArchivesTest-output.txt
>
>
> Mahout-Quality build job execution #2225 failed because of this test.
> Relevant build log:
> {noformat}
> [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ 
> mahout-integration ---
> [INFO] Surefire report directory: 
> /home/jenkins/jenkins-slave/workspace/Mahout-Quality/trunk/integration/target/surefire-reports
> ---
>  T E S T S
> ---
> ---
>  T E S T S
> ---
> Running org.apache.mahout.utils.vectors.lucene.LuceneIterableTest
> Running org.apache.mahout.utils.email.MailProcessorTest
> Running org.apache.mahout.clustering.cdbw.TestCDbwEvaluator
> Running org.apache.mahout.utils.regex.RegexMapperTest
> Running org.apache.mahout.utils.vectors.io.VectorWriterTest
> Running org.apache.mahout.utils.nlp.collocations.llr.BloomTokenFilterTest
> Running org.apache.mahout.utils.vectors.lucene.CachedTermInfoTest
> Running org.apache.mahout.clustering.TestClusterEvaluator
> Running org.apache.mahout.utils.regex.RegexUtilsTest
> Running org.apache.mahout.utils.vectors.lucene.DriverTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.17 sec - in 
> org.apache.mahout.utils.regex.RegexUtilsTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.412 sec - 
> in org.apache.mahout.utils.email.MailProcessorTest
> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.344 sec - 
> in org.apache.mahout.utils.nlp.collocations.llr.BloomTokenFilterTest
> Running org.apache.mahout.utils.vectors.csv.CSVVectorIteratorTest
> Running org.apache.mahout.clustering.TestClusterDumper
> Running org.apache.mahout.utils.vectors.VectorHelperTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.601 sec - 
> in org.apache.mahout.utils.vectors.csv.CSVVectorIteratorTest
> Running org.apache.mahout.utils.vectors.arff.ARFFTypeTest
> Running org.apache.mahout.utils.vectors.arff.ARFFVectorIterableTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.383 sec - 
> in org.apache.mahout.utils.vectors.VectorHelperTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.332 sec - 
> in org.apache.mahout.utils.vectors.lucene.LuceneIterableTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.252 sec - 
> in org.apache.mahout.utils.vectors.arff.ARFFTypeTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.369 sec - 
> in org.apache.mahout.utils.vectors.lucene.CachedTermInfoTest
> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.415 sec - 
> in org.apache.mahout.utils.vectors.arff.ARFFVectorIterableTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.5 sec - in 
> org.apache.mahout.utils.regex.RegexMapperTest
> Running org.apache.mahout.utils.vectors.arff.DriverTest
> Running org.apache.mahout.utils.Bump125Test
> Running org.apache.mahout.utils.SplitInputTest
> Running org.apache.mahout.utils.TestConcatenateVectorsJob
> Running org.apache.mahout.utils.vectors.arff.MapBackedARFFModelTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.043 sec - 
> in org.apache.mahout.utils.vectors.io.VectorWriterTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.362 sec - 
> in org.apache.mahout.utils.Bump125Test
> Running 
> org.apache.mahout.cf.taste.impl.similarity.jdbc.MySQLJDBCInMemoryItemSimilarityTest
> Running org.apache.mahout.text.SequenceFilesFromLuceneStorageTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.304 sec - 
> in org.apache.mahout.utils.vectors.arff.MapBackedARFFModelTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.887 sec - 
> in org.apache.mahout.utils.vectors.arff.DriverTest
> Tests run: 1, Failures: 

[jira] [Commented] (MAHOUT-1325) MapReduce unit tests cannot be run in parallel

2013-09-18 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770890#comment-13770890
 ] 

Sean Owen commented on MAHOUT-1325:
---

Fair enough, I thought it was actually a small enough update to just go through 
with, and the tests passed for me. It would not hurt at all to make a JIRA, 
yes. I won't be touching this stuff again, no worries.


[jira] [Updated] (MAHOUT-1325) MapReduce unit tests cannot be run in parallel

2013-09-18 Thread Stevo Slavic (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stevo Slavic updated MAHOUT-1325:
-

Summary: MapReduce unit tests cannot be run in parallel  (was: 
SequenceFilesFromMailArchivesTest.testMapReduce is unstable)


[jira] [Resolved] (MAHOUT-1321) TestSequenceFilesFromDirectory.testSequenceFileFromDirectoryMapReduce is unstable

2013-09-18 Thread Stevo Slavic (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stevo Slavic resolved MAHOUT-1321.
--

   Resolution: Duplicate
Fix Version/s: (was: 0.9)

Resolving as duplicate of MAHOUT-1325.

> TestSequenceFilesFromDirectory.testSequenceFileFromDirectoryMapReduce is 
> unstable
> -
>
> Key: MAHOUT-1321
> URL: https://issues.apache.org/jira/browse/MAHOUT-1321
> Project: Mahout
>  Issue Type: Bug
>  Components: Integration
>Affects Versions: 0.8
> Environment: ubuntu4 and ubuntu3 Apache Jenkins CI nodes
>Reporter: Stevo Slavic
>Assignee: Stevo Slavic
>
> Relevant Mahout-Quality job execution #2216 build output:
> {noformat}
> [INFO] --- maven-surefire-plugin:2.15:test (default-test) @ 
> mahout-integration ---
> [INFO] Surefire report directory: 
> /home/jenkins/jenkins-slave/workspace/Mahout-Quality/trunk/integration/target/surefire-reports
> [INFO] parallel='classes', perCoreThreadCount=false, threadCount=1, 
> useUnlimitedThreads=false
> ---
>  T E S T S
> ---
> ---
>  T E S T S
> ---
> Running org.apache.mahout.text.LuceneStorageConfigurationTest
> Running org.apache.mahout.text.LuceneSegmentInputSplitTest
> Running org.apache.mahout.text.LuceneSegmentInputFormatTest
> Running org.apache.mahout.text.SequenceFilesFromLuceneStorageDriverTest
> Running org.apache.mahout.text.MailArchivesClusteringAnalyzerTest
> Running org.apache.mahout.text.LuceneSegmentRecordReaderTest
> Running org.apache.mahout.text.SequenceFilesFromMailArchivesTest
> Running org.apache.mahout.text.SequenceFilesFromLuceneStorageMRJobTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.153 sec - 
> in org.apache.mahout.text.MailArchivesClusteringAnalyzerTest
> Running org.apache.mahout.text.SequenceFilesFromLuceneStorageTest
> Running org.apache.mahout.utils.vectors.VectorHelperTest
> Running org.apache.mahout.utils.vectors.csv.CSVVectorIteratorTest
> Running org.apache.mahout.text.TestSequenceFilesFromDirectory
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.04 sec - in 
> org.apache.mahout.utils.vectors.VectorHelperTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.455 sec - 
> in org.apache.mahout.utils.vectors.csv.CSVVectorIteratorTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.905 sec - 
> in org.apache.mahout.text.LuceneStorageConfigurationTest
> Running org.apache.mahout.utils.vectors.arff.MapBackedARFFModelTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.029 sec - 
> in org.apache.mahout.utils.vectors.arff.MapBackedARFFModelTest
> Running org.apache.mahout.utils.vectors.arff.DriverTest
> Running org.apache.mahout.utils.vectors.arff.ARFFVectorIterableTest
> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.194 sec - 
> in org.apache.mahout.utils.vectors.arff.ARFFVectorIterableTest
> Running org.apache.mahout.utils.vectors.arff.ARFFTypeTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.791 sec - 
> in org.apache.mahout.utils.vectors.arff.DriverTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.034 sec - 
> in org.apache.mahout.utils.vectors.arff.ARFFTypeTest
> Running org.apache.mahout.utils.vectors.lucene.DriverTest
> Running org.apache.mahout.utils.vectors.lucene.CachedTermInfoTest
> Running org.apache.mahout.utils.vectors.lucene.LuceneIterableTest
> Running org.apache.mahout.utils.vectors.io.VectorWriterTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.421 sec - 
> in org.apache.mahout.utils.vectors.lucene.CachedTermInfoTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.161 sec - 
> in org.apache.mahout.text.SequenceFilesFromMailArchivesTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.704 sec - 
> in org.apache.mahout.utils.vectors.lucene.LuceneIterableTest
> Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 3.302 sec <<< 
> FAILURE! - in org.apache.mahout.text.TestSequenceFilesFromDirectory
> testSequenceFileFromDirectoryMapReduce(org.apache.mahout.text.TestSequenceFilesFromDirectory)
>   Time elapsed: 3.133 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<1> but was:<0>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:555)
>   at org.junit.Assert.assertEquals(Assert.java:542)
>   at 
> org.apac

[jira] [Commented] (MAHOUT-1325) SequenceFilesFromMailArchivesTest.testMapReduce is unstable

2013-09-18 Thread Stevo Slavic (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770867#comment-13770867
 ] 

Stevo Slavic commented on MAHOUT-1325:
--

This unwanted behavior appears to be caused by a bug in LocalJobRunner in 
Hadoop itself (see MAPREDUCE-5367), and it affects all of our mapreduce 
tests when they are run in parallel.

We currently use hadoop-core 1.2.1 (btw, I wish the hadoop-core upgrade to 
1.2.1 for 0.9 had been tracked in the Mahout JIRA and changelog; IMO it's an 
important dependency, and I wonder why [~srowen] didn't log it). I've checked, 
and the same issue is present in hadoop-core 1.1.2 (used in the Mahout 0.8 
release). So we have to wait for hadoop-core 1.3.0 to be released, since 
MAPREDUCE-5367 is tagged as fixed in that version/branch.

> SequenceFilesFromMailArchivesTest.testMapReduce is unstable
> ---
>
> Key: MAHOUT-1325
> URL: https://issues.apache.org/jira/browse/MAHOUT-1325
> Project: Mahout
>  Issue Type: Bug
>  Components: Integration
>Affects Versions: 0.8
> Environment: ubuntu1 Apache Jenkins CI node
>Reporter: Stevo Slavic
>Assignee: Stevo Slavic
>Priority: Minor
> Fix For: 0.9
>
> Attachments: 
> org.apache.mahout.text.SequenceFilesFromMailArchivesTest-output.txt
>
>
> Mahout-Quality build job execution #2225 failed because of this test.
> Relevant build log:
> {noformat}
> [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ 
> mahout-integration ---
> [INFO] Surefire report directory: 
> /home/jenkins/jenkins-slave/workspace/Mahout-Quality/trunk/integration/target/surefire-reports
> ---
>  T E S T S
> ---
> ---
>  T E S T S
> ---
> Running org.apache.mahout.utils.vectors.lucene.LuceneIterableTest
> Running org.apache.mahout.utils.email.MailProcessorTest
> Running org.apache.mahout.clustering.cdbw.TestCDbwEvaluator
> Running org.apache.mahout.utils.regex.RegexMapperTest
> Running org.apache.mahout.utils.vectors.io.VectorWriterTest
> Running org.apache.mahout.utils.nlp.collocations.llr.BloomTokenFilterTest
> Running org.apache.mahout.utils.vectors.lucene.CachedTermInfoTest
> Running org.apache.mahout.clustering.TestClusterEvaluator
> Running org.apache.mahout.utils.regex.RegexUtilsTest
> Running org.apache.mahout.utils.vectors.lucene.DriverTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.17 sec - in 
> org.apache.mahout.utils.regex.RegexUtilsTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.412 sec - 
> in org.apache.mahout.utils.email.MailProcessorTest
> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.344 sec - 
> in org.apache.mahout.utils.nlp.collocations.llr.BloomTokenFilterTest
> Running org.apache.mahout.utils.vectors.csv.CSVVectorIteratorTest
> Running org.apache.mahout.clustering.TestClusterDumper
> Running org.apache.mahout.utils.vectors.VectorHelperTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.601 sec - 
> in org.apache.mahout.utils.vectors.csv.CSVVectorIteratorTest
> Running org.apache.mahout.utils.vectors.arff.ARFFTypeTest
> Running org.apache.mahout.utils.vectors.arff.ARFFVectorIterableTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.383 sec - 
> in org.apache.mahout.utils.vectors.VectorHelperTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.332 sec - 
> in org.apache.mahout.utils.vectors.lucene.LuceneIterableTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.252 sec - 
> in org.apache.mahout.utils.vectors.arff.ARFFTypeTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.369 sec - 
> in org.apache.mahout.utils.vectors.lucene.CachedTermInfoTest
> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.415 sec - 
> in org.apache.mahout.utils.vectors.arff.ARFFVectorIterableTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.5 sec - in 
> org.apache.mahout.utils.regex.RegexMapperTest
> Running org.apache.mahout.utils.vectors.arff.DriverTest
> Running org.apache.mahout.utils.Bump125Test
> Running org.apache.mahout.utils.SplitInputTest
> Running org.apache.mahout.utils.TestConcatenateVectorsJob
> Running org.apache.mahout.utils.vectors.arff.MapBackedARFFModelTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.043 sec - 
> in org.apache.mahout.utils.vectors.io.VectorWriterTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.362 sec - 
> in org.apache.mahout.utils.Bump125Test
> Running 
> org.apache.mahout.cf

[jira] [Resolved] (MAHOUT-1324) SequenceFilesFromLuceneStorageMRJobTest.testRun is unstable

2013-09-18 Thread Stevo Slavic (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stevo Slavic resolved MAHOUT-1324.
--

   Resolution: Duplicate
Fix Version/s: (was: 0.9)

Resolving as duplicate of MAHOUT-1325.

> SequenceFilesFromLuceneStorageMRJobTest.testRun is unstable
> ---
>
> Key: MAHOUT-1324
> URL: https://issues.apache.org/jira/browse/MAHOUT-1324
> Project: Mahout
>  Issue Type: Bug
>  Components: Integration
>Affects Versions: 0.8
> Environment: ubuntu2 Apache Jenkins CI node
>Reporter: Stevo Slavic
>Assignee: Stevo Slavic
>Priority: Minor
>
> Mahout-Quality build job execution #2223 failed because of this test.
> Relevant build log output:
> {noformat}
> [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ 
> mahout-integration ---
> [INFO] Surefire report directory: 
> /home/hudson/jenkins-slave/workspace/Mahout-Quality/trunk/integration/target/surefire-reports
> ---
>  T E S T S
> ---
> ---
>  T E S T S
> ---
> Running org.apache.mahout.utils.vectors.arff.ARFFVectorIterableTest
> Running org.apache.mahout.utils.vectors.arff.DriverTest
> Running org.apache.mahout.utils.SplitInputTest
> Running org.apache.mahout.utils.vectors.lucene.DriverTest
> Running org.apache.mahout.utils.regex.RegexUtilsTest
> Running org.apache.mahout.utils.regex.RegexMapperTest
> Running org.apache.mahout.utils.email.MailProcessorTest
> Running org.apache.mahout.utils.Bump125Test
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.164 sec - 
> in org.apache.mahout.utils.regex.RegexUtilsTest
> Running org.apache.mahout.utils.TestConcatenateVectorsJob
> Running org.apache.mahout.utils.vectors.lucene.LuceneIterableTest
> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.374 sec - 
> in org.apache.mahout.utils.vectors.arff.ARFFVectorIterableTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.219 sec - 
> in org.apache.mahout.utils.Bump125Test
> Running org.apache.mahout.utils.vectors.lucene.CachedTermInfoTest
> Running org.apache.mahout.utils.vectors.VectorHelperTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.437 sec - 
> in org.apache.mahout.utils.email.MailProcessorTest
> Running org.apache.mahout.utils.vectors.arff.ARFFTypeTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.364 sec - 
> in org.apache.mahout.utils.vectors.VectorHelperTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.827 sec - 
> in org.apache.mahout.utils.vectors.arff.DriverTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.211 sec - 
> in org.apache.mahout.utils.vectors.arff.ARFFTypeTest
> Running org.apache.mahout.utils.vectors.arff.MapBackedARFFModelTest
> Running org.apache.mahout.utils.vectors.io.VectorWriterTest
> Running org.apache.mahout.utils.nlp.collocations.llr.BloomTokenFilterTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.182 sec - 
> in org.apache.mahout.utils.vectors.arff.MapBackedARFFModelTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.296 sec - 
> in org.apache.mahout.utils.regex.RegexMapperTest
> Running org.apache.mahout.utils.vectors.csv.CSVVectorIteratorTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.317 sec - 
> in org.apache.mahout.utils.TestConcatenateVectorsJob
> Running org.apache.mahout.text.SequenceFilesFromLuceneStorageTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.191 sec - 
> in org.apache.mahout.utils.vectors.lucene.CachedTermInfoTest
> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.492 sec - 
> in org.apache.mahout.utils.nlp.collocations.llr.BloomTokenFilterTest
> Running org.apache.mahout.text.LuceneSegmentInputSplitTest
> Running org.apache.mahout.text.LuceneSegmentRecordReaderTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.635 sec - 
> in org.apache.mahout.utils.vectors.lucene.LuceneIterableTest
> Running org.apache.mahout.text.MailArchivesClusteringAnalyzerTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.649 sec - 
> in org.apache.mahout.utils.vectors.csv.CSVVectorIteratorTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.342 sec - 
> in org.apache.mahout.text.MailArchivesClusteringAnalyzerTest
> Running org.apache.mahout.text.LuceneSegmentInputFormatTest
> Running org.apache.mahout.text.SequenceFilesFromLuceneStorageMRJobTest
> Running org.apache.mahout.text.TestSequenceFilesFromDirectory
> Running org.apache.mah

[jira] [Updated] (MAHOUT-1325) SequenceFilesFromMailArchivesTest.testMapReduce is unstable

2013-09-18 Thread Stevo Slavic (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stevo Slavic updated MAHOUT-1325:
-

Attachment: 
org.apache.mahout.text.SequenceFilesFromMailArchivesTest-output.txt

Whenever a mapreduce test fails, there is a class cast exception in the stack 
trace in the test output file, as in the attached 
[^org.apache.mahout.text.SequenceFilesFromMailArchivesTest-output.txt]

> SequenceFilesFromMailArchivesTest.testMapReduce is unstable
> ---
>
> Key: MAHOUT-1325
> URL: https://issues.apache.org/jira/browse/MAHOUT-1325
> Project: Mahout
>  Issue Type: Bug
>  Components: Integration
>Affects Versions: 0.8
> Environment: ubuntu1 Apache Jenkins CI node
>Reporter: Stevo Slavic
>Assignee: Stevo Slavic
>Priority: Minor
> Fix For: 0.9
>
> Attachments: 
> org.apache.mahout.text.SequenceFilesFromMailArchivesTest-output.txt
>
>
> Mahout-Quality build job execution #2225 failed because of this test.
> Relevant build log:
> {noformat}
> [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ 
> mahout-integration ---
> [INFO] Surefire report directory: 
> /home/jenkins/jenkins-slave/workspace/Mahout-Quality/trunk/integration/target/surefire-reports
> ---
>  T E S T S
> ---
> ---
>  T E S T S
> ---
> Running org.apache.mahout.utils.vectors.lucene.LuceneIterableTest
> Running org.apache.mahout.utils.email.MailProcessorTest
> Running org.apache.mahout.clustering.cdbw.TestCDbwEvaluator
> Running org.apache.mahout.utils.regex.RegexMapperTest
> Running org.apache.mahout.utils.vectors.io.VectorWriterTest
> Running org.apache.mahout.utils.nlp.collocations.llr.BloomTokenFilterTest
> Running org.apache.mahout.utils.vectors.lucene.CachedTermInfoTest
> Running org.apache.mahout.clustering.TestClusterEvaluator
> Running org.apache.mahout.utils.regex.RegexUtilsTest
> Running org.apache.mahout.utils.vectors.lucene.DriverTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.17 sec - in 
> org.apache.mahout.utils.regex.RegexUtilsTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.412 sec - 
> in org.apache.mahout.utils.email.MailProcessorTest
> Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.344 sec - 
> in org.apache.mahout.utils.nlp.collocations.llr.BloomTokenFilterTest
> Running org.apache.mahout.utils.vectors.csv.CSVVectorIteratorTest
> Running org.apache.mahout.clustering.TestClusterDumper
> Running org.apache.mahout.utils.vectors.VectorHelperTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.601 sec - 
> in org.apache.mahout.utils.vectors.csv.CSVVectorIteratorTest
> Running org.apache.mahout.utils.vectors.arff.ARFFTypeTest
> Running org.apache.mahout.utils.vectors.arff.ARFFVectorIterableTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.383 sec - 
> in org.apache.mahout.utils.vectors.VectorHelperTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.332 sec - 
> in org.apache.mahout.utils.vectors.lucene.LuceneIterableTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.252 sec - 
> in org.apache.mahout.utils.vectors.arff.ARFFTypeTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.369 sec - 
> in org.apache.mahout.utils.vectors.lucene.CachedTermInfoTest
> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.415 sec - 
> in org.apache.mahout.utils.vectors.arff.ARFFVectorIterableTest
> Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.5 sec - in 
> org.apache.mahout.utils.regex.RegexMapperTest
> Running org.apache.mahout.utils.vectors.arff.DriverTest
> Running org.apache.mahout.utils.Bump125Test
> Running org.apache.mahout.utils.SplitInputTest
> Running org.apache.mahout.utils.TestConcatenateVectorsJob
> Running org.apache.mahout.utils.vectors.arff.MapBackedARFFModelTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.043 sec - 
> in org.apache.mahout.utils.vectors.io.VectorWriterTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.362 sec - 
> in org.apache.mahout.utils.Bump125Test
> Running 
> org.apache.mahout.cf.taste.impl.similarity.jdbc.MySQLJDBCInMemoryItemSimilarityTest
> Running org.apache.mahout.text.SequenceFilesFromLuceneStorageTest
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.304 sec - 
> in org.apache.mahout.utils.vectors.arff.MapBackedARFFModelTest
> Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.887 sec - 
> in org.apache.mahout.utils.vectors.arff.Driver

[jira] [Updated] (MAHOUT-1338) Reduce mahout-integration transitive dependencies to avoid JAR hell, version conflicts

2013-09-18 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated MAHOUT-1338:
--

Status: Patch Available  (was: Open)

> Reduce mahout-integration transitive dependencies to avoid JAR hell, version 
> conflicts
> --
>
> Key: MAHOUT-1338
> URL: https://issues.apache.org/jira/browse/MAHOUT-1338
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.8
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
> Attachments: MAHOUT-1338.patch
>
>
> mahout-integration contains bits of client and connector code for a lot of 
> projects, like Lucene, Cassandra, MongoDB, etc. As such, its transitive 
> dependencies in Maven pull in quite a lot. 
> Most of these are unnecessary for any particular user, since probably at most 
> one client/package is of interest. In fact, mahout-integration is not used by 
> most users at all. 
> In the worst case, it causes actual version problems when trying to package 
> up the transitive dependencies of something depending on Mahout.
> I suggest several changes along these lines, all of which are represented in 
> the attached patch:
> 1. Remove direct lucene-core and cassandra-all dependencies, as they are not 
> necessary
> 2. Mark all dependencies like hector, mongodb, etc. as optional in Maven
> 3. In fact, mark mahout-examples, mahout-buildtools and mahout-integration as 
> optional with respect to the overall project.
> 4. Bonus: update Cassandra client version to pull in slightly newer deps

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAHOUT-1338) Reduce mahout-integration transitive dependencies to avoid JAR hell, version conflicts

2013-09-18 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated MAHOUT-1338:
--

Attachment: MAHOUT-1338.patch




[jira] [Created] (MAHOUT-1338) Reduce mahout-integration transitive dependencies to avoid JAR hell, version conflicts

2013-09-18 Thread Sean Owen (JIRA)
Sean Owen created MAHOUT-1338:
-

 Summary: Reduce mahout-integration transitive dependencies to 
avoid JAR hell, version conflicts
 Key: MAHOUT-1338
 URL: https://issues.apache.org/jira/browse/MAHOUT-1338
 Project: Mahout
  Issue Type: Improvement
Affects Versions: 0.8
Reporter: Sean Owen
Assignee: Sean Owen
Priority: Minor


mahout-integration contains bits of client and connector code for a lot of 
projects, like Lucene, Cassandra, MongoDB, etc. As such, its transitive 
dependencies in Maven pull in quite a lot. 

Most of these are unnecessary for any particular user, since probably at most 
one client/package is of interest. In fact, mahout-integration is not used by 
most users at all. 

In the worst case, it causes actual version problems when trying to package up 
the transitive dependencies of something depending on Mahout.

I suggest several changes along these lines, all of which are represented in 
the attached patch:

1. Remove direct lucene-core and cassandra-all dependencies, as they are not 
necessary
2. Mark all dependencies like hector, mongodb, etc. as optional in Maven
3. In fact, mark mahout-examples, mahout-buildtools and mahout-integration as 
optional with respect to the overall project.
4. Bonus: update Cassandra client version to pull in slightly newer deps
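
For illustration, item 2 could look roughly like the fragment below in the 
mahout-integration pom.xml. This is a sketch and not the contents of the 
attached patch: the dependency coordinates and the version property are 
assumptions chosen for the example. The `<optional>` element is the standard 
Maven way to keep a dependency from being passed on transitively to projects 
that depend on this module.

```xml
<!-- Sketch only: the coordinates and ${mongo.driver.version} are illustrative
     assumptions, not taken from MAHOUT-1338.patch. -->
<dependency>
  <groupId>org.mongodb</groupId>
  <artifactId>mongo-java-driver</artifactId>
  <version>${mongo.driver.version}</version>
  <!-- optional: available when building this module, but not inherited
       transitively by downstream projects -->
  <optional>true</optional>
</dependency>
```

With a dependency marked this way, a user who actually needs that connector 
would declare the driver explicitly in their own POM.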
