[jira] [Created] (MAHOUT-1290) Issue when running Mahout Recommender Demo

2013-07-24 Thread Suneel Marthi (JIRA)
Suneel Marthi created MAHOUT-1290:

 Summary: Issue when running Mahout Recommender Demo
 Key: MAHOUT-1290
 URL: https://issues.apache.org/jira/browse/MAHOUT-1290
 Project: Mahout
  Issue Type: Bug
  Components: Examples
Affects Versions: 0.8
Reporter: Suneel Marthi
 Fix For: 0.9


When running jetty:run under *mahout-integration*, a ClassNotFoundException
is thrown for org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommender.

The problem occurs because, when the GroupLens example was moved to the
examples submodule, the webapp folder was not moved to the examples directory
and the Jetty dependency was not added as a Maven plugin.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Build failed in Jenkins: mahout-nightly #1301

2013-07-24 Thread Apache Jenkins Server

--
[...truncated 1557 lines...]
Running org.apache.mahout.cf.taste.impl.common.InvertedRunningAverageTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.009 sec - in 
org.apache.mahout.cf.taste.impl.common.InvertedRunningAverageTest
Running org.apache.mahout.cf.taste.impl.common.FastByIDMapTest
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.271 sec - in 
org.apache.mahout.cf.taste.impl.common.FastByIDMapTest
Running org.apache.mahout.cf.taste.impl.common.RunningAverageTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.005 sec - in 
org.apache.mahout.cf.taste.impl.common.RunningAverageTest
Running org.apache.mahout.cf.taste.impl.common.RefreshHelperTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.074 sec - in 
org.apache.mahout.cf.taste.impl.common.RefreshHelperTest
Running org.apache.mahout.cf.taste.impl.common.FastIDSetTest
Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.24 sec - in 
org.apache.mahout.cf.taste.impl.common.FastIDSetTest
Running org.apache.mahout.cf.taste.impl.common.RunningAverageAndStdDevTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.061 sec - in 
org.apache.mahout.cf.taste.impl.common.RunningAverageAndStdDevTest
Running org.apache.mahout.cf.taste.impl.common.CacheTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.891 sec - in 
org.apache.mahout.cf.taste.impl.common.CacheTest
Running org.apache.mahout.cf.taste.impl.common.BitSetTest
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.008 sec - in 
org.apache.mahout.cf.taste.impl.common.BitSetTest
Running org.apache.mahout.cf.taste.impl.common.LongPrimitiveArrayIteratorTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.008 sec - in 
org.apache.mahout.cf.taste.impl.common.LongPrimitiveArrayIteratorTest
Running org.apache.mahout.cf.taste.impl.common.WeightedRunningAverageTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.008 sec - in 
org.apache.mahout.cf.taste.impl.common.WeightedRunningAverageTest
Running org.apache.mahout.cf.taste.impl.common.FastMapTest
Tests run: 14, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.404 sec - in 
org.apache.mahout.cf.taste.impl.common.FastMapTest
Running org.apache.mahout.cf.taste.impl.common.SamplingLongPrimitiveIteratorTest
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.932 sec - in 
org.apache.mahout.cf.taste.impl.common.SamplingLongPrimitiveIteratorTest
Running org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarityTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.087 sec - in 
org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarityTest
Running org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarityTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.099 sec - in 
org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarityTest
Running org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarityTest
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.116 sec - in 
org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarityTest
Running org.apache.mahout.cf.taste.impl.similarity.AveragingPreferenceInferrerTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.084 sec - in 
org.apache.mahout.cf.taste.impl.similarity.AveragingPreferenceInferrerTest
Running org.apache.mahout.cf.taste.impl.similarity.file.FileItemSimilarityTest
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.085 sec - in 
org.apache.mahout.cf.taste.impl.similarity.file.FileItemSimilarityTest
Running org.apache.mahout.cf.taste.impl.similarity.SpearmanCorrelationSimilarityTest
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.096 sec - in 
org.apache.mahout.cf.taste.impl.similarity.SpearmanCorrelationSimilarityTest
Running org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarityTest
Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.111 sec - in 
org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarityTest
Running org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarityTest
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.106 sec - in 
org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarityTest
Running org.apache.mahout.cf.taste.impl.model.MemoryIDMigratorTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.024 sec - in 
org.apache.mahout.cf.taste.impl.model.MemoryIDMigratorTest
Running org.apache.mahout.cf.taste.impl.model.GenericDataModelTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.118 sec - in 
org.apache.mahout.cf.taste.impl.model.GenericDataModelTest
Running org.apache.mahout.cf.taste.impl.model.BooleanUserPreferenceArrayTest

Jenkins build is back to normal : Mahout-Quality #2155

2013-07-24 Thread Apache Jenkins Server



Build failed in Jenkins: Mahout-Examples-Cluster-Reuters-II #552

2013-07-24 Thread Apache Jenkins Server

--
[...truncated 2171 lines...]
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/OpenLongObjectHashMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/OpenFloatObjectHashMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/OpenDoubleObjectHashMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractByteByteMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractByteCharMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractByteIntMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractByteShortMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractByteLongMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractByteFloatMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractByteDoubleMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractCharByteMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractCharCharMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractCharIntMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractCharShortMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractCharLongMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractCharFloatMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractCharDoubleMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractIntByteMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractIntCharMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractIntIntMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractIntShortMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahout/math/map/AbstractIntLongMap.java
[INFO] Writing to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/math/target/generated-sources/mahout/org/apache/mahou

0.8

2013-07-24 Thread Grant Ingersoll
The 0.8 artifacts have been pushed to the mirror location. I will send an
official announcement tomorrow.

In the meantime, please review the release notes at: 
https://cwiki.apache.org/confluence/display/MAHOUT/Release+0.8

The new features/fixes section is pretty weak.

-Grant

[jira] [Updated] (MAHOUT-1284) DummyRecordWriter's bug with reused Writables

2013-07-24 Thread Grant Ingersoll (JIRA)

 [ https://issues.apache.org/jira/browse/MAHOUT-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated MAHOUT-1284:


Fix Version/s: (was: 0.8)
   (was: 0.7)
   0.9

> DummyRecordWriter's bug with reused Writables
>
> Key: MAHOUT-1284
> URL: https://issues.apache.org/jira/browse/MAHOUT-1284
> Project: Mahout
>  Issue Type: Bug
>Affects Versions: 0.7, 0.8
>Reporter: Maysam Yabandeh
>Priority: Minor
>  Labels: test
> Fix For: 0.9
>
> Attachments: MAHOUT-1284.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Reusing Writable objects is a recommended practice. However,
> DummyRecordWriter, which Mahout uses for testing, stores the caller's
> Writable instance itself in its internal map, so the next time the caller
> reuses that object, DummyRecordWriter's map changes as well. As a result,
> DummyRecordWriter cannot be used to test MapReduce jobs that reuse their
> Writables.
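A minimal sketch of the aliasing problem described in MAHOUT-1284, written without Hadoop on the classpath. `MutableLong` is a hypothetical stand-in for a reused Writable such as `LongWritable`, and `CollectingWriter` is a hypothetical, simplified stand-in for DummyRecordWriter (it keeps a list rather than a key-to-values map). The `copyOnWrite` flag shows one possible fix, storing a defensive copy of each value; this is not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reusable Hadoop Writable such as LongWritable.
class MutableLong {
    private long value;

    MutableLong(long value) { this.value = value; }

    void set(long value) { this.value = value; }

    long get() { return value; }

    MutableLong copy() { return new MutableLong(value); }
}

// Hypothetical, simplified stand-in for DummyRecordWriter: collects every value written.
class CollectingWriter {
    private final boolean copyOnWrite;
    private final List<MutableLong> written = new ArrayList<>();

    CollectingWriter(boolean copyOnWrite) { this.copyOnWrite = copyOnWrite; }

    void write(MutableLong value) {
        // Storing the caller's instance aliases it; storing a copy does not.
        written.add(copyOnWrite ? value.copy() : value);
    }

    List<Long> values() {
        List<Long> result = new ArrayList<>();
        for (MutableLong v : written) {
            result.add(v.get());
        }
        return result;
    }
}

class WritableReuseDemo {
    // Mimics a mapper that reuses one output object for every record it emits.
    static List<Long> run(boolean copyOnWrite) {
        CollectingWriter writer = new CollectingWriter(copyOnWrite);
        MutableLong reused = new MutableLong(0);
        for (long i = 1; i <= 3; i++) {
            reused.set(i);
            writer.write(reused);
        }
        return writer.values();
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [3, 3, 3]: every entry aliases the reused object
        System.out.println(run(true));  // [1, 2, 3]: each entry is an independent copy
    }
}
```

In real Hadoop code, `WritableUtils.clone(writable, conf)` is one way to make such a copy.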



Re: Regarding Online Recommenders

2013-07-24 Thread Gokhan Capan
OK, I tested the MatrixBackedDataModel: the heap size drops to 7G for the
Netflix data, which is still large.

The same history is encoded in two SparseRowMatrix instances, one
row-indexed by user and the other by item.

It has serious concurrency issues in several places, though (set and
remove operations need to be thread-safe).

Best

Gokhan
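A sketch of the two-index idea with thread-safe updates, assuming a single `ReentrantReadWriteLock` that guards both indexes. `DualIndexedPreferences` is hypothetical, and its nested `HashMap`s only stand in for the two SparseRowMatrix instances; they are not memory-efficient and do not reflect how MatrixBackedDataModel is implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: the same preference history indexed twice (by user and
// by item). One write lock makes set() and remove() update both indexes atomically.
class DualIndexedPreferences {
    private final Map<Long, Map<Long, Float>> byUser = new HashMap<>();
    private final Map<Long, Map<Long, Float>> byItem = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    void set(long userId, long itemId, float value) {
        lock.writeLock().lock();
        try {
            byUser.computeIfAbsent(userId, k -> new HashMap<>()).put(itemId, value);
            byItem.computeIfAbsent(itemId, k -> new HashMap<>()).put(userId, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    void remove(long userId, long itemId) {
        lock.writeLock().lock();
        try {
            removeEntry(byUser, userId, itemId);
            removeEntry(byItem, itemId, userId);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Returns null when the user has not rated the item.
    Float get(long userId, long itemId) {
        lock.readLock().lock();
        try {
            Map<Long, Float> row = byUser.get(userId);
            return row == null ? null : row.get(itemId);
        } finally {
            lock.readLock().unlock();
        }
    }

    int numUsersWhoRated(long itemId) {
        lock.readLock().lock();
        try {
            Map<Long, Float> column = byItem.get(itemId);
            return column == null ? 0 : column.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Drops one entry and removes the row entirely once it becomes empty.
    private static void removeEntry(Map<Long, Map<Long, Float>> index, long rowId, long colId) {
        Map<Long, Float> row = index.get(rowId);
        if (row != null) {
            row.remove(colId);
            if (row.isEmpty()) {
                index.remove(rowId);
            }
        }
    }
}
```

A single lock keeps the user and item indexes consistent with each other at the cost of serializing all writers; finer-grained locking (for example per row) would need extra care so that a reader never sees a preference in one index but not the other.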


On Sat, Jul 20, 2013 at 12:15 AM, Peng Cheng  wrote:

> Hi,
>
> Just one simple question: Is the
> org.apache.mahout.math.BinarySearch.binarySearch()
> function an optimized version of Arrays.binarySearch()? If it is not, why
> implement it again?
>
> Yours Peng
>
>
> On 13-07-17 06:31 PM, Sebastian Schelter wrote:
>
>> You are completely right: the simple interface would only be usable for
>> read-only / batch-updatable recommenders. Online recommenders might need
>> something different. I tried to widen the discussion here to cover all
>> kinds of API changes in the recommenders that would be necessary in the
>> future.
>>
>>
>>
>> 2013/7/17 Peng Cheng 
>>
>>> One thing that suddenly comes to mind is that, for a simple interface
>>> like FactorizablePreferences, sequential READ in real time may be
>>> possible, but sequential WRITE in O(1) time is a utopia. You need to
>>> flush out any old preference with the same user and item ID (in the
>>> worst case this could require an interpolation search); otherwise you
>>> are permitting a user to rate an item twice with different values.
>>> Considering how FileDataModel is supposed to work (new files flush old
>>> files), using the simple interface may have fewer advantages than we
>>> used to believe.
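To illustrate why a write cannot simply be an O(1) append, here is a sketch of a hypothetical primitive-array store, `SortedPreferenceArray`, kept sorted by a packed (userID, itemID) key. It assumes non-negative int IDs. Overwriting an existing rating costs an O(log n) lookup via `Arrays.binarySearch`, while inserting a new one also shifts the tail of the arrays, which is O(n).

```java
import java.util.Arrays;

// Hypothetical sketch: preferences stored in parallel primitive arrays, kept
// sorted by a packed (userID, itemID) key. Assumes both IDs are non-negative
// ints, so the packed long orders by user first and then by item.
final class SortedPreferenceArray {
    private long[] keys = new long[4];
    private float[] values = new float[4];
    private int size;

    static long pack(int userId, int itemId) {
        return ((long) userId << 32) | (itemId & 0xFFFFFFFFL);
    }

    // Returns true if an existing rating was overwritten, false if a new one was inserted.
    boolean set(int userId, int itemId, float value) {
        long key = pack(userId, itemId);
        int pos = Arrays.binarySearch(keys, 0, size, key); // O(log n) lookup
        if (pos >= 0) {
            values[pos] = value; // same user and item: overwrite, never duplicate
            return true;
        }
        int insertAt = -(pos + 1);
        if (size == keys.length) { // grow by doubling
            keys = Arrays.copyOf(keys, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        // Shifting the tail to keep the arrays sorted makes a fresh insert O(n).
        System.arraycopy(keys, insertAt, keys, insertAt + 1, size - insertAt);
        System.arraycopy(values, insertAt, values, insertAt + 1, size - insertAt);
        keys[insertAt] = key;
        values[insertAt] = value;
        size++;
        return false;
    }

    int size() {
        return size;
    }

    // Sequential reads in (user, item) order are a plain array scan.
    float valueAt(int index) {
        return values[index];
    }
}
```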
>>>
>>>
>>> On 13-07-17 04:58 PM, Sebastian Schelter wrote:
>>>
>>>  Hi Peng,

 I never wanted to discard the old interface; I just wanted to split it up.
 I want to have a simple interface that only supports sequential access (and
 allows for very memory-efficient implementations, e.g. by the use of
 primitive arrays). DataModel should *extend* this interface and provide
 both sequential and random access (basically what it already does).

 Then a recommender such as SGD could state that it only needs sequential
 access to the preferences, and you could feed it either a DataModel (so we
 don't break backwards compatibility) or a memory-efficient sequential
 access thingy.

 Does that make sense to you?
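The split described above might look roughly like this. All names here (`Pref`, `SequentialPreferences`, `RandomAccessPreferences`, `ArrayPreferences`, `MapPreferences`, `MeanRatingTrainer`) are hypothetical and are not Mahout's API; `RandomAccessPreferences` plays the role DataModel would, and `meanRating` is a stand-in for an SGD pass that only ever iterates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical value type for one (user, item, rating) triple.
final class Pref {
    final long userId;
    final long itemId;
    final float value;

    Pref(long userId, long itemId, float value) {
        this.userId = userId; this.itemId = itemId; this.value = value;
    }
}

// Hypothetical minimal interface: sequential access only.
interface SequentialPreferences {
    Iterator<Pref> iterator();
    int numPreferences();
}

// Hypothetical richer interface that *extends* the sequential one (DataModel's role).
interface RandomAccessPreferences extends SequentialPreferences {
    Float getPreferenceValue(long userId, long itemId); // null if absent
}

// Memory-efficient sequential store backed by parallel primitive arrays.
final class ArrayPreferences implements SequentialPreferences {
    private final long[] users;
    private final long[] items;
    private final float[] values;

    ArrayPreferences(long[] users, long[] items, float[] values) {
        this.users = users; this.items = items; this.values = values;
    }

    public Iterator<Pref> iterator() {
        return new Iterator<Pref>() {
            private int cursor = 0;

            public boolean hasNext() { return cursor < values.length; }

            public Pref next() {
                if (!hasNext()) throw new NoSuchElementException();
                Pref p = new Pref(users[cursor], items[cursor], values[cursor]);
                cursor++;
                return p;
            }
        };
    }

    public int numPreferences() { return values.length; }
}

// Random-access store; its iterator() is a simple loop over all users' rows.
final class MapPreferences implements RandomAccessPreferences {
    private final Map<Long, Map<Long, Float>> byUser = new HashMap<>();
    private int count;

    void set(long userId, long itemId, float value) {
        Float old = byUser.computeIfAbsent(userId, k -> new HashMap<>()).put(itemId, value);
        if (old == null) count++; // only new (user, item) pairs increase the count
    }

    public Float getPreferenceValue(long userId, long itemId) {
        Map<Long, Float> row = byUser.get(userId);
        return row == null ? null : row.get(itemId);
    }

    public int numPreferences() { return count; }

    public Iterator<Pref> iterator() {
        List<Pref> all = new ArrayList<>();
        for (Map.Entry<Long, Map<Long, Float>> user : byUser.entrySet()) {
            for (Map.Entry<Long, Float> pref : user.getValue().entrySet()) {
                all.add(new Pref(user.getKey(), pref.getKey(), pref.getValue()));
            }
        }
        return all.iterator();
    }
}

// Stand-in for an SGD pass: it declares that it only needs sequential access.
final class MeanRatingTrainer {
    static double meanRating(SequentialPreferences prefs) {
        double sum = 0.0;
        int n = 0;
        for (Iterator<Pref> it = prefs.iterator(); it.hasNext(); ) {
            sum += it.next().value;
            n++;
        }
        return n == 0 ? 0.0 : sum / n;
    }
}
```

Because the trainer's parameter type is the narrower interface, it accepts both the array-backed store and the random-access one, which is the backwards-compatibility property described above.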


 2013/7/17 Peng Cheng 

> I see, OK, so we shouldn't use the old implementation. But I mean, the old
> interface doesn't have to be discarded. The difference between your
> FactorizablePreferences and DataModel is that your model supports
> getPreferences(), which returns all preferences as an iterator, while
> DataModel supports a few older methods that return the preferences of an
> individual user or item.
>
> My point is that it is not hard for each of them to implement what it
> lacks: the old DataModel can implement getPreferences() with a simple loop
> in an abstract class, and your new FactorizablePreferences can implement
> the old methods with a binary search that takes O(log n) time, or an
> interpolation search that takes O(log log n) time on average (for roughly
> uniformly distributed IDs). The same goes for online updates. It would
> just be a matter of different speed and space trade-offs, not a different
> interface standard, so we could reuse the old unit tests, the old
> examples, old everything. And we would be more flexible in writing
> ensemble recommenders.
>
> Just a few thoughts; I'll have to validate the idea first before creating
> a new JIRA ticket.
>
> Yours Peng
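A sketch of the two lookup strategies mentioned above, on a sorted array of distinct IDs. `SortedIdSearch` is a hypothetical helper: `binarySearch` simply delegates to `java.util.Arrays.binarySearch`, and `interpolationSearch` is a textbook interpolation search that returns -1 when the key is absent (unlike `Arrays.binarySearch`, which returns a negative insertion point).

```java
import java.util.Arrays;

// Hypothetical helper comparing two lookups on a sorted array of distinct IDs.
final class SortedIdSearch {
    // O(log n) worst case. Returns the index, or a negative insertion point if absent.
    static int binarySearch(long[] sortedIds, long key) {
        return Arrays.binarySearch(sortedIds, key);
    }

    // O(log log n) on average for roughly uniform keys, O(n) in the worst case.
    // Returns the index of key, or -1 if it is absent.
    static int interpolationSearch(long[] sortedIds, long key) {
        int lo = 0;
        int hi = sortedIds.length - 1;
        while (lo <= hi && key >= sortedIds[lo] && key <= sortedIds[hi]) {
            if (sortedIds[hi] == sortedIds[lo]) {
                return sortedIds[lo] == key ? lo : -1;
            }
            // Estimate the position by linear interpolation between the endpoints;
            // doing the subtraction in doubles avoids long overflow.
            double fraction = ((double) key - (double) sortedIds[lo])
                    / ((double) sortedIds[hi] - (double) sortedIds[lo]);
            int pos = lo + (int) (fraction * (hi - lo));
            if (sortedIds[pos] == key) {
                return pos;
            } else if (sortedIds[pos] < key) {
                lo = pos + 1;
            } else {
                hi = pos - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] itemIds = {2, 5, 9, 14, 20, 27};
        System.out.println(binarySearch(itemIds, 14));        // 3
        System.out.println(interpolationSearch(itemIds, 14)); // 3
        System.out.println(interpolationSearch(itemIds, 15)); // -1
    }
}
```

Interpolation search's O(log log n) average assumes roughly uniformly distributed keys; on skewed IDs it can degrade to O(n) probes, while binary search stays O(log n) regardless.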
>
>
>
> On 13-07-16 02:51 PM, Sebastian Schelter wrote:
>
>> I completely agree: Netflix is less than one gigabyte in a smart
>> representation, so 12x more memory is a no-go. The techniques used in
>> FactorizablePreferences allow a much more memory-efficient
>> representation; this was tested on the KDD Music dataset, which is
>> approx. 2.5 times the size of Netflix and fits into 3GB with that
>> approach.
>>
>>
>> 2013/7/16 Ted Dunning 
>>
>>> Netflix is a small dataset. 12G for that seems quite excessive.
>>>
>>> Note also that this is before you have done any work.
>>>
>>> Ideally, 100 million observations should take << 1GB.
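A back-of-the-envelope sketch of why 100 million observations can fit well under 1GB. `CompactRatings` is a hypothetical read-only, compressed-sparse-row layout (one row per user) that assumes fewer than 32,768 items so an item index fits in a `short` (the Netflix Prize data has 17,770 movies), ratings on a small integer scale that fit in a `byte`, and item indexes sorted within each row. The figure of about 480,000 users is an assumption close to the Netflix user count.

```java
import java.util.Arrays;

// Hypothetical read-only, compressed-sparse-row layout with one row per user.
// Assumes < 32,768 items (item index fits in a short), ratings that fit in a
// byte, and item indexes sorted ascending within each user's row.
final class CompactRatings {
    private final int[] rowStart;    // user u's entries are at [rowStart[u], rowStart[u + 1])
    private final short[] itemIndex; // item index of each entry
    private final byte[] ratings;    // rating value of each entry

    CompactRatings(int[] rowStart, short[] itemIndex, byte[] ratings) {
        this.rowStart = rowStart;
        this.itemIndex = itemIndex;
        this.ratings = ratings;
    }

    int numRatings(int user) {
        return rowStart[user + 1] - rowStart[user];
    }

    // Binary search within one user's row; returns 0 if the item is unrated.
    byte ratingOf(int user, int item) {
        int pos = Arrays.binarySearch(itemIndex, rowStart[user], rowStart[user + 1], (short) item);
        return pos >= 0 ? ratings[pos] : 0;
    }

    // Payload bytes of the three arrays, ignoring JVM array headers.
    static long estimateBytes(long numUsers, long numRatings) {
        return (numUsers + 1) * Integer.BYTES + numRatings * (Short.BYTES + Byte.BYTES);
    }

    public static void main(String[] args) {
        // ~100 million ratings from ~480,000 users (assumed, Netflix-like sizes).
        System.out.println(estimateBytes(480_000L, 100_000_000L)); // 301920004, about 0.3 GB
    }
}
```

This counts only the array payloads, and because the layout is read-only it sidesteps the write-path concerns raised earlier in the thread.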
>>>
>>> On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng wrote:
>>>
>>>> The second idea is indeed splendid: we should separate
>>>> time-complexity-first and space-complexity-first implementations. What
>>>> I'm not quite sure about is whether we really need to create two
>>>> interfaces instead of one. Personally, I think a 12G heap is not that
>>>> high, right? Most new laptops can already handle that (emphasis on
>>>> laptop). And if we replace the hash map (the culprit of the high memory
>>>> consumption) with a list/linkedList