[jira] [Commented] (MAHOUT-1370) Vectordump doesn't write to output file in MapReduce Mode

2013-12-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838683#comment-13838683
 ] 

Hudson commented on MAHOUT-1370:


SUCCESS: Integrated in Mahout-Quality #2351 (See 
[https://builds.apache.org/job/Mahout-Quality/2351/])
MAHOUT-1370: Vectordump doesn't write to output file in MapReduce Mode 
(smarthi: rev 1547655)
* /mahout/trunk/CHANGELOG
* /mahout/trunk/integration/src/main/java/org/apache/mahout/utils/vectors/VectorDumper.java


> Vectordump doesn't write to output file in MapReduce Mode
> -
>
> Key: MAHOUT-1370
> URL: https://issues.apache.org/jira/browse/MAHOUT-1370
> Project: Mahout
>  Issue Type: Bug
>  Components: Integration
>Affects Versions: 0.7, 0.8
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 0.9
>
>
> When trying to run Vectordump in MR mode, get a 
> FileNotFoundException: No such File or Directory.
> {Code}
> 13/12/03 19:29:22 INFO vectors.VectorDumper: Output file: 
> /tmp/mahout-work-user/reuters-lda/vectordump
> Exception in thread "main" java.io.FileNotFoundException: 
> /tmp/mahout-work-user/reuters-lda/vectordump (No such file or directory)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
>   at com.google.common.io.Files.newWriter(Files.java:101)
>   at 
> org.apache.mahout.utils.vectors.VectorDumper.run(VectorDumper.java:153)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at 
> org.apache.mahout.utils.vectors.VectorDumper.main(VectorDumper.java:262)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>   at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>   at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> {Code}
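
For readers following the trace above: the exception is raised by java.io.FileOutputStream, which reports "No such file or directory" when the parent directory of the target path does not exist on the local file system. The sketch below is not the committed fix (the change in rev 1547655 is not shown in this thread); it only illustrates that one plausible cause and a simple guard against it. The class name OutputFileSketch and the helper openWriter are hypothetical.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class OutputFileSketch {

  /** Hypothetical helper: creates missing parent directories before opening the writer. */
  public static Writer openWriter(File output) throws IOException {
    File parent = output.getAbsoluteFile().getParentFile();
    if (parent != null && !parent.isDirectory() && !parent.mkdirs()) {
      throw new IOException("Could not create directory " + parent);
    }
    return new OutputStreamWriter(new FileOutputStream(output), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    File base = Files.createTempDirectory("vectordump-sketch").toFile();
    // The parent directory "reuters-lda" does not exist yet.
    File output = new File(base, "reuters-lda/vectordump");

    try {
      new FileOutputStream(output).close();
    } catch (FileNotFoundException e) {
      // Same exception type as in the trace above.
      System.out.println("Without parent directory: " + e.getMessage());
    }

    try (Writer writer = openWriter(output)) {
      writer.write("ok\n");
    }
    System.out.println("With parent directory created: " + output.length() + " bytes written");
  }
}
```

Whether the real problem was a missing local directory or a local path being used where an HDFS path was expected cannot be determined from this message alone.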



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAHOUT-1347) Add Streaming K-Means clustering algorithm to examples/bin/cluster-reuters.sh

2013-12-03 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838684#comment-13838684
 ] 

Hudson commented on MAHOUT-1347:


SUCCESS: Integrated in Mahout-Quality #2351 (See 
[https://builds.apache.org/job/Mahout-Quality/2351/])
MAHOUT-1347: Added 'qualcluster' utility to get stats about quality of 
Streaming KMeans clustering (smarthi: rev 1547716)
* /mahout/trunk/examples/bin/cluster-reuters.sh
MAHOUT-1347: Added -ow flag for Streaming KMeans output (smarthi: rev 1547710)
* /mahout/trunk/examples/bin/cluster-reuters.sh


> Add Streaming K-Means clustering algorithm to examples/bin/cluster-reuters.sh
> -
>
> Key: MAHOUT-1347
> URL: https://issues.apache.org/jira/browse/MAHOUT-1347
> Project: Mahout
>  Issue Type: Improvement
>  Components: Examples
>Affects Versions: 0.8
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 0.9
>
> Attachments: MAHOUT-1347.patch
>
>
> Add Streaming K-Means Clustering to examples/bin/cluster-reuters.sh





[jira] [Comment Edited] (MAHOUT-1368) Convert OnlineSummarizer to use the new TDigest

2013-12-03 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838638#comment-13838638
 ] 

Suneel Marthi edited comment on MAHOUT-1368 at 12/4/13 6:07 AM:


Ted, we need to hold off on committing this patch until we fix the issue with 
ClusterQualitySummarizer, which is broken after applying this patch.  I'll look 
at it tomorrow; it's too late at night now to wrap my head around it.

Running ClusterQualitySummarizer (after applying this patch) on the output of 
StreamingKMeans (using the Reuters dataset) throws the following exception:

{Code}
Average distance in cluster 0 [4]: 18723.469424
Average distance in cluster 1 [1169]: 13974.466645
Average distance in cluster 2 [1932]: 1273.335898
Exception in thread "main" java.lang.IllegalArgumentException
at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
at org.apache.mahout.math.stats.TDigest.quantile(TDigest.java:268)
at 
org.apache.mahout.math.stats.OnlineSummarizer.getQuartile(OnlineSummarizer.java:83)
at 
org.apache.mahout.math.stats.OnlineSummarizer.getMax(OnlineSummarizer.java:79)
at 
org.apache.mahout.clustering.streaming.tools.ClusterQualitySummarizer.printSummaries(ClusterQualitySummarizer.java:74)
at 
org.apache.mahout.clustering.streaming.tools.ClusterQualitySummarizer.printSummaries(ClusterQualitySummarizer.java:66)
at 
org.apache.mahout.clustering.streaming.tools.ClusterQualitySummarizer.run(ClusterQualitySummarizer.java:141)
at 
org.apache.mahout.clustering.streaming.tools.ClusterQualitySummarizer.main(ClusterQualitySummarizer.java:281)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)

{Code}


was (Author: smarthi):
Ted, we need to hold off on committing this patch until we fix the issue with 
ClusterQualitySummarizer, which is broken after applying this patch.  I'll look 
at it tomorrow; it's too late at night now to wrap my head around it.

Running ClusterQualitySummarizer (after applying this patch) on the output of 
StreamingKMeans throws the following exception:

{Code}
Average distance in cluster 0 [4]: 18723.469424
Average distance in cluster 1 [1169]: 13974.466645
Average distance in cluster 2 [1932]: 1273.335898
Exception in thread "main" java.lang.IllegalArgumentException
at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
at org.apache.mahout.math.stats.TDigest.quantile(TDigest.java:268)
at 
org.apache.mahout.math.stats.OnlineSummarizer.getQuartile(OnlineSummarizer.java:83)
at 
org.apache.mahout.math.stats.OnlineSummarizer.getMax(OnlineSummarizer.java:79)
at 
org.apache.mahout.clustering.streaming.tools.ClusterQualitySummarizer.printSummaries(ClusterQualitySummarizer.java:74)
at 
org.apache.mahout.clustering.streaming.tools.ClusterQualitySummarizer.printSummaries(ClusterQualitySummarizer.java:66)
at 
org.apache.mahout.clustering.streaming.tools.ClusterQualitySummarizer.run(ClusterQualitySummarizer.java:141)
at 
org.apache.mahout.clustering.streaming.tools.ClusterQualitySummarizer.main(ClusterQualitySummarizer.java:281)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)

{Code}

> Convert OnlineSummarizer to use the new TDigest
> ---
>
> Key: MAHOUT-1368
> URL: https://issues.apache.org/jira/browse/MAHOUT-1368
> Project: Mahout
>  Issue Type: Bug
>Reporter: Ted Dunning
> Fix For: 0.9
>
> Attachments: MAHOUT-1368.patch
>
>
> The new TDigest provides better accuracy for quartile estimation as well as 
> producing any other quantile you might like.  The current quartile estimation 
> of the OnlineSummarizer fails for highly skewed distributions and can't 
> re

[jira] [Comment Edited] (MAHOUT-1368) Convert OnlineSummarizer to use the new TDigest

2013-12-03 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838638#comment-13838638
 ] 

Suneel Marthi edited comment on MAHOUT-1368 at 12/4/13 6:07 AM:


Ted, we need to hold off on committing this patch until we fix the issue with 
ClusterQualitySummarizer, which is broken after applying this patch.  I'll look 
at it tomorrow; it's too late at night now to wrap my head around it.

Running ClusterQualitySummarizer (after applying this patch) on the output of 
StreamingKMeans throws the following exception:

{Code}
Average distance in cluster 0 [4]: 18723.469424
Average distance in cluster 1 [1169]: 13974.466645
Average distance in cluster 2 [1932]: 1273.335898
Exception in thread "main" java.lang.IllegalArgumentException
at 
com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
at org.apache.mahout.math.stats.TDigest.quantile(TDigest.java:268)
at 
org.apache.mahout.math.stats.OnlineSummarizer.getQuartile(OnlineSummarizer.java:83)
at 
org.apache.mahout.math.stats.OnlineSummarizer.getMax(OnlineSummarizer.java:79)
at 
org.apache.mahout.clustering.streaming.tools.ClusterQualitySummarizer.printSummaries(ClusterQualitySummarizer.java:74)
at 
org.apache.mahout.clustering.streaming.tools.ClusterQualitySummarizer.printSummaries(ClusterQualitySummarizer.java:66)
at 
org.apache.mahout.clustering.streaming.tools.ClusterQualitySummarizer.run(ClusterQualitySummarizer.java:141)
at 
org.apache.mahout.clustering.streaming.tools.ClusterQualitySummarizer.main(ClusterQualitySummarizer.java:281)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)

{Code}


was (Author: smarthi):
Ted, we need to hold off on committing this patch until we fix the issue with 
ClusterQualitySummarizer which is broken after applying this patch.  I'll look 
at it tomorrow, too late in the night to wrap my head around the issue.

> Convert OnlineSummarizer to use the new TDigest
> ---
>
> Key: MAHOUT-1368
> URL: https://issues.apache.org/jira/browse/MAHOUT-1368
> Project: Mahout
>  Issue Type: Bug
>Reporter: Ted Dunning
> Fix For: 0.9
>
> Attachments: MAHOUT-1368.patch
>
>
> The new TDigest provides better accuracy for quartile estimation as well as 
> producing any other quantile you might like.  The current quartile estimation 
> of the OnlineSummarizer fails for highly skewed distributions and can't 
> really be extended to provide other quantiles.  The TDigest handles all of 
> this.





[jira] [Commented] (MAHOUT-1368) Convert OnlineSummarizer to use the new TDigest

2013-12-03 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838638#comment-13838638
 ] 

Suneel Marthi commented on MAHOUT-1368:
---

Ted, we need to hold off on committing this patch until we fix the issue with 
ClusterQualitySummarizer which is broken after applying this patch.  I'll look 
at it tomorrow, too late in the night to wrap my head around the issue.

> Convert OnlineSummarizer to use the new TDigest
> ---
>
> Key: MAHOUT-1368
> URL: https://issues.apache.org/jira/browse/MAHOUT-1368
> Project: Mahout
>  Issue Type: Bug
>Reporter: Ted Dunning
> Fix For: 0.9
>
> Attachments: MAHOUT-1368.patch
>
>
> The new TDigest provides better accuracy for quartile estimation as well as 
> producing any other quantile you might like.  The current quartile estimation 
> of the OnlineSummarizer fails for highly skewed distributions and can't 
> really be extended to provide other quantiles.  The TDigest handles all of 
> this.





Jenkins build is back to normal : mahout-nightly #1428

2013-12-03 Thread Apache Jenkins Server
See 



Jenkins build is back to normal : mahout-nightly » Mahout Core #1428

2013-12-03 Thread Apache Jenkins Server
See 




[jira] [Updated] (MAHOUT-1242) No key redistribution function for associative maps

2013-12-03 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi updated MAHOUT-1242:
--

Affects Version/s: 0.7, 0.8
    Fix Version/s: 0.9
         Assignee: Suneel Marthi

> No key redistribution function for associative maps
> ---
>
> Key: MAHOUT-1242
> URL: https://issues.apache.org/jira/browse/MAHOUT-1242
> Project: Mahout
>  Issue Type: Improvement
>  Components: collections, Math
>Affects Versions: 0.7, 0.8
>Reporter: Dawid Weiss
>Assignee: Suneel Marthi
> Fix For: 0.9
>
> Attachments: MAHOUT-1242.patch
>
>
> All integer-based maps currently use HashFunctions.hash(int) which just 
> returns the key value:
> {code}
>   /**
>    * Returns a hashcode for the specified value.
>    *
>    * @return a hash code value for the specified value.
>    */
>   public static int hash(int value) {
>     return value;
>     //return value * 0x278DDE6D; // see org.apache.mahout.math.jet.random.engine.DRand
>     /*
>     value &= 0x7FFF; // make it >=0
>     int hashCode = 0;
>     do hashCode = 31*hashCode + value%10;
>     while ((value /= 10) > 0);
>     return 28629151*hashCode; // spread even further; h*31^5
>     */
>   }
> {code}
> This easily leads to very degenerate behavior on keys that have constant 
> lower bits (long collision chains). A simple (and strong) hash function like 
> the final step of murmurhash3 goes a long way toward ensuring that the key 
> distribution is more uniform regardless of the input distribution.
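
To make the description above concrete, here is a minimal sketch comparing the identity hash with the final mixing step of MurmurHash3 (commonly called fmix32) on keys whose lower bits are constant. It is not taken from the attached patch; the class HashMixSketch and the helper bucketsUsed are hypothetical, and the table is assumed to pick a bucket by masking the low bits of the hash.

```java
import java.util.BitSet;
import java.util.function.IntUnaryOperator;

public class HashMixSketch {

  /** Identity hash, matching the behavior of the quoted hash(int). */
  public static int identityHash(int value) {
    return value;
  }

  /** Final mixing step (fmix32) of MurmurHash3. */
  public static int mix(int h) {
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
  }

  /**
   * Counts how many distinct buckets are hit by the keys 0, stride, 2*stride, ...
   * in a table whose size is a power of two (bucket = hash & (tableSize - 1)).
   */
  public static int bucketsUsed(IntUnaryOperator hash, int keyCount, int stride, int tableSize) {
    BitSet used = new BitSet(tableSize);
    for (int i = 0; i < keyCount; i++) {
      used.set(hash.applyAsInt(i * stride) & (tableSize - 1));
    }
    return used.cardinality();
  }

  public static void main(String[] args) {
    // Keys are multiples of 1024, so their lowest 10 bits are always zero.
    int identity = bucketsUsed(HashMixSketch::identityHash, 1000, 1024, 1024);
    int mixed = bucketsUsed(HashMixSketch::mix, 1000, 1024, 1024);
    System.out.println("Buckets used with identity hash: " + identity);
    System.out.println("Buckets used with fmix32:        " + mixed);
  }
}
```

With the identity hash every such key lands in the same bucket, which is the long collision chain the report describes; the mixed hash spreads the same keys over many buckets.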





Re: Mahout 0.9 release

2013-12-03 Thread Suneel Marthi
JIRAs Update for 0.9 release:-

Wiki - Isabel, Sebastian and other volunteers
-
M-1245, M-1304, M-1305, M-1307, M-1326

Suneel
---
M-1319
M-1242 (Patch available to be committed to trunk)

Pat
---
M-1288 Solr Recommender

Yexi, Suneel
---
M-1265 - Multi Layer Perceptron

Stevo, Isabel
-
M-1366 

Andrew
--
M-1030, M-1349

Ted
--
M-1368 (Patch available to be committed to trunk)










On Sunday, December 1, 2013 7:57 AM, Suneel Marthi  
wrote:
 
Open JIRAs for 0.9 release :-

Wiki - Isabel, Sebastian and other volunteers
-

M-1245, M-1304, M-1305, M-1307, M-1326

Suneel
---
M-1319, M-1328

Pat
---
M-1288 Solr Recommender

Sebastian, Peng

M-1286

Yexi, Suneel
---
M-1265 - Multi Layer Perceptron
Ted, do u have cycles to review this, the patch's up on Reviewboard.

Stevo, Isabel
-
M-1366 - Please delete old releases from mirroring system
M-1345 - Enable Randomized testing for all modules

Andrew
--
M-1030

Open Issues (any takers for these ???)

M-1242 
M-1349 






On Friday, November 29, 2013 12:07 PM, Sebastian Schelter 
 wrote:
 
On 29.11.2013 17:59, Suneel Marthi wrote:
> Open JIRAs for 0.9:
> 
> Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - related to 
> Wiki updates. 
> Definitely appreciate more hands here to review/update the wiki
> 
> M-1286 - Peng and Sebastian, no updates on this. Can this be included in 0.9?

I will look into this over the weekend!


> 
> M-1030 - Andrew Musselman
> 
> M-1319, M-1328 -  Suneel
> 
> M-1347 - Suneel, patch has been committed to trunk.
> 
> M-1265 - I have been working with Yexi on this. Ted, would u have time to 
> review this; the code's on Reviewboard.
> 
> M-1288 - Solr Recommender, Pat Ferrel
> 
> M-1345: Isabel, Frank. I think we are good on this patch. Isabel, could u 
> commit this to trunk?
> 
> M-1312: Stevo, could u look at this?
> 
> M-1349: Any takers for this??
> 
> Others: Spectral Kmeans clustering documentation (Shannon)
> 
> 
> 
> 
> On Thursday, November 28, 2013 10:38 AM, Suneel Marthi 
>  wrote:
>  
> Adding Mahout-1349 to the list of JIRAs . 
> 
> 
> 
> 
> 
> On Thursday, November 28, 2013 10:37 AM, Suneel Marthi 
>  wrote:
>  
> Update on Open JIRAs for 0.9:
> 
> Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - all related 
> to Wiki updates, please see Isabel's updates.
> 
> M-1286 - Peng and Sebastian, we had talked about this during the last hangout. 
> Can this be included in 0.9?
> 
> M-1030 - Andrew Musselman, it's critical that we get this into 0.9; it's been 
> deferred for the last 2 Mahout releases.
> 
> M-1319, M-1328, M-1347, M-1350 - Suneel
> 
> 
> M-1265 - Multi Layer Perceptron, Yexi please look at my comments on 
> Reviewboard.
> 
> M-1273 - Kun Yung, Ted, defer this to next release ???
> 
> 
> 
> M-1312, M-1256 - Stevo, could u take one of them
> 
> 
> On Thursday, November 28, 2013 5:01 AM, Isabel Drost-Fromm 
>  wrote:
> 
> On Wed, 27 Nov 2013 14:23:11 -0800 (PST)
> Suneel Marthi  wrote:
>> Below are the Open issues for 0.9:-
> 
> This looks like we should be targeting Dec. 9th as code freeze to me.
> What do you all think?
> 
> 
>> Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - All
>> related to Wiki updates, missing Wiki documentation and Wiki
>> migration to new CMS.  Isabel's working on M-1245 (migrating to new
>> CMS). Could some of the others be consolidated with that?
> 
> I believe MAHOUT-1245 essentially is ready to be published - all I want
> before notifying INFRA to switch to the new cms based site is one other
> person to take at least a brief look.
> 
> For MAHOUT-1304 - Sebastian, can you please check that the cms based
> site actually does fit on 1280px? We can close this issue then.
> 
> MAHOUT-1305 - I think this should be turned into a task to actually
> delete most of the pages that have been migrated to the new CMS (almost
> all of them). Once 1245 is shipped, it would be great if a few more
> people could lend a hand in getting this done.
> 
> MAHOUT-1307 - Can be closed once switched to CMS
> 
> MAHOUT-1326 - This really relates to the old, no longer active Confluence
> export plugin we once used to generate static pages out of our wiki.
> Unless anyone on the Mahout dev list knows how to fully delete all
> exported static pages, we should file an issue with
> INFRA to ask for help getting those deleted. They definitely are
> confusing to users.
> 
> 
> 
>> M-1286 - Peng and ssc, we had talked about this during the last
>> hangout. Can this be included in 0.9?
>>
>> M-1030 - Andrew Musselman? Any updates on this? It's important that we
>> fix this for 0.9
>>
>> M-1319, M-1328, M-1347, M-1364 - Suneel
>>
>> M-1273 - Kun Yung, remember talking about th

[jira] [Commented] (MAHOUT-1354) Mahout Support for Hadoop 2

2013-12-03 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838418#comment-13838418
 ] 

Suneel Marthi commented on MAHOUT-1354:
---

Tested the patch on Hadoop 1.2.1 and it worked fine for me.

> Mahout Support for Hadoop 2 
> 
>
> Key: MAHOUT-1354
> URL: https://issues.apache.org/jira/browse/MAHOUT-1354
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.8
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 1.0
>
> Attachments: MAHOUT-1354_initial.patch
>
>
> Mahout support for Hadoop , now that Hadoop 2 is official.





[jira] [Resolved] (MAHOUT-1370) Vectordump doesn't write to output file in MapReduce Mode

2013-12-03 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1370.
---

Resolution: Fixed

Patch committed to trunk.

> Vectordump doesn't write to output file in MapReduce Mode
> -
>
> Key: MAHOUT-1370
> URL: https://issues.apache.org/jira/browse/MAHOUT-1370
> Project: Mahout
>  Issue Type: Bug
>  Components: Integration
>Affects Versions: 0.7, 0.8
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 0.9
>
>
> When trying to run Vectordump in MR mode, get a 
> FileNotFoundException: No such File or Directory.
> {Code}
> 13/12/03 19:29:22 INFO vectors.VectorDumper: Output file: 
> /tmp/mahout-work-user/reuters-lda/vectordump
> Exception in thread "main" java.io.FileNotFoundException: 
> /tmp/mahout-work-user/reuters-lda/vectordump (No such file or directory)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
>   at com.google.common.io.Files.newWriter(Files.java:101)
>   at 
> org.apache.mahout.utils.vectors.VectorDumper.run(VectorDumper.java:153)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at 
> org.apache.mahout.utils.vectors.VectorDumper.main(VectorDumper.java:262)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>   at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>   at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> {Code}





[jira] [Work started] (MAHOUT-1370) Vectordump doesn't write to output file in MapReduce Mode

2013-12-03 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-1370 started by Suneel Marthi.

> Vectordump doesn't write to output file in MapReduce Mode
> -
>
> Key: MAHOUT-1370
> URL: https://issues.apache.org/jira/browse/MAHOUT-1370
> Project: Mahout
>  Issue Type: Bug
>  Components: Integration
>Affects Versions: 0.7, 0.8
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 0.9
>
>
> When trying to run Vectordump in MR mode, get a 
> FileNotFoundException: No such File or Directory.
> {Code}
> 13/12/03 19:29:22 INFO vectors.VectorDumper: Output file: 
> /tmp/mahout-work-user/reuters-lda/vectordump
> Exception in thread "main" java.io.FileNotFoundException: 
> /tmp/mahout-work-user/reuters-lda/vectordump (No such file or directory)
>   at java.io.FileOutputStream.open(Native Method)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
>   at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
>   at com.google.common.io.Files.newWriter(Files.java:101)
>   at 
> org.apache.mahout.utils.vectors.VectorDumper.run(VectorDumper.java:153)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at 
> org.apache.mahout.utils.vectors.VectorDumper.main(VectorDumper.java:262)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>   at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>   at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> {Code}





[jira] [Created] (MAHOUT-1370) Vectordump doesn't write to output file in MapReduce Mode

2013-12-03 Thread Suneel Marthi (JIRA)
Suneel Marthi created MAHOUT-1370:
-

 Summary: Vectordump doesn't write to output file in MapReduce Mode
 Key: MAHOUT-1370
 URL: https://issues.apache.org/jira/browse/MAHOUT-1370
 Project: Mahout
  Issue Type: Bug
  Components: Integration
Affects Versions: 0.8, 0.7
Reporter: Suneel Marthi
Assignee: Suneel Marthi
Priority: Minor
 Fix For: 0.9


When trying to run Vectordump in MR mode, get a 
FileNotFoundException: No such File or Directory.

{Code}

13/12/03 19:29:22 INFO vectors.VectorDumper: Output file: 
/tmp/mahout-work-user/reuters-lda/vectordump
Exception in thread "main" java.io.FileNotFoundException: 
/tmp/mahout-work-user/reuters-lda/vectordump (No such file or directory)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
at com.google.common.io.Files.newWriter(Files.java:101)
at 
org.apache.mahout.utils.vectors.VectorDumper.run(VectorDumper.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at 
org.apache.mahout.utils.vectors.VectorDumper.main(VectorDumper.java:262)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)


{Code}






[jira] [Resolved] (MAHOUT-1256) Improve the CSV handling code to get vectors

2013-12-03 Thread Dan Filimon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Filimon resolved MAHOUT-1256.
-

   Resolution: Won't Fix
Fix Version/s: 0.9

Too vague, have a patch lying around to improve reading CSV files as vectors, 
but it's untested and quite hacky.
Since nobody wanted this and we have an upcoming release, dropping this 
completely.

> Improve the CSV handling code to get vectors
> 
>
> Key: MAHOUT-1256
> URL: https://issues.apache.org/jira/browse/MAHOUT-1256
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.8
>Reporter: Dan Filimon
>Assignee: Dan Filimon
>Priority: Minor
> Fix For: 0.9
>
>
> Minor additions to iterate through a CSV file directly (as long as it's only 
> numbers).





[jira] [Work stopped] (MAHOUT-1256) Improve the CSV handling code to get vectors

2013-12-03 Thread Dan Filimon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-1256 stopped by Dan Filimon.

> Improve the CSV handling code to get vectors
> 
>
> Key: MAHOUT-1256
> URL: https://issues.apache.org/jira/browse/MAHOUT-1256
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.8
>Reporter: Dan Filimon
>Assignee: Dan Filimon
>Priority: Minor
>
> Minor additions to iterate through a CSV file directly (as long as it's only 
> numbers).





[jira] [Work started] (MAHOUT-1256) Improve the CSV handling code to get vectors

2013-12-03 Thread Dan Filimon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAHOUT-1256 started by Dan Filimon.

> Improve the CSV handling code to get vectors
> 
>
> Key: MAHOUT-1256
> URL: https://issues.apache.org/jira/browse/MAHOUT-1256
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.8
>Reporter: Dan Filimon
>Assignee: Dan Filimon
>Priority: Minor
>
> Minor additions to iterate through a CSV file directly (as long as it's only 
> numbers).





[jira] [Updated] (MAHOUT-1030) Regression: Clustered Points Should be WeightedPropertyVectorWritable not WeightedVectorWritable

2013-12-03 Thread Andrew Musselman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Musselman updated MAHOUT-1030:
-

Attachment: MAHOUT-1030.patch

Changing property name from "distance" to "distance-squared".

> Regression: Clustered Points Should be WeightedPropertyVectorWritable not 
> WeightedVectorWritable
> 
>
> Key: MAHOUT-1030
> URL: https://issues.apache.org/jira/browse/MAHOUT-1030
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering, Integration
>Affects Versions: 0.7
>Reporter: Jeff Eastman
>Assignee: Andrew Musselman
> Fix For: 1.0, 0.9
>
> Attachments: MAHOUT-1030.patch, MAHOUT-1030.patch, MAHOUT-1030.patch, 
> MAHOUT-1030.patch, MAHOUT-1030.patch
>
>
> Looks like this won't make it into this build. Pretty widespread impact on 
> code and tests and I don't know which properties were implemented in the old 
> version. I will create a JIRA and post my interim results.
> On 6/8/12 12:21 PM, Jeff Eastman wrote:
> > That's a reversion that evidently got in when the new 
> > ClusterClassificationDriver was introduced. It should be a pretty easy fix 
> > and I will see if I can make the change before Paritosh cuts the release 
> > bits tonight.
> >
> > On 6/7/12 1:00 PM, Pat Ferrel wrote:
> >> It appears that in kmeans the clusteredPoints are now written as 
> >> WeightedVectorWritable where in mahout 0.6 they were 
> >> WeightedPropertyVectorWritable? This means that the distance from the 
> >> centroid is no longer stored here? Why? I hope I'm wrong because that is 
> >> not a welcome change. How is one to order clustered docs by distance from 
> >> cluster centroid?
> >>
> >> I'm sure I could calculate the distance but that would mean looking up the 
> >> centroid for the cluster id given in the above WeightedVectorWritable, 
> >> which means iterating through all the clusters for each clustered doc. In 
> >> my case the number of clusters could be fairly large.
> >>
> >> Am I missing something?
> >>
> >>
> >



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: Mahout 0.9 release

2013-12-03 Thread Andrew Musselman
Awesome, thanks


On Tue, Dec 3, 2013 at 1:49 PM, Suneel Marthi wrote:

>
>
> Phew. I see the problem. Change https to http in the base directory and
> that should do it.
>
>
> Change the base directory to:
> http://svn.apache.org/repos/asf/mahout/trunk
>
> I don't know why, could be because your patch was generated from
> http://svn.apache.org/repos/asf/mahout/trunk.

Re: Mahout 0.9 release

2013-12-03 Thread Suneel Marthi


Phew. I see the problem. Change https to http in the base directory and that 
should do it.


Change the base directory to: 
http://svn.apache.org/repos/asf/mahout/trunk

I don't know why, could be because your patch was generated from 
http://svn.apache.org/repos/asf/mahout/trunk.
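
A minimal sketch of the comparison implied above, assuming the mismatch is between the URL recorded in the working copy (the "URL:" line of `svn info`) and the Base Directory entered in Reviewboard. The helper names are hypothetical, and whether the scheme difference is really what Reviewboard trips over is only the guess made in this message, not a confirmed cause.

```python
from urllib.parse import urlsplit


def working_copy_url(svn_info_output):
    """Return the value of the 'URL:' line from `svn info` text, or None."""
    for line in svn_info_output.splitlines():
        if line.startswith("URL: "):
            return line[len("URL: "):].strip()
    return None


def base_dir_differences(patch_url, base_dir):
    """List which URL parts (scheme, host, path) differ between the two."""
    patch, base = urlsplit(patch_url), urlsplit(base_dir)
    differences = []
    if patch.scheme != base.scheme:
        differences.append("scheme")
    if patch.netloc != base.netloc:
        differences.append("host")
    if patch.path.rstrip("/") != base.path.rstrip("/"):
        differences.append("path")
    return differences
```

Feeding it the `svn info` output quoted in this thread and the https Base Directory would report only a scheme difference, which matches the suggested change from https to http.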



On Tuesday, December 3, 2013 4:39 PM, Andrew Musselman 
 wrote:
 
Yeah



Re: Mahout 0.9 release

2013-12-03 Thread Andrew Musselman
Yeah

> On Dec 3, 2013, at 1:36 PM, Suneel Marthi  wrote:
> 
> Andrew, 
> 
> Let me try this. Is it the patch for M-1030 that u r trying to get onto 
> reviewboard?

Re: Mahout 0.9 release

2013-12-03 Thread Suneel Marthi
Andrew, 

Let me try this. Is it the patch for M-1030 that u r trying to get onto 
reviewboard?





On Tuesday, December 3, 2013 3:14 PM, Andrew Musselman 
 wrote:
 
Must be missing something in the reviewboard workflow; keep getting this
error when I use mahout/trunk or the full URL as Base Directory:  The file '
https://svn.apache.org/repos/asf/mahout/trunk/integration/src/main/java/org/apache/mahout/utils/clustering/JsonClusterWriter.java'
(r1546394) could not be found in the repository

But the rev I have locally is newer than that rev.

$ svn info
Path: .
URL: http://svn.apache.org/repos/asf/mahout/trunk
Repository Root: http://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 1546876
Node Kind: directory
Schedule: normal
Last Changed Author: smarthi
Last Changed Rev: 1546869
Last Changed Date: 2013-12-01 15:35:30 -0800 (Sun, 01 Dec 2013)




Re: Mahout 0.9 release

2013-12-03 Thread Andrew Musselman
Must be missing something in the reviewboard workflow; keep getting this
error when I use mahout/trunk or the full URL as Base Directory:  The file '
https://svn.apache.org/repos/asf/mahout/trunk/integration/src/main/java/org/apache/mahout/utils/clustering/JsonClusterWriter.java'
(r1546394) could not be found in the repository

But the rev I have locally is newer than that rev.

$ svn info
Path: .
URL: http://svn.apache.org/repos/asf/mahout/trunk
Repository Root: http://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 1546876
Node Kind: directory
Schedule: normal
Last Changed Author: smarthi
Last Changed Rev: 1546869
Last Changed Date: 2013-12-01 15:35:30 -0800 (Sun, 01 Dec 2013)



On Mon, Dec 2, 2013 at 6:53 AM, Yexi Jiang  wrote:

> I used the base as: https://svn.apache.org/repos/asf/mahout/trunk
>
>
> 2013/12/2 Suneel Marthi 
>
> > It's been a while since I last did it; I think the Base Directory needs to
> > be mahout/trunk.
> >
> >
> >
> >
> >
> > On Monday, December 2, 2013 1:17 AM, Andrew Musselman <
> > andrew.mussel...@gmail.com> wrote:
> >
> > Any tips on submitting to reviewboard for mahout?  I tried selecting repo
> > mahout and didn't know which base directory to use, and then used
> > mahout-git and wasn't able to use the patch I made via subversion.
> >
> >
> >
> > On Sun, Dec 1, 2013 at 2:56 PM, Suneel Marthi wrote:
> >
> > > Here's the link to Reviewboard
> > >
> > > https://reviews.apache.org
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Sunday, December 1, 2013 1:51 PM, Andrew Musselman <
> > > andrew.mussel...@gmail.com> wrote:
> > >
> > > No, just reviewboard in general; never put any patches up before.
> > >
> > >
> > > > On Dec 1, 2013, at 8:52 AM, Suneel Marthi wrote:
> > > >
> > > > For M-1349??  There's no patch for this, no one's worked on it yet.
> > > >
> > > >
> > > >
> > > > On Sunday, December 1, 2013 11:50 AM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
> > > > I will look at M-1349 since I'm in there.
> > > >
> > > > Where's the Reviewboard?
> > > >
> > > >
> > > > On Sun, Dec 1, 2013 at 4:57 AM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
> > > > Open JIRAs for 0.9 release:
> > > >
> > > > Wiki - Isabel, Sebastian and other volunteers
> > > > ---
> > > > M-1245, M-1304, M-1305, M-1307, M-1326
> > > >
> > > > Suneel
> > > > ---
> > > > M-1319, M-1328
> > > >
> > > > Pat
> > > > ---
> > > > M-1288 Solr Recommender
> > > >
> > > > Sebastian, Peng
> > > > ---
> > > > M-1286
> > > >
> > > > Yexi, Suneel
> > > > ---
> > > > M-1265 - Multi Layer Perceptron
> > > > Ted, do u have cycles to review this, the patch's up on Reviewboard.
> > > >
> > > > Stevo, Isabel
> > > > ---
> > > > M-1366 - Please delete old releases from mirroring system
> > > > M-1345 - Enable Randomized testing for all modules
> > > >
> > > > Andrew
> > > > ---
> > > > M-1030
> > > >
> > > > Open Issues (any takers for these ???)
> > > > ---
> > > > M-1242
> > > > M-1349
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Friday, November 29, 2013 12:07 PM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
> > > >
> > > > On 29.11.2013 17:59, Suneel Marthi wrote:
> > > > > Open JIRAs for 0.9:
> > > > >
> > > > > Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 - related to Wiki updates.
> > > > > Definitely appreciate more hands here to review/update the wiki
> > > > >
> > > > > M-1286 - Peng and Sebastian, no updates on this. Can this be included in 0.9?
> > > >
> > > > I will look into this over the weekend!
> > > >
> > > >
> > > > >
> > > > > M-1030 - Andrew Musselman
> > > > >
> > > > > M-1319, M-1328 - Suneel
> > > > >
> > > > > M-1347 - Suneel, patch has been committed to trunk.
> > > > >
> > > > > M-1265 - I have been working with Yexi on this. Ted, would u have time to review this; the code's on Reviewboard.
> > > > >
> > > > > M-1288 - Solr Recommender, Pat Ferrel
> > > > >
> > > > > M-1345: Isabel, Frank. I think we are good on this patch. Isabel, could u commit this to trunk?
> > > > >
> > > > > M-1312: Stevo, could u look at this?
> > > > >
> > > > > M-1349: Any takers for this??
> > > > >
> > > > > Others: Spectral Kmeans clustering documentation (Shannon)
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Thursday, November 28, 2013 10:38 AM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
> > > > >
> > > > > Adding Mahout-1349 to the list of JIRAs.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Thursday, November 28, 2013 10:37 AM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
> > > > >
> > > > > Update on Open JIRAs for 0.9:
> > > > >
> > > > > Mahout-1245, Mahout-1304, Mahout-1305, Mahout-1307, Mahout-1326 -
> all
> > > related to Wiki updates, please see 

Re: Welcome to Frank Scholten as new Mahout committer

2013-12-03 Thread Frank Scholten
Thanks for the kind words Isabel!

A bit about myself: I live in Utrecht, the Netherlands, have a CS master's
from the University of Twente and I work at Trifork Amsterdam, doing OSS Java
software development. I like automation so I am interested in things like
ML but also continuous integration and configuration management.

It's quite an honor to be part of this project with amazingly smart people.
I am looking forward to continuing work on Mahout, especially in the area
of integration, though I also want to dive more into certain algorithms and
explore parts of the code I haven't looked at yet.

Thank you!

Cheers,

Frank






On Tue, Dec 3, 2013 at 5:32 PM, Andrew Musselman  wrote:

> Welcome Frank; congratulations!
>
>
> On Tue, Dec 3, 2013 at 8:31 AM, Stevo Slavić  wrote:
>
> > Congrats and welcome Frank!
> >
> >
> > On Tue, Dec 3, 2013 at 2:34 PM, Gokhan Capan  wrote:
> >
> > > Congratulations, Frank!
> > >
> > > Gokhan
> > >
> > >
> > > On Tue, Dec 3, 2013 at 3:27 PM, Isabel Drost-Fromm wrote:
> > >
> > > >
> > > > Hi,
> > > >
> > > > this is to announce that the Project Management Committee (PMC) for Apache
> > > > Mahout has asked Frank Scholten to become committer and we are pleased to
> > > > announce that he has accepted.
> > > >
> > > > Being a committer enables easier contribution to the project since in addition
> > > > to posting patches on JIRA it also gives write access to the code repository.
> > > > That also means that now we have yet another person who can commit patches
> > > > submitted by others to our repo *wink*
> > > >
> > > > Frank, you've been following the project for quite some time now - contributing
> > > > valuable changes over and over again. I certainly look forward to working
> > > > with you in the future. Welcome!
> > > >
> > > >
> > > > Isabel
> > > >
> > > >
> > > >
> > >
> >
>


Re: Welcome to Frank Scholten as new Mahout committer

2013-12-03 Thread Dmitriy Lyubimov
Welcome, Frank!


On Tue, Dec 3, 2013 at 8:32 AM, Andrew Musselman  wrote:

> Welcome Frank; congratulations!
>


[jira] [Updated] (MAHOUT-1354) Mahout Support for Hadoop 2

2013-12-03 Thread Gokhan Capan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gokhan Capan updated MAHOUT-1354:
-

Attachment: MAHOUT-1354_initial.patch

Could you guys test this initial patch against different versions of clusters 
to see if that works?

Usage:
mahout against hadoop1 (version 1.2.1): 
mvn package

mahout against hadoop2-stable (version 2.2.0, by default): 
mvn package -Phadoop2 

mahout against hadoop2-earlier: 
mvn package -Phadoop2 -Dhadoop.version=
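
The three invocations above can be summarized in a small helper. This is only a sketch in Python: the function name and its parameters are hypothetical, and the version for the hadoop2-earlier case is left to the caller because this message does not give one.

```python
def mvn_package_args(hadoop_line="hadoop1", hadoop_version=None):
    """Build the Maven command line for the chosen Hadoop line.

    hadoop_line:    "hadoop1" (default build) or "hadoop2" (-Phadoop2 profile).
    hadoop_version: optional override passed as -Dhadoop.version=...,
                    only meaningful together with the hadoop2 profile.
    """
    args = ["mvn", "package"]
    if hadoop_line == "hadoop2":
        args.append("-Phadoop2")
        if hadoop_version:
            args.append("-Dhadoop.version=" + hadoop_version)
    elif hadoop_line != "hadoop1":
        raise ValueError("unknown Hadoop line: " + hadoop_line)
    return args
```

The returned list can be handed to `subprocess.run` from the Mahout checkout to run the build for the chosen Hadoop line.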


> Mahout Support for Hadoop 2 
> 
>
> Key: MAHOUT-1354
> URL: https://issues.apache.org/jira/browse/MAHOUT-1354
> Project: Mahout
> Issue Type: Improvement
> Affects Versions: 0.8
> Reporter: Suneel Marthi
> Assignee: Suneel Marthi
> Fix For: 1.0
>
> Attachments: MAHOUT-1354_initial.patch
>
>
> Mahout support for Hadoop, now that Hadoop 2 is official.





Build failed in Jenkins: Mahout-Examples-Cluster-Reuters-II #683

2013-12-03 Thread Apache Jenkins Server
See 


Changes:

[sslavic] Removed no longer applicable exclusion filter for high-scale-lib from 
examples job assembly descriptor

[sslavic] Removed not necessary use of 
maven-dependency-plugin:copy-dependencies in integration and examples modules

[smarthi] NOJIRA: Removed unused import

[frankscholten] Testing...first commit. Removed TODO without description

--
[...truncated 2282 lines...]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ mahout-core ---
[INFO] Building jar: 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-jar-plugin:2.4:test-jar (default) @ mahout-core ---
[INFO] Building jar: 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/core/target/mahout-core-0.9-SNAPSHOT-tests.jar
[INFO] 
[INFO] --- maven-assembly-plugin:2.4:single (job) @ mahout-core ---
[INFO] Reading assembly descriptor: src/main/assembly/job.xml
[INFO] Building jar: 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar
[INFO] 
[INFO] --- maven-source-plugin:2.2.1:jar-no-fork (attach-sources) @ mahout-core 
---
[INFO] Building jar: 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/core/target/mahout-core-0.9-SNAPSHOT-sources.jar
[INFO] 
[INFO] --- maven-install-plugin:2.5.1:install (default-install) @ mahout-core 
---
[INFO] Installing 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar
 to 
/export/home/hudson/.m2/repository/org/apache/mahout/mahout-core/0.9-SNAPSHOT/mahout-core-0.9-SNAPSHOT.jar
[INFO] Installing 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/core/pom.xml
 to 
/export/home/hudson/.m2/repository/org/apache/mahout/mahout-core/0.9-SNAPSHOT/mahout-core-0.9-SNAPSHOT.pom
[INFO] Installing 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/core/target/mahout-core-0.9-SNAPSHOT-tests.jar
 to 
/export/home/hudson/.m2/repository/org/apache/mahout/mahout-core/0.9-SNAPSHOT/mahout-core-0.9-SNAPSHOT-tests.jar
[INFO] Installing 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar
 to 
/export/home/hudson/.m2/repository/org/apache/mahout/mahout-core/0.9-SNAPSHOT/mahout-core-0.9-SNAPSHOT-job.jar
[INFO] Installing 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/core/target/mahout-core-0.9-SNAPSHOT-sources.jar
 to 
/export/home/hudson/.m2/repository/org/apache/mahout/mahout-core/0.9-SNAPSHOT/mahout-core-0.9-SNAPSHOT-sources.jar
[INFO] 
[INFO] 
[INFO] Building Mahout Integration 0.9-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @ mahout-integration 
---
[INFO] 
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
mahout-integration ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 0 resource
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ 
mahout-integration ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 129 source files to 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/integration/target/classes
[WARNING] Note: 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/integration/src/main/java/org/apache/mahout/cf/taste/impl/model/mongodb/MongoDBDataModel.java
 uses or overrides a deprecated API.
[WARNING] Note: Recompile with -Xlint:deprecation for details.
[WARNING] Note: 
/zonestorage/hudson_solaris/home/hudson/hudson-slave/workspace/Mahout-Examples-Cluster-Reuters-II/trunk/integration/src/main/java/org/apache/mahout/cf/taste/impl/model/mongodb/MongoDBDataModel.java
 uses unchecked or unsafe operations.
[WARNING] Note: Recompile with -Xlint:unchecked for details.
[INFO] 
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ 
mahout-integration ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 10 resources
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
mahout-integration ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 37 source files to 
/zonestorage/hudson_solaris/home/hudson/hud

[jira] [Commented] (MAHOUT-1354) Mahout Support for Hadoop 2

2013-12-03 Thread Gokhan Capan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837933#comment-13837933
 ] 

Gokhan Capan commented on MAHOUT-1354:
--

Today I ran into some trouble with the integration module's transitive 
dependencies; let me dig further.

So this should still stay in the 1.0 queue.

> Mahout Support for Hadoop 2 
> 
>
> Key: MAHOUT-1354
> URL: https://issues.apache.org/jira/browse/MAHOUT-1354
> Project: Mahout
>  Issue Type: Improvement
>Affects Versions: 0.8
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
> Fix For: 1.0
>
>
> Mahout support for Hadoop, now that Hadoop 2 is official.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: Welcome to Frank Scholten as new Mahout committer

2013-12-03 Thread Andrew Musselman
Welcome Frank; congratulations!


On Tue, Dec 3, 2013 at 8:31 AM, Stevo Slavić wrote:

> Congrats and welcome Frank!
>
>
> On Tue, Dec 3, 2013 at 2:34 PM, Gokhan Capan wrote:
>
> > Congratulations, Frank!
> >
> > Gokhan
> >
> >
> > On Tue, Dec 3, 2013 at 3:27 PM, Isabel Drost-Fromm wrote:
> >
> > >
> > > Hi,
> > >
> > > this is to announce that the Project Management Committee (PMC) for
> > > Apache Mahout has asked Frank Scholten to become committer and we are
> > > pleased to announce that he has accepted.
> > >
> > > Being a committer enables easier contribution to the project since in
> > > addition to posting patches on JIRA it also gives write access to the
> > > code repository.
> > > That also means that now we have yet another person who can commit
> > > patches submitted by others to our repo *wink*
> > >
> > > Frank, you've been following the project for quite some time now -
> > > contributing valuable changes over and over again. I certainly look
> > > forward to working with you in the future. Welcome!
> > >
> > >
> > > Isabel
> > >
> > >
> > >
> >
>


Re: Welcome to Frank Scholten as new Mahout committer

2013-12-03 Thread Stevo Slavić
Congrats and welcome Frank!


On Tue, Dec 3, 2013 at 2:34 PM, Gokhan Capan wrote:

> Congratulations, Frank!
>
> Gokhan
>
>
> On Tue, Dec 3, 2013 at 3:27 PM, Isabel Drost-Fromm wrote:
>
> >
> > Hi,
> >
> > this is to announce that the Project Management Committee (PMC) for
> > Apache Mahout has asked Frank Scholten to become committer and we are
> > pleased to announce that he has accepted.
> >
> > Being a committer enables easier contribution to the project since in
> > addition to posting patches on JIRA it also gives write access to the
> > code repository.
> > That also means that now we have yet another person who can commit
> > patches submitted by others to our repo *wink*
> >
> > Frank, you've been following the project for quite some time now -
> > contributing valuable changes over and over again. I certainly look
> > forward to working with you in the future. Welcome!
> >
> >
> > Isabel
> >
> >
> >
>


[jira] [Commented] (MAHOUT-1242) No key redistribution function for associative maps

2013-12-03 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837702#comment-13837702
 ] 

Dawid Weiss commented on MAHOUT-1242:
-

Yes, looks good to me.

> No key redistribution function for associative maps
> ---
>
> Key: MAHOUT-1242
> URL: https://issues.apache.org/jira/browse/MAHOUT-1242
> Project: Mahout
>  Issue Type: Improvement
>  Components: collections, Math
>Reporter: Dawid Weiss
> Attachments: MAHOUT-1242.patch
>
>
> All integer-based maps currently use HashFunctions.hash(int) which just 
> returns the key value:
> {code}
>   /**
>    * Returns a hashcode for the specified value.
>    *
>    * @return a hash code value for the specified value.
>    */
>   public static int hash(int value) {
>     return value;
>     //return value * 0x278DDE6D; // see org.apache.mahout.math.jet.random.engine.DRand
>     /*
>     value &= 0x7FFF; // make it >=0
>     int hashCode = 0;
>     do hashCode = 31*hashCode + value%10;
>     while ((value /= 10) > 0);
>     return 28629151*hashCode; // spread even further; h*31^5
>     */
>   }
> {code}
> This easily leads to very degenerate behavior on keys that have constant 
> lower bits (long collision chains). A simple (and strong) hash function like 
> the final step of murmurhash3 goes a long way toward ensuring the key 
> distribution is more uniform regardless of the input distribution.
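
The degenerate case described above can be illustrated with a small, self-contained sketch. It is not Mahout code: the class name KeySpreadDemo, the helper slotsUsed, and the power-of-two table indexed by masking are all assumptions made for illustration, and Mahout's own maps may size and index their tables differently. The sketch counts how many distinct slots are used by keys whose lower ten bits are constant, first with the identity hash and then with the final mixing step of murmurhash3.

```java
import java.util.function.IntUnaryOperator;

public class KeySpreadDemo {
    // Identity hash, mirroring the behavior quoted in the issue description.
    public static int identity(int value) {
        return value;
    }

    // Final mixing step (fmix32) of murmurhash3.
    public static int mix(int k) {
        k ^= k >>> 16;
        k *= 0x85ebca6b;
        k ^= k >>> 13;
        k *= 0xc2b2ae35;
        k ^= k >>> 16;
        return k;
    }

    // Hypothetical helper: counts the distinct slots hit when `count` keys,
    // spaced `stride` apart, are placed in a table of `tableSize` slots
    // (tableSize is assumed to be a power of two, so masking selects the slot).
    public static int slotsUsed(IntUnaryOperator hash, int count, int stride, int tableSize) {
        boolean[] used = new boolean[tableSize];
        int distinct = 0;
        for (int i = 0; i < count; i++) {
            int slot = hash.applyAsInt(i * stride) & (tableSize - 1);
            if (!used[slot]) {
                used[slot] = true;
                distinct++;
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        int tableSize = 1024;
        // Keys 0, 1024, 2048, ... all share the same lower ten bits.
        System.out.println("identity hash slots: "
            + slotsUsed(KeySpreadDemo::identity, 1000, 1024, tableSize));
        System.out.println("mixed hash slots:    "
            + slotsUsed(KeySpreadDemo::mix, 1000, 1024, tableSize));
    }
}
```

With the identity hash every key in this sketch lands in the same slot, which corresponds to one long collision chain; the mixed hash spreads the same keys over many slots.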





[jira] [Commented] (MAHOUT-1242) No key redistribution function for associative maps

2013-12-03 Thread Suneel Marthi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837697#comment-13837697
 ] 

Suneel Marthi commented on MAHOUT-1242:
---

Thanks for taking this, patch looks good. Is this good to be committed?



Re: Welcome to Frank Scholten as new Mahout committer

2013-12-03 Thread Gokhan Capan
Congratulations, Frank!

Gokhan


On Tue, Dec 3, 2013 at 3:27 PM, Isabel Drost-Fromm wrote:

>
> Hi,
>
> this is to announce that the Project Management Committee (PMC) for Apache
> Mahout has asked Frank Scholten to become committer and we are pleased to
> announce that he has accepted.
>
> Being a committer enables easier contribution to the project since in
> addition to posting patches on JIRA it also gives write access to the code
> repository.
> That also means that now we have yet another person who can commit patches
> submitted by others to our repo *wink*
>
> Frank, you've been following the project for quite some time now -
> contributing valuable changes over and over again. I certainly look forward
> to working with you in the future. Welcome!
>
>
> Isabel
>
>
>


[jira] [Resolved] (MAHOUT-1328) CLI-invoked K-means final step (Cluster Classification Driver) ignores job-specific -D MR parameters

2013-12-03 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi resolved MAHOUT-1328.
---

   Resolution: Not A Problem
Fix Version/s: 0.8 (was: 0.9)

This has been fixed by MAHOUT-1201 for 0.8.

> CLI-invoked K-means final step (Cluster Classification Driver) ignores 
> job-specific -D MR parameters
> 
>
> Key: MAHOUT-1328
> URL: https://issues.apache.org/jira/browse/MAHOUT-1328
> Project: Mahout
>  Issue Type: Bug
>  Components: Clustering
>Affects Versions: 0.8
>Reporter: Stewart Whiting
>Assignee: Suneel Marthi
> Fix For: 0.8
>
>
> I believe this is an issue - someone please correct me if not!
> I am running a large k-means clustering task. Our default cluster map/reduce 
> slots per node and JVM memory parameters etc are not appropriate for the 
> memory requirements of this.
> So, I invoke K-means clustering from the CLI using, for example:
> mahout kmeans -i /mahout-input -o /mahout-output -c clusters -dm 
> org.apache.mahout.common.distance.CosineDistanceMeasure -x 12 -ow -k 50 -cl 
> -Dmapred.child.java.opts=-Xmx7096m 
> -Dmapred.tasktracker.reduce.tasks.maximum=1 
> -Dmapred.tasktracker.map.tasks.maximum=1 -Dmapred.job.map.memory.mb=7000 
> -Dmapred.cluster.max.map.memory.mb=7000 
> -Dmapred.cluster.reduce.memory.mb=7000 
> -Dmapred.cluster.max.reduce.memory.mb=7000
> The initial MR tasks for each clustering iteration run successfully. 
> Inspecting the Hadoop config for each task after completion shows that the job 
> runs with the explicitly provided MR configuration from the -D parameters.
> However, when the final cluster classification task is run (i.e. to generate 
> the clusteredPoints/ directory), it usually fails due to outOfMemory errors. 
> Inspecting the MR task logs for it shows that it ran with the default cluster 
> settings, not those provided by my -D CLI parameters.





Welcome to Frank Scholten as new Mahout committer

2013-12-03 Thread Isabel Drost-Fromm

Hi,

this is to announce that the Project Management Committee (PMC) for Apache
Mahout has asked Frank Scholten to become committer and we are pleased to
announce that he has accepted.

Being a committer enables easier contribution to the project since in addition
to posting patches on JIRA it also gives write access to the code repository.
That also means that now we have yet another person who can commit patches
submitted by others to our repo *wink*

Frank, you've been following the project for quite some time now - contributing
valuable changes over and over again. I certainly look forward to working
with you in the future. Welcome!


Isabel




[jira] [Commented] (MAHOUT-1242) No key redistribution function for associative maps

2013-12-03 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837610#comment-13837610
 ] 

Dawid Weiss commented on MAHOUT-1242:
-

I'm not an expert on hash functions, really. I believe Austin Appleby did a 
good job on MH3 and I rely on his work here.



[jira] [Updated] (MAHOUT-1242) No key redistribution function for associative maps

2013-12-03 Thread Tharindu Rusira (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tharindu Rusira updated MAHOUT-1242:


Attachment: (was: MAHOUT-1242.patch)



[jira] [Updated] (MAHOUT-1242) No key redistribution function for associative maps

2013-12-03 Thread Tharindu Rusira (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tharindu Rusira updated MAHOUT-1242:


Attachment: MAHOUT-1242.patch

Thanks [~dweiss] for the quick feedback. I'm attaching the reworked patch.
By the way, can you think of a better hash mechanism that suits this context?


> No key redistribution function for associative maps
> ---
>
> Key: MAHOUT-1242
> URL: https://issues.apache.org/jira/browse/MAHOUT-1242
> Project: Mahout
>  Issue Type: Improvement
>  Components: collections, Math
>Reporter: Dawid Weiss
> Attachments: MAHOUT-1242.patch, MAHOUT-1242.patch


[jira] [Commented] (MAHOUT-1242) No key redistribution function for associative maps

2013-12-03 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837569#comment-13837569
 ] 

Dawid Weiss commented on MAHOUT-1242:
-

You introduced an unused import (Random). The code could also be simplified a 
bit (it remains identical at the bytecode level, but seems clearer):
{code}
/**
 * Hashes a 4-byte sequence (Java int).
 */
public static int hash(int k) {
  k ^= k >>> 16;
  k *= 0x85ebca6b;
  k ^= k >>> 13;
  k *= 0xc2b2ae35;
  k ^= k >>> 16;
  return k;
}
{code}
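
One way to sanity-check a mixing function like the one above is to measure its avalanche behavior: flipping a single input bit should change roughly half of the 32 output bits. The sketch below is only an illustration under stated assumptions; the class name AvalancheCheck and the helper averageFlippedBits are hypothetical and are not part of the attached patch or of Mahout.

```java
public class AvalancheCheck {
    // Same mixing steps as in the comment above (murmurhash3 final step).
    public static int hash(int k) {
        k ^= k >>> 16;
        k *= 0x85ebca6b;
        k ^= k >>> 13;
        k *= 0xc2b2ae35;
        k ^= k >>> 16;
        return k;
    }

    // Hypothetical helper: for the inputs 0..samples-1, flips each of the 32
    // input bits in turn and returns the average number of output bits that
    // differ between the two hashes.
    public static double averageFlippedBits(int samples) {
        long total = 0;
        for (int x = 0; x < samples; x++) {
            int base = hash(x);
            for (int bit = 0; bit < 32; bit++) {
                total += Integer.bitCount(base ^ hash(x ^ (1 << bit)));
            }
        }
        return total / (32.0 * samples);
    }

    public static void main(String[] args) {
        // A well-mixing 32-bit hash should report a value close to 16.
        System.out.println("average flipped output bits: " + averageFlippedBits(1000));
    }
}
```

The identity hash would score exactly 1 on this measure, since each flipped input bit changes only the same output bit.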



[jira] [Comment Edited] (MAHOUT-1242) No key redistribution function for associative maps

2013-12-03 Thread Tharindu Rusira (JIRA)

[ 
https://issues.apache.org/jira/browse/MAHOUT-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837560#comment-13837560
 ] 

Tharindu Rusira edited comment on MAHOUT-1242 at 12/3/13 10:46 AM:
---

Hi [~dweiss], I'm currently working on this issue and for the time being I 
attach a simple implementation of the final step of murmurhash3 as you 
suggested. 
Your feedback is highly appreciated.
Thanks.
P.S. not tested


was (Author: tharindu_rusira):
Hi [~dawidweiss], I'm currently working on this issue and for the time being I 
attach a simple implementation of the final step of murmurhash3 as you 
suggested. 
Your feedback is highly appreciated.
Thanks.
P.S. not tested



[jira] [Updated] (MAHOUT-1242) No key redistribution function for associative maps

2013-12-03 Thread Tharindu Rusira (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAHOUT-1242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tharindu Rusira updated MAHOUT-1242:


Attachment: MAHOUT-1242.patch

Hi [~dawidweiss], I'm currently working on this issue and for the time being I 
attach a simple implementation of the final step of murmurhash3 as you 
suggested. 
Your feedback is highly appreciated.
Thanks.
P.S. not tested
