Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-07 Thread Kevin Moulart
Perfect! It works like a charm now! I'll keep testing after lunch and let you know if any new problem comes up, but it looks promising! Thank you very much! Kévin Moulart 2014-03-06 19:31 GMT+01:00 Ted Dunning ted.dunn...@gmail.com: On Thu, Mar 6, 2014 at 7:46 AM, Kevin Moulart

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-06 Thread Kevin Moulart
Hi again, and thanks for the enthusiasm! I did compile the trunk with the hadoop2 profile and, although it didn't work at first because some Canopy tests were failing, it compiled once I skipped the tests, and when I tested it afterward it passed. I used the version I have installed, so I just

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-06 Thread Gokhan Capan
Kevin, From trunk, can you build Mahout for hadoop2 using this command: mvn clean package -DskipTests=true -Dhadoop2.version=YOUR_HADOOP2_VERSION Then can you verify that you have the right Hadoop jars with the following command: find . -name "hadoop*.jar" Gokhan On Thu, Mar 6, 2014 at

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-06 Thread Kevin Moulart
Hi, thanks very much, it seems to have worked! Compiling with mvn clean package -Dhadoop2.version=2.0.0-cdh4.6.0 works and I no longer have the error, but when running tests that used to work with the previous install, like trainAdaptiveLogistic and then validateAdaptiveLogistic, the first

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-06 Thread Sean Owen
That's gonna be a Guava version problem. I have seen variants of this for a while. Hadoop still uses 11.0.2 even in HEAD and you can often get away with using a later version in a project like this, even though code that executes on Hadoop will use an older Guava than you compiled against. This is
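One way to check which Guava jar a job actually picks up, as opposed to the one Mahout was compiled against, is to ask the JVM where a class was loaded from. The sketch below is only illustrative and not part of Mahout: the class name `WhichJar` and the helper `locationOf` are hypothetical, and `com.google.common.collect.Queues` is just a convenient Guava class to probe. It relies only on standard JDK calls.

```java
import java.security.CodeSource;

public class WhichJar {
    // Returns the jar or directory a class was loaded from, or a placeholder
    // when the JVM does not expose one (for example for core JDK classes).
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source, probably a JDK class)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // Assumption: Queues is used here only as an example Guava class.
        String guavaClass = "com.google.common.collect.Queues";
        try {
            Class<?> loaded = Class.forName(guavaClass);
            System.out.println(guavaClass + " loaded from " + locationOf(loaded));
        } catch (ClassNotFoundException e) {
            System.out.println(guavaClass + " is not on the classpath");
        }
    }
}
```

Running it with the same classpath as the Hadoop job should show whether an older Guava jar comes first and shadows the version the project was built against.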

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-06 Thread Kevin Moulart
OK, so should I change the Guava version to 11.0.2 in the pom and recompile? Kévin Moulart 2014-03-06 16:26 GMT+01:00 Sean Owen sro...@gmail.com: That's gonna be a Guava version problem. I have seen variants of this for a while. Hadoop still uses 11.0.2 even in HEAD and you can often

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-06 Thread Sean Owen
If I'm right, then it will cause compile errors, but then you just fix those by replacing some Guava constructs with equivalent Java or older Guava code. IIRC it is fairly trivial. And in fact the code probably should not use Guava 12+ methods for this reason, even if compiling against 12+. And in fact I

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-06 Thread Kevin Moulart
Indeed, it causes compile errors: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project mahout-math: Compilation failure [ERROR] /home/myCompny/Downloads/mahout9/math/src/main/java/org/apache/mahout/math/stats/GroupTree.java:[171,31]

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-06 Thread Ted Dunning
On Thu, Mar 6, 2014 at 7:46 AM, Kevin Moulart kevinmoul...@gmail.com wrote: [ERROR] /home/myCompny/Downloads/mahout9/math/src/main/java/org/apache/mahout/math/stats/GroupTree.java:[171,31] cannot find symbol Replace that line with: stack = new ArrayDeque<GroupTree>();
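The suggested replacement uses the plain JDK `ArrayDeque` constructor, which is available whatever Guava version ends up on the cluster. As a rough sketch of the pattern: the original line 171 is not shown in the thread, so the Guava call mentioned in the comment is an assumption, and `Node` and `DequeSketch` are hypothetical stand-ins rather than Mahout classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class DequeSketch {
    // Hypothetical stand-in for a tree node such as Mahout's GroupTree.
    static final class Node {
        final int value;
        final Node left;
        final Node right;

        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // Iterative in-order traversal that uses a JDK-only deque as its stack.
    static List<Integer> inOrder(Node root) {
        List<Integer> out = new ArrayList<Integer>();
        // Assumed original (needs Guava 12+): stack = Queues.newArrayDeque();
        Deque<Node> stack = new ArrayDeque<Node>();
        Node current = root;
        while (current != null || !stack.isEmpty()) {
            // Walk down the left spine, remembering each node on the way.
            while (current != null) {
                stack.push(current);
                current = current.left;
            }
            current = stack.pop();
            out.add(current.value);
            current = current.right;
        }
        return out;
    }

    public static void main(String[] args) {
        Node tree = new Node(2, new Node(1, null, null), new Node(3, null, null));
        System.out.println(inOrder(tree)); // prints [1, 2, 3]
    }
}
```

The explicit type argument keeps the code compilable on older Java language levels that lack the diamond operator.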

Re: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Kevin Moulart
Hi, and thanks for your help! I had been told that the version of Mahout used by Cloudera (CDH 4.6) was in fact 0.8 with a patch for MR2 support. (http://mail-archives.apache.org/mod_mbox/mahout-user/201402.mbox/%3CCAEccTywqSAKA_HeX4vTZ-5XPmKtj5b8zMGQUfn5qRsiq=7o=u...@mail.gmail.com%3E) But I

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Suneel Marthi
Not sure if the CDH4 patches on top of 0.7 have fixes for M-1067 and M-1098, which address the issues you are seeing. The second part of the issue you are seeing with the Mahout 0.9 distro seems to be related to how you set it up on CDH4. I apologize for not being helpful here as I am not a CDH4 user or

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Andrew Musselman
I'm not sure about this either but I think these are all the changes to Mahout in CDH 4.6.0: http://archive.cloudera.com/cdh4/cdh/4/mahout-0.7-cdh4.6.0.CHANGES.txt MAHOUT-1291 MAHOUT-1033 MAHOUT-1142 On Wed, Mar 5, 2014 at 8:30 AM, Suneel Marthi suneel_mar...@yahoo.com wrote: Not sure if

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Sean Owen
CDH 4.5 and 4.6 are both 0.7 + patches. Neither contains 0.8, since it has (tiny) breaking changes vs 0.7 and this is a minor version update. CDH5 contains 0.8 + patches. I did not say CDH4 has 0.8 -- re-read the message of mine that was quoted.

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Dmitriy Lyubimov
Yeah, it would seem the CDH releases of Mahout are some sort of cut-down version. I suggest switching to the official release tarball (or writing to Cloudera support about it). On Wed, Mar 5, 2014 at 8:38 AM, Andrew Musselman andrew.mussel...@gmail.com wrote: I'm not sure about this either

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Sean Owen
I don't follow what here makes you say they are cut-down releases. They are release plus patches, not release minus patches. The question is not about how to use 0.7, but how to use 1.0-SNAPSHOT. Why would switching to the official 0.7 release help? I think the answer is you build Mahout for

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Suneel Marthi
I apologize, Sean; I wasn't aware of the complete history in this thread. I didn't know Hadoop 2.x was involved here; if so, yes, you need to build Mahout from HEAD with the Hadoop 2 profile to get it working. On Wednesday, March 5, 2014 12:04 PM, Sean Owen sro...@gmail.com wrote: CDH 4.5

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Dmitriy Lyubimov
On Wed, Mar 5, 2014 at 9:08 AM, Sean Owen sro...@gmail.com wrote: I don't follow what here makes you say they are cut down releases? Meaning it seems to be pretty much 2 releases behind the official one. But I definitely don't follow CDH developments in this department; you seem in a better

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Sean Owen
I don't understand this -- CDH always bundles the latest release. You know that CDH4 was released in July 2012, right? So it included 0.7 + patches. CDH5 includes 0.8 because 0.9 was released about a month after it began beta 2. CDH follows semantic versioning and won't introduce changes that

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Sean Owen
You can always install whatever version of anything on your cluster that you want. It may or may not work, but often happens to, at least for whatever you need it to do. It's just the same as it is without a packaged distribution -- dump new tarballs and cross your fingers. Nothing is weird or

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Andrew Musselman
Yeah, for sure; balancing clients' risk aversion to technical features is why we often recommend vendor solutions. Having a little button to choose a newer version of a component in the Manager UI (even with a confirmation dialog that said "Are you sure? Are you crazy?") would be more palatable to

Re: Fwd: PCA with ssvd leads to StackOverFlowError

2014-03-05 Thread Andrew Musselman
I mean balance the risk aversion against the value of new features, duh. On Wed, Mar 5, 2014 at 1:39 PM, Andrew Musselman andrew.mussel...@gmail.com wrote: Yeah, for sure; balancing clients' risk aversion to technical features is why we often recommend vendor solutions. Having a little

PCA with ssvd leads to StackOverFlowError

2014-03-04 Thread Kevin Moulart
Hi, I'm trying to apply PCA to reduce the dimension of a matrix with 1603 columns and 100,000 to 30,000,000 rows, using ssvd with the pca option, and I always get a StackOverflowError. Here is my command line: mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 -k 100 -pca

Re: PCA with ssvd leads to StackOverFlowError

2014-03-04 Thread Dmitriy Lyubimov
Kevin, thanks for reporting this. A stack overflow error has not been known to happen to date, but I will take a look. Given your stack trace, it looks like a bug in the mean computation code, although it may have been induced by some circumstances specific to your deployment. What version is it?

Re: PCA with ssvd leads to StackOverFlowError

2014-03-04 Thread Dmitriy Lyubimov
It doesn't look like -us has been removed. At least I see it on the head of the trunk, SSVDCli.java, line 62: addOption("uSigma", "us", "Compute U * Sigma", String.valueOf(false)); i.e. short version (single dash) -us true, or long version (double dash) --uSigma true. Can you check again with 0.9?

Re: PCA with ssvd leads to StackOverFlowError

2014-03-04 Thread Dmitriy Lyubimov
As for the stack trace, it looks like it doesn't agree with current trunk. Again, I need to know which version you are running. But from looking at current trunk, I don't really see how that may be happening at the moment. On Tue, Mar 4, 2014 at 9:40 AM, Dmitriy Lyubimov dlie...@gmail.com

Re: PCA with ssvd leads to StackOverFlowError

2014-03-04 Thread Suneel Marthi
I have not seen the stack overflow error, but this code has been fixed since 0.8. On Mar 4, 2014, at 12:40 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: It doesn't look like -us has been removed. At least i see it on the head of the trunk, SSVDCli.java, line 62:

Re: PCA with ssvd leads to StackOverFlowError

2014-03-04 Thread Suneel Marthi
The -us option was fixed for Mahout 0.8; it seems you are using Mahout 0.7, which had this issue (it's apparent from your stack trace that you are using Mahout 0.7). Please upgrade to the latest Mahout version. On Tuesday, March 4, 2014 8:54 AM, Kevin Moulart kevinmoul...@gmail.com wrote: Hi, I'm