Re: Re: Re: Hadoop SSVD OutOfMemory Problem

I think they used to run individually in Eclipse just fine. I am sure it
will also work with IDEA. With Maven, I never ran anything less than a
module's worth of tests.

On Apr 28, 2015 7:55 PM, "lastarsenal" wrote:
> Now, my question is: how can I run a single specified test with Maven?
> "mvn test" is very slow; if I could do something like "mvn test
> LocalSSVDPCASparseTest", my efficiency would be much improved.
Re: Re: Re: Hadoop SSVD OutOfMemory Problem
Ok, I have a github account and have cloned mahout into my local workdir.

I revised the code and ran the tests with "mvn test"; however, there are 3
test failures:

Failed tests:
  LocalSSVDPCASparseTest.runPCATest1:87->runSSVDSolver:222->Assert.assertTrue:52->Assert.assertTrue:41->Assert.fail:86 null
  LocalSSVDSolverDenseTest.testSSVDSolverPowerIterations1:59->runSSVDSolver:172->Assert.assertTrue:52->Assert.assertTrue:41->Assert.fail:86 null
  LocalSSVDSolverSparseSequentialTest.testSSVDSolverPowerIterations1:69->runSSVDSolver:177->Assert.assertTrue:52->Assert.assertTrue:41->Assert.fail:86 null

Now, my question is: how can I run a single specified test with Maven?
"mvn test" is very slow; if I could do something like "mvn test
LocalSSVDPCASparseTest", my efficiency would be much improved.

At 2015-04-29 01:25:34, "Dmitriy Lyubimov" wrote:
>Just Dmitriy is fine.
>
>In order to create a pull request, please check out the process page
>http://mahout.apache.org/developers/github.html. Note that it is written
>for both committers and contributors, so you need to ignore the details
>for committers.
>
>Basically, you just need a github account, clone (fork) apache/mahout in
>your account, (optionally) create a patch branch, commit your
>modifications there, and then use the github UI to create a pull request
>against apache/mahout.
>
>thanks.
>
>-d
>
>On Mon, Apr 27, 2015 at 8:39 PM, lastarsenal wrote:
>
>> Hi, Dmitriy Lyubimov
>>
>> OK, I have submitted a JIRA issue at
>> https://issues.apache.org/jira/browse/MAHOUT-1700
>>
>> I'm a newbie to mahout, so what should I do next for this issue? Thank
>> you!
>>
>> At 2015-04-28 02:16:37, "Dmitriy Lyubimov" wrote:
>> >Thank you for this analysis. I can't immediately confirm this since
>> >it's been a while, but it sounds credible.
>> >
>> >Do you mind filing a jira with all this information, and perhaps even
>> >doing a PR on github?
>> >
>> >thank you.
>> >
>> >On Mon, Apr 27, 2015 at 4:32 AM, lastarsenal wrote:
>> >
>> >> Hi, All,
>> >>
>> >> Recently, I tried mahout's hadoop SSVD job (mahout-0.9 or
>> >> mahout-1.0). There is a Java heap space out-of-memory problem in
>> >> ABtDenseOutJob. I found the reason; the ABtDenseOutJob map code is
>> >> as below:
>> >>
>> >> protected void map(Writable key, VectorWritable value, Context context)
>> >>     throws IOException, InterruptedException {
>> >>
>> >>   Vector vec = value.get();
>> >>
>> >>   int vecSize = vec.size();
>> >>   if (aCols == null) {
>> >>     aCols = new Vector[vecSize];
>> >>   } else if (aCols.length < vecSize) {
>> >>     aCols = Arrays.copyOf(aCols, vecSize);
>> >>   }
>> >>
>> >>   if (vec.isDense()) {
>> >>     for (int i = 0; i < vecSize; i++) {
>> >>       extendAColIfNeeded(i, aRowCount + 1);
>> >>       aCols[i].setQuick(aRowCount, vec.getQuick(i));
>> >>     }
>> >>   } else if (vec.size() > 0) {
>> >>     for (Vector.Element vecEl : vec.nonZeroes()) {
>> >>       int i = vecEl.index();
>> >>       extendAColIfNeeded(i, aRowCount + 1);
>> >>       aCols[i].setQuick(aRowCount, vecEl.get());
>> >>     }
>> >>   }
>> >>   aRowCount++;
>> >> }
>> >>
>> >> If the input is a RandomAccessSparseVector, as is usual with big
>> >> data, its vec.size() is Integer.MAX_VALUE, i.e. 2^31 - 1, so aCols =
>> >> new Vector[vecSize] tries to allocate about 2^31 object references
>> >> (roughly 8-16 GB) and triggers the OutOfMemory problem. The obvious
>> >> remedy, of course, is to enlarge every tasktracker's maximum memory:
>> >>
>> >>   mapred.child.java.opts
>> >>   -Xmx1024m
>> >>
>> >> However, if you are NOT the hadoop administrator or ops, you have no
>> >> permission to modify that config.
>> >> So I tried to modify the ABtDenseOutJob map code to support the
>> >> RandomAccessSparseVector situation: I use a HashMap to represent
>> >> aCols instead of the original Vector[] aCols array. The modified
>> >> code is as below:
>> >>
>> >> private Map<Integer, Vector> aColsMap = new HashMap<Integer, Vector>();
>> >>
>> >> protected void map(Writable key, VectorWritable value, Context context)
>> >>     throws IOException, InterruptedException {
>> >>
>> >>   Vector vec = value.get();
>> >>   int vecSize = vec.size();
>> >>   if (vec.isDense()) {
>> >>     for (int i = 0; i < vecSize; i++) {
>> >>       //extendAColIfNeeded(i, aRowCount + 1);
>> >>       if (aColsMap.get(i) == null) {
>> >>         aColsMap.put(i, new RandomAccessSparseVector(Integer.MAX_VALUE, 100));
>> >>       }
>> >>       aColsMap.get(i).setQuick(aRowCount, vec.getQuick(i));
>> >>       //aCols[i].setQuick(aRowCount, vec.getQuick(i));
>> >>     }
>> >>   } else if (vec.size() > 0) {
>> >>     for (Vector.Element vecEl : vec.nonZeroes()) {
>> >>       int i = vecEl.index();
>> >>       //extendAColIfNeeded(i, aRowCount + 1);
>> >>       if (aColsMap.get(i) == null) {
>> >>         aColsMap.put(i, new RandomAccessSparseVector(Integer.MAX_VALUE, 100));
>> >>       }
>> >>       aColsMap.get(i).setQuick(aRowCount, vecEl.get());
>> >>     }
>> >>   }
>> >>   aRowCount++;
>> >> }
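To make both the failure mode and the proposed fix concrete, here is a
self-contained sketch against the same Mahout vector API used in the quoted
code; the class name and the sample indices are illustrative:

  import java.util.HashMap;
  import java.util.Map;

  import org.apache.mahout.math.RandomAccessSparseVector;
  import org.apache.mahout.math.Vector;

  public class AColsMapSketch {
    public static void main(String[] args) {
      // A sparse row over the full int index range: only ~100 slots are
      // preallocated, yet size() reports the nominal cardinality.
      Vector row = new RandomAccessSparseVector(Integer.MAX_VALUE, 100);
      row.setQuick(7, 1.5);
      row.setQuick(1000000, 3.0);
      System.out.println(row.size()); // 2147483647

      // The original mapper would now allocate `new Vector[row.size()]`:
      // ~2^31 object references (8-16 GB), hence the OutOfMemoryError.

      // The fix: create columns lazily, keyed by column index, so memory
      // scales with the columns actually touched rather than with size().
      Map<Integer, Vector> aColsMap = new HashMap<Integer, Vector>();
      int aRowCount = 0;
      for (Vector.Element el : row.nonZeroes()) {
        int i = el.index();
        Vector col = aColsMap.get(i);
        if (col == null) {
          col = new RandomAccessSparseVector(Integer.MAX_VALUE, 100);
          aColsMap.put(i, col);
        }
        col.setQuick(aRowCount, el.get());
      }
      aRowCount++; // advance to the next row, as the mapper does

      System.out.println(aColsMap.keySet()); // only the touched columns
    }
  }

With this layout, memory grows with the number of distinct non-zero column
indices a mapper actually sees, which for sparse inputs is far smaller than
the nominal cardinality.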
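The contribution steps Dmitriy outlines in the quoted thread map to roughly
the following commands (the fork URL and branch name are placeholders; the
pull request itself is opened in the github UI):

  # fork apache/mahout in the github UI first, then:
  git clone https://github.com/YOUR_USER/mahout.git
  cd mahout
  git checkout -b MAHOUT-1700        # optional patch branch
  # ... apply the ABtDenseOutJob changes ...
  git commit -am "MAHOUT-1700: fix OOM in ABtDenseOutJob for sparse input"
  git push origin MAHOUT-1700
  # then open a pull request against apache/mahout from the github UI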