Don't know what is happening. I rebooted my Linux VM, did a clean mahout build, zapped bin/work, and got the same result. Will have to debug more later today...
-----Original Message-----
From: Grant Ingersoll [mailto:[email protected]]
Sent: Friday, May 20, 2011 12:54 PM
To: [email protected]
Subject: Re: Is LDA Broken?

Hmm, that's weird. I might suggest doing "clean install" first as well as deleting your examples/bin/work directory.

On May 20, 2011, at 3:41 PM, Jeff Eastman wrote:

> I uncommented line 39 and am getting the same errors (index error with
> kmeans, 0 LL with LDA) as before. I am running on real clusters (CDH3 &
> MapR). Trying to run locally, I get this curious output. I don't have much
> time today to pursue it (in meetings all day) but will do my best:
>
> [dev@devbox mahout]$ ./examples/bin/build-reuters.sh
> Please select a number to choose the corresponding clustering algorithm
> 1. kmeans clustering
> 2. lda clustering
> Enter your choice : 1
> ok. You chose 1 and we'll use kmeans Clustering
> Downloading Reuters-21578
> % Total % Received % Xferd Average Speed Time Time Time Current
> Dload Upload Total Spent Left Speed
> 100 7959k 100 7959k 0 0 1145k 0 0:00:06 0:00:06 --:--:-- 1135k
> Extracting...
> no HADOOP_HOME set, running locally
> May 20, 2011 12:35:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> WARNING: No org.apache.lucene.benchmark.utils.ExtractReuters.props found on classpath, will use command-line arguments only
> Deleting all files in ./examples/bin/work/reuters-out
> May 20, 2011 12:35:56 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 4690 ms
> no HADOOP_HOME set, running locally
> May 20, 2011 12:35:57 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5, --endPhase=2147483647, --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, --input=./examples/bin/work/reuters-out/, --keyPrefix=, --output=./examples/bin/work/reuters-out-seqdir, --startPhase=0, --tempDir=temp}
> Exception in thread "main" java.lang.IllegalStateException: java.io.FileNotFoundException: /home/dev/workspace/mahout/examples/bin/work/reuters-out/reut2-018.sgm-835.txt (Too many open files)
> at org.apache.mahout.text.SequenceFilesFromDirectoryFilter.accept(SequenceFilesFromDirectoryFilter.java:79)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:724)
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:746)
> at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:76)
> at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:106)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:81)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
> C
>
> -----Original Message-----
> From: Grant Ingersoll [mailto:[email protected]]
> Sent: Friday, May 20, 2011 11:17 AM
> To: [email protected]
> Subject: Re: Is LDA Broken?
>
> yeah, sorry. I commented out line 39: cd examples/bin
>
> On May 20, 2011, at 1:58 PM, Jeff Eastman wrote:
>
>> It does seem these two symptoms are of the same problem. I applied the
>> patch; however, and now neither option runs. It appears the cd is off but I
>> can't see where.
>>
>> [dev@devbox mahout-distribution-0.5]$ time ./examples/bin/build-reuters.sh
>> Please select a number to choose the corresponding clustering algorithm
>> 1. kmeans clustering
>> 2. lda clustering
>> Enter your choice : 1
>> ok. You chose 1 and we'll use kmeans Clustering
>> ./examples/bin/build-reuters.sh: line 54: ./bin/mahout: No such file or directory
>> ./examples/bin/build-reuters.sh: line 64: ./bin/mahout: No such file or directory
>>
>>
>> -----Original Message-----
>> From: Grant Ingersoll [mailto:[email protected]]
>> Sent: Friday, May 20, 2011 10:50 AM
>> To: [email protected]
>> Subject: Re: Is LDA Broken?
>>
>> Likely so, see MAHOUT-694.
>>
>>
>> On May 20, 2011, at 1:39 PM, Sean Owen wrote:
>>
>>> Oh sorry these are the same issue? Great!
>>> On May 20, 2011 5:44 PM, "Jake Mannix" <[email protected]> wrote:
>>>> Looks like Grant got a fix posted? Has anyone else tried it?
>>>>
>>>> -jake
>>>>
>>>> On Fri, May 20, 2011 at 9:32 AM, Sean Owen <[email protected]> wrote:
>>>>
>>>>> I think we definitely need to figure out whether it's a bug or some other
>>>>> confusion. If it's a doesn't-work-at-all bug yes probably the kind of thing
>>>>> that needs a fix ASAP in which case write up all you know and everyone will
>>>>> pile in to look at it.
>>>>>
>>>>> On Fri, May 20, 2011 at 5:29 PM, Jeff Eastman <[email protected]> wrote:
>>>>>
>>>>>> Is this an issue that should be fixed before we release? It seems to be
>>>>>> broken to me.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Jeff Eastman [mailto:[email protected]]
>>>>>> Sent: Thursday, May 19, 2011 5:05 PM
>>>>>> To: [email protected]
>>>>>> Subject: Is LDA Broken?
>>>>>>
>>>>>> I'm running build-reuters option 2 and the LDA runs to maxIterations (20)
>>>>>> without ever producing a non-zero Log Likelihood. This is not the behavior
>>>>>> that I recall from earlier runs and seems quite unlikely to be correct.
>>>>>>
>>>>>>
>>>>>
>>
>>
>
> --------------------------------------------
> Grant Ingersoll
> Join the LUCENE REVOLUTION
> Lucene & Solr User Conference
> May 25-26, San Francisco
> www.lucenerevolution.org
>

--------------------------------------------
Grant Ingersoll
Join the LUCENE REVOLUTION
Lucene & Solr User Conference
May 25-26, San Francisco
www.lucenerevolution.org
