Welcome to the Java garbage collector. You never know when it will close a stream.
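
For anyone hitting the same "Too many open files" error: a minimal sketch of closing the stream yourself instead of waiting for the collector. This is not the MAHOUT-694 patch and not Mahout code; the class and the helper firstLine are made up for illustration. It assumes Java 7 or later for try-with-resources; on an older JVM the same shape is try { ... } finally { reader.close(); }.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CloseEagerly {

    // Hypothetical helper (not part of Mahout): returns the first line of a file.
    // try-with-resources closes the reader, and with it the file descriptor,
    // when the block exits, whether normally or by exception.
    static String firstLine(Path file) throws IOException {
        try (BufferedReader reader =
                 Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("close-eagerly");
        int files = 2000; // more files than a typical 256 or 1024 descriptor limit
        int read = 0;
        for (int i = 0; i < files; i++) {
            Path f = dir.resolve("doc-" + i + ".txt");
            Files.write(f, ("doc " + i + "\n").getBytes(StandardCharsets.UTF_8));
            // Each call opens and closes exactly one descriptor.
            if (firstLine(f).equals("doc " + i)) {
                read++;
            }
            Files.delete(f);
        }
        Files.delete(dir);
        System.out.println("read " + read + " files");
    }
}
```

Because every reader is closed before the next file is opened, the number of descriptors in use stays flat no matter how many files the loop visits, so the result no longer depends on when (or whether) the collector finalizes an abandoned stream.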

On Fri, May 20, 2011 at 4:45 PM, Grant Ingersoll <[email protected]> wrote:
> Jeff, I just put up a patch on M-694 that closes the input stream. Does that
> fix it for you? I can't repro. it here (which is weird, b/c my ulimit
> reports max # of files as 256)
>
> -Grant
>
> On May 20, 2011, at 4:09 PM, Jeff Eastman wrote:
>
>> Don't know what is happening. I rebooted my Linux VM, did a clean mahout
>> build, zapped bin/work, and got the same result. Will have to debug more
>> later today...
>>
>> -----Original Message-----
>> From: Grant Ingersoll [mailto:[email protected]]
>> Sent: Friday, May 20, 2011 12:54 PM
>> To: [email protected]
>> Subject: Re: Is LDA Broken?
>>
>> Hmm, that's weird. I might suggest doing "clean install" first as well as
>> deleting your examples/bin/work directory.
>>
>> On May 20, 2011, at 3:41 PM, Jeff Eastman wrote:
>>
>>> I uncommented line 39 and am getting the same errors (index error with
>>> kmeans, 0 LL with LDA) as before. I am running on real clusters (CDH3 &
>>> MapR). Trying to run locally, I get this curious output. I don't have much
>>> time today to pursue it (in meetings all day) but will do my best:
>>>
>>> [dev@devbox mahout]$ ./examples/bin/build-reuters.sh
>>> Please select a number to choose the corresponding clustering algorithm
>>> 1. kmeans clustering
>>> 2. lda clustering
>>> Enter your choice : 1
>>> ok. You chose 1 and we'll use kmeans Clustering
>>> Downloading Reuters-21578
>>> % Total % Received % Xferd Average Speed Time Time Time Current
>>> Dload Upload Total Spent Left Speed
>>> 100 7959k 100 7959k 0 0 1145k 0 0:00:06 0:00:06 --:--:-- 1135k
>>> Extracting...
>>> no HADOOP_HOME set, running locally
>>> May 20, 2011 12:35:51 PM org.slf4j.impl.JCLLoggerAdapter warn
>>> WARNING: No org.apache.lucene.benchmark.utils.ExtractReuters.props found on
>>> classpath, will use command-line arguments only
>>> Deleting all files in ./examples/bin/work/reuters-out
>>> May 20, 2011 12:35:56 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 4690 ms
>>> no HADOOP_HOME set, running locally
>>> May 20, 2011 12:35:57 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5,
>>> --endPhase=2147483647,
>>> --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter,
>>> --input=./examples/bin/work/reuters-out/, --keyPrefix=,
>>> --output=./examples/bin/work/reuters-out-seqdir, --startPhase=0,
>>> --tempDir=temp}
>>> Exception in thread "main" java.lang.IllegalStateException:
>>> java.io.FileNotFoundException:
>>> /home/dev/workspace/mahout/examples/bin/work/reuters-out/reut2-018.sgm-835.txt
>>> (Too many open files)
>>>   at org.apache.mahout.text.SequenceFilesFromDirectoryFilter.accept(SequenceFilesFromDirectoryFilter.java:79)
>>>   at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:724)
>>>   at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:746)
>>>   at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:76)
>>>   at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:106)
>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>   at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:81)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>>   at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>   at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>   at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
>>> C
>>>
>>> -----Original Message-----
>>> From: Grant Ingersoll [mailto:[email protected]]
>>> Sent: Friday, May 20, 2011 11:17 AM
>>> To: [email protected]
>>> Subject: Re: Is LDA Broken?
>>>
>>> yeah, sorry. I commented out line 39: cd examples/bin
>>>
>>> On May 20, 2011, at 1:58 PM, Jeff Eastman wrote:
>>>
>>>> It does seem these two symptoms are of the same problem. I applied the
>>>> patch; however, and now neither option runs. It appears the cd is off but
>>>> I can't see where.
>>>>
>>>> [dev@devbox mahout-distribution-0.5]$ time ./examples/bin/build-reuters.sh
>>>> Please select a number to choose the corresponding clustering algorithm
>>>> 1. kmeans clustering
>>>> 2. lda clustering
>>>> Enter your choice : 1
>>>> ok. You chose 1 and we'll use kmeans Clustering
>>>> ./examples/bin/build-reuters.sh: line 54: ./bin/mahout: No such file or directory
>>>> ./examples/bin/build-reuters.sh: line 64: ./bin/mahout: No such file or directory
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Grant Ingersoll [mailto:[email protected]]
>>>> Sent: Friday, May 20, 2011 10:50 AM
>>>> To: [email protected]
>>>> Subject: Re: Is LDA Broken?
>>>>
>>>> Likely so, see MAHOUT-694.
>>>>
>>>>
>>>> On May 20, 2011, at 1:39 PM, Sean Owen wrote:
>>>>
>>>>> Oh sorry these are the same issue? Great!
>>>>> On May 20, 2011 5:44 PM, "Jake Mannix" <[email protected]> wrote:
>>>>>> Looks like Grant got a fix posted? Has anyone else tried it?
>>>>>>
>>>>>> -jake
>>>>>>
>>>>>> On Fri, May 20, 2011 at 9:32 AM, Sean Owen <[email protected]> wrote:
>>>>>>
>>>>>>> I think we definitely need to figure out whether it's a bug or some
>>>>>>> other confusion. If it's a doesn't-work-at-all bug yes probably the
>>>>>>> kind of thing that needs a fix ASAP in which case write up all you
>>>>>>> know and everyone will pile in to look at it.
>>>>>>>
>>>>>>> On Fri, May 20, 2011 at 5:29 PM, Jeff Eastman <[email protected]> wrote:
>>>>>>>
>>>>>>>> Is this an issue that should be fixed before we release? It seems to be
>>>>>>>> broken to me.
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Jeff Eastman [mailto:[email protected]]
>>>>>>>> Sent: Thursday, May 19, 2011 5:05 PM
>>>>>>>> To: [email protected]
>>>>>>>> Subject: Is LDA Broken?
>>>>>>>>
>>>>>>>> I'm running build-reuters option 2 and the LDA runs to maxIterations (20)
>>>>>>>> without ever producing a non-zero Log Likelihood. This is not the behavior
>>>>>>>> that I recall from earlier runs and seems quite unlikely to be correct.
>>>>>>>>
>>>>>>>
>>>>
>>>
>>> --------------------------------------------
>>> Grant Ingersoll
>>> Join the LUCENE REVOLUTION
>>> Lucene & Solr User Conference
>>> May 25-26, San Francisco
>>> www.lucenerevolution.org
>>>
>>
>> --------------------------------------------
>> Grant Ingersoll
>> Join the LUCENE REVOLUTION
>> Lucene & Solr User Conference
>> May 25-26, San Francisco
>> www.lucenerevolution.org
>>
>
> --------------------------
> Grant Ingersoll
> Lucene Revolution -- Lucene and Solr User Conference
> May 25-26 in San Francisco
> www.lucenerevolution.org
>
