Welcome to the Java garbage collector. You never know when it will
close a stream.
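
To make that concrete, here is a minimal sketch of closing each stream explicitly instead of leaving it to the collector. It is not the MAHOUT-694 patch; the class name ExplicitClose and the method readAll are hypothetical, and it assumes Java 7 or later for try-with-resources (on an older JVM the same effect needs a try/finally that calls close()).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitClose {

    // Hypothetical helper: reads a whole file as UTF-8 text.
    // try-with-resources closes the stream when the block exits,
    // whether it returns normally or throws, so the file descriptor
    // is released right away rather than whenever a GC happens to run.
    public static String readAll(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("close-demo");
        // Read many more files than a 256-descriptor limit would allow
        // to be open at once; only one stream is open at any moment.
        for (int i = 0; i < 2000; i++) {
            Path f = dir.resolve("doc-" + i + ".txt");
            Files.write(f, ("text " + i).getBytes(StandardCharsets.UTF_8));
            String text = readAll(f);
            if (!text.equals("text " + i)) {
                throw new IllegalStateException("unexpected content in " + f);
            }
            Files.delete(f);
        }
        Files.delete(dir);
        System.out.println("read 2000 files without exhausting descriptors");
    }
}
```

With the stream closed per file, the number of open descriptors stays constant however many files the directory holds, which is the property that matters when a filter is called once per file.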

On Fri, May 20, 2011 at 4:45 PM, Grant Ingersoll <[email protected]> wrote:
> Jeff, I just put up a patch on M-694 that closes the input stream.  Does that 
> fix it for you?  I can't repro. it here (which is weird, b/c my ulimit 
> reports max # of files as 256)
>
> -Grant
>
> On May 20, 2011, at 4:09 PM, Jeff Eastman wrote:
>
>> Don't know what is happening. I rebooted my Linux VM, did a clean mahout 
>> build, zapped bin/work, and got the same result. Will have to debug more 
>> later today...
>>
>> -----Original Message-----
>> From: Grant Ingersoll [mailto:[email protected]]
>> Sent: Friday, May 20, 2011 12:54 PM
>> To: [email protected]
>> Subject: Re: Is LDA Broken?
>>
>> Hmm, that's weird.  I might suggest doing "clean install" first as well as 
>> deleting your examples/bin/work directory.
>>
>> On May 20, 2011, at 3:41 PM, Jeff Eastman wrote:
>>
>>> I uncommented line 39 and am getting the same errors (index error with 
>>> kmeans, 0 LL with LDA) as before. I am running on real clusters (CDH3 & 
>>> MapR). Trying to run locally, I get this curious output. I don't have much 
>>> time today to pursue it (in meetings all day) but will do my best:
>>>
>>> [dev@devbox mahout]$ ./examples/bin/build-reuters.sh
>>> Please select a number to choose the corresponding clustering algorithm
>>> 1. kmeans clustering
>>> 2. lda clustering
>>> Enter your choice : 1
>>> ok. You chose 1 and we'll use kmeans Clustering
>>> Downloading Reuters-21578
>>>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>>                                  Dload  Upload   Total   Spent    Left  Speed
>>> 100 7959k  100 7959k    0     0  1145k      0  0:00:06  0:00:06 --:--:-- 1135k
>>> Extracting...
>>> no HADOOP_HOME set, running locally
>>> May 20, 2011 12:35:51 PM org.slf4j.impl.JCLLoggerAdapter warn
>>> WARNING: No org.apache.lucene.benchmark.utils.ExtractReuters.props found on classpath, will use command-line arguments only
>>> Deleting all files in ./examples/bin/work/reuters-out
>>> May 20, 2011 12:35:56 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Program took 4690 ms
>>> no HADOOP_HOME set, running locally
>>> May 20, 2011 12:35:57 PM org.slf4j.impl.JCLLoggerAdapter info
>>> INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5, 
>>> --endPhase=2147483647, 
>>> --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, 
>>> --input=./examples/bin/work/reuters-out/, --keyPrefix=, 
>>> --output=./examples/bin/work/reuters-out-seqdir, --startPhase=0, 
>>> --tempDir=temp}
>>> Exception in thread "main" java.lang.IllegalStateException: java.io.FileNotFoundException: /home/dev/workspace/mahout/examples/bin/work/reuters-out/reut2-018.sgm-835.txt (Too many open files)
>>>       at org.apache.mahout.text.SequenceFilesFromDirectoryFilter.accept(SequenceFilesFromDirectoryFilter.java:79)
>>>       at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:724)
>>>       at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:746)
>>>       at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:76)
>>>       at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:106)
>>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>       at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:81)
>>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>>       at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>       at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>       at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
>>> C
>>>
>>> -----Original Message-----
>>> From: Grant Ingersoll [mailto:[email protected]]
>>> Sent: Friday, May 20, 2011 11:17 AM
>>> To: [email protected]
>>> Subject: Re: Is LDA Broken?
>>>
>>> yeah, sorry.  I commented out line 39: cd examples/bin
>>>
>>> On May 20, 2011, at 1:58 PM, Jeff Eastman wrote:
>>>
>>>> It does seem these two symptoms are of the same problem. I applied the 
>>>> patch, however, and now neither option runs. It appears the cd is off, but 
>>>> I can't see where.
>>>>
>>>> [dev@devbox mahout-distribution-0.5]$ time ./examples/bin/build-reuters.sh
>>>> Please select a number to choose the corresponding clustering algorithm
>>>> 1. kmeans clustering
>>>> 2. lda clustering
>>>> Enter your choice : 1
>>>> ok. You chose 1 and we'll use kmeans Clustering
>>>> ./examples/bin/build-reuters.sh: line 54: ./bin/mahout: No such file or directory
>>>> ./examples/bin/build-reuters.sh: line 64: ./bin/mahout: No such file or directory
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Grant Ingersoll [mailto:[email protected]]
>>>> Sent: Friday, May 20, 2011 10:50 AM
>>>> To: [email protected]
>>>> Subject: Re: Is LDA Broken?
>>>>
>>>> Likely so, see MAHOUT-694.
>>>>
>>>>
>>>> On May 20, 2011, at 1:39 PM, Sean Owen wrote:
>>>>
>>>>> Oh sorry these are the same issue? Great!
>>>>> On May 20, 2011 5:44 PM, "Jake Mannix" <[email protected]> wrote:
>>>>>> Looks like Grant got a fix posted? Has anyone else tried it?
>>>>>>
>>>>>> -jake
>>>>>>
>>>>>> On Fri, May 20, 2011 at 9:32 AM, Sean Owen <[email protected]> wrote:
>>>>>>
>>>>>>> I think we definitely need to figure out whether it's a bug or some other
>>>>>>> confusion. If it's a doesn't-work-at-all bug, then yes, it's probably the
>>>>>>> kind of thing that needs a fix ASAP, in which case write up all you know
>>>>>>> and everyone will pile in to look at it.
>>>>>>>
>>>>>>> On Fri, May 20, 2011 at 5:29 PM, Jeff Eastman <[email protected]> wrote:
>>>>>>>
>>>>>>>> Is this an issue that should be fixed before we release? It seems to be
>>>>>>>> broken to me.
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Jeff Eastman [mailto:[email protected]]
>>>>>>>> Sent: Thursday, May 19, 2011 5:05 PM
>>>>>>>> To: [email protected]
>>>>>>>> Subject: Is LDA Broken?
>>>>>>>>
>>>>>>>> I'm running build-reuters option 2 and the LDA runs to maxIterations (20)
>>>>>>>> without ever producing a non-zero Log Likelihood. This is not the behavior
>>>>>>>> that I recall from earlier runs and seems quite unlikely to be correct.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>
>>>>
>>>
>>> --------------------------------------------
>>> Grant Ingersoll
>>> Join the LUCENE REVOLUTION
>>> Lucene & Solr User Conference
>>> May 25-26, San Francisco
>>> www.lucenerevolution.org
>>>
>>
>
