The LuceneIterator has a built-in circuit breaker that trips if it encounters
too many errors. If you are using lucene.vector, you can pass in
--maxPercentErrorDocs X, where X is the percentage of documents you are
willing to let fail (for example, documents that have no term vector for the
requested field). The default is to allow no errors.
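To make the behavior concrete, here is a minimal, hypothetical sketch of such an error budget in plain Java. It is not Mahout's LuceneIterator: the class name ErrorBudgetIterator and the toVector function are made up for illustration, and the sketch assumes the threshold is given as a fraction between 0 and 1 (check the lucene.vector help output for the exact convention). Documents that produce no vector are skipped and counted. Once the count exceeds the budget, an IllegalStateException is thrown from hasNext(), which is also where the exception surfaces in the stack trace quoted below.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical error-budget ("circuit breaker") iterator. Not Mahout's code.
 * A document whose toVector result is null counts as an error document.
 */
class ErrorBudgetIterator<D, V> implements Iterator<V> {
    private final Iterator<D> docs;
    private final Function<D, V> toVector; // returns null for an error document
    private final long maxErrorDocs;       // budget = fraction * total docs
    private long numErrorDocs = 0;
    private V next;                        // buffered vector, null if none yet

    ErrorBudgetIterator(List<D> docs, Function<D, V> toVector, double maxPercentErrorDocs) {
        // Assumption: the threshold is a fraction in [0, 1]
        if (maxPercentErrorDocs < 0.0 || maxPercentErrorDocs > 1.0) {
            throw new IllegalArgumentException("maxPercentErrorDocs must be in [0, 1]");
        }
        this.docs = docs.iterator();
        this.toVector = toVector;
        this.maxErrorDocs = (long) (maxPercentErrorDocs * docs.size());
    }

    @Override
    public boolean hasNext() {
        // Advance until a usable vector is found or the input is exhausted
        while (next == null && docs.hasNext()) {
            V v = toVector.apply(docs.next());
            if (v == null) {
                numErrorDocs++;
                // Trip the breaker as soon as the error budget is exceeded
                if (numErrorDocs > maxErrorDocs) {
                    throw new IllegalStateException(
                        "There are too many documents that do not have a term vector");
                }
            } else {
                next = v;
            }
        }
        return next != null;
    }

    @Override
    public V next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        V v = next;
        next = null;
        return v;
    }
}
```

With a threshold of 0.0 (the default described above), the first document without a vector trips the breaker. With 0.5 over three documents, the budget is one error document, so a single failure is tolerated and skipped.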


On Sep 18, 2011, at 10:48 AM, Philippe Adjiman wrote:

> Hi,
> 
> I was trying to generate vectors from a Lucene index using the lucene.vector
> driver. It worked fine with Mahout 0.4, but in Mahout 0.5 I get the
> following exception:
> 
> SEVERE: There are too many documents that do not have a term vector for description
> Exception in thread "main" java.lang.IllegalStateException: There are too many documents that do not have a term vector for description
>     at org.apache.mahout.utils.vectors.lucene.LuceneIterator.computeNext(LuceneIterator.java:114)
>     at org.apache.mahout.utils.vectors.lucene.LuceneIterator.computeNext(LuceneIterator.java:41)
>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>     at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:43)
>     at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:206)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
> 
> My Lucene index was created using:
>
> doc.add(new Field("documentId", documentId, Field.Store.YES,
>     Field.Index.NOT_ANALYZED));
> doc.add(new Field("content", content, Field.Store.YES,
>     Field.Index.ANALYZED, Field.TermVector.YES));
>
> If this is a known issue, sorry for the duplicate; otherwise, let me know if
> I can help reproduce it.
> 
> 
> -Philippe
> 
> 
> -- 
> Philippe Adjiman | twitter: padjiman | linkedin:
> il.linkedin.com/in/philippeadjiman | blog: http://philippeadjiman.com/blog

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
Lucene Eurocon 2011: http://www.lucene-eurocon.com
