The LuceneIterator has a built-in circuit breaker that trips when it hits too many errors, such as documents that have no term vector for the requested field. If you are using lucene.vector, you can pass in --maxPercentErrorDocs X, where X is the percentage of docs in which you are willing to allow errors. The default allows no errors.
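To make the idea concrete, here is a minimal sketch of that kind of circuit breaker. It is not the actual LuceneIterator code: the class name ErrorBudgetSketch, the method countUsable, and the use of null strings to stand in for missing term vectors are all made up for illustration. It also assumes the budget is given as a fraction between 0 and 1; check the driver's help output for the scale the real flag expects.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; this is not Mahout's LuceneIterator. */
public class ErrorBudgetSketch {

  /**
   * Counts usable documents. A null entry stands in for a document with no
   * term vector. Once the number of such documents exceeds the budget, the
   * method throws instead of silently skipping more of them.
   *
   * @param maxErrorFraction assumed to be a fraction of all docs (0.0 = no errors allowed)
   */
  public static int countUsable(List<String> termVectors, double maxErrorFraction, String field) {
    double maxErrorDocs = maxErrorFraction * termVectors.size();
    int errorDocs = 0;
    int usable = 0;
    for (String termVector : termVectors) {
      if (termVector == null) {
        errorDocs++;
        if (errorDocs > maxErrorDocs) {
          // The breaker trips: too many documents are unusable.
          throw new IllegalStateException(
              "There are too many documents that do not have a term vector for " + field);
        }
        continue; // still within budget, so skip this document
      }
      usable++;
    }
    return usable;
  }

  public static void main(String[] args) {
    List<String> docs = Arrays.asList("tv1", null, "tv2", "tv3");

    // One bad doc out of four is within a 50% budget, so it is skipped.
    System.out.println(countUsable(docs, 0.5, "content"));

    // With a zero budget, the first bad doc trips the breaker.
    try {
      countUsable(docs, 0.0, "content");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With a budget of zero, a single document without a term vector is enough to abort the run, which matches the behavior described above for the default setting.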

On Sep 18, 2011, at 10:48 AM, Philippe Adjiman wrote:

> Hi,
>
> I was trying to generate vectors from a lucene index using the lucene.vector
> driver, it worked fine using mahout 0.4 but in mahout 0.5 i get the
> following exception:
>
> SEVERE: There are too many documents that do not have a term vector for
> description
> Exception in thread "main" java.lang.IllegalStateException: There are too
> many documents that do not have a term vector for description
>     at org.apache.mahout.utils.vectors.lucene.LuceneIterator.computeNext(LuceneIterator.java:114)
>     at org.apache.mahout.utils.vectors.lucene.LuceneIterator.computeNext(LuceneIterator.java:41)
>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>     at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:43)
>     at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:206)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
>
> My lucene index was created using:
>
> doc.add(new Field("documentId", documentId, Field.Store.YES,
>     Field.Index.NOT_ANALYZED));
> doc.add(new Field("content", content, Field.Store.YES,
>     Field.Index.ANALYZED,TermVector.YES));
>
> If it is a know issue, sorry for the duplicate, else let me know if i can
> help in order to reproduce.
>
> -Philippe
>
> --
> Philippe Adjiman | twitter: padjiman | linkedin:
> il.linkedin.com/in/philippeadjiman | blog: http://philippeadjiman.com/blog

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
Lucene Eurocon 2011: http://www.lucene-eurocon.com