Hi, I have already opened a rather detailed JIRA ticket and submitted a patch for this issue: https://issues.apache.org/jira/browse/MAHOUT-675
The short of it is that the LuceneIterator throws an IllegalStateException when its computeNext method encounters a null term vector. That is problematic because it pushes the responsibility for checking whether a document field has any legitimate terms onto the person creating and maintaining the Lucene index. That might sound good in principle; however, it isn't an intuitive thing to do when creating or maintaining a Lucene index.

The reason this is even an issue is that once you start creating custom Lucene analyzers, a pretty important practice if you want to improve your text mining results, you may end up filtering out all terms in the target field for some documents. That is actually a desirable result, as it indicates that those documents are noise. However, when you then attempt to dump the vectors of that index, the noise documents cause an IllegalStateException, and nothing in it indicates that the issue was due to the custom analyzer.

I believe, at least in my situation, a better approach is for the LuceneIterator to log a warning with the idField when it encounters a problem document and move on to the next one.

Thanks, Chris
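
For illustration only, here is a minimal sketch of the skip-and-warn behaviour described above. It is not the patch attached to MAHOUT-675 and it does not use the real Lucene or Mahout classes: `Doc`, `SkippingIterator`, and `idsWithVectors` are hypothetical stand-ins, and a null `terms` list plays the role of a null term vector.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.logging.Logger;

public class SkipNullVectors {
    private static final Logger LOG = Logger.getLogger(SkipNullVectors.class.getName());

    /** Hypothetical stand-in for an indexed document: an id plus a term vector that may be null. */
    static final class Doc {
        final String id;
        final List<String> terms; // null when the analyzer filtered out every term

        Doc(String id, List<String> terms) {
            this.id = id;
            this.terms = terms;
        }
    }

    /** Wraps a document iterator and skips documents whose term vector is null. */
    static final class SkippingIterator implements Iterator<Doc> {
        private final Iterator<Doc> source;
        private Doc next; // look-ahead slot; null means "not yet fetched"

        SkippingIterator(Iterator<Doc> source) {
            this.source = source;
        }

        @Override
        public boolean hasNext() {
            while (next == null && source.hasNext()) {
                Doc candidate = source.next();
                if (candidate.terms == null) {
                    // Log the id and move on instead of throwing IllegalStateException.
                    LOG.warning("Skipping document with null term vector, id=" + candidate.id);
                } else {
                    next = candidate;
                }
            }
            return next != null;
        }

        @Override
        public Doc next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            Doc result = next;
            next = null;
            return result;
        }
    }

    /** Collects the ids of all documents that survive the skip. */
    static List<String> idsWithVectors(List<Doc> docs) {
        List<String> ids = new ArrayList<>();
        Iterator<Doc> it = new SkippingIterator(docs.iterator());
        while (it.hasNext()) {
            ids.add(it.next().id);
        }
        return ids;
      }
}
```

With this shape, a dump over documents "a", "b", "c" where "b" has no terms would yield "a" and "c" and leave one warning in the log naming "b", which makes it easier to trace the gap back to the analyzer.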
