Ever since I started using Lucene, I have found answers to all of my questions in the archive.
But I need help with the following two.

1. I am using the MoreLikeThis class and cannot figure out why not all terms are retrieved when I use like() to generate queries. I extract the terms from a document using getTermFreqVectors(i) and get about 1160 terms. But when I build the query using like() on the exact same reader, I get only about 760 terms in the query. I set up the field names and stopwords correctly, along with the following:

mlt.setAnalyzer(ANALYZER); // ANALYZER is a snowball analyzer, the same one I created the index with
mlt.setMinDocFreq(0);
mlt.setMinTermFreq(0);
mlt.setMaxQueryTerms(2000);

I was also trying to understand the logic behind the order in which the terms appear in the query returned by like(), but it seems random (relative to the termFreqVectors, which are strictly sorted).

2. Not related to 1: I need to build a crawler for my project, and was wondering if there are any suggestions for a convenient API (since LARM is no longer available).

Any advice will be highly appreciated!