On Jul 7, 2005, at 1:12 PM, MariLuz Elola wrote:
Hi Erik, excuse me for all my questions. Thank you very much for
your speedy answers, and sorry for my bad English.
I am Spanish and I don't speak English very well.
Well, I have one more question.
Finally I am using IndexReader to return all the documents:
    Directory directory = FSDirectory.getDirectory(path, false);
    IndexReader reader = IndexReader.open(directory);
    for (int start = base; start < end; start++) {
        Document doc = reader.document(start);
        String id = doc.get(
            es.seinet.xtent.searchEngine.lucene.general.Util.ID);
        ides.add(id);
    }
It works well and is fast. The only problem is that it is impossible
to sort the results by some metadata field (for example, to get all
the documents ordered by title).
If you truly need to have a Query that can find all documents, then
add a special field to each document with a fixed value such as
doc:yes and then do a TermQuery for doc:yes. You could then leverage
Lucene's sorting capability.
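A minimal sketch of that approach, assuming the Lucene 1.4-era API
(Field.Keyword, Hits, Sort) and an in-memory RAMDirectory. The class
name AllDocsSorted, the method titlesInOrder, and the field name
"title" are illustrative only and not taken from the poster's code:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class AllDocsSorted {
    // Indexes one document per title, each carrying the marker field
    // doc:yes, then fetches every document sorted by title.
    static List<String> titlesInOrder(String[] titles) throws Exception {
        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        for (int i = 0; i < titles.length; i++) {
            Document doc = new Document();
            // Fixed-value marker field: the same single term in every document.
            doc.add(Field.Keyword("doc", "yes"));
            // Sort fields must be indexed and untokenized.
            doc.add(Field.Keyword("title", titles[i]));
            writer.addDocument(doc);
        }
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir);
        // A single-term query: no expansion, so no TooManyClauses.
        TermQuery all = new TermQuery(new Term("doc", "yes"));
        Hits hits = searcher.search(all, new Sort("title"));
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < hits.length(); i++) {
            result.add(hits.doc(i).get("title"));
        }
        searcher.close();
        return result;
    }
}
```

Because the query is a single TermQuery, it never expands into a
BooleanQuery, so the clause limit does not come into play however
many documents match.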
My question is about the maxClauseCount parameter. I agree with you:
it is not a good idea to bump up the limit...
If I use the default value (1024) and I search, I get this
error:
[SearchCollection,executeQuery] caught a class
org.apache.lucene.search.BooleanQuery$TooManyClauses
with message: null
Is there any way to search all the documents (210,000 documents)
while internally working with only 1024 clauses, returning documents
up to that limit, without getting the TooManyClauses error? I need to
work efficiently with collections of more than 250,000 records, and
the users normally run complex queries (e.g. DATE:[20050601 TO
20050701] AND TITLE:Lucene* ... etc.)
The issue is that PrefixQuery, WildcardQuery, RangeQuery, and
FuzzyQuery all expand to the terms that match in a BooleanQuery OR
fashion. You need to identify what terms those are and address them
individually. I can't offer specific advice since I don't know what
fields you're using and what values they may contain. But one
example is with dates. If you index dates and do it at the
millisecond granularity but you really only need to query by YEAR
then there is a great chance one of those query types will expand to
TooManyClauses. If, instead, you indexed dates by YYYY when all you
need is year granularity then you have far fewer terms. I hope this
makes sense and helps.
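To make the granularity point concrete, here is a rough sketch in
plain Java (no Lucene involved). It counts how many distinct terms a
set of timestamps produces at different granularities, on the
assumption that a range covering all of them expands to one clause
per distinct term. The class name DateGranularity, the method
distinctTerms, and the sample data (5000 documents spaced 7 hours
apart from 2000-01-01) are made up for illustration:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.TreeSet;

public class DateGranularity {
    // Default BooleanQuery clause limit mentioned in this thread.
    static final int MAX_CLAUSE_COUNT = 1024;

    // Counts the distinct index terms produced when docCount timestamps,
    // spaced stepHours apart, are formatted with the given pattern.
    static int distinctTerms(String pattern, int docCount, int stepHours) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern);
        LocalDateTime first = LocalDateTime.of(2000, 1, 1, 0, 0);
        TreeSet<String> terms = new TreeSet<>();
        for (int i = 0; i < docCount; i++) {
            terms.add(first.plusHours((long) i * stepHours).format(fmt));
        }
        return terms.size();
    }

    public static void main(String[] args) {
        String[] patterns = {"yyyyMMddHHmmssSSS", "yyyyMMdd", "yyyy"};
        for (String pattern : patterns) {
            int n = distinctTerms(pattern, 5000, 7);
            System.out.println(pattern + ": " + n + " terms, "
                + (n > MAX_CLAUSE_COUNT ? "exceeds" : "within")
                + " the limit");
        }
    }
}
```

With this sample data, millisecond granularity yields 5000 distinct
terms and even day granularity exceeds 1024, while year granularity
yields only 4.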
Erik