Topic: Search performance with large numbers of indexes vs. one large index
Hello,
We are experiencing a performance problem when searching across a large number of indexes.
We have an application with about:
- 6 million documents
- one index of about 7 GB
- probably 10 to 15 million different words in that index
Creating the index from the single DB that holds the documents takes about 20 hours on a machine with two processors.
For several reasons (e.g. parallelizing the index creation), we created several indexes, by splitting the documents into logical groups.
We first created an artificial benchmark:
- 10 million documents
- 500 indexes (with about 3 files per index)
- 10 GB index altogether
- about 5,000 randomly selected words
Querying these indexes took about 0.4 s per query, only twice as long as querying a single index, which was fine for us.
We did the same with one index merged out of the 500 indexes.
The Lucene search performance was fine here as well (about 0.2 s per query on our machine).
We then implemented the "real thing", which is:
- 6 million documents
- 800 indexes (with about 28 files per index)
- about 7 GB index size
- probably 10 to 15 million different words in that index
We now have a query performance of 4-8 seconds per query.
The test with the real data in one index has not finished yet.
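For reference, here is a quick back-of-the-envelope comparison of the two setups. It uses only the figures given above and treats the "about" values as exact, so it is plain arithmetic rather than a measurement, and it says nothing certain about the cause of the slowdown. The dictionary layout and the `total_files` helper are just illustrative names.

```python
# Figures copied from the description above; "about" values are treated as exact.
benchmark = {"indexes": 500, "files_per_index": 3, "seconds_per_query": (0.4, 0.4)}
real = {"indexes": 800, "files_per_index": 28, "seconds_per_query": (4.0, 8.0)}


def total_files(setup):
    """Total number of index files that exist across all indexes."""
    return setup["indexes"] * setup["files_per_index"]


# How much larger the real setup is than the benchmark, by each measure.
index_ratio = real["indexes"] / benchmark["indexes"]
file_ratio = total_files(real) / total_files(benchmark)

# Slowdown range: (best case, worst case) of real vs. benchmark query time.
slowdown = tuple(
    r / b for r, b in zip(real["seconds_per_query"], benchmark["seconds_per_query"])
)

print(f"files: {total_files(benchmark)} -> {total_files(real)}")
print(f"index ratio: {index_ratio:.1f}, file ratio: {file_ratio:.1f}")
print(f"slowdown: {slowdown[0]:.0f}x to {slowdown[1]:.0f}x")
```

The number of indexes grows by a factor of 1.6 and the number of files by a factor of roughly 15 (1,500 to 22,400), while the query time grows by a factor of 10 to 20.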
My questions are:
- Is the size of the "wordlist" the problem?
- Would queries be a lot faster if we had fewer files per index?
- Is 500-1000 still a reasonable number of indexes?
- Is there a more or less linear relationship between the number of indexes and the execution time of the query (as all indexes have to be checked and the results have to be merged)?
- Are there any parameters that could be configured for that use case?
- Should we implement any specialized classes specific to our use case?
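To make the question about the linear relationship concrete, here is a minimal sketch of what we assume a search across many indexes has to do: query every index in turn, then merge the per-index hits into one ranked list. This is plain Python, not Lucene code. The dict-based "indexes", the `(score, doc_id)` hit tuples and the `search_all` function are hypothetical stand-ins for illustration only.

```python
import heapq


def search_all(indexes, term, top_k):
    """Query every index for `term` and merge the hits into a global top_k.

    Each index is modelled as a dict mapping a term to a list of
    (score, doc_id) tuples. This is an assumption made for this sketch.
    """
    per_index_hits = []
    for index in indexes:
        # One lookup per index: this part grows with the number of indexes.
        hits = index.get(term, [])
        # Only the best top_k hits of each index can reach the final result.
        per_index_hits.append(heapq.nlargest(top_k, hits))

    # Merge step: pick the overall best top_k from all per-index candidates.
    candidates = (hit for hits in per_index_hits for hit in hits)
    return heapq.nlargest(top_k, candidates)


indexes = [
    {"lucene": [(0.9, "a1"), (0.2, "a2")]},
    {"lucene": [(0.7, "b1")]},
    {"other": [(0.5, "c1")]},
]
result = search_all(indexes, "lucene", 2)
print(result)
```

In this model every index is visited once per query, even if it contains no hit, so the per-query work has a component proportional to the number of indexes. Whether that dominates in practice is exactly what we are asking.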
Thanks, Jochen Franke