Hi,
Do you optimize the index?
Have you tried implementing your own hit collector?
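
By a hit collector I mean something along these lines. This is only a rough sketch in plain Java, not tested against Lucene: the class name IndexOrderCollector and the page() helper are made up for illustration, and the class does not extend Lucene's HitCollector, it only mirrors the collect(int doc, float score) callback so it can be compiled on its own. It also assumes the documents were added to the index in alphabetical order and that document ids still reflect that order, so walking the matching ids in ascending order yields the names alphabetically without any String sort.

```java
import java.util.BitSet;

/** Sketch: remembers matching document ids and replays them in index order. */
final class IndexOrderCollector {
    private final BitSet matches = new BitSet();

    // Same shape as the hit collector callback; the score is deliberately ignored.
    public void collect(int doc, float score) {
        matches.set(doc);
    }

    /** Number of distinct documents collected so far. */
    public int size() {
        return matches.cardinality();
    }

    /**
     * Returns up to 'count' document ids in ascending order, starting at hit
     * number 'offset'. Both arguments are assumed to be non-negative.
     */
    public int[] page(int offset, int count) {
        int available = Math.max(0, matches.cardinality() - offset);
        int[] result = new int[Math.min(count, available)];
        int doc = matches.nextSetBit(0);
        // Skip the hits that belong to earlier pages.
        for (int skipped = 0; skipped < offset && doc >= 0; skipped++) {
            doc = matches.nextSetBit(doc + 1);
        }
        // Copy this page; ascending id order is index order.
        for (int i = 0; i < result.length; i++) {
            result[i] = doc;
            doc = matches.nextSetBit(doc + 1);
        }
        return result;
    }
}
```

In a real setup the collect method would sit in a subclass of the Lucene collector class passed to the search call, and only the ids returned by page() would need their stored fields loaded, which keeps the cost per page small even for thousands of hits.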
Stefan
On 25.01.2005 at 01:01, Peter Hollas wrote:
I am working on a publicly accessible, Struts-based species database
project where the number of species names is currently at 2.3 million,
and in the near future will be somewhere nearer 4 million (probably
the largest there is). The species names are typically 1 to 7 words in
length, and the broad requirement is to be able to do a fulltext
search across them. It is also necessary to sort the results into
alphabetical order by species name.
Currently we can issue a simple search query and expect a response
back in about 0.2 seconds (~3,000 results) with the Lucene index that
we have built. Lucene gives a much more predictable and faster average
query time than standard fulltext indexing with MySQL. However, it
returns results in score order, not alphabetically.
To sort the resultset into alphabetical order, we added the species
names as a separate keyword field, and sorted using it whilst
querying. This solution works fine, but is unacceptable since a query
that returns thousands of results can take upwards of 30 seconds to
sort them.
My question is whether it is possible to somehow return the names in
alphabetical order without using a String SortField. My last resort
will be to perform a monthly index rebuild, and return results by
index order (about a day to re-index!). But ideally there might be a
way to modify the Lucene API to incorporate a scoring system in a way
that scores by lexical order.
Any ideas are appreciated!
Many thanks, Peter.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net