Hi,

We have an index of about 9 gigabytes here at work, and a few queries take
a very long time to complete.

One thing I noticed is that we have a large number of multi-valued fields
(about 50). How does Lucene scale with queries that span a large number of
fields?
Would it be better to use a keyword for each possible value of such a field
and append those keywords to one of the remaining fields? For example, if I
have a protocol field and a field holding the URL, I could append _HTTP_ or
_FTP_ to the url field and omit the protocol field entirely.
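To make this concrete, here is roughly what I mean (the field names and
values are just examples, not our actual schema):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class UrlDocument {

        // Variant A: what we do today, a separate protocol field
        public static Document withProtocolField(String protocol, String url) {
            Document doc = new Document();
            doc.add(new Field("protocol", protocol, Field.Store.NO, Field.Index.UN_TOKENIZED));
            doc.add(new Field("url", url, Field.Store.YES, Field.Index.TOKENIZED));
            return doc;
        }

        // Variant B: omit the protocol field and append a marker keyword to
        // the url field instead (assumes the analyzer keeps a token like
        // "_HTTP_" intact so it can be searched as a single term)
        public static Document withMergedField(String protocol, String url) {
            Document doc = new Document();
            doc.add(new Field("url", url + " _" + protocol.toUpperCase() + "_",
                    Field.Store.YES, Field.Index.TOKENIZED));
            return doc;
        }
    }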
An even better solution would probably be to split the index into multiple
indexes with disjoint entries: in this case, one index with all the HTTP URLs
and one with all the FTP URLs, since I know I never need both protocol types
in the same query.
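Something like this is what I have in mind for the split (the paths and the
protocol check are placeholders):

    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;

    public class SplitIndexer {

        private final IndexWriter httpWriter;
        private final IndexWriter ftpWriter;

        public SplitIndexer(String httpPath, String ftpPath) throws IOException {
            // one physical index per protocol; a search then only has to
            // open the index it actually needs
            httpWriter = new IndexWriter(httpPath, new StandardAnalyzer(), true);
            ftpWriter = new IndexWriter(ftpPath, new StandardAnalyzer(), true);
        }

        public void add(String protocol, Document doc) throws IOException {
            if ("http".equalsIgnoreCase(protocol)) {
                httpWriter.addDocument(doc);
            } else {
                ftpWriter.addDocument(doc);
            }
        }

        public void close() throws IOException {
            httpWriter.close();
            ftpWriter.close();
        }
    }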

Another thing I noticed is that we append a lot of queries to each other, so
we end up with many duplicate clauses, e.g. (A AND B OR C) AND ... AND
(A AND B OR C) (in reality more deeply nested than that). Does Lucene do any
internal boolean query optimization (e.g. Karnaugh-map style simplification)
that would drop the redundant second (A AND B OR C), or do I have to do that
myself?
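For reference, this is roughly how our appended queries end up being built
(field and term names are placeholders):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class AppendedQueryExample {

        // the (A AND B OR C) building block
        static Query aAndBOrC() {
            BooleanQuery aAndB = new BooleanQuery();
            aAndB.add(new TermQuery(new Term("f", "a")), BooleanClause.Occur.MUST);
            aAndB.add(new TermQuery(new Term("f", "b")), BooleanClause.Occur.MUST);

            BooleanQuery or = new BooleanQuery();
            or.add(aAndB, BooleanClause.Occur.SHOULD);
            or.add(new TermQuery(new Term("f", "c")), BooleanClause.Occur.SHOULD);
            return or;
        }

        public static void main(String[] args) {
            BooleanQuery full = new BooleanQuery();
            full.add(aAndBOrC(), BooleanClause.Occur.MUST);
            full.add(new TermQuery(new Term("f", "d")), BooleanClause.Occur.MUST);
            full.add(aAndBOrC(), BooleanClause.Occur.MUST); // same clause again

            // Does rewrite()/search collapse the duplicate clause, or should
            // we filter it out ourselves (e.g. by checking Query.equals()
            // before adding a clause)?
            System.out.println(full);
        }
    }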


Thanks for your help.
Thibaut
