RE: Search performance with one index vs. many indexes
Follow Up to the article from Friday

-Original Message-
From: Morus Walter [mailto:[EMAIL PROTECTED]
Sent: Monday, February 28, 2005 1:30 AM
To: Lucene Users List
Subject: Re: Search performance with one index vs. many indexes

Jochen Franke writes:
> Topic: Search performance with large numbers of indexes vs. one large index
>
> My questions are:
> - Is the size of the wordlist the problem?
> - Would we be a lot faster if we had a smaller number of files per index?

Sure. Look:

Index lookup of a word is O(ln(n)), where n is the number of words.
Index lookup of a word in k indexes having m words each is O(k ln(m)).

In the best case all word lists are distinct (purely theoretical), that is
n = k*m, or m = n/k.

For n = 15 million, k = 800:
  ln(n)     = 16.5
  k*ln(n/k) = 7871

In a realistic case, m is much bigger, since word lists won't be distinct.
But it's the linear factor k that bites you.

In the worst case (all words in all indices) you have
  k*ln(n)   = 13218.8

HTH
Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
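The figures quoted in the message above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not a model of how Lucene actually looks up terms: the function names are made up for illustration, and the cost is the unitless k*ln(m) estimate from the message, with constant factors ignored.

```python
import math

def lookup_cost_single(n):
    """Estimated cost of one term lookup in a single index of n words: ln(n)."""
    return math.log(n)

def lookup_cost_split(k, m):
    """Estimated cost of one term lookup across k indexes of m words each: k*ln(m)."""
    return k * math.log(m)

n = 15_000_000  # total number of distinct words (from the message)
k = 800         # number of separate indexes (from the message)

single = lookup_cost_single(n)        # one large index
best = lookup_cost_split(k, n / k)    # best case: word lists fully distinct, m = n/k
worst = lookup_cost_split(k, n)       # worst case: every word in every index, m = n

print(f"one index:              {single:.1f}")
print(f"{k} indexes, best case:  {best:.0f}")
print(f"{k} indexes, worst case: {worst:.1f}")
```

Under these assumptions the worst case is exactly k times the single-index estimate, which is the linear factor the message points at.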
RE: Search performance with one index vs. many indexes
Hi All,

Sorry about that, please disregard that last email. I must not be fully awake yet.

Sorry,
Kevin Runde

-Original Message-
From: Runde, Kevin [mailto:[EMAIL PROTECTED]
Sent: Monday, February 28, 2005 7:34 AM
To: Lucene Users List
Subject: RE: Search performance with one index vs. many indexes

Follow Up to the article from Friday
Query Tuning
Hi All,

How does Lucene handle multi-term queries? Does it use short-circuiting?

Say a user entered:

  (a OR b) AND c

If my program knew that testing for c is cheaper than testing for (a OR b), and I rewrote the query as:

  c AND (a OR b)

would the query run faster?

Sorry if this has already been answered, but for some reason the archive search is not working for me today.

Thanks,
Kevin