RE: Search performance with one index vs. many indexes
Hi All,

Sorry about that. Please disregard that last email. I must not be fully awake yet.

Sorry,
Kevin Runde

-----Original Message-----
From: Runde, Kevin [mailto:[EMAIL PROTECTED]
Sent: Monday, February 28, 2005 7:34 AM
To: Lucene Users List
Subject: RE: Search performance with one index vs. many indexes

Follow Up to the article from Friday

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
RE: Search performance with one index vs. many indexes
Follow Up to the article from Friday

-----Original Message-----
From: Morus Walter [mailto:[EMAIL PROTECTED]
Sent: Monday, February 28, 2005 1:30 AM
To: Lucene Users List
Subject: Re: Search performance with one index vs. many indexes
Re: Search performance with one index vs. many indexes
Jochen Franke writes:
> Topic: Search performance with large numbers of indexes vs. one large index
>
> My questions are:
>
> - Is the size of the "wordlist" the problem?
> - Would we be a lot faster, when we have a smaller number
>   of files per index?

Sure. Look:
Index lookup of a word is O(ln(n)), where n is the number of words.
Index lookup of a word in k indexes having m words each is O(k ln(m)).
In the best case all word lists are distinct (purely theoretical), that is,
n = k*m or m = n/k.

For n = 15 million, k = 800:
  ln(n)     = 16.5
  k*ln(n/k) = 7871

In a realistic case, m is much bigger since word lists won't be distinct.
But it's the linear factor k that bites you.
In the worst case (all words in all indices) you have
  k*ln(n)   = 13218.8

HTH
Morus
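For anyone who wants to plug in their own numbers, the arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the cost model from the message, not a measurement of Lucene itself: the function names are made up for illustration, and the results are relative lookup costs (unitless), not timings.

```python
import math


def single_index_cost(n):
    """Relative cost of one term lookup in a single index of n words: ln(n)."""
    return math.log(n)


def best_case_split_cost(n, k):
    """k indexes with fully distinct word lists, so each holds m = n/k words."""
    return k * math.log(n / k)


def worst_case_split_cost(n, k):
    """k indexes where every index contains all n words."""
    return k * math.log(n)


if __name__ == "__main__":
    n = 15_000_000  # total number of words, as in the example above
    k = 800         # number of indexes

    print(f"single index:     {single_index_cost(n):.1f}")
    print(f"k indexes, best:  {best_case_split_cost(n, k):.0f}")
    print(f"k indexes, worst: {worst_case_split_cost(n, k):.1f}")
```

Trying smaller values of k shows that the total cost scales roughly linearly with the number of indexes, while the ln term barely moves.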