RE: Search performance with one index vs. many indexes

2005-02-28 Thread Runde, Kevin
Follow-up to the article from Friday

-Original Message-
From: Morus Walter [mailto:[EMAIL PROTECTED] 
Sent: Monday, February 28, 2005 1:30 AM
To: Lucene Users List
Subject: Re: Search performance with one index vs. many indexes

Jochen Franke writes:
 Topic: Search performance with large numbers of indexes vs. one large index
 
 
 My questions are:
 
 - Is the size of the wordlist the problem?
 - Would we be a lot faster if we had a smaller number of files per index?

Sure.
Look:
Index lookup of a word is O(ln(n)), where n is the number of words.
Index lookup of a word in k indexes of m words each is O(k ln(m)).
In the best case all word lists are distinct (purely theoretical),
that is, n = k*m or m = n/k.
For n = 15 million, k = 800:
ln(n) = 16.5
k*ln(n/k) = 7871
In a realistic case, m is much bigger since word lists won't be
distinct.
But it's the linear factor k that bites you.
In the worst case (all words in all indices) you have
k*ln(n) = 13218.8
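The arithmetic above can be reproduced in a few lines of Java. This is only a sketch of the cost model in this message: it assumes each term lookup costs ln(size of the term list) and ignores constant factors, caching and disk I/O. The class and method names are made up for illustration and are not Lucene APIs.

```java
// Back-of-the-envelope cost model from the message above (hypothetical names):
// one index with n terms costs about ln(n) per term lookup,
// k indexes with m terms each cost about k * ln(m).
public class TermLookupCost {

    // Approximate cost of one term lookup in a single index of n terms.
    static double singleIndex(double n) {
        return Math.log(n); // natural logarithm
    }

    // Approximate cost of looking the same term up in k indexes of m terms each.
    static double manyIndexes(int k, double m) {
        return k * Math.log(m);
    }

    public static void main(String[] args) {
        double n = 15_000_000; // 15 million distinct terms overall
        int k = 800;           // number of separate indexes

        // One big index.
        System.out.printf("ln(n)     = %.1f%n", singleIndex(n));
        // Best case: term lists are disjoint, so each index holds n/k terms.
        System.out.printf("k*ln(n/k) = %.1f%n", manyIndexes(k, n / k));
        // Worst case: every term occurs in every index, so each holds n terms.
        System.out.printf("k*ln(n)   = %.1f%n", manyIndexes(k, n));
    }
}
```

The printed values round to the 16.5, 7871 and 13218.8 quoted above. Even in the best case, 800 indexes cost several hundred times more per term lookup than one index, which is the linear factor k at work.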

HTH
Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





RE: Search performance with one index vs. many indexes

2005-02-28 Thread Runde, Kevin
Hi All,

Sorry about that; please disregard my last email. I must not be fully
awake yet.

Sorry,
Kevin Runde 

-Original Message-
From: Runde, Kevin [mailto:[EMAIL PROTECTED] 
Sent: Monday, February 28, 2005 7:34 AM
To: Lucene Users List
Subject: RE: Search performance with one index vs. many indexes




Query Tuning

2005-02-21 Thread Runde, Kevin
Hi All,

How does Lucene handle multi-term queries? Does it use short-circuiting?
For example, if a user entered:
(a OR b) AND c
but my program knew that testing for c is cheaper than testing for
(a OR b), and I rewrote the query as:
c AND (a OR b)
would the query run faster?
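For context, one common way to evaluate an AND over sorted lists of document IDs is a leapfrog (skip-based) intersection, sketched below in Java. This is a generic illustration, not Lucene's actual scorer code, and it does not show whether Lucene reorders or short-circuits clauses; the class name, method names and posting lists are hypothetical. In this sketch the loop treats its two inputs symmetrically, so (a OR b) AND c and c AND (a OR b) visit the same documents in the same number of steps. What helps is that the list that is behind jumps ahead by binary search, so a rare clause like c lets the intersection skip most of the other list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of conjunctive query evaluation over posting lists.
public class ConjunctionSketch {

    // Index of the first element in list[from..] that is >= target,
    // or list.length if there is none (found by binary search).
    static int advance(int[] list, int from, int target) {
        int r = Arrays.binarySearch(list, from, list.length, target);
        return r >= 0 ? r : -r - 1;
    }

    // AND: leapfrog intersection of two sorted, duplicate-free doc-ID lists.
    // Whichever list is behind skips ahead to the other's current doc ID.
    static int[] and(int[] x, int[] y) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < x.length && j < y.length) {
            if (x[i] == y[j]) {
                out.add(x[i]);
                i++;
                j++;
            } else if (x[i] < y[j]) {
                i = advance(x, i, y[j]);
            } else {
                j = advance(y, j, x[i]);
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    // OR: sorted union of two doc-ID lists (simple, not the fastest approach).
    static int[] or(int[] x, int[] y) {
        TreeSet<Integer> docs = new TreeSet<>();
        for (int d : x) docs.add(d);
        for (int d : y) docs.add(d);
        return docs.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] a = {1, 4, 7, 10, 13, 16, 19}; // hypothetical postings for a
        int[] b = {2, 4, 8, 16, 32};         // hypothetical postings for b
        int[] c = {4, 32};                   // hypothetical postings for rare term c

        int[] q1 = and(or(a, b), c); // (a OR b) AND c
        int[] q2 = and(c, or(a, b)); // c AND (a OR b)
        System.out.println(Arrays.toString(q1) + " " + Arrays.toString(q2));
    }
}
```

Both orderings print [4, 32]. Whether clause order affects speed inside Lucene itself is the open question above, and this sketch does not answer it.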

Sorry if this has already been answered, but for some reason the archive
search is not working for me today.

Thanks,
Kevin