Re: Sorting date stored in milliseconds time

2005-02-27 Thread Morus Walter
Ben writes:
> 
> I store my date in milliseconds, how can I do a sort on it? SortField
> has INT, FLOAT and STRING. Do I need to create a new sort class, to
> sort the long value?
> 
Why do you need that precision?
Remember: there's a price to pay. The memory required for sorting and
the time to set up the sort cache depend on the number of distinct terms,
dates in your case.
I can hardly think of an application where even seconds are relevant. What do
you need milliseconds for?
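
One possible way to act on this, as a minimal sketch: reduce each timestamp
to day resolution before indexing, so the sort field holds at most one distinct
term per day and fits the INT type mentioned in the question. The helper name
to_day_key and the yyyymmdd encoding are assumptions made for illustration;
they are not part of Lucene, and Python is used only to show the arithmetic.

```python
from datetime import datetime, timezone


def to_day_key(millis: int) -> int:
    """Map a millisecond epoch timestamp to a yyyymmdd integer (UTC).

    Hypothetical helper: all timestamps that fall on the same UTC day
    map to the same key, so the number of distinct sort terms drops.
    """
    # Integer division avoids float rounding on large millisecond values.
    dt = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
    return dt.year * 10000 + dt.month * 100 + dt.day


if __name__ == "__main__":
    start_of_day = 1109462400000  # 2005-02-27 00:00:00 UTC in milliseconds
    # One sample per minute over a full day: 1440 distinct millisecond values.
    samples = [start_of_day + minute * 60_000 for minute in range(24 * 60)]
    keys = {to_day_key(ms) for ms in samples}
    print(len(set(samples)), "distinct millisecond values")
    print(len(keys), "distinct day keys:", sorted(keys))
```

The yyyymmdd form sorts chronologically as a plain integer and stays well
below the 32-bit signed limit. If finer resolution is really needed, the same
idea works with hours or minutes, at the cost of more distinct terms.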

Morus

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: help with boolean expression

2005-02-27 Thread Morus Walter
Omar Didi writes:
> I have a problem understanding how Lucene would interpret this boolean
> expression: A AND B OR C.
> It returns neither the same count as when I enter (A AND B) OR C nor as
> A AND (B OR C).
> If anyone knows how it is interpreted I would be thankful.
> Thanks

A AND B OR C creates a query that requires A and B. C influences the
score, but is neither sufficient nor required for a match.

IMO the query parser is broken for queries mixing AND and OR without explicit
parentheses.
My favorite sample is `a AND b OR c AND d', which the query parser treats as
`a AND b AND c AND d'.
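
The behavior described above can be reproduced with a small toy model. This is
a sketch written for illustration, not the query parser's actual code: it
assumes that AND marks both of its neighbouring terms as required and that OR
marks nothing. The names parse, render and matches are hypothetical.

```python
def parse(query: str) -> list:
    """Turn 'a AND b OR c' into a flat list of [term, required] clauses.

    Toy model: AND marks the previous and the next term as required,
    OR leaves its neighbours optional. No grouping is ever created.
    """
    clauses = []
    next_required = False
    for token in query.split():
        if token == "AND":
            if clauses:
                clauses[-1][1] = True  # term before AND becomes required
            next_required = True       # term after AND becomes required
        elif token == "OR":
            next_required = False
        else:
            clauses.append([token, next_required])
            next_required = False
    return clauses


def render(clauses: list) -> str:
    """Show the clauses with '+' in front of each required term."""
    return " ".join(("+" if required else "") + term for term, required in clauses)


def matches(clauses: list, doc_terms: set) -> bool:
    """A document matches if it has every required term; if no term is
    required, it must have at least one of the optional terms."""
    required = [term for term, req in clauses if req]
    if required:
        return all(term in doc_terms for term in required)
    return any(term in doc_terms for term, _ in clauses)


if __name__ == "__main__":
    for q in ("A AND B OR C", "a AND b OR c AND d"):
        print(q, "=>", render(parse(q)))
    print(matches(parse("A AND B OR C"), {"C"}))       # C alone
    print(matches(parse("A AND B OR C"), {"A", "B"}))  # A and B, no C
```

In this model the flat clause list is why neither (A AND B) OR C nor
A AND (B OR C) gives the same count: the parenthesized forms build nested
queries, while the unparenthesized form never does.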

I suggested a patch some time ago, but it's still pending in bugzilla.
http://issues.apache.org/bugzilla/show_bug.cgi?id=25820

I don't know whether it still applies to the current sources.

Morus




Re: Search performance with one index vs. many indexes

2005-02-27 Thread Morus Walter
Jochen Franke writes:
> Topic: Search performance with large numbers of indexes vs. one large index
> 
> 
> My questions are:
> 
> - Is the size of the "wordlist" the problem?
> - Would we be a lot faster, when we have a smaller number
> of files per index?

Sure.
Look:
Index lookup of a word is O(ln(n)), where n is the number of words.
Index lookup of a word in k indexes of m words each is O(k ln(m)).
In the best case all word lists are distinct (purely theoretical),
that is, n = k*m or m = n/k.
For n = 15 million, k = 800:
ln(n) = 16.5
k*ln(n/k) = 7871
In a realistic case, m is much bigger since the word lists won't be distinct.
But it's the linear factor k that bites you.
In the worst case (all words in all indexes) you have
k*ln(n) = 13218.8
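
The figures above can be checked with a few lines of arithmetic. This is only
a sketch of the cost model from this message (k lookups of ln(m) each); the
function name lookup_cost is an assumption made for illustration, and the
result is a relative cost, not a measured time.

```python
import math


def lookup_cost(num_indexes: int, words_per_index: float) -> float:
    """Relative cost of looking up one word in every index: k * ln(m)."""
    return num_indexes * math.log(words_per_index)


n = 15_000_000  # total number of distinct words
k = 800         # number of separate indexes

single = lookup_cost(1, n)    # one big index
best = lookup_cost(k, n / k)  # k indexes with disjoint word lists
worst = lookup_cost(k, n)     # k indexes, each holding all n words

if __name__ == "__main__":
    print(f"one index:        {single:.1f}")
    print(f"k indexes, best:  {best:.0f}")
    print(f"k indexes, worst: {worst:.1f}")
    print(f"best case is about {best / single:.0f} times the single-index cost")
```

Even in the purely theoretical best case the split setup costs several hundred
times as much per term lookup, which is the linear factor k showing through.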

HTH
Morus
