I'm wondering if your date field's precision may be a little too
much? What I mean is that you are going all the way down to
seconds. Whenever you do a range query you are essentially spawning
a BooleanQuery with a representation of that range. Do you really
need to be that precise? I usually stick with YYYYMMDD for search
date fields and it works pretty well. So you know, I have a 13 GB
index with 3 million records and my search time is very low.
Definitely under 1 second.
Just a thought.....
On Feb 27, 2008, at 6:14 AM, Jamie wrote:
Hi Michael & Others
Ok. I've gathered some more statistics from a different machine for
your analysis.
(I had to switch machines because the original one was in
production and my tests were interfering).
Here are the statistics from the new machine:
Total Documents: 1.2 million
Results Returned: 900k
Store Size 238G (size of original documents)
Index Size 1.2G (lucene index size)
Index / Store Ratio 0.5%
The search query is as follows:
archivedate:[d20071229010000 TO d20080228235900]
As you can see, I am using a range query to search between specific
dates.
Question: should this query be moved to a filter rather? I did not
do this as I needed to have the option to sort on date.
There are no other specific filters applied and in this example
sorting is turned off.
On this particular machine the search time varies between 2.64
seconds and about 5 seconds.
The limitations of this machine are that it does uses a normal IDE
drive to house the index, not a SATA drive
IOStat Statistics
Linux 2.6.20-15-server 27/02/2008
avg-cpu: %user %nice %system %iowait %steal %idle
20.25 0.00 3.23 0.34 0.00 76.19
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 7.12 50.67 186.41 38936841 143240688
See attached for hardware info and the CPU call tree (taken from
YourKit).
I would appreciate your recommendations.
Jamie
h t wrote:
Hi Michael,
I guess the hotspot of lucene is
org.apache.lucene.search.IndexSearcher.search()
Hi Jamie,
What's the original text size of a million emails?
I estimate the size of an email is around 100k, is this true?
When you doing search, what kind keywords did you input, words or
short
sentence?
How many results return?
Did you use filter to shrink the results size?
2008/2/27, Michael Stoppelman <[EMAIL PROTECTED]>:
So you're saying searches are taking 10 seconds on a 5G index? If
so that
seems ungodly slow.
If you're on *nix, have you watched your iostat statistics? Maybe
something
is hammering your hds.
Something seems amiss.
What lucene methods were pointed to as hotspots by YourKit?
<stats.zip>
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]