RE: Lucene Speed under diff JVMs

2002-12-06 Thread Jonathan Reichhold
It doesn't surprise me that the IBM JDK is faster at indexing.  In my
experience, this JVM is better optimized for that case.

I did some serious load testing with various JVM implementations from Sun
and IBM and found the opposite when it came to searching.  I.e.
Lucene searches were fastest under Sun 1.4.1.  This JVM was consequently
able to handle a higher load (faster response increases queries/second).
IBM was drastically slower at handling queries.  I've never tried
JRockit since I don't like the cost.

The index for my tests had 7 million records and 6 major fields.  Queries
were randomly chosen from a list of 2 million real user queries.  The
query load was meant to simulate real loads from a production site.
This was all accomplished on a single 1U, Redhat Linux 7.2, 2-processor
box with 1 GB of RAM.  Query times were very good compared to previous
indexing methods.
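
A rough sketch of the kind of query replay described above: pick queries at random from a list, run each one, and turn the elapsed time into queries/second. This is not the harness used for the test. The class name, the `search` callback, and the tiny query list are stand-ins for illustration, it is single-threaded (a real load test would probably use several concurrent clients), and it is written for a current JDK rather than the 1.3/1.4 JVMs discussed in this thread.

```java
import java.util.List;
import java.util.Random;
import java.util.function.Consumer;

/** Replays randomly chosen queries and reports throughput (illustrative only). */
final class QueryLoadSketch {

    /**
     * Runs `count` queries picked uniformly at random from `queries`
     * and returns the elapsed wall-clock time in nanoseconds.
     * `search` is a hypothetical stand-in for the real search call.
     */
    static long replay(List<String> queries, int count, long seed, Consumer<String> search) {
        Random random = new Random(seed);
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            search.accept(queries.get(random.nextInt(queries.size())));
        }
        return System.nanoTime() - start;
    }

    /** Converts a query count and elapsed nanoseconds into queries/second. */
    static double queriesPerSecond(int count, long elapsedNanos) {
        return count / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        // Made-up queries; the real test drew from 2 million user queries.
        List<String> queries = List.of("lucene", "jvm speed", "index merge");
        int[] executed = new int[1];
        // The callback only counts invocations; plug a real searcher in here.
        long elapsed = replay(queries, 10_000, 42L, q -> executed[0]++);
        System.out.println(executed[0] + " queries, "
                + queriesPerSecond(executed[0], elapsed) + " queries/second");
    }
}
```

Fixing the random seed keeps the query sequence identical across JVMs, so each JVM is measured against the same workload.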

Jonathan

-----Original Message-----
From: Armbrust, Daniel C. [mailto:[EMAIL PROTECTED]] 
Sent: Thursday, December 05, 2002 2:47 PM
To: 'Lucene Users List'
Subject: Lucene Speed under diff JVMs


This may be of use to people who want to make lucene index faster.
Also, I'm curious as to what JVM most people run Lucene under, and if
anyone else has seen results like this:

I'm using the class that Otis wrote (see message from about 3 weeks ago)
for testing the scalability of lucene (more results on that later) and I
first tried running it under different versions of Java, to see where it
runs the fastest.  The class simply creates an index out of randomly
generated documents. 
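
When comparing runs across JVMs, it can help to have the benchmark print which runtime it is actually running under, so each timing is tied to a JVM. A minimal sketch using the standard system properties (the class name is made up for this example, and this is not part of the class Otis wrote):

```java
/** Prints the JVM identity so benchmark numbers can be tied to a runtime. */
final class JvmInfo {

    /** Builds a one-line description from standard Java system properties. */
    static String describe() {
        return System.getProperty("java.vendor")
                + " | java.version=" + System.getProperty("java.version")
                + " | " + System.getProperty("java.vm.name")
                + " (build " + System.getProperty("java.vm.version") + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

This reports roughly the same information as `java -version`, but from inside the process being timed.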

All of the following were running on a dual CPU 1 GHz PIII Windows 2000
machine that wasn't doing much else during the benchmark.  The indexing
program was single threaded, so it only used one of the processors of
the machine.

java version "1.3.1_04"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1_04-b02)
Java HotSpot(TM) Client VM (build 1.3.1_04-b02, mixed mode)

42 seconds/1000 documents

java version "1.4.1"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-b21)
Java HotSpot(TM) Client VM (build 1.4.1-b21, mixed mode)

42 seconds/1000 documents

Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1_01)
BEA WebLogic JRockit(R) Virtual Machine (build
8.0_Beta-1.4.1_01-win32-CROSIS-20021105-1617, Native Threads,
Generational Concurrent Garbage Collector)

35 seconds/1000 documents

java version "1.3.1"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.3.1)
Classic VM (build 1.3.1, J2RE 1.3.1 IBM Windows 32 build cn131-20020403 (JIT
enabled: jitc))

27 seconds/1000 documents


As you can see, the IBM jvm pretty much smoked Sun's.  And beat out
JRockit as well.  Just a hunch, but it wouldn't surprise me if search
times were also faster under the IBM jdk.  Has anyone else come to this
conclusion?
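
To put the four figures above on a common scale, here is a small sketch that converts seconds/1000 documents into documents/second and into time saved relative to the first Sun run. The class and method names are invented for this example. Only the four timings come from the measurements above.

```java
import java.util.Locale;

/** Converts the posted seconds/1000-documents figures into comparable rates. */
final class IndexingRates {

    /** Documents indexed per second, given seconds per 1000 documents. */
    static double docsPerSecond(double secondsPer1000Docs) {
        return 1000.0 / secondsPer1000Docs;
    }

    /** Percentage of time saved by `candidateSeconds` relative to `baselineSeconds`. */
    static double percentLessTime(double baselineSeconds, double candidateSeconds) {
        return 100.0 * (baselineSeconds - candidateSeconds) / baselineSeconds;
    }

    public static void main(String[] args) {
        String[] jvms = {"Sun 1.3.1_04", "Sun 1.4.1", "JRockit 8.0 beta", "IBM 1.3.1"};
        double[] secondsPer1000 = {42, 42, 35, 27}; // figures posted above
        double baseline = secondsPer1000[0];
        for (int i = 0; i < jvms.length; i++) {
            System.out.println(String.format(Locale.ROOT,
                    "%-18s %5.1f docs/s  %5.1f%% less time than %s",
                    jvms[i],
                    docsPerSecond(secondsPer1000[i]),
                    percentLessTime(baseline, secondsPer1000[i]),
                    jvms[0]));
        }
    }
}
```

On the posted figures, the IBM run takes about 36% less time per 1000 documents than either Sun run (15 seconds saved out of 42).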


Dan

--
To unsubscribe, e-mail:

For additional commands, e-mail:








RE: Lucene Speed under diff JVMs

2002-12-06 Thread Armbrust, Daniel C.
Class that was used (attached)

And correction, the UnStored field had 1000 words, not 500.

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]] 
Sent: Friday, December 06, 2002 10:57 AM
To: Lucene Users List
Subject: RE: Lucene Speed under diff JVMs


Otis doesn't mind.

-

One more bit of info that I should have included:

The randomly generated documents consisted of 2 fields, one Text with 3 words, and one 
UnStored with 500 words.  Average word length was 7 characters.

If Otis (he wrote it, I just made a tweak or two) doesn't mind, I'll post the source 
code.

Dan




Words2Index.java
Description: Binary data


RE: Lucene Speed under diff JVMs

2002-12-06 Thread Otis Gospodnetic
Otis doesn't mind.

--- "Armbrust, Daniel C." <[EMAIL PROTECTED]> wrote:
> One more bit of info that I should have included:
> 
> The randomly generated documents consisted of 2 fields, one Text with
> 3 words, and one UnStored with 500 words.  Average word length was 7
> characters.
> 
> If Otis (he wrote it, I just made a tweak or two) doesn't mind, I'll
> post the source code.
> 
> Dan
> 
> 







RE: Lucene Speed under diff JVMs

2002-12-06 Thread Armbrust, Daniel C.
One more bit of info that I should have included:

The randomly generated documents consisted of 2 fields, one Text with 3 words, and one 
UnStored with 500 words.  Average word length was 7 characters.

If Otis (he wrote it, I just made a tweak or two) doesn't mind, I'll post the source 
code.
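
For anyone who wants to try something similar before the real source is posted, here is a rough sketch of generating documents with the shape described above: a 3-word field and a 500-word field, with words averaging 7 characters. This is not Words2Index.java. The class name and the uniform 4 to 10 character word length (which averages 7) are assumptions made for this example, and the step that hands the strings to Lucene as Text and UnStored fields is left out so the sketch runs without the Lucene jar.

```java
import java.util.Random;

/** Generates random text shaped like the benchmark documents (illustrative only). */
final class RandomDocs {

    /** Returns a lowercase word of 4 to 10 letters, so the mean length is 7. */
    static String randomWord(Random random) {
        int length = 4 + random.nextInt(7); // assumption: uniform over 4..10
        StringBuilder word = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            word.append((char) ('a' + random.nextInt(26)));
        }
        return word.toString();
    }

    /** Returns `wordCount` random words separated by single spaces. */
    static String randomText(Random random, int wordCount) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < wordCount; i++) {
            if (i > 0) {
                text.append(' ');
            }
            text.append(randomWord(random));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        Random random = new Random(1L);
        String shortField = randomText(random, 3);   // would become the Text field
        String longField = randomText(random, 500);  // would become the UnStored field
        System.out.println("short field: " + shortField);
        System.out.println("long field:  " + longField.length() + " characters");
    }
}
```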

Dan






RE: Lucene Speed under diff JVMs

2002-12-06 Thread Armbrust, Daniel C.
To clarify (which means adding the info I should have put in the first time but 
missed), each run indexed 40,000 documents.  The seconds/1000 documents figure was an 
average over that run.

Each run was done twice (and the results were identical).

And the machine was a dual processor machine, so most OS tasks ran on the idle 
processor, while the indexing process gobbled up the other one.

And I'm definitely not trying to say one JVM is better than another, but for this task 
of creating a lucene index, there is a very noticeable speed difference.  I was really 
just curious if anyone else had done any tests similar to this.

Dan




> As you can see, the IBM jvm pretty much smoked Suns.  And beat out
> JRockit as well.  Just a hunch, but it wouldn't surprise me if search
> times were also faster under the IBM jdk.  Has anyone else come to this
> conclusion?

Just a brief note on performance measurements and statistical sampling: no
offense, but if these are measurements of a single trial of 1000 documents
for each JVM, they're not so different that I'd be willing to conclude
that one JVM is notably faster for this task than another.  The problem is
compounded by the fact that it can be hard to tell just how much CPU is
being taken up by OS tasks (and this can fluctuate quite a lot).  If you
really want to quote statistics like this, using 5 or 10 trials would give
a more accurate notion of the real performance differences (if any).

Casuistically :),

Joshua O'Madadhain

  [EMAIL PROTECTED]   Per Obscurius   www.ics.uci.edu/~jmadden
   Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
It's that moment of dawning comprehension that I live for.  -- Bill Watterson
 My opinions are too rational and insightful to be those of any organization.









Re: Lucene Speed under diff JVMs

2002-12-05 Thread Joshua O'Madadhain
On Thu, 5 Dec 2002, Armbrust, Daniel C. wrote:

> I'm using the class that Otis wrote (see message from about 3 weeks ago)
> for testing the scalability of lucene (more results on that later) and I
> first tried running it under different versions of Java, to see where it
> runs the fastest.  The class simply creates an index out of randomly
> generated documents.
>
> All of the following were running on a dual CPU 1 GHz PIII Windows 2000
> machine that wasn't doing much else during the benchmark.  The indexing
> program was single threaded, so it only used one of the processors of
> the machine.

[snip specific measurements]

> As you can see, the IBM jvm pretty much smoked Suns.  And beat out
> JRockit as well.  Just a hunch, but it wouldn't surprise me if search
> times were also faster under the IBM jdk.  Has anyone else come to this
> conclusion?

Just a brief note on performance measurements and statistical sampling: no
offense, but if these are measurements of a single trial of 1000 documents
for each JVM, they're not so different that I'd be willing to conclude
that one JVM is notably faster for this task than another.  The problem is
compounded by the fact that it can be hard to tell just how much CPU is
being taken up by OS tasks (and this can fluctuate quite a lot).  If you
really want to quote statistics like this, using 5 or 10 trials would give
a more accurate notion of the real performance differences (if any).
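
As a rough illustration of the point about repeated trials, here is a sketch that summarizes several timings for one JVM as a mean and a sample standard deviation, so the spread within a JVM can be compared with the gap between JVMs. The class name is invented, and the timings in `main` are made-up placeholders, not measurements from this thread.

```java
/** Summarizes repeated benchmark timings (illustrative only). */
final class TrialStats {

    /** Arithmetic mean of the samples. */
    static double mean(double[] samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Sample standard deviation (divides by n - 1); needs at least two samples. */
    static double sampleStdDev(double[] samples) {
        if (samples.length < 2) {
            throw new IllegalArgumentException("need at least two trials");
        }
        double m = mean(samples);
        double sumOfSquares = 0.0;
        for (double s : samples) {
            sumOfSquares += (s - m) * (s - m);
        }
        return Math.sqrt(sumOfSquares / (samples.length - 1));
    }

    public static void main(String[] args) {
        // Hypothetical seconds/1000 documents from five trials on one JVM.
        double[] trials = {27.4, 26.8, 27.9, 27.1, 26.6};
        System.out.println("mean = " + mean(trials)
                + ", sample std dev = " + sampleStdDev(trials));
    }
}
```

If the difference between two JVMs' means is small compared with the standard deviations, the trials do not support calling one of them faster.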

Casuistically :),

Joshua O'Madadhain

  [EMAIL PROTECTED]   Per Obscurius   www.ics.uci.edu/~jmadden
   Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
It's that moment of dawning comprehension that I live for.  -- Bill Watterson
 My opinions are too rational and insightful to be those of any organization.









Re: Lucene Speed under diff JVMs

2002-12-05 Thread Leo Galambos
On Thu, 5 Dec 2002, Armbrust, Daniel C. wrote:

> I'm using the class that Otis wrote (see message from about 3 weeks ago)
> for testing the scalability of lucene (more results on that later) and I

May I ask you where one can get the source code? I cannot find it in 
archive. Thank you

-g-


