Re: AW: feedback: Indexing speed improvement lucene 2.2->2.3.1

Michael McCandless Mon, 24 Mar 2008 01:55:32 -0700


Ivan, can you describe more about your application?

The overall time for indexing has gotten much faster in 2.3, but this is assuming things like retrieving a document from its original source, filtering it, etc., are minimal. If you have an application where most of the time is spent outside Lucene, then the 2.3 speedups won't result in a very large speedup for your application.
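
As a rough illustration of this point, Amdahl's law can be applied to the figures Ivan reports in the quoted message below (documents added 5 times faster, total indexing about 8% faster). This is only a sketch: the class and method names are made up for the example, and it assumes that "8% improvement" means the total run time dropped by 8%.

```java
/** Back-of-the-envelope estimate using Amdahl's law (hypothetical helper, not part of Lucene). */
final class IndexingSpeedupEstimate {

    /**
     * Fraction p of the old total run time that was spent in the accelerated step.
     * Solves  1 - totalReduction = (1 - p) + p / stepSpeedup  for p.
     */
    static double fractionInStep(double stepSpeedup, double totalReduction) {
        return totalReduction / (1.0 - 1.0 / stepSpeedup);
    }

    /** New total time relative to the old one when a fraction p becomes stepSpeedup times faster. */
    static double relativeTotalTime(double p, double stepSpeedup) {
        return (1.0 - p) + p / stepSpeedup;
    }

    public static void main(String[] args) {
        // Assumed inputs: adding documents is 5x faster, total time is 8% lower.
        double p = fractionInStep(5.0, 0.08);
        System.out.println("share of old run time spent adding documents: " + p);
        System.out.println("new total time relative to old: " + relativeTotalTime(p, 5.0));
    }
}
```

Under those assumptions, adding documents accounted for only about 10% of the old total run time, so even a 5x speedup of that step can shave off no more than about 8% overall.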


Mike

Uwe Goetzke wrote:

Hi Ivan,
No, we do not use StandardAnalyzer or StandardTokenizer.

Most data is processed by
        fTextTokenStream = result = new org.apache.lucene.analysis.WhitespaceTokenizer(reader);
        result = new ISOLatin2AccentFilter(result); // ISOLatin1AccentFilter modified so that ö -> oe
        result = new org.apache.lucene.analysis.LowerCaseFilter(result);
        result = new org.apache.lucene.analysis.NGramStemFilter(result, 2); // just a bigram tokenizer
We use our own query parser. The bigrams are searched with a tolerant phrase query, scoring, within a document, the largest bigram clusters that cover the phrase token.
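
To make the effect of the filter chain above concrete, here is a minimal plain-Java sketch of what such a chain might produce. It does not use any Lucene classes, and it is not the code of ISOLatin2AccentFilter or NGramStemFilter, whose implementations are not shown in this thread; the class name, the set of folded characters, and the handling of one-character tokens are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for the chain: whitespace split, accent folding, lower-casing, bigrams. */
final class BigramSketch {

    /** Folds a few German characters to ASCII digraphs (assumed mapping, e.g. o-umlaut -> oe). */
    static String foldAccents(String s) {
        return s.replace("\u00e4", "ae")   // a-umlaut
                .replace("\u00f6", "oe")   // o-umlaut
                .replace("\u00fc", "ue")   // u-umlaut
                .replace("\u00df", "ss");  // sharp s
    }

    /** Emits the overlapping character bigrams of every whitespace-separated token. */
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            // Lower-casing first means only lower-case characters need folding in this sketch.
            String t = foldAccents(token.toLowerCase(Locale.ROOT));
            if (t.length() < 2) {
                if (!t.isEmpty()) {
                    out.add(t); // keep one-character tokens as they are (an assumption)
                }
                continue;
            }
            for (int i = 0; i + 2 <= t.length(); i++) {
                out.add(t.substring(i, i + 2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("M\u00f6bel Tisch"));
    }
}
```

A chain like this emits several short tokens per word, so most of the per-document work happens inside the analysis and indexing steps rather than outside Lucene.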
Best Regards

Uwe

-----Original Message-----
From: Ivan Vasilev [mailto:[EMAIL PROTECTED]
Sent: Friday, 21 March 2008 16:25
To: java-user@lucene.apache.org
Subject: Re: feedback: Indexing speed improvement lucene 2.2->2.3.1

Hi Uwe,
Could you tell me which Analyzer you used when you measured such a big
indexing speedup?
If you use StandardAnalyzer (which uses StandardTokenizer), that may be
the reason. See the second-to-last report in the thread
"IndexingSpeed: 2.3 vs 2.2 (real world numbers)". According to the
reporter, Jake Mannix, this is because StandardTokenizer now uses
StandardTokenizerImpl, which is now generated by JFlex instead of JavaCC.
I am asking because I noticed a great speedup in adding documents to the
index in our system. We time this step in debug mode. NOW THEY ARE
ADDED 5 TIMES FASTER!!!
But at the same time, the total indexing process in our case has
improved by only about 8%. As our system is very big and complex, I am
wondering whether the whole indexing process really has become so much
faster and our own system causes the slowdown, or whether Lucene
performs some optimizations on the index, merges, or something else, and
that is why the total indexing process is not comparably faster.
Best Regards,
Ivan



Uwe Goetzke wrote:
This week I switched the Lucene library version on one customer system.
The indexing time went down from 46m32s to 16m20s for the complete task,
including optimisation. Great job!
We index product catalogs from several suppliers; in this case around
56,000 product groups and 360,000 products, including descriptions, were
indexed.

Regards

Uwe
-----------------------------------------------------------------------
Healy Hudson GmbH - D-55252 Mainz Kastel
Managing Director Christian Konhäuser - Amtsgericht Wiesbaden HRB 12076
This email is confidential. If you are not the intended recipient, you must not disclose or use the information contained in it. If you have received this email in error, please tell us immediately by return email and delete the document.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


