[jira] Closed: (COCOON-2065) huge performance increase of LuceneIndexTransformer on large Lucene indexes

2008-02-07 Thread Alfred Nathaniel (JIRA)

 [ 
https://issues.apache.org/jira/browse/COCOON-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alfred Nathaniel closed COCOON-2065.


Resolution: Fixed

Please check 2.1.12-dev and reopen issue in case there is a problem.
Thanks again for providing the patch.

 huge performance increase of LuceneIndexTransformer on large Lucene indexes
 ---

 Key: COCOON-2065
 URL: https://issues.apache.org/jira/browse/COCOON-2065
 Project: Cocoon
  Issue Type: Improvement
  Components: Blocks: Lucene
Affects Versions: 2.1.6, 2.1.7, 2.1.8, 2.1.9, 2.1.10, 2.1.11, 2.2-dev 
 (Current SVN)
Reporter: Dominique De Munck
Assignee: Alfred Nathaniel
Priority: Minor
 Fix For: 2.1.12-dev (Current SVN), 2.2-dev (Current SVN)

 Attachments: LuceneIndexTransformer.patch


 PROBLEM:
 The LuceneIndexTransformer optimizes the Lucene index every time you add an 
 entry to the index.
 This slows indexing down enormously when the index is large! If, for example, 
 you use it to update the entry upon every check-in of a document, each update 
 becomes slow.
 E.g. on a Pentium IV 2.4 GHz with a Lucene index containing 10 000 documents, 
 the index update itself takes only about 60 ms, while the optimize that gets 
 called afterwards can take 7 seconds!
 SOLUTION:
 I've created a patch that introduces an option, optimize-frequency, which 
 determines how often optimize is called.
 It defaults to 1 (the current behaviour). When a user sets it to 50, the 
 index is optimized only once every 50 updates, and so on.
 If no optimization is wanted, you can set it to 0.
 This is consistent with the Lucene documentation (fragment of the Lucene FAQ):
 The IndexWriter class supports an optimize() method that compacts the index 
 database and speeds up queries. You may want to use this method after 
 performing a complete indexing of your document set or after incremental 
 updates of the index. If your incremental update adds documents frequently, 
 you want to perform the optimization only once in a while to avoid the extra 
 overhead of the optimization.
 PATCH INFO:
 Added a configuration option and a function needToOptimize(), which is called 
 before optimizing.
 needToOptimize() uses a random number generator, to keep the code simple.
 - when the option is not set, CODE WILL BE EXECUTED AS BEFORE
 - tested on the 2.1.11 SVN branch; there are no differences in the main 
 trunk, so it can be applied there as well.
 - Updated API docs
 - if the patch is accepted, I will also update the Wiki:
 http://wiki.apache.org/cocoon/LuceneIndexTransformer
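
 For readers without the attachment at hand, the idea behind needToOptimize() 
 can be sketched roughly as follows. This is not the code from 
 LuceneIndexTransformer.patch: the class name, the constructor and the 
 treatment of negative values are assumptions made for illustration. The 
 sketch only shows how a random draw can give an optimize about once every 
 optimize-frequency updates without keeping a counter.

```java
import java.util.Random;

/**
 * Illustrative sketch only; NOT the code from LuceneIndexTransformer.patch.
 * The class name and constructor are hypothetical.
 */
class OptimizeFrequencySketch {

    // Value of the optimize-frequency option: 1 = always, 0 = never.
    private final int optimizeFrequency;
    private final Random random;

    OptimizeFrequencySketch(int optimizeFrequency, Random random) {
        this.optimizeFrequency = optimizeFrequency;
        this.random = random;
    }

    /** Returns true if the index should be optimized after this update. */
    boolean needToOptimize() {
        if (optimizeFrequency <= 0) {
            // 0 disables optimization; treating negative values the same
            // way is an assumption of this sketch.
            return false;
        }
        if (optimizeFrequency == 1) {
            // Default: optimize after every update, as before the patch.
            return true;
        }
        // nextInt(n) is uniform on 0..n-1, so this is true with
        // probability 1/n: on average once every n updates.
        return random.nextInt(optimizeFrequency) == 0;
    }
}
```

 With a random draw the spacing between optimize calls is not exact: a 
 setting of 50 means one optimize per 50 updates on average, not precisely 
 every 50th update. The benefit is that no update counter has to be kept 
 between calls.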

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Closed: (COCOON-2065) huge performance increase of LuceneIndexTransformer on large Lucene indexes

2007-07-11 Thread Felix Knecht (JIRA)

 [ 
https://issues.apache.org/jira/browse/COCOON-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Knecht closed COCOON-2065.


Resolution: Fixed

Due to a lack of knowledge, I did not close the issue after fixing it in 
the last open active branch.

 huge performance increase of LuceneIndexTransformer on large Lucene indexes
 ---

 Key: COCOON-2065
 URL: https://issues.apache.org/jira/browse/COCOON-2065
 Project: Cocoon
  Issue Type: Improvement
  Components: Blocks: Lucene
Affects Versions: 2.1.6, 2.1.7, 2.1.8, 2.1.9, 2.1.10, 2.1.11-dev (Current 
 SVN), 2.2-dev (Current SVN)
Reporter: Dominique De Munck
Assignee: Felix Knecht
Priority: Minor
 Fix For: 2.1.11-dev (Current SVN), 2.2-dev (Current SVN)

 Attachments: LuceneIndexTransformer.patch


