[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-07 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-342:
-

Attachment: SOLR-342.patch

Updated the patch to account for the fact that mergeFactor applies only to 
log-based merge policies.  I kept the mergeFactor tag, but added an instanceof 
check in the init method of SolrIndexWriter to verify that mergeFactor is 
settable before applying it.
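
A minimal sketch of what such an instanceof check could look like. This is not 
the code from the patch: MergePolicyStub, LogMergePolicyStub and 
CustomMergePolicyStub are hypothetical stand-ins for Lucene's merge policy 
classes (so the example compiles without the Lucene jars), and 
applyMergeFactor is an invented helper name.

```java
// Hypothetical stand-ins for Lucene's merge policy hierarchy, so this sketch
// compiles without the Lucene jars on the classpath.
abstract class MergePolicyStub {}

// Stand-in for a log-based policy: the only kind that has a merge factor.
class LogMergePolicyStub extends MergePolicyStub {
    private int mergeFactor = 10; // placeholder default for the stub

    void setMergeFactor(int mergeFactor) { this.mergeFactor = mergeFactor; }

    int getMergeFactor() { return mergeFactor; }
}

// Stand-in for a policy that has no notion of a merge factor.
class CustomMergePolicyStub extends MergePolicyStub {}

class MergeFactorSketch {
    /**
     * Applies the configured mergeFactor only when the policy supports it.
     * Returns true if the value was applied, false if it was skipped.
     */
    static boolean applyMergeFactor(MergePolicyStub policy, int mergeFactor) {
        if (policy instanceof LogMergePolicyStub) {
            ((LogMergePolicyStub) policy).setMergeFactor(mergeFactor);
            return true;
        }
        return false; // mergeFactor is not settable on this policy
    }

    public static void main(String[] args) {
        LogMergePolicyStub logPolicy = new LogMergePolicyStub();
        System.out.println("applied to log policy: "
                + applyMergeFactor(logPolicy, 20));
        System.out.println("applied to custom policy: "
                + applyMergeFactor(new CustomMergePolicyStub(), 20));
    }
}
```

Skipping the setting, rather than failing, keeps a single mergeFactor tag 
usable even when a non-log merge policy is configured.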

 Add support for Lucene's new Indexing and merge features (excluding 
 Document/Field/Token reuse)
 ---

 Key: SOLR-342
 URL: https://issues.apache.org/jira/browse/SOLR-342
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
 SOLR-342.patch, SOLR-342.tar.gz


 LUCENE-843 adds support for new indexing capabilities using the 
 setRAMBufferSizeMB() method that should significantly speed up indexing for 
 many applications.  To use this, we will need the trunk version of Lucene (or 
 must wait for the next official release of Lucene).
 A side effect of this is that Lucene's new, faster StandardTokenizer will also 
 be incorporated.  
 We also need to think about how we want to incorporate the new merge 
 scheduling functionality (the new default in Lucene is to do merges in a 
 background thread).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-01 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-342:
-

Attachment: SOLR-342.patch

Updated to work against trunk.

As always, let me know if there is anything I can do to help get this committed.




[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2007-12-03 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-342:
-

Attachment: copyLucene.sh

Dumb little script to copy over the required Lucene jars from a built Lucene 
directory.  Takes in two parameters, the location of Lucene Home and the 
version to copy over.  Requires Lucene to be built.

Belongs in the lib directory.

For example,
./copyLucene.sh <path to Lucene> 2.3-dev




[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2007-10-26 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-342:
-

Attachment: SOLR-342.patch

Changes:

1. Updated changes.txt with recommendations on settings.

2. Changed SolrIndexWriter from last patch to allow for setting both 
maxBufferedDocs and ramBufferSizeMB.

3. Updated the various sample solrconfig.xml to have a default of 32 MB for 
ramBufferSizeMB.  Commented out maxBufferedDocs, but did not deprecate it.

4. Added a note to the various solrconfig.xml files explaining what Lucene 
does if BOTH ramBufferSizeMB and maxBufferedDocs are set.
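
To illustrate the interaction mentioned in item 4: as far as I understand the 
Lucene javadocs, when both limits are enabled a flush is triggered by 
whichever limit is reached first. The sketch below only simulates that rule. 
FlushSketch and shouldFlush are hypothetical names, and DISABLE_AUTO_FLUSH is 
declared locally on the assumption that it mirrors Lucene's constant.

```java
class FlushSketch {
    // Declared locally; assumed to mirror IndexWriter.DISABLE_AUTO_FLUSH.
    static final int DISABLE_AUTO_FLUSH = -1;

    /**
     * Returns true when either enabled limit has been reached, i.e. the
     * flush is triggered by whichever limit comes first.
     */
    static boolean shouldFlush(int bufferedDocs, double ramUsedMB,
                               int maxBufferedDocs, double ramBufferSizeMB) {
        boolean docLimitHit = maxBufferedDocs != DISABLE_AUTO_FLUSH
                && bufferedDocs >= maxBufferedDocs;
        boolean ramLimitHit = ramBufferSizeMB != DISABLE_AUTO_FLUSH
                && ramUsedMB >= ramBufferSizeMB;
        return docLimitHit || ramLimitHit;
    }

    public static void main(String[] args) {
        // Both limits enabled: the document count is reached before the RAM limit.
        System.out.println(shouldFlush(1000, 5.0, 1000, 32.0));
        // Only the RAM limit is enabled, and it has not been reached yet.
        System.out.println(shouldFlush(5000, 5.0, DISABLE_AUTO_FLUSH, 32.0));
    }
}
```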

The Lucene libraries are bundled with the previous patch, but are still needed.




[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2007-10-25 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-342:
-

Attachment: SOLR-342.tar.gz

First crack at implementing this.  All tests pass on OS X except SolrJ's 
SolrExceptionTest, but for some reason that is failing on a clean version, too, 
so I am convinced it is not due to anything I did.

My personal benchmarking of just the Lucene side of things (see indexing.alg in 
Lucene contrib/benchmark) shows pretty significant performance gains.  This is 
also anecdotally confirmed by my basic testing in Solr.

I set the default to be 16MB, per Mike McCandless's defaults in Lucene, but 
this is probably too low given the server nature of Solr, where a lot more 
memory is likely to be available.

There are 4 new configuration possibilities:

ramBufferSizeMB - When set, maxBufferedDocs is set to DISABLE_AUTO_FLUSH.  
The current default is to flush by maxBufferedDocs, but this could (and 
probably should) be switched so that ramBufferSizeMB is the default.

mergePolicy - Sets the MergePolicy.  The default is the new Lucene 
LogByteSizeMergePolicy.  The old Lucene policy is LogDocMergePolicy.

mergeScheduler - Sets the way merges are performed.  The new way is 
ConcurrentMergeScheduler, which runs the merges in separate background 
threads.  The old way was SerialMergeScheduler.  Concurrent by default.

luceneAutoCommit - Specifies whether the Lucene IndexWriter should autoCommit 
flushes.  false is best for performance and is the default.  Still need to 
develop recommendations for when to change this.  Named this way to avoid 
confusion with Solr's own autoCommit setting.
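
A rough sketch of how the four settings and their defaults, as described in 
this message, could be resolved from parsed configuration values. It is not 
the code from the patch: IndexSettingsSketch and fromConfig are hypothetical 
names, the 1000-document fallback is a placeholder, and DISABLE_AUTO_FLUSH is 
declared locally on the assumption that it mirrors Lucene's constant.

```java
import java.util.HashMap;
import java.util.Map;

class IndexSettingsSketch {
    // Declared locally; assumed to mirror IndexWriter.DISABLE_AUTO_FLUSH.
    static final int DISABLE_AUTO_FLUSH = -1;

    double ramBufferSizeMB = DISABLE_AUTO_FLUSH;       // unset unless configured
    int maxBufferedDocs = 1000;                        // placeholder fallback
    String mergePolicy = "LogByteSizeMergePolicy";     // default per this message
    String mergeScheduler = "ConcurrentMergeScheduler"; // default per this message
    boolean luceneAutoCommit = false;                  // default per this message

    /** Builds settings from tag-name/value pairs, falling back to defaults. */
    static IndexSettingsSketch fromConfig(Map<String, String> cfg) {
        IndexSettingsSketch s = new IndexSettingsSketch();
        if (cfg.containsKey("maxBufferedDocs")) {
            s.maxBufferedDocs = Integer.parseInt(cfg.get("maxBufferedDocs"));
        }
        if (cfg.containsKey("ramBufferSizeMB")) {
            // Setting the RAM buffer turns off flushing by document count.
            s.ramBufferSizeMB = Double.parseDouble(cfg.get("ramBufferSizeMB"));
            s.maxBufferedDocs = DISABLE_AUTO_FLUSH;
        }
        if (cfg.containsKey("mergePolicy")) {
            s.mergePolicy = cfg.get("mergePolicy");
        }
        if (cfg.containsKey("mergeScheduler")) {
            s.mergeScheduler = cfg.get("mergeScheduler");
        }
        if (cfg.containsKey("luceneAutoCommit")) {
            s.luceneAutoCommit = Boolean.parseBoolean(cfg.get("luceneAutoCommit"));
        }
        return s;
    }

    public static void main(String[] args) {
        Map<String, String> cfg = new HashMap<>();
        cfg.put("ramBufferSizeMB", "32");
        IndexSettingsSketch s = fromConfig(cfg);
        System.out.println(s.ramBufferSizeMB + " MB, maxBufferedDocs="
                + s.maxBufferedDocs + ", " + s.mergePolicy + ", "
                + s.mergeScheduler + ", autoCommit=" + s.luceneAutoCommit);
    }
}
```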

Patch is inside the tar file, as well as a bundling of the Lucene jars (not 
technically the latest, but only a couple days old)




[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2007-10-22 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-342:
-

Description: 
LUCENE-843 adds support for new indexing capabilities using the 
setRAMBufferSizeMB() method that should significantly speed up indexing for 
many applications.  To use this, we will need the trunk version of Lucene (or 
must wait for the next official release of Lucene).

Side effect of this is that Lucene's new, faster StandardTokenizer will also be 
incorporated.  

Also need to think about how we want to incorporate the new merge scheduling 
functionality (new default in Lucene is to do merges in a background thread)

  was:
LUCENE-843 adds support for new indexing capabilities using the 
setRAMBufferSizeMB() method that should significantly speed up indexing for 
many applications.  To fix this, we will need trunk version of Lucene (or wait 
for the next official release of Lucene)

Side effect of this is that Lucene's new, faster StandardTokenizer will also be 
incorporated.

Summary: Add support for Lucene's new Indexing and merge features 
(excluding Document/Field/Token reuse)  (was: Add support for Lucene's new 
setRAMBufferSizeMB() method in IndexWriter)

Updated to cover the broader scope of changes that affect upgrading to Lucene 
trunk.

Plan to implement:
Add ramBufferSizeMB tag to specify the number of megabytes to give Lucene.  
Setting this value will override all other related IndexWriter settings 
(maxBufferedDocs, etc.).

Add mergeScheduler tag that can have two values:  concurrent or serial.   Or 
would it be better to take in a classname?  Doing the latter would mean we 
would have to have a no-arg constructor, right?

Add mergePolicy tag that defines the merge policy that can have two values: 
byteSize or docCount.  Or would it be better to take a classname? 
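
On the classname question, a minimal sketch of loading an implementation by 
name through reflection, which does indeed rely on the class having a no-arg 
constructor. ClassNameLoaderSketch and newInstance are hypothetical names, and 
the JDK classes in main are used only so the example runs without the Lucene 
jars.

```java
import java.util.List;

class ClassNameLoaderSketch {
    /**
     * Loads the named class, checks that it is a subtype of expectedType and
     * instantiates it through its no-arg constructor. A class without such a
     * constructor fails with NoSuchMethodException.
     */
    static <T> T newInstance(String className, Class<T> expectedType)
            throws ReflectiveOperationException {
        Class<?> raw = Class.forName(className);
        // Throws ClassCastException if the class is not of the expected type.
        Class<? extends T> typed = raw.asSubclass(expectedType);
        return typed.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // java.util.ArrayList has a public no-arg constructor, so this works.
        List<?> list = newInstance("java.util.ArrayList", List.class);
        System.out.println(list.getClass().getName());

        // java.lang.Integer has no no-arg constructor, so this is rejected.
        try {
            newInstance("java.lang.Integer", Number.class);
        } catch (NoSuchMethodException e) {
            System.out.println("no no-arg constructor");
        }
    }
}
```

Taking a classname lets users plug in their own implementations, at the cost 
of a reflective lookup and the constructor requirement. A fixed set of values 
such as concurrent/serial is simpler to validate but cannot be extended 
without a code change.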

NOTE: I am not proposing to handle the new reusable Document/Field/Token 
mechanism in Lucene, which should also be considered.



