[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-342:
---
    Attachment: SOLR-342.patch

Updated the patch to account for the fact that mergeFactor applies only to log-based merge policies. I kept the mergeFactor tag, but added an instanceof check to the init method of SolrIndexWriter to see whether mergeFactor can be set.

Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
---
    Key: SOLR-342
    URL: https://issues.apache.org/jira/browse/SOLR-342
    Project: Solr
    Issue Type: Improvement
    Components: update
    Reporter: Grant Ingersoll
    Assignee: Grant Ingersoll
    Priority: Minor
    Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, SOLR-342.patch, SOLR-342.tar.gz

LUCENE-843 adds support for new indexing capabilities using the setRAMBufferSizeMB() method that should significantly speed up indexing for many applications. To fix this, we will need the trunk version of Lucene (or must wait for the next official release of Lucene). A side effect is that Lucene's new, faster StandardTokenizer will also be incorporated.

We also need to think about how to incorporate the new merge scheduling functionality (the new default in Lucene is to do merges in a background thread).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
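As a rough illustration of the instanceof check described in the update above, the sketch below applies mergeFactor only when the configured policy is log-based. The classes are local stand-ins that borrow Lucene's names so the example compiles on its own; CustomMergePolicy, MergeFactorSketch and applyMergeFactor are hypothetical, and none of this is taken from the attached patch.

```java
// Stand-in types: these are NOT the Lucene classes, only minimal models
// that reuse the same names so the sketch is self-contained.
abstract class MergePolicy {}

// In this model only log-based policies expose a merge factor.
abstract class LogMergePolicy extends MergePolicy {
    private int mergeFactor = 10; // placeholder starting value for the model

    void setMergeFactor(int mergeFactor) { this.mergeFactor = mergeFactor; }

    int getMergeFactor() { return mergeFactor; }
}

class LogByteSizeMergePolicy extends LogMergePolicy {}

// Hypothetical policy that is not log-based and has no merge factor.
class CustomMergePolicy extends MergePolicy {}

class MergeFactorSketch {
    /** Applies mergeFactor if the policy supports it; returns true when applied. */
    static boolean applyMergeFactor(MergePolicy policy, int mergeFactor) {
        if (policy instanceof LogMergePolicy) {
            ((LogMergePolicy) policy).setMergeFactor(mergeFactor);
            return true;
        }
        return false; // the setting is skipped for non-log policies
    }

    public static void main(String[] args) {
        System.out.println(applyMergeFactor(new LogByteSizeMergePolicy(), 20));
        System.out.println(applyMergeFactor(new CustomMergePolicy(), 20));
    }
}
```

Returning a boolean lets the caller decide whether to log a warning when the mergeFactor tag is present but cannot be honoured.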
[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-342:
---
    Attachment: SOLR-342.patch

Updated to work against trunk. As always, let me know if there is anything I can do to help get this committed.

    Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, SOLR-342.tar.gz
[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-342:
---
    Attachment: copyLucene.sh

Dumb little script to copy the required Lucene jars over from a built Lucene directory. It takes two parameters: the location of Lucene home and the version to copy over. It requires Lucene to be built, and it belongs in the lib directory. For example:

    ./copyLucene.sh <path to Lucene> 2.3-dev

    Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.tar.gz
[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-342:
---
    Attachment: SOLR-342.patch

Changes:
1. Updated changes.txt with recommendations on settings.
2. Changed SolrIndexWriter from the last patch to allow setting both maxBufferedDocs and ramBufferSizeMB.
3. Updated the various sample solrconfig.xml files to have a default of 32 MB for ramBufferSizeMB. Commented out maxBufferedDocs, but did not deprecate it.
4. Added a note to the various solrconfig.xml files explaining what Lucene does if BOTH ramBufferSizeMB and maxBufferedDocs are set.

The Lucene libraries are bundled with the previous patch and are still needed.

    Attachments: SOLR-342.patch, SOLR-342.tar.gz
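For readers wondering what happens when both ramBufferSizeMB and maxBufferedDocs are set: as the Lucene IndexWriter javadoc describes it, a flush is triggered by whichever limit is reached first. The sketch below models only that decision. FlushTriggerSketch and shouldFlush are hypothetical names, and DISABLE_AUTO_FLUSH is a local stand-in for the Lucene constant of the same name, not the real field.

```java
class FlushTriggerSketch {
    // Local stand-in sentinel meaning "this trigger is switched off".
    static final int DISABLE_AUTO_FLUSH = -1;

    /**
     * Returns true when either enabled limit has been reached,
     * i.e. whichever limit is hit first causes the flush.
     */
    static boolean shouldFlush(int bufferedDocs, double ramUsedMB,
                               int maxBufferedDocs, double ramBufferSizeMB) {
        boolean byDocs = maxBufferedDocs != DISABLE_AUTO_FLUSH
                && bufferedDocs >= maxBufferedDocs;
        boolean byRam = ramBufferSizeMB != DISABLE_AUTO_FLUSH
                && ramUsedMB >= ramBufferSizeMB;
        return byDocs || byRam;
    }

    public static void main(String[] args) {
        // Both limits enabled: the document count is reached before the RAM limit.
        System.out.println(shouldFlush(1000, 4.0, 1000, 32.0));
        // Both limits enabled: the RAM limit is reached before the document count.
        System.out.println(shouldFlush(10, 32.5, 1000, 32.0));
        // Document-count trigger disabled: only RAM usage matters.
        System.out.println(shouldFlush(5000, 4.0, DISABLE_AUTO_FLUSH, 32.0));
    }
}
```

With maxBufferedDocs commented out in the sample configs, as in change 3, only the RAM-based branch would apply in this model.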
[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-342:
---
    Attachment: SOLR-342.tar.gz

First crack at implementing this. All tests pass on OS X except SolrJ's SolrExceptionTest, but that test also fails on a clean checkout, so I am convinced the failure is not due to anything I did.

My personal benchmarking of just the Lucene side of things (see indexing.alg in Lucene contrib/benchmark) shows pretty significant performance gains. This is also anecdotally confirmed by my basic testing in Solr. I set the default to 16 MB, per Mike McCandless's defaults in Lucene, but this is probably too low given the server nature of Solr, where a lot more memory is likely to be available.

There are 4 new configuration possibilities:

ramBufferSizeMB - When set, maxBufferedDocs is set to DISABLE_AUTO_FLUSH. The default is still the maxBufferedDocs way, but this could be changed to be the other way around (and probably should be).

mergePolicy - Sets the MergePolicy. The default is the new Lucene LogByteSizeMergePolicy; the old Lucene policy is LogDocMergePolicy.

mergeScheduler - Sets the way merges are performed. The new way is ConcurrentMergeScheduler, which runs the merges in separate background threads. The old way was SerialMergeScheduler. Concurrent by default.

luceneAutoCommit - Specifies whether the Lucene IndexWriter should autoCommit flushes. false is best for performance. I still need to develop recommendations for when to change this. Named this way to avoid confusion with Solr's own autoCommit. false by default.

The patch is inside the tar file, along with a bundle of the Lucene jars (not technically the latest, but only a couple of days old).

    Attachments: SOLR-342.tar.gz
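To make the interplay of the four settings above concrete, here is a minimal sketch of how they might be resolved from a key/value configuration. It follows the rule stated in this message (setting ramBufferSizeMB switches maxBufferedDocs to DISABLE_AUTO_FLUSH) and the stated defaults for the other three settings. IndexSettingsSketch, the plain-string class names and the fallback of 1000 for maxBufferedDocs are assumptions made for illustration; they are not taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;

class IndexSettingsSketch {
    // Local stand-in for Lucene's "trigger disabled" sentinel.
    static final int DISABLE_AUTO_FLUSH = -1;
    // Hypothetical fallback used only by this sketch.
    static final int ASSUMED_MAX_BUFFERED_DOCS = 1000;

    final int maxBufferedDocs;
    final double ramBufferSizeMB;
    final String mergePolicy;
    final String mergeScheduler;
    final boolean luceneAutoCommit;

    IndexSettingsSketch(Map<String, String> cfg) {
        String ram = cfg.get("ramBufferSizeMB");
        if (ram != null) {
            // RAM-based flushing requested: turn off the doc-count trigger.
            ramBufferSizeMB = Double.parseDouble(ram);
            maxBufferedDocs = DISABLE_AUTO_FLUSH;
        } else {
            // Otherwise keep the older doc-count behaviour.
            ramBufferSizeMB = DISABLE_AUTO_FLUSH;
            String docs = cfg.get("maxBufferedDocs");
            maxBufferedDocs = docs != null
                    ? Integer.parseInt(docs)
                    : ASSUMED_MAX_BUFFERED_DOCS;
        }
        mergePolicy = cfg.getOrDefault("mergePolicy", "LogByteSizeMergePolicy");
        mergeScheduler = cfg.getOrDefault("mergeScheduler", "ConcurrentMergeScheduler");
        luceneAutoCommit = Boolean.parseBoolean(cfg.getOrDefault("luceneAutoCommit", "false"));
    }

    public static void main(String[] args) {
        Map<String, String> cfg = new HashMap<>();
        cfg.put("ramBufferSizeMB", "16");
        IndexSettingsSketch s = new IndexSettingsSketch(cfg);
        System.out.println(s.ramBufferSizeMB + " MB, maxBufferedDocs=" + s.maxBufferedDocs
                + ", " + s.mergePolicy + ", " + s.mergeScheduler
                + ", autoCommit=" + s.luceneAutoCommit);
    }
}
```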
[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)
[ https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-342:
---
    Description:
LUCENE-843 adds support for new indexing capabilities using the setRAMBufferSizeMB() method that should significantly speed up indexing for many applications. To fix this, we will need the trunk version of Lucene (or must wait for the next official release of Lucene). A side effect is that Lucene's new, faster StandardTokenizer will also be incorporated.

We also need to think about how to incorporate the new merge scheduling functionality (the new default in Lucene is to do merges in a background thread).

  was:
LUCENE-843 adds support for new indexing capabilities using the setRAMBufferSizeMB() method that should significantly speed up indexing for many applications. To fix this, we will need the trunk version of Lucene (or must wait for the next official release of Lucene). A side effect is that Lucene's new, faster StandardTokenizer will also be incorporated.

    Summary: Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)  (was: Add support for Lucene's new setRAMBufferSizeMB() method in IndexWriter)

Updated to cover the broader scope of changes that affect upgrading to Lucene trunk.

Plan to implement:
1. Add a ramBufferSizeMB tag to specify the number of megabytes to give Lucene. Setting this value will override all other related IndexWriter settings (maxBufferedDocs, etc.).
2. Add a mergeScheduler tag that can have two values: concurrent or serial. Or would it be better to take in a classname? Doing the latter would mean we would have to have a no-arg constructor, right?
3. Add a mergePolicy tag that defines the merge policy and can have two values: byteSize or docCount. Or would it be better to take a classname?

NOTE: I am not proposing to handle the new reusable Document/Field/Token mechanism in Lucene, which should also be considered.
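On the keyword-versus-classname question raised in the plan above, the two options could be sketched roughly as follows. Loading by class name through reflection does need an accessible no-arg constructor, which is what getDeclaredConstructor() with no arguments looks up. The scheduler classes here are empty local stand-ins that borrow Lucene's names, and SchedulerLoaderSketch, fromKeyword and fromClassName are hypothetical.

```java
// Stand-in types only; these are not the Lucene classes.
abstract class MergeScheduler {}

class ConcurrentMergeScheduler extends MergeScheduler {}

class SerialMergeScheduler extends MergeScheduler {}

class SchedulerLoaderSketch {
    /** Option 1: a fixed vocabulary of two keywords. */
    static MergeScheduler fromKeyword(String value) {
        switch (value) {
            case "concurrent":
                return new ConcurrentMergeScheduler();
            case "serial":
                return new SerialMergeScheduler();
            default:
                throw new IllegalArgumentException("Unknown mergeScheduler: " + value);
        }
    }

    /** Option 2: any class name, provided the class has a no-arg constructor. */
    static MergeScheduler fromClassName(String className) {
        try {
            Class<? extends MergeScheduler> clazz =
                    Class.forName(className).asSubclass(MergeScheduler.class);
            // Fails with NoSuchMethodException if there is no no-arg constructor.
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load mergeScheduler: " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromKeyword("serial").getClass().getSimpleName());
        System.out.println(fromClassName(ConcurrentMergeScheduler.class.getName())
                .getClass().getSimpleName());
    }
}
```

The keyword form is easier to validate, while the classname form leaves room for custom implementations; the same trade-off would apply to the mergePolicy tag.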