[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-07 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-342:
-

Attachment: SOLR-342.patch

Updated the patch to account for the fact that mergeFactor applies only to 
log-based merge policies.  I kept the mergeFactor tag, but added an instanceof 
check to the init method of SolrIndexWriter so that mergeFactor is set only 
when the merge policy supports it.
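
The instanceof check described above could look roughly like the following. This is only a sketch, not the code from the patch: the types below are simplified stand-ins declared locally so the example compiles without Lucene on the classpath, and CustomMergePolicy, MergeFactorGuard, and applyMergeFactor are hypothetical names.

```java
// Stand-in types (assumptions): they only loosely mirror the shape of
// Lucene's MergePolicy and LogMergePolicy and are not the real classes.
interface MergePolicy {}

class LogMergePolicy implements MergePolicy {
    private int mergeFactor = 10;

    void setMergeFactor(int mergeFactor) { this.mergeFactor = mergeFactor; }

    int getMergeFactor() { return mergeFactor; }
}

// Hypothetical policy that has no notion of a merge factor.
class CustomMergePolicy implements MergePolicy {}

final class MergeFactorGuard {
    /**
     * Applies mergeFactor only when the policy is log-based.
     * Returns true if the value was applied, false if it was skipped.
     */
    static boolean applyMergeFactor(MergePolicy policy, int mergeFactor) {
        if (policy instanceof LogMergePolicy) {
            ((LogMergePolicy) policy).setMergeFactor(mergeFactor);
            return true;
        }
        return false; // other policies: the configured value is ignored
    }

    public static void main(String[] args) {
        LogMergePolicy logPolicy = new LogMergePolicy();
        System.out.println(applyMergeFactor(logPolicy, 25) + " " + logPolicy.getMergeFactor());
        System.out.println(applyMergeFactor(new CustomMergePolicy(), 25));
    }
}
```

Whether a skipped mergeFactor should be ignored silently, as here, or logged is a design choice the message does not specify.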

 Add support for Lucene's new Indexing and merge features (excluding 
 Document/Field/Token reuse)
 ---

 Key: SOLR-342
 URL: https://issues.apache.org/jira/browse/SOLR-342
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
 SOLR-342.patch, SOLR-342.tar.gz


 LUCENE-843 adds support for new indexing capabilities using the 
 setRAMBufferSizeMB() method that should significantly speed up indexing for 
 many applications.  To support this, we will need the trunk version of Lucene 
 (or to wait for the next official release of Lucene).
 A side effect of this is that Lucene's new, faster StandardTokenizer will also 
 be incorporated.
 We also need to think about how we want to incorporate the new merge 
 scheduling functionality (the new default in Lucene is to do merges in a 
 background thread).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2008-02-01 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-342:
-

Attachment: SOLR-342.patch

Updated to work against trunk.

As always, let me know if there is anything I can do to help get this committed.

 Add support for Lucene's new Indexing and merge features (excluding 
 Document/Field/Token reuse)
 ---

 Key: SOLR-342
 URL: https://issues.apache.org/jira/browse/SOLR-342
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.patch, 
 SOLR-342.tar.gz



[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2007-12-03 Thread Grant Ingersoll (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-342:
-

Attachment: copyLucene.sh

Dumb little script to copy over the required Lucene jars from a built Lucene 
directory.  Takes in two parameters, the location of Lucene Home and the 
version to copy over.  Requires Lucene to be built.

Belongs in the lib directory.

For example,
./copyLucene.sh <path to Lucene> 2.3-dev

 Add support for Lucene's new Indexing and merge features (excluding 
 Document/Field/Token reuse)
 ---

 Key: SOLR-342
 URL: https://issues.apache.org/jira/browse/SOLR-342
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: copyLucene.sh, SOLR-342.patch, SOLR-342.tar.gz



[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2007-10-26 Thread Grant Ingersoll (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Grant Ingersoll updated SOLR-342:
-

Attachment: SOLR-342.patch

Changes:

1. Updated changes.txt with recommendations on settings.

2. Changed SolrIndexWriter from last patch to allow for setting both
maxBufferedDocs and ramBufferSizeMB.

3. Updated the various sample solrconfig.xml to have a default of 32 MB for
ramBufferSizeMB. Commented out maxBufferedDocs, but did not deprecate it.

4. Added a note to the various solrconfig.xml files explaining what Lucene
does if BOTH ramBufferSizeMB and maxBufferedDocs are set.
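
For illustration, the settings from items 3 and 4 might appear in a sample solrconfig.xml roughly as follows. This is a hedged sketch rather than the patch's actual text: the enclosing indexDefaults element and the maxBufferedDocs value of 1000 are assumptions, and the note on combined behavior follows Lucene's IndexWriter documentation as I understand it.

```xml
<!-- Hypothetical fragment: the enclosing element and layout are assumptions. -->
<indexDefaults>
  <!-- Flush buffered documents by RAM usage; 32 MB is the default named above. -->
  <ramBufferSizeMB>32</ramBufferSizeMB>

  <!-- Commented out, not deprecated. If both settings are enabled, Lucene is
       expected to flush on whichever limit is reached first. The value 1000
       is illustrative only. -->
  <!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
</indexDefaults>
```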

The Lucene libraries are bundled with the previous patch, but are still needed.

Add support for Lucene's new Indexing and merge features (excluding
Document/Field/Token reuse)
---

Key: SOLR-342
URL: https://issues.apache.org/jira/browse/SOLR-342
Project: Solr
Issue Type: Improvement
Components: update
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Attachments: SOLR-342.patch, SOLR-342.tar.gz


[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2007-10-25 Thread Grant Ingersoll (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Grant Ingersoll updated SOLR-342:
-

Attachment: SOLR-342.tar.gz

First crack at implementing this. All tests pass on OS X except SolrJ's
SolrExceptionTest, but for some reason that is failing on a clean version, too,
so I am convinced it is not due to anything I did.

My personal benchmarking of just the Lucene side of things (see indexing.alg in
Lucene contrib/benchmark) shows pretty significant performance gains. This is
also anecdotally confirmed by my basic testing in Solr.

I set the default to be 16MB, per Mike McCandless's defaults in Lucene, but this
is probably too low given the server nature of Solr, where a lot more memory is
likely to be available.

There are 4 new configuration possibilities:

ramBufferSizeMB - When set, maxBufferedDocs is set to DISABLE_AUTO_FLUSH.
The default is still to flush by maxBufferedDocs, but this could be changed to
be the other way around (and probably should be).

mergePolicy - Sets the MergePolicy. The default is the new Lucene
LogByteSizeMergePolicy; the old Lucene policy is LogDocMergePolicy.

mergeScheduler - Sets the way merges are performed. The new way is
ConcurrentMergeScheduler, which runs the merges in separate background threads;
the old way was SerialMergeScheduler. Concurrent by default.

luceneAutoCommit - Specifies whether the Lucene IndexWriter should autoCommit
flushes. false is best for performance and is the default. I still need to
develop recommendations for when to change this. Named this way to avoid
confusion with Solr's own autoCommit.
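
The ramBufferSizeMB rule described above (setting it disables flushing by document count) can be sketched as below. This is a minimal illustration of the behavior as described in this message, not the patch itself: FlushSettings and resolve are hypothetical names, the DISABLE_AUTO_FLUSH constant is a local stand-in whose value is an assumption, and disabling the RAM trigger when ramBufferSizeMB is absent is also an assumption.

```java
final class FlushSettings {
    // Local stand-in for Lucene's IndexWriter.DISABLE_AUTO_FLUSH (value assumed).
    static final int DISABLE_AUTO_FLUSH = -1;

    final double ramBufferSizeMB;
    final int maxBufferedDocs;

    private FlushSettings(double ramBufferSizeMB, int maxBufferedDocs) {
        this.ramBufferSizeMB = ramBufferSizeMB;
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /**
     * Resolves the effective flush triggers.
     *
     * @param ramBufferSizeMB configured value, or null when the tag is absent
     * @param maxBufferedDocs configured document-count limit
     */
    static FlushSettings resolve(Double ramBufferSizeMB, int maxBufferedDocs) {
        if (ramBufferSizeMB != null) {
            // RAM-based flushing wins: the document-count trigger is disabled.
            return new FlushSettings(ramBufferSizeMB.doubleValue(), DISABLE_AUTO_FLUSH);
        }
        // Default path: flush by document count only.
        return new FlushSettings(DISABLE_AUTO_FLUSH, maxBufferedDocs);
    }

    public static void main(String[] args) {
        FlushSettings byRam = resolve(16.0, 1000);
        FlushSettings byDocs = resolve(null, 1000);
        System.out.println(byRam.ramBufferSizeMB + " / " + byRam.maxBufferedDocs);
        System.out.println(byDocs.ramBufferSizeMB + " / " + byDocs.maxBufferedDocs);
    }
}
```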

Patch is inside the tar file, as well as a bundling of the Lucene jars (not
technically the latest, but only a couple days old)


[jira] Updated: (SOLR-342) Add support for Lucene's new Indexing and merge features (excluding Document/Field/Token reuse)

2007-10-22 Thread Grant Ingersoll (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Grant Ingersoll updated SOLR-342:
-

Description:
LUCENE-843 adds support for new indexing capabilities using the
setRAMBufferSizeMB() method that should significantly speed up indexing for
many applications. To support this, we will need the trunk version of Lucene
(or to wait for the next official release of Lucene).

A side effect of this is that Lucene's new, faster StandardTokenizer will also
be incorporated.

We also need to think about how we want to incorporate the new merge scheduling
functionality (the new default in Lucene is to do merges in a background
thread).

was:
LUCENE-843 adds support for new indexing capabilities using the
setRAMBufferSizeMB() method that should significantly speed up indexing for
many applications. To support this, we will need the trunk version of Lucene
(or to wait for the next official release of Lucene).

A side effect of this is that Lucene's new, faster StandardTokenizer will also
be incorporated.

Summary: Add support for Lucene's new Indexing and merge features
(excluding Document/Field/Token reuse) (was: Add support for Lucene's new
setRAMBufferSizeMB() method in IndexWriter)

Updated to cover the broader scope of changes that affect upgrading to Lucene
trunk.

Plan to implement:
Add a ramBufferSizeMB tag to specify the number of megabytes to give Lucene.
Setting this value will override all other related IndexWriter settings
(maxBufferedDocs, etc.).

Add mergeScheduler tag that can have two values: concurrent or serial. Or
would it be better to take in a classname? Doing the latter would mean we
would have to have a no-arg constructor, right?

Add mergePolicy tag that defines the merge policy that can have two values:
byteSize or docCount. Or would it be better to take a classname?
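
On the classname question raised above: if the tags took a class name, the simplest way to instantiate it reflectively does rely on a no-arg constructor. The sketch below illustrates that constraint using only the JDK. SchedulerLike, SerialLike, NeedsArgs, and ClassNameLoader are hypothetical stand-ins and are not Lucene or Solr classes.

```java
// Hypothetical stand-in for a pluggable component interface.
interface SchedulerLike {
    String describe();
}

// Has a no-arg constructor, so it can be created from a class name alone.
class SerialLike implements SchedulerLike {
    public SerialLike() {}

    public String describe() { return "serial"; }
}

// Has only a one-argument constructor, so plain reflective creation fails.
class NeedsArgs implements SchedulerLike {
    private final int threads;

    public NeedsArgs(int threads) { this.threads = threads; }

    public String describe() { return "threads=" + threads; }
}

final class ClassNameLoader {
    /** Instantiates the named class through its no-arg constructor. */
    static SchedulerLike load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (SchedulerLike) cls.getDeclaredConstructor().newInstance();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(className + " has no no-arg constructor", e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load(SerialLike.class.getName()).describe());
        try {
            load(NeedsArgs.class.getName());
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Fixed values such as concurrent or serial avoid this constraint, at the cost of not allowing user-supplied implementations.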

NOTE: I am not proposing to handle the new reusable Document/Field/Token
mechanism in Lucene, which should also be considered.

Add support for Lucene's new Indexing and merge features (excluding
Document/Field/Token reuse)
---

Key: SOLR-342
URL: https://issues.apache.org/jira/browse/SOLR-342
Project: Solr
Issue Type: Improvement
Components: update
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor

