[jira] Updated: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

2010-03-15 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-1677:


Attachment: SOLR-1677-lucenetrunk-branch-3.patch
SOLR-1677-lucenetrunk-branch-2.patch

Just for documentation:
Here the patches with improvements to the version support for the Lucene-trunk 
upgrade branch.

- More lenient matchVersion support (V.V)
- Default matchVersion for tests
- Remove code duplication and some additional checks for analysis plugins that 
need version support to enforce the version

 Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
 BaseTokenFilterFactory
 ---

 Key: SOLR-1677
 URL: https://issues.apache.org/jira/browse/SOLR-1677
 Project: Solr
  Issue Type: Sub-task
  Components: Schema and Analysis
Reporter: Uwe Schindler
 Attachments: SOLR-1677-lucenetrunk-branch-2.patch, 
 SOLR-1677-lucenetrunk-branch-3.patch, SOLR-1677-lucenetrunk-branch.patch, 
 SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch


 Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
 compatibility with old indexes created using older versions of Lucene. The 
 most important example is StandardTokenizer, which changed its behaviour with 
 posIncr and incorrect host token types in 2.4 and also in 2.9.
 In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
 much more Unicode support, almost every Tokenizer/TokenFilter needs this 
 Version parameter. In 2.9, the deprecated old ctors without Version take 
 LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
 This patch adds basic support for the Lucene Version property to the base 
 factories. Subclasses then can use the luceneMatchVersion decoded enum (in 
 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently 
 contains a helper map to decode the version strings, but in 3.0 is can be 
 replaced by Version.valueOf(String), as the Version is a subclass of Java5 
 enums. The default value is Version.LUCENE_24 (as this is the default for the 
 no-version ctors in Lucene).
 This patch also removes unneeded conversions to CharArraySet from 
 StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
 to match Lucene 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

2010-03-14 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-1677:


Attachment: SOLR-1677-lucenetrunk-branch.patch

This patch was committed to the Lucene-trunk upgrade branch. It is changed to 
not make the factories CoreAware.

 Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
 BaseTokenFilterFactory
 ---

 Key: SOLR-1677
 URL: https://issues.apache.org/jira/browse/SOLR-1677
 Project: Solr
  Issue Type: Sub-task
  Components: Schema and Analysis
Reporter: Uwe Schindler
 Attachments: SOLR-1677-lucenetrunk-branch.patch, SOLR-1677.patch, 
 SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch


 Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
 compatibility with old indexes created using older versions of Lucene. The 
 most important example is StandardTokenizer, which changed its behaviour with 
 posIncr and incorrect host token types in 2.4 and also in 2.9.
 In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
 much more Unicode support, almost every Tokenizer/TokenFilter needs this 
 Version parameter. In 2.9, the deprecated old ctors without Version take 
 LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
 This patch adds basic support for the Lucene Version property to the base 
 factories. Subclasses then can use the luceneMatchVersion decoded enum (in 
 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently 
 contains a helper map to decode the version strings, but in 3.0 is can be 
 replaced by Version.valueOf(String), as the Version is a subclass of Java5 
 enums. The default value is Version.LUCENE_24 (as this is the default for the 
 no-version ctors in Lucene).
 This patch also removes unneeded conversions to CharArraySet from 
 StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
 to match Lucene 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

2009-12-21 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-1677:


Attachment: SOLR-1677.patch

New patch with some schema and config hacking. Also new test:

- As a first hack the solrConfig schema has a new element luceneMatchVersion 
that contains a solr-wide default luceneMatchVersion value that is used as 
default for QueryParser, Analyzers if not specified different
- On the analyzer side, BaseTokenizerFactory and BaseTokenFilterFactory now 
extend SolrCoreAware (and I also allowed these classes to be SolrCoreAware) and 
get the SolrConfig.
- Both classes now use the default, if not local set as a param (like in the 
last patch), but the default is the one got from SolrConfig
- The parser for config strings was moved to Config
- Other components like QueryParserFactories can get the default matchVersion 
in the same way
- The default is LUCENE_24 as before.

This is a first idea, how it would work. Open points:
- should the default be in SolrConfig or in IndexConfig?
- I did not change the config.xsd file to reflect my change as open discussion
- all other example config files and schemas should use the default Lucene 
version shipped with the solr release (currently 2.9). So user that upgrade get 
their last lucene version their index is compatible with, and new users get the 
latest config.
- If users upgrade the default luceneMatchVersion, they have to possibly 
reindex (esp. when upgrading to LUCENE_31 soon, as new Unicode features in all 
Tokenizers/Filters)

 Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
 BaseTokenFilterFactory
 ---

 Key: SOLR-1677
 URL: https://issues.apache.org/jira/browse/SOLR-1677
 Project: Solr
  Issue Type: Sub-task
  Components: Schema and Analysis
Reporter: Uwe Schindler
 Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
 SOLR-1677.patch


 Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
 compatibility with old indexes created using older versions of Lucene. The 
 most important example is StandardTokenizer, which changed its behaviour with 
 posIncr and incorrect host token types in 2.4 and also in 2.9.
 In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
 much more Unicode support, almost every Tokenizer/TokenFilter needs this 
 Version parameter. In 2.9, the deprecated old ctors without Version take 
 LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
 This patch adds basic support for the Lucene Version property to the base 
 factories. Subclasses then can use the luceneMatchVersion decoded enum (in 
 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently 
 contains a helper map to decode the version strings, but in 3.0 is can be 
 replaced by Version.valueOf(String), as the Version is a subclass of Java5 
 enums. The default value is Version.LUCENE_24 (as this is the default for the 
 no-version ctors in Lucene).
 This patch also removes unneeded conversions to CharArraySet from 
 StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
 to match Lucene 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

2009-12-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-1677:


Attachment: SOLR-1677.patch

Patch.

I did not go through all factories, so maybe more need to be upgraded for 
matchVersion when switching to Lucene 3.0.

 Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
 BaseTokenFilterFactory
 ---

 Key: SOLR-1677
 URL: https://issues.apache.org/jira/browse/SOLR-1677
 Project: Solr
  Issue Type: Sub-task
  Components: Schema and Analysis
Reporter: Uwe Schindler
 Attachments: SOLR-1677.patch


 Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
 compatibility with old indexes created using older versions of Lucene. The 
 most important example is StandardTokenizer, which changed its behaviour with 
 posIncr and incorrect host token types in 2.4 and also in 2.9.
 In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
 much more Unicode support, almost every Tokenizer/TokenFilter needs this 
 Version parameter. In 2.9, the deprecated old ctors without Version take 
 LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
 This patch adds basic support for the Lucene Version property to the base 
 factories. Subclasses then can use the luceneMatchVersion decoded enum (in 
 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently 
 contains a helper map to decode the version strings, but in 3.0 is can be 
 replaced by Version.valueOf(String), as the Version is a subclass of Java5 
 enums. The default value is Version.LUCENE_24 (as this is the default for the 
 no-version ctors in Lucene).
 This patch also removes unneeded conversions to CharArraySet from 
 StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
 to match Lucene 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

2009-12-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-1677:


Attachment: SOLR-1677.patch

Better patch:
- more dynamic Version map creation
- improved warning message copied from Lucene's Javadocs on 
Version.LUCENE_CURRENT.

 Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
 BaseTokenFilterFactory
 ---

 Key: SOLR-1677
 URL: https://issues.apache.org/jira/browse/SOLR-1677
 Project: Solr
  Issue Type: Sub-task
  Components: Schema and Analysis
Reporter: Uwe Schindler
 Attachments: SOLR-1677.patch, SOLR-1677.patch


 Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
 compatibility with old indexes created using older versions of Lucene. The 
 most important example is StandardTokenizer, which changed its behaviour with 
 posIncr and incorrect host token types in 2.4 and also in 2.9.
 In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
 much more Unicode support, almost every Tokenizer/TokenFilter needs this 
 Version parameter. In 2.9, the deprecated old ctors without Version take 
 LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
 This patch adds basic support for the Lucene Version property to the base 
 factories. Subclasses then can use the luceneMatchVersion decoded enum (in 
 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently 
 contains a helper map to decode the version strings, but in 3.0 is can be 
 replaced by Version.valueOf(String), as the Version is a subclass of Java5 
 enums. The default value is Version.LUCENE_24 (as this is the default for the 
 no-version ctors in Lucene).
 This patch also removes unneeded conversions to CharArraySet from 
 StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
 to match Lucene 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

2009-12-20 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-1677:


Attachment: SOLR-1677.patch

Fix problem in one test, because the english stop word set is unmodifiable, so 
copy it.

 Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
 BaseTokenFilterFactory
 ---

 Key: SOLR-1677
 URL: https://issues.apache.org/jira/browse/SOLR-1677
 Project: Solr
  Issue Type: Sub-task
  Components: Schema and Analysis
Reporter: Uwe Schindler
 Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch


 Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
 compatibility with old indexes created using older versions of Lucene. The 
 most important example is StandardTokenizer, which changed its behaviour with 
 posIncr and incorrect host token types in 2.4 and also in 2.9.
 In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
 much more Unicode support, almost every Tokenizer/TokenFilter needs this 
 Version parameter. In 2.9, the deprecated old ctors without Version take 
 LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
 This patch adds basic support for the Lucene Version property to the base 
 factories. Subclasses then can use the luceneMatchVersion decoded enum (in 
 3.0) / Parameter (in 2.9) for constructing Tokenstreams. The code currently 
 contains a helper map to decode the version strings, but in 3.0 is can be 
 replaced by Version.valueOf(String), as the Version is a subclass of Java5 
 enums. The default value is Version.LUCENE_24 (as this is the default for the 
 no-version ctors in Lucene).
 This patch also removes unneeded conversions to CharArraySet from 
 StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
 to match Lucene 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.