[ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796872#action_12796872
 ] 

Uwe Schindler edited comment on SOLR-1677 at 1/5/10 10:29 PM:
--------------------------------------------------------------

In my opinion, it should be possible to set a default in solrconfig.xml, 
because there is currently no requirement to set a version for all TokenStream 
(TS) components. In the shipped solrconfig.xml, this default is the version of 
the shipped Lucene release, so new users can use the default config and extend 
it as taught in all the courses and books about Solr. They do not need to care 
about the version. 

If they upgrade their Lucene version, their config stays on the previous 
setting and they are fine. If they want to change some of the components (like 
query parser, index writer, index reader -- flex!!!), they can do it locally. 
So Bob could make his change the way Ernest proposed.
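
A minimal sketch of that lookup order, in plain Java with no Solr dependency: a component uses its own setting if it has one, and otherwise falls back to the config-wide default. The class `VersionResolver`, the method `resolve`, and the use of "matchVersion" as a map key are assumptions for illustration, not the API of the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr: picks the version string a component should use.
class VersionResolver {
    // Assumed attribute name, borrowed from the wording of this comment.
    static final String KEY = "matchVersion";

    /** Returns the component's own setting if present, else the config-wide default. */
    static String resolve(Map<String, String> componentArgs, String configDefault) {
        String local = componentArgs.get(KEY);
        return (local != null) ? local : configDefault;
    }

    public static void main(String[] args) {
        String configDefault = "LUCENE_29"; // as if read from solrconfig.xml

        Map<String, String> plain = new HashMap<>();
        Map<String, String> overridden = new HashMap<>();
        overridden.put(KEY, "LUCENE_24");

        System.out.println(resolve(plain, configDefault));      // falls back to the default
        System.out.println(resolve(overridden, configDefault)); // local setting wins
    }
}
```

With this ordering, an upgrade that leaves the config file untouched changes nothing, and a single component can still be moved to another version on its own.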

If we do not have a default, all users will stay stuck on Lucene 2.4, because 
they do not care about the version (it is not required, because it defaults to 
2.4 for backwards compatibility). So lots of configs will never use the new 
Unicode features of Lucene 3.1. Then Lucene 4.0 suddenly comes out, all support 
for Lucene < 3 is removed, and all users cry. With a default version set to 
2.4, they will then get a runtime error in Lucene 4.0 saying that 
Version.LUCENE_24 is no longer available as an enum constant.
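
To illustrate the failure mode, here is a small sketch that assumes the version name is decoded from a string with the enum's valueOf method. `FutureVersion` is a made-up stand-in for a later Version enum without the 2.x constants; it is not org.apache.lucene.util.Version, and its constant list is illustrative only.

```java
// Stand-in for a future Version enum from which the 2.x constants were removed.
// This is NOT org.apache.lucene.util.Version; the constants are illustrative.
enum FutureVersion { LUCENE_30, LUCENE_31, LUCENE_40 }

class RemovedConstantDemo {
    /** Returns true if the name still resolves to a constant of FutureVersion. */
    static boolean isKnown(String name) {
        try {
            FutureVersion.valueOf(name);
            return true;
        } catch (IllegalArgumentException e) {
            // Enum.valueOf throws IllegalArgumentException for unknown names.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isKnown("LUCENE_31"));
        System.out.println(isKnown("LUCENE_24")); // the old hard-wired default
    }
}
```

The lookup only fails once the config is actually loaded, which is why users would see it as a runtime error after the upgrade.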

If you really do not want to have a default version in the config (not the 
schema, because it applies to *all* Lucene components), then you should go the 
same way as Lucene 3.0: require a matchVersion for all components. As there may 
be TokenStream components that are not from Lucene, make this attribute in the 
schema mandatory only for Lucene streams. This can be done with my initial 
patch, too: if the matchVersion property is missing, then matchVersion will be 
null and the factory should throw an IllegalArgumentException (IAE) if it 
requires one. In my original patch, only the parsing code would need to be 
moved out of the factory into a util class in Solr. It may also be possible to 
parse "x.y"-style versions.
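
A rough sketch of such a utility class, under stated assumptions: `MatchVersion` is a hypothetical stand-in for the Lucene Version enum, and `MatchVersionParser`, `parse`, and `require` are illustrative names, not code from the patch. It accepts both "LUCENE_29" and "2.9", returns null for a missing attribute, and lets a factory that needs a version turn that null into an IllegalArgumentException.

```java
import java.util.Locale;

// Hypothetical stand-in for the Lucene Version enum.
enum MatchVersion { LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31 }

// Hypothetical utility class; names are illustrative, not the patch's API.
class MatchVersionParser {
    /** Parses "LUCENE_29" or "2.9"; returns null when the attribute is absent. */
    static MatchVersion parse(String value) {
        if (value == null) {
            return null;
        }
        String s = value.trim().toUpperCase(Locale.ROOT);
        if (s.matches("\\d+\\.\\d+")) {
            // "2.9" becomes "LUCENE_29"
            s = "LUCENE_" + s.replace(".", "");
        }
        return MatchVersion.valueOf(s); // throws IllegalArgumentException if unknown
    }

    /** What a factory that needs a version might do with a missing attribute. */
    static MatchVersion require(String value, String factoryName) {
        MatchVersion v = parse(value);
        if (v == null) {
            throw new IllegalArgumentException(
                factoryName + " requires a matchVersion attribute");
        }
        return v;
    }

    public static void main(String[] args) {
        System.out.println(parse("2.9"));
        System.out.println(require("LUCENE_31", "SomeLuceneFilterFactory"));
    }
}
```

Factories for non-Lucene token streams would call only `parse` (or nothing at all) and ignore a null result, so the attribute stays optional for them.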

The problem here: users upgrading from Solr 1.4 will suddenly get errors, 
because their configs become invalid. Ahh, and because they are stupid they add 
LUCENE_29 (how should they know that Solr 1.4 used Lucene 2.4 compatibility?). 
And then the mailing list gets flooded with questions, because the configs 
suddenly fail to produce results with old indexes.

> Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
> BaseTokenFilterFactory
> -------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1677
>                 URL: https://issues.apache.org/jira/browse/SOLR-1677
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Uwe Schindler
>         Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
> SOLR-1677.patch
>
>
> Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
> compatibility with old indexes created using older versions of Lucene. The 
> most important example is StandardTokenizer, which changed its behaviour with 
> posIncr and incorrect host token types in 2.4 and also in 2.9.
> In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
> much more Unicode support, almost every Tokenizer/TokenFilter needs this 
> Version parameter. In 2.9, the deprecated old ctors without Version take 
> LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
> This patch adds basic support for the Lucene Version property to the base 
> factories. Subclasses can then use the luceneMatchVersion decoded enum (in 
> 3.0) / parameter (in 2.9) for constructing TokenStreams. The code currently 
> contains a helper map to decode the version strings, but in 3.0 it can be 
> replaced by Version.valueOf(String), as Version is a Java 5 enum there. The 
> default value is Version.LUCENE_24 (as this is the default for the 
> no-version ctors in Lucene).
> This patch also removes unneeded conversions to CharArraySet from 
> StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
> to match Lucene 3.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.