[jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

Uwe Schindler (JIRA) Mon, 19 Oct 2009 10:54:24 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Uwe Schindler updated LUCENE-1987:
----------------------------------

    Attachment: LUCENE-1987-StopFilter-backport29.patch
                LUCENE-1987-StopFilter-BW.patch
                LUCENE-1987-StopFilter.patch

Here are 2 mega patches and one backport to 2.9 (I want to get this in before 
2.9.1):

All core tests pass, all bw tests pass. Most contrib tests also pass, but we 
have the following problems and inconsistencies:

- benchmark does not work any longer, because StandardAnalyzer has no default 
ctor anymore and cannot be instantiated by reflection, same with StopAnalyzer
- Highlighter only works if StandardAnalyzer is in 2.4 mode; in 2.9 mode 
(current) it fails because the position increments of stop words are not 
correctly respected. This fails in combination with the following:
- Very bad inconsistency: The default of QueryParser is to ignore position 
increments, but the current version of StandardAnalyzer uses posIncr for stop 
words -> bäng. We should change the default for QueryParser (+ contrib QP), too. 
Much rework and much documentation is needed. The tests in core now pass, as 
most parts use StandardAnalyzer in 2.9 mode but have no stop words, and the 
special tests explicitly set the posIncr flag. This is totally messed up, it 
needs fixing! (It also affects 2.9.0, if somebody uses the new 
StandardAnalyzer with LUCENE_CURRENT.) 
- XMLQueryParser also fails with latest StandardAnalyzer version, because it 
cannot set the flag in QueryParser. In my opinion, the query parser should take 
the flag from the analyzer, but this is not easy to fix.
- All contrib analyzers have stopWordPosIncr turned off (backwards 
compatibility). Maybe we need a Version Parameter in all analyzers there too!
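
The posIncr mismatch described above can be sketched in plain Java. This is 
only an illustration under stated assumptions: it has no Lucene dependency, and 
PosIncrDemo, PositionedTerm, analyze, phraseMatches and the keepGaps flag are 
hypothetical stand-ins, not Lucene APIs. The "index side" keeps the gaps left 
by removed stop words (as StandardAnalyzer in 2.9 mode is described to do), 
while the "query side" numbers the surviving terms consecutively (as a query 
parser that ignores position increments would).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Stand-alone illustration; none of these types are Lucene classes.
final class PosIncrDemo {

    /** A term plus its absolute position in the token stream. */
    static final class PositionedTerm {
        final String term;
        final int position;
        PositionedTerm(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    /**
     * Lowercases, splits on whitespace and drops stop words.
     * If keepGaps is true, each removed stop word still advances the position
     * counter; if false, the surviving terms are numbered consecutively.
     */
    static List<PositionedTerm> analyze(String text, Set<String> stopWords, boolean keepGaps) {
        List<PositionedTerm> out = new ArrayList<>();
        int position = -1;
        int skipped = 0;
        for (String raw : text.toLowerCase().trim().split("\\s+")) {
            if (raw.isEmpty()) continue;
            if (stopWords.contains(raw)) { skipped++; continue; }
            position += 1 + (keepGaps ? skipped : 0);
            skipped = 0;
            out.add(new PositionedTerm(raw, position));
        }
        return out;
    }

    /** True if the query terms occur in doc at the same relative positions. */
    static boolean phraseMatches(List<PositionedTerm> doc, List<PositionedTerm> query) {
        if (query.isEmpty()) return false;
        Set<String> docSlots = new HashSet<>();
        for (PositionedTerm t : doc) docSlots.add(t.position + ":" + t.term);
        int base = query.get(0).position;
        for (PositionedTerm start : doc) {
            if (!start.term.equals(query.get(0).term)) continue;
            boolean all = true;
            for (PositionedTerm q : query) {
                int wanted = start.position + (q.position - base);
                if (!docSlots.contains(wanted + ":" + q.term)) { all = false; break; }
            }
            if (all) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "of", "a"));
        String text = "lord of the rings";
        // Index side keeps the gaps: lord@0, rings@3.
        List<PositionedTerm> indexed = analyze(text, stop, true);
        // Query side ignoring increments: lord@0, rings@1.
        System.out.println(phraseMatches(indexed, analyze(text, stop, false))); // false
        // Query side respecting increments: lord@0, rings@3.
        System.out.println(phraseMatches(indexed, analyze(text, stop, true)));  // true
    }
}
```

With both sides using the same setting the phrase matches; with mixed settings 
the very text that was indexed no longer matches as a phrase, which is the 
inconsistency reported for QueryParser, Highlighter and XMLQueryParser above.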

What to do? After this StopFilter/StandardAnalyzer-hell-day, Aspirin and 
Paracetamol and beer are not enough to think clearly again...

And please: next time we deprecate APIs, remove all deprecated calls from 
tests and contrib and mark all deprecated tests as such!

> Remove rest of analysis deprecations (Token, CharacterCache)
> ------------------------------------------------------------
>
>                 Key: LUCENE-1987
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1987
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: Analysis
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.0
>
>         Attachments: LUCENE-1987-StopFilter-backport29.patch, 
> LUCENE-1987-StopFilter-BW.patch, LUCENE-1987-StopFilter.patch, 
> LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
> LUCENE-1987-StopFilter.patch, LUCENE-1987.patch, LUCENE-1987.patch, 
> LUCENE-1987.patch
>
>
> This removes the rest of the deprecations in the analysis package:
> - -Token's termText field-- (DONE)
> - -eventually un-deprecate ctors of Token taking Strings (they are still 
> useful) -> if yes remove deprec in 2.9.1- (DONE)
> - -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
> - Stopwords lists
> - Remove the backwards settings from analyzers (acronym, posIncr,...). They 
> are deprecated, but we still have the VERSION constants. Not sure how to 
> proceed. Keep the settings alive for index compatibility? Or remove them 
> together with the version constants (which were undeprecated)?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

