[
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-1987:
----------------------------------
Attachment: LUCENE-1987-StopFilter-backport29.patch
LUCENE-1987-StopFilter-BW.patch
LUCENE-1987-StopFilter.patch
Here are 2 mega patches and one backport to 2.9 (I want to get this in before 2.9.1):
All core tests pass, all bw tests pass. Most contrib tests also pass, but we
have the following problems and inconsistencies:
- benchmark does not work any longer, because StandardAnalyzer has no default
ctor anymore and cannot be instantiated by reflection, same with StopAnalyzer
- Highlighter only works if StandardAnalyzer is in 2.4 mode; in 2.9 mode
(current) it fails because the position increments of stop words are not
correctly respected. This fails in addition/combination with the following:
- Very bad inconsistency: The default of QueryParser is to ignore position
increments, but the current version of StandardAnalyzer uses posIncr for stop
words -> bang. We should change the default for QueryParser (+ contrib QP), too.
Much rework and much documentation is needed. The tests in core now
pass, as most parts use StandardAnalyzer in 2.9 mode but have no stop words,
and the special tests explicitly set the posIncr flag. This is totally
broken, it needs fixing! (It also affects 2.9.0, if somebody uses the new
StandardAnalyzer with LUCENE_CURRENT.)
- XMLQueryParser also fails with latest StandardAnalyzer version, because it
cannot set the flag in QueryParser. In my opinion, the query parser should take
the flag from the analyzer, but this is not easy to fix.
- All contrib analyzers have stopWordPosIncr turned off (backwards
compatibility). Maybe we need a Version Parameter in all analyzers there too!
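
To make the posIncr mismatch above concrete, here is a minimal sketch in plain
Java. It does not use any Lucene classes: the class StopPosIncrSketch and its
positions() helper are hypothetical and only imitate how a stop filter assigns
token positions with and without position increments. With increments enabled,
a removed stop word leaves a gap; with increments ignored (the QueryParser
default described above), the surviving tokens are packed together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopPosIncrSketch {

    // Hypothetical helper, not a Lucene API: returns the position of every
    // token that survives stop word removal.
    static List<Integer> positions(String text, Set<String> stopWords, boolean enablePosIncr) {
        List<Integer> result = new ArrayList<>();
        int position = -1; // position of the last emitted token
        int skipped = 0;   // stop words dropped since the last emitted token
        for (String term : text.toLowerCase().split("\\s+")) {
            if (stopWords.contains(term)) {
                skipped++;
                continue;
            }
            // With increments enabled, each dropped stop word leaves a hole.
            int increment = enablePosIncr ? 1 + skipped : 1;
            position += increment;
            result.add(position);
            skipped = 0;
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "of"));
        String text = "lord of the rings";
        // Analyzer side (posIncr respected): prints [0, 3]
        System.out.println(positions(text, stop, true));
        // Query side (posIncr ignored): prints [0, 1]
        System.out.println(positions(text, stop, false));
    }
}
```

In this toy model the indexed text has "rings" at position 3, while a phrase
built without increments expects it at position 1, directly after "lord", so
an exact phrase match fails. That is the kind of disagreement meant above when
the analyzer and the query parser use different defaults.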
What to do? After this StopFilter/StandardAnalyzer hell-day, Aspirin and
Paracetamol and beer are not enough to think clearly again...
And please: next time we deprecate APIs, remove all deprecated calls from
tests and contrib and mark all deprecated tests as such!
> Remove rest of analysis deprecations (Token, CharacterCache)
> ------------------------------------------------------------
>
> Key: LUCENE-1987
> URL: https://issues.apache.org/jira/browse/LUCENE-1987
> Project: Lucene - Java
> Issue Type: Task
> Components: Analysis
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Fix For: 3.0
>
> Attachments: LUCENE-1987-StopFilter-backport29.patch,
> LUCENE-1987-StopFilter-BW.patch, LUCENE-1987-StopFilter.patch,
> LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch,
> LUCENE-1987-StopFilter.patch, LUCENE-1987.patch, LUCENE-1987.patch,
> LUCENE-1987.patch
>
>
> This removes the rest of the deprecations in the analysis package:
> - -Token's termText field-- (DONE)
> - -eventually un-deprecate ctors of Token taking Strings (they are still
> useful) -> if yes remove deprec in 2.9.1- (DONE)
> - -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
> - Stopwords lists
> - Remove the backwards settings from analyzers (acronym, posIncr,...). They
> are deprecated, but we still have the VERSION constants. I do not know how to
> proceed. Keep the settings alive for index compatibility? Or remove them
> together with the version constants (which were undeprecated)?