[ https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Schindler updated LUCENE-1987:
----------------------------------

    Attachment: LUCENE-1987-StopFilter-backport29.patch
                LUCENE-1987-StopFilter-BW.patch
                LUCENE-1987-StopFilter.patch

Here are two mega patches and one backport to 2.9 (I want to get this in before 2.9.1).

All core tests pass and all bw tests pass. Most contrib tests also pass, but we have the following problems and inconsistencies:

- benchmark no longer works, because StandardAnalyzer no longer has a default ctor and cannot be instantiated by reflection; the same goes for StopAnalyzer.
- Highlighter only works if StandardAnalyzer is in 2.4 mode. In 2.9 mode (current) it fails because the position increments of stop words are not correctly respected. This fails in combination with the following:
- Very bad inconsistency: the default of QueryParser is to ignore position increments, but the current version of StandardAnalyzer uses posIncr for stop words -> bang. We should change the default for QueryParser (+ contrib QP), too. Much rework and much documentation are needed. The tests in core pass now because most parts use StandardAnalyzer in 2.9 mode but have no stop words, and the special tests explicitly set the posIncr flag. This is totally broken and needs fixing! (It also affects 2.9.0 if somebody uses the new StandardAnalyzer with LUCENE_CURRENT.)
- XMLQueryParser also fails with the latest StandardAnalyzer version, because it cannot set the flag in QueryParser. In my opinion, the query parser should take the flag from the analyzer, but this is not easy to fix.
- All contrib analyzers have stopWordPosIncr turned off (backwards compatibility). Maybe we need a Version parameter in all analyzers there, too!

What to do?

After this StopFilter/StandardAnalyzer hell day, Aspirin, Paracetamol and beer are not enough to think clearly again...

And please: the next time we deprecate APIs, remove all deprecated calls from tests and contrib, and mark all tests of deprecated APIs as such!
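To make the QueryParser/StandardAnalyzer inconsistency above concrete, here is a minimal sketch in plain Java. It uses no Lucene classes: the class PosIncrMismatchSketch, its Tok type, the stop word list and the simplified exact-phrase check are all hypothetical stand-ins, written under the assumption that removed stop words leave position gaps when posIncr is enabled and no gaps when it is disabled. It only shows why a phrase analyzed without gaps cannot match text that was indexed with gaps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for an analyzer plus phrase matcher; not Lucene code.
public class PosIncrMismatchSketch {
    // Assumed stop word list, for illustration only.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "of", "the"));

    // A surviving term together with its position in the token stream.
    static final class Tok {
        final String term;
        final int position;

        Tok(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Drops stop words. With enablePosIncr, each removed stop word leaves a
    // gap in the positions; without it, survivors are numbered 0, 1, 2, ...
    static List<Tok> analyze(String text, boolean enablePosIncr) {
        List<Tok> tokens = new ArrayList<>();
        int position = -1;
        int skipped = 0;
        for (String term : text.toLowerCase().trim().split("\\s+")) {
            if (STOP_WORDS.contains(term)) {
                skipped++;
                continue;
            }
            position += enablePosIncr ? skipped + 1 : 1;
            skipped = 0;
            tokens.add(new Tok(term, position));
        }
        return tokens;
    }

    // Simplified exact phrase match (no slop): every phrase term must occur
    // in the document at the same relative position as in the phrase.
    static boolean phraseMatches(List<Tok> doc, List<Tok> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        Map<Integer, String> termAt = new HashMap<>();
        for (Tok t : doc) {
            termAt.put(t.position, t.term);
        }
        for (Tok candidate : doc) {
            int shift = candidate.position - phrase.get(0).position;
            boolean ok = true;
            for (Tok q : phrase) {
                if (!q.term.equals(termAt.get(q.position + shift))) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "lord of the rings";
        List<Tok> indexed = analyze(text, true); // lord@0, rings@3

        // Query side ignores position increments: lord@0, rings@1.
        System.out.println(phraseMatches(indexed, analyze(text, false))); // prints false

        // Query side respects position increments: lord@0, rings@3.
        System.out.println(phraseMatches(indexed, analyze(text, true))); // prints true
    }
}
```

In this sketch the same text fails to match itself as a phrase as soon as the two sides disagree on the flag, which is the kind of failure described for Highlighter and XMLQueryParser. Both sides agreeing (both on or both off) matches again.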
> Remove rest of analysis deprecations (Token, CharacterCache)
> ------------------------------------------------------------
>
>                 Key: LUCENE-1987
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1987
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: Analysis
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.0
>
>         Attachments: LUCENE-1987-StopFilter-backport29.patch,
> LUCENE-1987-StopFilter-BW.patch, LUCENE-1987-StopFilter.patch,
> LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch,
> LUCENE-1987-StopFilter.patch, LUCENE-1987.patch, LUCENE-1987.patch,
> LUCENE-1987.patch
>
>
> This removes the rest of the deprecations in the analysis package:
> - -Token's termText field-- (DONE)
> - -eventually un-deprecate ctors of Token taking Strings (they are still
> useful) -> if yes remove deprec in 2.9.1- (DONE)
> - -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
> - Stopwords lists
> - Remove the backwards settings from analyzers (acronym, posIncr,...). They
> are deprecated, but we still have the VERSION constants. Not sure how to
> proceed: keep the settings alive for index compatibility, or remove them
> together with the version constants (which were undeprecated)?