[ 
https://issues.apache.org/jira/browse/LUCENE-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-6742:
----------------------------------
    Description: 
While reviewing the warnings I noticed the following:
The LovinsStemmer and the FinishStemmer from the snowball package cannot work 
at all. Indeed Robert Muir added a comment to the test referring to: 
http://article.gmane.org/gmane.comp.search.snowball/1139

The bug is the following: The Among class looks up the method to call via 
reflection. But the stemmer compiler creates all these methods as private. As  
AccessibleObject#setAccessible() is called nowhere (we also don't want this), 
the SnowballProgram main class cannot call the method. Unfortunately it 
completely supresses all reflection errors and returns false. SnowballProgram 
then thinks, the called {{private boolean r_A()}} method returned false and s 
the whole snowball algorithm breaks.

Also the snowball stemmer classes had a bug: methodObject is a static instance 
of the Stemmer, and Among calls the private method then on wrong object. So 
when fixing the above, this would fail again (because the stemmer changes a 
static singleton object, not itsself!). We also need to change the object in 
SnowballProgram to use "this" when calling the method.

There are several possibilities to solve the private methods problem:
- call AccessibleObject.setAccessible(true) -> we don't want this! Never Ever! 
:-)
- patch the whole Snowball-generated classes to make those methods "public". 
This is a huge patch and shows many internals => no way
- Since Java 7 we can use a cool trick, my favourite: MethodHandles

The MethodHandles trick can be done the following way:
- Change the Among class to get a MethodHandles.Lookup in its ctor (instead of 
methodObject). This lookup is used to lookup the method by name. This Lookup 
Object must be private to our implementing class (the one that defines the 
private methods). If this is the case, we can get a method handle and call the 
method (from anywhere, because we "own" it).
- replace the (already broken) {{methodObject}} in every stemmer to instead be 
the correct MethodHandles.Lookup. This has 2 effects: The class to get the 
lookup gets all access rights to the class calling it (so we can access private 
methods) and also sees all methods.

The second part is done by a new Ant task: {{ant patch-snowball}}. Whenever you 
add a new Snowball stemmer, copy the java file generated by the snowball 
compiler to the ext directory and call the ant task. It will patch the 
{{methodObject}} declaration (and also add a {{@SuppressWarnings("unused")}} 
for convenience).

I reenabled the dictionary tests and the whole stemmer works. There are also no 
issues with security managers, because we do nonthing security sensitive (all 
is our own stuff, no setAccessible).

And finally: LovinsStemmer and FinishStemmer got insanely fast (and correct, of 
course).

  was:
While reviewing the warnings I noticed the following:
The LovinsStemmer and the FinishStemmer from the snowball package cannot work 
at all. Indeed Robert Muir added a comment to the test referring to: 
http://article.gmane.org/gmane.comp.search.snowball/1139

The bug is the following: The Among class looks up the method to call via 
reflection. But the stemmer compiler creates all these methods as private. As  
AccessibleObject#setAccessible() is called nowhere (we also donÄt want this), 
the SnowballProgram main class cannot call the method. Unfortunately it 
completely supresses all reflection errors and returns false. SnowballProgram 
then thinks, the called {{private boolean r_A()}} method returned false and s 
the whole snowball algorithm breaks.

Also the snowball stemmer classes had a bug: methodObject is a static instance 
of the Stemmer, and Among calls the private method then on wrong object. So 
when fixing the above, this would fail again (because the stemmer changes a 
static singleton object, not itsself!). We also need to change the object in 
SnowballProgram to use "this" when calling the method.

There are several possibilities to solve the private methods problem:
- call AccessibleObject.setAccessible(true) -> we don't want this! Never Ever! 
:-)
- patch the whole Snowball-generated classes to make those methods "public". 
This is a huge patch and shows many internals => no way
- Since Java 7 we can use a cool trick, my favourite: MethodHandles

The MethodHandles trick can be done the following way:
- Change the Among class to get a MethodHandles.Lookup in its ctor (instead of 
methodObject). This lookup is used to lookup the method by name. This Lookup 
Object must be private to our implementing class (the one that defines the 
private methods). If this is the case, we can get a method handle and call the 
method (from anywhere, because we "own" it).
- replace the (already broken) {{methodObject}} in every stemmer to instead be 
the correct MethodHandles.Lookup. This has 2 effects: The class to get the 
lookup gets all access rights to the class calling it (so we can access private 
methods) and also sees all methods.

The second part is done by a new Ant task: {{ant patch-snowball}}. Whenever you 
add a new Snowball stemmer, copy the java file generated by the snowball 
compiler to the ext directory and call the ant task. It will patch the 
{{methodObject}} declaration (and also add a {{@SuppressWarnings("unused")}} 
for convenience).

I reenabled the dictionary tests and the whole stemmer works. There are also no 
issues with security managers, because we do nonthing security sensitive (all 
is our own stuff, no setAccessible).

And finally: LovinsStemmer and FinishStemmer got insanely fast (and correct, of 
course).


> Fix SnowballFilter for Lovins & Finish (and others that use reflection)
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-6742
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6742
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: Trunk, 5.4
>
>         Attachments: LUCENE-6742.patch
>
>
> While reviewing the warnings I noticed the following:
> The LovinsStemmer and the FinishStemmer from the snowball package cannot work 
> at all. Indeed Robert Muir added a comment to the test referring to: 
> http://article.gmane.org/gmane.comp.search.snowball/1139
> The bug is the following: The Among class looks up the method to call via 
> reflection. But the stemmer compiler creates all these methods as private. As 
>  AccessibleObject#setAccessible() is called nowhere (we also don't want 
> this), the SnowballProgram main class cannot call the method. Unfortunately 
> it completely supresses all reflection errors and returns false. 
> SnowballProgram then thinks, the called {{private boolean r_A()}} method 
> returned false and s the whole snowball algorithm breaks.
> Also the snowball stemmer classes had a bug: methodObject is a static 
> instance of the Stemmer, and Among calls the private method then on wrong 
> object. So when fixing the above, this would fail again (because the stemmer 
> changes a static singleton object, not itsself!). We also need to change the 
> object in SnowballProgram to use "this" when calling the method.
> There are several possibilities to solve the private methods problem:
> - call AccessibleObject.setAccessible(true) -> we don't want this! Never 
> Ever! :-)
> - patch the whole Snowball-generated classes to make those methods "public". 
> This is a huge patch and shows many internals => no way
> - Since Java 7 we can use a cool trick, my favourite: MethodHandles
> The MethodHandles trick can be done the following way:
> - Change the Among class to get a MethodHandles.Lookup in its ctor (instead 
> of methodObject). This lookup is used to lookup the method by name. This 
> Lookup Object must be private to our implementing class (the one that defines 
> the private methods). If this is the case, we can get a method handle and 
> call the method (from anywhere, because we "own" it).
> - replace the (already broken) {{methodObject}} in every stemmer to instead 
> be the correct MethodHandles.Lookup. This has 2 effects: The class to get the 
> lookup gets all access rights to the class calling it (so we can access 
> private methods) and also sees all methods.
> The second part is done by a new Ant task: {{ant patch-snowball}}. Whenever 
> you add a new Snowball stemmer, copy the java file generated by the snowball 
> compiler to the ext directory and call the ant task. It will patch the 
> {{methodObject}} declaration (and also add a {{@SuppressWarnings("unused")}} 
> for convenience).
> I reenabled the dictionary tests and the whole stemmer works. There are also 
> no issues with security managers, because we do nonthing security sensitive 
> (all is our own stuff, no setAccessible).
> And finally: LovinsStemmer and FinishStemmer got insanely fast (and correct, 
> of course).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to