[ 
https://issues.apache.org/jira/browse/LUCENE-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698662#comment-14698662
 ] 

Uwe Schindler commented on LUCENE-6742:
---------------------------------------

I agree. I just pointed the documentation in my latest patch to the revision 
502 which has now a differenbt GIT ID: 
[https://github.com/snowballstem/snowball/tree/e103b5c257383ee94a96e7fc58cab3c567bf079b]
This lists the following as commit message:
{{git-svn-id: svn+userv://snowball.tartarus.org/snowball/trunk/snowball@502 
633ccae0-01f4-0310-8c99-d3591da6f01f}}
Clearly referring to rev 502. Theoretically it should be possible to use this 
exact github variant (click on "Download ZIP" on the right-bottom) for 
regenerate: 
[https://github.com/snowballstem/snowball/archive/e103b5c257383ee94a96e7fc58cab3c567bf079b.zip]

I will open a separate issue to donate our changes back to Snowball (so we have 
access to inner char[]),... and also fix the C code to generate the 
{{MethodHandles.lookup()}} that I patched in. Maybe the snowball people also 
want the much more safe Java 8 method reference implementation, but this is all 
separate.

We should also investigate later if the linear scan through all amongs is 
really the best idea, maybe we can improve that, too - but also at snowball's 
repo.

Finally: The Java 8 patch is just for reference how it could look like - but 
there is too many patching of generated files in it. So I would stay with the 
MethodHandles variant thats bugfree and fast, it just uses still the "string 
names" of the private methods, but works perfectly. I just attached the patch 
here to "save my work" and show you, [~rcmuir], how the compiled result could 
look like.

> Fix SnowballFilter for Lovins & Finnish (and others that use reflection)
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-6742
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6742
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/analysis
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: Trunk, 5.4
>
>         Attachments: LUCENE-6742-Java8.patch, LUCENE-6742.patch, 
> LUCENE-6742.patch, LUCENE-6742.patch
>
>
> While reviewing the warnings (LUCENE-6740) I noticed the following:
> The LovinsStemmer and the FinnishStemmer from the snowball package cannot 
> work at all. Indeed Robert Muir added a comment to the test referring to: 
> http://article.gmane.org/gmane.comp.search.snowball/1139
> The bug is the following: The Among class looks up the method to call via 
> reflection. But the stemmer compiler creates all these methods as private. As 
>  AccessibleObject#setAccessible() is called nowhere (we also don't want 
> this), the SnowballProgram main class cannot call the method. Unfortunately 
> it completely supresses all reflection errors and returns false. 
> SnowballProgram then thinks, the called {{private boolean r_A()}} method 
> returned false and so the whole snowball algorithm breaks.
> Also the snowball stemmer classes had a bug: methodObject is a static 
> instance of the Stemmer, and Among calls the private method then on wrong 
> object. So when fixing the above, this would fail again (because the stemmer 
> changes a static singleton object, not itsself!). We also need to change the 
> object in SnowballProgram to use "this" when calling the method.
> There are several possibilities to solve the private methods problem:
> - call {{AccessibleObject.setAccessible(true)}} -> we don't want this! Never 
> Ever! :-)
> - patch the whole Snowball-generated classes to make those methods "public". 
> This is a huge patch and shows many internals => no way
> - Since Java 7 we can use a cool trick, my favourite: MethodHandles
> The MethodHandles trick can be done the following way:
> - Change the Among class to get a {{MethodHandles.Lookup}} in its ctor 
> (instead of the broken static {{SnowballProgram}} instance {{methodObject}}). 
> This lookup is used to lookup the method by name. This Lookup Object must be 
> private to our implementing class (the one that defines the private methods). 
> If this is the case, we can get a method handle and call the method (from 
> anywhere, because we "own" it).
> - replace the (already broken) {{methodObject}} in every stemmer to instead 
> be the correct {{MethodHandles.Lookup}} of the implementing class containing 
> the private method. This has 2 effects: The class to get the lookup gets all 
> access rights to the class calling it (so we can access private methods) and 
> also sees all methods.
> The second part is done by a new Ant task: {{ant patch-snowball}}. Whenever 
> you add a new Snowball stemmer, copy the java file generated by the snowball 
> compiler to the ext directory and call the ant task. It will patch the 
> {{methodObject}} declaration (and also add a {{@SuppressWarnings("unused")}} 
> for convenience).
> I reenabled the dictionary tests and the whole stemmer works. There are also 
> no issues with security managers, because we do nothing security sensitive 
> (all is our own stuff, no setAccessible).
> And finally: LovinsStemmer and FinnishStemmer got insanely fast (and correct, 
> of course).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to