[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17794122#comment-17794122
 ] 

Richard Zowalla edited comment on OPENNLP-1520 at 12/7/23 10:43 AM:
--------------------------------------------------------------------

For Finnish, this exception can only occur for the following Among elements 
contained in a_6 (the ones with a method name set):
{code:java}
new Among ( "den", 11, -1, "r_VI", methodObject )
new Among ( "seen", 11, -1, "r_LONG", methodObject )
new Among ( "tten", 11, -1, "r_VI", methodObject ){code}
It can be reproduced with the following test:

 
{code:java}
@Test
void testFinnish() {
  SnowballStemmer stemmer = new SnowballStemmer(ALGORITHM.FINNISH);
  // https://snowballstem.org/algorithms/finnish/stemmer.html
  Assertions.assertEquals("edeltän", stemmer.stem("edeltäneeseen")); // r_LONG()
  // Placeholder expectation: the correct stem is not known; the word only serves to trigger r_VI().
  Assertions.assertEquals("no-idea-what-the-right-stem-is-but-will-trigger-r_VI() case.", stemmer.stem("voitaisiin")); // r_VI()
}
{code}

*edeltäneeseen* will trigger:

*private boolean opennlp.tools.stemmer.snowball.finnishStemmer.r_LONG()* 

*voitaisiin* will trigger the other case:

*private boolean opennlp.tools.stemmer.snowball.finnishStemmer.r_VI()*
 

Note that the exception is swallowed, so a breakpoint has to be set at 
SnowballProgram line 356 to observe it.
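
To illustrate why the failure stays invisible, here is a minimal, self-contained sketch. It is not the OpenNLP code: ParentProgram and ChildStemmer are hypothetical stand-ins for SnowballProgram and a generated stemmer, and the method names are borrowed only for readability. It assumes both classes live in the same package as separate top-level classes.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for SnowballProgram: the parent class makes the reflective call.
abstract class ParentProgram {

    // Mimics the swallowing behaviour: any reflective failure just yields false.
    boolean invokeSwallowing(String methodName) {
        try {
            return invoke(methodName);
        } catch (ReflectiveOperationException e) {
            return false; // the caller cannot tell "no match" from "call failed"
        }
    }

    // Same call, but the failure is allowed to escape so it can be inspected.
    boolean invoke(String methodName) throws ReflectiveOperationException {
        Method m = getClass().getDeclaredMethod(methodName);
        return (Boolean) m.invoke(this);
    }
}

// Hypothetical stand-in for a generated stemmer such as finnishStemmer.
class ChildStemmer extends ParentProgram {
    // private: ParentProgram is a different class, so the reflective call is denied
    private boolean r_LONG() { return true; }

    // package-private: reachable from ParentProgram in the same package
    boolean r_VI() { return true; }
}

class ReflectiveAccessDemo {
    public static void main(String[] args) {
        ChildStemmer stemmer = new ChildStemmer();
        System.out.println(stemmer.invokeSwallowing("r_LONG")); // false
        System.out.println(stemmer.invokeSwallowing("r_VI"));   // true
        try {
            stemmer.invoke("r_LONG");
        } catch (ReflectiveOperationException e) {
            System.out.println(e.getClass().getSimpleName());   // IllegalAccessException
        }
    }
}
```

With the swallowing variant, the private call simply looks like "no match", which is why a breakpoint is needed to see the exception at all.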


> Generated Java code for stemmers is broken, and should be re-generated
> ----------------------------------------------------------------------
>
>                 Key: OPENNLP-1520
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1520
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Stemmer
>    Affects Versions: 2.3.0, 2.3.1
>            Reporter: Jon Marius Venstad
>            Assignee: Martin Wiesner
>            Priority: Major
>             Fix For: 2.3.2
>
>
> The recursive stemming seems hard to actually trigger, but it is the 
> intended usage of {{methodObject}} and {{method}} in the {{Among}} class 
> (called reflectively), and it is completely broken. First off, it tries to 
> invoke a private method from outside the class (from a parent class, 
> {{SnowballProgram}}), which fails with an illegal access exception. Even if 
> that worked, it would have invoked _all_ such method calls on the 
> _same, shared, static object_, not on the relevant stemmer instance. 
> This was fixed 8 years ago, but it looks like the generated code in 
> opennlp-tools is 10 years old. I would urge you to re-generate that code. 
>  
> Commit that fixed the Java code generation: 
> [https://github.com/snowballstem/snowball/commit/0f9d3d64ab965447a7f638b8ededc924f3efca75]
>  
> Relevant sample stemmer with broken Java:
> [https://github.com/apache/opennlp/blob/main/opennlp-tools/src/main/java/opennlp/tools/stemmer/snowball/finnishStemmer.java]
>  
> Stack trace showing illegal reflection access:
>  
> {noformat}
> 2023-10-26 23:21:44.200 class opennlp.tools.stemmer.snowball.SnowballProgram cannot access a member of class opennlp.tools.stemmer.snowball.finnishStemmer with modifiers "private"
> exception=java.lang.IllegalAccessException: class opennlp.tools.stemmer.snowball.SnowballProgram cannot access a member of class opennlp.tools.stemmer.snowball.finnishStemmer with modifiers "private"
>   at java.base/jdk.internal.reflect.Reflection.newIllegalAccessException(Reflection.java:392)
>   at java.base/java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:674)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:560)
>   at opennlp.tools.stemmer.snowball.SnowballProgram.find_among_b(SnowballProgram.java:353)
>   at opennlp.tools.stemmer.snowball.finnishStemmer.r_case_ending(finnishStemmer.java:480)
>   at opennlp.tools.stemmer.snowball.finnishStemmer.stem(finnishStemmer.java:1003)
>   at opennlp.tools.stemmer.snowball.SnowballStemmer.stem(SnowballStemmer.java:131)
>   at com.yahoo.language.opennlp.OpenNlpTokenizer.processToken(OpenNlpTokenizer.java:64)
>   at com.yahoo.language.opennlp.OpenNlpTokenizer.lambda$tokenize$0(OpenNlpTokenizer.java:54)
>   at com.yahoo.language.simple.SimpleTokenizer.tokenize(SimpleTokenizer.java:74)
>   at com.yahoo.language.opennlp.OpenNlpTokenizer.tokenize(OpenNlpTokenizer.java:54)
>   at com.yahoo.vespa.indexinglanguage.linguistics.LinguisticsAnnotator.annotate(LinguisticsAnnotator.java:76)
> ...{noformat}
>  
>  
> Best, Jon Marius Venstad, developer at [vespa.ai|http://vespa.ai/]

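The second problem named in the quoted report, calls landing on a shared static object instead of the stemmer instance in use, can be pictured with a small sketch. MiniStemmer and all of its members are hypothetical and only assume that a stemmer keeps the word being processed as per-instance state; this is not the generated OpenNLP or Snowball code.

```java
// Hypothetical miniature of a stemmer that holds per-instance state.
class MiniStemmer {
    // Assumption: stands in for a static instance used as the target of every call.
    static final MiniStemmer SHARED = new MiniStemmer();

    private String current = "";

    void setCurrent(String word) { current = word; }

    // A condition that depends on the word currently being stemmed.
    boolean endsWithN() { return current.endsWith("n"); }

    // Problematic pattern: the check runs against SHARED, whose state is unrelated.
    boolean checkOnSharedObject() { return SHARED.endsWithN(); }

    // Intended pattern: the check runs against the instance doing the stemming.
    boolean checkOnThisInstance() { return this.endsWithN(); }
}

class SharedTargetDemo {
    public static void main(String[] args) {
        MiniStemmer stemmer = new MiniStemmer();
        stemmer.setCurrent("voitaisiin");
        System.out.println(stemmer.checkOnThisInstance()); // true
        System.out.println(stemmer.checkOnSharedObject()); // false: SHARED never saw the word
    }
}
```

The two checks disagree because the shared object never receives the word being stemmed, so any condition evaluated on it is unrelated to the actual input.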


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
