[jira] [Commented] (OPENNLP-1520) Generated Java code for stemmers is broken, and should be re-generated

2023-12-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794553#comment-17794553
 ] 

ASF GitHub Bot commented on OPENNLP-1520:
-

rzo1 merged PR #561:
URL: https://github.com/apache/opennlp/pull/561




> Generated Java code for stemmers is broken, and should be re-generated
> --
>
>                 Key: OPENNLP-1520
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1520
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Stemmer
>    Affects Versions: 2.3.0, 2.3.1
>            Reporter: Jon Marius Venstad
>            Assignee: Richard Zowalla
>            Priority: Major
>             Fix For: 2.3.2
>
>
> Recursive stemming, which seems hard to actually trigger but is the 
> intended use of {{methodObject}} and {{method}} in the {{Among}} class 
> (called reflectively), is completely broken. First, it tries to invoke a 
> private method from outside the class (from a parent class, 
> {{SnowballProgram}}), which fails with an illegal access exception. Even if 
> that worked, it would invoke _all_ such method calls on the 
> _same, shared, static object_, not on the relevant stemmer instance. 
> This was fixed 8 years ago, but the generated code in opennlp-tools appears 
> to be 10 years old. I would urge you to re-generate that code. 
>  
> Commit that fixed the Java code generation: 
> [https://github.com/snowballstem/snowball/commit/0f9d3d64ab965447a7f638b8ededc924f3efca75]
>  
> Relevant sample stemmer with broken Java:
> [https://github.com/apache/opennlp/blob/main/opennlp-tools/src/main/java/opennlp/tools/stemmer/snowball/finnishStemmer.java]
>  
> Stack trace showing illegal reflection access:
>  
> {noformat}
> 2023-10-26 23:21:44.200 class opennlp.tools.stemmer.snowball.SnowballProgram cannot access a member of class opennlp.tools.stemmer.snowball.finnishStemmer with modifiers "private"
> exception=java.lang.IllegalAccessException: class opennlp.tools.stemmer.snowball.SnowballProgram cannot access a member of class opennlp.tools.stemmer.snowball.finnishStemmer with modifiers "private"
>   at java.base/jdk.internal.reflect.Reflection.newIllegalAccessException(Reflection.java:392)
>   at java.base/java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:674)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:560)
>   at opennlp.tools.stemmer.snowball.SnowballProgram.find_among_b(SnowballProgram.java:353)
>   at opennlp.tools.stemmer.snowball.finnishStemmer.r_case_ending(finnishStemmer.java:480)
>   at opennlp.tools.stemmer.snowball.finnishStemmer.stem(finnishStemmer.java:1003)
>   at opennlp.tools.stemmer.snowball.SnowballStemmer.stem(SnowballStemmer.java:131)
>   at com.yahoo.language.opennlp.OpenNlpTokenizer.processToken(OpenNlpTokenizer.java:64)
>   at com.yahoo.language.opennlp.OpenNlpTokenizer.lambda$tokenize$0(OpenNlpTokenizer.java:54)
>   at com.yahoo.language.simple.SimpleTokenizer.tokenize(SimpleTokenizer.java:74)
>   at com.yahoo.language.opennlp.OpenNlpTokenizer.tokenize(OpenNlpTokenizer.java:54)
>   at com.yahoo.vespa.indexinglanguage.linguistics.LinguisticsAnnotator.annotate(LinguisticsAnnotator.java:76)
> ...
> {noformat}
>  
>  
> Best, Jon Marius Venstad, developer at [vespa.ai|http://vespa.ai/]
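
The two failure modes described in the report can be reproduced outside OpenNLP with a small sketch. All class and method names below (`Program`, `PrivateRuleStemmer`, `PublicRuleStemmer`, `rule`, `call`, `SHARED`) are hypothetical stand-ins, not the Snowball or OpenNLP sources, and the public-method variant is only there to isolate the second problem; it is not a claim about how the upstream Snowball commit fixed the generator.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for the parent class that performs the reflective call
// (SnowballProgram in the report).
class Program {
    // Looks up a no-argument boolean method by name and invokes it on the target.
    boolean call(Object target, String name) throws ReflectiveOperationException {
        Method m = target.getClass().getDeclaredMethod(name);
        return (Boolean) m.invoke(target);
    }
}

// Problem 1: the rule method is private, so the parent class may not invoke it.
class PrivateRuleStemmer extends Program {
    private boolean rule() {
        return true;
    }
}

// Problem 2: a shared static instance is used as the invocation target.
class PublicRuleStemmer extends Program {
    static final PublicRuleStemmer SHARED = new PublicRuleStemmer();
    String current = "";

    public boolean rule() {
        return current.endsWith("s");
    }
}

public class ReflectiveAccessSketch {
    public static void main(String[] args) throws Exception {
        PrivateRuleStemmer broken = new PrivateRuleStemmer();
        try {
            broken.call(broken, "rule");
            System.out.println("no exception");
        } catch (IllegalAccessException e) {
            // Program is a different class from PrivateRuleStemmer, so the access check fails.
            System.out.println("IllegalAccessException");
        }

        PublicRuleStemmer mine = new PublicRuleStemmer();
        mine.current = "cats";
        // Invoking on the shared static object ignores the state of 'mine'.
        System.out.println(mine.call(PublicRuleStemmer.SHARED, "rule")); // false
        // Invoking on the actual instance sees its state.
        System.out.println(mine.call(mine, "rule")); // true
    }
}
```

The classes are deliberately top-level rather than nested: nested classes in one file are nestmates on recent JDKs, which would let the private call succeed and hide the problem.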



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (OPENNLP-1520) Generated Java code for stemmers is broken, and should be re-generated

2023-12-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794317#comment-17794317
 ] 

ASF GitHub Bot commented on OPENNLP-1520:
-

jonmv commented on PR #561:
URL: https://github.com/apache/opennlp/pull/561#issuecomment-1845599057

   Nicely done, and much quicker than I'd hoped for! Thanks, on behalf of all 
our (and your) users!






[jira] [Commented] (OPENNLP-1520) Generated Java code for stemmers is broken, and should be re-generated

2023-12-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794294#comment-17794294
 ] 

ASF GitHub Bot commented on OPENNLP-1520:
-

rzo1 commented on code in PR #561:
URL: https://github.com/apache/opennlp/pull/561#discussion_r1419128251


##
opennlp-tools/src/main/java/opennlp/tools/stemmer/snowball/arabicStemmer.java:
##
@@ -30,2094 +30,1284 @@ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 
  */
 
-// This file was generated automatically by the Snowball to Java compiler
-
+// Generated by Snowball (build from 867c4ec70debd4daa7fb4d5a9f7759b47887d0b9)
 package opennlp.tools.stemmer.snowball;
 
-
- /**
-  * This class was automatically generated by a Snowball to Java compiler
-  * It implements the stemming algorithm defined by a snowball script.
-  */
-
+/**
+ * This class implements the stemming algorithm defined by a snowball script.
+ * 
+ * Generated by Snowball (build from 867c4ec70debd4daa7fb4d5a9f7759b47887d0b9) - https://github.com/snowballstem/snowball
+ * 
+ */
+@SuppressWarnings("unused")
 public class arabicStemmer extends AbstractSnowballStemmer {

Review Comment:
   Ideally, we should make them `package-private` and rename the classes, but 
open to thoughts







[jira] [Commented] (OPENNLP-1520) Generated Java code for stemmers is broken, and should be re-generated

2023-12-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794291#comment-17794291
 ] 

ASF GitHub Bot commented on OPENNLP-1520:
-

rzo1 commented on code in PR #561:
URL: https://github.com/apache/opennlp/pull/561#discussion_r1419124646


##
opennlp-tools/src/main/java/opennlp/tools/stemmer/snowball/arabicStemmer.java:
##
@@ -30,2094 +30,1284 @@ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 
  */
 
-// This file was generated automatically by the Snowball to Java compiler
-
+// Generated by Snowball (build from 867c4ec70debd4daa7fb4d5a9f7759b47887d0b9)
 package opennlp.tools.stemmer.snowball;
 
-
- /**
-  * This class was automatically generated by a Snowball to Java compiler
-  * It implements the stemming algorithm defined by a snowball script.
-  */
-
+/**
+ * This class implements the stemming algorithm defined by a snowball script.
+ * 
+ * Generated by Snowball (build from 867c4ec70debd4daa7fb4d5a9f7759b47887d0b9) - https://github.com/snowballstem/snowball
+ * 
+ */
+@SuppressWarnings("unused")
 public class arabicStemmer extends AbstractSnowballStemmer {

Review Comment:
   Yes. Normally, it should start with a capitalized letter. For the 
generation, it does not really matter how we name the result. It could be 
`ArabicStemmer` with capitalized letters (similar convention could apply to the 
other stemmers)
   
   If we decide to change this, it might break users, as these classes are 
part of the `public` API :)







[jira] [Commented] (OPENNLP-1520) Generated Java code for stemmers is broken, and should be re-generated

2023-12-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794290#comment-17794290
 ] 

ASF GitHub Bot commented on OPENNLP-1520:
-

rzo1 commented on code in PR #561:
URL: https://github.com/apache/opennlp/pull/561#discussion_r1419124646


##
opennlp-tools/src/main/java/opennlp/tools/stemmer/snowball/arabicStemmer.java:
##
@@ -30,2094 +30,1284 @@ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 
  */
 
-// This file was generated automatically by the Snowball to Java compiler
-
+// Generated by Snowball (build from 867c4ec70debd4daa7fb4d5a9f7759b47887d0b9)
 package opennlp.tools.stemmer.snowball;
 
-
- /**
-  * This class was automatically generated by a Snowball to Java compiler
-  * It implements the stemming algorithm defined by a snowball script.
-  */
-
+/**
+ * This class implements the stemming algorithm defined by a snowball script.
+ * 
+ * Generated by Snowball (build from 867c4ec70debd4daa7fb4d5a9f7759b47887d0b9) - https://github.com/snowballstem/snowball
+ * 
+ */
+@SuppressWarnings("unused")
 public class arabicStemmer extends AbstractSnowballStemmer {

Review Comment:
   Yes. Normally, it should start with a capitalized letter. For the 
generation, it does not really matter how we name the result. It could be 
`ArabicStemmer` with capitalized letters.
   
   If we decide to change this, it might break users, as these classes are 
part of the `public` API :)







[jira] [Commented] (OPENNLP-1520) Generated Java code for stemmers is broken, and should be re-generated

2023-12-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794288#comment-17794288
 ] 

ASF GitHub Bot commented on OPENNLP-1520:
-

atarora commented on code in PR #561:
URL: https://github.com/apache/opennlp/pull/561#discussion_r1419121095


##
opennlp-tools/src/main/java/opennlp/tools/stemmer/snowball/arabicStemmer.java:
##
@@ -30,2094 +30,1284 @@ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 
  */
 
-// This file was generated automatically by the Snowball to Java compiler
-
+// Generated by Snowball (build from 867c4ec70debd4daa7fb4d5a9f7759b47887d0b9)
 package opennlp.tools.stemmer.snowball;
 
-
- /**
-  * This class was automatically generated by a Snowball to Java compiler
-  * It implements the stemming algorithm defined by a snowball script.
-  */
-
+/**
+ * This class implements the stemming algorithm defined by a snowball script.
+ * 
+ * Generated by Snowball (build from 867c4ec70debd4daa7fb4d5a9f7759b47887d0b9) - https://github.com/snowballstem/snowball
+ * 
+ */
+@SuppressWarnings("unused")
 public class arabicStemmer extends AbstractSnowballStemmer {

Review Comment:
   This seems strange per the naming convention! Shouldn't this be 
`ArabicStemmer` with a capital A?







[jira] [Commented] (OPENNLP-1520) Generated Java code for stemmers is broken, and should be re-generated

2023-12-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794264#comment-17794264
 ] 

ASF GitHub Bot commented on OPENNLP-1520:
-

rzo1 commented on PR #561:
URL: https://github.com/apache/opennlp/pull/561#issuecomment-1845479712

   Thx for review @mawiesne 






[jira] [Commented] (OPENNLP-1520) Generated Java code for stemmers is broken, and should be re-generated

2023-12-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794249#comment-17794249
 ] 

ASF GitHub Bot commented on OPENNLP-1520:
-

rzo1 commented on code in PR #561:
URL: https://github.com/apache/opennlp/pull/561#discussion_r1419066379


##
opennlp-tools/src/main/java/opennlp/tools/stemmer/snowball/SnowballStemmer.java:
##
@@ -21,116 +21,94 @@
 
 public class SnowballStemmer implements Stemmer {
 
-  public enum ALGORITHM {
-    ARABIC,
-    DANISH,
-    DUTCH,
-    CATALAN,
-    ENGLISH,
-    FINNISH,
-    FRENCH,
-    GERMAN,
-    GREEK,
-    HUNGARIAN,
-    INDONESIAN,
-    IRISH,
-    ITALIAN,
-    NORWEGIAN,
-    PORTER,
-    PORTUGUESE,
-    ROMANIAN,
-    RUSSIAN,
-    SPANISH,
-    SWEDISH,
-    TURKISH
-  }
+private final AbstractSnowballStemmer stemmer;

Review Comment:
   No config excludes it (at least not via the checkstyle maven plugin). 
Applied OpenNLP style to all of these stemmer/** classes.







[jira] [Commented] (OPENNLP-1520) Generated Java code for stemmers is broken, and should be re-generated

2023-12-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794238#comment-17794238
 ] 

ASF GitHub Bot commented on OPENNLP-1520:
-

mawiesne commented on code in PR #561:
URL: https://github.com/apache/opennlp/pull/561#discussion_r1419050209


##
opennlp-tools/src/main/java/opennlp/tools/stemmer/snowball/SnowballStemmer.java:
##
@@ -21,116 +21,94 @@
 
 public class SnowballStemmer implements Stemmer {
 
-  public enum ALGORITHM {
-ARABIC,
-DANISH,
-DUTCH,
-CATALAN,
-ENGLISH,
-FINNISH,
-FRENCH,
-GERMAN,
-GREEK,
-HUNGARIAN,
-INDONESIAN,
-IRISH,
-ITALIAN,
-NORWEGIAN,
-PORTER,
-PORTUGUESE,
-ROMANIAN,
-RUSSIAN,
-SPANISH,
-SWEDISH,
-TURKISH
-  }
+private final AbstractSnowballStemmer stemmer;

Review Comment:
   Because it's excluded.







[jira] [Commented] (OPENNLP-1520) Generated Java code for stemmers is broken, and should be re-generated

2023-12-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794236#comment-17794236
 ] 

ASF GitHub Bot commented on OPENNLP-1520:
-

rzo1 commented on code in PR #561:
URL: https://github.com/apache/opennlp/pull/561#discussion_r1419047800


##
opennlp-tools/src/main/java/opennlp/tools/stemmer/snowball/SnowballStemmer.java:
##
@@ -21,116 +21,94 @@
 
 public class SnowballStemmer implements Stemmer {
 
-  public enum ALGORITHM {
-ARABIC,
-DANISH,
-DUTCH,
-CATALAN,
-ENGLISH,
-FINNISH,
-FRENCH,
-GERMAN,
-GREEK,
-HUNGARIAN,
-INDONESIAN,
-IRISH,
-ITALIAN,
-NORWEGIAN,
-PORTER,
-PORTUGUESE,
-ROMANIAN,
-RUSSIAN,
-SPANISH,
-SWEDISH,
-TURKISH
-  }
+private final AbstractSnowballStemmer stemmer;

Review Comment:
   Why does checkstyle not complain here? 







[jira] [Commented] (OPENNLP-1520) Generated Java code for stemmers is broken, and should be re-generated

2023-12-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794234#comment-17794234
 ] 

ASF GitHub Bot commented on OPENNLP-1520:
-

mawiesne commented on code in PR #561:
URL: https://github.com/apache/opennlp/pull/561#discussion_r1419037022


##
opennlp-tools/src/main/java/opennlp/tools/stemmer/snowball/SnowballStemmer.java:
##
@@ -21,116 +21,94 @@
 
 public class SnowballStemmer implements Stemmer {
 
-  public enum ALGORITHM {
-ARABIC,
-DANISH,
-DUTCH,
-CATALAN,
-ENGLISH,
-FINNISH,
-FRENCH,
-GERMAN,
-GREEK,
-HUNGARIAN,
-INDONESIAN,
-IRISH,
-ITALIAN,
-NORWEGIAN,
-PORTER,
-PORTUGUESE,
-ROMANIAN,
-RUSSIAN,
-SPANISH,
-SWEDISH,
-TURKISH
-  }
+private final AbstractSnowballStemmer stemmer;

Review Comment:
   These changes need reformatting with 2-space indentation. It seems 4 spaces 
sneaked in here for all of the changes. Please check other occurrences as well.







[jira] [Commented] (OPENNLP-1520) Generated Java code for stemmers is broken, and should be re-generated

2023-12-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794220#comment-17794220
 ] 

ASF GitHub Bot commented on OPENNLP-1520:
-

rzo1 opened a new pull request, #561:
URL: https://github.com/apache/opennlp/pull/561

   ### For all changes:
   - [x] Is there a JIRA ticket associated with this PR? Is it referenced 
in the commit message?
   
   - [x] Does your PR title start with OPENNLP-XXXX where XXXX is the JIRA 
number you are trying to resolve? Pay particular attention to the hyphen "-" 
character.
   
   - [x] Has your PR been rebased against the latest commit within the target 
branch (typically main)?
   
   - [x] Is your initial contribution a single, squashed commit?
   
   ### For code changes:
   - [x] Have you ensured that the full suite of tests is executed via mvn 
clean install at the root opennlp folder?
   - [x] Have you written or updated unit tests to verify your changes?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)? 
   - [ ] If applicable, have you updated the LICENSE file, including the main 
LICENSE file in opennlp folder?
   - [ ] If applicable, have you updated the NOTICE file, including the main 
NOTICE file found in opennlp folder?
   
   ### For documentation related changes:
   - [ ] Have you ensured that format looks appropriate for the output in which 
it is rendered?
   
   ### Note:
   
   - Regenerates the Snowball Stemmer Java Code with the latest changes from 
https://github.com/snowballstem/snowball 
   - It is based upon **867c4ec70debd4daa7fb4d5a9f7759b47887d0b9** (which has a 
neat fix for German ;-) )
   - Adds a test case for Finnish, which triggered the issue described in 
OPENNLP-1520  in the old generated code. 
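
   For readers who wonder why regenerating helps: the sketch below is not taken from the Snowball or OpenNLP sources. It only illustrates, under the assumption that newer generated code resolves routines through method handles, how a lookup created inside the stemmer class can reach a private routine and how the parent class can then invoke it on the current instance rather than on a shared object. All names (`HandleProgram`, `HandleStemmer`, `rLong`) are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical demo entry point; none of these classes exist in OpenNLP or Snowball.
class MethodHandleDemo {
  public static void main(String[] args) {
    HandleStemmer a = new HandleStemmer(true);
    HandleStemmer b = new HandleStemmer(false);
    // Each call runs against its own instance state.
    System.out.println(a.run() + " " + b.run());
  }
}

// Stands in for the parent class that dispatches to routines of the subclass.
class HandleProgram {
  boolean call(MethodHandle handle) {
    try {
      // The receiver is "this", so no shared static object is involved.
      return (boolean) handle.invoke(this);
    } catch (Throwable t) {
      throw new IllegalStateException(t);
    }
  }
}

class HandleStemmer extends HandleProgram {
  // The lookup is created inside this class, so it may resolve private members.
  private static final MethodHandle R_LONG = find("rLong");

  private final boolean state;

  HandleStemmer(boolean state) {
    this.state = state;
  }

  private static MethodHandle find(String name) {
    try {
      return MethodHandles.lookup()
          .findVirtual(HandleStemmer.class, name, MethodType.methodType(boolean.class));
    } catch (NoSuchMethodException | IllegalAccessException e) {
      throw new IllegalStateException(e);
    }
  }

  // Private routine, comparable in spirit to a generated r_LONG().
  private boolean rLong() {
    return state;
  }

  boolean run() {
    return call(R_LONG);
  }
}
```

   The handle is resolved once per class, while the receiver is supplied per call, which is the property the old reflective code lacked.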
   





[jira] [Commented] (OPENNLP-1520) Generated Java code for stemmers is broken, and should be re-generated

2023-12-07 Thread Richard Zowalla (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794160#comment-17794160
 ] 

Richard Zowalla commented on OPENNLP-1520:
--

For Indonesian, I took 
[https://raw.githubusercontent.com/snowballstem/snowball-data/master/indonesian/voc.txt]
 and did a quick test:

 
{code:java}
@Test
void testIndonesian2() {
  try {
    // Replace "https://example.com" with the URL you want to fetch
    String urlString =
        "https://raw.githubusercontent.com/snowballstem/snowball-data/master/indonesian/voc.txt";
    URL url = new URL(urlString);

    // Open a connection to the URL
    URLConnection urlConnection = url.openConnection();

    // Create a BufferedReader to read the content of the URL
    try (BufferedReader bufferedReader = new BufferedReader(
        new InputStreamReader(urlConnection.getInputStream()))) {

      // Read the content of the URL into a StringBuilder
      StringBuilder content = new StringBuilder();
      String line;

      while ((line = bufferedReader.readLine()) != null) {
        content.append(line);
        content.append(System.lineSeparator());
      }

      // Stem every token of the vocabulary; no assertion, we only watch for exceptions
      SnowballStemmer stemmer = new SnowballStemmer(ALGORITHM.INDONESIAN);
      for (String token : content.toString().split("\n")) {
        stemmer.stem(token);
      }
    }
  } catch (IOException e) {
    e.printStackTrace();
  }
}
{code}

This list doesn't trigger the exception because the related methods are 
"protected". 

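The difference between the two modifiers can be reproduced outside OpenNLP. The following is a minimal sketch with hypothetical names (`DemoProgram`, `DemoStemmer`, `rPrivate`, `rProtected`), assuming that parent and subclass live in the same package, as SnowballProgram and the generated stemmers do. It shows a parent class invoking subclass methods reflectively: the protected one is reachable through package access, while the private one fails.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical demo entry point; none of these classes exist in OpenNLP.
class ReflectiveAccessDemo {
  public static void main(String[] args) {
    DemoStemmer stemmer = new DemoStemmer();
    System.out.println("private:   " + stemmer.callReflectively("rPrivate"));
    System.out.println("protected: " + stemmer.callReflectively("rProtected"));
  }
}

// Stands in for the parent class that performs the reflective call.
class DemoProgram {
  String callReflectively(String methodName) {
    try {
      // Looking the method up succeeds for any modifier ...
      Method m = getClass().getDeclaredMethod(methodName);
      // ... but invoking it is subject to the usual access checks,
      // evaluated against this (the parent) class as the caller.
      return String.valueOf(m.invoke(this));
    } catch (IllegalAccessException e) {
      return "IllegalAccessException";
    } catch (NoSuchMethodException | InvocationTargetException e) {
      return e.getClass().getSimpleName();
    }
  }
}

// Stands in for a generated stemmer in the same package as its parent.
class DemoStemmer extends DemoProgram {
  private boolean rPrivate() {
    return true;
  }

  protected boolean rProtected() {
    return true;
  }
}
```

The classes are deliberately top-level rather than nested, since nested classes of one outer class may access each other's private members and would hide the failure.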
> Generated Java code for stemmers is broken, and should be re-generated
> --
>
> Key: OPENNLP-1520
> URL: https://issues.apache.org/jira/browse/OPENNLP-1520
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Stemmer
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Jon Marius Venstad
>Assignee: Martin Wiesner
>Priority: Major
> Fix For: 2.3.2
>
>
> The recursive stemming, which seems hard to actually trigger, but which is 
> the intended usage of the {{methodObject and method}} in the {{Among}} class 
> (called reflectively) is completely broken. First off, it tries to invoke a 
> private method from outside the class (from a parent class, the 
> {{{}SnowballProgram{}}}), which fails with an illegal access exception; if 
> that worked, it would also have invoked _all_ such method calls on the 
> {_}same, shared, static object{_}—not on the relevant stemmer instance. 
> This was fixed 8 years ago, but it looks like the generated code in the 
> opennlp-tools is 10 years old. I would urge you to re-generate that code. 
>  
> Commit that fixed the Java code generation: 
> [https://github.com/snowballstem/snowball/commit/0f9d3d64ab965447a7f638b8ededc924f3efca75]
>  
> Relevant sample stemmer with broken Java:
> [https://github.com/apache/opennlp/blob/main/opennlp-tools/src/main/java/opennlp/tools/stemmer/snowball/finnishStemmer.java]
>  
> Stack trace showing illegal reflection access:
>  
> {noformat}
> 2023-10-26 23:21:44.200 class opennlp.tools.stemmer.snowball.SnowballProgram 
> cannot access a member of class opennlp.tools.stemmer.snowball.finnishStemmer 
> with modifiers "private" 
> exception=java.lang.IllegalAccessException: class 
> opennlp.tools.stemmer.snowball.SnowballProgram cannot access a member of 
> class opennlp.tools.stemmer.snowball.finnishStemmer with modifiers "private"
>   at 
> java.base/jdk.internal.reflect.Reflection.newIllegalAccessException(Reflection.java:392)
>  
>   at 
> java.base/java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:674)
>  
>   at java.base/java.lang.reflect.Method.invoke(Method.java:560) 
>   at 
> opennlp.tools.stemmer.snowball.SnowballProgram.find_among_b(SnowballProgram.java:353)
>  
>   at 
> opennlp.tools.stemmer.snowball.finnishStemmer.r_case_ending(finnishStemmer.java:480)
>  
>   at 
> opennlp.tools.stemmer.snowball.finnishStemmer.stem(finnishStemmer.java:1003) 
>   at 
> opennlp.tools.stemmer.snowball.SnowballStemmer.stem(SnowballStemmer.java:131) 
>   at 
> com.yahoo.language.opennlp.OpenNlpTokenizer.processToken(OpenNlpTokenizer.java:64)
>  
>   at 
> com.yahoo.language.opennlp.OpenNlpTokenizer.lambda$tokenize$0(OpenNlpTokenizer.java:54)
>  
>   at 
> com.yahoo.language.simple.SimpleTokenizer.tokenize(SimpleTokenizer.java:74) 
>   at 
> com.yahoo.language.opennlp.OpenNlpTokenizer.tokenize(OpenNlpTokenizer.java:54)
>  
>   at 
> com.yahoo.vespa.indexinglanguage.linguistics.LinguisticsAnnotator.annotate(LinguisticsAnnotator.java:76)
> ...{noformat}
>  
>  
> Best, Jon Marius Venstad, developer at [vespa.ai|http://vespa.ai/]





[jira] [Commented] (OPENNLP-1520) Generated Java code for stemmers is broken, and should be re-generated

2023-12-07 Thread Richard Zowalla (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794122#comment-17794122
 ] 

Richard Zowalla commented on OPENNLP-1520:
--

For Finnish, this exception can only happen for the Among elements contained in 
a_6:

{code:java}
new Among ( "den", 11, -1, "r_VI", methodObject )
new Among ( "seen", 11, -1, "r_LONG", methodObject )
new Among ( "tten", 11, -1, "r_VI", methodObject ){code}
For all other entries, {{methodname}} isn't set, so no reflective call is made and no exception results.

It can be reproduced by

 
{code:java}
@Test
void testFinish() {
  SnowballStemmer stemmer = new SnowballStemmer(ALGORITHM.FINNISH);
  // https://snowballstem.org/algorithms/finnish/stemmer.html
  Assertions.assertEquals("edeltän", stemmer.stem("edeltäneeseen")); // r_LONG()
  Assertions.assertEquals("no-idea-what-it-is-but-it-triggers-",
      stemmer.stem("voitaisiin")); // r_VI()
}
{code}

which will try to invoke 

*private boolean opennlp.tools.stemmer.snowball.finnishStemmer.r_LONG()*

Note that the exception is swallowed, so we need to set a breakpoint at 
SnowballProgram line 356 to observe it.

*voitaisiin* will trigger the other case:

*private boolean opennlp.tools.stemmer.snowball.finnishStemmer.r_VI()*



[jira] [Commented] (OPENNLP-1520) Generated Java code for stemmers is broken, and should be re-generated

2023-11-28 Thread Jon Marius Venstad (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17790462#comment-17790462
 ] 

Jon Marius Venstad commented on OPENNLP-1520:
-

Hmm, obtaining that sample wouldn't be easy, since the illegal access exception 
is caught inside the opennlp library, and simply logged, as in the description. 
There's no way to find when this happens, and with what input, without 
modifying the library, apart from turning on external logging of input, and 
correlating with this log message, which I think is unacceptable. I'd be happy 
to check logs for occurrences of this error before and after an attempted fix, 
though. 

> Generated Java code for stemmers is broken, and should be re-generated
> --
>
> Key: OPENNLP-1520
> URL: https://issues.apache.org/jira/browse/OPENNLP-1520
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Stemmer
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Jon Marius Venstad
>Assignee: Martin Wiesner
>Priority: Major
> Fix For: 2.3.2
>
>
> The recursive stemming, which seems hard to actually trigger, but which is 
> the intended usage of the {{methodObject and method}} in the {{Among}} class 
> (called reflectively) is completely broken. First off, it tries to invoke a 
> private method from outside the class (from a parent class, the 
> {{{}SnowballProgram{}}}), which fails with an illegal access exception; if 
> that worked, it would also have invoked _all_ such method calls on the 
> {_}same, shared, static object{_}—not on the relevant stemmer instance. 
> This was fixed 8 years ago, but it looks like the generated code in the 
> opennlp-tools is 10 years old. I would urge you to re-generate that code. 
>  
> Commit that fixed the Java code generation: 
> [https://github.com/snowballstem/snowball/commit/0f9d3d64ab965447a7f638b8ededc924f3efca75]
>  
> Relevant sample stemmer with broken Java:
> [https://github.com/apache/opennlp/blob/main/opennlp-tools/src/main/java/opennlp/tools/stemmer/snowball/finnishStemmer.java]
>  
> Stack trace showing illegal reflection access:
>  
> {noformat}
> 2023-10-26 23:21:44.200 class opennlp.tools.stemmer.snowball.SnowballProgram cannot access a member of class opennlp.tools.stemmer.snowball.finnishStemmer with modifiers "private" 
> exception=java.lang.IllegalAccessException: class opennlp.tools.stemmer.snowball.SnowballProgram cannot access a member of class opennlp.tools.stemmer.snowball.finnishStemmer with modifiers "private"
>   at java.base/jdk.internal.reflect.Reflection.newIllegalAccessException(Reflection.java:392)
>   at java.base/java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:674)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:560)
>   at opennlp.tools.stemmer.snowball.SnowballProgram.find_among_b(SnowballProgram.java:353)
>   at opennlp.tools.stemmer.snowball.finnishStemmer.r_case_ending(finnishStemmer.java:480)
>   at opennlp.tools.stemmer.snowball.finnishStemmer.stem(finnishStemmer.java:1003)
>   at opennlp.tools.stemmer.snowball.SnowballStemmer.stem(SnowballStemmer.java:131)
>   at com.yahoo.language.opennlp.OpenNlpTokenizer.processToken(OpenNlpTokenizer.java:64)
>   at com.yahoo.language.opennlp.OpenNlpTokenizer.lambda$tokenize$0(OpenNlpTokenizer.java:54)
>   at com.yahoo.language.simple.SimpleTokenizer.tokenize(SimpleTokenizer.java:74)
>   at com.yahoo.language.opennlp.OpenNlpTokenizer.tokenize(OpenNlpTokenizer.java:54)
>   at com.yahoo.vespa.indexinglanguage.linguistics.LinguisticsAnnotator.annotate(LinguisticsAnnotator.java:76)
> ...{noformat}
>  
>  
> Best, Jon Marius Venstad, developer at [vespa.ai|http://vespa.ai/]
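
The access failure in the quoted stack trace can be reproduced in isolation. The following is a minimal sketch, not code taken from OpenNLP or Snowball: `ParentProgram` and `ChildStemmer` are hypothetical stand-ins for `SnowballProgram` and a generated stemmer, and the method names `r_private` and `r_public` are invented for illustration. It only shows why `Method.invoke` called from a parent class is rejected for a `private` method of the subclass.

```java
import java.lang.reflect.Method;

// Entry point; all names in this file are hypothetical.
class ReflectiveAccessDemo {
    // Asks the parent class to reflectively invoke a method declared on the child.
    static String attempt(String methodName) {
        return new ChildStemmer().callReflectively(methodName);
    }

    public static void main(String[] args) {
        System.out.println(attempt("r_private")); // prints "IllegalAccessException"
        System.out.println(attempt("r_public"));  // prints "ok"
    }
}

// Stand-in for the parent class: the reflective call originates here.
class ParentProgram {
    String callReflectively(String methodName) {
        try {
            // getClass() is the runtime subclass, so its declared methods are found.
            Method method = getClass().getDeclaredMethod(methodName);
            // The access check uses the caller's class (ParentProgram), which is
            // not allowed to touch private members of ChildStemmer.
            method.invoke(this);
            return "ok";
        } catch (ReflectiveOperationException e) {
            return e.getClass().getSimpleName();
        }
    }
}

// Stand-in for a generated stemmer with one private and one public routine.
class ChildStemmer extends ParentProgram {
    private boolean r_private() { return true; }
    public boolean r_public() { return true; }
}
```

Making the routine accessible to the caller, as in the `r_public` case, avoids the exception. Whether the regenerated stemmers solve it this way or by other means is not stated in this thread.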





[jira] [Commented] (OPENNLP-1520) Generated Java code for stemmers is broken, and should be re-generated

2023-11-28 Thread Jon Marius Venstad (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17790430#comment-17790430
 ] 

Jon Marius Venstad commented on OPENNLP-1520:
-

I should add that the Indonesian stemmer is likely affected by the same 
problem, although I haven't seen it crash. 

> Generated Java code for stemmers is broken, and should be re-generated
> --
>
> Key: OPENNLP-1520
> URL: https://issues.apache.org/jira/browse/OPENNLP-1520
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Stemmer
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Jon Marius Venstad
>Assignee: Martin Wiesner
>Priority: Major
> Fix For: 2.3.2





[jira] [Commented] (OPENNLP-1520) Generated Java code for stemmers is broken, and should be re-generated

2023-11-27 Thread Jon Marius Venstad (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17790394#comment-17790394
 ] 

Jon Marius Venstad commented on OPENNLP-1520:
-

Hi Martin. Unfortunately, I was not able to produce such a data sample. We run 
stemming on behalf of our users, and I happened upon this stack trace when 
debugging something else. We generally have no access to our users' data, so I 
don't know exactly what text triggered it. I suppose I could try to obtain a 
sample, but that would take some time. 

Perhaps someone with knowledge of the Finnish language would be able to 
construct a sample? I tried concatenating lots of Finnish stem endings (found 
in the tables in the generated code), but had no luck with that. I also tried 
a few text samples from the internet, again to no avail. 

> Generated Java code for stemmers is broken, and should be re-generated
> --
>
> Key: OPENNLP-1520
> URL: https://issues.apache.org/jira/browse/OPENNLP-1520
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Stemmer
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Jon Marius Venstad
>Assignee: Martin Wiesner
>Priority: Major
> Fix For: 2.3.2





[jira] [Commented] (OPENNLP-1520) Generated Java code for stemmers is broken, and should be re-generated

2023-11-27 Thread Martin Wiesner (Jira)


[ 
https://issues.apache.org/jira/browse/OPENNLP-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17790374#comment-17790374
 ] 

Martin Wiesner commented on OPENNLP-1520:
-

[~jonmv] Can you provide a minimal test data set to reproduce this problem, 
e.g. with some Finnish text examples? This way we can verify that a modernized / 
adapted _finnishStemmer_ works as promised.

> Generated Java code for stemmers is broken, and should be re-generated
> --
>
> Key: OPENNLP-1520
> URL: https://issues.apache.org/jira/browse/OPENNLP-1520
> Project: OpenNLP
>  Issue Type: Bug
>  Components: Stemmer
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Jon Marius Venstad
>Assignee: Martin Wiesner
>Priority: Major


