[jira] [Created] (OPENNLP-1518) Roberta-based Models - Add support for utilization via Onnx

Christopher Ball (Jira) Sat, 04 Nov 2023 10:39:58 -0700

Christopher Ball created OPENNLP-1518:
-----------------------------------------


             Summary: Roberta-based Models - Add support for utilization via Onnx
                 Key: OPENNLP-1518
                 URL: https://issues.apache.org/jira/browse/OPENNLP-1518
             Project: OpenNLP
          Issue Type: Improvement
          Components: language model
    Affects Versions: 2.3.0
            Reporter: Christopher Ball


It appears that *Roberta*-based models do not work with *OpenNLP* via *ONNX*.

 

*Example Model*

[https://huggingface.co/SamLowe/roberta-base-go_emotions-onnx]

 

*Comments from Jeff Zemerick*

Looks like some differences with the model:

*First*, the vocab file is a JSON file. OpenNLP expects a plain text file 
with one token per line, so it is not able to load the vocab file.

*Second*, the model doesn't expect a token type ID param. By default, 
OpenNLP includes that param, but you can tell it not to in the InferenceOptions 
class (set includeTokenTypeIds to false).

*Third*, that model returns a one-dimensional array. OpenNLP is 
expecting a two-dimensional array.

I am guessing that with those changes it would work. Nothing else jumps out at me. 
If you get time, please feel free to write those up as OpenNLP Jira tickets and 
send me the links. We can easily support a JSON file for the vocab, and we can 
also support the 1-d array, but I might need to see how best to support 
models that return either a 1-d or a 2-d array.

It might be a matter of checking whether the output is 1-d or 2-d and going from there.
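
A minimal sketch of that 1-d versus 2-d check, assuming the raw model output arrives as a plain Java object holding either a float[] or a float[][]. The class and method names here (OutputShapes, scores) are hypothetical and are not part of OpenNLP.

```java
final class OutputShapes {

    /**
     * Returns the scores for a single input, whether the model produced
     * a 1-d array (scores only) or a 2-d array (batch of score rows).
     */
    static float[] scores(Object raw) {
        if (raw instanceof float[][]) {
            float[][] batch = (float[][]) raw;
            if (batch.length == 0) {
                throw new IllegalArgumentException("Model returned an empty batch");
            }
            // 2-d case: take the row for the first (only) input.
            return batch[0];
        }
        if (raw instanceof float[]) {
            // 1-d case: the array already holds the scores.
            return (float[]) raw;
        }
        String type = (raw == null) ? "null" : raw.getClass().getName();
        throw new IllegalArgumentException("Unsupported output type: " + type);
    }

    public static void main(String[] args) {
        float[] fromOneD = scores(new float[] {0.1f, 0.9f});
        float[] fromTwoD = scores(new float[][] {{0.3f, 0.7f}});
        System.out.println(fromOneD.length + " " + fromTwoD.length);
    }
}
```

Failing loudly on any other shape keeps an unexpected model output from surfacing later as an empty score array.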

*Stack Trace*

java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because the return value of "java.util.Map.get(Object)" is null
    at opennlp.dl.doccat.DocumentCategorizerDL.tokenize(DocumentCategorizerDL.java:281)
    at opennlp.dl.doccat.DocumentCategorizerDL.categorize(DocumentCategorizerDL.java:104)
    at opennlp.dl.doccat.DocumentCategorizerDL.sortedScoreMap(DocumentCategorizerDL.java:192)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.inferContent(Explore_Emotions_with_Onnx_OpenNLP.scala:47)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.$anonfun$main$1(Explore_Emotions_with_Onnx_OpenNLP.scala:37)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.$anonfun$main$1$adapted(Explore_Emotions_with_Onnx_OpenNLP.scala:36)
    at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:934)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.main(Explore_Emotions_with_Onnx_OpenNLP.scala:36)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP.main(Explore_Emotions_with_Onnx_OpenNLP.scala)
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
    at opennlp.dl.doccat.DocumentCategorizerDL.sortedScoreMap(DocumentCategorizerDL.java:198)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.inferContent(Explore_Emotions_with_Onnx_OpenNLP.scala:47)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.$anonfun$main$1(Explore_Emotions_with_Onnx_OpenNLP.scala:37)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.$anonfun$main$1$adapted(Explore_Emotions_with_Onnx_OpenNLP.scala:36)
    at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:934)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP$.main(Explore_Emotions_with_Onnx_OpenNLP.scala:36)
    at org.index.app.Explore_Emotions_with_Onnx_OpenNLP.main(Explore_Emotions_with_Onnx_OpenNLP.scala)
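
The NullPointerException is consistent with the vocab problem noted in the comments: if the vocabulary map is empty or incomplete, Map.get returns null for a token and unboxing it to an int fails. Below is a minimal sketch of a null-safe lookup, assuming a one-token-per-line vocab where each token's ID is its zero-based line index and assuming the vocab contains an unknown-token entry such as "[UNK]". The class VocabLookup and its methods are hypothetical and are not OpenNLP's actual code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class VocabLookup {

    /** Builds token -> ID from vocab lines; the ID is the zero-based line index (assumption). */
    static Map<String, Integer> fromLines(List<String> lines) {
        Map<String, Integer> vocab = new HashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            vocab.put(lines.get(i).trim(), i);
        }
        return vocab;
    }

    /** Looks up a token, falling back to the unknown token instead of unboxing null. */
    static int idOf(Map<String, Integer> vocab, String token, String unknownToken) {
        Integer id = vocab.get(token);
        if (id != null) {
            return id;
        }
        Integer unknownId = vocab.get(unknownToken);
        if (unknownId == null) {
            // Fail with a clear message rather than a NullPointerException.
            throw new IllegalStateException("Vocabulary has no entry for " + unknownToken);
        }
        return unknownId;
    }

    public static void main(String[] args) {
        Map<String, Integer> vocab = fromLines(Arrays.asList("[PAD]", "[UNK]", "hello"));
        System.out.println(idOf(vocab, "hello", "[UNK]"));
        System.out.println(idOf(vocab, "unseen", "[UNK]"));
    }
}
```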



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
