Luca Foppiano created TIKA-4115:
-----------------------------------
Summary: Grobid Quantities Parser not working properly
Key: TIKA-4115
URL: https://issues.apache.org/jira/browse/TIKA-4115
Project: Tika
Issue Type: Bug
Components: parser, tika-app
Affects Versions: 2.8.0
Reporter: Luca Foppiano
I've fixed some bugs in GrobidNERecogniser and pushed the fixes here:
[https://github.com/apache/tika/pull/1280]
However, the CXF WebClient throws an NPE when checking whether the server is alive
(accessing [http://localhost:8060/service/isalive]):
```
INFO [main] 12:51:56,317 org.apache.tika.parser.ner.NamedEntityParser going to load, instantiate and bind the instance of org.apache.tika.parser.ner.grobid.GrobidNERecogniser
INFO [main] 12:51:56,484 org.apache.tika.parser.ner.grobid.GrobidNERecogniser Grobid Quantities REST Server is not running
java.lang.NullPointerException: null
    at org.apache.cxf.jaxrs.client.AbstractClient.setupOutInterceptorChain(AbstractClient.java:937) ~[tika-parser-nlp-package-2.8.1-SNAPSHOT.jar:?]
    at org.apache.cxf.jaxrs.client.AbstractClient.createMessage(AbstractClient.java:1014) ~[tika-parser-nlp-package-2.8.1-SNAPSHOT.jar:?]
    at org.apache.cxf.jaxrs.client.WebClient.finalizeMessage(WebClient.java:1111) ~[tika-parser-nlp-package-2.8.1-SNAPSHOT.jar:?]
    at org.apache.cxf.jaxrs.client.WebClient.doChainedInvocation(WebClient.java:1084) ~[tika-parser-nlp-package-2.8.1-SNAPSHOT.jar:?]
    at org.apache.cxf.jaxrs.client.WebClient.doInvoke(WebClient.java:932) ~[tika-parser-nlp-package-2.8.1-SNAPSHOT.jar:?]
    at org.apache.cxf.jaxrs.client.WebClient.doInvoke(WebClient.java:901) ~[tika-parser-nlp-package-2.8.1-SNAPSHOT.jar:?]
    at org.apache.cxf.jaxrs.client.WebClient.invoke(WebClient.java:364) ~[tika-parser-nlp-package-2.8.1-SNAPSHOT.jar:?]
    at org.apache.cxf.jaxrs.client.WebClient.get(WebClient.java:390) ~[tika-parser-nlp-package-2.8.1-SNAPSHOT.jar:?]
    at org.apache.tika.parser.ner.grobid.GrobidNERecogniser.<init>(GrobidNERecogniser.java:78) [tika-parser-nlp-package-2.8.1-SNAPSHOT.jar:?]
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) [?:?]
    at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) [?:?]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:490) [?:?]
    at java.lang.Class.newInstance(Class.java:584) [?:?]
    at org.apache.tika.parser.ner.NamedEntityParser.initialize(NamedEntityParser.java:91) [tika-parser-nlp-package-2.8.1-SNAPSHOT.jar:?]
    at org.apache.tika.parser.ner.NamedEntityParser.parse(NamedEntityParser.java:119) [tika-parser-nlp-package-2.8.1-SNAPSHOT.jar:?]
    at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:152) [tika-app-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) [tika-app-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) [tika-app-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:203) [tika-app-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
    at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:1071) [tika-app-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:493) [tika-app-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:256) [tika-app-2.8.1-SNAPSHOT.jar:2.8.1-SNAPSHOT]
INFO [main] 12:51:56,492 org.apache.tika.parser.ner.NamedEntityParser org.apache.tika.parser.ner.grobid.GrobidNERecogniser is available ? false
INFO [main] 12:51:56,516 org.apache.tika.parser.sentiment.SentimentAnalysisParser Sentiment Model is at https://raw.githubusercontent.com/USCDataScience/SentimentAnalysisParser/master/sentiment-models/src/main/resources/edu/usc/irds/sentiment/en-netflix-sentiment.bin
INFO [main] 12:51:56,885 org.apache.tika.parser.ner.NamedEntityParser Number of NERecognisers in chain 0
Content-Length: 70
Content-Type: text/plain
X-TIKA:Parsed-By: org.apache.tika.parser.CompositeParser
X-TIKA:Parsed-By: org.apache.tika.parser.ner.NamedEntityParser
X-TIKA:Parsed-By-Full-Set: org.apache.tika.parser.CompositeParser
X-TIKA:Parsed-By-Full-Set: org.apache.tika.parser.ner.NamedEntityParser
resourceName: bao.txt
```
After spending some time on it, I have not managed to find a solution.
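
To help narrow this down, here is a minimal sketch that probes the same isalive endpoint with the JDK's own HTTP client instead of the CXF WebClient. It assumes Java 11 or later and that Grobid Quantities listens on http://localhost:8060 as in the log above. The class name GrobidIsAliveProbe and the method isAlive are hypothetical and not part of Tika; this is only a diagnostic aid, not how GrobidNERecogniser performs the check.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical stand-alone probe; not part of Tika or Grobid.
public class GrobidIsAliveProbe {

    // Returns true only if GET <baseUrl>/service/isalive answers with HTTP 200.
    // Any failure (connection refused, timeout, malformed URL) yields false.
    public static boolean isAlive(String baseUrl) {
        try {
            HttpClient client = HttpClient.newBuilder()
                    .connectTimeout(Duration.ofSeconds(2))
                    .build();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(baseUrl + "/service/isalive"))
                    .timeout(Duration.ofSeconds(5))
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200;
        } catch (InterruptedException e) {
            // Restore the interrupt flag before reporting the server as unavailable.
            Thread.currentThread().interrupt();
            return false;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed default: the host and port shown in the log above.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8060";
        System.out.println("isalive -> " + isAlive(baseUrl));
    }
}
```

If this reports the server as alive while Tika still logs the NPE, that would suggest the problem is on the client side (how the CXF WebClient is set up inside the packaged jar) rather than in the Grobid server itself.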