Thanks Nick.
Just a copy and paste error in the email.
I was able to figure out how to bypass the JournalParser and use just the PDF parser.
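For anyone hitting the same trace, a minimal sketch of one way to do this is to call PDFParser directly instead of going through the Tika facade. This is only an illustration, not necessarily the exact change made here: the class name PdfOnlyExtractor and the method extractText are made up for the example, and it assumes tika-parsers is on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper class, named for this example only.
public class PdfOnlyExtractor {

    // Hands the stream straight to PDFParser, so parser selection by
    // AutoDetectParser (which picked JournalParser in the trace) is skipped.
    public static String extractText(InputStream in)
            throws IOException, SAXException, TikaException {
        Parser parser = new PDFParser();
        // -1 disables BodyContentHandler's default write limit.
        BodyContentHandler handler = new BodyContentHandler(-1);
        parser.parse(in, handler, new Metadata(), new ParseContext());
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        // Default path is the one from the original snippet; pass your own as args[0].
        String path = args.length > 0 ? args[0] : "src/test/resources/somepdf.pdf";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            System.out.println(extractText(in));
        }
    }
}
```

Everything here runs in-process, with no HTTP call to localhost:8080. Another route that should work is excluding JournalParser from the default parser set in a tika-config.xml, which would keep auto-detection for other document types.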
--Pei
On Wed, 24 Feb 2016, Pei Chen wrote:
> Does the default PDF parser, when used via the auto-detect parser, require
> Tika to run in server mode?
No
> It seems to try to open an HTTP connection to localhost:8080 by
> default. Can it run in-process?
The stacktrace shows you're not using the PDF parser:
> at org.apache.tika.parser.journal.GrobidRESTParser.parse(GrobidRESTParser.java:74)
> at org.apache.tika.parser.journal.JournalParser.parse(JournalParser.java:60)
See https://wiki.apache.org/tika/GrobidJournalParser for how to configure
the Grobid parser, if you want to use it.
Nick
On Wed, Feb 24, 2016 at 5:15 PM, Pei Chen wrote:
> Hi tika-dev,
> Does the default PDF parser, when used via the auto-detect parser, require
> Tika to run in server mode? It seems to try to open an HTTP connection to
> localhost:8080 by default. Can it run in-process?
>
>
> ...
> FileInputStream stream = new FileInputStream("src/test/resources/somepdf.pdf");
> //works fine in-process with other doc types.
> Tika tika = new Tika();
> tika.parseToString(stream);
> ...
>
>
> 24 Feb 2016 17:06:24 WARN PhaseInterceptorChain - Interceptor for {http://localhost:8080/processHeaderDocument}WebClient has thrown exception, unwinding now
> org.apache.cxf.interceptor.Fault: No message body writer has been found for class org.apache.cxf.jaxrs.ext.multipart.MultipartBody, ContentType: multipart/form-data
> at org.apache.cxf.jaxrs.client.WebClient$BodyWriter.doWriteBody(WebClient.java:1220)
> at org.apache.cxf.jaxrs.client.AbstractClient$AbstractBodyWriter.handleMessage(AbstractClient.java:1044)
> at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
> at org.apache.cxf.jaxrs.client.AbstractClient.doRunInterceptorChain(AbstractClient.java:623)
> at org.apache.cxf.jaxrs.client.WebClient.doChainedInvocation(WebClient.java:1084)
> at org.apache.cxf.jaxrs.client.WebClient.doInvoke(WebClient.java:883)
> at org.apache.cxf.jaxrs.client.WebClient.doInvoke(WebClient.java:854)
> at org.apache.cxf.jaxrs.client.WebClient.invoke(WebClient.java:320)
> at org.apache.cxf.jaxrs.client.WebClient.post(WebClient.java:329)
> at org.apache.tika.parser.journal.GrobidRESTParser.parse(GrobidRESTParser.java:74)
> at org.apache.tika.parser.journal.JournalParser.parse(JournalParser.java:60)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at org.apache.tika.Tika.parseToString(Tika.java:496)
> at org.apache.tika.Tika.parseToString(Tika.java:571)