[ 
https://issues.apache.org/jira/browse/TIKA-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Burchard updated TIKA-3261:
--------------------------------
    Description: 
I tried to parse a file with both Tika 1.20 and 1.24.1. The file looks valid 
when I view it in my text editor and appears to be a plain tab-delimited table 
with a mix of Hebrew and Latin characters. In 1.20 an exception is thrown, 
and in 1.24.1 I get JSON metadata back with no content.
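
One way to check the "looks valid in a text editor" observation independently of Tika is to inspect the raw bytes. The following is a minimal sketch, not part of the original report: the helper names `sniff_bom` and `decodable_as` and the candidate encoding list are assumptions chosen for a Hebrew/Latin file. It only reports whether a byte-order mark is present and which encodings decode without error, which is a hint and not proof of the real encoding.

```python
import codecs

# Byte-order marks to check first; a BOM is a strong hint about the encoding.
# (Hypothetical helper table, not taken from Tika.)
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding name implied by a leading BOM, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


def decodable_as(data: bytes, candidates=("utf-8", "utf-16", "windows-1255")):
    """Return the candidate encodings under which the bytes decode cleanly."""
    ok = []
    for enc in candidates:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue
        ok.append(enc)
    return ok


if __name__ == "__main__":
    # Stand-in sample; for the real file one would read /tmp/choke.txt in binary mode.
    sample = "שלום\tabc\n".encode("utf-8")
    print("BOM:", sniff_bom(sample))
    print("Decodes as:", decodable_as(sample))
```

A file that decodes under several encodings is ambiguous to a detector as well, so this narrows the question without settling it.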

My command line:

{{curl -X PUT --upload-file /tmp/choke.txt http://localhost:9998/rmeta/text}}

1.24.1 Result:

{{[{"Content-Type":"application/octet-stream","X-Parsed-By":"org.apache.tika.parser.EmptyParser","X-TIKA:embedded_depth":"0","X-TIKA:parse_time_millis":"10"}]}}
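
To make the "metadata but no content" symptom easy to spot in a script, the response can be summarized per record. This is a rough sketch under one assumption: that extracted text, when present, is returned under the key `X-TIKA:content`. The function name `summarize_rmeta` is hypothetical; the sample string is the 1.24.1 response quoted above.

```python
import json

# Assumption: extracted text, when present, appears under this key.
CONTENT_KEY = "X-TIKA:content"


def summarize_rmeta(body: str):
    """Summarize each metadata record in an /rmeta JSON response."""
    records = json.loads(body)
    return [
        {
            "type": record.get("Content-Type"),
            "parser": record.get("X-Parsed-By"),
            "has_content": bool(record.get(CONTENT_KEY)),
        }
        for record in records
    ]


# The 1.24.1 response from this report.
response = (
    '[{"Content-Type":"application/octet-stream",'
    '"X-Parsed-By":"org.apache.tika.parser.EmptyParser",'
    '"X-TIKA:embedded_depth":"0","X-TIKA:parse_time_millis":"10"}]'
)

if __name__ == "__main__":
    for summary in summarize_rmeta(response):
        print(summary)
```

For the response above, the single record reports the octet-stream type, the EmptyParser, and no content, which matches the behaviour described.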

 

1.20 Result:

{{INFO Starting Apache Tika 1.20 server}}
{{INFO Setting the server's publish address to be http://localhost:9998/}}
{{INFO Logging initialized @1704ms to org.eclipse.jetty.util.log.Slf4jLog}}
{{INFO jetty-9.4.z-SNAPSHOT; built: 2018-08-30T13:59:14.071Z; git: 27208684755d94a92186989f695db2d7b21ebc51; jvm 8.0.6.10 - pwa6480sr6fp10-20200408_01(SR6 FP10)}}
{{INFO Started ServerConnector@7b09f799{HTTP/1.1,[http/1.1]}{localhost:9998}}}
{{INFO Started @2085ms}}
{{WARN Empty contextPath}}
{{INFO Started o.e.j.s.h.ContextHandler@-405fdc63{/,null,AVAILABLE}}}
{{INFO Started Apache Tika server at http://localhost:9998/}}
{{INFO rmeta/text (autodetecting type)}}
{{WARN rmeta/text: Text extraction failed (null)}}
{{org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.server.resource.TikaResource$1@74f007b}}
{{ at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)}}
{{ at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)}}
{{ at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:224)}}
{{ at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:401)}}
{{ at org.apache.tika.server.resource.RecursiveMetadataResource.parseMetadata(RecursiveMetadataResource.java:144)}}
{{ at org.apache.tika.server.resource.RecursiveMetadataResource.getMetadata(RecursiveMetadataResource.java:121)}}
{{ at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)}}
{{ at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:90)}}
{{ at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)}}
{{ at java.lang.reflect.Method.invoke(Method.java:508)}}
{{ at org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:179)}}
{{ at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)}}
{{ at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:193)}}
{{ at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:103)}}
{{ at org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)}}
{{ at org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)}}
{{ at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)}}
{{ at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)}}
{{ at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:267)}}
{{ at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)}}
{{ at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)}}
{{ at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)}}
{{ at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)}}
{{ at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1340)}}
{{ at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:205)}}
{{ at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1242)}}
{{ at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)}}
{{ at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)}}
{{ at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)}}
{{ at org.eclipse.jetty.server.Server.handle(Server.java:503)}}
{{ at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)}}
{{ at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)}}
{{ at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)}}
{{ at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)}}
{{ at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)}}
{{ at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)}}
{{ at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)}}
{{ at java.lang.Thread.run(Thread.java:820)}}
{{Caused by: javax.ws.rs.WebApplicationException: HTTP 415 Unsupported Media Type}}
{{ at org.apache.tika.server.resource.TikaResource$1.parse(TikaResource.java:127)}}
{{ at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)}}
{{ ... 37 more}}

 


> Text file is parsed by "EmptyParser" but the file does contain what looks 
> like valid text
> -----------------------------------------------------------------------------------------
>
>                 Key: TIKA-3261
>                 URL: https://issues.apache.org/jira/browse/TIKA-3261
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.20, 1.24.1
>         Environment: Tika is running on Windows 10 on my test machine and 
> Windows 2016 on the production machine. The problem is reproducible on both. 
> The Linux command line I used is just SLES on WSL, so it has no bearing here.
>  
> (I had a problem attaching the file: Jira gave me a 'missing token' 
> error, so I'll try again after the Jira issue is created.)
>            Reporter: Josh Burchard
>            Priority: Major
>         Attachments: choke.zip
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
