[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-03-21 Thread Ian Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204610#comment-15204610
 ] 

Ian Williams commented on TIKA-1845:


OK - thank you for clarifying that.



> Unable to extract content from certain RTFs using tika-server versions since 1.5
> --------------------------------------------------------------------------------
>
> Key: TIKA-1845
> URL: https://issues.apache.org/jira/browse/TIKA-1845
> Project: Tika
> Issue Type: Bug
> Components: server
> Affects Versions: 1.6, 1.9, 1.11
> Environment: Windows
> Reporter: Ian Williams
> Assignee: Tim Allison
> Fix For: 1.13
>
> Attachments: example-that-fails.rtf
>
>
> I have some patient letters that are RTF documents.  When I extract the text 
> from these documents using tika-server-1.5.jar, it works fine.
> However, in tika-server-1.6.jar and later versions (I've tried 1.6, 1.9 and 
> 1.11), it fails with the stack trace and error shown below.
> I can provide a sample RTF that is failing. 
> I wondered whether the error might be related to the following change,
> which was introduced in 1.6:
>   * Made RTFParser's list handling slightly more robust against corrupt
> list metadata (TIKA-1305)
> It's possible that there is some issue with the RTF documents, but they are 
> real patient letters and they open in Microsoft Word without any problems.
> Many thanks
> Ian
> Steps to reproduce issue
> 
> 1. HTTP PUT to Tika server using curl:
> C:\Downloads\Apache Tika>curl -X PUT --data-binary @test-anonymised-letter.rtf http://localhost:9998/tika --header "Content-Type: application/rtf" --header "Accept: text/plain"
> --> this works fine when running tika-server-1.5.jar, but fails with 
> tika-server-1.6.jar
> 2. Screen capture from the server:
> INFO: Starting Apache Tika 1.9 server
> Feb 01, 2016 2:26:10 PM org.apache.cxf.endpoint.ServerImpl initDestination
> INFO: Setting the server's publish address to be http://localhost:9998/
> Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: jetty-8.y.z-SNAPSHOT
> Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Started SelectChannelConnector@localhost:9998
> Feb 01, 2016 2:26:10 PM org.apache.tika.server.TikaServerCli main
> INFO: Started
> Feb 01, 2016 2:26:24 PM org.apache.tika.server.resource.TikaResource logRequest
> INFO: tika (application/rtf)
> Feb 01, 2016 2:26:25 PM org.apache.tika.server.resource.TikaResource parse
> WARNING: tika: Text extraction failed
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@32a6dc
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:283)
>     at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:163)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>     at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:244)
>     at org.apache.tika.server.resource.TikaResource$4.write(TikaResource.java:321)
>     at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:164)
>     at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1363)
>     at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:244)
>     at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:117)
>     at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:80)
>     at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>     at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:83)
>     at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
>     at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
>     at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
>     at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
>     at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at 
> 

[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-03-21 Thread Ian Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204531#comment-15204531
 ] 

Ian Williams commented on TIKA-1845:


Dear Tim

I noticed Tika 1.12 has been released recently but doesn't contain this fix.  
Do you know which version of Tika will contain the fix?

Many thanks
Ian




[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-03 Thread Ian Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130686#comment-15130686
 ] 

Ian Williams commented on TIKA-1845:


I am out of the office until Thu 04 Feb 2016.

Regards
Ian




[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126477#comment-15126477
 ] 

Ian Williams commented on TIKA-1845:


Tim - thanks for confirming what's in the attachment and for the heads up about 
the metadata.  I've attached a new cutdown example that fails with the same 
error.  Please use this sample for unit tests etc.


[jira] [Updated] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Williams updated TIKA-1845:
---
Attachment: example-that-fails.rtf


[jira] [Created] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)
Ian Williams created TIKA-1845:
--

 Summary: Unable to extract content from certain RTFs using tika-server versions since 1.5
 Key: TIKA-1845
 URL: https://issues.apache.org/jira/browse/TIKA-1845
 Project: Tika
  Issue Type: Bug
  Components: server
Affects Versions: 1.11, 1.9, 1.6
 Environment: Windows
Reporter: Ian Williams


I have some patient letters that are RTF documents.  When I extract the text 
from these documents using tika-server-1.5.jar, it works fine.

However, in tika-server-1.6.jar and later versions (I've tried 1.6, 1.9 and 
1.11), it fails with the stack trace and error shown below.

I can provide a sample RTF that is failing.

I wondered whether the error might be related to the following change, which
was introduced in 1.6:
  * Made RTFParser's list handling slightly more robust against corrupt
list metadata (TIKA-1305)

It's possible that there is some issue with the RTF documents, but they are 
real patient letters and they open in Microsoft Word without any problems.

Many thanks
Ian


Steps to reproduce issue


1. HTTP PUT to Tika server using curl:

C:\Downloads\Apache Tika>curl -X PUT --data-binary @test-anonymised-letter.rtf http://localhost:9998/tika --header "Content-Type: application/rtf" --header "Accept: text/plain"

--> this works fine when running tika-server-1.5.jar, but fails with 
tika-server-1.6.jar
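
For anyone who wants to script the request in step 1 rather than run curl by hand, here is a minimal Python sketch (not part of the original report). It builds the same PUT request with the standard library only. The function names build_tika_request and extract_text are hypothetical, the URL and headers are copied from the curl command above, and decoding the response as UTF-8 is an assumption.

```python
import urllib.request

# Endpoint copied from the curl command in step 1.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(rtf_bytes, url=TIKA_URL):
    """Build (but do not send) the PUT request that the curl command issues."""
    return urllib.request.Request(
        url,
        data=rtf_bytes,
        method="PUT",
        headers={
            "Content-Type": "application/rtf",
            "Accept": "text/plain",
        },
    )


def extract_text(path, url=TIKA_URL):
    """Send the file at `path` to a running tika-server and return the text.

    urlopen raises urllib.error.HTTPError on an error status, so a failed
    extraction should surface as an exception on the client side.
    Assumption: the plain-text response body is UTF-8.
    """
    with open(path, "rb") as f:
        request = build_tika_request(f.read(), url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a tika-server jar running locally, calling extract_text("test-anonymised-letter.rtf") against 1.5 and then against 1.6 or later should show the difference described above.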


2. Screen capture from the server:
INFO: Starting Apache Tika 1.9 server
Feb 01, 2016 2:26:10 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://localhost:9998/
Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: jetty-8.y.z-SNAPSHOT
Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Started SelectChannelConnector@localhost:9998
Feb 01, 2016 2:26:10 PM org.apache.tika.server.TikaServerCli main
INFO: Started
Feb 01, 2016 2:26:24 PM org.apache.tika.server.resource.TikaResource logRequest
INFO: tika (application/rtf)
Feb 01, 2016 2:26:25 PM org.apache.tika.server.resource.TikaResource parse
WARNING: tika: Text extraction failed
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@32a6dc
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:283)
    at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:163)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:244)
    at org.apache.tika.server.resource.TikaResource$4.write(TikaResource.java:321)
    at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:164)
    at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1363)
    at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:244)
    at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:117)
    at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:80)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
    at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:83)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
    at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
    at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
    at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
    at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:370)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
at 

[jira] [Updated] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Williams updated TIKA-1845:
---
Description: 
I have some patient letters that are RTF documents.  When I extract the text 
from these documents using tika-server-1.5.jar, it works fine.

However, in tika-server-1.6.jar and later versions (I've tried 1.6, 1.9 and 
1.11), it fails with the stack trace and error shown below.

I can provide a sample RTF that is failing.  I'm not sure how to attach files 
to this issue so here is a link to an Evernote note containing an example RTF 
that fails:
https://www.evernote.com/shard/s66/sh/4a003611-2400-4959-a1cc-2be5b3efe2cf/284a6f2dd3e0a290

I wondered whether the error might be related to the following change, which
was introduced in 1.6:
  * Made RTFParser's list handling slightly more robust against corrupt
list metadata (TIKA-1305)

It's possible that there is some issue with the RTF documents, but they are 
real patient letters and they open in Microsoft Word without any problems.

Many thanks
Ian


Steps to reproduce issue


1. HTTP PUT to Tika server using curl:

C:\Downloads\Apache Tika>curl -X PUT --data-binary @test-anonymised-letter.rtf http://localhost:9998/tika --header "Content-Type: application/rtf" --header "Accept: text/plain"

--> this works fine when running tika-server-1.5.jar, but fails with 
tika-server-1.6.jar


2. Screen capture from the server:
INFO: Starting Apache Tika 1.9 server
Feb 01, 2016 2:26:10 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://localhost:9998/
Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: jetty-8.y.z-SNAPSHOT
Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Started SelectChannelConnector@localhost:9998
Feb 01, 2016 2:26:10 PM org.apache.tika.server.TikaServerCli main
INFO: Started
Feb 01, 2016 2:26:24 PM org.apache.tika.server.resource.TikaResource logRequest
INFO: tika (application/rtf)
Feb 01, 2016 2:26:25 PM org.apache.tika.server.resource.TikaResource parse
WARNING: tika: Text extraction failed
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@32a6dc
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:283)
    at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:163)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:244)
    at org.apache.tika.server.resource.TikaResource$4.write(TikaResource.java:321)
    at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:164)
    at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1363)
    at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:244)
    at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:117)
    at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:80)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
    at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:83)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
    at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
    at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
    at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
    at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:370)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
at 

[jira] [Updated] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Williams updated TIKA-1845:
---
Description: 
I have some patient letters that are RTF documents.  When I extract the text 
from these documents using tika-server-1.5.jar, it works fine.

However, in tika-server-1.6.jar and later versions (I've tried 1.6, 1.9 and 
1.11), it fails with the stack trace and error shown below.

I can provide a sample RTF that is failing.  I'm not sure how to attach files 
to this issue so here is a link to an Evernote note containing an example RTF 
that fails:
http://www.evernote.com/l/AEJKADYRJABJWaHMK-Wz7-LPKEpvLdPgopA/

I wondered whether the error might be related to the following change that was 
introduced in 1.6:
  * Made RTFParser's list handling slightly more robust against corrupt
    list metadata (TIKA-1305)

It's possible that there is some issue with the RTF documents, but they are 
real patient letters and they open in Microsoft Word without any problems.

Many thanks
Ian


Steps to reproduce issue


1. HTTP PUT to Tika server using curl:

C:\Downloads\Apache Tika>curl -X PUT --data-binary @test-anonymised-letter.rtf http://localhost:9998/tika --header "Content-Type: application/rtf" --header "Accept: text/plain"

--> this works fine when running tika-server-1.5.jar, but fails with 
tika-server-1.6.jar


2. Screen capture from the server:
INFO: Starting Apache Tika 1.9 server
Feb 01, 2016 2:26:10 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://localhost:9998/
Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: jetty-8.y.z-SNAPSHOT
Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Started SelectChannelConnector@localhost:9998
Feb 01, 2016 2:26:10 PM org.apache.tika.server.TikaServerCli main
INFO: Started
Feb 01, 2016 2:26:24 PM org.apache.tika.server.resource.TikaResource logRequest
INFO: tika (application/rtf)
Feb 01, 2016 2:26:25 PM org.apache.tika.server.resource.TikaResource parse
WARNING: tika: Text extraction failed
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@32a6dc
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:283)
at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:163)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:244)
at org.apache.tika.server.resource.TikaResource$4.write(TikaResource.java:321)
at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:164)
at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1363)
at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:244)
at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:117)
at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:80)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:83)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
at 
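
For anyone who would rather script the reproduction than type the curl command, the same HTTP PUT can be sketched in Python using only the standard library. This is a minimal sketch, not part of the original report: the function names build_tika_request and extract_text are hypothetical, the URL and headers are copied from the curl command in step 1, and decoding the response as UTF-8 is an assumption about the server's text/plain output.

```python
import urllib.request

# Endpoint taken from the curl command in step 1 of the report.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(rtf_bytes: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build the PUT request that the curl command sends (hypothetical helper)."""
    return urllib.request.Request(
        url,
        data=rtf_bytes,  # equivalent of curl's --data-binary @file
        method="PUT",
        headers={
            "Content-Type": "application/rtf",
            "Accept": "text/plain",
        },
    )


def extract_text(path: str, url: str = TIKA_URL) -> str:
    """Send the RTF file at `path` to a running tika-server and return its reply.

    urllib raises urllib.error.HTTPError for 4xx and 5xx responses, so a failed
    extraction surfaces as an exception instead of an empty string.
    """
    with open(path, "rb") as fh:
        request = build_tika_request(fh.read(), url)
    with urllib.request.urlopen(request) as response:
        # Assumption: the text/plain body is UTF-8 encoded.
        return response.read().decode("utf-8", errors="replace")
```

With a tika-server jar already running on port 9998, calling extract_text("test-anonymised-letter.rtf") against 1.5 and then against 1.6 or later should make the difference between the two versions easy to compare.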

[jira] [Updated] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Williams updated TIKA-1845:
---
Attachment: (was: test-anonymised-letter.rtf)


[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126354#comment-15126354
 ] 

Ian Williams commented on TIKA-1845:


I've deleted the attachment for the time being - sorry.   Please contact me 
directly for a sample.  The reason is that I don't know what's in the embedded 
file within the RTF.


[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126375#comment-15126375
 ] 

Ian Williams commented on TIKA-1845:


Just being cautious because I don't want to share anything in a public forum 
that isn't 100% anonymised, and I don't know what's in that embedded file 
within the RTF (doesn't show up within the document in Word).  Possibly a logo 
or something like that.


[jira] [Updated] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Williams updated TIKA-1845:
---
Description: 
I have some patient letters that are RTF documents.  When I extract the text 
from these documents using tika-server-1.5.jar, it works fine.

However, in tika-server-1.6.jar and later versions (I've tried 1.6, 1.9 and 
1.11), it fails with the stack trace and error shown below.

I can provide a sample RTF that is failing. 

I wondered whether the error might be related to the following change that was 
introduced in 1.6:
  * Made RTFParser's list handling slightly more robust against corrupt
    list metadata (TIKA-1305)

It's possible that there is some issue with the RTF documents, but they are 
real patient letters and they open in Microsoft Word without any problems.

Many thanks
Ian


Steps to reproduce issue


1. HTTP PUT to Tika server using curl:

C:\Downloads\Apache Tika>curl -X PUT --data-binary @test-anonymised-letter.rtf http://localhost:9998/tika --header "Content-Type: application/rtf" --header "Accept: text/plain"

--> this works fine when running tika-server-1.5.jar, but fails with 
tika-server-1.6.jar


2. Screen capture from the server:
INFO: Starting Apache Tika 1.9 server
Feb 01, 2016 2:26:10 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://localhost:9998/
Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: jetty-8.y.z-SNAPSHOT
Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Started SelectChannelConnector@localhost:9998
Feb 01, 2016 2:26:10 PM org.apache.tika.server.TikaServerCli main
INFO: Started
Feb 01, 2016 2:26:24 PM org.apache.tika.server.resource.TikaResource logRequest
INFO: tika (application/rtf)
Feb 01, 2016 2:26:25 PM org.apache.tika.server.resource.TikaResource parse
WARNING: tika: Text extraction failed
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@32a6dc
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:283)
at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:163)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:244)
at org.apache.tika.server.resource.TikaResource$4.write(TikaResource.java:321)
at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:164)
at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1363)
at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:244)
at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:117)
at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:80)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:83)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:370)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:651)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 

[jira] [Updated] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ian Williams updated TIKA-1845:
---
Attachment: test-anonymised-letter.rtf


[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5

2016-02-01 Thread Ian Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126340#comment-15126340
 ] 

Ian Williams commented on TIKA-1845:


OK - thanks.  I've attached the file now.

> Unable to extract content from certain RTFs using tika-server versions since 
> 1.5 
> -
>
> Key: TIKA-1845
> URL: https://issues.apache.org/jira/browse/TIKA-1845
> Project: Tika
>  Issue Type: Bug
>  Components: server
>Affects Versions: 1.6, 1.9, 1.11
> Environment: Windows
>Reporter: Ian Williams
> Attachments: test-anonymised-letter.rtf
>
>
> I have some patient letters that are RTF documents.  When I extract the text 
> from these documents using tika-server-1.5.jar, it works fine.
> However, in tika-server-1.6.jar and later versions (I've tried 1.6, 1.9 and 
> 1.11), it fails with the stack trace and error shown below.
> I can provide a sample RTF that is failing.  I'm not sure how to attach files 
> to this issue so here is a link to an Evernote note containing an example RTF 
> that fails:
> http://www.evernote.com/l/AEJKADYRJABJWaHMK-Wz7-LPKEpvLdPgopA/
> I wondered whether the error might be related to the following change that 
> was introduced in 1.6?:
>   * Made RTFParser's list handling slightly more robust against corrupt
> list metadata (TIKA-1305)
> It's possible that there is some issue with the RTF documents, but they are 
> real patient letters and they open in Microsoft Word without any problems.
> Many thanks
> Ian
> Steps to reproduce issue
> 
> 1. HTTP PUT to Tika server using curl:
> C:\Downloads\Apache Tika>curl -X PUT --data-binary 
> @test-anonymised-letter.rtf http://localhost:9998/tika --header 
> "Content-Type: application/rtf" --header "Accept: text/plain"
> --> this works fine when running tika-server-1.5.jar, but fails with 
> tika-server-1.6.jar
> 2. Screen capture from the server:
> INFO: Starting Apache Tika 1.9 server
> Feb 01, 2016 2:26:10 PM org.apache.cxf.endpoint.ServerImpl initDestination
> INFO: Setting the server's publish address to be http://localhost:9998/
> Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: jetty-8.y.z-SNAPSHOT
> Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Started SelectChannelConnector@localhost:9998
> Feb 01, 2016 2:26:10 PM org.apache.tika.server.TikaServerCli main
> INFO: Started
> Feb 01, 2016 2:26:24 PM org.apache.tika.server.resource.TikaResource 
> logRequest
> INFO: tika (application/rtf)
> Feb 01, 2016 2:26:25 PM org.apache.tika.server.resource.TikaResource parse
> WARNING: tika: Text extraction failed
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from 
> org.apache.tika.parser.rtf.RTFParser@32a6dc
> at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:283)
> at 
> org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:163)
> at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
> at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at 
> org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:244)
> at 
> org.apache.tika.server.resource.TikaResource$4.write(TikaResource.java:321)
> at 
> org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:164)
> at 
> org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1363)
> at 
> org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:244)
> at 
> org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:117)
> at 
> org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:80)
> at 
> org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
> at 
> org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:83)
> at 
> org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
> at 
> org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
> at 
> org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
> at 
> org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
> at 
> org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
> at 
> 
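
For anyone reproducing this without curl, the request in step 1 can be sketched in Python. This is only a sketch under assumptions: it presumes a tika-server instance listening on http://localhost:9998 as in the log above, and the function names build_tika_put and extract_text are illustrative, not part of Tika.

```python
import urllib.request

# Assumed endpoint, taken from the curl command in step 1.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_put(rtf_bytes, url=TIKA_URL):
    """Build (but do not send) the PUT request used in step 1."""
    return urllib.request.Request(
        url,
        data=rtf_bytes,
        method="PUT",
        headers={
            "Content-Type": "application/rtf",
            "Accept": "text/plain",
        },
    )


def extract_text(path, url=TIKA_URL):
    """Send the RTF file at `path` to the server and return the plain text."""
    with open(path, "rb") as f:
        request = build_tika_put(f.read(), url)
    # urlopen raises urllib.error.HTTPError when the server answers
    # with an HTTP error status.
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling extract_text("test-anonymised-letter.rtf") sends the same body and headers as the curl command. Keeping the request construction separate from the network call makes it easy to check the method and headers without a running server.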

[jira] [Commented] (TIKA-894) Add webapp mode for Tika Server, simplifies deployment

2015-08-08 Thread Ian Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662971#comment-14662971
 ] 

Ian Williams commented on TIKA-894:
---

I am out of the office until Mon 10 Aug 2015.

Regards
Ian



> Add webapp mode for Tika Server, simplifies deployment
> --
>
> Key: TIKA-894
> URL: https://issues.apache.org/jira/browse/TIKA-894
> Project: Tika
> Issue Type: Improvement
> Components: packaging
> Affects Versions: 1.1, 1.2
> Reporter: Chris Wilson
> Labels: maven, newbie, patch
> Fix For: 1.11
>
> Attachments: tika-server-webapp.patch
>
>
> For use in production services, Tika Server should really be deployed as a 
> WAR file, under a reliable servlet container that knows how to run as a 
> system service, for example Tomcat or JBoss.
> This is especially important on Windows, where I wasted an entire day trying 
> to make TikaServerCli run as some kind of a service. 
> Maven makes building a webapp pretty trivial. With the attached patch 
> applied, mvn war:war should work. It seems to run fine in Tomcat, which 
> makes Windows deployment much simpler. Just install Tomcat and drop the WAR 
> file into tomcat's webapps directory and you're away.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-894) Add webapp mode for Tika Server, simplifies deployment

2014-04-30 Thread Ian Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985613#comment-13985613
 ] 

Ian Williams commented on TIKA-894:
---

Hi Frederik, did you get anywhere with this? I'd like to run Tika within Tomcat 
and wondered if you'd got any further?  






[jira] [Commented] (TIKA-894) Add webapp mode for Tika Server, simplifies deployment

2014-04-30 Thread Ian Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985643#comment-13985643
 ] 

Ian Williams commented on TIKA-894:
---

Hi Frederik

Thank you for getting back to me.  I'm interested in the memory issues you 
experienced.  Does tika-server appear to leak memory over time?

Many thanks
Ian






