[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204610#comment-15204610 ]

Ian Williams commented on TIKA-1845:

OK - thank you for clarifying that.

> Unable to extract content from certain RTFs using tika-server versions since 1.5
>
> Key: TIKA-1845
> URL: https://issues.apache.org/jira/browse/TIKA-1845
> Project: Tika
> Issue Type: Bug
> Components: server
> Affects Versions: 1.6, 1.9, 1.11
> Environment: Windows
> Reporter: Ian Williams
> Assignee: Tim Allison
> Fix For: 1.13
>
> Attachments: example-that-fails.rtf
>
>
> I have some patient letters that are RTF documents. When I extract the text
> from these documents using tika-server-1.5.jar, it works fine.
> However, in tika-server-1.6.jar and later versions (I've tried 1.6, 1.9 and
> 1.11), it fails with the stack trace and error shown below.
> I can provide a sample RTF that is failing.
> I wondered whether the error might be related to the following change that
> was introduced in 1.6?:
> * Made RTFParser's list handling slightly more robust against corrupt
> list metadata (TIKA-1305)
> It's possible that there is some issue with the RTF documents, but they are
> real patient letters and they open in Microsoft Word without any problems.
> Many thanks
> Ian
> Steps to reproduce issue
>
> 1. HTTP PUT to Tika server using curl:
> C:\Downloads\Apache Tika>curl -X PUT --data-binary @test-anonymised-letter.rtf http://localhost:9998/tika --header "Content-Type: application/rtf" --header "Accept: text/plain"
> --> this works fine when running tika-server-1.5.jar, but fails with
> tika-server-1.6.jar
> 2. Screen capture from the server:
> INFO: Starting Apache Tika 1.9 server
> Feb 01, 2016 2:26:10 PM org.apache.cxf.endpoint.ServerImpl initDestination
> INFO: Setting the server's publish address to be http://localhost:9998/
> Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: jetty-8.y.z-SNAPSHOT
> Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Started SelectChannelConnector@localhost:9998
> Feb 01, 2016 2:26:10 PM org.apache.tika.server.TikaServerCli main
> INFO: Started
> Feb 01, 2016 2:26:24 PM org.apache.tika.server.resource.TikaResource logRequest
> INFO: tika (application/rtf)
> Feb 01, 2016 2:26:25 PM org.apache.tika.server.resource.TikaResource parse
> WARNING: tika: Text extraction failed
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@32a6dc
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:283)
> 	at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:163)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> 	at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:244)
> 	at org.apache.tika.server.resource.TikaResource$4.write(TikaResource.java:321)
> 	at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:164)
> 	at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1363)
> 	at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:244)
> 	at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:117)
> 	at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:80)
> 	at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
> 	at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:83)
> 	at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
> 	at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
> 	at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
> 	at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
> 	at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> 	at
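The server log only shows the wrapping `TikaException` ("Unexpected RuntimeException from ...RTFParser"), and the trace is truncated, so the exception actually thrown inside the parser is not visible. When reproducing the failure from Java code, a small helper that follows `getCause()` can surface it. This is a minimal, standard-library-only sketch; the class name `RootCause` and the simulated exceptions in `main` are hypothetical and not part of Tika.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical helpers for digging the real failure out of wrapped exceptions. */
public final class RootCause {

    private RootCause() {}

    /**
     * Follows getCause() to the innermost throwable. An identity-based set
     * stops the walk if a cause chain ever loops back on itself.
     */
    public static Throwable rootCause(Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    /** Renders the chain one exception per line, in the style of "Caused by:". */
    public static String describeChain(Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        StringBuilder sb = new StringBuilder();
        for (Throwable c = t; c != null && seen.add(c); c = c.getCause()) {
            if (sb.length() > 0) {
                sb.append("\nCaused by: ");
            }
            sb.append(c.getClass().getName()).append(": ").append(c.getMessage());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Simulated chain: the inner exception type and message are hypothetical,
        // standing in for whatever RTFParser actually threw.
        Exception inner = new IllegalStateException("hypothetical failure inside the parser");
        Exception wrapper = new Exception(
                "Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser", inner);
        System.out.println(describeChain(wrapper));
        System.out.println("Root cause: " + rootCause(wrapper));
    }
}
```

In embedded (non-server) use, one would call `describeChain(e)` inside a `catch (TikaException e)` around the `AutoDetectParser.parse(...)` call seen in the trace; that wiring is left out so the sketch has no dependencies.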
[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204531#comment-15204531 ]

Ian Williams commented on TIKA-1845:

Dear Tim

I noticed Tika 1.12 has been released recently but doesn't contain this fix. Do you know which version of Tika will contain the fix?

Many thanks
Ian
[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130686#comment-15130686 ]

Ian Williams commented on TIKA-1845:

I am out of the office until Thu 04 Feb 2016.

Regards
Ian
[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126477#comment-15126477 ]

Ian Williams commented on TIKA-1845:

Tim - thanks for confirming what's in the attachment and for the heads up about the metadata.

I've attached a new cutdown example that fails with the same error. Please use this sample for unit tests etc.
[jira] [Updated] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Williams updated TIKA-1845:

Attachment: example-that-fails.rtf
[jira] [Created] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5
Ian Williams created TIKA-1845:

Summary: Unable to extract content from certain RTFs using tika-server versions since 1.5
Key: TIKA-1845
URL: https://issues.apache.org/jira/browse/TIKA-1845
Project: Tika
Issue Type: Bug
Components: server
Affects Versions: 1.11, 1.9, 1.6
Environment: Windows
Reporter: Ian Williams

I have some patient letters that are RTF documents. When I extract the text from these documents using tika-server-1.5.jar, it works fine.
However, in tika-server-1.6.jar and later versions (I've tried 1.6, 1.9 and 1.11), it fails with the stack trace and error shown below.
I can provide a sample RTF that is failing.
I wondered whether the error might be related to the following change that was introduced in 1.6?:
* Made RTFParser's list handling slightly more robust against corrupt list metadata (TIKA-1305)
It's possible that there is some issue with the RTF documents, but they are real patient letters and they open in Microsoft Word without any problems.
Many thanks
Ian

Steps to reproduce issue
1. HTTP PUT to Tika server using curl:
C:\Downloads\Apache Tika>curl -X PUT --data-binary @test-anonymised-letter.rtf http://localhost:9998/tika --header "Content-Type: application/rtf" --header "Accept: text/plain"
--> this works fine when running tika-server-1.5.jar, but fails with tika-server-1.6.jar
2. Screen capture from the server:
INFO: Starting Apache Tika 1.9 server
Feb 01, 2016 2:26:10 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://localhost:9998/
Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: jetty-8.y.z-SNAPSHOT
Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Started SelectChannelConnector@localhost:9998
Feb 01, 2016 2:26:10 PM org.apache.tika.server.TikaServerCli main
INFO: Started
Feb 01, 2016 2:26:24 PM org.apache.tika.server.resource.TikaResource logRequest
INFO: tika (application/rtf)
Feb 01, 2016 2:26:25 PM org.apache.tika.server.resource.TikaResource parse
WARNING: tika: Text extraction failed
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@32a6dc
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:283)
	at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:163)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
	at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:244)
	at org.apache.tika.server.resource.TikaResource$4.write(TikaResource.java:321)
	at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:164)
	at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1363)
	at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:244)
	at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:117)
	at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:80)
	at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
	at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:83)
	at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
	at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
	at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
	at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
	at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:370)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
	at
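The curl command in step 1 can also be issued from Java, which makes it easier to loop over many letters or compare server versions in a regression check. This sketch assumes Java 11+ (`java.net.http.HttpClient`) and a tika-server listening on `http://localhost:9998/tika` as in the report; `TikaPut` and `buildRequest` are illustrative names. Like `--data-binary`, it sends the file's raw bytes unmodified, with the same `Content-Type` and `Accept` headers.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical Java equivalent of the curl PUT used to reproduce the failure. */
public final class TikaPut {

    private TikaPut() {}

    /** Builds the same request as the curl command: PUT the raw RTF bytes to /tika. */
    public static HttpRequest buildRequest(URI endpoint, byte[] rtfBytes) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/rtf")
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(rtfBytes))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumes tika-server is running locally on the port used in the report.
        Path file = Path.of(args.length > 0 ? args[0] : "test-anonymised-letter.rtf");
        HttpRequest request = buildRequest(
                URI.create("http://localhost:9998/tika"), Files.readAllBytes(file));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Print the status so a failed extraction is visible without reading the server log.
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body());
    }
}
```

Run it as `java TikaPut.java example-that-fails.rtf` (single-file launch, Java 11+). `ofByteArray` reads the whole file into memory, which is fine for letters; `HttpRequest.BodyPublishers.ofFile(path)` would stream it instead.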
[jira] [Updated] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Williams updated TIKA-1845:

Description:
I have some patient letters that are RTF documents. When I extract the text from these documents using tika-server-1.5.jar, it works fine.
However, in tika-server-1.6.jar and later versions (I've tried 1.6, 1.9 and 1.11), it fails with the stack trace and error shown below.
I can provide a sample RTF that is failing. I'm not sure how to attach files to this issue so here is a link to an Evernote note containing an example RTF that fails: https://www.evernote.com/shard/s66/sh/4a003611-2400-4959-a1cc-2be5b3efe2cf/284a6f2dd3e0a290
I wondered whether the error might be related to the following change that was introduced in 1.6?:
* Made RTFParser's list handling slightly more robust against corrupt list metadata (TIKA-1305)
It's possible that there is some issue with the RTF documents, but they are real patient letters and they open in Microsoft Word without any problems.
Many thanks
Ian

Steps to reproduce issue
1. HTTP PUT to Tika server using curl:
C:\Downloads\Apache Tika>curl -X PUT --data-binary @test-anonymised-letter.rtf http://localhost:9998/tika --header "Content-Type: application/rtf" --header "Accept: text/plain"
--> this works fine when running tika-server-1.5.jar, but fails with tika-server-1.6.jar
2. Screen capture from the server:
INFO: Starting Apache Tika 1.9 server
Feb 01, 2016 2:26:10 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://localhost:9998/
Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: jetty-8.y.z-SNAPSHOT
Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Started SelectChannelConnector@localhost:9998
Feb 01, 2016 2:26:10 PM org.apache.tika.server.TikaServerCli main
INFO: Started
Feb 01, 2016 2:26:24 PM org.apache.tika.server.resource.TikaResource logRequest
INFO: tika (application/rtf)
Feb 01, 2016 2:26:25 PM org.apache.tika.server.resource.TikaResource parse
WARNING: tika: Text extraction failed
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@32a6dc
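Since the report raises the possibility that the letters themselves are malformed, a quick structural check can rule out gross corruption before suspecting the parser. This sketch relies on only two properties of RTF: a file begins with `{\rtf`, and groups are delimited by `{` and `}`, with `\{`, `\}` and `\\` as escaped literals. It is a heuristic, not a validator: it ignores binary data embedded with `\bin`, and a file that passes can still contain problems in list metadata like those TIKA-1305 addressed. `RtfSanityCheck` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical pre-flight check for gross RTF corruption. */
public final class RtfSanityCheck {

    private RtfSanityCheck() {}

    /** True if the data starts with the "{\rtf" signature that RTF files begin with. */
    public static boolean hasRtfSignature(byte[] data) {
        byte[] sig = "{\\rtf".getBytes(StandardCharsets.US_ASCII);
        if (data.length < sig.length) {
            return false;
        }
        for (int i = 0; i < sig.length; i++) {
            if (data[i] != sig[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns the final group depth: 0 means braces are balanced, a positive value
     * means unclosed groups, and -1 means a '}' appeared with no matching '{'.
     * The character after any backslash is skipped, which covers \{, \} and \\.
     * Caveat: binary payloads introduced by \bin are not handled.
     */
    public static int braceDepth(byte[] data) {
        int depth = 0;
        for (int i = 0; i < data.length; i++) {
            byte b = data[i];
            if (b == '\\') {
                i++; // skip the escaped or control-word character
            } else if (b == '{') {
                depth++;
            } else if (b == '}') {
                depth--;
                if (depth < 0) {
                    return -1;
                }
            }
        }
        return depth;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java RtfSanityCheck.java <file.rtf>");
            return;
        }
        byte[] data = Files.readAllBytes(Path.of(args[0]));
        System.out.println("starts with {\\rtf: " + hasRtfSignature(data));
        System.out.println("brace depth (0 = balanced, -1 = stray '}'): " + braceDepth(data));
    }
}
```

A letter that opens in Word would be expected to pass both checks, which would point back toward the parser rather than the file.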
[jira] [Updated] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Williams updated TIKA-1845:
---
Description:
I have some patient letters that are RTF documents. When I extract the text from these documents using tika-server-1.5.jar, it works fine. However, in tika-server-1.6.jar and later versions (I've tried 1.6, 1.9 and 1.11), it fails with the stack trace and error shown below.

I can provide a sample RTF that is failing. I'm not sure how to attach files to this issue so here is a link to an Evernote note containing an example RTF that fails:
http://www.evernote.com/l/AEJKADYRJABJWaHMK-Wz7-LPKEpvLdPgopA/

I wondered whether the error might be related to the following change that was introduced in 1.6?:
* Made RTFParser's list handling slightly more robust against corrupt list metadata (TIKA-1305)

It's possible that there is some issue with the RTF documents, but they are real patient letters and they open in Microsoft Word without any problems.

Many thanks
Ian

Steps to reproduce issue

1. HTTP PUT to Tika server using curl:
C:\Downloads\Apache Tika>curl -X PUT --data-binary @test-anonymised-letter.rtf http://localhost:9998/tika --header "Content-Type: application/rtf" --header "Accept: text/plain"
--> this works fine when running tika-server-1.5.jar, but fails with tika-server-1.6.jar

2. Screen capture from the server:
INFO: Starting Apache Tika 1.9 server
Feb 01, 2016 2:26:10 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://localhost:9998/
Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: jetty-8.y.z-SNAPSHOT
Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Started SelectChannelConnector@localhost:9998
Feb 01, 2016 2:26:10 PM org.apache.tika.server.TikaServerCli main
INFO: Started
Feb 01, 2016 2:26:24 PM org.apache.tika.server.resource.TikaResource logRequest
INFO: tika (application/rtf)
Feb 01, 2016 2:26:25 PM org.apache.tika.server.resource.TikaResource parse
WARNING: tika: Text extraction failed
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@32a6dc
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:283)
    at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:163)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:244)
    at org.apache.tika.server.resource.TikaResource$4.write(TikaResource.java:321)
    at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:164)
    at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1363)
    at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:244)
    at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:117)
    at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:80)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
    at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:83)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
    at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
    at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
    at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
    at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:370)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
    at
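The curl request in step 1 of the reproduction can also be issued from Java, which may be convenient when checking many letters against different tika-server versions. This is a minimal sketch, assuming Java 11+ (for the `java.net.http` client) and a server listening on the default address http://localhost:9998/ seen in the log; `TikaPut` is a hypothetical class name. `BodyPublishers.ofFile` sends the file's bytes unchanged, which mirrors curl's `--data-binary @file`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical client mirroring the curl reproduction step (Java 11+). */
final class TikaPut {

    private TikaPut() {}

    /** Builds a PUT with the same headers the curl command sends. */
    static HttpRequest buildRequest(URI endpoint, Path rtf) throws IOException {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/rtf")
                .header("Accept", "text/plain")
                // Like --data-binary @file: the raw bytes of the file, unmodified.
                .PUT(HttpRequest.BodyPublishers.ofFile(rtf))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path rtf = Paths.get(args.length > 0 ? args[0] : "test-anonymised-letter.rtf");
        HttpRequest request = buildRequest(URI.create("http://localhost:9998/tika"), rtf);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // Print the status and body so runs against different server jars can be compared.
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body());
    }
}
```

On Java 11+ it can be launched directly from source, for example `java TikaPut.java test-anonymised-letter.rtf`, once against a 1.5 server and once against a later one.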
[jira] [Updated] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Williams updated TIKA-1845:
---
Attachment: (was: test-anonymised-letter.rtf)
[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126354#comment-15126354 ]

Ian Williams commented on TIKA-1845:
I've deleted the attachment for the time being - sorry. Please contact me directly for a sample. The reason is that I don't know what's in the embedded file within the RTF.
[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126375#comment-15126375 ]

Ian Williams commented on TIKA-1845:
Just being cautious because I don't want to share anything in a public forum that isn't 100% anonymised, and I don't know what's in that embedded file within the RTF (doesn't show up within the document in Word). Possibly a logo or something like that.
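Since the concern here is an embedded file of unknown content, one quick check before sharing a sample is to look for the RTF control words that mark embedded content: `\object` opens an embedded OLE object group, `\objclass` names the object's class, `\objdata` holds its hex-encoded data, and `\pict` introduces a picture. The sketch below is a heuristic, hypothetical helper (`RtfEmbedScan`), not a full RTF parser and not how Tika reads these files; it only counts those control words and does not decode or extract anything.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical heuristic scanner for embedded-content markers in RTF (Java 9+). */
final class RtfEmbedScan {

    private RtfEmbedScan() {}

    // Control words that commonly mark embedded objects or pictures.
    private static final Set<String> WATCHED = Set.of("object", "objdata", "objclass", "pict");

    /**
     * Counts watched control words. Heuristic only: raw \binN payloads are
     * not skipped, so binary data containing backslashes could be misread.
     */
    static Map<String, Integer> scan(String rtf) {
        Map<String, Integer> counts = new TreeMap<>();
        int i = 0;
        while (i < rtf.length()) {
            if (rtf.charAt(i) != '\\' || i + 1 >= rtf.length()) {
                i++;
                continue;
            }
            if (!isAsciiLetter(rtf.charAt(i + 1))) {
                // Control symbol such as \\ , \{ , \* or \' : skip the pair.
                i += 2;
                continue;
            }
            // A control word is a backslash followed by a run of ASCII letters.
            int end = i + 1;
            while (end < rtf.length() && isAsciiLetter(rtf.charAt(end))) {
                end++;
            }
            String word = rtf.substring(i + 1, end);
            if (WATCHED.contains(word)) {
                counts.merge(word, 1, Integer::sum);
            }
            i = end;
        }
        return counts;
    }

    private static boolean isAsciiLetter(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RtfEmbedScan.java <file.rtf>");
            return;
        }
        // ISO-8859-1 maps every byte to one char, so decoding never fails.
        String rtf = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1);
        System.out.println(scan(rtf));
    }
}
```

A non-empty result only says that embedded content is present; inspecting what it actually is still requires opening the object itself.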
[jira] [Updated] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Williams updated TIKA-1845:
---
Description:
I have some patient letters that are RTF documents. When I extract the text from these documents using tika-server-1.5.jar, it works fine. However, in tika-server-1.6.jar and later versions (I've tried 1.6, 1.9 and 1.11), it fails with the stack trace and error shown below.

I can provide a sample RTF that is failing.

I wondered whether the error might be related to the following change that was introduced in 1.6?:
* Made RTFParser's list handling slightly more robust against corrupt list metadata (TIKA-1305)

It's possible that there is some issue with the RTF documents, but they are real patient letters and they open in Microsoft Word without any problems.

Many thanks
Ian

Steps to reproduce issue

1. HTTP PUT to Tika server using curl:
C:\Downloads\Apache Tika>curl -X PUT --data-binary @test-anonymised-letter.rtf http://localhost:9998/tika --header "Content-Type: application/rtf" --header "Accept: text/plain"
--> this works fine when running tika-server-1.5.jar, but fails with tika-server-1.6.jar

2. Screen capture from the server:
INFO: Starting Apache Tika 1.9 server
Feb 01, 2016 2:26:10 PM org.apache.cxf.endpoint.ServerImpl initDestination
INFO: Setting the server's publish address to be http://localhost:9998/
Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: jetty-8.y.z-SNAPSHOT
Feb 01, 2016 2:26:10 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Started SelectChannelConnector@localhost:9998
Feb 01, 2016 2:26:10 PM org.apache.tika.server.TikaServerCli main
INFO: Started
Feb 01, 2016 2:26:24 PM org.apache.tika.server.resource.TikaResource logRequest
INFO: tika (application/rtf)
Feb 01, 2016 2:26:25 PM org.apache.tika.server.resource.TikaResource parse
WARNING: tika: Text extraction failed
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@32a6dc
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:283)
    at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:163)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:281)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.tika.server.resource.TikaResource.parse(TikaResource.java:244)
    at org.apache.tika.server.resource.TikaResource$4.write(TikaResource.java:321)
    at org.apache.cxf.jaxrs.provider.BinaryDataProvider.writeTo(BinaryDataProvider.java:164)
    at org.apache.cxf.jaxrs.utils.JAXRSUtils.writeMessageBody(JAXRSUtils.java:1363)
    at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.serializeMessage(JAXRSOutInterceptor.java:244)
    at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.processResponse(JAXRSOutInterceptor.java:117)
    at org.apache.cxf.jaxrs.interceptor.JAXRSOutInterceptor.handleMessage(JAXRSOutInterceptor.java:80)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
    at org.apache.cxf.interceptor.OutgoingChainInterceptor.handleMessage(OutgoingChainInterceptor.java:83)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)
    at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
    at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:251)
    at org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:261)
    at org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:70)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1088)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1024)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:370)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:651)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at
[jira] [Updated] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Williams updated TIKA-1845:
---
Attachment: test-anonymised-letter.rtf
[jira] [Commented] (TIKA-1845) Unable to extract content from certain RTFs using tika-server versions since 1.5
[ https://issues.apache.org/jira/browse/TIKA-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126340#comment-15126340 ]

Ian Williams commented on TIKA-1845:
OK - thanks. I've attached the file now.
[jira] [Commented] (TIKA-894) Add webapp mode for Tika Server, simplifies deployment
[ https://issues.apache.org/jira/browse/TIKA-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14662971#comment-14662971 ] Ian Williams commented on TIKA-894:
---

I am out of the office until Mon 10 Aug 2015.

Regards
Ian

Add webapp mode for Tika Server, simplifies deployment
--

Key: TIKA-894
URL: https://issues.apache.org/jira/browse/TIKA-894
Project: Tika
Issue Type: Improvement
Components: packaging
Affects Versions: 1.1, 1.2
Reporter: Chris Wilson
Labels: maven, newbie, patch
Fix For: 1.11
Attachments: tika-server-webapp.patch

For use in production services, Tika Server should really be deployed as a WAR file, under a reliable servlet container that knows how to run as a system service, for example Tomcat or JBoss. This is especially important on Windows, where I wasted an entire day trying to make TikaServerCli run as some kind of a service.

Maven makes building a webapp pretty trivial. With the attached patch applied, mvn war:war should work. It seems to run fine in Tomcat, which makes Windows deployment much simpler. Just install Tomcat and drop the WAR file into Tomcat's webapps directory and you're away.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (TIKA-894) Add webapp mode for Tika Server, simplifies deployment
[ https://issues.apache.org/jira/browse/TIKA-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985613#comment-13985613 ] Ian Williams commented on TIKA-894:
---

Hi Frederik, did you get anywhere with this? I'd like to run Tika within Tomcat and wondered if you'd got any further?

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (TIKA-894) Add webapp mode for Tika Server, simplifies deployment
[ https://issues.apache.org/jira/browse/TIKA-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985643#comment-13985643 ] Ian Williams commented on TIKA-894:
---

Hi Frederik,

Thank you for getting back to me. I'm interested in the memory issues you experienced. Does tika-server appear to leak memory over time?

Many thanks
Ian

--
This message was sent by Atlassian JIRA
(v6.2#6252)