[jira] [Commented] (TIKA-3961) When a parser exception happens, the "resourceName" key becomes "esourceName"

2023-01-30 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682120#comment-17682120
 ] 

Tim Allison commented on TIKA-3961:
---

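I'm not able to reproduce this, at least on Linux. I'm also wondering why you're 
getting a different exception than I am: on Windows you get POI's {{EncryptedDocumentException}} wrapping a {{NoSuchAlgorithmException}} (no provider for AES/CBC/NoPadding), while I get Tika's {{EncryptedDocumentException}} ("document is encrypted").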

I'll break out my Windows laptop and see if I can reproduce it there.  

I'm wondering whether something odd is going on with the \r on Windows: your trace separates lines with \r\n, while mine uses \n.
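One way to check the \r idea is to compare the two traces with line endings normalized. A minimal sketch, assuming the exception strings have already been decoded from the JSON (so \r\n is a real CR LF pair); the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class TraceLineEndings {
    // Convert Windows (\r\n) and bare \r line endings to \n so that a trace
    // captured on Windows can be compared with one captured on Linux.
    static String normalize(String trace) {
        return trace.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Return the stack frames of a trace, without the leading tab and "at ".
    static List<String> frames(String trace) {
        List<String> out = new ArrayList<>();
        for (String line : normalize(trace).split("\n")) {
            String t = line.trim();
            if (t.startsWith("at ")) {
                out.add(t.substring(3));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String windows = "Exception\r\n\tat a.B.c(B.java:1)\r\n\tat d.E.f(E.java:2)\r\n";
        String linux = "Exception\n\tat a.B.c(B.java:1)\n\tat d.E.f(E.java:2)\n";
        System.out.println(frames(windows).equals(frames(linux)));  // true
    }
}
```

If the frames match once line endings are normalized, the remaining differences are in the content itself rather than in the separators.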

{noformat}
curl -X PUT -H "Content-Disposition: attachment; filename=something.docx" \
  --upload-file testWORD_protected_passtika.docx \
  http://localhost:9998/rmeta
{noformat}
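For anyone scripting the reproduction from Java instead of curl, here is a minimal sketch using the JDK's {{java.net.http}} client (Java 11+) that sends the same request: a PUT of the raw file bytes to {{/rmeta}}, with the file name carried in {{Content-Disposition}}. The class name is hypothetical, and the default base URL {{http://localhost:9998}} is simply the one from the command above. Like the curl command, it always reports the name {{something.docx}}, whatever the local file is called.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RmetaRequest {
    // Build the same request as the curl command: a PUT of the raw file bytes to
    // /rmeta, with the file name supplied in the Content-Disposition header.
    static HttpRequest build(String baseUrl, String fileName, byte[] body) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/rmeta"))
                .header("Content-Disposition", "attachment; filename=" + fileName)
                .PUT(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 1) {
            System.err.println("usage: java RmetaRequest <file> [baseUrl]");
            return;
        }
        Path file = Paths.get(args[0]);
        String baseUrl = args.length > 1 ? args[1] : "http://localhost:9998";
        HttpRequest request = build(baseUrl, "something.docx", Files.readAllBytes(file));
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());  // JSON list of metadata objects
    }
}
```

With a tika-server running, {{java RmetaRequest.java testWORD_protected_passtika.docx}} (single-file source launch) prints the status code and the JSON body.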

On Linux, I get back:

{noformat}
[{"X-TIKA:EXCEPTION:container_exception":"org.apache.tika.exception.EncryptedDocumentException: Unable to process: document is encrypted\n\tat
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:262)\n\tat
org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:175)\n\tat
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)\n\tat
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)\n\tat
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:167)\n\tat
org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:163)\n\tat
org.apache.tika.server.core.resource.TikaResource.parse(TikaResource.java:352)\n\tat
org.apache.tika.server.core.resource.RecursiveMetadataResource.parseMetadata(RecursiveMetadataResource.java:78)\n\tat
org.apache.tika.server.core.resource.RecursiveMetadataResource.parseMetadataToMetadataList(RecursiveMetadataResource.java:190)\n\tat
org.apache.tika.server.core.resource.RecursiveMetadataResource.getMetadata(RecursiveMetadataResource.java:179)\n\tat
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)\n\tat
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat
java.base/java.lang.reflect.Method.invoke(Method.java:568)\n\tat
org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:179)\n\tat
org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)\n\tat
org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:201)\n\tat
org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:104)\n\tat
org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)\n\tat
org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)\n\tat
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)\n\tat
org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)\n\tat
org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)\n\tat
org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)\n\tat
org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:516)\n\tat
org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)\n\tat
org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)\n\tat
org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)\n\tat
java.base/java.lang.Thread.run(Thread.java:833)\n","resourceName":"something.docx"}]
{noformat}
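To compare runs quickly, here is a minimal sketch (hypothetical class name; a text scan rather than a real JSON parser) that classifies an {{/rmeta}} response by whether it carries {{resourceName}}, the mangled {{esourceName}}, or neither. The reporter's Windows trace also shows other dropped characters (for example {{verifyPasswrd}}, {{CompositParser}}, {{ScpedHandler}}), so checking the key on its own may help separate a Tika bug from output that was garbled on its way to the screen.

```java
import java.util.regex.Pattern;

public class ResourceNameCheck {
    // Each pattern matches the key as a complete JSON member name: quote, name,
    // quote, optional whitespace, colon. Because the opening quote is part of the
    // pattern, "esourceName" does not match inside "resourceName".
    private static final Pattern CORRECT = Pattern.compile("\"resourceName\"\\s*:");
    private static final Pattern MANGLED = Pattern.compile("\"esourceName\"\\s*:");

    // Classify an /rmeta response body. The mangled key is checked first so that
    // it is reported even if another metadata object in the list has the correct key.
    static String classify(String json) {
        if (MANGLED.matcher(json).find()) {
            return "mangled";
        }
        if (CORRECT.matcher(json).find()) {
            return "ok";
        }
        return "missing";
    }

    public static void main(String[] args) {
        String linux = "[{\"X-TIKA:EXCEPTION:container_exception\":\"...\","
                + "\"resourceName\":\"something.docx\"}]";
        System.out.println(classify(linux));  // ok
    }
}
```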

> When a parser exception happens, the "resourceName" key becomes "esourceName"
> --

[jira] [Created] (TIKA-3962) Set RFC822 parser to noRecurse

2023-01-30 Thread Tim Allison (Jira)
Tim Allison created TIKA-3962:
-

 Summary: Set RFC822 parser to noRecurse
 Key: TIKA-3962
 URL: https://issues.apache.org/jira/browse/TIKA-3962
 Project: Tika
  Issue Type: Task
Reporter: Tim Allison


In our test file {{testGroupWiseEml.eml}}, there's an embedded RFC822 message 
that is attached to the email but is currently inlined instead of being treated as an attachment. 

The relevant section of the test file is:

{noformat}
Content-Type: message/rfc822
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="test.eml"
{noformat}
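These three headers are all a MIME parser needs in order to recognize the part as an attached message. A minimal sketch, with hypothetical names and no mime4j involved, that reads them and concludes the part is an attached message named {{test.eml}}. It is simplified: folded header lines, repeated headers, quoted semicolons, and RFC 2231 {{filename*=}} parameters are not handled.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class PartHeaders {
    // Parse "Name: value" lines into a map with case-insensitive keys.
    static Map<String, String> parse(String block) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : block.split("\r?\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue;  // not a header line
            }
            headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return headers;
    }

    // True when the part is a nested message explicitly marked as an attachment.
    static boolean isAttachedMessage(Map<String, String> headers) {
        String type = headers.getOrDefault("Content-Type", "").toLowerCase(Locale.ROOT);
        String disposition = headers.getOrDefault("Content-Disposition", "").toLowerCase(Locale.ROOT);
        return type.startsWith("message/rfc822") && disposition.startsWith("attachment");
    }

    // Return the filename parameter of Content-Disposition without surrounding
    // quotes, or null if there is none.
    static String filename(Map<String, String> headers) {
        String disposition = headers.getOrDefault("Content-Disposition", "");
        for (String param : disposition.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("filename=")) {
                String value = p.substring("filename=".length()).trim();
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String block = "Content-Type: message/rfc822\n"
                + "Content-Transfer-Encoding: base64\n"
                + "Content-Disposition: attachment; filename=\"test.eml\"\n";
        Map<String, String> headers = parse(block);
        System.out.println(isAttachedMessage(headers) + " " + filename(headers));  // true test.eml
    }
}
```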

When I open the email in several email clients, each of them correctly shows 
{{test.eml}} as an attachment.  

It turns out that mime4j's parser has a {{setNoRecurse}} setting that yields 
the correct behavior on this test file. Because Tika already handles embedded 
files recursively by default, I _think_ it should be safe to turn off recursion 
in the mime4j parser and rely on Tika's own recursive parsing.
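To illustrate the trade-off, here is a toy model of the two modes. It is not mime4j's or Tika's code; {{Part}} and {{walk}} are hypothetical. With recursion on, the nested message's parts are reported as if they belonged to the outer message, which matches the inlining described above. With recursion off, the nested message comes through as one opaque part that can be handed to a separate recursive parse as an embedded document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecursionModel {
    // Toy MIME part: a content type, a display name and, for message/rfc822
    // parts, the parts of the nested message.
    static final class Part {
        final String contentType;
        final String name;
        final List<Part> children;

        Part(String contentType, String name, Part... children) {
            this.contentType = contentType;
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // recurse == true: descend into a nested message/rfc822 part, so its inner
    //   parts are reported as content of the outer message ("inlined").
    // recurse == false: report the nested message as a single embedded document.
    static void walk(Part message, boolean recurse, List<String> inlined, List<String> embedded) {
        for (Part part : message.children) {
            if ("message/rfc822".equals(part.contentType)) {
                if (recurse) {
                    walk(part, true, inlined, embedded);
                } else {
                    embedded.add(part.name);
                }
            } else {
                inlined.add(part.name);
            }
        }
    }

    public static void main(String[] args) {
        Part inner = new Part("message/rfc822", "test.eml", new Part("text/plain", "inner body"));
        Part outer = new Part("multipart/mixed", "outer", new Part("text/plain", "outer body"), inner);
        for (boolean recurse : new boolean[] {true, false}) {
            List<String> inlined = new ArrayList<>();
            List<String> embedded = new ArrayList<>();
            walk(outer, recurse, inlined, embedded);
            System.out.println("recurse=" + recurse + " inlined=" + inlined + " embedded=" + embedded);
        }
        // recurse=true inlined=[outer body, inner body] embedded=[]
        // recurse=false inlined=[outer body] embedded=[test.eml]
    }
}
```

In the second case the attachment keeps its identity ({{test.eml}}), and the recursive parsing of its contents becomes the caller's job, which is the division of labor proposed above.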



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: next release?

2023-01-30 Thread Tim Allison
All,
  After I fix TIKA-3962, I'll start the regression tests in
preparation for a 2.7.0 release.  Please let me know if there are any
blockers or if you're working on something that you want to get into
the next release.
  Thank you!

 Best,

 Tim

On Thu, Jan 19, 2023 at 10:15 AM Tim Allison  wrote:
>
> All,
>   I'm thinking we should cut a release in the next week or so.  I can
> start the regression tests next week (possibly late in the week).  I
> think that the changes move us into the "minor" version update, so
> 2.7.0.
>   WDYT?  Are there any imminent releases of our dependencies that we
> should wait for?  Anything else we'd want to get into the next
> release?
>   Thank you!
>
>  Best,
>
>  Tim


[jira] [Resolved] (TIKA-3962) Set RFC822 parser to noRecurse

2023-01-30 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-3962.
---
Fix Version/s: 2.7.0
   Resolution: Fixed

> Set RFC822 parser to noRecurse
> --
>
> Key: TIKA-3962
> URL: https://issues.apache.org/jira/browse/TIKA-3962
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Minor
> Fix For: 2.7.0





[jira] [Updated] (TIKA-3961) When a parser exception happens, the "resourceName" key becomes "esourceName"

2023-01-30 Thread Josh Burchard (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Burchard updated TIKA-3961:

Attachment: encrypted.docx

> When a parser exception happens, the "resourceName" key becomes "esourceName"
> -
>
> Key: TIKA-3961
> URL: https://issues.apache.org/jira/browse/TIKA-3961
> Project: Tika
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.4.1
> Environment: Windows 10. Tika 2.4.1. Tika server.
>Reporter: Josh Burchard
>Priority: Major
> Attachments: encrypted.docx
>
>
> Test env: Windows 10
> Tika 2.4.1, Tika server
>  
> In my config I've specified:
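>     <metadataFilter class="org.apache.tika.metadata.filter.IncludeFieldMetadataFilter">
>       <params>
>         <include>
>           <field>X-TIKA:content</field>
>           <field>dc:creator</field>
>           <field>dc:title</field>
>           <field>resourceName</field>
>           <field>X-TIKA:EXCEPTION:container_exception</field>
>         </include>
>       </params>
>     </metadataFilter>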
>  
> For a password-protected docx file, Tika returns the following (see the bold 
> text at the bottom):
> [{"X-TIKA:EXCEPTION:container_exception":"org.apache.poi.EncryptedDocumentException: java.security.NoSuchAlgorithmException: Cannot find any provider supporting AES/CBC/NoPadding\r\n\tat
> org.apache.poi.poifs.crypt.CryptoFunctions.getCipher(CryptoFunctions[7B14:0002-7080] java:274)\r\n\tat
> org.apache.poi.poifs.crypt.CryptoFunctions.getCipher(CryptoFunctions.java:223)\r\n\tat
> org.apache.poi.poifs.crypt.agile.AgileDecryptor.hashInput(AgileDecryptor.java:196)\r\n\tat
> org.apache.poi.poifs.crypt.agile.AgileDecryptor.verifyPasswrd(AgileDecryptor.java:102)\r\n\tat
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:261)\r\n\tat
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:175)\r\n\tat
> org.apache.tika.parser.CompositeParser.parse(CompositParser.java:298)\r\n\tat
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)\r\n\tat
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:167)\r\n\tat
> org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWraper.java:163)\r\n\tat
> org.apache.tika.server.core.resource.TikaResource.parse(TikaResource.java:352)\r\n\tat
> org.apache.tika.server.core.resource.RecursiveMetadataResource.parseMetadata(RecursiveMetadataResource.java:78)\r\n\tat
> org.apache.tika.server.cor.resource.RecursiveMetadataResource.parseMetadataToMetadataList(RecursiveMetadataResource.java:190)\r\n\tat
> org.apache.tika.server.core.resource.RecursiveMetadataResource.getMetadata(RecursiveMetadataResource.java:179)\r\n\tat
> sun.reflect.GeneratedMethodAcessor7.invoke(Unknown Source)\r\n\tat
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\r\n\tat
> java.lang.reflect.Method.invoke(Method.java:498)\r\n\tat
> org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(bstractInvoker.java:179)\r\n\tat
> org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)\r\n\tat
> org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:201)\r\n\tat
> org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:104)r\n\tat
> org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)\r\n\tat
> org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)\r\n\tat
> org.apache.cxf.phase.PhaseInterceptrChain.doIntercept(PhaseInterceptorChain.java:307)\r\n\tat
> org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)\r\n\tat
> org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)\\n\tat
> org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)\r\n\tat
> org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)\r\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.andle(HandlerWrapper.java:127)\r\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\r\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)\r\n\tat
> org.eclipse.jetty.server.handler.ScpedHandler.nextScope(ScopedHandler.java:190)\r\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)\r\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat
> org.eclipse.jetty.server.hndler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)\r\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\r\n\tat
> org.eclipse.jetty.server.Server.handle(Server.java:516)\r\n\tat
> o
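The reporter's configuration keeps only five fields. Below is a minimal sketch of what an include-field filter does, with hypothetical names (this is not the source of Tika's {{IncludeFieldMetadataFilter}}, and Tika's real {{Metadata}} is multi-valued, which this plain map ignores): every key not on the list is removed. If the real filter behaves this way, a key spelled {{esourceName}} would have been dropped before serialization, so seeing it in the output hints that the change happened after filtering, for example while the JSON was written or displayed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class IncludeFilterSketch {
    // Keep only the metadata keys named in 'include'; every other key is dropped.
    static Map<String, String> filter(Map<String, String> metadata, Set<String> include) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            if (include.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> include = new HashSet<>(Arrays.asList(
                "X-TIKA:content", "dc:creator", "dc:title", "resourceName",
                "X-TIKA:EXCEPTION:container_exception"));

        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("X-TIKA:EXCEPTION:container_exception", "...");
        metadata.put("Content-Type", "...");
        metadata.put("resourceName", "something.docx");
        metadata.put("esourceName", "something.docx");  // not on the list, so dropped

        System.out.println(filter(metadata, include).keySet());
        // [X-TIKA:EXCEPTION:container_exception, resourceName]
    }
}
```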

[jira] [Commented] (TIKA-3961) When a parser exception happens, the "resourceName" key becomes "esourceName"

2023-01-30 Thread Josh Burchard (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682185#comment-17682185
 ] 

Josh Burchard commented on TIKA-3961:
-

I attached the particular file that I reproduced the problem with. 


[jira] [Commented] (TIKA-3961) When a parser exception happens, the "resourceName" key becomes "esourceName"

2023-01-30 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682192#comment-17682192
 ] 

Tim Allison commented on TIKA-3961:
---

Thank you! With your file I'm getting the same exception I got before, though 
that shouldn't change the nature of the problem. I'll break out the Windows 
laptop in a few hours. Thank you!


[jira] [Commented] (TIKA-3961) When a parser exception happens, the "resourceName" key becomes "esourceName"

2023-01-30 Thread Josh Burchard (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682198#comment-17682198
 ] 

Josh Burchard commented on TIKA-3961:
-

I'm going to close this for now. I'm getting inconsistent results, so it must 
be due to the state of my test environment. Now, instead of the exception, 
I'm getting nothing back from Tika for the file I used.


[jira] [Closed] (TIKA-3961) When a parser exception happens, the "resourceName" key becomes "esourceName"

2023-01-30 Thread Josh Burchard (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Burchard closed TIKA-3961.
---
Resolution: Cannot Reproduce

Reproduction was consistent when I filed this bug, but it's no longer 
reproducible, so the initial problem may have been due to the state of my test 
machine. Closing for now.

> When a parser exception happens, the "resourceName" key becomes "esourceName"
> -
>
> Key: TIKA-3961
> URL: https://issues.apache.org/jira/browse/TIKA-3961
> Project: Tika
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.4.1
> Environment: Windows 10.   Tika 2.4.1.  Tika server.   
>Reporter: Josh Burchard
>Priority: Major
> Attachments: encrypted.docx
>
>
> Test env: Windows 10
> Tika 2.4.1, tika server
>  
> In my config I've specified:
>      class="org.apache.tika.metadata.filter.IncludeFieldMetadataFilter">
>       
>         
>           X-TIKA:content
>           dc:creator
>           dc:title
>           resourceName
>           X-TIKA:EXCEPTION:container_exception
>         
>       
>     
>  
> For a password-protected docx file Tika returns the following (see bold txt 
> at the bottom):
> [{"X-TIKA:EXCEPTION:container_exception":"org.apache.poi.EncryptedDocumentException:
>  java.security.NoSuchAlgorithmException: Cannot find any provider supporting 
> AES/CBC/NoPadding\r\n\tat 
> org.apache.poi.poifs.crypt.CryptoFunctions.getCipher(CryptoFunctions[7B14:0002-7080]
>  java:274)\r\n\tat 
> org.apache.poi.poifs.crypt.CryptoFunctions.getCipher(CryptoFunctions.java:223)\r\n\tat
>  
> org.apache.poi.poifs.crypt.agile.AgileDecryptor.hashInput(AgileDecryptor.java:196)\r\n\tat
>  
> org.apache.poi.poifs.crypt.agile.AgileDecryptor.verifyPasswrd(AgileDecryptor.java:102)\r\n\tat
>  
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:261)\r\n\tat
>  
> org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:175)\r\n\tat
>  
> org.apache.tika.parser.CompositeParser.parse(CompositParser.java:298)\r\n\tat 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)\r\n\tat
>  
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:167)\r\n\tat
>  
> org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWraper.java:163)\r\n\tat
>  
> org.apache.tika.server.core.resource.TikaResource.parse(TikaResource.java:352)\r\n\tat
>  
> org.apache.tika.server.core.resource.RecursiveMetadataResource.parseMetadata(RecursiveMetadataResource.java:78)\r\n\tat
>  
> org.apache.tika.server.cor.resource.RecursiveMetadataResource.parseMetadataToMetadataList(RecursiveMetadataResource.java:190)\r\n\tat
>  
> org.apache.tika.server.core.resource.RecursiveMetadataResource.getMetadata(RecursiveMetadataResource.java:179)\r\n\tat
>  sun.reflect.GeneratedMethodAcessor7.invoke(Unknown Source)\r\n\tat 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\r\n\tat
>  java.lang.reflect.Method.invoke(Method.java:498)\r\n\tat 
> org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(bstractInvoker.java:179)\r\n\tat
>  
> org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)\r\n\tat
>  org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:201)\r\n\tat 
> org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:104)r\n\tat 
> org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)\r\n\tat
>  
> org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)\r\n\tat
>  
> org.apache.cxf.phase.PhaseInterceptrChain.doIntercept(PhaseInterceptorChain.java:307)\r\n\tat
>  
> org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)\r\n\tat
>  
> org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)\\n\tat
>  
> org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)\r\n\tat
>  
> org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)\r\n\tat
>  
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\r\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\r\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)\r\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)\r\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)\r\n\tat
>  
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat
>  
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.

[jira] [Commented] (TIKA-3962) Set RFC822 parser to noRecurse

2023-01-30 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682256#comment-17682256
 ] 

Hudson commented on TIKA-3962:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #1003 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/1003/])
TIKA-3962 - set rfc822 parser to no recurse (tallison: 
[https://github.com/apache/tika/commit/bff14f39513d7624c04f0e8f0173099ac4d14699])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-mail-module/src/test/resources/test-documents/testGroupWiseEml.eml
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-mail-module/src/main/java/org/apache/tika/parser/mail/RFC822Parser.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-mail-module/src/test/java/org/apache/tika/parser/mail/RFC822ParserTest.java


> Set RFC822 parser to noRecurse
> --
>
> Key: TIKA-3962
> URL: https://issues.apache.org/jira/browse/TIKA-3962
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Priority: Minor
> Fix For: 2.7.0
>
>
> On our test file {{testGroupWiseEml.eml}}, there's an embedded rfc822 
> attachment that is currently not treated as an attachment but is inlined. 
> The relevant section of the test file is:
> {noformat}
> Content-Type: message/rfc822
> Content-Transfer-Encoding: base64
> Content-Disposition: attachment; filename="test.eml"
> {noformat}
> When I open the email in several email clients, each of them shows this 
> {{test.eml}} correctly as an attachment.
> It turns out there's a setting on mime4j's parser "setNoRecurse" that yields 
> the correct behavior on this test file.  Given that Tika handles files 
> recursively already by default, I _think_ we should be safe to set no recurse 
> in the mime4j parser and rely on Tika's own recursive parsing.
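The trade-off described above can be modeled in a few lines. This is a toy, stdlib-only sketch, not mime4j's API and not Tika's RFC822Parser: the `Part` class, the `Mode` enum, and `sampleMessage()` are all hypothetical. It shows how a nested message/rfc822 part gets flattened into the outer message when the parser recurses, and instead surfaces as a single named attachment when it does not, leaving the nested message to be parsed separately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RecursionModeSketch {
    // Hypothetical, simplified MIME part: content type, optional filename, nested parts.
    static final class Part {
        final String contentType;
        final String filename; // from Content-Disposition; null if absent
        final List<Part> children;

        Part(String contentType, String filename, Part... children) {
            this.contentType = contentType;
            this.filename = filename;
            this.children = Arrays.asList(children);
        }
    }

    // Toy stand-in for "recurse into message/rfc822 parts" vs. "do not recurse".
    enum Mode { RECURSE, NO_RECURSE }

    // Flattens a part tree into the events a content handler would see.
    static List<String> events(Part root, Mode mode) {
        List<String> out = new ArrayList<>();
        walk(root, mode, out);
        return out;
    }

    private static void walk(Part part, Mode mode, List<String> out) {
        boolean nestedMessage = "message/rfc822".equalsIgnoreCase(part.contentType);
        if (nestedMessage && mode == Mode.NO_RECURSE) {
            // Emit the whole nested message as one named attachment; its contents
            // are left for a later, separate parse.
            out.add("attachment:" + part.filename);
            return;
        }
        if (part.children.isEmpty()) {
            out.add("body:" + part.contentType);
            return;
        }
        // Multipart container, or a nested message while recursing: descend inline,
        // so the nested message's parts look like parts of the outer message.
        for (Part child : part.children) {
            walk(child, mode, out);
        }
    }

    // Hypothetical message: a plain-text body plus an attached test.eml.
    static Part sampleMessage() {
        Part attached = new Part("message/rfc822", "test.eml", new Part("text/plain", null));
        return new Part("multipart/mixed", null, new Part("text/plain", null), attached);
    }

    public static void main(String[] args) {
        System.out.println(events(sampleMessage(), Mode.RECURSE));    // [body:text/plain, body:text/plain]
        System.out.println(events(sampleMessage(), Mode.NO_RECURSE)); // [body:text/plain, attachment:test.eml]
    }
}
```

In the `NO_RECURSE` run the `test.eml` filename from `Content-Disposition` survives, which matches what the email clients display; in the real code the switch is the mime4j `setNoRecurse` setting named in the description.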



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (TIKA-3960) PGP encrypted files get detected as application/octet-stream

2023-01-30 Thread Nick Burch (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682352#comment-17682352
 ] 

Nick Burch commented on TIKA-3960:
--

If possible, please include a small test file and update 
{{tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java}} to test 
the detection.

> PGP encrypted files get detected as application/octet-stream
> 
>
> Key: TIKA-3960
> URL: https://issues.apache.org/jira/browse/TIKA-3960
> Project: Tika
>  Issue Type: Bug
>  Components: detector
>Affects Versions: 2.6.0
>Reporter: Tayseer Sabha
>Priority: Major
>
> We use Tika for detecting and validating uploaded files using their 
> content/magic bytes and not only their names/extension.
> Passing a PGP/GPG encrypted file to Tika.detect(InputStream stream) will 
> always return application/octet-stream instead of the application/pgp-encrypted 
> type defined in tika-mimetypes.xml.
> The issue occurs because the application/pgp-encrypted mime-type defined in 
> tika-mimetypes.xml lacks a magic match and only has <glob pattern="*.pgp"/>.
> I managed to work around the issue temporarily by defining 
> application/pgp-encrypted, including a magic match, in our custom-mimetypes.xml 
> file. I will create a pull request on GitHub with the fix.
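Since the report hinges on the type lacking a magic match, a minimal stdlib-only sketch of what content-based detection of OpenPGP encrypted data could check may help. It is an assumption-laden heuristic, not Tika's detector and not the magic rule from the reporter's pull request; the class and method names are hypothetical. It looks for the ASCII armor header `-----BEGIN PGP MESSAGE-----` and otherwise decodes the packet tag from the first octet (RFC 4880, section 4.2), accepting the tags that can begin an encrypted message.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PgpMagicSketch {
    // ASCII-armored messages start with this armor header line (RFC 4880, section 6.2).
    private static final byte[] ARMOR_HEADER =
            "-----BEGIN PGP MESSAGE-----".getBytes(StandardCharsets.US_ASCII);

    // Returns the OpenPGP packet tag from the first header octet, or -1 if bit 7 is clear
    // (RFC 4880, section 4.2: bit 7 of a packet header is always set).
    static int packetTag(int firstOctet) {
        if ((firstOctet & 0x80) == 0) {
            return -1;
        }
        if ((firstOctet & 0x40) != 0) {
            return firstOctet & 0x3F;        // new format: tag in bits 5-0
        }
        return (firstOctet >> 2) & 0x0F;     // old format: tag in bits 5-2
    }

    // Heuristic: do the leading bytes look like an encrypted OpenPGP message?
    static boolean looksLikePgpEncrypted(byte[] head) {
        if (head.length >= ARMOR_HEADER.length
                && Arrays.equals(Arrays.copyOf(head, ARMOR_HEADER.length), ARMOR_HEADER)) {
            return true;
        }
        if (head.length == 0) {
            return false;
        }
        int tag = packetTag(head[0] & 0xFF);
        // Tags that can begin an encrypted message: 1 = public-key encrypted session key,
        // 3 = symmetric-key encrypted session key, 9 = symmetrically encrypted data,
        // 18 = symmetrically encrypted integrity-protected data.
        return tag == 1 || tag == 3 || tag == 9 || tag == 18;
    }

    public static void main(String[] args) {
        byte[] armored = "-----BEGIN PGP MESSAGE-----\n".getBytes(StandardCharsets.US_ASCII);
        byte[] oldFormatHeader = {(byte) 0x85, 0x01, 0x0C}; // old-format header, tag 1
        byte[] plainText = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikePgpEncrypted(armored));         // true
        System.out.println(looksLikePgpEncrypted(oldFormatHeader)); // true
        System.out.println(looksLikePgpEncrypted(plainText));       // false
    }
}
```

A one-octet check like this also matches many unrelated binary files, so a real magic entry would likely need to inspect more of the packet (such as its length and version fields) before claiming application/pgp-encrypted.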





[GitHub] [tika] THausherr merged pull request #926: Bump aws.version from 1.12.395 to 1.12.396

2023-01-30 Thread via GitHub


THausherr merged PR #926:
URL: https://github.com/apache/tika/pull/926


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org