[jira] [Created] (TIKA-3646) MP4 files have their mime type detected as video/quicktime

2022-01-13 Thread Apachae Tika User (Jira)
Apachae Tika User created TIKA-3646:
---

 Summary: MP4 files have their mime type detected as video/quicktime
 Key: TIKA-3646
 URL: https://issues.apache.org/jira/browse/TIKA-3646
 Project: Tika
  Issue Type: Bug
  Components: detector
Reporter: Apachae Tika User
 Attachments: Video.mp4

I was using the ScreenToGif tool, which allows you to record the screen and create GIFs or 
MP4 files (with ffmpeg). I've tried to use the Tika Detector on such files, but the 
file is detected as video/quicktime with a .qt extension. Why is that?

Attaching a small video as an example.

I see other people complaining about the same thing here:
https://stackoverflow.com/questions/48021617/use-apache-tika-get-mp4-file-contenttype-got-video-quicktime
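For context, MP4 and QuickTime files share the same box (atom) container structure, and many such files start with an `ftyp` box whose four-byte "major brand" hints at the flavour. The sketch below classifies a file header by that brand. It is a standalone illustration in plain Java, not Tika's detector or its `tika-mimetypes.xml` rules; the `FtypSniffer` class and its small brand table are hypothetical assumptions, and real files can carry other brands or no leading `ftyp` box at all.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

/**
 * Hypothetical sketch: classify an ISO base media file (MP4/QuickTime family)
 * by the major brand stored in a leading 'ftyp' box. Not Tika's implementation.
 */
final class FtypSniffer {
    static final String UNKNOWN = "application/octet-stream";

    // Assumed, deliberately small brand -> media type table for illustration.
    private static final Map<String, String> BRANDS = Map.of(
            "qt  ", "video/quicktime",
            "isom", "video/mp4",
            "iso2", "video/mp4",
            "mp41", "video/mp4",
            "mp42", "video/mp4",
            "avc1", "video/mp4",
            "M4A ", "audio/mp4");

    /** @param head the first bytes of the file (at least 12 are needed) */
    static String detect(byte[] head) {
        // Layout: 4-byte box size, 4-byte box type, then the 4-byte major brand.
        if (head.length < 12) {
            return UNKNOWN;
        }
        String boxType = new String(head, 4, 4, StandardCharsets.ISO_8859_1);
        if (!boxType.equals("ftyp")) {
            // Older QuickTime files may begin with e.g. 'moov' or 'mdat' instead.
            return UNKNOWN;
        }
        String majorBrand = new String(head, 8, 4, StandardCharsets.ISO_8859_1);
        return BRANDS.getOrDefault(majorBrand, UNKNOWN);
    }
}
```

To try it on a real file, read its first 12 bytes (for example with `InputStream.readNBytes(12)` on Java 11+) and pass them to `detect`. Only the major brand is checked here; a fuller version might also scan the compatible-brands list that follows it.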



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (TIKA-3646) MP4 files have their mime type detected as video/quicktime

2022-01-13 Thread Apachae Tika User (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apachae Tika User updated TIKA-3646:

Description: 
I was using the ScreenToGif tool, which allows you to record the screen and create GIFs or 
MP4 files (with ffmpeg). I've tried to use the Tika Detector on such files, but the 
file is detected as video/quicktime with a .qt extension. Why is that?

Attaching a small video as an example, which was generated with ScreenToGif and 
saved as MP4.

I see other people complaining about the same thing here:
[https://stackoverflow.com/questions/48021617/use-apache-tika-get-mp4-file-contenttype-got-video-quicktime]

  was:
I was using the ScreenToGif tool, which allows you to record the screen and create GIFs or 
MP4 files (with ffmpeg). I've tried to use the Tika Detector on such files, but the 
file is detected as video/quicktime with a .qt extension. Why is that?

Attaching a small video as an example.

I see other people complaining about the same thing here:
https://stackoverflow.com/questions/48021617/use-apache-tika-get-mp4-file-contenttype-got-video-quicktime


> MP4 files have their mime type detected as video/quicktime
> --
>
> Key: TIKA-3646
> URL: https://issues.apache.org/jira/browse/TIKA-3646
> Project: Tika
>  Issue Type: Bug
>  Components: detector
>Reporter: Apachae Tika User
>Priority: Major
> Attachments: Video.mp4
>
>
> I was using the ScreenToGif tool, which allows you to record the screen and create GIFs or 
> MP4 files (with ffmpeg). I've tried to use the Tika Detector on such files, but 
> the file is detected as video/quicktime with a .qt extension. Why is 
> that?
> Attaching a small video as an example, which was generated with ScreenToGif and 
> saved as MP4.
> I see other people complaining about the same thing here:
> [https://stackoverflow.com/questions/48021617/use-apache-tika-get-mp4-file-contenttype-got-video-quicktime]





[jira] [Commented] (TIKA-3646) MP4 files have their mime type detected as video/quicktime

2022-01-13 Thread Nick Burch (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475269#comment-17475269
 ] 

Nick Burch commented on TIKA-3646:
--

I think this is probably the same issue as TIKA-2935 - the same work described 
there still needs to be done by someone who has the time + energy + interest...



[jira] [Created] (TIKA-3647) Failed to get content and metadata for .hwp files

2022-01-13 Thread Tika User (Jira)
Tika User created TIKA-3647:
---

 Summary: Failed to get content and metadata for .hwp files
 Key: TIKA-3647
 URL: https://issues.apache.org/jira/browse/TIKA-3647
 Project: Tika
  Issue Type: Bug
Reporter: Tika User


When trying to parse a .hwp file, no metadata is returned. This works fine in 
version 1.27 of Tika; we are currently using version 2.2.1.





[jira] [Updated] (TIKA-3647) Failed to get content and metadata for .hwp files

2022-01-13 Thread Tika User (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tika User updated TIKA-3647:

Attachment: P1.PC.0071.hwp

> Failed to get content and metadata for .hwp files
> -
>
> Key: TIKA-3647
> URL: https://issues.apache.org/jira/browse/TIKA-3647
> Project: Tika
>  Issue Type: Bug
>Reporter: Tika User
>Priority: Blocker
>
> When trying to parse a .hwp file, no metadata is returned. This works fine 
> in version 1.27 of Tika; we are currently using version 2.2.1.





[jira] [Updated] (TIKA-3647) Failed to get content and metadata for .hwp files

2022-01-13 Thread Tika User (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tika User updated TIKA-3647:

Attachment: (was: P1.PC.0071.hwp)



[jira] [Updated] (TIKA-3647) Failed to get content and metadata for .hwp files

2022-01-13 Thread Tika User (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tika User updated TIKA-3647:

Attachment: test.hwp

> Failed to get content and metadata for .hwp files
> -
>
> Key: TIKA-3647
> URL: https://issues.apache.org/jira/browse/TIKA-3647
> Project: Tika
>  Issue Type: Bug
>Reporter: Tika User
>Priority: Blocker
> Attachments: test.hwp
>
>
> When trying to parse a .hwp file, no metadata is returned. This works fine 
> in version 1.27 of Tika; we are currently using version 2.2.1.





[jira] [Commented] (TIKA-3647) Failed to get content and metadata for .hwp files

2022-01-13 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475319#comment-17475319
 ] 

Tim Allison commented on TIKA-3647:
---

Are you missing all metadata or just some fields?  We made breaking changes in 
2.x to streamline the metadata keys.  If you are only missing some fields, see the 
Metadata section of: 
https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0
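If only some fields disappeared after upgrading, one low-effort way to bridge renamed keys is to look up the 2.x key first and fall back to older names. This is a hypothetical helper over a plain `Map<String, String>` rather than Tika's `Metadata` class, and the key pairs in `LEGACY_ALIASES` are assumptions for illustration; the migration page is the authoritative list.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical helper: read a value by its Tika 2.x-style key, falling back to
 * legacy 1.x-style names. The alias table is an illustrative assumption.
 */
final class MetadataKeyFallback {
    private static final Map<String, List<String>> LEGACY_ALIASES = Map.of(
            "dc:creator", List.of("Author"),
            "dcterms:created", List.of("Creation-Date"),
            "dcterms:modified", List.of("Last-Modified"));

    static Optional<String> get(Map<String, String> metadata, String key) {
        // Prefer the current key when it is present.
        String value = metadata.get(key);
        if (value != null) {
            return Optional.of(value);
        }
        // Otherwise try each legacy alias in order.
        for (String alias : LEGACY_ALIASES.getOrDefault(key, List.of())) {
            value = metadata.get(alias);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }
}
```

This only helps when keys were renamed; it cannot recover metadata that was never extracted in the first place.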

 

In Tika 1.28 with tika-app:

 
{noformat}
java -jar ~/tools/tika/tika-app-1.28.jar ~/Downloads/test.hwp 
Jan 13, 2022 7:34:39 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
Jan 13, 2022 7:34:39 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
<html xmlns="http://www.w3.org/1999/xhtml">
 {noformat}
In 2.2.1, I get this:
{noformat}

java -jar ~/tools/tika/tika-app-2.2.1.jar ~/Downloads/test.hwp 
<html xmlns="http://www.w3.org/1999/xhtml">
... {noformat}



[jira] [Commented] (TIKA-3644) OfficeParser can not detect embedded zip bomb in the office documents

2022-01-13 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475324#comment-17475324
 ] 

Tim Allison commented on TIKA-3644:
---

Thank you.  Looking now.

> OfficeParser can not detect embedded zip bomb in the office documents
> -
>
> Key: TIKA-3644
> URL: https://issues.apache.org/jira/browse/TIKA-3644
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 2.2.1
>Reporter: Sergen Bağ
>Priority: Minor
> Attachments: 10_2_2_2_2.zip, tika_exception.PNG, zipbomb.doc, 
> zipbomb.docx, zipbomb.ppt, zipbomb.pptx, zipbomb.xls, zipbomb.xlsx
>
>
> Hi, I am trying to get a "zip bomb detection" exception, but I can't. I used the 
> attachments below and observed the following:
> When I sent "zipbomb.xls" and "zipbomb.doc" to Tika, Tika threw an exception.
> When I sent "zipbomb.xlsx", "zipbomb.docx", "zipbomb.ppt" and "zipbomb.pptx" to 
> Tika, Tika didn't throw an exception.
> Thanks.





[jira] [Commented] (TIKA-3647) Failed to get content and metadata for .hwp files

2022-01-13 Thread Tika User (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475329#comment-17475329
 ] 

Tika User commented on TIKA-3647:
-

All of the metadata is missing.

I'm getting only two fields:

X-TIKA:Parsed-By : (0 = "org.apache.tika.parser.CompositeParser"
1 = "org.apache.tika.parser.microsoft.OfficeParser")

Content-Type : application/x-tika-msoffice (I think this should be application/x-hwp-v5)

Config has:



[jira] [Commented] (TIKA-3647) Failed to get content and metadata for .hwp files

2022-01-13 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475340#comment-17475340
 ] 

Tim Allison commented on TIKA-3647:
---

Yes, the file type should be x-hwp-v5.  Are you doing anything custom with the 
detector?  Do you have tika-parser-miscoffice-module (or 
tika-parsers-standard-package) on your classpath?

 

The detector that distinguishes x-tika-msoffice from hwp-v5 is in the 
tika-parser-miscoffice-module and is called 
org.apache.tika.detect.ole.MiscOLEDetector.
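A quick way to answer the classpath question from inside the running application is to try loading the detector class by name. The sketch below uses only `Class.forName`; `ClasspathCheck` is a hypothetical helper, and a `true` result only shows that the class is loadable, not that a custom detector configuration actually uses it.

```java
/**
 * Hypothetical diagnostic: is a class loadable from this application's classpath?
 */
final class ClasspathCheck {
    static boolean isPresent(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing entirely, or present but with unresolvable dependencies.
            return false;
        }
    }
}
```

For example, `ClasspathCheck.isPresent("org.apache.tika.detect.ole.MiscOLEDetector")` should return `true` only when tika-parser-miscoffice-module (directly or via tika-parsers-standard-package) is on the classpath.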



[jira] [Commented] (TIKA-3647) Failed to get content and metadata for .hwp files

2022-01-13 Thread Tika User (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475346#comment-17475346
 ] 

Tika User commented on TIKA-3647:
-

Yep. After including the detector, it's working fine now. Thanks.

Closing the issue.



[jira] [Updated] (TIKA-3647) Failed to get content and metadata for .hwp files

2022-01-13 Thread Tika User (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tika User updated TIKA-3647:

Attachment: (was: test.hwp)



[jira] [Closed] (TIKA-3647) Failed to get content and metadata for .hwp files

2022-01-13 Thread Tika User (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tika User closed TIKA-3647.
---
Resolution: Fixed



[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files

2022-01-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475393#comment-17475393
 ] 

Hudson commented on TIKA-3634:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #415 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/415/])
TIKA-3634 -- improve detection of iworks 13 files and extraction of thumbnails 
and attachments (tallison: 
[https://github.com/apache/tika/commit/0a8da94c5fc49e706a77245da323088115cc22c9])
* (edit) CHANGES.txt
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java


> Failed to Parser Apple related files
> 
>
> Key: TIKA-3634
> URL: https://issues.apache.org/jira/browse/TIKA-3634
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 2.2.1
>Reporter: Tika User
>Assignee: Tim Allison
>Priority: Major
> Attachments: brochure.pages, keynotecreated.key, 
> mortgagecalculator.numbers
>
>
> Unable to parse '.numbers', '.key' and '.pages' files using the class below in the XML 
> file (org.apache.tika.parser.apple.AppleSingleFileParser).
> Getting unknown mimetype: application/vnd.apple.unknown.13
> Using all these modules:
> tika-core,tika-parsers-standard-package,tika-parser-microsoft-module,tika-parser-sqlite3-package,tika-parser-scientific-module,tika-parser-zip-commons,tika-parser-apple-module
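The reported type suggests the file was recognized as an iWork '13 package without the specific application (Pages, Numbers or Keynote) being identified. A rough sketch of that first step in plain Java: scan the ZIP entry names and fall back to a generic type. The assumption that such packages keep `.iwa` files under an `Index/` directory, and the `IWorkPackageSniffer` class itself, are hypothetical and not taken from Tika's parser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/**
 * Hypothetical sketch: flag a ZIP as an iWork '13-style package by entry names.
 * Assumes (unverified) that such packages store .iwa files under "Index/".
 */
final class IWorkPackageSniffer {
    static final String UNKNOWN_13 = "application/vnd.apple.unknown.13";

    static String detect(byte[] zipBytes) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.startsWith("Index/") && name.endsWith(".iwa")) {
                    // Container recognized, but which iWork app made it is still unknown.
                    return UNKNOWN_13;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Not an iWork-style package (or not a readable ZIP at all).
        return "application/zip";
    }

    /** Demo helper: build an in-memory ZIP with empty entries of the given names. */
    static byte[] zipWith(String... names) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Telling Pages, Numbers and Keynote apart would need further evidence from the package contents, which this sketch deliberately does not attempt.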





[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files

2022-01-13 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475454#comment-17475454
 ] 

Tim Allison commented on TIKA-3634:
---

That was a bad commit message.  Mea culpa.  That was for TIKA-3642.



[jira] [Commented] (TIKA-3644) OfficeParser can not detect embedded zip bomb in the office documents

2022-01-13 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475557#comment-17475557
 ] 

Tim Allison commented on TIKA-3644:
---

It looks like the package-depth detector (the 5 in your config) is not 
triggered if the embeddedDocument extractor calls the parse with 
outputHTML=false.  In the MSOffice parser, outputHTML=true; however, in the 
OOXMLParser, outputHTML=false.  I propose that we use outputHTML=true 
throughout the parsers.

 

That said, I did get a zip bomb exception when I set the maxDepth to 5 on both 
MSOffice and ooxml files.

 

Looking at the SecureContentHandler, I'm frankly not certain what the 
difference between packageDepth and depth is. :P
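To make the outputHTML point concrete, here is a simplified model in plain Java, not Tika code: a recursive writer that wraps each embedded document in `<div class="package-entry">` only when `outputHtml` is true, plus a counter for how deeply those wrappers nest. The class and method names are hypothetical. With `outputHtml=false`, ten levels of nesting leave no package-entry divs at all, so a check keyed on them has nothing to count.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Simplified model (not Tika code) of how an embedded-document extractor's
 * outputHtml flag decides whether nesting is visible in the XHTML output.
 */
final class EmbeddedWrapperModel {
    private static final Pattern DIV_TAGS =
            Pattern.compile("<div class=\"package-entry\">|</div>");

    /** Writes `levels` nested embedded documents, each containing the next. */
    static String render(int levels, boolean outputHtml) {
        StringBuilder out = new StringBuilder();
        writeEmbedded(out, levels, outputHtml);
        return out.toString();
    }

    private static void writeEmbedded(StringBuilder out, int levels, boolean outputHtml) {
        if (levels == 0) {
            return;
        }
        if (outputHtml) {
            out.append("<div class=\"package-entry\">");
        }
        out.append("<p>");                           // this document's own content
        writeEmbedded(out, levels - 1, outputHtml);  // its embedded child
        out.append("</p>");
        if (outputHtml) {
            out.append("</div>");
        }
    }

    /** Deepest nesting of package-entry divs (the only divs render() emits). */
    static int maxPackageEntryDepth(String xhtml) {
        int depth = 0;
        int max = 0;
        Matcher m = DIV_TAGS.matcher(xhtml);
        while (m.find()) {
            if (m.group().equals("</div>")) {
                depth--;
            } else {
                depth++;
                max = Math.max(max, depth);
            }
        }
        return max;
    }
}
```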



[jira] [Comment Edited] (TIKA-3644) OfficeParser can not detect embedded zip bomb in the office documents

2022-01-13 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475557#comment-17475557
 ] 

Tim Allison edited comment on TIKA-3644 at 1/13/22, 5:26 PM:
-

It looks like the package-depth detector (the 5 in your config) is not 
triggered if the embeddedDocument extractor calls the parse with 
outputHTML=false.  In the MSOffice parser, outputHTML=true; however, in the 
OOXMLParser, outputHTML=false.  I propose that we use outputHTML=true 
throughout the parsers.

 

That said, I did get a zip bomb exception when I set the maxDepth to 5 on both 
MSOffice and ooxml files.

 

-Looking at the SecureContentHandler, I'm frankly not certain what the 
difference between packageDepth and depth is.- :P

 

It looks like the difference is that maxPackageDepth should cover embedded 
items, whereas maxDepth covers all HTML elements.
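The distinction between the two limits can be sketched as a SAX handler that tracks both counters. This is a simplified model using only the JDK's `org.xml.sax` classes, not Tika's `SecureContentHandler`; the class name, exception messages and the `check` helper are hypothetical. It shows that deeply nested embedded documents trip the package-entry limit only when they are wrapped in package-entry divs, while the element-depth limit counts every element regardless.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Simplified model (not Tika's SecureContentHandler) of two nesting limits:
 * maxDepth counts every open element; maxPackageEntryDepth counts only
 * <div class="package-entry"> wrappers around embedded documents.
 */
final class NestingLimits extends DefaultHandler {
    private final int maxDepth;
    private final int maxPackageEntryDepth;
    private int depth;
    private int packageEntryDepth;
    // For each open element: was it a package-entry div?
    private final Deque<Boolean> openIsEntry = new ArrayDeque<>();

    NestingLimits(int maxDepth, int maxPackageEntryDepth) {
        this.maxDepth = maxDepth;
        this.maxPackageEntryDepth = maxPackageEntryDepth;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        boolean entry = "div".equals(localName) && "package-entry".equals(atts.getValue("class"));
        openIsEntry.push(entry);
        depth++;
        if (entry) {
            packageEntryDepth++;
        }
        if (depth > maxDepth) {
            throw new SAXException("element depth " + depth + " exceeds " + maxDepth);
        }
        if (packageEntryDepth > maxPackageEntryDepth) {
            throw new SAXException("package entry depth " + packageEntryDepth
                    + " exceeds " + maxPackageEntryDepth);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
        if (openIsEntry.pop()) {
            packageEntryDepth--;
        }
    }

    /** Demo: open `levels` nested divs (optionally package-entry) and report the outcome. */
    static String check(int levels, boolean packageEntries, int maxDepth, int maxPackageEntryDepth) {
        NestingLimits handler = new NestingLimits(maxDepth, maxPackageEntryDepth);
        AttributesImpl atts = new AttributesImpl();
        if (packageEntries) {
            atts.addAttribute("", "class", "class", "CDATA", "package-entry");
        }
        try {
            for (int i = 0; i < levels; i++) {
                handler.startElement("", "div", "div", atts);
            }
            for (int i = 0; i < levels; i++) {
                handler.endElement("", "div", "div");
            }
            return "ok";
        } catch (SAXException e) {
            return e.getMessage();
        }
    }
}
```

With limits like the 5 from the reporter's config, `check(6, true, 100, 5)` fails on the package-entry limit, while `check(6, false, 100, 5)` passes because plain nesting is invisible to that counter.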


was (Author: talli...@mitre.org):
It looks like the package-depth detector (the 5 in your config) is not 
triggered if the embeddedDocument extractor calls the parse with 
outputHTML=false.  In the MSOffice parser, outputHTML=true; however, in the 
OOXMLParser, outputHTML=false.  I propose that we use outputHTML=true 
throughout the parsers.

 

That said, I did get a zip bomb exception when I set the maxDepth to 5 on both 
MSOffice and ooxml files.

 

Looking at the SecureContentHandler, I'm frankly not certain what the 
difference between packageDepth and depth is. :P



[jira] [Resolved] (TIKA-3644) OfficeParser can not detect embedded zip bomb in the office documents

2022-01-13 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-3644.
---
Fix Version/s: 2.2.2
   Resolution: Fixed

> OfficeParser can not detect embedded zip bomb in the office documents
> -
>
> Key: TIKA-3644
> URL: https://issues.apache.org/jira/browse/TIKA-3644
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 2.2.1
>Reporter: Sergen Bağ
>Priority: Minor
> Fix For: 2.2.2
>
> Attachments: 10_2_2_2_2.zip, tika_exception.PNG, zipbomb.doc, 
> zipbomb.docx, zipbomb.ppt, zipbomb.pptx, zipbomb.xls, zipbomb.xlsx
>
>
> Hi, I am trying to get a "zip bomb detection" exception, but I can't. I used the 
> attachments below and observed the following:
> When I sent "zipbomb.xls" and "zipbomb.doc" to Tika, Tika threw an exception.
> When I sent "zipbomb.xlsx", "zipbomb.docx", "zipbomb.ppt" and "zipbomb.pptx" to 
> Tika, Tika didn't throw an exception.
> Thanks.





[jira] [Commented] (TIKA-3644) OfficeParser can not detect embedded zip bomb in the office documents

2022-01-13 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17475803#comment-17475803
 ] 

Hudson commented on TIKA-3644:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #416 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/416/])
TIKA-3644 -- Improve consistency in reporting package-entry divs across all 
parsers for embedded files (tallison: 
[https://github.com/apache/tika/commit/118734a1317fa13ad66959fdc28969ca50a49643])
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-miscoffice-module/src/main/java/org/apache/tika/parser/odf/FlatOpenDocumentMacroHandler.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-module/src/main/java/org/apache/tika/parser/html/HtmlHandler.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-apple-module/src/main/java/org/apache/tika/parser/apple/PListParser.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/xwpf/ml2006/BinaryDataHandler.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/xml/WordMLParser.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/onenote/OneNoteTreeWalker.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-miscoffice-module/src/test/java/org/apache/tika/parser/odf/ODFParserTest.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/ImageGraphicsEngine.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-apple-module/src/main/java/org/apache/tika/parser/apple/AppleSingleFileParser.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-miscoffice-module/src/main/java/org/apache/tika/parser/odf/OpenDocumentParser.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/rtf/RTFEmbObjHandler.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/RecursiveParserWrapperTest.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/EMFParser.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/OOXMLExtractorFactory.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/TNEFParser.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-miscoffice-module/src/main/java/org/apache/tika/parser/odf/OpenDocumentBodyHandler.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-microsoft-module/src/main/java/org/apache/tika/parser/microsoft/ooxml/AbstractOOXMLExtractor.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-package/src/test/java/org/apache/tika/parser/microsoft/XML2003ParserTest.java
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-miscoffice-module/src/main/java/org/apache/tika/parser/epub/EpubParser.java
* (edit) CHANGES.txt
* (edit) 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-crypto-module/src/main/java/org/apache/tika/parser/crypto/TSDParser.java


> OfficeParser can not detect embedded zip bomb in the office documents
> -
>
> Key: TIKA-3644
> URL: https://issues.apache.org/jira/browse/TIKA-3644
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 2.2.1
>Reporter: Sergen Bağ
>Priority: Minor
> Fix For: 2.2.2
>
> Attachments: 10_2_2_2_2.zip, tika_exception.PNG, zipbomb.doc, 
> zipbomb.docx, zipbomb.ppt, zipbomb.pptx, zipbomb.xls, zipbomb.xlsx
>
>
> Hi, I am trying to get a "zip bomb detection" exception, but I can't. I used the 
> attachments below and observed the following:
> When I sent "zipbomb.xls" and "zipbomb.doc" to Tika, Tika threw an exception.
> When I send "zipbomb.xlsx","zipbomb.docx","z