[https://issues.apache.org/jira/browse/TIKA-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17955047#comment-17955047]
Tim Allison edited comment on TIKA-4424 at 5/29/25 8:51 PM:
------------------------------------------------------------
When working in our {{branch_3x}}, in AutoDetectParserTest in
tika-parsers-standard-package, I run the following and get the correct type
(application/vnd.google-earth.kmz) for every call, with no exceptions.
This unit test passes.
I need help with a reproducer. Does this fail on a different kmz file? Do we
have the same dependencies? How are you calling Tika's detect?
{noformat}
// Imports needed in the enclosing test class (JUnit 5):
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.junit.jupiter.api.Test;

import org.apache.tika.Tika;
import org.apache.tika.io.TikaInputStream;

@Test
public void testOne() throws Exception {
    // Absolute path from the original setup; adjust it to your checkout.
    Path kmz = Paths.get("/home/tallison/Intellij/tika-3x/tika-parsers/tika-parsers-standard/"
            + "tika-parsers-standard-modules/tika-parser-zip-commons/src/test/resources/"
            + "test-documents/testKMZ.kmz");

    // Without a file name
    assertEquals("application/vnd.google-earth.kmz", new Tika().detect(kmz));
    try (TikaInputStream tis = TikaInputStream.get(kmz)) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(tis));
    }
    try (TikaInputStream tis = TikaInputStream.get(Files.newInputStream(kmz))) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(tis));
    }
    try (InputStream is = Files.newInputStream(kmz)) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(is));
    }
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    Files.copy(kmz, bos);
    try (InputStream is = new ByteArrayInputStream(bos.toByteArray())) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(is));
    }

    // With a file name
    String name = kmz.getFileName().toString();
    // detect(Path) already takes the file name into account, so this repeats the first check
    assertEquals("application/vnd.google-earth.kmz", new Tika().detect(kmz));
    try (TikaInputStream tis = TikaInputStream.get(kmz)) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(tis, name));
    }
    try (TikaInputStream tis = TikaInputStream.get(Files.newInputStream(kmz))) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(tis, name));
    }
    try (InputStream is = Files.newInputStream(kmz)) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(is, name));
    }
    bos = new ByteArrayOutputStream();
    Files.copy(kmz, bos);
    try (InputStream is = new ByteArrayInputStream(bos.toByteArray())) {
        assertEquals("application/vnd.google-earth.kmz", new Tika().detect(is, name));
    }
}
{noformat}
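The calls above differ mainly in what the detector receives: a {{Path}}, a {{TikaInputStream}}, or a plain {{InputStream}} that may not support {{mark()}}/{{reset()}}. Tika's {{Detector}} javadoc expects detectors to mark the stream before reading and reset it before returning, so a plain stream typically has to be buffered first. Below is a minimal, stdlib-only sketch of that mark/peek/reset pattern for the zip signature. The names {{ZipMagicPeek}}, {{markable}} and {{looksLikeZip}} are hypothetical; this is not Tika's code, and a real zip container detector has to read much further than four bytes to tell a kmz from a plain zip.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ZipMagicPeek {
    // Signature of a zip local file header: "PK\003\004".
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Returns a stream that supports mark/reset, buffering it if necessary.
    public static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Peeks at the first four bytes, then rewinds so the caller can re-read them.
    // The stream must support mark/reset (see markable).
    public static boolean looksLikeZip(InputStream in) throws IOException {
        in.mark(ZIP_MAGIC.length);
        try {
            byte[] head = in.readNBytes(ZIP_MAGIC.length);
            return Arrays.equals(head, ZIP_MAGIC);
        } finally {
            in.reset(); // we read no more than the readlimit, so this is allowed
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
        // Simulate a caller's stream without mark support (like a raw FileInputStream).
        InputStream raw = new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public boolean markSupported() {
                return false;
            }
        };
        InputStream in = markable(raw);
        System.out.println(looksLikeZip(in)); // true
        System.out.println(in.read());        // 80, i.e. 'P': the header was not consumed
    }
}
```

The wrapping happens once, up front, so every later peek by the same caller shares one buffered stream.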
> Regression in zip-based detection with an InputStream in 3.2.0
> --------------------------------------------------------------
>
> Key: TIKA-4424
> URL: https://issues.apache.org/jira/browse/TIKA-4424
> Project: Tika
> Issue Type: Task
> Reporter: Tim Allison
> Priority: Major
>
> On the user list, Craig Muchinsky and Pontus Amberg reported new problems with
> detection of zip-based files.
> Craig noted that this affects InputStream detection, and Pontus noted that
> even after he switched to a TikaInputStream, his kmz file was still detected
> as a zip.
> This is Pontus' code:
> {noformat}
> Tika.detect(InputStream stream, String name)
> {noformat}
> {noformat}
> app//org.apache.tika.io.BoundedInputStream.reset(BoundedInputStream.java:115)
> app//org.apache.tika.detect.zip.DefaultZipContainerDetector.detectStreaming(DefaultZipContainerDetector.java:279)
> app//org.apache.tika.detect.zip.DefaultZipContainerDetector.detect(DefaultZipContainerDetector.java:192)
> app//org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:84)
> {noformat}
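The trace above ends in {{BoundedInputStream.reset()}}, called from {{DefaultZipContainerDetector.detectStreaming()}}. This issue doesn't establish why the reset fails. One general failure mode worth ruling out in a reproducer: per the {{java.io.InputStream}} contract, {{reset()}} may throw if more bytes were read since {{mark()}} than the readlimit passed to it, but many streams don't enforce this ({{ByteArrayInputStream}} ignores the readlimit entirely), so the same overread can pass with one stream and fail with another. The hypothetical {{StrictMarkStream}} below is a stdlib-only sketch of a test wrapper that always enforces the readlimit; it is not part of Tika.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical test helper: fails reset() whenever more than readlimit bytes
// were read or skipped since the last mark(), as the InputStream contract allows.
public class StrictMarkStream extends FilterInputStream {
    private long sinceMark = -1; // -1 means no mark has been set
    private int readLimit;

    public StrictMarkStream(InputStream in) {
        super(in);
    }

    @Override
    public synchronized void mark(int readlimit) {
        super.mark(readlimit);
        this.readLimit = readlimit;
        this.sinceMark = 0;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1 && sinceMark >= 0) sinceMark++;
        return b;
    }

    // FilterInputStream.read(byte[]) delegates here, so both bulk reads are counted.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0 && sinceMark >= 0) sinceMark += n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        if (skipped > 0 && sinceMark >= 0) sinceMark += skipped;
        return skipped;
    }

    @Override
    public synchronized void reset() throws IOException {
        if (sinceMark < 0) {
            throw new IOException("reset() called without mark()");
        }
        if (sinceMark > readLimit) {
            throw new IOException("read " + sinceMark + " bytes since mark(), readlimit " + readLimit);
        }
        super.reset();
        sinceMark = 0; // back at the mark, which stays valid
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[16];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        StrictMarkStream s = new StrictMarkStream(new ByteArrayInputStream(data));
        s.mark(4);
        s.read(new byte[4]); // exactly the readlimit: reset is allowed
        s.reset();
        System.out.println(s.read()); // prints 0: back at the mark
        s.reset();
        s.mark(4);
        s.read(new byte[8]); // overread past the readlimit
        try {
            s.reset();
        } catch (IOException e) {
            System.out.println("reset failed: " + e.getMessage());
        }
    }
}
```

Wrapping the stream handed to {{detect}} in a helper like this would surface an overread deterministically, instead of depending on buffer sizes.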
--
This message was sent by Atlassian Jira
(v8.20.10#820010)