[ https://issues.apache.org/jira/browse/COMPRESS-222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Katsubo updated COMPRESS-222:
------------------------------------

    Description: 
The problem is related to COMPRESS-189; in particular, it concerns the processing 
of inner (nested) ZIP files.

Problem description:

If an archive entry is not read completely, partial reads can return incorrect 
contents.

In particular, the attached example loops through all entries of the 
"09815141_4.zip" archive and probes whether each entry is a TIFF file. The probe 
treats a file as TIFF if it starts with the bytes [0x49 0x49 0x2A 0x0 0x8 0x0 
0x0 0x0 0x14 0x0].

Most entries are correctly reported as TIFF, except:

{code}
[ArchiveTest] 000017.tif is something else
[ArchiveTest] Header contents: 0x49 0x49 0x2A 0x0 0x8 0x0 0x0 0x0 0x0 0x0 
[ArchiveTest] 000033.tif is something else
[ArchiveTest] Header contents: 0x49 0x49 0x2A 0x0 0x0 0x0 0x0 0x0 0x0 0x0 
[ArchiveTest] 000056.tif is something else
[ArchiveTest] Header contents: 0x49 0x49 0x2A 0x0 0x8 0x0 0x0 0x0 0x0 0x0 
[ArchiveTest] 000069.tif is something else
[ArchiveTest] Header contents: 0x49 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 
{code}
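A minimal sketch of the header probe and of the hex format used in the log lines, assuming a plain java.io.InputStream positioned at the start of an entry. The class TiffProbe and its methods are hypothetical and not taken from the attached ArchiveTest.java. The read loop matters because a single InputStream.read() call may legally return fewer bytes than requested.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper (not part of ArchiveTest.java or Commons Compress).
final class TiffProbe {
    // Prefix from the report: "II" (little-endian), magic 42, first IFD at offset 8,
    // then the first IFD's entry count (0x14 = 20), which the report also checks.
    static final byte[] TIFF_PREFIX = {0x49, 0x49, 0x2A, 0x00, 0x08, 0x00, 0x00, 0x00, 0x14, 0x00};

    private TiffProbe() { }

    // Reads at most n bytes, looping because one read() may return fewer bytes
    // than requested; stops early if the entry ends.
    static byte[] readHeader(InputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int off = 0;
        while (off < n) {
            int r = in.read(buf, off, n - off);
            if (r < 0) {
                break; // entry shorter than n bytes
            }
            off += r;
        }
        return Arrays.copyOf(buf, off);
    }

    // True only if the header is long enough and starts with TIFF_PREFIX.
    static boolean looksLikeTiff(byte[] header) {
        return header.length >= TIFF_PREFIX.length
                && Arrays.equals(Arrays.copyOf(header, TIFF_PREFIX.length), TIFF_PREFIX);
    }

    // Formats bytes like the "Header contents:" lines: unpadded upper-case hex,
    // each value followed by a space (e.g. "0x49 0x2A 0x0 ").
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append("0x")
              .append(Integer.toHexString(b & 0xFF).toUpperCase(Locale.ROOT))
              .append(' ');
        }
        return sb.toString();
    }
}
```

With this formatting, the header of 000017.tif prints exactly as in the log above and fails the probe only because bytes 8 and 9 are 0x0 instead of 0x14 0x0.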

As far as I can tell, the corruption can begin at any byte position.

If the program is run with {{READ_WHOLE_ENTRY=true}}, every entry is read in 
full and its MD5 sum is calculated. In that mode the MD5 sums match the expected 
ones and all entries are correctly reported as TIFF. Thus the problem only 
occurs when an entry is not fully read before 
{{ArchiveInputStream.getNextEntry()}} is called.
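A minimal sketch of the two scanning modes, assuming the JDK's java.util.zip.ZipInputStream as a stand-in for Commons Compress's ArchiveInputStream so that it runs without extra dependencies (the JDK reader is not the one this issue is about). EntryScanner, Result and the readWholeEntry parameter (mirroring READ_WHOLE_ENTRY) are hypothetical; the attached ArchiveTest.java may be structured differently.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical scanner; java.util.zip.ZipInputStream stands in for ArchiveInputStream.
final class EntryScanner {
    static final byte[] TIFF_PREFIX = {0x49, 0x49, 0x2A, 0x00, 0x08, 0x00, 0x00, 0x00, 0x14, 0x00};

    // Per-entry outcome; md5 is null when the entry was only partially read.
    static final class Result {
        final boolean tiff;
        final String md5;

        Result(boolean tiff, String md5) {
            this.tiff = tiff;
            this.md5 = md5;
        }
    }

    private EntryScanner() { }

    // readWholeEntry plays the role of READ_WHOLE_ENTRY. Closes 'raw' when done.
    static Map<String, Result> scan(InputStream raw, boolean readWholeEntry)
            throws IOException, NoSuchAlgorithmException {
        Map<String, Result> results = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(raw)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                byte[] header = new byte[TIFF_PREFIX.length];
                int got = 0;
                // Fill the header; read() may return fewer bytes than requested.
                while (got < header.length) {
                    int r = zin.read(header, got, header.length - got);
                    if (r < 0) {
                        break;
                    }
                    got += r;
                }
                String digest = null;
                if (readWholeEntry) {
                    // Hash the header plus the rest of the entry, consuming it fully.
                    MessageDigest md5 = MessageDigest.getInstance("MD5");
                    md5.update(header, 0, got);
                    byte[] buf = new byte[8192];
                    int r;
                    while ((r = zin.read(buf)) != -1) {
                        md5.update(buf, 0, r);
                    }
                    digest = toHex(md5.digest());
                }
                // In partial mode the entry is abandoned after the header and the next
                // getNextEntry() call must skip the unread remainder: the pattern that
                // the report says yields wrong bytes with Commons Compress 1.5.
                boolean tiff = got == header.length && Arrays.equals(header, TIFF_PREFIX);
                results.put(entry.getName(), new Result(tiff, digest));
            }
        }
        return results;
    }

    // Lower-case hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }
}
```

The got check guards against a 9-byte entry falsely matching because of the zero-initialized tenth byte. To exercise the code path in question, the same loop would be run over org.apache.commons.compress.archivers.zip.ZipArchiveInputStream and its getNextEntry() method instead.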

The test ZIP can be downloaded from: 
https://www.dropbox.com/s/h20wo6t0mwbgsqc/09815141_4.zip
It was originally taken from the WIPO FTP server (i.e. it is in the public 
domain) and has been slightly stripped down.

It is difficult to say how much impact this bug has, but among the 475 
ZIP-in-ZIP archives in my collection I found 3 cases of incorrect content 
extraction.
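Since the misreads concern ZIP files nested inside other ZIP files, here is a minimal sketch of that traversal, again assuming the JDK's ZipInputStream in place of Commons Compress so that it runs stand-alone. NestedZipProbe, probeNested and the "outer!/inner" key format are hypothetical. The inner reader consumes the outer stream while it is positioned at the current outer entry; a FilterInputStream with a no-op close() keeps the inner reader from closing the outer one.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical ZIP-in-ZIP probe built on java.util.zip (not Commons Compress).
final class NestedZipProbe {
    static final byte[] TIFF_PREFIX = {0x49, 0x49, 0x2A, 0x00, 0x08, 0x00, 0x00, 0x00, 0x14, 0x00};

    private NestedZipProbe() { }

    // Maps "outer.zip!/inner.tif" to whether the inner entry starts with TIFF_PREFIX.
    static Map<String, Boolean> probeNested(InputStream raw) throws IOException {
        Map<String, Boolean> out = new LinkedHashMap<>();
        try (ZipInputStream outer = new ZipInputStream(raw)) {
            ZipEntry outerEntry;
            while ((outerEntry = outer.getNextEntry()) != null) {
                if (!outerEntry.getName().toLowerCase(Locale.ROOT).endsWith(".zip")) {
                    continue; // non-ZIP entries are skipped by the next getNextEntry()
                }
                // Shield the outer stream so closing the inner reader leaves it open.
                InputStream shield = new FilterInputStream(outer) {
                    @Override
                    public void close() { }
                };
                try (ZipInputStream inner = new ZipInputStream(shield)) {
                    ZipEntry innerEntry;
                    while ((innerEntry = inner.getNextEntry()) != null) {
                        // Only the header is read; getNextEntry() skips the remainder.
                        byte[] header = readUpTo(inner, TIFF_PREFIX.length);
                        out.put(outerEntry.getName() + "!/" + innerEntry.getName(),
                                Arrays.equals(header, TIFF_PREFIX));
                    }
                }
            }
        }
        return out;
    }

    // Reads at most n bytes, looping over short reads; stops at end of entry.
    private static byte[] readUpTo(InputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int off = 0;
        while (off < n) {
            int r = in.read(buf, off, n - off);
            if (r < 0) {
                break;
            }
            off += r;
        }
        return Arrays.copyOf(buf, off);
    }
}
```

Each inner entry is read only up to its header before getNextEntry() is called, which is the access pattern the description above identifies as problematic; using ZipArchiveInputStream for both readers would exercise the affected code instead of the JDK's.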

  was:
The problem is relevant to COMPRESS-189, in particular it relates to processing 
of inner ZIP files.

Problem description:

If the archive entry is not fully read, then partial reading returns incorrect 
contents.

In particular the given example loops through all entries of "09815141_4.zip" 
ZIP archive, probing each entry to be a TIFF file. The probe assumes that given 
file if TIFF, if it starts with bytes [0x49 0x49 0x2A 0x0 0x8 0x0 0x0 0x0 0x14 
0x0].

Most entries are correctly reported as TIFF, except:

{code}
[ArchiveTest] 000017.tif is something else
[ArchiveTest] Header contents: 0x49 0x49 0x2A 0x0 0x8 0x0 0x0 0x0 0x0 0x0 
[ArchiveTest] 000033.tif is something else
[ArchiveTest] Header contents: 0x49 0x49 0x2A 0x0 0x0 0x0 0x0 0x0 0x0 0x0 
[ArchiveTest] 000056.tif is something else
[ArchiveTest] Header contents: 0x49 0x49 0x2A 0x0 0x8 0x0 0x0 0x0 0x0 0x0 
[ArchiveTest] 000069.tif is something else
[ArchiveTest] Header contents: 0x49 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 
{code}

As I can see, the problem can be introduced at any random byte.

If the program is set {{READ_WHOLE_ENTRY=true}} then all entries are fully read 
and MD5 sum is calculated. MD5 sum matches and all entries are correctly 
reported as TIFF. Thus the problem is only present when entry is not fully read 
and {{ArchiveInputStream.getNextEntry()}} is called.

Test ZIP can be downloaded from: 
https://www.dropbox.com/s/h20wo6t0mwbgsqc/09815141_4.zip
It was originally taken from WIPO FTP, i.e. it is in public domain.

    
> ZipArchiveInputStream may read incorrect bytes from stream when processing nested ZIP
> -------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-222
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-222
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Archivers
>    Affects Versions: 1.5
>            Reporter: Dmitry Katsubo
>         Attachments: ArchiveTest.java, log_read_whole_entry.txt, log.txt, md5.correct.txt
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
