[jira] [Work logged] (COMPRESS-505) Multiple Reads of One SevenZArchiveEntry Fails

ASF GitHub Bot (Jira) Wed, 04 Mar 2020 23:38:09 -0800


     [ 
https://issues.apache.org/jira/browse/COMPRESS-505?focusedWorklogId=398179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398179
 ]


ASF GitHub Bot logged work on COMPRESS-505:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Mar/20 07:37
            Start Date: 05/Mar/20 07:37
    Worklog Time Spent: 10m 
      Work Description: PeterAlfreadLee commented on pull request #95: 
COMPRESS-505 : bug fix for random access of 7z
URL: https://github.com/apache/commons-compress/pull/95
 
 
   There are some problems in my PR about random access of 7z 
[#83](COMPRESS-342 random access of 7z files) :
   
   1. I assumed that the `currentFolderInputStream` could be repositioned by 
changing the position of the `channel`, which turns out to be impossible. This 
PR fixes it by reopening the `currentFolderInputStream`.
   
   2. There are now 2 ways to access the content of a 7z archive: sequential 
access (`getNextEntry`) and random access (`getInputStream`). They may be used 
one after another, so there are several cases we need to handle :
   
   2.1 In a random access, if `currentEntryIndex` == `entryIndex` && the entry 
has not been read yet :
   This means the input stream of the entry we want has already been put in 
`deferredBlockStreams` as its last element. We SHOULD NOT build a new input 
stream for the entry, because that would cause the existing stream in 
`deferredBlockStreams` to be skipped. We should do nothing, because the input 
stream is already in `deferredBlockStreams`.
   
   2.2 In a random access, if `currentEntryIndex` == `entryIndex` && the entry 
has already been read :
   This means the entry we want has been read (whether partially or completely 
does not matter). We should then reopen the `currentFolderInputStream` and skip 
all the entries before the entry we want.
   BTW : we can determine whether the entry has been read by comparing the 
`bytesRemaining` of its input stream (a `CRC32VerifyingInputStream`) with the 
actual size of the file.
   
   2.3 In a random access, if `currentEntryIndex` < `entryIndex` && the last 
entry in `deferredBlockStreams` has not been read :
   The input streams whose index is less than or equal to `currentEntryIndex` 
have already been put into `deferredBlockStreams`. We can just add the 
remaining entries to `deferredBlockStreams`.
   
   2.4 In a random access, if `currentEntryIndex` < `entryIndex` && the last 
entry in `deferredBlockStreams` has already been read :
   Like 2.2, we have no choice but to reopen the `currentFolderInputStream` 
and skip all the entries again.
   
   2.5 In a random access, if `currentEntryIndex` > `entryIndex` :
   This means the entry we want has already been read or skipped. We can only 
reopen the `currentFolderInputStream` and skip all the entries again.
   
   In short, we should do nothing in 2.1, skip the remaining entries in 2.3, 
and reopen the `currentFolderInputStream` in 2.2/2.4/2.5. I have to admit this 
is a bit complicated, but I couldn't find a better way to build the logic. :(
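
   The five cases can be summarised as a small decision function. This is only 
a sketch of the reasoning above, not the code in the PR: the class 
`RandomAccessPlan`, the enum `Action`, and the methods `decide` and 
`hasBeenRead` are hypothetical names introduced for illustration, and the 
"has been read" check assumes that an untouched stream still has 
`bytesRemaining` equal to the entry size.

```java
// Illustrative sketch only: none of these names exist in commons-compress.
final class RandomAccessPlan {

    /** What a random access should do before handing out the entry's stream. */
    enum Action { DO_NOTHING, ADD_REMAINING_ENTRIES, REOPEN_FOLDER_STREAM }

    /**
     * Assumption: a stream counts as "read" as soon as any byte was consumed,
     * i.e. its bytesRemaining no longer equals the entry's full size.
     */
    static boolean hasBeenRead(long bytesRemaining, long entrySize) {
        return bytesRemaining != entrySize;
    }

    /**
     * @param currentEntryIndex index of the last entry queued in deferredBlockStreams
     * @param entryIndex        index of the entry requested by random access
     * @param lastStreamRead    whether the last queued stream has been read
     *                          (partially or completely)
     */
    static Action decide(int currentEntryIndex, int entryIndex, boolean lastStreamRead) {
        if (currentEntryIndex > entryIndex) {
            return Action.REOPEN_FOLDER_STREAM;      // case 2.5
        }
        if (lastStreamRead) {
            return Action.REOPEN_FOLDER_STREAM;      // cases 2.2 and 2.4
        }
        if (currentEntryIndex == entryIndex) {
            return Action.DO_NOTHING;                // case 2.1
        }
        return Action.ADD_REMAINING_ENTRIES;         // case 2.3
    }

    public static void main(String[] args) {
        System.out.println("2.1 " + decide(3, 3, false));
        System.out.println("2.2 " + decide(3, 3, true));
        System.out.println("2.3 " + decide(1, 3, false));
        System.out.println("2.4 " + decide(1, 3, true));
        System.out.println("2.5 " + decide(5, 3, false));
    }
}
```

   Checking the "already read" flag before comparing the indexes for equality 
lets 2.2 and 2.4 share one branch, which leaves only three distinct outcomes.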
   
   I did some refactoring and added comments to make the code clearer. The 
corresponding test cases are also included in this PR.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 398179)
    Remaining Estimate: 0h
            Time Spent: 10m

> Multiple Reads of One SevenZArchiveEntry Fails
> ----------------------------------------------
>
>                 Key: COMPRESS-505
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-505
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.20
>            Reporter: Steven Fontaine
>            Priority: Minor
>         Attachments: CC0 Images.7z
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've run into a bug which occurs when attempting to read the same 
> SevenZArchiveEntry stream multiple times. The following code illustrates the 
> problem.
> {code:java}
> File archive = new File("CC0 Images.7z");
> char[] password = "password".toCharArray();
> try (SevenZFile f = new SevenZFile(archive, password))
> {
>   SevenZArchiveEntry entry = 
> StreamSupport.stream(f.getEntries().spliterator(), false)
>     .filter(e -> 
> "alberta-amazing-attraction-banff-417074.jpg".equals(e.getName()))
>     .findFirst().orElse(null);
>   assert entry != null;
>   for (int i = 0; i < 100; i++)
>   {
>     InputStream is = f.getInputStream(entry);
>     BufferedImage img = ImageIO.read(is);
>     assert img != null;
>     System.out.println("Succeeded " + (i + 1) + " times");
>   }
> }{code}
> Below is the output I receive on version 1.20
> {code:java}
> Succeeded 1 times
> Succeeded 2 times
> Exception in thread "main" java.io.IOException: Checksum verification failed
>     at org.apache.commons.compress.utils.ChecksumVerifyingInputStream.read(ChecksumVerifyingInputStream.java:61)
>     at org.apache.commons.compress.utils.ChecksumVerifyingInputStream.skip(ChecksumVerifyingInputStream.java:102)
>     at org.apache.commons.compress.utils.IOUtils.skip(IOUtils.java:113)
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.getCurrentStream(SevenZFile.java:1318)
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.getInputStream(SevenZFile.java:1354)
>     at org.abitoff.dmav.Test.main(Test.java:11)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
