[ https://issues.apache.org/jira/browse/COMPRESS-505?focusedWorklogId=398179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398179 ]
ASF GitHub Bot logged work on COMPRESS-505:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Mar/20 07:37
            Start Date: 05/Mar/20 07:37
    Worklog Time Spent: 10m
      Work Description: PeterAlfreadLee commented on pull request #95: COMPRESS-505 : bug fix for random access of 7z
URL: https://github.com/apache/commons-compress/pull/95

There are some problems in my PR #83 (COMPRESS-342, random access of 7z files) about random access of 7z:

1. I assumed that the `currentFolderInputStream` could be repositioned by changing the position of the `channel`, which turns out to be impossible. This PR fixes it by reopening the `currentFolderInputStream`.

2. There are now 2 ways to access the content of a 7z archive: sequential access (getNextEntry) and random access (getInputStream). They may be used one after another, so there are several conditions we need to deal with:

2.1 In a random access, if `currentEntryIndex` == `entryIndex` && the entry has not been read yet:
This means the input stream of the entry we want has already been put in `deferredBlockStreams` as the last array member. We SHOULD NOT build a new input stream for the entry, because that would cause the existing stream in `deferredBlockStreams` to be skipped. We should do nothing, because the input stream is already in `deferredBlockStreams`.

2.2 In a random access, if `currentEntryIndex` == `entryIndex` && the entry has already been read:
This means the entry we want has been read (whether part of it or all of it has been read does not matter). We should then reopen the `currentFolderInputStream` and skip all the entries before the entry we want.
BTW: we can determine whether the file has been read by comparing the `bytesRemaining` of the input stream (as a `CRC32VerifyingInputStream`) with the actual size of the file.

2.3 In a random access, if `currentEntryIndex` < `entryIndex` && the last entry in `deferredBlockStreams` has not been read:
The input streams whose index is less than or equal to `currentEntryIndex` have already been put into `deferredBlockStreams`. We can just add the remaining entries to `deferredBlockStreams`.

2.4 In a random access, if `currentEntryIndex` < `entryIndex` && the last entry in `deferredBlockStreams` has already been read:
Like 2.2, we have no choice but to reopen the `currentFolderInputStream` and skip all the entries again.

2.5 In a random access, if `currentEntryIndex` > `entryIndex`:
This means the entry we want has already been read or skipped. We can only reopen the `currentFolderInputStream` and skip all the entries again.

In short, we should do nothing in 2.1, skip the remaining entries in 2.3, and reopen the `currentFolderInputStream` in 2.2/2.4/2.5.

I have to admit this is a bit complicated, but I couldn't find a better way to build the logic. :(

I did some refactoring and added some new comments to make the code clearer. The corresponding test cases are also included in this PR.
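The five cases above (2.1 to 2.5) reduce to a small decision table. What follows is a minimal Java sketch of that table, not the code from the PR: the class `SevenZRandomAccessSketch`, the `Action` enum, and the methods `decide` and `hasBeenRead` are hypothetical names made up for illustration and do not exist in `SevenZFile`. It also assumes both indices refer to entries in the same folder.

```java
/**
 * Illustrative sketch only. All names here are hypothetical and are not
 * taken from SevenZFile; only the case numbering follows the PR comment.
 */
class SevenZRandomAccessSketch {

    /** What a random-access read has to do before it can serve the requested entry. */
    enum Action {
        DO_NOTHING,              // case 2.1
        QUEUE_REMAINING_ENTRIES, // case 2.3
        REOPEN_FOLDER_STREAM     // cases 2.2, 2.4 and 2.5
    }

    /**
     * Assumption based on the "BTW" note: an entry counts as unread while the
     * bytes remaining in its stream still equal the full entry size.
     */
    static boolean hasBeenRead(long bytesRemaining, long entrySize) {
        return bytesRemaining != entrySize;
    }

    /**
     * @param currentEntryIndex      index of the last entry already queued
     * @param entryIndex             index of the entry requested by random access
     * @param lastDeferredStreamRead whether the last queued stream was (partly) consumed
     */
    static Action decide(int currentEntryIndex, int entryIndex, boolean lastDeferredStreamRead) {
        if (currentEntryIndex > entryIndex) {
            // 2.5: the wanted entry lies behind us, so start the folder over.
            return Action.REOPEN_FOLDER_STREAM;
        }
        if (lastDeferredStreamRead) {
            // 2.2 and 2.4: the folder stream has moved past a usable position.
            return Action.REOPEN_FOLDER_STREAM;
        }
        // Nothing consumed yet: either the stream is already queued (2.1)
        // or only the entries up to the wanted one still need queuing (2.3).
        return currentEntryIndex == entryIndex
                ? Action.DO_NOTHING
                : Action.QUEUE_REMAINING_ENTRIES;
    }

    public static void main(String[] args) {
        System.out.println("2.1 -> " + decide(3, 3, false));
        System.out.println("2.2 -> " + decide(3, 3, true));
        System.out.println("2.3 -> " + decide(2, 5, false));
        System.out.println("2.4 -> " + decide(2, 5, true));
        System.out.println("2.5 -> " + decide(7, 5, false));
    }
}
```

Under this reading, calling getInputStream repeatedly on the same entry after it has been consumed falls into case 2.2 on every call after the first, so each call reopens the folder stream.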
Issue Time Tracking
-------------------

            Worklog Id:     (was: 398179)
    Remaining Estimate: 0h
            Time Spent: 10m

> Multiple Reads of One SevenZArchiveEntry Fails
> ----------------------------------------------
>
>                 Key: COMPRESS-505
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-505
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.20
>            Reporter: Steven Fontaine
>            Priority: Minor
>         Attachments: CC0 Images.7z
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I've run into a bug which occurs when attempting to read the same
> SevenZArchiveEntry stream multiple times. The following code illustrates the
> problem.
> {code:java}
> File archive = new File("CC0 Images.7z");
> char[] password = "password".toCharArray();
> try (SevenZFile f = new SevenZFile(archive, password))
> {
>     SevenZArchiveEntry entry =
>         StreamSupport.stream(f.getEntries().spliterator(), false)
>             .filter(e -> "alberta-amazing-attraction-banff-417074.jpg".equals(e.getName()))
>             .findFirst().orElse(null);
>     assert entry != null;
>     for (int i = 0; i < 100; i++)
>     {
>         InputStream is = f.getInputStream(entry);
>         BufferedImage img = ImageIO.read(is);
>         assert img != null;
>         System.out.println("Succeeded " + (i + 1) + " times");
>     }
> }{code}
> Below is the output I receive on version 1.20
> {code:java}
> Succeeded 1 times
> Succeeded 2 times
> Exception in thread "main" java.io.IOException: Checksum verification failed
>     at org.apache.commons.compress.utils.ChecksumVerifyingInputStream.read(ChecksumVerifyingInputStream.java:61)
>     at org.apache.commons.compress.utils.ChecksumVerifyingInputStream.skip(ChecksumVerifyingInputStream.java:102)
>     at org.apache.commons.compress.utils.IOUtils.skip(IOUtils.java:113)
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.getCurrentStream(SevenZFile.java:1318)
>     at org.apache.commons.compress.archivers.sevenz.SevenZFile.getInputStream(SevenZFile.java:1354)
>     at org.abitoff.dmav.Test.main(Test.java:11)
> {code}
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)