[ https://issues.apache.org/jira/browse/COMPRESS-585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392942#comment-17392942 ]
Matthijs Laan edited comment on COMPRESS-585 at 8/4/21, 9:53 AM: ----------------------------------------------------------------- Yes, see the following test case: {code:java} @Test public void testCommentLongerThanCopyBufSize() throws Exception { final Field COPY_BUF_SIZE = IOUtils.class.getDeclaredField("COPY_BUF_SIZE"); COPY_BUF_SIZE.setAccessible(true); final int copyBufSize = (int)COPY_BUF_SIZE.get(null); String repeatingString = "...a comment that is longer than the copy buf size"; // Repeat the comment string until we are sure to exceed the COPY_BUF_SIZE int numberOfRepeats = (copyBufSize / repeatingString.length()) + 1; String comment = String.join("", Collections.nCopies(numberOfRepeats, repeatingString)); final File f = File.createTempFile("commons-compress-zipfiletest-longcomment", ".zip"); f.deleteOnExit(); try (final OutputStream o = Files.newOutputStream(f.toPath()); final ZipArchiveOutputStream zo = new ZipArchiveOutputStream(o)) { ZipArchiveEntry ze = new ZipArchiveEntry("foo"); ze.setComment(comment); zo.putArchiveEntry(ze); zo.write("bar".getBytes(UTF_8)); zo.closeArchiveEntry(); } try(final ZipFile zip = new ZipFile(FileChannel.open(f.toPath(), StandardOpenOption.READ))) { ZipArchiveEntry entry = zip.getEntry("foo"); assertNotNull(entry); assertEquals(comment, entry.getComment()); } }{code} My understanding of {{IOUtils.readRange()}} was incorrect, it can read more instead of less. This testcase creates a zip file with a 8050 length comment. In my case {{IOUtils.readRange()}} returns 8072 bytes when reading the comment, although this may depend on the platform. The cause is the same as COMPRESS-584, see [https://github.com/apache/commons-compress/pull/214]. With this {{IOUtils.readRange()}} fix it works: {code:java} public static byte[] readRange(final ReadableByteChannel input, final int len) throws IOException { final ByteArrayOutputStream output = new ByteArrayOutputStream(); final ByteBuffer b = ByteBuffer.allocate(Math.min(len, COPY_BUF_SIZE)); int read = 0; while (read < len) { // Make sure we never read more than len bytes b.limit(Math.min(len - read, b.remaining())); final int readNow = input.read(b); if (readNow <= 0) { break; } output.write(b.array(), 0, readNow); b.rewind(); read += readNow; }{code} The testcase can be made platform independent by reading from memory and trickling bytes explicitly like the {{readRangeFromChannelDoesntReadMoreThanAskedForWhenItGotLessInFirstReadCall()}} testcase in the PR for COMPRESS-584. was (Author: matthijsln): Yes, see the following test case: {code:java} @Test public void testCommentLongerThanCopyBufSize() throws Exception { final Field COPY_BUF_SIZE = IOUtils.class.getDeclaredField("COPY_BUF_SIZE"); COPY_BUF_SIZE.setAccessible(true); final int copyBufSize = (int)COPY_BUF_SIZE.get(null); String repeatingString = "...a comment that is longer than the copy buf size"; // Repeat the comment string until we are sure to exceed the COPY_BUF_SIZE int numberOfRepeats = (copyBufSize / repeatingString.length()) + 1; String comment = String.join("", Collections.nCopies(numberOfRepeats, repeatingString)); final File f = File.createTempFile("commons-compress-zipfiletest-longcomment", ".zip"); f.deleteOnExit(); try (final OutputStream o = Files.newOutputStream(f.toPath()); final ZipArchiveOutputStream zo = new ZipArchiveOutputStream(o)) { ZipArchiveEntry ze = new ZipArchiveEntry("foo"); ze.setComment(comment); zo.putArchiveEntry(ze); zo.write("bar".getBytes(UTF_8)); zo.closeArchiveEntry(); } try(final ZipFile zip = new ZipFile(FileChannel.open(f.toPath(), StandardOpenOption.READ))) { ZipArchiveEntry entry = zip.getEntry("foo"); assertNotNull(entry); assertEquals(comment, entry.getComment()); } }{code} My understanding of {{IOUtils.readRange()}} was incorrect, it can read more instead of less. This testcase creates a zip file with a 8050 length comment. In my case {{IOUtils.readRange()}} returns 8072 bytes instead, although this may depend on the platform. The cause is the same as COMPRESS-584, see [https://github.com/apache/commons-compress/pull/214]. With this {{IOUtils.readRange()}} fix it works: {code:java} public static byte[] readRange(final ReadableByteChannel input, final int len) throws IOException { final ByteArrayOutputStream output = new ByteArrayOutputStream(); final ByteBuffer b = ByteBuffer.allocate(Math.min(len, COPY_BUF_SIZE)); int read = 0; while (read < len) { // Make sure we never read more than len bytes b.limit(Math.min(len - read, b.remaining())); final int readNow = input.read(b); if (readNow <= 0) { break; } output.write(b.array(), 0, readNow); b.rewind(); read += readNow; }{code} The testcase can be made platform independent by reading from memory and trickling bytes explicitly like the {{readRangeFromChannelDoesntReadMoreThanAskedForWhenItGotLessInFirstReadCall()}} testcase in the PR for COMPRESS-584. > ZipFile fails to read a zipfile with a comment or extra data longer than 8024 > bytes > ----------------------------------------------------------------------------------- > > Key: COMPRESS-585 > URL: https://issues.apache.org/jira/browse/COMPRESS-585 > Project: Commons Compress > Issue Type: Bug > Components: Archivers > Affects Versions: 1.21 > Reporter: Matthijs Laan > Priority: Minor > Attachments: COMPRESS-585-test.patch > > > See the attached patch for a unit test demonstrating the issue with a long > comment. > The cause is that {{ZipFile.readCentralDirectoryEntry()}} calls > {{IOUtils.readRange()}} and assumes if it returns less than the length it > asked for that the EOF is reached, however this is not how that method works: > it returns max COPY_BUF_SIZE bytes (8024), even if EOF has not been reached. > Besides comments and extra data (in the central directory or local file > header) longer than 8024 bytes the only other place {{readRange()}} is called > is reading filenames, but that seems like a remote edge case and an EOF > exception is fine. > The IOUtils.readRange() JavaDoc does not specify how it communicates EOF. > With a blocking channel this would be when it returns a zero length array. It > could throw an exception when {{Channel.read()}} returns 0 bytes, because > that only happens on non-blocking channels. -- This message was sent by Atlassian Jira (v8.3.4#803005)