[jira] [Commented] (COMPRESS-674) commons-compress-1.26.1 false positive on detecting archive
[ https://issues.apache.org/jira/browse/COMPRESS-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830083#comment-17830083 ] Tim Allison commented on COMPRESS-674: -- Noticing this, too, just now, over on Tika: https://issues.apache.org/jira/browse/TIKA-4220 We'll fork the fix and see how that improves results on our regression corpus. Thank you so much [~ggregory] and [~fatkudu]! > commons-compress-1.26.1 false positive on detecting archive > --- > > Key: COMPRESS-674 > URL: https://issues.apache.org/jira/browse/COMPRESS-674 > Project: Commons Compress > Issue Type: Bug > Components: Archivers >Affects Versions: 1.26.1 > Environment: Intel running macOS Sonoma - but doubt this is > significant >Reporter: Gren Elliot >Priority: Minor > Fix For: 1.27.0 > > > I’m finding that commons-compress-1.26.1 is recognising a utf-16 text file as > a tar archive – unlike the previous version > > This is the code that changed in that release in ArchiveStreamFactory - > *public static String detect(final InputStream in) throws ArchiveException {* > that differs in detection: > > {{if (signatureLength >= {_}TAR_HEADER_SIZE{_}) {}} > {{ try (TarArchiveInputStream inputStream = new TarArchiveInputStream(new > ByteArrayInputStream(tarHeader))) {}} > {{ _// COMPRESS-191 - verify the header checksum_}} > {{ _// COMPRESS-644 - do not allow zero byte file entries_}} > {{ __ TarArchiveEntry entry = inputStream.getNextEntry();}} > {{ _// try to find the first non-directory entry within the first 10 > entries._}} > {{ __ int count = 0;}} > {{ while (entry != null && entry.isDirectory() && count++ < > {_}TAR_TEST_ENTRY_COUNT{_}) {}} > {{ entry = inputStream.getNextEntry();}} > {{ }}} > {{ if (entry != null && entry.isCheckSumOK() && !entry.isDirectory() > && entry.getSize() > 0 || count > 0) {}} > {{ return {_}TAR{_};}} > {{ }}} > {{ } catch (final Exception e) { _// NOPMD NOSONAR_}} > {{ _// can generate IllegalArgumentException as well as IOException > auto-detection, simply not a TAR ignored_}} > {{ __ }}} > {{}}} > > I feel this is being too lenient. For instance at the last “if” statement, > for the test file, entry is null and count=1. The code suggests it is > looking for the first non-directory entry. It hasn’t found a non-directory > entry in our case. > > For instance, the earlier code at least checked that the checksum was OK for > the one entry it checked (it isn’t for our test file…) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (COMPRESS-674) commons-compress-1.26.1 false positive on detecting archive
[ https://issues.apache.org/jira/browse/COMPRESS-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829324#comment-17829324 ] Gary D. Gregory commented on COMPRESS-674: -- Hello [~fatkudu] Thank you for your report. The best path forward is to provide a failing unit test as a PR on GitHub. If you have a fix, include it in the PR. Note that your description of the code is a mix of code and markup. > commons-compress-1.26.1 false positive on detecting archive > --- > > Key: COMPRESS-674 > URL: https://issues.apache.org/jira/browse/COMPRESS-674 > Project: Commons Compress > Issue Type: Bug > Components: Archivers >Affects Versions: 1.26.1 > Environment: Intel running macOS Sonoma - but doubt this is > significant >Reporter: Gren Elliot >Priority: Minor > > I’m finding that commons-compress-1.26.1 is recognising a utf-16 text file as > a tar archive – unlike the previous version > > This is the code that changed in that release in ArchiveStreamFactory - > *public static String detect(final InputStream in) throws ArchiveException {* > that differs in detection: > > {{if (signatureLength >= {_}TAR_HEADER_SIZE{_}) {}} > {{ try (TarArchiveInputStream inputStream = new TarArchiveInputStream(new > ByteArrayInputStream(tarHeader))) {}} > {{ _// COMPRESS-191 - verify the header checksum_}} > {{ _// COMPRESS-644 - do not allow zero byte file entries_}} > {{ __ TarArchiveEntry entry = inputStream.getNextEntry();}} > {{ _// try to find the first non-directory entry within the first 10 > entries._}} > {{ __ int count = 0;}} > {{ while (entry != null && entry.isDirectory() && count++ < > {_}TAR_TEST_ENTRY_COUNT{_}) {}} > {{ entry = inputStream.getNextEntry();}} > {{ }}} > {{ if (entry != null && entry.isCheckSumOK() && !entry.isDirectory() > && entry.getSize() > 0 || count > 0) {}} > {{ return {_}TAR{_};}} > {{ }}} > {{ } catch (final Exception e) { _// NOPMD NOSONAR_}} > {{ _// can generate IllegalArgumentException as well as IOException > auto-detection, simply not a TAR ignored_}} > {{ __ }}} > {{}}} > > I feel this is being too lenient. For instance at the last “if” statement, > for the test file, entry is null and count=1. The code suggests it is > looking for the first non-directory entry. It hasn’t found a non-directory > entry in our case. > > For instance, the earlier code at least checked that the checksum was OK for > the one entry it checked (it isn’t for our test file…) -- This message was sent by Atlassian Jira (v8.20.10#820010)