Gren Elliot created COMPRESS-674:
------------------------------------

Summary: commons-compress-1.26.1 false positive on detecting archive
Key: COMPRESS-674
URL: https://issues.apache.org/jira/browse/COMPRESS-674
Project: Commons Compress
Issue Type: Bug
Components: Archivers
Affects Versions: 1.26.1
Environment: Intel running macOS Sonoma - but doubt this is significant
Reporter: Gren Elliot

I’m finding that commons-compress-1.26.1 recognises a UTF-16 text file as a tar archive, unlike the previous version.

This is the code that changed in that release, in ArchiveStreamFactory, method

    public static String detect(final InputStream in) throws ArchiveException {

and that now behaves differently in detection:

    if (signatureLength >= TAR_HEADER_SIZE) {
        try (TarArchiveInputStream inputStream = new TarArchiveInputStream(new ByteArrayInputStream(tarHeader))) {
            // COMPRESS-191 - verify the header checksum
            // COMPRESS-644 - do not allow zero byte file entries
            TarArchiveEntry entry = inputStream.getNextEntry();
            // try to find the first non-directory entry within the first 10 entries.
            int count = 0;
            while (entry != null && entry.isDirectory() && count++ < TAR_TEST_ENTRY_COUNT) {
                entry = inputStream.getNextEntry();
            }
            if (entry != null && entry.isCheckSumOK() && !entry.isDirectory() && entry.getSize() > 0 || count > 0) {
                return TAR;
            }
        } catch (final Exception e) { // NOPMD NOSONAR
            // can generate IllegalArgumentException as well as IOException auto-detection, simply not a TAR ignored
        }
    }

I feel this is being too lenient. For instance, at the last “if” statement, for the test file, entry is null and count=1. Because && binds more tightly than ||, the trailing "|| count > 0" is enough on its own to return TAR. The code suggests it is looking for the first non-directory entry, but it hasn’t found one in our case. The earlier code at least checked that the checksum was OK for the one entry it checked (it isn’t for our test file).
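To make the reported case concrete, here is a minimal, self-contained sketch of the final condition. It does not use Commons Compress: the Entry class is a hypothetical stand-in for the three TarArchiveEntry properties the condition reads, and the strict variant is only an illustration of a tighter check, not the project's fix. Note that dropping "|| count > 0" would presumably also reject archives containing only directories, which that clause appears intended to accept.

```java
// Sketch only: TarDetectSketch, Entry, lenient and strict are hypothetical names.
final class TarDetectSketch {

    /** Stand-in for the TarArchiveEntry properties read by the condition. */
    static final class Entry {
        final boolean checkSumOK;
        final boolean directory;
        final long size;

        Entry(final boolean checkSumOK, final boolean directory, final long size) {
            this.checkSumOK = checkSumOK;
            this.directory = directory;
            this.size = size;
        }
    }

    /**
     * Same shape as the quoted condition. Since && binds tighter than ||,
     * this is (entry checks) || (count > 0), so count > 0 alone suffices.
     */
    static boolean lenient(final Entry entry, final int count) {
        return entry != null && entry.checkSumOK && !entry.directory && entry.size > 0 || count > 0;
    }

    /** Hypothetical stricter variant: a valid non-directory entry is always required. */
    static boolean strict(final Entry entry) {
        return entry != null && entry.checkSumOK && !entry.directory && entry.size > 0;
    }

    public static void main(final String[] args) {
        // Reported case: no non-directory entry was found (entry == null), count == 1.
        System.out.println(lenient(null, 1)); // prints "true": detected as TAR
        System.out.println(strict(null));     // prints "false"
    }
}
```

With entry == null and count == 1, the lenient form accepts the input without any entry having passed the checksum test, which matches the behaviour described for the UTF-16 test file.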