Gren Elliot created COMPRESS-674:
------------------------------------

             Summary: commons-compress-1.26.1 false positive on detecting archive
                 Key: COMPRESS-674
                 URL: https://issues.apache.org/jira/browse/COMPRESS-674
             Project: Commons Compress
          Issue Type: Bug
          Components: Archivers
    Affects Versions: 1.26.1
         Environment: Intel running macOS Sonoma - but doubt this is significant
            Reporter: Gren Elliot


I’m finding that commons-compress-1.26.1 recognises a UTF-16 text file as a 
tar archive, unlike the previous version.

 

The detection code that changed in that release is in ArchiveStreamFactory, in 
*public static String detect(final InputStream in) throws ArchiveException*.

This is the part that now behaves differently:

 

if (signatureLength >= TAR_HEADER_SIZE) {
    try (TarArchiveInputStream inputStream = new TarArchiveInputStream(new ByteArrayInputStream(tarHeader))) {
        // COMPRESS-191 - verify the header checksum
        // COMPRESS-644 - do not allow zero byte file entries
        TarArchiveEntry entry = inputStream.getNextEntry();
        // try to find the first non-directory entry within the first 10 entries.
        int count = 0;
        while (entry != null && entry.isDirectory() && count++ < TAR_TEST_ENTRY_COUNT) {
            entry = inputStream.getNextEntry();
        }
        if (entry != null && entry.isCheckSumOK() && !entry.isDirectory() && entry.getSize() > 0 || count > 0) {
            return TAR;
        }
    } catch (final Exception e) { // NOPMD NOSONAR
        // can generate IllegalArgumentException as well as IOException
        // auto-detection, simply not a TAR ignored
    }
}

 

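I feel this is too lenient. For the test file, at the final “if” statement 
entry is null and count is 1. Because && binds more tightly than ||, count > 0 
on its own is enough to return TAR, even though entry is null and no checksum 
was ever verified. The comment says the code is looking for the first 
non-directory entry, but in our case it has not found one.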
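
The effect of operator precedence in that final condition can be reproduced in isolation. The sketch below is a stand-in, not the library code: the class name DetectConditionSketch and its boolean parameters are hypothetical and only model the values described above (entry is null, count is 1). The stricter variant is one possible grouping shown for contrast, not a proposed patch.

```java
final class DetectConditionSketch {

    // Same shape as the reported condition. Java parses it as
    // (entryNotNull && checksumOk && !isDirectory && size > 0) || count > 0
    // because && binds more tightly than ||.
    static boolean reportedCondition(boolean entryNotNull, boolean checksumOk,
                                     boolean isDirectory, long size, int count) {
        return entryNotNull && checksumOk && !isDirectory && size > 0 || count > 0;
    }

    // Hypothetical stricter grouping (an assumption, not the library's fix):
    // a non-null entry with a valid checksum is always required.
    static boolean stricterCondition(boolean entryNotNull, boolean checksumOk,
                                     boolean isDirectory, long size, int count) {
        return entryNotNull && checksumOk && (!isDirectory && size > 0 || count > 0);
    }

    public static void main(String[] args) {
        // Values from the report: entry is null, count is 1.
        System.out.println(reportedCondition(false, false, false, 0L, 1)); // prints true
        System.out.println(stricterCondition(false, false, false, 0L, 1)); // prints false
    }
}
```

With the reported grouping, a single skipped "directory" entry is enough for a match, whatever follows it.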

 

By contrast, the earlier code at least checked that the checksum was OK for 
the one entry it examined (it isn’t for our test file).
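
To illustrate why a checksum test rejects such a file, here is a minimal sketch of the classic tar header checksum: the unsigned sum of the 512 header bytes, with the 8-byte checksum field at offset 148 counted as spaces, compared against the octal value stored in that field. The class TarChecksumSketch and its helpers are hypothetical and simplified. This is not how TarArchiveEntry.isCheckSumOK() is implemented, and the sample UTF-16 text is made up rather than taken from the test file in this report.

```java
import java.nio.charset.StandardCharsets;

final class TarChecksumSketch {
    static final int HEADER_SIZE = 512;
    static final int CHKSUM_OFFSET = 148;
    static final int CHKSUM_LENGTH = 8;

    // Unsigned sum of all header bytes, counting the checksum field as spaces.
    static long computeChecksum(byte[] header) {
        long sum = 0;
        for (int i = 0; i < HEADER_SIZE; i++) {
            boolean inField = i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LENGTH;
            sum += inField ? ' ' : (header[i] & 0xFF);
        }
        return sum;
    }

    // Parses the stored octal checksum; returns -1 if the field holds no octal digits.
    static long storedChecksum(byte[] header) {
        long value = 0;
        int digits = 0;
        for (int i = CHKSUM_OFFSET; i < CHKSUM_OFFSET + CHKSUM_LENGTH; i++) {
            byte b = header[i];
            if (b >= '0' && b <= '7') {
                value = value * 8 + (b - '0');
                digits++;
            } else if (b == ' ' && digits == 0) {
                continue; // skip leading spaces
            } else {
                break; // NUL, trailing space, or a non-octal byte ends the field
            }
        }
        return digits == 0 ? -1 : value;
    }

    static boolean isChecksumOk(byte[] header) {
        return header.length >= HEADER_SIZE && storedChecksum(header) == computeChecksum(header);
    }

    // Builds a mostly empty header with a short ASCII name and a correct checksum field.
    static byte[] sampleHeader(String name) {
        byte[] header = new byte[HEADER_SIZE];
        byte[] nameBytes = name.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(nameBytes, 0, header, 0, nameBytes.length);
        byte[] digits = String.format("%06o", computeChecksum(header))
                .getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(digits, 0, header, CHKSUM_OFFSET, digits.length);
        header[CHKSUM_OFFSET + 6] = 0;   // NUL terminator
        header[CHKSUM_OFFSET + 7] = ' '; // trailing space
        return header;
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 50; i++) {
            text.append("hello world ");
        }
        byte[] utf16 = text.toString().getBytes(StandardCharsets.UTF_16);
        System.out.println(isChecksumOk(utf16));                 // prints false
        System.out.println(isChecksumOk(sampleHeader("a.txt"))); // prints true
    }
}
```

In this sample the bytes of UTF-16 text that land in the checksum field are not octal digits, so the check fails, whereas the well-formed sample header passes.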



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
