[jira] [Commented] (COMPRESS-674) commons-compress-1.26.1 false positive on detecting archive

2024-03-23 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830083#comment-17830083
 ] 

Tim Allison commented on COMPRESS-674:
--

Noticing this, too, just now, over on Tika: 
https://issues.apache.org/jira/browse/TIKA-4220

We'll fork the fix and see how that improves results on our regression corpus. 
Thank you so much [~ggregory] and [~fatkudu]!

> commons-compress-1.26.1 false positive on detecting archive
> ---
>
> Key: COMPRESS-674
> URL: https://issues.apache.org/jira/browse/COMPRESS-674
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.26.1
> Environment: Intel running macOS Sonoma - but doubt this is 
> significant
>Reporter: Gren Elliot
>Priority: Minor
> Fix For: 1.27.0
>
>
> I’m finding that commons-compress-1.26.1 is recognising a utf-16 text file as 
> a tar archive – unlike the previous version
>  
> This is the code that changed in that release in ArchiveStreamFactory - 
> *public static String detect(final InputStream in) throws ArchiveException {*
> that differs in detection:
>  
> {{if (signatureLength >= {_}TAR_HEADER_SIZE{_}) {}}
> {{    try (TarArchiveInputStream inputStream = new TarArchiveInputStream(new 
> ByteArrayInputStream(tarHeader))) {}}
> {{    _// COMPRESS-191 - verify the header checksum_}}
> {{    _// COMPRESS-644 - do not allow zero byte file entries_}}
> {{    __    TarArchiveEntry entry = inputStream.getNextEntry();}}
> {{    _// try to find the first non-directory entry within the first 10 
> entries._}}
> {{    __    int count = 0;}}
> {{    while (entry != null && entry.isDirectory() && count++ < 
> {_}TAR_TEST_ENTRY_COUNT{_}) {}}
> {{    entry = inputStream.getNextEntry();}}
> {{    }}}
> {{    if (entry != null && entry.isCheckSumOK() && !entry.isDirectory() 
> && entry.getSize() > 0 || count > 0) {}}
> {{    return {_}TAR{_};}}
> {{    }}}
> {{    } catch (final Exception e) { _// NOPMD NOSONAR_}}
> {{    _// can generate IllegalArgumentException as well as IOException 
> auto-detection, simply not a TAR ignored_}}
> {{    __    }}}
> {{}}}
>  
> I feel this is being too lenient.  For instance at the last “if” statement, 
> for the test file, entry is null and count=1.  The code suggests it is 
> looking for the first non-directory entry.  It hasn’t found a non-directory 
> entry in our case.
>  
> For instance, the earlier code at least checked that the checksum was OK for 
> the one entry it checked (it isn’t for our test file…)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (COMPRESS-674) commons-compress-1.26.1 false positive on detecting archive

2024-03-20 Thread Gary D. Gregory (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829324#comment-17829324
 ] 

Gary D. Gregory commented on COMPRESS-674:
--

Hello [~fatkudu] 

Thank you for your report.

The best path forward is to provide a failing unit test as a PR on GitHub. If 
you have a fix, include it in the PR.

Note that your description of the code is a mix of code and markup.

 

> commons-compress-1.26.1 false positive on detecting archive
> ---
>
> Key: COMPRESS-674
> URL: https://issues.apache.org/jira/browse/COMPRESS-674
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.26.1
> Environment: Intel running macOS Sonoma - but doubt this is 
> significant
>Reporter: Gren Elliot
>Priority: Minor
>
> I’m finding that commons-compress-1.26.1 is recognising a utf-16 text file as 
> a tar archive – unlike the previous version
>  
> This is the code that changed in that release in ArchiveStreamFactory - 
> *public static String detect(final InputStream in) throws ArchiveException {*
> that differs in detection:
>  
> {{if (signatureLength >= {_}TAR_HEADER_SIZE{_}) {}}
> {{    try (TarArchiveInputStream inputStream = new TarArchiveInputStream(new 
> ByteArrayInputStream(tarHeader))) {}}
> {{    _// COMPRESS-191 - verify the header checksum_}}
> {{    _// COMPRESS-644 - do not allow zero byte file entries_}}
> {{    __    TarArchiveEntry entry = inputStream.getNextEntry();}}
> {{    _// try to find the first non-directory entry within the first 10 
> entries._}}
> {{    __    int count = 0;}}
> {{    while (entry != null && entry.isDirectory() && count++ < 
> {_}TAR_TEST_ENTRY_COUNT{_}) {}}
> {{    entry = inputStream.getNextEntry();}}
> {{    }}}
> {{    if (entry != null && entry.isCheckSumOK() && !entry.isDirectory() 
> && entry.getSize() > 0 || count > 0) {}}
> {{    return {_}TAR{_};}}
> {{    }}}
> {{    } catch (final Exception e) { _// NOPMD NOSONAR_}}
> {{    _// can generate IllegalArgumentException as well as IOException 
> auto-detection, simply not a TAR ignored_}}
> {{    __    }}}
> {{}}}
>  
> I feel this is being too lenient.  For instance at the last “if” statement, 
> for the test file, entry is null and count=1.  The code suggests it is 
> looking for the first non-directory entry.  It hasn’t found a non-directory 
> entry in our case.
>  
> For instance, the earlier code at least checked that the checksum was OK for 
> the one entry it checked (it isn’t for our test file…)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)