[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2021-06-14 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17362915#comment-17362915
 ] 

A Kelday commented on COMPRESS-514:
---

Hi,

I see there's a new GitHub comment about rebasing the current (very old) PR. 
I'll hopefully have some time to look at this again soon, but I no longer have 
the "test case" 7zip to work with. It contained third-party data which cannot be 
retained or shared, plus it was over 1TB.

I'll attempt to create a smaller test case 7zip which reproduces the original 
problem, but it could take some time, since it probably needs at least 20 
million entry paths in the header.
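
A rough sketch of one way such an archive could be generated with SevenZOutputFile 
(entirely illustrative and untested at this scale; reproducing the _encoded_ header 
case will probably still mean re-packing the result with 7zip itself, e.g. with 
header compression/encryption, since as far as I can tell SevenZOutputFile writes a 
plain header):
{code:java}
import java.io.File;
import java.io.IOException;

import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZOutputFile;

// Rough sketch only: the entry count, name pattern and output path are made up,
// and producing tens of millions of entries will take a long time. The header
// size is dominated by the entry names, so long names get to a huge header faster.
public class BigHeaderArchiveGenerator {
    public static void main(String[] args) throws IOException {
        try (SevenZOutputFile out = new SevenZOutputFile(new File("big-header-test.7z"))) {
            for (long i = 0; i < 20_000_000L; i++) {
                SevenZArchiveEntry entry = new SevenZArchiveEntry();
                entry.setDirectory(false);
                // A long, unique path per entry inflates the names block of the header.
                entry.setName("test/data/" + (i % 1000) + "/file-" + i
                        + "-padding-padding-padding-padding-padding.bin");
                out.putArchiveEntry(entry);
                out.closeArchiveEntry(); // zero-byte entry, no content stream needed
            }
        }
    }
}
{code}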

> SevenZFile fails with encoded header over 2GiB
> --
>
> Key: COMPRESS-514
> URL: https://issues.apache.org/jira/browse/COMPRESS-514
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.20
>Reporter: A Kelday
>Priority: Minor
> Attachments: HeaderChannelBuffer.java
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When reading what some may call a large encrypted 7zip file (1.2TB with 22 
> million files), the read fails at the header stage with the trace below. Is 
> this within the spec? I've written some code to handle it, because I did 
> actually need to extract the file in java. If that's of any use I can provide 
> it (it's a naive wrapper that just pages in a buffer at a time).
>  
> {code:java}
> Exception in thread "main" java.io.IOException: Cannot handle 
> unpackSize241696
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.assertFitsIntoInt(SevenZFile.java:1523)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:622)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.initializeArchive(SevenZFile.java:532)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:468)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:337)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:129)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:116)
> {code}
> 7zip itself can also open it (and display/extract etc.), here are the stats:
>  
>  
> {code:java}
> Size: 2 489 903 580 875
> Packed Size: 1 349 110 308 832
> Folders: 40 005
> Files: 22 073 957
> CRC: E26F6A96
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-542) Corrupt 7z allocates huge amount of SevenZEntries

2020-08-06 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172118#comment-17172118
 ] 

A Kelday commented on COMPRESS-542:
---

Here are a couple of test files which I think will reproduce the issue:

[^endheadercorrupted.7z]

[^endheadercorrupted2.7z]

> Corrupt 7z allocates huge amount of SevenZEntries
> -
>
> Key: COMPRESS-542
> URL: https://issues.apache.org/jira/browse/COMPRESS-542
> Project: Commons Compress
>  Issue Type: Bug
>Affects Versions: 1.20
>Reporter: Robin Schimpf
>Priority: Major
> Attachments: 
> Reduced_memory_allocation_for_corrupted_7z_archives.patch, 
> endheadercorrupted.7z, endheadercorrupted2.7z
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We ran into a problem where a 1.43GB corrupt 7z file tried to allocate about 
> 138 million SevenZArchiveEntries, which would use about 12GB of memory. Sadly 
> I'm unable to share the file. If you have enough memory available, the 
> following exception is thrown.
> {code:java}
> java.io.IOException: Start header corrupt and unable to guess end Header
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.tryToLocateEndHeader(SevenZFile.java:511)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:470)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:336)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:128)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:369)
> {code}
> 7z itself aborts really quickly when I try to list the content of the file.
> {code:java}
> 7z l "corrupt.7z"
> 7-Zip 18.01 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2018-01-28
> Scanning the drive for archives:
> 1 file, 1537752212 bytes (1467 MiB)
> Listing archive: corrupt.7z
> ERROR: corrupt.7z : corrupt.7z
> Open ERROR: Can not open the file as [7z] archive
> ERRORS:
> Is not archive
> Errors: 1
> {code}
> I hacked together the attached patch, which reduces the memory allocation 
> to about 1GB. So lazy instantiation of the entries could be a good solution 
> to the problem. Ideally, the entries would only be created once the headers 
> have been parsed correctly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (COMPRESS-542) Corrupt 7z allocates huge amount of SevenZEntries

2020-08-06 Thread A Kelday (Jira)


 [ 
https://issues.apache.org/jira/browse/COMPRESS-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A Kelday updated COMPRESS-542:
--
Attachment: endheadercorrupted.7z

> Corrupt 7z allocates huge amount of SevenZEntries
> -
>
> Key: COMPRESS-542
> URL: https://issues.apache.org/jira/browse/COMPRESS-542
> Project: Commons Compress
>  Issue Type: Bug
>Affects Versions: 1.20
>Reporter: Robin Schimpf
>Priority: Major
> Attachments: 
> Reduced_memory_allocation_for_corrupted_7z_archives.patch, 
> endheadercorrupted.7z, endheadercorrupted2.7z
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We ran into a problem where a 1.43GB corrupt 7z file tried to allocate about 
> 138 million SevenZArchiveEntries, which would use about 12GB of memory. Sadly 
> I'm unable to share the file. If you have enough memory available, the 
> following exception is thrown.
> {code:java}
> java.io.IOException: Start header corrupt and unable to guess end Header
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.tryToLocateEndHeader(SevenZFile.java:511)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:470)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:336)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:128)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:369)
> {code}
> 7z itself aborts really quickly when I try to list the content of the file.
> {code:java}
> 7z l "corrupt.7z"
> 7-Zip 18.01 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2018-01-28
> Scanning the drive for archives:
> 1 file, 1537752212 bytes (1467 MiB)
> Listing archive: corrupt.7z
> ERROR: corrupt.7z : corrupt.7z
> Open ERROR: Can not open the file as [7z] archive
> ERRORS:
> Is not archive
> Errors: 1
> {code}
> I hacked together the attached patch, which reduces the memory allocation 
> to about 1GB. So lazy instantiation of the entries could be a good solution 
> to the problem. Ideally, the entries would only be created once the headers 
> have been parsed correctly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (COMPRESS-542) Corrupt 7z allocates huge amount of SevenZEntries

2020-08-06 Thread A Kelday (Jira)


 [ 
https://issues.apache.org/jira/browse/COMPRESS-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A Kelday updated COMPRESS-542:
--
Attachment: endheadercorrupted2.7z

> Corrupt 7z allocates huge amount of SevenZEntries
> -
>
> Key: COMPRESS-542
> URL: https://issues.apache.org/jira/browse/COMPRESS-542
> Project: Commons Compress
>  Issue Type: Bug
>Affects Versions: 1.20
>Reporter: Robin Schimpf
>Priority: Major
> Attachments: 
> Reduced_memory_allocation_for_corrupted_7z_archives.patch, 
> endheadercorrupted.7z, endheadercorrupted2.7z
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> We ran into a problem where a 1.43GB corrupt 7z file tried to allocate about 
> 138 million SevenZArchiveEntries, which would use about 12GB of memory. Sadly 
> I'm unable to share the file. If you have enough memory available, the 
> following exception is thrown.
> {code:java}
> java.io.IOException: Start header corrupt and unable to guess end Header
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.tryToLocateEndHeader(SevenZFile.java:511)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:470)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:336)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:128)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:369)
> {code}
> 7z itself aborts really quickly when I try to list the content of the file.
> {code:java}
> 7z l "corrupt.7z"
> 7-Zip 18.01 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2018-01-28
> Scanning the drive for archives:
> 1 file, 1537752212 bytes (1467 MiB)
> Listing archive: corrupt.7z
> ERROR: corrupt.7z : corrupt.7z
> Open ERROR: Can not open the file as [7z] archive
> ERRORS:
> Is not archive
> Errors: 1
> {code}
> I hacked together the attached patch, which reduces the memory allocation 
> to about 1GB. So lazy instantiation of the entries could be a good solution 
> to the problem. Ideally, the entries would only be created once the headers 
> have been parsed correctly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-06-01 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121361#comment-17121361
 ] 

A Kelday commented on COMPRESS-514:
---

Thanks [~bodewig] and [~peterlee] for the input. Happy to work on it more at 
some point if you choose an option (I'll keep thinking about it anyway and do 
some more checks).

Peter explained my concern exactly: in most cases, given corrupt data, we 
could expect an exception other than the one triggered by the CRC check to 
happen _before_ the end of the stream is ever reached (because we aren't just 
transferring data, we're branching based on it). And that's really the best 
case; worse than that is some garbage filename list being created. What I'm 
very conscious of is not making the common use case code worse.

> SevenZFile fails with encoded header over 2GiB
> --
>
> Key: COMPRESS-514
> URL: https://issues.apache.org/jira/browse/COMPRESS-514
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.20
>Reporter: A Kelday
>Priority: Minor
> Attachments: HeaderChannelBuffer.java
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When reading what some may call a large encrypted 7zip file (1.2TB with 22 
> million files), the read fails at the header stage with the trace below. Is 
> this within the spec? I've written some code to handle it, because I did 
> actually need to extract the file in java. If that's of any use I can provide 
> it (it's a naive wrapper that just pages in a buffer at a time).
>  
> {code:java}
> Exception in thread "main" java.io.IOException: Cannot handle 
> unpackSize241696
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.assertFitsIntoInt(SevenZFile.java:1523)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:622)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.initializeArchive(SevenZFile.java:532)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:468)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:337)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:129)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:116)
> {code}
> 7zip itself can also open it (and display/extract etc.), here are the stats:
>  
>  
> {code:java}
> Size: 2 489 903 580 875
> Packed Size: 1 349 110 308 832
> Folders: 40 005
> Files: 22 073 957
> CRC: E26F6A96
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-21 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113466#comment-17113466
 ] 

A Kelday edited comment on COMPRESS-514 at 5/21/20, 6:54 PM:
-

After digging in a bit more, this takes me back to the same CRC problem as 
before, but with some new info after looking at the 7zip source.

It looks like 7zip does nearly the same as the current Commons Compress: it 
reads the whole header buffer into RAM and computes the CRC before parsing. The 
difference is that its size is an unsigned int, so the maximum is 4GiB (above 
that is unsupported). Indeed, 7zip uses over 5GiB of RAM simply to show the 
file list of this 1.2TB archive.

That leads to at least three options:
 # 7zip method: read everything into RAM (with multiple buffers, up to 4GiB) 
for the CRC check and the parse
 # Read the header twice if necessary: once streamed for the CRC, then again 
using a small buffer to parse. If the header fits entirely in our small buffer, 
no extra read is required.
 # Read/parse the header and compute the CRC at the same time (bad, because you 
don't find out the data is wrong until it's too late)

It would be great to have some opinions here, because this is more than I'd 
hoped it would take to fix. There's always the choice to just not support 
headers over 2GiB...


was (Author: akelday):
After digging in a bit more, this takes me back to the same CRC problem as 
before, but with some new info after looking at the 7zip source.

It looks like 7zip does nearly the same as the current Commons Compress: it 
reads the whole header buffer into RAM and computes the CRC before parsing. The 
difference is that its size is an unsigned int, so the maximum is 4GiB (above 
that is unsupported). Indeed, 7zip uses over 5GiB of RAM simply to show the 
file list of this 1.2TB archive.

That leads to at least three options:
 # 7zip method: read everything into RAM (with multiple buffers, up to 4GiB) 
for the CRC check and the parse
 # Read the header twice if necessary: once streamed for the CRC, then again 
using a small buffer to parse. If the header fits entirely in our small buffer, 
no extra read is required.
 # Read the header and compute the CRC at the same time (bad, because you don't 
find out the data is wrong until it's too late)

It would be great to have some opinions here, because this is more than I'd 
hoped it would take to fix. There's always the choice to just not support 
headers over 2GiB...

> SevenZFile fails with encoded header over 2GiB
> --
>
> Key: COMPRESS-514
> URL: https://issues.apache.org/jira/browse/COMPRESS-514
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.20
>Reporter: A Kelday
>Priority: Minor
> Attachments: HeaderChannelBuffer.java
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When reading what some may call a large encrypted 7zip file (1.2TB with 22 
> million files), the read fails at the header stage with the trace below. Is 
> this within the spec? I've written some code to handle it, because I did 
> actually need to extract the file in java. If that's of any use I can provide 
> it (it's a naive wrapper that just pages in a buffer at a time).
>  
> {code:java}
> Exception in thread "main" java.io.IOException: Cannot handle 
> unpackSize241696
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.assertFitsIntoInt(SevenZFile.java:1523)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:622)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.initializeArchive(SevenZFile.java:532)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:468)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:337)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:129)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:116)
> {code}
> 7zip itself can also open it (and display/extract etc.), here are the stats:
>  
>  
> {code:java}
> Size: 2 489 903 580 875
> Packed Size: 1 349 110 308 832
> Folders: 40 005
> Files: 22 073 957
> CRC: E26F6A96
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-21 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113466#comment-17113466
 ] 

A Kelday commented on COMPRESS-514:
---

After digging in a bit more, this takes me back to the same CRC problem as 
before, but with some new info after looking at the 7zip source.

It looks like 7zip does nearly the same as the current Commons Compress: it 
reads the whole header buffer into RAM and computes the CRC before parsing. The 
difference is that its size is an unsigned int, so the maximum is 4GiB (above 
that is unsupported). Indeed, 7zip uses over 5GiB of RAM simply to show the 
file list of this 1.2TB archive.

That leads to at least three options:
 # 7zip method: read everything into RAM (with multiple buffers, up to 4GiB) 
for the CRC check and the parse
 # Read the header twice if necessary: once streamed for the CRC, then again 
using a small buffer to parse (see the sketch below). If the header fits 
entirely in our small buffer, no extra read is required.
 # Read the header and compute the CRC at the same time (bad, because you don't 
find out the data is wrong until it's too late)

It would be great to have some opinions here, because this is more than I'd 
hoped it would take to fix. There's always the choice to just not support 
headers over 2GiB...
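
To make option 2 a little more concrete, here is a minimal sketch of the first 
(streaming CRC) pass, assuming we already have the header's offset, size and 
stored CRC from the start header; the method shape and names are illustrative, 
not the actual SevenZFile internals:
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.util.zip.CRC32;

final class HeaderCrcCheck {
    // Streams the stored header region through a CRC32 using a 64KiB buffer,
    // so the full (possibly >2GiB) header never has to sit in memory. A second
    // pass would then re-read the same region with a small paging buffer and parse it.
    static boolean headerCrcMatches(SeekableByteChannel channel, long headerOffset,
                                    long headerSize, long storedCrc) throws IOException {
        final CRC32 crc = new CRC32();
        final ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
        channel.position(headerOffset);
        long remaining = headerSize;
        while (remaining > 0) {
            buf.clear();
            if (remaining < buf.capacity()) {
                buf.limit((int) remaining);
            }
            final int read = channel.read(buf);
            if (read < 0) {
                throw new IOException("Premature end of header");
            }
            buf.flip();
            crc.update(buf); // CRC32.update(ByteBuffer) is available since Java 8
            remaining -= read;
        }
        return crc.getValue() == (storedCrc & 0xFFFFFFFFL);
    }
}
{code}
The second pass would then page through the same region while parsing, along 
the lines of the attached HeaderChannelBuffer.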

> SevenZFile fails with encoded header over 2GiB
> --
>
> Key: COMPRESS-514
> URL: https://issues.apache.org/jira/browse/COMPRESS-514
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.20
>Reporter: A Kelday
>Priority: Minor
> Attachments: HeaderChannelBuffer.java
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When reading what some may call a large encrypted 7zip file (1.2TB with 22 
> million files), the read fails at the header stage with the trace below. Is 
> this within the spec? I've written some code to handle it, because I did 
> actually need to extract the file in java. If that's of any use I can provide 
> it (it's a naive wrapper that just pages in a buffer at a time).
>  
> {code:java}
> Exception in thread "main" java.io.IOException: Cannot handle 
> unpackSize241696
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.assertFitsIntoInt(SevenZFile.java:1523)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:622)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.initializeArchive(SevenZFile.java:532)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:468)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:337)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:129)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:116)
> {code}
> 7zip itself can also open it (and display/extract etc.), here are the stats:
>  
>  
> {code:java}
> Size: 2 489 903 580 875
> Packed Size: 1 349 110 308 832
> Folders: 40 005
> Files: 22 073 957
> CRC: E26F6A96
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-18 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17110652#comment-17110652
 ] 

A Kelday commented on COMPRESS-514:
---

Hi [~peterlee],

As mentioned in the PR, the resource close is not handled correctly for encoded 
headers, so that's definitely not fit to merge (sorry about that). The main 
reason is that `readEncodedHeader` now returns with an open InputStream.

A quick fix would be to make `HeaderBuffer` implement `Closeable`, but we would 
need to ensure `close()` is called only when an _encoded_ header was read 
(otherwise the underlying file channel would be closed!). I'll go with that 
plan if nothing else springs to mind.
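
Roughly what I have in mind, as a minimal sketch (the `HeaderBuffer` shape and 
names here are illustrative, not the actual PR code):
{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

// Sketch of the quick fix: HeaderBuffer becomes Closeable, and close() only
// releases the decoding stream that readEncodedHeader leaves open. A plain-header
// implementation backed by the archive's own channel would make close() a no-op,
// so the underlying file channel is never closed by accident.
interface HeaderBuffer extends Closeable {
    int get() throws IOException;            // next unsigned header byte
    void get(byte[] dst) throws IOException; // fill dst from the header
}

final class EncodedHeaderBuffer implements HeaderBuffer {
    private final InputStream decodedHeader; // the stream left open by readEncodedHeader

    EncodedHeaderBuffer(final InputStream decodedHeader) {
        this.decodedHeader = decodedHeader;
    }

    @Override
    public int get() throws IOException {
        final int b = decodedHeader.read();
        if (b < 0) {
            throw new IOException("Premature end of header");
        }
        return b;
    }

    @Override
    public void get(final byte[] dst) throws IOException {
        int off = 0;
        while (off < dst.length) {
            final int n = decodedHeader.read(dst, off, dst.length - off);
            if (n < 0) {
                throw new IOException("Premature end of header");
            }
            off += n;
        }
    }

    @Override
    public void close() throws IOException {
        decodedHeader.close(); // never wraps the raw file channel, so this is safe
    }
}
{code}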

If you have better ideas I'd be glad to hear them.

 

> SevenZFile fails with encoded header over 2GiB
> --
>
> Key: COMPRESS-514
> URL: https://issues.apache.org/jira/browse/COMPRESS-514
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.20
>Reporter: A Kelday
>Priority: Minor
> Attachments: HeaderChannelBuffer.java
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When reading what some may call a large encrypted 7zip file (1.2TB with 22 
> million files), the read fails at the header stage with the trace below. Is 
> this within the spec? I've written some code to handle it, because I did 
> actually need to extract the file in java. If that's of any use I can provide 
> it (it's a naive wrapper that just pages in a buffer at a time).
>  
> {code:java}
> Exception in thread "main" java.io.IOException: Cannot handle 
> unpackSize241696
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.assertFitsIntoInt(SevenZFile.java:1523)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:622)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.initializeArchive(SevenZFile.java:532)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:468)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:337)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:129)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:116)
> {code}
> 7zip itself can also open it (and display/extract etc.), here are the stats:
>  
>  
> {code:java}
> Size: 2 489 903 580 875
> Packed Size: 1 349 110 308 832
> Folders: 40 005
> Files: 22 073 957
> CRC: E26F6A96
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-13 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106270#comment-17106270
 ] 

A Kelday commented on COMPRESS-514:
---

[~ggregory] I think the current PR fixes this issue without any new problems.

It sidesteps the problem of the end header being held fully in memory for the CRC 
check (that's what the current master branch does anyway), but it should make it 
easier to tackle that later. I think that ought to be a separate issue. I might 
have time to work on it myself at some point if nobody else does.

[~peterlee] thanks very much for your PR comments so far, they've been most 
helpful!

> SevenZFile fails with encoded header over 2GiB
> --
>
> Key: COMPRESS-514
> URL: https://issues.apache.org/jira/browse/COMPRESS-514
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.20
>Reporter: A Kelday
>Priority: Minor
> Attachments: HeaderChannelBuffer.java
>
>
> When reading what some may call a large encrypted 7zip file (1.2TB with 22 
> million files), the read fails at the header stage with the trace below. Is 
> this within the spec? I've written some code to handle it, because I did 
> actually need to extract the file in java. If that's of any use I can provide 
> it (it's a naive wrapper that just pages in a buffer at a time).
>  
> {code:java}
> Exception in thread "main" java.io.IOException: Cannot handle 
> unpackSize241696
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.assertFitsIntoInt(SevenZFile.java:1523)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:622)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.initializeArchive(SevenZFile.java:532)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:468)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:337)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:129)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:116)
> {code}
> 7zip itself can also open it (and display/extract etc.), here are the stats:
>  
>  
> {code:java}
> Size: 2 489 903 580 875
> Packed Size: 1 349 110 308 832
> Folders: 40 005
> Files: 22 073 957
> CRC: E26F6A96
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-11 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104944#comment-17104944
 ] 

A Kelday edited comment on COMPRESS-514 at 5/11/20, 10:57 PM:
--

[~ggregory] for what it's worth, that's the PR in.

Argh, helps if the tests are in the correct location!

That's what happens when rushing to get to bed... sorted now. JDK 14 doesn't 
appear to build; the rest do.


was (Author: akelday):
[~ggregory] for what it's worth, that's the PR in.

Argh, helps if the tests are in the correct location!

> SevenZFile fails with encoded header over 2GiB
> --
>
> Key: COMPRESS-514
> URL: https://issues.apache.org/jira/browse/COMPRESS-514
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.20
>Reporter: A Kelday
>Priority: Minor
> Attachments: HeaderChannelBuffer.java
>
>
> When reading what some may call a large encrypted 7zip file (1.2TB with 22 
> million files), the read fails at the header stage with the trace below. Is 
> this within the spec? I've written some code to handle it, because I did 
> actually need to extract the file in java. If that's of any use I can provide 
> it (it's a naive wrapper that just pages in a buffer at a time).
>  
> {code:java}
> Exception in thread "main" java.io.IOException: Cannot handle 
> unpackSize241696
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.assertFitsIntoInt(SevenZFile.java:1523)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:622)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.initializeArchive(SevenZFile.java:532)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:468)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:337)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:129)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:116)
> {code}
> 7zip itself can also open it (and display/extract etc.), here are the stats:
>  
>  
> {code:java}
> Size: 2 489 903 580 875
> Packed Size: 1 349 110 308 832
> Folders: 40 005
> Files: 22 073 957
> CRC: E26F6A96
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-11 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104944#comment-17104944
 ] 

A Kelday edited comment on COMPRESS-514 at 5/11/20, 10:47 PM:
--

[~ggregory] for what it's worth, that's the PR in.

Argh, helps if the tests are in the correct location!


was (Author: akelday):
[~ggregory] for what it's worth, that's the PR in.

> SevenZFile fails with encoded header over 2GiB
> --
>
> Key: COMPRESS-514
> URL: https://issues.apache.org/jira/browse/COMPRESS-514
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.20
>Reporter: A Kelday
>Priority: Minor
> Attachments: HeaderChannelBuffer.java
>
>
> When reading what some may call a large encrypted 7zip file (1.2TB with 22 
> million files), the read fails at the header stage with the trace below. Is 
> this within the spec? I've written some code to handle it, because I did 
> actually need to extract the file in java. If that's of any use I can provide 
> it (it's a naive wrapper that just pages in a buffer at a time).
>  
> {code:java}
> Exception in thread "main" java.io.IOException: Cannot handle 
> unpackSize241696
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.assertFitsIntoInt(SevenZFile.java:1523)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:622)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.initializeArchive(SevenZFile.java:532)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:468)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:337)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:129)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:116)
> {code}
> 7zip itself can also open it (and display/extract etc.), here are the stats:
>  
>  
> {code:java}
> Size: 2 489 903 580 875
> Packed Size: 1 349 110 308 832
> Folders: 40 005
> Files: 22 073 957
> CRC: E26F6A96
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-11 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104944#comment-17104944
 ] 

A Kelday commented on COMPRESS-514:
---

[~ggregory] for what it's worth, that's the PR in.

> SevenZFile fails with encoded header over 2GiB
> --
>
> Key: COMPRESS-514
> URL: https://issues.apache.org/jira/browse/COMPRESS-514
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.20
>Reporter: A Kelday
>Priority: Minor
> Attachments: HeaderChannelBuffer.java
>
>
> When reading what some may call a large encrypted 7zip file (1.2TB with 22 
> million files), the read fails at the header stage with the trace below. Is 
> this within the spec? I've written some code to handle it, because I did 
> actually need to extract the file in java. If that's of any use I can provide 
> it (it's a naive wrapper that just pages in a buffer at a time).
>  
> {code:java}
> Exception in thread "main" java.io.IOException: Cannot handle 
> unpackSize241696
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.assertFitsIntoInt(SevenZFile.java:1523)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:622)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.initializeArchive(SevenZFile.java:532)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:468)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:337)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:129)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:116)
> {code}
> 7zip itself can also open it (and display/extract etc.), here are the stats:
>  
>  
> {code:java}
> Size: 2 489 903 580 875
> Packed Size: 1 349 110 308 832
> Folders: 40 005
> Files: 22 073 957
> CRC: E26F6A96
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-11 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104904#comment-17104904
 ] 

A Kelday commented on COMPRESS-514:
---

Hi [~ggregory],

Attached now is the main class I patched in (in place of direct ByteBuffer 
usage). There's obviously more to it, so a diff or PR would make more sense, but 
I expect you folks will have a nicer solution anyway!

> SevenZFile fails with encoded header over 2GiB
> --
>
> Key: COMPRESS-514
> URL: https://issues.apache.org/jira/browse/COMPRESS-514
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.20
>Reporter: A Kelday
>Priority: Minor
> Attachments: HeaderChannelBuffer.java
>
>
> When reading what some may call a large encrypted 7zip file (1.2TB with 22 
> million files), the read fails at the header stage with the trace below. Is 
> this within the spec? I've written some code to handle it, because I did 
> actually need to extract the file in java. If that's of any use I can provide 
> it (it's a naive wrapper that just pages in a buffer at a time).
>  
> {code:java}
> Exception in thread "main" java.io.IOException: Cannot handle 
> unpackSize241696
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.assertFitsIntoInt(SevenZFile.java:1523)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:622)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.initializeArchive(SevenZFile.java:532)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:468)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:337)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:129)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:116)
> {code}
> 7zip itself can also open it (and display/extract etc.), here are the stats:
>  
>  
> {code:java}
> Size: 2 489 903 580 875
> Packed Size: 1 349 110 308 832
> Folders: 40 005
> Files: 22 073 957
> CRC: E26F6A96
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-11 Thread A Kelday (Jira)


 [ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

A Kelday updated COMPRESS-514:
--
Attachment: HeaderChannelBuffer.java

> SevenZFile fails with encoded header over 2GiB
> --
>
> Key: COMPRESS-514
> URL: https://issues.apache.org/jira/browse/COMPRESS-514
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.20
>Reporter: A Kelday
>Priority: Minor
> Attachments: HeaderChannelBuffer.java
>
>
> When reading what some may call a large encrypted 7zip file (1.2TB with 22 
> million files), the read fails at the header stage with the trace below. Is 
> this within the spec? I've written some code to handle it, because I did 
> actually need to extract the file in java. If that's of any use I can provide 
> it (it's a naive wrapper that just pages in a buffer at a time).
>  
> {code:java}
> Exception in thread "main" java.io.IOException: Cannot handle 
> unpackSize241696
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.assertFitsIntoInt(SevenZFile.java:1523)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:622)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.initializeArchive(SevenZFile.java:532)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:468)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:337)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:129)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:116)
> {code}
> 7zip itself can also open it (and display/extract etc.), here are the stats:
>  
>  
> {code:java}
> Size: 2 489 903 580 875
> Packed Size: 1 349 110 308 832
> Folders: 40 005
> Files: 22 073 957
> CRC: E26F6A96
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-11 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104320#comment-17104320
 ] 

A Kelday edited comment on COMPRESS-514 at 5/11/20, 11:09 AM:
--

Hi [~peterlee],

There's no problem for me - as I said above, I patched Commons Compress to make 
it work. My question is whether you would expect this to be possible or not 
(e.g. maybe it's been decided not to support it).

If you would prefer to support it, then I have code to do so, which I'm happy to 
share :)

There are, however, some questions regarding CRC checks if we follow that path...


was (Author: akelday):
Hi Peter,

There's no problem for me - as I said above, I patched Commons Compress to make 
it work. My question is whether you would expect this to be possible or not 
(e.g. maybe it's been decided not to support it).

If you would prefer to support it, then I have code to do so, which I'm happy to 
share :)

There are, however, some questions regarding CRC checks if we follow that path...

> SevenZFile fails with encoded header over 2GiB
> --
>
> Key: COMPRESS-514
> URL: https://issues.apache.org/jira/browse/COMPRESS-514
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.20
>Reporter: A Kelday
>Priority: Minor
>
> When reading what some may call a large encrypted 7zip file (1.2TB with 22 
> million files), the read fails at the header stage with the trace below. Is 
> this within the spec? I've written some code to handle it, because I did 
> actually need to extract the file in java. If that's of any use I can provide 
> it (it's a naive wrapper that just pages in a buffer at a time).
>  
> {code:java}
> Exception in thread "main" java.io.IOException: Cannot handle 
> unpackSize241696
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.assertFitsIntoInt(SevenZFile.java:1523)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:622)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.initializeArchive(SevenZFile.java:532)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:468)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:337)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:129)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:116)
> {code}
> 7zip itself can also open it (and display/extract etc.), here are the stats:
>  
>  
> {code:java}
> Size: 2 489 903 580 875
> Packed Size: 1 349 110 308 832
> Folders: 40 005
> Files: 22 073 957
> CRC: E26F6A96
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-11 Thread A Kelday (Jira)


[ 
https://issues.apache.org/jira/browse/COMPRESS-514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104320#comment-17104320
 ] 

A Kelday commented on COMPRESS-514:
---

Hi Peter,

There's no problem for me - as I said above, I patched Commons Compress to make 
it work. My question is whether you would expect this to be possible or not 
(e.g. maybe it's been decided not to support it).

If you would prefer to support it, then I have code to do so, which I'm happy to 
share :)

There are, however, some questions regarding CRC checks if we follow that path...

> SevenZFile fails with encoded header over 2GiB
> --
>
> Key: COMPRESS-514
> URL: https://issues.apache.org/jira/browse/COMPRESS-514
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.20
>Reporter: A Kelday
>Priority: Minor
>
> When reading what some may call a large encrypted 7zip file (1.2TB with 22 
> million files), the read fails at the header stage with the trace below. Is 
> this within the spec? I've written some code to handle it, because I did 
> actually need to extract the file in java. If that's of any use I can provide 
> it (it's a naive wrapper that just pages in a buffer at a time).
>  
> {code:java}
> Exception in thread "main" java.io.IOException: Cannot handle 
> unpackSize241696
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.assertFitsIntoInt(SevenZFile.java:1523)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:622)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.initializeArchive(SevenZFile.java:532)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:468)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:337)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:129)
> at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:116)
> {code}
> 7zip itself can also open it (and display/extract etc.), here are the stats:
>  
>  
> {code:java}
> Size: 2 489 903 580 875
> Packed Size: 1 349 110 308 832
> Folders: 40 005
> Files: 22 073 957
> CRC: E26F6A96
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (COMPRESS-514) SevenZFile fails with encoded header over 2GiB

2020-05-08 Thread A Kelday (Jira)
A Kelday created COMPRESS-514:
-

 Summary: SevenZFile fails with encoded header over 2GiB
 Key: COMPRESS-514
 URL: https://issues.apache.org/jira/browse/COMPRESS-514
 Project: Commons Compress
  Issue Type: Bug
  Components: Archivers
Affects Versions: 1.20
Reporter: A Kelday


When reading what some may call a large encrypted 7zip file (1.2TB with 22 
million files), the read fails at the header stage with the trace below. Is 
this within the spec? I've written some code to handle it, because I did 
actually need to extract the file in java. If that's of any use I can provide 
it (it's a naive wrapper that just pages in a buffer at a time).

 
{code:java}
Exception in thread "main" java.io.IOException: Cannot handle 
unpackSize241696
at 
org.apache.commons.compress.archivers.sevenz.SevenZFile.assertFitsIntoInt(SevenZFile.java:1523)
at 
org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:622)
at 
org.apache.commons.compress.archivers.sevenz.SevenZFile.initializeArchive(SevenZFile.java:532)
at 
org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:468)
at 
org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:337)
at 
org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:129)
at 
org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:116)
{code}
7zip itself can also open it (and display/extract etc.), here are the stats:

 

 
{code:java}
Size: 2 489 903 580 875
Packed Size: 1 349 110 308 832
Folders: 40 005
Files: 22 073 957
CRC: E26F6A96
{code}
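
For illustration, the rough idea behind the naive wrapper mentioned above (a 
simplified sketch, not the exact code): keep a single fixed-size page of the 
header in memory and refill it from the channel on demand, instead of reading 
the whole header into one ByteBuffer.
{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;

// Simplified sketch of a paging header reader: only one fixed-size page of the
// (possibly >2GiB) header is ever held in memory, refilled from the channel on
// demand. Names are illustrative.
final class PagedHeaderReader {
    private final SeekableByteChannel channel;
    private final ByteBuffer page;

    PagedHeaderReader(final SeekableByteChannel channel, final int pageSize) {
        this.channel = channel;
        this.page = ByteBuffer.allocate(pageSize);
        this.page.limit(0); // start "empty" so the first read triggers a refill
    }

    // Returns the next unsigned byte of the header, refilling the page as needed.
    int getUnsignedByte() throws IOException {
        if (!page.hasRemaining()) {
            page.clear();
            final int read = channel.read(page); // advances the channel position
            if (read <= 0) {
                throw new IOException("Unexpected end of header");
            }
            page.flip();
        }
        return page.get() & 0xFF;
    }
}
{code}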



--
This message was sent by Atlassian Jira
(v8.3.4#803005)