[jira] [Commented] (COMPRESS-185) BZip2CompressorInputStream truncates files compressed with pbzip2

2012-03-29 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241235#comment-13241235
 ] 

Stefan Bodewig commented on COMPRESS-185:
-

Don't worry.

> BZip2CompressorInputStream truncates files compressed with pbzip2
> -
>
> Key: COMPRESS-185
> URL: https://issues.apache.org/jira/browse/COMPRESS-185
> Project: Commons Compress
> Issue Type: Bug
> Components: Compressors
> Affects Versions: 1.3
> Reporter: Karsten Loesing
> Fix For: 1.4
>
>
> I'm using BZip2CompressorInputStream in Compress 1.3 to decompress a file 
> that was created with pbzip2 1.1.6 (http://compression.ca/pbzip2/).  The 
> stream ends early after 90 bytes, truncating the rest of the 
> pbzip2-compressed file.  Decompressing the file with bunzip2 or compressing 
> the original file with bzip2 both fix the issue.  I think both pbzip2 and 
> Compress are to blame here: pbzip2 apparently does something non-standard 
> when compressing files, and Compress should handle the non-standard format 
> rather than pretending to be done decompressing.  Another option is that I'm 
> doing something wrong; in that case please let me know! :)
> Here's how the problem can be reproduced:
>  1. Generate a file that's 90+ bytes large: dd if=/dev/zero of=1mbfile 
> count=1 bs=1M
>  2. Compress with pbzip2: pbzip2 1mbfile
>  3. Decompress with Bunzip2 class below
>  4. Notice how the resulting 1mbfile is 90 bytes large, not 1M.
> Now compare to using bunzip2/bzip2:
>  - Do the steps above, but instead of 2, compress with bzip2: bzip2 1mbfile
>  - Do the steps above, but instead of 3, decompress with bunzip2: bunzip2 
> 1mbfile.bz2
> import java.io.*;
> import org.apache.commons.compress.compressors.bzip2.*;
>
> public class Bunzip2 {
>   public static void main(String[] args) throws Exception {
>     File inFile = new File(args[0]);
>     // Strip the 4-character ".bz2" suffix to get the output file name.
>     File outFile = new File(args[0].substring(0, args[0].length() - 4));
>     FileInputStream fis = new FileInputStream(inFile);
>     BZip2CompressorInputStream bz2cis =
>         new BZip2CompressorInputStream(fis);
>     BufferedInputStream bis = new BufferedInputStream(bz2cis);
>     BufferedOutputStream bos = new BufferedOutputStream(
>         new FileOutputStream(outFile));
>     int len;
>     byte[] data = new byte[1024];
>     // Copy until the decompressor reports end of stream (read returns -1).
>     while ((len = bis.read(data, 0, 1024)) >= 0) {
>       bos.write(data, 0, len);
>     }
>     bos.close();
>     bis.close();
>   }
> }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (COMPRESS-183) Support for de/encoding of tar entry names other than plain 8BIT conversion.

2012-03-23 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237447#comment-13237447
 ] 

Stefan Bodewig commented on COMPRESS-183:
-

I still need to add comments and want to fix the handling of linkName for tar 
entries that represent links, but in general the code should be fixed with svn 
revision 1304709.

The tar package now uses the platform's native encoding by default (this may 
change to ISO-8859-1 before the release).  The encoding can be overridden via 
the constructor.

The output stream has an additional option that tells it to write non-ASCII 
file names to PAX extension headers.  This should work with any modern 
implementation of tar and is the only way to get portable archives, at the 
expense of an additional 512-byte block.

The input stream will read and apply PAX extension headers transparently.
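
For readers unfamiliar with PAX extension headers: each record in such a header 
has the form "<length> <keyword>=<value>\n", where the value is UTF-8 and the 
decimal length counts the whole record including its own digits.  Below is a 
minimal Java sketch of building the "path" record that carries a non-ASCII 
entry name.  The class and method names are made up for illustration; this is 
not the Commons Compress code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Commons Compress.
class PaxRecordSketch {
    /** Builds one PAX record "<length> <key>=<value>\n"; the length includes its own digits. */
    static byte[] paxRecord(String key, String value) {
        byte[] body = (" " + key + "=" + value + "\n").getBytes(StandardCharsets.UTF_8);
        // The length field counts its own digits, so grow the guess until it is consistent.
        int len = body.length + 1;
        while (String.valueOf(len).length() + body.length != len) {
            len = String.valueOf(len).length() + body.length;
        }
        byte[] digits = String.valueOf(len).getBytes(StandardCharsets.US_ASCII);
        byte[] record = new byte[len];
        System.arraycopy(digits, 0, record, 0, digits.length);
        System.arraycopy(body, 0, record, digits.length, body.length);
        return record;
    }

    public static void main(String[] args) {
        // "\u00e4" is a-umlaut, which takes two bytes in UTF-8.
        byte[] record = paxRecord("path", "\u00e4.txt");
        System.out.print(new String(record, StandardCharsets.UTF_8));
    }
}
```

The loop is needed because adding a digit to the length can itself push the 
record over a power of ten.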

> Support for de/encoding of tar entry names other than plain 8BIT conversion.
> 
>
> Key: COMPRESS-183
> URL: https://issues.apache.org/jira/browse/COMPRESS-183
> Project: Commons Compress
> Issue Type: Improvement
> Components: Archivers
> Affects Versions: 1.3
> Reporter: Joao Schim
> Labels: patch
> Fix For: 1.4
>
> Attachments: patch-tar-name-encoding.diff, 
> patch-tar-name-encoding.diff, patch-tar-name-encoding.diff
>
>
> The names of tar entries are currently encoded/decoded by means of plain 8-bit
> conversions of byte to char and vice versa. This prohibits the use of
> encodings like UTF-8 in the file names. Whether the use of UTF-8 (or any other
> non-ASCII encoding) in file names is sensible is a chapter of its own. However,
> tar archives that contain files whose names have been encoded with UTF-8 do
> float around. These files currently cannot be read correctly by
> commons-compress because the encoding is hardcoded to plain 8BIT only.
> The supplied patch allows the use of encodings other than 8BIT via a
> TarArchiveCodec structure. It does not change the standard functionality, but
> adds the possibility of using a different encoding.
> A method was added to the TarUtilsTest JUnit test to test the added
> functionality.





[jira] [Commented] (COMPRESS-183) Support for de/encoding of tar entry names other than plain 8BIT conversion.

2012-03-16 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231866#comment-13231866
 ] 

Stefan Bodewig commented on COMPRESS-183:
-

The zip package already contains code that is similar to the codec in your 
patch; I'll look into reusing that.

Modern (POSIX) tars support non-ASCII encodings via PAX extension headers, 
which current trunk already supports on the reading side - it shouldn't be too 
hard for the writing side.





[jira] [Commented] (COMPRESS-182) Support big or even negative numbers in all numeric TAR headers

2012-03-05 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222998#comment-13222998
 ] 

Stefan Bodewig commented on COMPRESS-182:
-

Write support is in with svn revision 1297339

We need a new name for setBigFileMode - setBigNumberMode?

Other than that, I need to update the docs and cover the devMajor/devMinor 
headers for PAX as well before this issue can be closed.

> Support big or even negative numbers in all numeric TAR headers
> ---
>
> Key: COMPRESS-182
> URL: https://issues.apache.org/jira/browse/COMPRESS-182
> Project: Commons Compress
> Issue Type: Improvement
> Components: Archivers
> Affects Versions: 1.3
> Reporter: Stefan Bodewig
> Fix For: 1.4
>
>
> This is a superset of the functionality that addressed COMPRESS-175
> Jörg Schilling's star and GNU tar may use binary encoding for all numeric 
> fields; PAX/POSIX also provides them inside the extension headers.
> The timestamp field may even contain negative numbers.
> IMHO Commons Compress should:
> * be able to parse numeric fields using binary encoding (positive and 
> negative)
> * fix the current binary parser (see discussion in COMPRESS-16) and add a 
> workaround for broken writers (see COMPRESS-181)
> * be able to parse all standard fields of PAX headers, including the numeric 
> ones (I haven't checked, maybe it already does)
> * have an option to write numbers too big/small in binary encoding much like 
> BIGFILE_STAR does for the file size in trunk
> * have an option to write numbers too big/small in PAX headers much like 
> BIGFILE_POSIX does for the file size in trunk
> * replace bigFileMode and the constants with a more generic property that 
> controls all numeric fields.  We can remove the bigFileMode stuff as it has 
> been added after the 1.3 release.





[jira] [Commented] (COMPRESS-182) Support big or even negative numbers in all numeric TAR headers

2012-03-03 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221821#comment-13221821
 ] 

Stefan Bodewig commented on COMPRESS-182:
-

Read support should be complete with svn revision 1296764





[jira] [Commented] (COMPRESS-181) Tar files created by AIX native tar, and which contain symlinks, cannot be read by TarArchiveInputStream

2012-03-01 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219982#comment-13219982
 ] 

Stefan Bodewig commented on COMPRESS-181:
-

Robert, could you delete and re-add the attachment, granting the ASF a license 
to include it this time?  That way we could add the tar to our testsuite.

> Tar files created by AIX native tar, and which contain symlinks, cannot be 
> read by TarArchiveInputStream
> 
>
> Key: COMPRESS-181
> URL: https://issues.apache.org/jira/browse/COMPRESS-181
> Project: Commons Compress
> Issue Type: Bug
> Components: Archivers
> Affects Versions: 1.2, 1.3, 1.4
> Environment: AIX 5.3
> Reporter: Robert Clark
> Attachments: simple-aix-native-tar.tar
>
>
> A simple tar file created on AIX using the native tar utility 
> ({{/usr/bin/tar}}) *and* which contains a symbolic link cannot be loaded by 
> TarArchiveInputStream:
> {noformat}
> java.io.IOException: Error detected parsing the header
>   at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:201)
>   at Extractor.extract(Extractor.java:13)
>   at Extractor.main(Extractor.java:28)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.tools.ant.taskdefs.ExecuteJava.run(ExecuteJava.java:217)
>   at org.apache.tools.ant.taskdefs.ExecuteJava.execute(ExecuteJava.java:152)
>   at org.apache.tools.ant.taskdefs.Java.run(Java.java:771)
>   at org.apache.tools.ant.taskdefs.Java.executeJava(Java.java:221)
>   at org.apache.tools.ant.taskdefs.Java.executeJava(Java.java:135)
>   at org.apache.tools.ant.taskdefs.Java.execute(Java.java:108)
>   at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
>   at org.apache.tools.ant.Task.perform(Task.java:348)
>   at org.apache.tools.ant.Target.execute(Target.java:390)
>   at org.apache.tools.ant.Target.performTasks(Target.java:411)
>   at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399)
>   at org.apache.tools.ant.Project.executeTarget(Project.java:1368)
>   at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
>   at org.apache.tools.ant.Project.executeTargets(Project.java:1251)
>   at org.apache.tools.ant.Main.runBuild(Main.java:809)
>   at org.apache.tools.ant.Main.startAnt(Main.java:217)
>   at org.apache.tools.ant.launch.Launcher.run(Launcher.java:280)
>   at org.apache.tools.ant.launch.Launcher.main(Launcher.java:109)
> Caused by: java.lang.IllegalArgumentException: Invalid byte 0 at offset 0 in '{NUL}1722000726 ' len=12
>   at org.apache.commons.compress.archivers.tar.TarUtils.parseOctal(TarUtils.java:99)
>   at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:819)
>   at org.apache.commons.compress.archivers.tar.TarArchiveEntry.<init>(TarArchiveEntry.java:314)
>   at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:199)
>   ... 29 more
> {noformat}
> Tested with 1.2 and the 1.4 nightly build from Feb 23 
> ({{Implementation-Build: trunk@r1292625; 2012-02-23 03:20:30+}})





[jira] [Commented] (COMPRESS-181) Tar files created by AIX native tar, and which contain symlinks, cannot be read by TarArchiveInputStream

2012-02-28 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218245#comment-13218245
 ] 

Stefan Bodewig commented on COMPRESS-181:
-

We don't really have an option to ignore a timestamp unless we allow 
ArchiveEntry#getLastModifiedDate to return null.

What I was trying to say is that it doesn't matter much which timestamp we 
return, as any choice is wrong.  Returning the equivalent of a 0 timestamp is 
fine with me.  Unfortunately we don't have an infrastructure for warnings (it 
would have been good for COMPRESS-176 as well); that is something for an API 
redesign in 2.0, I guess.





[jira] [Commented] (COMPRESS-181) Tar files created by AIX native tar, and which contain symlinks, cannot be read by TarArchiveInputStream

2012-02-28 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218198#comment-13218198
 ] 

Stefan Bodewig commented on COMPRESS-181:
-

It doesn't look like an overflow was the reason but if you look at the 
timestamp it certainly reads as if the first byte was a binary 0 by accident 
(if you put an ASCII 1 in there it is identical to the timestamp of the 
directory).

In any case the resulting timestamp is not what it used to be, so using any 
other timestamp would be as valid as trying to parse the rest.





[jira] [Commented] (COMPRESS-181) Tar files created by AIX native tar, and which contain symlinks, cannot be read by TarArchiveInputStream

2012-02-28 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218093#comment-13218093
 ] 

Stefan Bodewig commented on COMPRESS-181:
-

GNU tar from_header in list.c contains a workaround for this case:

  /* Accommodate buggy tar of unknown vintage, which outputs leading
     NUL if the previous field overflows.  */
  where += !*where;

This basically skips the first byte if it is a binary 0.
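
A Java counterpart of that workaround could look like the sketch below.  The 
class and method names are hypothetical, and this is not the actual 
TarUtils.parseOctal code, only an illustration of skipping one leading NUL 
before parsing the octal digits.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not the Commons Compress implementation.
class LenientOctalSketch {
    static long parseOctalLenient(byte[] buf, int offset, int length) {
        int start = offset;
        int end = offset + length;
        // Counterpart of `where += !*where;`: skip exactly one leading NUL byte.
        if (start < end && buf[start] == 0) {
            start++;
        }
        // Skip leading spaces, then trim trailing NUL/space terminators.
        while (start < end && buf[start] == ' ') {
            start++;
        }
        while (end > start && (buf[end - 1] == 0 || buf[end - 1] == ' ')) {
            end--;
        }
        long result = 0;
        for (int i = start; i < end; i++) {
            byte b = buf[i];
            if (b < '0' || b > '7') {
                throw new IllegalArgumentException(
                    "Invalid byte " + b + " at offset " + (i - offset));
            }
            result = (result << 3) + (b - '0');
        }
        return result;
    }

    public static void main(String[] args) {
        // The 12-byte field from the stack trace: NUL, ten octal digits, space.
        byte[] field = ("\0" + "1722000726 ").getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseOctalLenient(field, 0, field.length));
    }
}
```

With the field from the reported stack trace this parses the remaining digits 
as octal 1722000726 instead of throwing.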





[jira] [Commented] (COMPRESS-16) unable to extract a TAR file that contains an entry which is 10 GB in size

2012-02-28 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218088#comment-13218088
 ] 

Stefan Bodewig commented on COMPRESS-16:


Our code is wrong.

to_chars in src/create.c in GNU tar only uses the remaining bytes and sets the 
first one to 255 or 128 for negative/positive numbers.  Negative numbers only 
occur in time fields where we don't support anything non-octal ATM anyway, so 
this isn't a real problem right now.  It becomes one if we support star/GNU 
tar/POSIX dialects for the other numeric fields as well.  This would be 
required for COMPRESS-177.
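
To make the layout concrete, here is a minimal Java sketch of reading such a 
field.  It assumes that the bytes after the marker hold a big-endian two's 
complement number, which is how I read to_chars; the class and method names 
are made up, and it does not check for values that do not fit into a long.

```java
// Hypothetical helper, not the Commons Compress implementation.
class Base256Sketch {
    static long parseBinary(byte[] buf, int offset, int length) {
        int marker = buf[offset] & 0xFF;
        if (marker != 0x80 && marker != 0xFF) {
            throw new IllegalArgumentException("not a binary-encoded field");
        }
        // 255 marks a negative number: start with all one bits so the
        // result is sign-extended; 128 marks a positive number.
        long value = (marker == 0xFF) ? -1L : 0L;
        // Only the remaining bytes carry the number (assumed big-endian).
        for (int i = offset + 1; i < offset + length; i++) {
            value = (value << 8) | (buf[i] & 0xFF);
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] positive = {(byte) 0x80, 0, 0, 0, 0, 0, 1, 0};
        byte[] negative = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF,
                           (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFE};
        System.out.println(parseBinary(positive, 0, positive.length));
        System.out.println(parseBinary(negative, 0, negative.length));
    }
}
```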

I suggest broadening and reopening COMPRESS-177 to something like "extend 
STAR/POSIX support to all numeric fields", or alternatively creating a new 
issue and closing this one again.

> unable to extract a TAR file that contains an entry which is 10 GB in size
> --
>
> Key: COMPRESS-16
> URL: https://issues.apache.org/jira/browse/COMPRESS-16
> Project: Commons Compress
> Issue Type: Bug
> Components: Archivers
> Environment: I am using win xp sp3, but this should be platform independent.
> Reporter: Sam Smith
> Fix For: 1.4
>
> Attachments: 
> 0001-Accept-GNU-tar-files-with-entries-over-8GB-in-size.patch, 
> 0002-Allow-creating-tar-archives-with-files-over-8GB.patch, 
> 0004-Prefer-octal-over-binary-size-representation.patch, ant-8GB-tar.patch, 
> patch-for-compress.txt
>
>
> I made a TAR file which contains a file entry where the file is 10 GB in size.
> When I attempt to extract the file using TarInputStream, it fails with the 
> following stack trace:
>   java.io.IOException: unexpected EOF with 24064 bytes unread
>   at org.apache.commons.compress.archivers.tar.TarInputStream.read(TarInputStream.java:348)
>   at org.apache.commons.compress.archivers.tar.TarInputStream.copyEntryContents(TarInputStream.java:388)
> So, TarInputStream does not seem to support large (> 8 GB?) files.
> Here is something else to note: I created that TAR file using TarOutputStream 
> , which did not complain when asked to write a 10 GB file into the TAR file, 
> so I assume that TarOutputStream has no file size limits?  That, or does it 
> silently create corrupted TAR files (which would be the worst situation of 
> all...)?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (COMPRESS-181) Tar files created by AIX native tar, and which contain symlinks, cannot be read by TarArchiveInputStream

2012-02-28 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218073#comment-13218073
 ] 

Stefan Bodewig commented on COMPRESS-181:
-

GNU tar extracts it with a date/time of 1978-02-15 08:55 - which more or less 
looks as if it had translated the leading null to an ASCII 0 (and it looks as 
if that was supposed to be an ASCII 1 to match the timestamp of the dir).
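
The arithmetic behind this observation can be checked with a small sketch.  It assumes the 12 byte field from the stack trace quoted below ('{NUL}1722000726 ') and treats the leading NUL as a zero digit; the class and method names are hypothetical, and this is not how TarUtils.parseOctal behaves.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;

final class AixTimestampSketch {
    /** Parses an octal header field, counting a NUL in the first position as the digit 0. */
    static long parseOctalLeadingNulAsZero(byte[] field) {
        long value = 0;
        for (int i = 0; i < field.length; i++) {
            byte b = field[i];
            if (i == 0 && b == 0) {
                continue;                 // leading NUL contributes a zero digit
            }
            if (b == ' ' || b == 0) {
                break;                    // trailing space or NUL ends the number
            }
            if (b < '0' || b > '7') {
                throw new IllegalArgumentException("Invalid byte " + b + " at offset " + i);
            }
            value = (value << 3) + (b - '0');
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] field = "01722000726 ".getBytes(StandardCharsets.US_ASCII);
        field[0] = 0; // the archive has NUL here instead of a digit
        long seconds = parseOctalLeadingNulAsZero(field);
        System.out.println(Instant.ofEpochSecond(seconds)); // 1978-02-15T07:55:02Z

        field[0] = '1'; // the digit that was presumably intended
        System.out.println(Instant.ofEpochSecond(parseOctalLeadingNulAsZero(field))); // 2012-02-24T21:32:06Z
    }
}
```

07:55 UTC is 08:55 in central European time, which matches the date GNU tar printed.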



> Tar files created by AIX native tar, and which contain symlinks, cannot be 
> read by TarArchiveInputStream
> 
>
> Key: COMPRESS-181
> URL: https://issues.apache.org/jira/browse/COMPRESS-181
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
> Affects Versions: 1.2, 1.3, 1.4
> Environment: AIX 5.3
> Reporter: Robert Clark
> Attachments: simple-aix-native-tar.tar
>
>
> A simple tar file created on AIX using the native ({{/usr/bin/tar}} tar 
> utility) *and* which contains a symbolic link, cannot be loaded by 
> TarArchiveInputStream:
> {noformat}
> java.io.IOException: Error detected parsing the header
>   at 
> org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:201)
>   at Extractor.extract(Extractor.java:13)
>   at Extractor.main(Extractor.java:28)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.tools.ant.taskdefs.ExecuteJava.run(ExecuteJava.java:217)
>   at 
> org.apache.tools.ant.taskdefs.ExecuteJava.execute(ExecuteJava.java:152)
>   at org.apache.tools.ant.taskdefs.Java.run(Java.java:771)
>   at org.apache.tools.ant.taskdefs.Java.executeJava(Java.java:221)
>   at org.apache.tools.ant.taskdefs.Java.executeJava(Java.java:135)
>   at org.apache.tools.ant.taskdefs.Java.execute(Java.java:108)
>   at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
>   at org.apache.tools.ant.Task.perform(Task.java:348)
>   at org.apache.tools.ant.Target.execute(Target.java:390)
>   at org.apache.tools.ant.Target.performTasks(Target.java:411)
>   at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399)
>   at org.apache.tools.ant.Project.executeTarget(Project.java:1368)
>   at 
> org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
>   at org.apache.tools.ant.Project.executeTargets(Project.java:1251)
>   at org.apache.tools.ant.Main.runBuild(Main.java:809)
>   at org.apache.tools.ant.Main.startAnt(Main.java:217)
>   at org.apache.tools.ant.launch.Launcher.run(Launcher.java:280)
>   at org.apache.tools.ant.launch.Launcher.main(Launcher.java:109)
> Caused by: java.lang.IllegalArgumentException: Invalid byte 0 at offset 0 in 
> '{NUL}1722000726 ' len=12
>   at 
> org.apache.commons.compress.archivers.tar.TarUtils.parseOctal(TarUtils.java:99)
>   at 
> org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:819)
>   at 
> org.apache.commons.compress.archivers.tar.TarArchiveEntry.<init>(TarArchiveEntry.java:314)
>   at 
> org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:199)
>   ... 29 more
> {noformat}
> Tested with 1.2 and the 1.4 nightly build from Feb 23 
> ({{Implementation-Build: trunk@r1292625; 2012-02-23 03:20:30+}})





[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

2012-02-27 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217899#comment-13217899
 ] 

Stefan Bodewig commented on COMPRESS-176:
-

Workaround and tests are in svn revision 1294460

I'll look into creating a test archive for the opposite direction today.

> ArchiveInputStream#getNextEntry(): Problems with WinZip directories with 
> Umlauts
> 
>
> Key: COMPRESS-176
> URL: https://issues.apache.org/jira/browse/COMPRESS-176
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
> Affects Versions: 1.3
> Environment: Windows 7
> Reporter: Wurstbrot mit Senf
> Assignee: Stefan Bodewig
> Fix For: 1.4
>
> Attachments: test-7zip.zip, test-windows.zip, test-winzip.zip, 
> testzap-winzip.zip
>
>
> There is a problem when handling a WinZip-created zip with Umlauts in 
> directories.
> I'm accessing a zip file created with WinZip containing a directory with an 
> umlaut ("ä") with ArchiveInputStream. When creating the zip file the 
> unicode-flag of winzip had been active.
> The following problem occurs when accessing the entries of the zip:
> the ArchiveEntry for a directory containing an umlaut is not marked as a 
> directory and the file names for the directory and all files contained in 
> that directory contain backslashes instead of slashes (i.e. completely 
> different to all other files in directories with no umlaut in their path).
> There is no difference when letting the ArchiveStreamFactory decide which 
> ArchiveInputStream to create or when using the ZipArchiveInputStream 
> constructor with the correct encoding (I've tried different encodings CP437, 
> CP850, ISO-8859-15, but still the problem persisted).
> This problem does not occur when using the very same zip file but compressed 
> by 7zip or the built-in Windows 7 zip functionality.





[jira] [Commented] (COMPRESS-16) unable to extract a TAR file that contains an entry which is 10 GB in size

2012-02-27 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217123#comment-13217123
 ] 

Stefan Bodewig commented on COMPRESS-16:


I plan to look up what the GNU tar implementation does; it may take a few days, 
though.

I agree with Gili that this issue has by now outgrown COMPRESS-16, as that 
issue dealt specifically with lengths only.  OTOH it is not restricted to star 
either: if we start supporting "bigger numbers" for group or date, we should 
support star as well as PAX.

> unable to extract a TAR file that contains an entry which is 10 GB in size
> --
>
> Key: COMPRESS-16
> URL: https://issues.apache.org/jira/browse/COMPRESS-16
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
> Environment: I am using win xp sp3, but this should be platform 
> independent.
> Reporter: Sam Smith
> Fix For: 1.4
>
> Attachments: 
> 0001-Accept-GNU-tar-files-with-entries-over-8GB-in-size.patch, 
> 0002-Allow-creating-tar-archives-with-files-over-8GB.patch, 
> 0004-Prefer-octal-over-binary-size-representation.patch, ant-8GB-tar.patch, 
> patch-for-compress.txt
>
>
> I made a TAR file which contains a file entry where the file is 10 GB in size.
> When I attempt to extract the file using TarInputStream, it fails with the 
> following stack trace:
>   java.io.IOException: unexpected EOF with 24064 bytes unread
>   at 
> org.apache.commons.compress.archivers.tar.TarInputStream.read(TarInputStream.java:348)
>   at 
> org.apache.commons.compress.archivers.tar.TarInputStream.copyEntryContents(TarInputStream.java:388)
> So, TarInputStream does not seem to support large (> 8 GB?) files.
> Here is something else to note: I created that TAR file using TarOutputStream 
> , which did not complain when asked to write a 10 GB file into the TAR file, 
> so I assume that TarOutputStream has no file size limits?  That, or does it 
> silently create corrupted TAR files (which would be the worst situation of 
> all...)?





[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

2012-02-27 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217122#comment-13217122
 ] 

Stefan Bodewig commented on COMPRESS-176:
-

Whether we need forward slashes in Unicode extra fields can only be answered by 
somebody using WinZIP.  The best would be creating a test archive with a 
directory that contains a character in its name that is not part of CP437 - and 
to be safe not part of the platform's default encoding either.

> ArchiveInputStream#getNextEntry(): Problems with WinZip directories with 
> Umlauts
> 
>
> Key: COMPRESS-176
> URL: https://issues.apache.org/jira/browse/COMPRESS-176
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
> Affects Versions: 1.3
> Environment: Windows 7
> Reporter: Wurstbrot mit Senf
> Assignee: Stefan Bodewig
> Fix For: 1.4
>
> Attachments: test-7zip.zip, test-windows.zip, test-winzip.zip, 
> testzap-winzip.zip
>
>
> There is a problem when handling a WinZip-created zip with Umlauts in 
> directories.
> I'm accessing a zip file created with WinZip containing a directory with an 
> umlaut ("ä") with ArchiveInputStream. When creating the zip file the 
> unicode-flag of winzip had been active.
> The following problem occurs when accessing the entries of the zip:
> the ArchiveEntry for a directory containing an umlaut is not marked as a 
> directory and the file names for the directory and all files contained in 
> that directory contain backslashes instead of slashes (i.e. completely 
> different to all other files in directories with no umlaut in their path).
> There is no difference when letting the ArchiveStreamFactory decide which 
> ArchiveInputStream to create or when using the ZipArchiveInputStream 
> constructor with the correct encoding (I've tried different encodings CP437, 
> CP850, ISO-8859-15, but still the problem persisted).
> This problem does not occur when using the very same zip file but compressed 
> by 7zip or the built-in Windows 7 zip functionality.





[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

2012-02-25 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216652#comment-13216652
 ] 

Stefan Bodewig commented on COMPRESS-176:
-

In extract.c of unzip60 line 1310ff there is this code that replaces 
backslashes with slashes.  It only replaces them in names that don't contain 
forward slashes (MBSCHR looks up a character in a character array) and only if 
"hostnum" indicates a FAT system.

{noformat}
/* for files from DOS FAT, check for use of backslash instead
 *  of slash as directory separator (bug in some zipper(s); so
 *  far, not a problem in HPFS, NTFS or VFAT systems)
 */
#ifndef SFX
if (G.pInfo->hostnum == FS_FAT_ && !MBSCHR(G.filename, '/')) {
    char *p=G.filename;

    if (*p) do {
        if (*p == '\\') {
            if (!G.reported_backslash) {
                Info(slide, 0x21, ((char *)slide,
                  LoadFarString(BackslashPathSep), G.zipfn));
                G.reported_backslash = TRUE;
                if (!error_in_archive)
                    error_in_archive = PK_WARN;
            }
            *p = '/';
        }
    } while (*PREINCSTR(p));
}
#endif /* !SFX */
{noformat}

"hostnum" is the upper byte of "version made by" inside the central directory 
header - this is ZipArchiveEntry's get/setPlatform - and FS_FAT_ is 0 
(ZipArchiveEntry#PLATFORM_FAT).  So we have all the pieces needed to emulate this.
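
A rough Java sketch of what emulating the unzip heuristic could look like.  It is standalone on purpose: the constant only mirrors the value 0 mentioned above, the platform would come from ZipArchiveEntry's getPlatform in real code, and the class and method names are hypothetical.

```java
final class BackslashHeuristicSketch {
    /** Mirrors FS_FAT_ in unzip; the same value as ZipArchiveEntry#PLATFORM_FAT per the comment above. */
    static final int PLATFORM_FAT = 0;

    /**
     * Replaces backslashes with slashes, but only for entries created on a FAT
     * platform whose name contains no forward slash at all.
     */
    static String normalizeName(int platform, String name) {
        if (platform == PLATFORM_FAT && name.indexOf('/') < 0) {
            return name.replace('\\', '/');
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(normalizeName(PLATFORM_FAT, "dir\\file.txt")); // dir/file.txt
        System.out.println(normalizeName(3, "dir\\file.txt"));            // unchanged, not FAT
    }
}
```

Names that already contain a forward slash are left alone, which keeps a legitimate backslash inside a file name intact.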

> ArchiveInputStream#getNextEntry(): Problems with WinZip directories with 
> Umlauts
> 
>
> Key: COMPRESS-176
> URL: https://issues.apache.org/jira/browse/COMPRESS-176
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
> Affects Versions: 1.3
> Environment: Windows 7
> Reporter: Wurstbrot mit Senf
> Attachments: test-7zip.zip, test-windows.zip, test-winzip.zip, 
> testzap-winzip.zip
>
>
> There is a problem when handling a WinZip-created zip with Umlauts in 
> directories.
> I'm accessing a zip file created with WinZip containing a directory with an 
> umlaut ("ä") with ArchiveInputStream. When creating the zip file the 
> unicode-flag of winzip had been active.
> The following problem occurs when accessing the entries of the zip:
> the ArchiveEntry for a directory containing an umlaut is not marked as a 
> directory and the file names for the directory and all files contained in 
> that directory contain backslashes instead of slashes (i.e. completely 
> different to all other files in directories with no umlaut in their path).
> There is no difference when letting the ArchiveStreamFactory decide which 
> ArchiveInputStream to create or when using the ZipArchiveInputStream 
> constructor with the correct encoding (I've tried different encodings CP437, 
> CP850, ISO-8859-15, but still the problem persisted).
> This problem does not occur when using the very same zip file but compressed 
> by 7zip or the built-in Windows 7 zip functionality.





[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

2012-02-25 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216647#comment-13216647
 ] 

Stefan Bodewig commented on COMPRESS-176:
-

OK, this means nobody except for Commons Compress and InfoZIP tools seems to 
read the Unicode extra field.

This is what I get when trying to extract the original ZIP on Linux:

{noformat}
stefan@birdy:~/Desktop$ unzip test-winzip.zip 
Archive:  test-winzip.zip
  inflating: doc.txt.gz  
 extracting: doc2.txt
warning:  test-winzip.zip appears to use backslashes as path separators
   creating: ??/
  inflating: ??/??zip.zip
 extracting: ??/??.txt  
{noformat}

and it creates an "ä" directory.  I'll try to look through InfoZIP's sources 
to see what it bases its heuristics on; maybe we can use the same in Commons 
Compress to turn backslashes into slashes.


> ArchiveInputStream#getNextEntry(): Problems with WinZip directories with 
> Umlauts
> 
>
> Key: COMPRESS-176
> URL: https://issues.apache.org/jira/browse/COMPRESS-176
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
> Affects Versions: 1.3
> Environment: Windows 7
> Reporter: Wurstbrot mit Senf
> Attachments: test-7zip.zip, test-windows.zip, test-winzip.zip, 
> testzap-winzip.zip
>
>
> There is a problem when handling a WinZip-created zip with Umlauts in 
> directories.
> I'm accessing a zip file created with WinZip containing a directory with an 
> umlaut ("ä") with ArchiveInputStream. When creating the zip file the 
> unicode-flag of winzip had been active.
> The following problem occurs when accessing the entries of the zip:
> the ArchiveEntry for a directory containing an umlaut is not marked as a 
> directory and the file names for the directory and all files contained in 
> that directory contain backslashes instead of slashes (i.e. completely 
> different to all other files in directories with no umlaut in their path).
> There is no difference when letting the ArchiveStreamFactory decide which 
> ArchiveInputStream to create or when using the ZipArchiveInputStream 
> constructor with the correct encoding (I've tried different encodings CP437, 
> CP850, ISO-8859-15, but still the problem persisted).
> This problem does not occur when using the very same zip file but compressed 
> by 7zip or the built-in Windows 7 zip functionality.





[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

2012-02-25 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216446#comment-13216446
 ] 

Stefan Bodewig commented on COMPRESS-176:
-

AFAIK what we have written down based on findings by Wolfgang Glas in 
http://commons.apache.org/compress/zip.html still stands: WinZIP is the only 
one using Unicode extra fields, all other implementations have switched to the 
language encoding flag.  The only exceptions are Windows compressed folders - 
which don't understand either - and InfoZIP based tools if they are compiled 
to use the extra fields.

A question to the original reporter (I'm German so I know the name's a fake 
8-): since you also have an installation of 7zip, what does 7zip think of your 
WinZIP created archive?

> ArchiveInputStream#getNextEntry(): Problems with WinZip directories with 
> Umlauts
> 
>
> Key: COMPRESS-176
> URL: https://issues.apache.org/jira/browse/COMPRESS-176
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
> Affects Versions: 1.3
> Environment: Windows 7
> Reporter: Wurstbrot mit Senf
> Attachments: test-7zip.zip, test-windows.zip, test-winzip.zip
>
>
> There is a problem when handling a WinZip-created zip with Umlauts in 
> directories.
> I'm accessing a zip file created with WinZip containing a directory with an 
> umlaut ("ä") with ArchiveInputStream. When creating the zip file the 
> unicode-flag of winzip had been active.
> The following problem occurs when accessing the entries of the zip:
> the ArchiveEntry for a directory containing an umlaut is not marked as a 
> directory and the file names for the directory and all files contained in 
> that directory contain backslashes instead of slashes (i.e. completely 
> different to all other files in directories with no umlaut in their path).
> There is no difference when letting the ArchiveStreamFactory decide which 
> ArchiveInputStream to create or when using the ZipArchiveInputStream 
> constructor with the correct encoding (I've tried different encodings CP437, 
> CP850, ISO-8859-15, but still the problem persisted).
> This problem does not occur when using the very same zip file but compressed 
> by 7zip or the built-in Windows 7 zip functionality.





[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

2012-02-24 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215704#comment-13215704
 ] 

Stefan Bodewig commented on COMPRESS-176:
-

This is what InfoZIP's zip on Linux says:

{noformat}
stefanb@brick:~$ zip -Tv Desktop/test-winzip.zip 
Archive:  Desktop/test-winzip.zip
    testing: doc.txt.gz     OK
    testing: doc2.txt       OK
    testing: ??\            OK
    testing: ??\??zip.zip   OK
    testing: ??\??.txt      OK
No errors detected in compressed data of Desktop/test-winzip.zip.
test of Desktop/test-winzip.zip OK
{noformat}

The entry for the directory contains a Unicode extra field with 0xc3 0xa4 0x5c 
as UTF-8 encoded name.  This actually is "ä\".

Since directory names in ZIP archives must end with "/", Compress doesn't detect 
this as a directory.  It may be possible to create a workaround like "if the 
'plain' name ends with a / and the unicode name uses a \ then bend it", but I 
can't say I'd like that.
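
To make the two paragraphs above concrete, here is a small sketch that decodes the three bytes from the extra field and applies the workaround as worded.  The class and method names are hypothetical; this is an illustration, not the change that went into Commons Compress.

```java
import java.nio.charset.StandardCharsets;

final class UnicodeDirNameSketch {
    /** Decodes the name stored in a Unicode extra field. */
    static String decodeUnicodeName(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    /** "Bends" backslashes in the unicode name if the plain name marks a directory. */
    static String bend(String plainName, String unicodeName) {
        if (plainName.endsWith("/") && unicodeName.indexOf('\\') >= 0) {
            return unicodeName.replace('\\', '/');
        }
        return unicodeName;
    }

    public static void main(String[] args) {
        String unicodeName = decodeUnicodeName(new byte[] {(byte) 0xc3, (byte) 0xa4, 0x5c});
        System.out.println(unicodeName.endsWith("/"));             // false, so no directory is detected
        System.out.println(bend("?/", unicodeName).endsWith("/")); // true after bending
    }
}
```

The plain name "?/" in the usage above is a made-up stand-in for whatever the archive stores in the non-Unicode name field.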

Java6 likely works because it doesn't have any idea about unicode extra fields 
and simply uses the "plain" name.  You'd get the same behavior from 
ZipArchiveInputStream by setting useUnicodeExtraFields to false in the 
constructor.

> ArchiveInputStream#getNextEntry(): Problems with WinZip directories with 
> Umlauts
> 
>
> Key: COMPRESS-176
> URL: https://issues.apache.org/jira/browse/COMPRESS-176
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
> Affects Versions: 1.3
> Environment: Windows 7
> Reporter: Wurstbrot mit Senf
> Attachments: test-7zip.zip, test-windows.zip, test-winzip.zip
>
>
> There is a problem when handling a WinZip-created zip with Umlauts in 
> directories.
> I'm accessing a zip file created with WinZip containing a directory with an 
> umlaut ("ä") with ArchiveInputStream. When creating the zip file the 
> unicode-flag of winzip had been active.
> The following problem occurs when accessing the entries of the zip:
> the ArchiveEntry for a directory containing an umlaut is not marked as a 
> directory and the file names for the directory and all files contained in 
> that directory contain backslashes instead of slashes (i.e. completely 
> different to all other files in directories with no umlaut in their path).
> There is no difference when letting the ArchiveStreamFactory decide which 
> ArchiveInputStream to create or when using the ZipArchiveInputStream 
> constructor with the correct encoding (I've tried different encodings CP437, 
> CP850, ISO-8859-15, but still the problem persisted).
> This problem does not occur when using the very same zip file but compressed 
> by 7zip or the built-in Windows 7 zip functionality.





[jira] [Commented] (COMPRESS-16) unable to extract a TAR file that contains an entry which is 10 GB in size

2012-02-21 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212741#comment-13212741
 ] 

Stefan Bodewig commented on COMPRESS-16:


Sorry, I saw COMPRESS-177 but am too busy to look into it right now.

The test in TarUtils only checks whether the most significant bit is set; it 
doesn't check the actual value of the first byte, so 0xFF should be detected as 
well.  I haven't checked whether the end result is a negative number, though.
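
A sketch of the kind of test meant here, assuming it only looks at the top bit of the first byte.  The names are hypothetical and this is not the actual TarUtils source.

```java
final class BinaryMarkerSketch {
    /** True if the most significant bit of the first byte of the field is set. */
    static boolean usesBinaryRepresentation(byte[] buffer, int offset) {
        return (buffer[offset] & 0x80) != 0;
    }

    public static void main(String[] args) {
        System.out.println(usesBinaryRepresentation(new byte[] {(byte) 0x80}, 0)); // true
        System.out.println(usesBinaryRepresentation(new byte[] {(byte) 0xFF}, 0)); // true
        System.out.println(usesBinaryRepresentation(new byte[] {'0'}, 0));         // false, octal digit
    }
}
```

Both marker bytes pass the check, so it cannot tell positive from negative numbers on its own.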

> unable to extract a TAR file that contains an entry which is 10 GB in size
> --
>
> Key: COMPRESS-16
> URL: https://issues.apache.org/jira/browse/COMPRESS-16
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
> Environment: I am using win xp sp3, but this should be platform 
> independent.
> Reporter: Sam Smith
> Fix For: 1.4
>
> Attachments: 
> 0001-Accept-GNU-tar-files-with-entries-over-8GB-in-size.patch, 
> 0002-Allow-creating-tar-archives-with-files-over-8GB.patch, 
> 0004-Prefer-octal-over-binary-size-representation.patch, ant-8GB-tar.patch, 
> patch-for-compress.txt
>
>
> I made a TAR file which contains a file entry where the file is 10 GB in size.
> When I attempt to extract the file using TarInputStream, it fails with the 
> following stack trace:
>   java.io.IOException: unexpected EOF with 24064 bytes unread
>   at 
> org.apache.commons.compress.archivers.tar.TarInputStream.read(TarInputStream.java:348)
>   at 
> org.apache.commons.compress.archivers.tar.TarInputStream.copyEntryContents(TarInputStream.java:388)
> So, TarInputStream does not seem to support large (> 8 GB?) files.
> Here is something else to note: I created that TAR file using TarOutputStream 
> , which did not complain when asked to write a 10 GB file into the TAR file, 
> so I assume that TarOutputStream has no file size limits?  That, or does it 
> silently create corrupted TAR files (which would be the worst situation of 
> all...)?





[jira] [Commented] (COMPRESS-168) getName of ZipArchiveEntry

2011-12-22 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174900#comment-13174900
 ] 

Stefan Bodewig commented on COMPRESS-168:
-

Any news?

> getName of ZipArchiveEntry
> --
>
> Key: COMPRESS-168
> URL: https://issues.apache.org/jira/browse/COMPRESS-168
> Project: Commons Compress
>  Issue Type: Test
>  Components: Archivers
> Affects Versions: 1.2
> Environment: J2EE Environment with jdk 1.4
> Reporter: Pavithra Kumar
> Attachments: TestZip.zip
>
>
> getName method of ZipArchiveEntry is not giving arabic file names. Instead of 
> that it gives some chunked characters.





[jira] [Commented] (COMPRESS-168) getName of ZipArchiveEntry

2011-12-14 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169412#comment-13169412
 ] 

Stefan Bodewig commented on COMPRESS-168:
-

Windows compressed folders facility seems to use the platform's native encoding 
when creating ZIPs.
See 

Do you get correct file names if you use something like

new ZipFile(zipname, ENCODING)

where ENCODING is whatever Java calls your platform's native encoding.  I don't 
have any idea what that would be and wouldn't recognize correct Arabic 
characters either, so I didn't try it myself on your test archive.  One list I 
know of is 
http://docs.oracle.com/javase/1.5.0/docs/guide/intl/encoding.doc.html and 
ISO8859_6 or Cp1256 don't sound bad (but you probably know better than me).
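
Why the encoding argument matters can be shown without any archive at all.  The sketch below assumes the name bytes were written as windows-1256 (the java.nio name for Cp1256) and that the JDK ships that charset; the sample name and all identifiers are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ZipNameEncodingSketch {
    /** Decodes raw entry name bytes the way a ZIP reader would, using the given charset. */
    static String decodeName(byte[] rawName, Charset charset) {
        return new String(rawName, charset);
    }

    public static void main(String[] args) {
        String original = "\u0645\u0644\u0641.txt";        // a sample Arabic file name
        Charset cp1256 = Charset.forName("windows-1256");  // assumption: platform encoding of the creator
        byte[] raw = original.getBytes(cp1256);            // what would be stored in the archive

        System.out.println(decodeName(raw, cp1256).equals(original));                      // true
        System.out.println(decodeName(raw, StandardCharsets.ISO_8859_1).equals(original)); // false
    }
}
```

Decoding with any charset other than the one used when the archive was written yields the kind of wrong characters described in the report.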

> getName of ZipArchiveEntry
> --
>
> Key: COMPRESS-168
> URL: https://issues.apache.org/jira/browse/COMPRESS-168
> Project: Commons Compress
>  Issue Type: Test
>  Components: Archivers
> Affects Versions: 1.2
> Environment: J2EE Environment with jdk 1.4
> Reporter: Pavithra Kumar
> Attachments: TestZip.zip
>
>
> getName method of ZipArchiveEntry is not giving arabic file names. Instead of 
> that it gives some chunked characters.





[jira] [Commented] (COMPRESS-168) getName of ZipArchiveEntry

2011-12-12 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167454#comment-13167454
 ] 

Stefan Bodewig commented on COMPRESS-168:
-

Where does the ZipArchiveEntry come from (do you create it yourself or read it 
from a ZipFile or a ZipArchiveInputStream)?

If you have read it from somewhere, is there any chance you could provide a 
small sample archive that doesn't work for you?


> getName of ZipArchiveEntry
> --
>
> Key: COMPRESS-168
> URL: https://issues.apache.org/jira/browse/COMPRESS-168
> Project: Commons Compress
>  Issue Type: Test
>  Components: Archivers
> Affects Versions: 1.2
> Environment: J2EE Environment with jdk 1.4
> Reporter: Pavithra Kumar
>
> getName method of ZipArchiveEntry is not giving arabic file names. Instead of 
> that it gives some chunked characters.





[jira] [Commented] (COMPRESS-166) Support POSIX/Pax variant for long file names in tar

2011-12-08 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165297#comment-13165297
 ] 

Stefan Bodewig commented on COMPRESS-166:
-

Code and tests are in with svn revision 1211943 - just needs docs.

> Support POSIX/Pax variant for long file names in tar
> 
>
> Key: COMPRESS-166
> URL: https://issues.apache.org/jira/browse/COMPRESS-166
> Project: Commons Compress
>  Issue Type: Improvement
>  Components: Archivers
> Affects Versions: 1.3
> Reporter: Stefan Bodewig
>
> Once we add support for writing Pax headers for COMPRESS-165 it will be 
> pretty easy to support the same headers for long file names as well.





[jira] [Commented] (COMPRESS-165) Support writing entries > 8GiB in tar

2011-12-08 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165240#comment-13165240
 ] 

Stefan Bodewig commented on COMPRESS-165:
-

PAX headers are in with svn revision 1211931 - only need to add docs to resolve 
this.

> Support writing entries > 8GiB in tar
> -
>
> Key: COMPRESS-165
> URL: https://issues.apache.org/jira/browse/COMPRESS-165
> Project: Commons Compress
>  Issue Type: New Feature
>  Components: Archivers
>Affects Versions: 1.3
>Reporter: Stefan Bodewig
>
> We already parse PAX headers and the star/GNU tar/BSD tar dialects used for 
> big entries.
> Similar to the way we handle long file names there should be a user option to 
> choose between "error", "star" and "posix" or "pax".  "star" is Jörg 
> Schilling's tar, which was the first one to use the binary size 
> representation later adopted by GNU and BSD tar as well.





[jira] [Commented] (COMPRESS-163) Unable to extract a file larger than 8GB from a Posix-format tar archive

2011-12-08 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165222#comment-13165222
 ] 

Stefan Bodewig commented on COMPRESS-163:
-

After applying John's other patches from COMPRESS-16 with svn revision 1211892 
for COMPRESS-165, it is now the output stream that throws if a big file is 
added (and star extensions haven't been enabled).  So I've removed adjustSize 
again and changed setSize as this patch suggested.

> Unable to extract a file larger than 8GB from a Posix-format tar archive
> 
>
> Key: COMPRESS-163
> URL: https://issues.apache.org/jira/browse/COMPRESS-163
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Compressors
>Affects Versions: 1.3
> Environment: The tar archive used for testing was created by GNU tar, 
> but the problem will occur with any Posix-formatted tar file containing files 
> over 8GB in size.
>Reporter: John Kodis
>Priority: Minor
> Fix For: 1.4
>
> Attachments: 
> 0003-Allow-reading-large-files-from-Posix-tar-archives.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> An attempt to read a posix-format tar archive containing a file in excess of 
> 8^11 bytes in size will fail with a "Size out of range" illegal argument 
> exception.
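
The 8^11 limit follows from the size field of a classic tar header, which holds 11 octal digits. The sketch below is not the Commons Compress code; it only illustrates, under assumptions, how a 12-byte size field can hold either the octal form or the binary (base-256) form that star introduced. The class and method names (TarSizeField, encodeSize, decodeSize) are made up for this illustration, and the layout of the binary form (high bit of the first byte set, value stored big-endian in the remaining bytes) is my understanding of the star/GNU convention rather than something stated in this thread.

```java
public class TarSizeField {

    // Largest size that fits into 11 octal digits: 8^11 - 1 bytes.
    static final long MAX_OCTAL_SIZE = (1L << 33) - 1;

    // Encodes a size into a 12-byte field. Small values use 11 octal digits
    // plus a trailing NUL; larger values use the assumed binary form.
    static byte[] encodeSize(long size) {
        if (size < 0) {
            throw new IllegalArgumentException("negative size");
        }
        byte[] field = new byte[12];
        if (size <= MAX_OCTAL_SIZE) {
            String octal = String.format("%011o", size);
            for (int i = 0; i < 11; i++) {
                field[i] = (byte) octal.charAt(i);
            }
            field[11] = 0;
        } else {
            long v = size;
            for (int i = 11; i > 0; i--) {
                field[i] = (byte) (v & 0xff);
                v >>>= 8;
            }
            field[0] = (byte) 0x80; // marker: binary representation follows
        }
        return field;
    }

    // Decodes either representation back into a long.
    static long decodeSize(byte[] field) {
        long v = 0;
        if ((field[0] & 0x80) != 0) {
            for (int i = 1; i < 12; i++) {
                v = (v << 8) | (field[i] & 0xff);
            }
            return v;
        }
        for (int i = 0; i < 11 && field[i] >= '0' && field[i] <= '7'; i++) {
            v = v * 8 + (field[i] - '0');
        }
        return v;
    }
}
```

A reader that only understands the octal form has no way to represent a 10 GB entry, which matches the "Size out of range" failure described above.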





[jira] [Commented] (COMPRESS-165) Support writing entries > 8GiB in tar

2011-12-08 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165219#comment-13165219
 ] 

Stefan Bodewig commented on COMPRESS-165:
-

The star/GNU/BSD version has been added with svn revision 1211892

> Support writing entries > 8GiB in tar
> -
>
> Key: COMPRESS-165
> URL: https://issues.apache.org/jira/browse/COMPRESS-165
> Project: Commons Compress
>  Issue Type: New Feature
>  Components: Archivers
>Affects Versions: 1.3
>Reporter: Stefan Bodewig
>
> We already parse PAX headers and the star/GNU tar/BSD tar dialects used for 
> big entries.
> Similar to the way we handle long file names there should be a user option to 
> choose between "error", "star" and "posix" or "pax".  "star" is Jörg 
> Schilling's tar, which was the first one to use the binary size 
> representation later adopted by GNU and BSD tar as well.





[jira] [Commented] (COMPRESS-165) Support writing entries > 8GiB in tar

2011-12-07 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164339#comment-13164339
 ] 

Stefan Bodewig commented on COMPRESS-165:
-

COMPRESS-16 contains patches that enable the star method of writing such 
entries.

> Support writing entries > 8GiB in tar
> -
>
> Key: COMPRESS-165
> URL: https://issues.apache.org/jira/browse/COMPRESS-165
> Project: Commons Compress
>  Issue Type: New Feature
>  Components: Archivers
>Affects Versions: 1.3
>Reporter: Stefan Bodewig
>
> We already parse PAX headers and the star/GNU tar/BSD tar dialects used for 
> big entries.
> Similar to the way we handle long file names there should be a user option to 
> choose between "error", "star" and "posix" or "pax".  "star" is Jörg 
> Schilling's tar, which was the first one to use the binary size 
> representation later adopted by GNU and BSD tar as well.





[jira] [Commented] (COMPRESS-163) Unable to extract a file larger than 8GB from a Posix-format tar archive

2011-12-06 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163814#comment-13163814
 ] 

Stefan Bodewig commented on COMPRESS-163:
-

Sorry, I somehow missed the fact that we already had code that was parsing PAX 
headers.

I agree the code should be corrected.

> Unable to extract a file larger than 8GB from a Posix-format tar archive
> 
>
> Key: COMPRESS-163
> URL: https://issues.apache.org/jira/browse/COMPRESS-163
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Compressors
>Affects Versions: 1.3
> Environment: The tar archive used for testing was created by GNU tar, 
> but the problem will occur with any Posix-formatted tar file containing files 
> over 8GB in size.
>Reporter: John Kodis
>Priority: Minor
> Fix For: 1.4
>
> Attachments: 
> 0003-Allow-reading-large-files-from-Posix-tar-archives.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> An attempt to read a posix-format tar archive containing a file in excess of 
> 8^11 bytes in size will fail with a "Size out of range" illegal argument 
> exception.





[jira] [Commented] (COMPRESS-163) Unable to extract a file larger than 8GB from a Posix-format tar archive

2011-12-06 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163640#comment-13163640
 ] 

Stefan Bodewig commented on COMPRESS-163:
-

I may be missing it, but is the code that actually reads the PAX header and 
applies the size read available as well?


> Unable to extract a file larger than 8GB from a Posix-format tar archive
> 
>
> Key: COMPRESS-163
> URL: https://issues.apache.org/jira/browse/COMPRESS-163
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Compressors
>Affects Versions: 1.3
> Environment: The tar archive used for testing was created by GNU tar, 
> but the problem will occur with any Posix-formatted tar file containing files 
> over 8GB in size.
>Reporter: John Kodis
>Priority: Minor
> Fix For: 1.4
>
> Attachments: 
> 0003-Allow-reading-large-files-from-Posix-tar-archives.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> An attempt to read a posix-format tar archive containing a file in excess of 
> 8^11 bytes in size will fail with a "Size out of range" illegal argument 
> exception.





[jira] [Commented] (COMPRESS-164) Cannot Read Winzip Archives With Unicode Extra Fields

2011-12-05 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162869#comment-13162869
 ] 

Stefan Bodewig commented on COMPRESS-164:
-

It was easier and cleaner to fix in Ant's code base where nobody cares for the 
order of entries from the central directory.

See svn revision 1210522

> Cannot Read Winzip Archives With Unicode Extra Fields
> -
>
> Key: COMPRESS-164
> URL: https://issues.apache.org/jira/browse/COMPRESS-164
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
>Affects Versions: 1.3
> Environment: Windows 7, Oracle JDK 6
>Reporter: Volker Leidl
> Fix For: 1.4
>
> Attachments: UTF8ZipFilesTest.patch, ZipFile.patch
>
>
> I have a zip file created with WinZip containing Unicode extra fields. Upon 
> attempting to extract it with 
> org.apache.commons.compress.archivers.zip.ZipFile, ZipFile.getInputStream() 
> returns null for ZipArchiveEntries previously retrieved with 
> ZipFile.getEntry() or even ZipFile.getEntries(). See UTF8ZipFilesTest.patch 
> in the attachments for a test case exposing the bug. The original test case 
> stopped short of trying to read the entries, that's why this wasn't flagged 
> up before. 
> The problem lies in the fact that inside ZipFile.java entries are stored in a 
> HashMap. However, at one point after populating the HashMap, the unicode 
> extra fields are read, which leads to a change of the ZipArchiveEntry name, 
> and therefore a change of its hash code. Because of this, subsequent gets on 
> the HashMap fail to retrieve the original values.
> ZipFile.patch contains an (admittedly simple-minded) fix for this problem by 
> reconstructing the entries HashMap after the Unicode extra fields have been 
> parsed. The purpose of this patch is mainly to show that the problem is 
> indeed what I think, rather than providing a well-designed solution.
> The patches have been tested against revision 1210416.
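
The failure mode described above (a key whose hash code changes after it has been put into a HashMap) can be reproduced without any zip code at all. The following is a minimal sketch, not the ZipFile implementation: Entry is a hypothetical stand-in for ZipArchiveEntry whose equals and hashCode depend only on the name, and the two names are arbitrary examples.

```java
import java.util.HashMap;
import java.util.Map;

public class MutableKeyDemo {

    // Hypothetical stand-in for an archive entry; hash code depends on the name.
    static final class Entry {
        private String name;

        Entry(String name) {
            this.name = name;
        }

        void setName(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Entry && ((Entry) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    // Renaming the key after insertion: the map still stores the old hash,
    // so the lookup with the very same object fails.
    static boolean lookupAfterRename() {
        Map<Entry, Long> offsets = new HashMap<Entry, Long>();
        Entry e = new Entry("?.txt");
        offsets.put(e, 42L);
        e.setName("\u00e9.txt"); // name replaced, e.g. from a Unicode extra field
        return offsets.containsKey(e);
    }

    // Rebuilding the map after all renames re-hashes every key, which is
    // the idea behind the workaround in the report.
    static boolean lookupAfterRebuild() {
        Map<Entry, Long> offsets = new HashMap<Entry, Long>();
        Entry e = new Entry("?.txt");
        offsets.put(e, 42L);
        e.setName("\u00e9.txt");
        Map<Entry, Long> rebuilt = new HashMap<Entry, Long>();
        for (Map.Entry<Entry, Long> me : offsets.entrySet()) {
            rebuilt.put(me.getKey(), me.getValue());
        }
        return rebuilt.containsKey(e);
    }
}
```

Rebuilding works but visits every entry a second time; avoiding the name as part of the key, or populating the map only after the extra fields have been parsed, would sidestep the problem entirely.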





[jira] [Commented] (COMPRESS-16) unable to extract a TAR file that contains an entry which is 10 GB in size

2011-12-05 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162702#comment-13162702
 ] 

Stefan Bodewig commented on COMPRESS-16:


Read-support for the GNU version is in with svn revision 1210386

Before I look into write support I'll need to reshuffle a few things so we can
have testcases.  The current run-it Maven profile is not really sufficient as it
would mean you'd have to run all ZIP ITs as well.

> unable to extract a TAR file that contains an entry which is 10 GB in size
> --
>
> Key: COMPRESS-16
> URL: https://issues.apache.org/jira/browse/COMPRESS-16
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Archivers
> Environment: I am using win xp sp3, but this should be platform 
> independent.
>Reporter: Sam Smith
> Attachments: 
> 0001-Accept-GNU-tar-files-with-entries-over-8GB-in-size.patch, 
> 0002-Allow-creating-tar-archives-with-files-over-8GB.patch, 
> ant-8GB-tar.patch, patch-for-compress.txt
>
>
> I made a TAR file which contains a file entry where the file is 10 GB in size.
> When I attempt to extract the file using TarInputStream, it fails with the 
> following stack trace:
>   java.io.IOException: unexpected EOF with 24064 bytes unread
>   at 
> org.apache.commons.compress.archivers.tar.TarInputStream.read(TarInputStream.java:348)
>   at 
> org.apache.commons.compress.archivers.tar.TarInputStream.copyEntryContents(TarInputStream.java:388)
> So, TarInputStream does not seem to support large (> 8 GB?) files.
> Here is something else to note: I created that TAR file using TarOutputStream 
> , which did not complain when asked to write a 10 GB file into the TAR file, 
> so I assume that TarOutputStream has no file size limits?  That, or does it 
> silently create corrupted TAR files (which would be the worst situation of 
> all...)?





[jira] [Commented] (COMPRESS-137) TarArchiveEntry.getFile() always returns null + no way to get an InputStream from TarArchiveInputStream similar to what you do with (java.util.zip.ZipFile())..getInpu

2011-11-25 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157177#comment-13157177
 ] 

Stefan Bodewig commented on COMPRESS-137:
-

You can wrap the stream in something like the BoundedInputStream found as a 
nested class in ZipFile or, more conveniently, in commons-io.
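
For readers who have not seen such a wrapper, here is a rough sketch of the idea, assuming the goal is to hand a consumer a stream that ends after a fixed number of bytes and whose close() leaves the underlying archive stream open. The Bounded class below is written for this illustration only; it is neither the nested class from ZipFile nor the commons-io one, and readBounded is just a small driver.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BoundedStreamSketch {

    // Hypothetical wrapper: exposes at most 'limit' bytes of the wrapped stream.
    static final class Bounded extends InputStream {
        private final InputStream in;
        private long remaining;

        Bounded(InputStream in, long limit) {
            this.in = in;
            this.remaining = limit;
        }

        @Override
        public int read() throws IOException {
            if (remaining <= 0) {
                return -1;
            }
            int b = in.read();
            if (b >= 0) {
                remaining--;
            }
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            if (remaining <= 0) {
                return -1;
            }
            int n = in.read(buf, off, (int) Math.min(len, remaining));
            if (n > 0) {
                remaining -= n;
            }
            return n;
        }

        // Deliberately does not close the wrapped stream.
        @Override
        public void close() {
        }
    }

    // Reads 'data' through the wrapper and returns what the consumer sees.
    static String readBounded(String data, long limit) {
        InputStream source =
            new ByteArrayInputStream(data.getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream bounded = new Bounded(source, limit)) {
            byte[] buf = new byte[4];
            int n;
            while ((n = bounded.read(buf, 0, buf.length)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

With a tar stream, the limit would presumably be the size of the current entry, so the consumer can read to the end and close its stream without affecting the entries that follow.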

> TarArchiveEntry.getFile() always returns null + no way to get an InputStream 
> from TarArchiveInputStream similar to what you do with 
> (java.util.zip.ZipFile())..getInputStream(ZipEntry);
> 
>
> Key: COMPRESS-137
> URL: https://issues.apache.org/jira/browse/COMPRESS-137
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Compressors
>Affects Versions: 1.1
> Environment: $ uname -a
> Linux Microknoppix 2.6.31.6 #4 SMP PREEMPT Tue Nov 10 19:11:11 CET 2009 i686 
> GNU/Linux
> $ java -version
> java version "1.6.0_16"
> Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
> Java HotSpot(TM) Client VM (build 14.2-b01, mixed mode, sharing)
> $ echo $CLASSPATH
> /media/sdb3/prjx/Java/JUtils:/media/sdb3/prjx/Java/JUtils/jars/commons-compress-1.1.jar:.
>Reporter: Albretch Mueller
>Assignee: Torsten Curdt
>Priority: Critical
>  Labels: zip_through_XMLReader
> Fix For: 1.1
>
>
> ~ 
>  this is a test run using httpd-2.2.19.tar[.gz,bz2] to show what I mean
> ~ 
>  lbrtchx
> ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ 
> ~ ~ ~ 
> $ wget http://apache.cyberuse.com//httpd/httpd-2.2.19.tar.gz
> --2011-06-27 11:21:46--  http://apache.cyberuse.com//httpd/httpd-2.2.19.tar.gz
> Resolving apache.cyberuse.com... 174.132.149.89
> Connecting to apache.cyberuse.com|174.132.149.89|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 7113418 (6.8M) [application/x-gzip]
> Saving to: `httpd-2.2.19.tar.gz'
> 100%[===...===>] 7,113,418   296K/s   in 24s 
> 2011-06-27 11:22:10 (285 KB/s) - `httpd-2.2.19.tar.gz' saved [7113418/7113418]
> $ wget http://apache.cyberuse.com//httpd/httpd-2.2.19.tar.bz2
> --2011-06-27 11:22:19--  
> http://apache.cyberuse.com//httpd/httpd-2.2.19.tar.bz2
> Resolving apache.cyberuse.com... 174.132.149.89
> Connecting to apache.cyberuse.com|174.132.149.89|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 5322082 (5.1M) [application/x-bzip2]
> Saving to: `httpd-2.2.19.tar.bz2'
> 100%[===...===>] 5,322,082   256K/s   in 25s 
> 2011-06-27 11:22:44 (207 KB/s) - `httpd-2.2.19.tar.bz2' saved 
> [5322082/5322082]
> $ wget http://www.apache.org/dist/httpd/httpd-2.2.19.tar.gz.md5
> --2011-06-27 11:22:51--  
> http://www.apache.org/dist/httpd/httpd-2.2.19.tar.gz.md5
> Resolving www.apache.org... 140.211.11.131
> Connecting to www.apache.org|140.211.11.131|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 54 [text/plain]
> Saving to: `httpd-2.2.19.tar.gz.md5'
> 100%[===...===>] 54  --.-K/s   in 0s  
> 2011-06-27 11:22:51 (4.23 MB/s) - `httpd-2.2.19.tar.gz.md5' saved [54/54]
> $ wget http://www.apache.org/dist/httpd/httpd-2.2.19.tar.bz2.md5
> --2011-06-27 11:23:02--  
> http://www.apache.org/dist/httpd/httpd-2.2.19.tar.bz2.md5
> Resolving www.apache.org... 140.211.11.131
> Connecting to www.apache.org|140.211.11.131|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 55 [text/plain]
> Saving to: `httpd-2.2.19.tar.bz2.md5'
> 100%[===...===>] 55  --.-K/s   in 0s  
> 2011-06-27 11:23:02 (4.91 MB/s) - `httpd-2.2.19.tar.bz2.md5' saved [55/55]
> $ ls -l httpd-2.2.19.tar.*
> -rw-r--r-- 1 knoppix knoppix 5322082 May 21 18:58 httpd-2.2.19.tar.bz2
> -rw-r--r-- 1 knoppix knoppix  55 May 21 18:58 httpd-2.2.19.tar.bz2.md5
> -rw-r--r-- 1 knoppix knoppix 7113418 May 21 18:58 httpd-2.2.19.tar.gz
> -rw-r--r-- 1 knoppix knoppix  54 May 21 18:58 httpd-2.2.19.tar.gz.md5
> $ md5sum -b httpd-2.2.19.tar.bz2
> 832f96a6ec4b8fc7cf49b9efd4e89060 *httpd-2.2.19.tar.bz2
> $ md5sum -b httpd-2.2.19.tar.gz
> e9f5453e1e4d7aeb0e7ec7184c6784b5 *httpd-2.2.19.tar.gz
> $ cat *.md5
> 832f96a6ec4b8fc7cf49b9efd4e89060 *httpd-2.2.19.tar.bz2
> e9f5453e1e4d7aeb0e7ec7184c6784b5 *httpd-2.2.19.tar.gz
> ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ 
> ~ ~ ~ 
> /*
>  TarArchiveEntry.getFile() always returns null + no way to get an InputStream 
> from TarArchiveInputStream similar to what you do with 
> (java.util.zip.ZipFile())..getInputStream(ZipEntry);
> */
> import org.apache.commons.compress.compressors.bzip2.*;
> import org.apache.commons.compress.compressors.gzip.*;
> import org.apache.commons.compress.archivers.tar.*;
> import org

[jira] [Commented] (COMPRESS-146) BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should treat this as EOS

2011-11-09 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147139#comment-13147139
 ] 

Stefan Bodewig commented on COMPRESS-146:
-

I've updated the documentation with svn revision 1199823.

It would be good to have testcases with concatenated streams, I'll look into 
creating some.

Yes, I think we should change the defaults with 2.0.  Deprecations won't help 
in the light of our factory that people may use instead of using the 
constructors directly.  Adding a new flag to the factory method looks wrong 
since there are formats (pack200) supported by the factory that don't know 
anything about concatenated streams.

> BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should 
> treat this as EOS
> 
>
> Key: COMPRESS-146
> URL: https://issues.apache.org/jira/browse/COMPRESS-146
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Compressors
> Environment: all
>Reporter: Dmitriy Smirnov
>Priority: Critical
>  Labels: 0x177245385090
> Fix For: 1.4
>
> Attachments: bzip2-concatenated.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should 
> treat this as EOS
> This error occurs mostly on large files, as a sudden EOF somewhere in the 
> middle of the file.
> An example of data from archived file:
> $ cat fastq.ax.bz2 | od -t x1 | grep -A 1 '17 72 45'
> 22711660 d0 ff b6 01 20 10 ff ff 17 72 45 38 50 90 2e ff
> 22711700 b2 d3 42 5a 68 39 31 41 59 26 53 59 84 3c 41 75
> --
> 24637020 c5 49 ff 19 80 49 20 7f ff 17 72 45 38 50 90 a4
> 24637040 a8 ac bd 42 5a 68 39 31 41 59 26 53 59 0d 9a b4
> --
> 40302720 ff b1 24 80 10 ff ff 17 72 45 38 50 90 24 cb c5
> 40302740 90 42 5a 68 39 31 41 59 26 53 59 42 05 ae 5e 05
> .
> Suggested solution:
> private void initBlock() throws IOException {
>     char magic0 = bsGetUByte();
>     char magic1 = bsGetUByte();
>     char magic2 = bsGetUByte();
>     char magic3 = bsGetUByte();
>     char magic4 = bsGetUByte();
>     char magic5 = bsGetUByte();
>     if (magic0 == 0x17 && magic1 == 0x72 && magic2 == 0x45
>         && magic3 == 0x38 && magic4 == 0x50 && magic5 == 0x90) {
>         if (complete()) { // end of file
>             return;
>         } else {
>             magic0 = bsGetUByte();
>             magic1 = bsGetUByte();
>             magic2 = bsGetUByte();
>             magic3 = bsGetUByte();
>             magic4 = bsGetUByte();
>             magic5 = bsGetUByte();
>         }
>     }
>     if (magic0 != 0x31 || // '1'
>         magic1 != 0x41 || // 'A'
>         magic2 != 0x59 || // 'Y'
>         magic3 != 0x26 || // '&'
>         magic4 != 0x53 || // 'S'
>         magic5 != 0x59    // 'Y'
>         ) {
>         this.currentState = EOF;
>         throw new IOException("bad block header");
>     } else {
>         this.storedBlockCRC = bsGetInt();
>         this.blockRandomised = bsR(1) == 1;
>         /**
>          * Allocate data here instead in constructor, so we do not allocate
>          * it if the input file is empty.
>          */
>         if (this.data == null) {
>             this.data = new Data(this.blockSize100k);
>         }
>         // currBlockNo++;
>         getAndMoveToFrontDecode();
>         this.crc.initialiseCRC();
>         this.currentState = START_BLOCK_STATE;
>     }
> }
> private boolean complete() throws IOException {
>     boolean result = false;
>     this.storedCombinedCRC = bsGetInt();
>     try {
>         if (in.available() == 0) {
>             throw new IOException("EOF");
>         }
>         checkMagicChar('B', "first");
>         checkMagicChar('Z', "second");
>         checkMagicChar('h', "third");
>         int blockSize = this.in.read();
>         if ((blockSize < '1') || (blockSize > '9')) {
>             throw new IOException("Stream is not BZip2 formatted: illegal "
>                                   + "blocksize " + (char) blockSize);
>         }
>         this.blockSize100k = blockSize - '0';
>         this.bsLive = 0;
>         this.bsBuff = 0;
>     } catch (IOException e) {
>         this.currentState = EOF;
>         result = true;
>     }
>     this.data = null;
> if

[jira] [Commented] (COMPRESS-162) BZip2CompressorInputStream still stops after 900,000 decompressed bytes of large compressed file

2011-11-09 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147118#comment-13147118
 ] 

Stefan Bodewig commented on COMPRESS-162:
-

Thank you for checking.

> BZip2CompressorInputStream still stops after 900,000 decompressed bytes of 
> large compressed file
> 
>
> Key: COMPRESS-162
> URL: https://issues.apache.org/jira/browse/COMPRESS-162
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Compressors
>Affects Versions: 1.3
> Environment: Linux (Fedora Cores 13 [2.6.34.9-69.fc13.i686.PAE] and 
> 15, at latest 'yum upgrade' as of 7 Nov 2011), Sun Java 1.6.0_22
>Reporter: Andrew Pavlin
> Fix For: 1.4
>
>
> Attempting to unzip the planet-110921.osm.bz2 file downloaded directly from 
> planet.OpenStreetMaps.org aborts after exactly 900,000 bytes are uncompressed. 
> The uncompressed content looks like valid XML, and causes my application's 
> parser to blow up with XML syntax errors due to missing closing tags. Tried 
> using the example code to just uncompress, and got the same exact behavior.
> Uncompressing the same file planet-110921.osm.bz2 (19357793489 bytes long 
> compressed) with the Linux bzip2 command-line utility 
> (bzip2-1.0.6-1.fc13.i686.rpm) succeeds and produces a valid (and enormous) 
> XML file that can be successfully parsed.
> Tried getting a subversion snapshot of the commons-compress trunk on 7 Nov 
> 2011 and replacing the org.apache.commons.compress.compressors.bzip2 package 
> in the commons-compress-1.3.jar with compiled code from the trunk (Subversion 
> log reported that the fix for COMPRESS-146 (?) was in). Still the same 
> failure.





[jira] [Commented] (COMPRESS-161) bzip2 decompression terminates after 900000 bytes

2011-11-09 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147117#comment-13147117
 ] 

Stefan Bodewig commented on COMPRESS-161:
-

Good to know trunk is actually buildable outside of my machine and our CI 
systems.

The "patches welcome" statement is some sort of standard response in OSS land, 
I didn't seriously expect you to provide a patch.  But I don't have the time or 
itch to scratch (or current knowledge TBH) to provide one myself right now 
either.

> bzip2 decompression terminates after 900000 bytes
> -
>
> Key: COMPRESS-161
> URL: https://issues.apache.org/jira/browse/COMPRESS-161
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Compressors
>Affects Versions: 1.3
> Environment: Windows7 64bit JDK7u1,2
>Reporter: Hans horn
>Priority: Critical
> Fix For: 1.4
>
> Attachments: INT1_aminey.inp.bz2
>
>
> bzip2 decompression terminates (w/o error) after 900000 bytes
> // bz2File: a bzip2 compressed file whose uncompressed size is > 900000 bytes
> long nBytes = 0;
> try {
>   InputStream iin = new BZip2CompressorInputStream(new 
> FileInputStream(bz2File));
>   int data = iin.read();
>   while (data != -1) {
>     System.out.print((char) data); ++nBytes;
>     data = iin.read();
>   }
> } catch (IOException iox) { /**/ }
> System.out.println("#Bytes read " + nBytes);
> prints: #Bytes read 900000





[jira] [Commented] (COMPRESS-146) BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should treat this as EOS

2011-11-08 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146180#comment-13146180
 ] 

Stefan Bodewig commented on COMPRESS-146:
-

Yes, we probably want all three formats to be consistent here.

I'm not sure what the danger of changing the default really would be.  I 
vaguely recall people complaining about GzipInputStream after JDK7 added 
support for concatenated streams (I may be totally wrong on this, though).

> BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should 
> treat this as EOS
> 
>
> Key: COMPRESS-146
> URL: https://issues.apache.org/jira/browse/COMPRESS-146
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Compressors
> Environment: all
>Reporter: Dmitriy Smirnov
>Priority: Critical
>  Labels: 0x177245385090
> Fix For: 1.4
>
> Attachments: bzip2-concatenated.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should 
> treat this as EOS
> This error occurs mostly on large files, as a sudden EOF somewhere in the 
> middle of the file.
> An example of data from archived file:
> $ cat fastq.ax.bz2 | od -t x1 | grep -A 1 '17 72 45'
> 22711660 d0 ff b6 01 20 10 ff ff 17 72 45 38 50 90 2e ff
> 22711700 b2 d3 42 5a 68 39 31 41 59 26 53 59 84 3c 41 75
> --
> 24637020 c5 49 ff 19 80 49 20 7f ff 17 72 45 38 50 90 a4
> 24637040 a8 ac bd 42 5a 68 39 31 41 59 26 53 59 0d 9a b4
> --
> 40302720 ff b1 24 80 10 ff ff 17 72 45 38 50 90 24 cb c5
> 40302740 90 42 5a 68 39 31 41 59 26 53 59 42 05 ae 5e 05
> .
> Suggested solution:
> private void initBlock() throws IOException {
>     char magic0 = bsGetUByte();
>     char magic1 = bsGetUByte();
>     char magic2 = bsGetUByte();
>     char magic3 = bsGetUByte();
>     char magic4 = bsGetUByte();
>     char magic5 = bsGetUByte();
>     if (magic0 == 0x17 && magic1 == 0x72 && magic2 == 0x45
>         && magic3 == 0x38 && magic4 == 0x50 && magic5 == 0x90) {
>         if (complete()) { // end of file
>             return;
>         } else {
>             magic0 = bsGetUByte();
>             magic1 = bsGetUByte();
>             magic2 = bsGetUByte();
>             magic3 = bsGetUByte();
>             magic4 = bsGetUByte();
>             magic5 = bsGetUByte();
>         }
>     }
>     if (magic0 != 0x31 || // '1'
>         magic1 != 0x41 || // 'A'
>         magic2 != 0x59 || // 'Y'
>         magic3 != 0x26 || // '&'
>         magic4 != 0x53 || // 'S'
>         magic5 != 0x59    // 'Y'
>         ) {
>         this.currentState = EOF;
>         throw new IOException("bad block header");
>     } else {
>         this.storedBlockCRC = bsGetInt();
>         this.blockRandomised = bsR(1) == 1;
>         /**
>          * Allocate data here instead in constructor, so we do not allocate
>          * it if the input file is empty.
>          */
>         if (this.data == null) {
>             this.data = new Data(this.blockSize100k);
>         }
>         // currBlockNo++;
>         getAndMoveToFrontDecode();
>         this.crc.initialiseCRC();
>         this.currentState = START_BLOCK_STATE;
>     }
> }
> private boolean complete() throws IOException {
>     boolean result = false;
>     this.storedCombinedCRC = bsGetInt();
>     try {
>         if (in.available() == 0) {
>             throw new IOException("EOF");
>         }
>         checkMagicChar('B', "first");
>         checkMagicChar('Z', "second");
>         checkMagicChar('h', "third");
>         int blockSize = this.in.read();
>         if ((blockSize < '1') || (blockSize > '9')) {
>             throw new IOException("Stream is not BZip2 formatted: illegal "
>                                   + "blocksize " + (char) blockSize);
>         }
>         this.blockSize100k = blockSize - '0';
>         this.bsLive = 0;
>         this.bsBuff = 0;
>     } catch (IOException e) {
>         this.currentState = EOF;
>         result = true;
>     }
>     this.data = null;
>     if (this.storedCombinedCRC != this.computedCombinedCRC) {
>         throw new IOException("BZip2 CRC error");
>     }
>     this.computedCombinedCRC = 0;
>     return result;
> }
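
The hex dump in the report shows each end-of-stream marker followed a few bytes later by 42 5a 68 39 ("BZh9") and the block magic 31 41 59 26 53 59, i.e. the header of another bzip2 stream. A quick way to check whether a file is a concatenation of several streams is to count such headers. The sketch below does only that; it is an illustration, not part of the suggested patch. The class and method names are made up, it looks at byte-aligned positions only, and a chance match inside compressed data is possible in principle, so the count is an estimate.

```java
public class Bzip2StreamHeaders {

    // Block header magic 0x314159265359 that follows "BZh" plus the level digit.
    private static final byte[] BLOCK_MAGIC = {0x31, 0x41, 0x59, 0x26, 0x53, 0x59};

    // Counts byte-aligned occurrences of "BZh[1-9]" followed by the block magic.
    static int countStreamHeaders(byte[] data) {
        int count = 0;
        for (int i = 0; i + 10 <= data.length; i++) {
            if (data[i] == 'B' && data[i + 1] == 'Z' && data[i + 2] == 'h'
                && data[i + 3] >= '1' && data[i + 3] <= '9'
                && hasBlockMagicAt(data, i + 4)) {
                count++;
            }
        }
        return count;
    }

    private static boolean hasBlockMagicAt(byte[] data, int offset) {
        for (int j = 0; j < BLOCK_MAGIC.length; j++) {
            if (data[offset + j] != BLOCK_MAGIC[j]) {
                return false;
            }
        }
        return true;
    }

    // Helper for experiments: turns "42 5a 68" style hex text into bytes.
    static byte[] fromHex(String hex) {
        String s = hex.replace(" ", "");
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```

A count greater than one over a whole file suggests concatenated streams, in which case a reader that stops at the first end-of-stream marker will silently drop everything after it.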

--

[jira] [Commented] (COMPRESS-161) bzip2 decompression terminates after 900000 bytes

2011-11-08 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146179#comment-13146179
 ] 

Stefan Bodewig commented on COMPRESS-161:
-

Hans, what kind of XZ problems did you have to work around?  trunk should be 
building out of the box.

WRT parallel bzip2 - no concrete plans, patches welcome ;-)

> bzip2 decompression terminates after 900000 bytes
> -
>
> Key: COMPRESS-161
> URL: https://issues.apache.org/jira/browse/COMPRESS-161
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Compressors
>Affects Versions: 1.3
> Environment: Windows7 64bit JDK7u1,2
>Reporter: Hans horn
>Priority: Critical
> Fix For: 1.4
>
> Attachments: INT1_aminey.inp.bz2
>
>
> bzip2 decompression terminates (w/o error) after 900000 bytes
> // bz2File: a bzip2 compressed file whose uncompressed size is > 900000 bytes
> long nBytes = 0;
> try {
>   InputStream iin = new BZip2CompressorInputStream(new 
> FileInputStream(bz2File));
>   int data = iin.read();
>   while (data != -1) {
>     System.out.print((char) data); ++nBytes;
>     data = iin.read();
>   }
> } catch (IOException iox) { /**/ }
> System.out.println("#Bytes read " + nBytes);
> prints: #Bytes read 900000





[jira] [Commented] (COMPRESS-162) BZip2CompressorInputStream still stops after 900,000 decompressed bytes of large compressed file

2011-11-08 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146178#comment-13146178
 ] 

Stefan Bodewig commented on COMPRESS-162:
-

Andrew, are you using the two-arg constructor for BZip2CompressorInputStream?  
Concatenated files are not supported by default but only when you ask for it.
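
To illustrate why a decoder that handles only one stream returns truncated output, here is a minimal sketch. It does not use Commons Compress: the JDK ships no bzip2 codec, so the sketch substitutes zlib streams from java.util.zip. The class name, the method names and the allStreams flag are made up for this example; the flag only mirrors the opt-in idea behind the two-arg constructor.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class; zlib stands in for bzip2 here.
class ConcatenatedStreamsSketch {

    /** Compresses data into one complete zlib stream. */
    static byte[] compress(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    /**
     * Decompresses input. With allStreams == false the method returns after
     * the first stream ends and silently ignores whatever follows it.
     */
    static byte[] decompress(byte[] input, boolean allStreams) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        Inflater inflater = new Inflater();
        int offset = 0;
        try {
            while (offset < input.length) {
                inflater.reset();
                inflater.setInput(input, offset, input.length - offset);
                while (!inflater.finished()) {
                    int n = inflater.inflate(buf);
                    if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                        throw new IllegalArgumentException("truncated or unsupported stream");
                    }
                    out.write(buf, 0, n);
                }
                // Bytes the inflater did not consume belong to the next stream.
                offset = input.length - inflater.getRemaining();
                if (!allStreams) {
                    break;
                }
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] first = compress("first stream ".getBytes(StandardCharsets.UTF_8));
        byte[] second = compress("second stream".getBytes(StandardCharsets.UTF_8));
        byte[] both = new byte[first.length + second.length];
        System.arraycopy(first, 0, both, 0, first.length);
        System.arraycopy(second, 0, both, first.length, second.length);

        System.out.println(new String(decompress(both, false), StandardCharsets.UTF_8));
        System.out.println(new String(decompress(both, true), StandardCharsets.UTF_8));
    }
}
```

With the flag off, the caller sees a clean end of data after the first stream and no error, which matches the symptom in this report.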

> BZip2CompressorInputStream still stops after 900,000 decompressed bytes of 
> large compressed file
> 
>
> Key: COMPRESS-162
> URL: https://issues.apache.org/jira/browse/COMPRESS-162
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Compressors
>Affects Versions: 1.3
> Environment: Linux (Fedora Cores 13 [2.6.34.9-69.fc13.i686.PAE] and 
> 15, at latest 'yum upgrade' as of 7 Nov 2011), Sun Java 1.6.0_22
>Reporter: Andrew Pavlin
>
> Attempting to unzip the planet-110921.osm.bz2 file downloaded directly from 
> planet.OpenStreetMaps.org aborts after exactly 900,000 bytes are uncompressed. 
> The uncompressed content looks like valid XML, and causes my application's 
> parser to blow up with XML syntax errors due to missing closing tags. Tried 
> using the example code to just uncompress, and got the same exact behavior.
> Uncompressing the same file planet-110921.osm.bz2 (19357793489 bytes long 
> compressed) with the Linux bzip2 command-line utility 
> (bzip2-1.0.6-1.fc13.i686.rpm) succeeds and produces a valid (and enormous) 
> XML file that can be successfully parsed.
> Tried getting a subversion snapshot of the commons-compress trunk on 7 Nov 
> 2011 and replacing the org.apache.commons.compress.compressors.bzip2 package 
> in the commons-compress-1.3.jar with compiled code from the trunk (Subversion 
> log reported that the fix for COMPRESS-146 (?) was in). Still the same 
> failure.





[jira] [Commented] (COMPRESS-161) bzip2 decompression terminates after 900000 bytes

2011-11-07 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145592#comment-13145592
 ] 

Stefan Bodewig commented on COMPRESS-161:
-

COMPRESS-146 has got a patch attached that I'll look into.  If it fixes the 
problem with your archive I'm going to commit it.

> bzip2 decompression terminates after 900000 bytes
> -
>
> Key: COMPRESS-161
> URL: https://issues.apache.org/jira/browse/COMPRESS-161
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Compressors
>Affects Versions: 1.3
> Environment: Windows7 64bit JDK7u1,2
>Reporter: Hans horn
>Priority: Critical
> Fix For: 1.3
>
> Attachments: INT1_aminey.inp.bz2
>
>
> bzip2 decompression terminates (w/o error) after 900000 bytes
> try {
>   InputStream iin = new BZip2CompressorInputStream(new 
> FileInputStream(bzip2 compressed file that was uncompressed > 900000 bytes in 
> size));
>   int data = iin.read();
>   while (data != -1) {
>     System.out.print((char) data); ++nBytes;
>     data = iin.read();
>   }
> } catch (IOException iox) { /**/ }
> System.out.println("#Bytes read " + nBytes);
> prints: #Bytes read 900000





[jira] [Commented] (COMPRESS-111) support for lzma files

2011-11-07 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145564#comment-13145564
 ] 

Stefan Bodewig commented on COMPRESS-111:
-

I think support for the "old" lzma format would be beneficial for all people 
having to deal with legacy archives, so at least read-only support would be 
great.

Then again adding write support won't hurt either.  We could recommend people 
use XZ instead inside the docs, of course.


> support for lzma files
> --
>
> Key: COMPRESS-111
> URL: https://issues.apache.org/jira/browse/COMPRESS-111
> Project: Commons Compress
>  Issue Type: New Feature
>  Components: Compressors
>Affects Versions: 1.0
>Reporter: maurel jean francois
> Attachments: compress-trunk-lzmaRev0.patch, 
> compress-trunk-lzmaRev1.patch
>
>
> Add support for compressing and decompressing files with the LZMA algorithm 
> (Lempel-Ziv-Markov chain algorithm)
> (see 
> http://markmail.org/search/?q=list%3Aorg.apache.commons.users/#query:list%3Aorg.apache.commons.users%2F+page:1+mid:syn4uuvbzusevtko+state:results)





[jira] [Commented] (COMPRESS-161) bzip2 decompression terminates after 900000 bytes

2011-11-07 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145558#comment-13145558
 ] 

Stefan Bodewig commented on COMPRESS-161:
-

I think the attached file consists of multiple concatenated streams which 
Commons Compress doesn't support yet; see COMPRESS-146.
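
One rough way to check whether a .bz2 file is made of several concatenated streams is to count stream headers. The sketch below is a heuristic and not part of Commons Compress. It assumes every stream starts on a byte boundary with "BZh", a level digit from 1 to 9 and the block magic 0x314159265359. It can in principle report false positives, it ignores empty streams, and it reads the whole file into memory, so it only suits small attachments. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic helper; a heuristic, not a bzip2 parser.
class Bzip2StreamCounter {

    // First six bytes of every non-empty bzip2 block (BCD digits of pi).
    private static final int[] BLOCK_MAGIC = {0x31, 0x41, 0x59, 0x26, 0x53, 0x59};

    /** Counts occurrences of "BZh" + level digit + block magic. */
    static int countStreamHeaders(byte[] data) {
        int count = 0;
        for (int i = 0; i + 10 <= data.length; i++) {
            boolean header = data[i] == 'B' && data[i + 1] == 'Z' && data[i + 2] == 'h'
                    && data[i + 3] >= '1' && data[i + 3] <= '9'
                    && hasBlockMagic(data, i + 4);
            if (header) {
                count++;
                i += 9; // skip the rest of this 10-byte signature
            }
        }
        return count;
    }

    private static boolean hasBlockMagic(byte[] data, int start) {
        for (int j = 0; j < BLOCK_MAGIC.length; j++) {
            if ((data[start + j] & 0xFF) != BLOCK_MAGIC[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(countStreamHeaders(data) + " stream header(s) found");
    }
}
```

A count above one suggests the file was written as several streams back to back, which a single-stream decoder would read only up to the end of the first one.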

> bzip2 decompression terminates after 900000 bytes
> -
>
> Key: COMPRESS-161
> URL: https://issues.apache.org/jira/browse/COMPRESS-161
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Compressors
>Affects Versions: 1.3
> Environment: Windows7 64bit JDK7u1,2
>Reporter: Hans horn
>Priority: Critical
> Fix For: 1.3
>
> Attachments: INT1_aminey.inp.bz2
>
>
> bzip2 decompression terminates (w/o error) after 900000 bytes
> try {
>   InputStream iin = new BZip2CompressorInputStream(new 
> FileInputStream(bzip2 compressed file that was uncompressed > 900000 bytes in 
> size));
>   int data = iin.read();
>   while (data != -1) {
>     System.out.print((char) data); ++nBytes;
>     data = iin.read();
>   }
> } catch (IOException iox) { /**/ }
> System.out.println("#Bytes read " + nBytes);
> prints: #Bytes read 900000





[jira] [Commented] (COMPRESS-161) bzip2 decompression terminates after 900000 bytes

2011-11-07 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13145550#comment-13145550
 ] 

Stefan Bodewig commented on COMPRESS-161:
-

I can confirm the limit with the file you have attached.  At the same time I 
can easily uncompress even larger files completely so there must be something 
specific to the archive you are using.

Any hints on how you have created it?

> bzip2 decompression terminates after 900000 bytes
> -
>
> Key: COMPRESS-161
> URL: https://issues.apache.org/jira/browse/COMPRESS-161
> Project: Commons Compress
>  Issue Type: Bug
>  Components: Compressors
>Affects Versions: 1.3
> Environment: Windows7 64bit JDK7u1,2
>Reporter: Hans horn
>Priority: Critical
> Fix For: 1.3
>
> Attachments: INT1_aminey.inp.bz2
>
>
> bzip2 decompression terminates (w/o error) after 900000 bytes
> try {
>   InputStream iin = new BZip2CompressorInputStream(new 
> FileInputStream(bzip2 compressed file that was uncompressed > 900000 bytes in 
> size));
>   int data = iin.read();
>   while (data != -1) {
>     System.out.print((char) data); ++nBytes;
>     data = iin.read();
>   }
> } catch (IOException iox) { /**/ }
> System.out.println("#Bytes read " + nBytes);
> prints: #Bytes read 900000





[jira] [Commented] (COMPRESS-156) XZ compression support

2011-11-04 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143845#comment-13143845
 ] 

Stefan Bodewig commented on COMPRESS-156:
-

1.3 was released just three days ago, so 1.4 will take a while.  We'll 
wait for bug reports on the new features of 1.3.  No fixed date, as usual.

There were about three months between 1.2 and 1.3 but almost a year 
between 1.1 and 1.2.

> XZ compression support
> --
>
> Key: COMPRESS-156
> URL: https://issues.apache.org/jira/browse/COMPRESS-156
> Project: Commons Compress
>  Issue Type: New Feature
>  Components: Compressors
>Reporter: Lasse Collin
> Fix For: 1.4
>
> Attachments: xz_support.patch
>
>






[jira] [Commented] (COMPRESS-156) XZ compression support

2011-11-02 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142247#comment-13142247
 ] 

Stefan Bodewig commented on COMPRESS-156:
-

Your patch is in as svn revision 1196665, thanks!

I'll have to add docs (including changes and adding Lasse as contributor) and 
at least one testcase but have to run now.

> XZ compression support
> --
>
> Key: COMPRESS-156
> URL: https://issues.apache.org/jira/browse/COMPRESS-156
> Project: Commons Compress
>  Issue Type: New Feature
>  Components: Compressors
>Reporter: Lasse Collin
> Fix For: 1.4
>
> Attachments: xz_support.patch
>
>






[jira] [Commented] (COMPRESS-36) Add Zip64 Suport

2011-10-18 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129805#comment-13129805
 ] 

Stefan Bodewig commented on COMPRESS-36:


As of svn revision 1185722 the zips needed for integration tests are part of a 
.tar.bz2 archive in svn trunk (which is about two MB in size and expands to 
more than 100 MB of highly redundant zips).

I'll remove the zips from my home dir on people.a.o shortly.

> Add Zip64 Suport
> 
>
> Key: COMPRESS-36
> URL: https://issues.apache.org/jira/browse/COMPRESS-36
> Project: Commons Compress
>  Issue Type: New Feature
>  Components: Archivers
>Reporter: Christian Grobmeier
>Assignee: Stefan Bodewig
> Fix For: 1.3
>
> Attachments: 5GB_of_Zeros.zip, 5GB_of_Zeros_7ZIP.zip, 
> 5GB_of_Zeros_PKZip.zip, 5GB_of_Zeros_WinZip.zip, 
> 5GB_of_Zeros_WindowsCompressedFolders.zip, 5GB_of_Zeros_jar.zip, 
> zip64-sample.zip
>
>
> Add Zip64 support. This will make it possible to handle zip files > 2 GB. 
> Planned for compress 1.1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (COMPRESS-158) Empty directories missing in zip archive

2011-10-08 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123628#comment-13123628
 ] 

Stefan Bodewig commented on COMPRESS-158:
-

Hi Daniel,

I'm completely unable to reproduce the problem.  Just to double check, I've 
downloaded and compiled your CompressionUtil class, I have downloaded and 
extracted the test.zip which creates a test directory, I've created and 
compiled the following trivial class

{code}
import org.clerezza.tools.backupfelixcache.CompressionUtil;
import java.io.File;

public class Driver {
    public static void main(String[] args) throws Throwable {
        CompressionUtil.zip(new File("test"), new File("output.zip"));
    }
}
{code}

I'll attach the resulting output.zip created on Ubuntu 10.04 with 

{noformat}
stefan@birdy:~/Desktop/compress-158$ java -version
java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.9) (6b20-1.9.9-0ubuntu1~10.04.2)
OpenJDK Client VM (build 19.0-b09, mixed mode, sharing)
{noformat}

As you can see it contains everything your original test.zip contained as well.

> Empty directories missing in zip archive
> 
>
> Key: COMPRESS-158
> URL: https://issues.apache.org/jira/browse/COMPRESS-158
> Project: Commons Compress
>  Issue Type: Bug
>Affects Versions: 1.2
> Environment: Java 1.6, Ubuntu Linux (ext4 fs)
>Reporter: Daniel Spicar
>Priority: Minor
> Attachments: CompressionUtil.java, output.zip, test.zip
>
>
> When zipping a directory that contains several files and subdirectories of 
> which some can be empty, I am missing empty directories. When using a tar 
> archive format empty directories are present.
> I have found https://issues.apache.org/jira/browse/COMPRESS-105 which 
> describes a similar issue, however I am unable to reproduce the solution 
> suggested there. 
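
For readers who want to see how an empty directory ends up in a zip at all, here is a minimal sketch. It uses the JDK's java.util.zip classes as a stand-in, not Commons Compress and not the attached CompressionUtil, whose code is not shown in this thread. The point it illustrates is that a directory only appears in the archive if it gets an entry of its own, with a name ending in "/". The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class built on java.util.zip only.
class EmptyDirZipSketch {

    /** Zips everything below root, writing an explicit entry for each directory. */
    static void zipDirectory(Path root, OutputStream out) {
        try (ZipOutputStream zip = new ZipOutputStream(out);
             Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths.sorted()::iterator) {
                if (p.equals(root)) {
                    continue; // the root itself gets no entry
                }
                String name = root.relativize(p).toString().replace(File.separatorChar, '/');
                if (Files.isDirectory(p)) {
                    // The trailing slash marks the entry as a directory.
                    zip.putNextEntry(new ZipEntry(name + "/"));
                } else {
                    zip.putNextEntry(new ZipEntry(name));
                    Files.copy(p, zip);
                }
                zip.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Lists the entry names of a zip held in memory, in archive order. */
    static List<String> entryNames(byte[] zipBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry entry = zin.getNextEntry(); entry != null; entry = zin.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java EmptyDirZipSketch <directory> <output.zip>
        zipDirectory(Paths.get(args[0]), Files.newOutputStream(Paths.get(args[1])));
    }
}
```

If code that builds a zip only iterates over files, the directory branch above is the part that goes missing, and empty directories vanish from the result.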
