[jira] [Commented] (COMPRESS-185) BZip2CompressorInputStream truncates files compressed with pbzip2

2012-03-29 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241235#comment-13241235
 ] 

Stefan Bodewig commented on COMPRESS-185:
-

Don't worry.

 BZip2CompressorInputStream truncates files compressed with pbzip2
 -

 Key: COMPRESS-185
 URL: https://issues.apache.org/jira/browse/COMPRESS-185
 Project: Commons Compress
  Issue Type: Bug
  Components: Compressors
Affects Versions: 1.3
Reporter: Karsten Loesing
 Fix For: 1.4


 I'm using BZip2CompressorInputStream in Compress 1.3 to decompress a file 
 that was created with pbzip2 1.1.6 (http://compression.ca/pbzip2/).  The 
 stream ends early after 90 bytes, truncating the rest of the 
 pbzip2-compressed file.  Decompressing the file with bunzip2 or compressing 
 the original file with bzip2 both fix the issue.  I think both pbzip2 and 
 Compress are to blame here: pbzip2 apparently does something non-standard 
 when compressing files, and Compress should handle the non-standard format 
 rather than pretending to be done decompressing.  Another option is that I'm 
 doing something wrong; in that case please let me know! :)
 Here's how the problem can be reproduced:
  1. Generate a file that's 90+ bytes large: dd if=/dev/zero of=1mbfile 
 count=1 bs=1M
  2. Compress with pbzip2: pbzip2 1mbfile
  3. Decompress with Bunzip2 class below
  4. Notice how the resulting 1mbfile is 90 bytes large, not 1M.
 Now compare to using bunzip2/bzip2:
  - Do the steps above, but instead of 2, compress with bzip2: bzip2 1mbfile
  - Do the steps above, but instead of 3, decompress with bunzip2: bunzip2 
 1mbfile.bz2
 import java.io.*;
 import org.apache.commons.compress.compressors.bzip2.*;

 public class Bunzip2 {
   public static void main(String[] args) throws Exception {
     File inFile = new File(args[0]);
     // Strip the ".bz2" suffix to get the output file name.
     File outFile = new File(args[0].substring(0, args[0].length() - 4));
     FileInputStream fis = new FileInputStream(inFile);
     BZip2CompressorInputStream bz2cis =
         new BZip2CompressorInputStream(fis);
     BufferedInputStream bis = new BufferedInputStream(bz2cis);
     BufferedOutputStream bos = new BufferedOutputStream(
         new FileOutputStream(outFile));
     int len;
     byte[] data = new byte[1024];
     // read() returns -1 once the decompressed stream is exhausted.
     while ((len = bis.read(data, 0, 1024)) >= 0) {
       bos.write(data, 0, len);
     }
     bos.close();
     bis.close();
   }
 }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (COMPRESS-183) Support for de/encoding of tar entry names other than plain 8BIT conversion.

2012-03-23 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237447#comment-13237447
 ] 

Stefan Bodewig commented on COMPRESS-183:
-

I need to add comments and want to fix the handling of linkName for tar entries 
that represent links, but in general the code should be fixed with svn revision 
1304709.

The tar package now uses the platform's native encoding by default (this may 
change to ISO-8859-1 before the release).  Encoding can be overridden inside 
the constructor.

The output stream has an additional option that tells it to write non-ASCII 
file names to PAX extension headers.  This should work for any modern 
implementation of tar and is the only way to get portable archives, at the 
expense of an additional 512-byte block.

The input stream will read and apply PAX extension headers transparently.
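
For illustration: a PAX extension header stores each entry as a UTF-8 record of the form "<length> <keyword>=<value>\n", where the length counts the whole record including its own digits.  The sketch below shows how such a record could be built.  The class and method names are hypothetical, and this is not the code from revision 1304709.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Commons Compress. */
final class PaxRecordSketch {

    /** Builds one "<length> <keyword>=<value>\n" record encoded as UTF-8. */
    static byte[] encode(String keyword, String value) {
        byte[] key = keyword.getBytes(StandardCharsets.UTF_8);
        byte[] val = value.getBytes(StandardCharsets.UTF_8);
        // Bytes besides the length digits: space, '=' and newline.
        int base = key.length + val.length + 3;
        // The length field counts its own digits, so iterate until stable.
        int total = base;
        while (true) {
            int candidate = base + Integer.toString(total).length();
            if (candidate == total) {
                break;
            }
            total = candidate;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(total);
        byte[] len = Integer.toString(total).getBytes(StandardCharsets.US_ASCII);
        out.write(len, 0, len.length);
        out.write(' ');
        out.write(key, 0, key.length);
        out.write('=');
        out.write(val, 0, val.length);
        out.write('\n');
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A file name containing an a-umlaut, which plain ASCII cannot hold.
        byte[] record = encode("path", "\u00e4.txt");
        System.out.print(new String(record, StandardCharsets.UTF_8));
    }
}
```

The loop is needed because adding the length digits can itself push the total over a power of ten, which adds one more digit.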

 Support for de/encoding of tar entry names other than plain 8BIT conversion.
 

 Key: COMPRESS-183
 URL: https://issues.apache.org/jira/browse/COMPRESS-183
 Project: Commons Compress
  Issue Type: Improvement
  Components: Archivers
Affects Versions: 1.3
Reporter: Joao Schim
  Labels: patch
 Fix For: 1.4

 Attachments: patch-tar-name-encoding.diff, 
 patch-tar-name-encoding.diff, patch-tar-name-encoding.diff


 The names of tar entries are currently encoded/decoded by means of plain 8bit 
 conversions of byte to char and vice-versa. This prohibits the use of 
 encodings like UTF8 in the file names. Whether the use of UTF8 (or any other 
 non ASCII) in file names is sensible is a chapter of its own. However tar 
 archives that contain files which names have been encoded with UTF8 do float 
 around. These files currently can not be read correctly by commons-compress 
 due to the encoding being hardcoded to plain 8BIT only. 
 The supplied patch allows to use encodings other than 8BIT using a 
 TarArchiveCodec structure. It does not change the standard functionality, but 
 adds to it the possibility of using a different encoding. 
 A method was added to the TarUtilsTest junit test to test the added 
 functionality.
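
 To make the problem concrete, the following sketch contrasts a plain 8-bit byte-to-char conversion with charset-aware decoding of the same name bytes.  The class and method names are made up for this example and are not taken from the patch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration, not the TarArchiveCodec from the patch. */
final class EntryNameDecodingSketch {

    /** Plain 8-bit conversion: every byte becomes exactly one char. */
    static String plain8Bit(byte[] raw) {
        StringBuilder sb = new StringBuilder(raw.length);
        for (byte b : raw) {
            sb.append((char) (b & 0xff));
        }
        return sb.toString();
    }

    /** Charset-aware decoding of the same bytes. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Name bytes as a tar writer using UTF-8 would store them.
        byte[] raw = "\u00e4.txt".getBytes(StandardCharsets.UTF_8);
        // The 8-bit conversion splits the two-byte umlaut into two chars.
        System.out.println(plain8Bit(raw).length());
        System.out.println(decode(raw, StandardCharsets.UTF_8).length());
    }
}
```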





[jira] [Commented] (COMPRESS-183) Support for de/encoding of tar entry names other than plain 8BIT conversion.

2012-03-17 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231866#comment-13231866
 ] 

Stefan Bodewig commented on COMPRESS-183:
-

The zip package already contains code that is similar to the codec in your 
patch, I'll look into reusing that.

Modern (POSIX) tars support non-ASCII encodings via PAX extension headers, 
which current trunk already supports on the reading side - it shouldn't be too 
hard for the writing side.





[jira] [Commented] (COMPRESS-182) Support big or even negative numbers in all numeric TAR headers

2012-03-05 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222998#comment-13222998
 ] 

Stefan Bodewig commented on COMPRESS-182:
-

Write support is in with svn revision 1297339

We need a new name for setBigFileMode - setBigNumberMode?

Other than that, I need to update the docs and cover the devMajor/devMinor 
headers for PAX as well before this issue can be closed.

 Support big or even negative numbers in all numeric TAR headers
 ---

 Key: COMPRESS-182
 URL: https://issues.apache.org/jira/browse/COMPRESS-182
 Project: Commons Compress
  Issue Type: Improvement
  Components: Archivers
Affects Versions: 1.3
Reporter: Stefan Bodewig
 Fix For: 1.4


 This is a superset of the functionality that addressed COMPRESS-175.
 Jörg Schilling's star and GNU tar may use binary encoding for all numeric 
 fields; PAX/POSIX also provides them inside the extension headers.
 The timestamp field may even contain negative numbers.
 IMHO Commons Compress should:
 * be able to parse numeric fields using binary encoding (positive and 
 negative)
 * fix the current binary parser (see discussion in COMPRESS-16) and add a 
 workaround for broken writers (see COMPRESS-181)
 * be able to parse all standard fields of PAX headers, including the numeric 
 ones (I haven't checked, maybe it already does)
 * have an option to write numbers too big/small in binary encoding much like 
 BIGFILE_STAR does for the file size in trunk
 * have an option to write numbers too big/small in PAX headers much like 
 BIGFILE_POSIX does for the file size in trunk
 * replace bigFileMode and the constants with a more generic property that 
 controls all numeric fields.  We can remove the bigFileMode stuff as it has 
 been added after the 1.3 release.





[jira] [Commented] (COMPRESS-182) Support big or even negative numbers in all numeric TAR headers

2012-03-03 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13221821#comment-13221821
 ] 

Stefan Bodewig commented on COMPRESS-182:
-

Read support should be complete with svn revision 1296764





[jira] [Commented] (COMPRESS-181) Tar files created by AIX native tar, and which contain symlinks, cannot be read by TarArchiveInputStream

2012-03-01 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219982#comment-13219982
 ] 

Stefan Bodewig commented on COMPRESS-181:
-

Robert, could you delete and re-add the attachment, granting the ASF a license 
to include it this time?  That way we could add the tar to our testsuite.

 Tar files created by AIX native tar, and which contain symlinks, cannot be 
 read by TarArchiveInputStream
 

 Key: COMPRESS-181
 URL: https://issues.apache.org/jira/browse/COMPRESS-181
 Project: Commons Compress
  Issue Type: Bug
  Components: Archivers
Affects Versions: 1.2, 1.3, 1.4
 Environment: AIX 5.3
Reporter: Robert Clark
 Attachments: simple-aix-native-tar.tar


 A simple tar file created on AIX using the native tar utility 
 ({{/usr/bin/tar}}) *and* which contains a symbolic link cannot be loaded by 
 TarArchiveInputStream:
 {noformat}
 java.io.IOException: Error detected parsing the header
   at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:201)
   at Extractor.extract(Extractor.java:13)
   at Extractor.main(Extractor.java:28)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.tools.ant.taskdefs.ExecuteJava.run(ExecuteJava.java:217)
   at org.apache.tools.ant.taskdefs.ExecuteJava.execute(ExecuteJava.java:152)
   at org.apache.tools.ant.taskdefs.Java.run(Java.java:771)
   at org.apache.tools.ant.taskdefs.Java.executeJava(Java.java:221)
   at org.apache.tools.ant.taskdefs.Java.executeJava(Java.java:135)
   at org.apache.tools.ant.taskdefs.Java.execute(Java.java:108)
   at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
   at org.apache.tools.ant.Task.perform(Task.java:348)
   at org.apache.tools.ant.Target.execute(Target.java:390)
   at org.apache.tools.ant.Target.performTasks(Target.java:411)
   at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399)
   at org.apache.tools.ant.Project.executeTarget(Project.java:1368)
   at org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
   at org.apache.tools.ant.Project.executeTargets(Project.java:1251)
   at org.apache.tools.ant.Main.runBuild(Main.java:809)
   at org.apache.tools.ant.Main.startAnt(Main.java:217)
   at org.apache.tools.ant.launch.Launcher.run(Launcher.java:280)
   at org.apache.tools.ant.launch.Launcher.main(Launcher.java:109)
 Caused by: java.lang.IllegalArgumentException: Invalid byte 0 at offset 0 in '{NUL}1722000726 ' len=12
   at org.apache.commons.compress.archivers.tar.TarUtils.parseOctal(TarUtils.java:99)
   at org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:819)
   at org.apache.commons.compress.archivers.tar.TarArchiveEntry.<init>(TarArchiveEntry.java:314)
   at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:199)
   ... 29 more
 {noformat}
 Tested with 1.2 and the 1.4 nightly build from Feb 23 
 ({{Implementation-Build: trunk@r1292625; 2012-02-23 03:20:30+}})





[jira] [Commented] (COMPRESS-181) Tar files created by AIX native tar, and which contain symlinks, cannot be read by TarArchiveInputStream

2012-02-28 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218073#comment-13218073
 ] 

Stefan Bodewig commented on COMPRESS-181:
-

GNU tar extracts it with a date/time of 1978-02-15 08:55 - which more or less 
looks as if it had translated the leading null to an ASCII 0 (and it looks as 
if that was supposed to be an ASCII 1 to match the timestamp of the dir).
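
As a worked check of that reading, the sketch below parses the digits after the NUL once with a leading ASCII 0 and once with a leading ASCII 1, and prints both as UTC instants.  The class name is hypothetical.  That the second value matches the directory entry is taken from the observation above and is not verified here.

```java
import java.time.Instant;

/** Hypothetical worked example for the mtime field '{NUL}1722000726 '. */
final class MtimeGuessSketch {

    /** Interprets an octal digit string as seconds since the epoch. */
    static Instant fromOctal(String octalDigits) {
        return Instant.ofEpochSecond(Long.parseLong(octalDigits, 8));
    }

    public static void main(String[] args) {
        // Leading NUL read as ASCII '0', which is what GNU tar appears to do.
        System.out.println(fromOctal("01722000726"));
        // Leading NUL replaced by ASCII '1', the presumed intended value.
        System.out.println(fromOctal("11722000726"));
    }
}
```

The first value falls on 1978-02-15 in UTC, the date GNU tar reported.  The second differs from it by exactly 8^10 seconds and lands in February 2012.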







[jira] [Commented] (COMPRESS-16) unable to extract a TAR file that contains an entry which is 10 GB in size

2012-02-28 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218088#comment-13218088
 ] 

Stefan Bodewig commented on COMPRESS-16:


Our code is wrong.

to_chars in src/create.c in GNU tar only uses the remaining bytes and sets the 
first one to 255 or 128 for negative/positive numbers.  Negative numbers only 
occur in time fields where we don't support anything non-octal ATM anyway, so 
this isn't a real problem right now.  It becomes one if we support star/GNU 
tar/POSIX dialects for the other numeric fields as well.  This would be 
required for COMPRESS-177.
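
A rough sketch of the encoding as described here (marker byte 128 or 255, value in the remaining bytes, big-endian) might look as follows.  It is written from the description in this comment, not copied from GNU tar or from Commons Compress, and it assumes the value fits into a Java long.  All names are hypothetical.

```java
/** Hypothetical sketch of a binary (base-256) tar numeric field. */
final class TarBinaryFieldSketch {

    /** Parses a field whose first byte is the 0x80 or 0xff marker. */
    static long parse(byte[] field) {
        int marker = field[0] & 0xff;
        if (marker != 0x80 && marker != 0xff) {
            throw new IllegalArgumentException("not a binary field: " + marker);
        }
        // Start from all-ones for negative numbers so the sign is extended.
        long value = (marker == 0xff) ? -1L : 0L;
        for (int i = 1; i < field.length; i++) {
            value = (value << 8) | (field[i] & 0xff);
        }
        return value;
    }

    /** Writes value big-endian into bytes 1..length-1 and sets the marker. */
    static byte[] format(long value, int length) {
        byte[] field = new byte[length];
        long rest = value;
        for (int i = length - 1; i >= 1; i--) {
            field[i] = (byte) rest;
            rest >>= 8; // arithmetic shift keeps the sign bits
        }
        field[0] = (byte) (value < 0 ? 0xff : 0x80);
        return field;
    }

    public static void main(String[] args) {
        long tenGb = 10L * 1024 * 1024 * 1024;
        System.out.println(parse(format(tenGb, 12)));
        System.out.println(parse(format(-1L, 12)));
    }
}
```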

I suggest broadening and reopening COMPRESS-177 to something like "extend 
STAR/POSIX support to all numeric fields" or, alternatively, creating a new 
issue and closing this one again.

 unable to extract a TAR file that contains an entry which is 10 GB in size
 --

 Key: COMPRESS-16
 URL: https://issues.apache.org/jira/browse/COMPRESS-16
 Project: Commons Compress
  Issue Type: Bug
  Components: Archivers
 Environment: I am using win xp sp3, but this should be platform 
 independent.
Reporter: Sam Smith
 Fix For: 1.4

 Attachments: 
 0001-Accept-GNU-tar-files-with-entries-over-8GB-in-size.patch, 
 0002-Allow-creating-tar-archives-with-files-over-8GB.patch, 
 0004-Prefer-octal-over-binary-size-representation.patch, ant-8GB-tar.patch, 
 patch-for-compress.txt


 I made a TAR file which contains a file entry where the file is 10 GB in size.
 When I attempt to extract the file using TarInputStream, it fails with the 
 following stack trace:
   java.io.IOException: unexpected EOF with 24064 bytes unread
   at org.apache.commons.compress.archivers.tar.TarInputStream.read(TarInputStream.java:348)
   at org.apache.commons.compress.archivers.tar.TarInputStream.copyEntryContents(TarInputStream.java:388)
 So, TarInputStream does not seem to support large (> 8 GB?) files.
 Here is something else to note: I created that TAR file using TarOutputStream, 
 which did not complain when asked to write a 10 GB file into the TAR file, 
 so I assume that TarOutputStream has no file size limits?  That, or does it 
 silently create corrupted TAR files (which would be the worst situation of 
 all...)?
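
 The 8 GB guess matches the arithmetic of the classic tar header: the size field is 12 bytes wide, which leaves 11 octal digits plus a terminator.  A small sketch of that limit, using hypothetical names:

```java
/** Hypothetical illustration of the octal size limit in a tar header. */
final class OctalLimitSketch {

    /** Largest value representable with the given number of octal digits (up to 20). */
    static long maxOctalValue(int digits) {
        return (1L << (3 * digits)) - 1; // each octal digit holds 3 bits
    }

    /** True if a non-negative value can be written with that many octal digits. */
    static boolean fitsInOctal(long value, int digits) {
        return value >= 0 && value <= maxOctalValue(digits);
    }

    public static void main(String[] args) {
        long tenGb = 10L * 1024 * 1024 * 1024;
        System.out.println(maxOctalValue(11));
        System.out.println(fitsInOctal(tenGb, 11));
    }
}
```

 With 11 digits the largest size is one byte short of 8 GiB, so a 10 GB entry cannot be stored in plain octal form.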





[jira] [Commented] (COMPRESS-181) Tar files created by AIX native tar, and which contain symlinks, cannot be read by TarArchiveInputStream

2012-02-28 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218093#comment-13218093
 ] 

Stefan Bodewig commented on COMPRESS-181:
-

GNU tar from_header in list.c contains a workaround for this case:

  /* Accommodate buggy tar of unknown vintage, which outputs leading
 NUL if the previous field overflows.  */
  where += !*where;

this basically skips the first byte if it is a binary 0.
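
A Java equivalent of that workaround could look roughly like the sketch below.  It is not the actual TarUtils.parseOctal; the class name, the tolerance for leading spaces and the error handling are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical lenient octal parser, not the Commons Compress code. */
final class LenientOctalSketch {

    static long parseOctal(byte[] buf, int offset, int length) {
        int start = offset;
        int end = offset + length;
        // Mirror GNU tar's "where += !*where": skip a single leading NUL.
        if (start < end && buf[start] == 0) {
            start++;
        }
        // Skip leading spaces.
        while (start < end && buf[start] == ' ') {
            start++;
        }
        long result = 0;
        for (int i = start; i < end; i++) {
            byte b = buf[i];
            if (b == 0 || b == ' ') {
                break; // a trailing NUL or space terminates the number
            }
            if (b < '0' || b > '7') {
                throw new IllegalArgumentException(
                    "Invalid byte " + b + " at offset " + (i - offset));
            }
            result = (result << 3) + (b - '0');
        }
        return result;
    }

    public static void main(String[] args) {
        // The 12-byte field from the stack trace: NUL, ten digits, space.
        byte[] field = ("\0" + "1722000726 ").getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseOctal(field, 0, field.length));
    }
}
```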





[jira] [Commented] (COMPRESS-181) Tar files created by AIX native tar, and which contain symlinks, cannot be read by TarArchiveInputStream

2012-02-28 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218198#comment-13218198
 ] 

Stefan Bodewig commented on COMPRESS-181:
-

It doesn't look like an overflow was the reason but if you look at the 
timestamp it certainly reads as if the first byte was a binary 0 by accident 
(if you put an ASCII 1 in there it is identical to the timestamp of the 
directory).

In any case the resulting timestamp is not what it used to be, so using any 
other timestamp would be as valid as trying to parse the rest.





[jira] [Commented] (COMPRESS-181) Tar files created by AIX native tar, and which contain symlinks, cannot be read by TarArchiveInputStream

2012-02-28 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13218245#comment-13218245
 ] 

Stefan Bodewig commented on COMPRESS-181:
-

We don't really have an option to ignore a timestamp unless we allow 
ArchiveEntry#getLastModifiedDate to return null.

What I was trying to say is it doesn't matter much which timestamp we return as 
any choice is wrong.  Returning the equivalent of a 0 timestamp is fine with 
me.  Unfortunately we don't have an infrastructure for warnings (would have 
been good for COMPRESS-176 as well), something for an API redesign in 2.0, I 
guess.
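As a rough illustration of "returning the equivalent of a 0 timestamp", the sketch below parses an octal tar header field and yields 0 when the field starts with a NUL byte, as in the AIX header from the stack trace.  The class and method names are made up for this example and are not the TarUtils API; treating a leading NUL as "no usable value" is an assumption.

```java
import java.nio.charset.StandardCharsets;

public class LenientOctal {
    /**
     * Hypothetical lenient parser for an octal tar header field.
     * Assumption: a field whose first byte is NUL carries no usable
     * value, so 0 is returned instead of throwing.
     */
    public static long parseOctalLenient(byte[] buf, int offset, int length) {
        if (length < 1 || buf[offset] == 0) {
            return 0L;
        }
        int start = offset;
        int end = offset + length;
        // Skip leading spaces.
        while (start < end && buf[start] == ' ') {
            start++;
        }
        // Trim trailing NUL or space terminators.
        while (end > start && (buf[end - 1] == 0 || buf[end - 1] == ' ')) {
            end--;
        }
        long result = 0;
        for (int i = start; i < end; i++) {
            byte b = buf[i];
            if (b < '0' || b > '7') {
                throw new IllegalArgumentException(
                    "Invalid byte " + b + " at offset " + (i - offset));
            }
            result = (result << 3) + (b - '0');
        }
        return result;
    }

    public static void main(String[] args) {
        // 12-byte field shaped like the one in the report: NUL, ten digits, space.
        byte[] aix = new byte[12];
        byte[] digits = "1722000726 ".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(digits, 0, aix, 1, digits.length);
        System.out.println(parseOctalLenient(aix, 0, aix.length)); // prints 0

        byte[] regular = "01722000726 ".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseOctalLenient(regular, 0, regular.length));
    }
}
```

A caller would then map the 0 to the epoch when building the last-modified date, which is wrong but no more wrong than any other choice.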

 Tar files created by AIX native tar, and which contain symlinks, cannot be 
 read by TarArchiveInputStream
 

 Key: COMPRESS-181
 URL: https://issues.apache.org/jira/browse/COMPRESS-181
 Project: Commons Compress
  Issue Type: Bug
  Components: Archivers
Affects Versions: 1.2, 1.3, 1.4
 Environment: AIX 5.3
Reporter: Robert Clark
 Attachments: simple-aix-native-tar.tar


 A simple tar file created on AIX using the native tar utility 
 ({{/usr/bin/tar}}) *and* which contains a symbolic link cannot be loaded by 
 TarArchiveInputStream:
 {noformat}
 java.io.IOException: Error detected parsing the header
   at 
 org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:201)
   at Extractor.extract(Extractor.java:13)
   at Extractor.main(Extractor.java:28)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.tools.ant.taskdefs.ExecuteJava.run(ExecuteJava.java:217)
   at 
 org.apache.tools.ant.taskdefs.ExecuteJava.execute(ExecuteJava.java:152)
   at org.apache.tools.ant.taskdefs.Java.run(Java.java:771)
   at org.apache.tools.ant.taskdefs.Java.executeJava(Java.java:221)
   at org.apache.tools.ant.taskdefs.Java.executeJava(Java.java:135)
   at org.apache.tools.ant.taskdefs.Java.execute(Java.java:108)
   at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
   at org.apache.tools.ant.Task.perform(Task.java:348)
   at org.apache.tools.ant.Target.execute(Target.java:390)
   at org.apache.tools.ant.Target.performTasks(Target.java:411)
   at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399)
   at org.apache.tools.ant.Project.executeTarget(Project.java:1368)
   at 
 org.apache.tools.ant.helper.DefaultExecutor.executeTargets(DefaultExecutor.java:41)
   at org.apache.tools.ant.Project.executeTargets(Project.java:1251)
   at org.apache.tools.ant.Main.runBuild(Main.java:809)
   at org.apache.tools.ant.Main.startAnt(Main.java:217)
   at org.apache.tools.ant.launch.Launcher.run(Launcher.java:280)
   at org.apache.tools.ant.launch.Launcher.main(Launcher.java:109)
 Caused by: java.lang.IllegalArgumentException: Invalid byte 0 at offset 0 in 
 '{NUL}1722000726 ' len=12
   at 
 org.apache.commons.compress.archivers.tar.TarUtils.parseOctal(TarUtils.java:99)
   at 
 org.apache.commons.compress.archivers.tar.TarArchiveEntry.parseTarHeader(TarArchiveEntry.java:819)
   at 
 org.apache.commons.compress.archivers.tar.TarArchiveEntry.init(TarArchiveEntry.java:314)
   at 
 org.apache.commons.compress.archivers.tar.TarArchiveInputStream.getNextTarEntry(TarArchiveInputStream.java:199)
   ... 29 more
 {noformat}
 Tested with 1.2 and the 1.4 nightly build from Feb 23 
 ({{Implementation-Build: trunk@r1292625; 2012-02-23 03:20:30+}})





[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

2012-02-27 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217122#comment-13217122
 ] 

Stefan Bodewig commented on COMPRESS-176:
-

Whether we need forward slashes in Unicode extra fields can only be answered by 
somebody using WinZIP.  The best would be creating a test archive with a 
directory that contains a character in its name that is not part of CP437 - and 
to be safe not part of the platform's default encoding either.

 ArchiveInputStream#getNextEntry(): Problems with WinZip directories with 
 Umlauts
 

 Key: COMPRESS-176
 URL: https://issues.apache.org/jira/browse/COMPRESS-176
 Project: Commons Compress
  Issue Type: Bug
  Components: Archivers
Affects Versions: 1.3
 Environment: Windows 7
Reporter: Wurstbrot mit Senf
Assignee: Stefan Bodewig
 Fix For: 1.4

 Attachments: test-7zip.zip, test-windows.zip, test-winzip.zip, 
 testzap-winzip.zip


 There is a problem when handling a WinZip-created zip with Umlauts in 
 directories.
 I'm accessing a zip file created with WinZip containing a directory with an 
 umlaut (ä) with ArchiveInputStream. When creating the zip file the 
 unicode-flag of winzip had been active.
 The following problem occurs when accessing the entries of the zip:
 the ArchiveEntry for a directory containing an umlaut is not marked as a 
 directory and the file names for the directory and all files contained in 
 that directory contain backslashes instead of slashes (i.e. completely 
 different to all other files in directories with no umlaut in their path).
 There is no difference when letting the ArchiveStreamFactory decide which 
 ArchiveInputStream to create or when using the ZipArchiveInputStream 
 constructor with the correct encoding (I've tried different encodings CP437, 
 CP850, ISO-8859-15, but still the problem persisted).
 This problem does not occur when using the very same zip file but compressed 
 by 7zip or the built-in Windows 7 zip functionality.





[jira] [Commented] (COMPRESS-16) unable to extract a TAR file that contains an entry which is 10 GB in size

2012-02-27 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217123#comment-13217123
 ] 

Stefan Bodewig commented on COMPRESS-16:


I plan to look up what the GNU tar implementation does, may take a few days, 
though.

I agree with Gili this issue has by now outgrown COMPRESS-16 as it specifically 
only dealt with lengths.  OTOH it is not restricted to star either, if we start 
supporting bigger numbers for group or date, we should support star as well 
as PAX.

 unable to extract a TAR file that contains an entry which is 10 GB in size
 --

 Key: COMPRESS-16
 URL: https://issues.apache.org/jira/browse/COMPRESS-16
 Project: Commons Compress
  Issue Type: Bug
  Components: Archivers
 Environment: I am using win xp sp3, but this should be platform 
 independent.
Reporter: Sam Smith
 Fix For: 1.4

 Attachments: 
 0001-Accept-GNU-tar-files-with-entries-over-8GB-in-size.patch, 
 0002-Allow-creating-tar-archives-with-files-over-8GB.patch, 
 0004-Prefer-octal-over-binary-size-representation.patch, ant-8GB-tar.patch, 
 patch-for-compress.txt


 I made a TAR file which contains a file entry where the file is 10 GB in size.
 When I attempt to extract the file using TarInputStream, it fails with the 
 following stack trace:
   java.io.IOException: unexpected EOF with 24064 bytes unread
   at 
 org.apache.commons.compress.archivers.tar.TarInputStream.read(TarInputStream.java:348)
   at 
 org.apache.commons.compress.archivers.tar.TarInputStream.copyEntryContents(TarInputStream.java:388)
 So, TarInputStream does not seem to support large (> 8 GB?) files.
 Here is something else to note: I created that TAR file using TarOutputStream 
 , which did not complain when asked to write a 10 GB file into the TAR file, 
 so I assume that TarOutputStream has no file size limits?  That, or does it 
 silently create corrupted TAR files (which would be the worst situation of 
 all...)?





[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

2012-02-27 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13217899#comment-13217899
 ] 

Stefan Bodewig commented on COMPRESS-176:
-

Workaround and tests are in svn revision 1294460

I'll look into creating a test archive for the opposite direction today.

 ArchiveInputStream#getNextEntry(): Problems with WinZip directories with 
 Umlauts
 

 Key: COMPRESS-176
 URL: https://issues.apache.org/jira/browse/COMPRESS-176
 Project: Commons Compress
  Issue Type: Bug
  Components: Archivers
Affects Versions: 1.3
 Environment: Windows 7
Reporter: Wurstbrot mit Senf
Assignee: Stefan Bodewig
 Fix For: 1.4

 Attachments: test-7zip.zip, test-windows.zip, test-winzip.zip, 
 testzap-winzip.zip







[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

2012-02-25 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216446#comment-13216446
 ] 

Stefan Bodewig commented on COMPRESS-176:
-

AFAIK what we have written down based on findings by Wolfgang Glas in 
http://commons.apache.org/compress/zip.html still stands: WinZIP is the only 
one using Unicode extra fields, all other implementations have switched to the 
language encoding flag.  The only exceptions are Windows compressed folders - 
which don't understand either - and InfoZIP based tools if they are compiled 
to use the extra fields.

A question to the original reporter (I'm German so I know the name's a fake 
8-): since you also have an installation of 7zip, what does 7zip think of your 
WinZIP created archive?

 ArchiveInputStream#getNextEntry(): Problems with WinZip directories with 
 Umlauts
 

 Key: COMPRESS-176
 URL: https://issues.apache.org/jira/browse/COMPRESS-176
 Project: Commons Compress
  Issue Type: Bug
  Components: Archivers
Affects Versions: 1.3
 Environment: Windows 7
Reporter: Wurstbrot mit Senf
 Attachments: test-7zip.zip, test-windows.zip, test-winzip.zip







[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

2012-02-25 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216647#comment-13216647
 ] 

Stefan Bodewig commented on COMPRESS-176:
-

OK, this means nobody except for Commons Compress and InfoZIP tools seems to 
read the Unicode extra field.

This is what I get when trying to extract the original ZIP on Linux:

{noformat}
stefan@birdy:~/Desktop$ unzip test-winzip.zip 
Archive:  test-winzip.zip
  inflating: doc.txt.gz  
 extracting: doc2.txt
warning:  test-winzip.zip appears to use backslashes as path separators
   creating: ??/
  inflating: ??/??zip.zip
 extracting: ??/??.txt  
{noformat}

and it creates an ä directory.  I'll try to look through InfoZIP's sources to 
see what it bases its heuristics on, maybe we can use the same in Commons 
Compress to turn backslashes into slashes.


 ArchiveInputStream#getNextEntry(): Problems with WinZip directories with 
 Umlauts
 

 Key: COMPRESS-176
 URL: https://issues.apache.org/jira/browse/COMPRESS-176
 Project: Commons Compress
  Issue Type: Bug
  Components: Archivers
Affects Versions: 1.3
 Environment: Windows 7
Reporter: Wurstbrot mit Senf
 Attachments: test-7zip.zip, test-windows.zip, test-winzip.zip, 
 testzap-winzip.zip







[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

2012-02-25 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13216652#comment-13216652
 ] 

Stefan Bodewig commented on COMPRESS-176:
-

In extract.c of unzip60 line 1310ff there is this code that replaces 
backslashes with slashes.  It only replaces them in names that don't contain 
forward slashes (MBSCHR looks up a character in a character array) and only if 
hostnum indicates a FAT system.

{noformat}
    /* for files from DOS FAT, check for use of backslash instead
     *  of slash as directory separator (bug in some zipper(s); so
     *  far, not a problem in HPFS, NTFS or VFAT systems)
     */
#ifndef SFX
    if (G.pInfo->hostnum == FS_FAT_ && !MBSCHR(G.filename, '/')) {
        char *p=G.filename;

        if (*p) do {
            if (*p == '\\') {
                if (!G.reported_backslash) {
                    Info(slide, 0x21, ((char *)slide,
                      LoadFarString(BackslashPathSep), G.zipfn));
                    G.reported_backslash = TRUE;
                    if (!error_in_archive)
                        error_in_archive = PK_WARN;
                }
                *p = '/';
            }
        } while (*PREINCSTR(p));
    }
#endif /* !SFX */
{noformat}

hostnum is the upper byte of the "version made by" field inside the central 
directory header - this is ZipArchiveEntry's get/setPlatform - and FS_FAT_ is 0 
(ZipArchiveEntry#PLATFORM_FAT).  We have all the pieces needed to emulate this.
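The heuristic described above could be emulated roughly as follows.  This is only a sketch: the class and method names are hypothetical, and the platform value is passed in as a plain int rather than read from a ZipArchiveEntry.

```java
public class BackslashFix {
    /** Platform value for FAT; the comment above gives 0 for it. */
    public static final int PLATFORM_FAT = 0;

    /**
     * Hypothetical helper mirroring the unzip check: backslashes are
     * turned into slashes only for entries from a FAT host whose name
     * contains no forward slash at all.
     */
    public static String normalizeSeparators(String name, int platform) {
        if (platform == PLATFORM_FAT && name.indexOf('/') < 0) {
            return name.replace('\\', '/');
        }
        return name;
    }

    public static void main(String[] args) {
        String winzipName = "\u00e4" + "\\" + "\u00e4zip.zip";
        // FAT host, no forward slash: separators are rewritten.
        System.out.println(normalizeSeparators(winzipName, PLATFORM_FAT));
        // Any other platform value (3 is used here only as "not FAT"): unchanged.
        System.out.println(normalizeSeparators(winzipName, 3));
        // A name that already has a forward slash is left alone.
        System.out.println(normalizeSeparators("dir/a\\b.txt", PLATFORM_FAT));
    }
}
```

Like the C code, this leaves names alone as soon as they contain a single forward slash, so a legitimate backslash inside a Unix-style path is not touched.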

 ArchiveInputStream#getNextEntry(): Problems with WinZip directories with 
 Umlauts
 

 Key: COMPRESS-176
 URL: https://issues.apache.org/jira/browse/COMPRESS-176
 Project: Commons Compress
  Issue Type: Bug
  Components: Archivers
Affects Versions: 1.3
 Environment: Windows 7
Reporter: Wurstbrot mit Senf
 Attachments: test-7zip.zip, test-windows.zip, test-winzip.zip, 
 testzap-winzip.zip







[jira] [Commented] (COMPRESS-176) ArchiveInputStream#getNextEntry(): Problems with WinZip directories with Umlauts

2012-02-24 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215704#comment-13215704
 ] 

Stefan Bodewig commented on COMPRESS-176:
-

This is what InfoZIP's zip on Linux says:

{noformat}
stefanb@brick:~$ zip -Tv Desktop/test-winzip.zip 
Archive:  Desktop/test-winzip.zip
testing: doc.txt.gz   OK
testing: doc2.txt OK
testing: ??\  OK
testing: ??\??zip.zip OK
testing: ??\??.txtOK
No errors detected in compressed data of Desktop/test-winzip.zip.
test of Desktop/test-winzip.zip OK
{noformat}

The entry for the directory contains a Unicode extra field with 0xc3 0xa4 0x5c 
as UTF-8 encoded name.  This actually is ä\.

Since directory names in ZIP archives must end with /, Compress doesn't detect 
this as a directory.  It may be possible to create a workaround along the lines 
of "if the plain name ends with a / and the unicode name uses a \ then bend 
it", but I can't say I'd like that.
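To make the byte sequence concrete, the following sketch decodes the three bytes quoted above and applies the "ends with /" test.  The class name is invented for the example, and the directory check is a simplification rather than the library's code.

```java
import java.nio.charset.StandardCharsets;

public class UnicodeNameBytes {
    /** Decodes the raw name bytes of a Unicode extra field as UTF-8. */
    public static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    /** Simplified directory test: the name has to end with a forward slash. */
    public static boolean looksLikeDirectory(String name) {
        return name.endsWith("/");
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xc3, (byte) 0xa4, 0x5c};
        String name = decode(raw);
        System.out.println(name.length());            // 2: a-umlaut plus backslash
        System.out.println(name.endsWith("\\"));      // true
        System.out.println(looksLikeDirectory(name)); // false
    }
}
```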

Java6 likely works because it doesn't have any idea about unicode extra fields 
and simply uses the plain name.  You'd get the same behavior from 
ZipArchiveInputStream by setting useUnicodeExtraFields to false in the 
constructor.

 ArchiveInputStream#getNextEntry(): Problems with WinZip directories with 
 Umlauts
 

 Key: COMPRESS-176
 URL: https://issues.apache.org/jira/browse/COMPRESS-176
 Project: Commons Compress
  Issue Type: Bug
  Components: Archivers
Affects Versions: 1.3
 Environment: Windows 7
Reporter: Wurstbrot mit Senf
 Attachments: test-7zip.zip, test-windows.zip, test-winzip.zip







[jira] [Commented] (COMPRESS-16) unable to extract a TAR file that contains an entry which is 10 GB in size

2012-02-21 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212741#comment-13212741
 ] 

Stefan Bodewig commented on COMPRESS-16:


Sorry, I saw COMPRESS-177 but am too busy to look into it right now.

The test in TarUtils only checks whether the most significant bit is set, it 
doesn't check the actual value of the first byte so 0xFF should be detected as 
well.  I haven't checked whether the end result is a negative number, though.
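The kind of check meant here can be sketched as below.  This is not the TarUtils code; the names are hypothetical, and the decoder only handles the positive case (marker byte 0x80) with values that fit into a long.  A first byte of 0xFF passes the detection, but what it should decode to is exactly the open question above and is not covered.

```java
public class TarNumber {
    /** True if the most significant bit of the field's first byte is set. */
    public static boolean isBinary(byte[] field, int offset) {
        return (field[offset] & 0x80) != 0;
    }

    /**
     * Hypothetical decoder for a positive base-256 number: the marker bit
     * of the first byte is masked out, the remaining bytes are big-endian.
     * Assumption: the value fits into 63 bits.
     */
    public static long parseBinary(byte[] field, int offset, int length) {
        long value = field[offset] & 0x7f;
        for (int i = 1; i < length; i++) {
            value = (value << 8) | (field[offset + i] & 0xff);
        }
        return value;
    }

    public static void main(String[] args) {
        // Build a 12-byte size field holding 10 GiB in base-256.
        byte[] field = new byte[12];
        long size = 10L * 1024 * 1024 * 1024;
        for (int i = 11; i >= 1; i--) {
            field[i] = (byte) size;
            size >>>= 8;
        }
        field[0] = (byte) 0x80;

        System.out.println(isBinary(field, 0));                  // true
        System.out.println(parseBinary(field, 0, field.length)); // 10737418240
        System.out.println(isBinary(new byte[] {(byte) 0xff}, 0)); // true as well
        System.out.println(isBinary(new byte[] {'0'}, 0));         // false: octal digit
    }
}
```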

 unable to extract a TAR file that contains an entry which is 10 GB in size
 --

 Key: COMPRESS-16
 URL: https://issues.apache.org/jira/browse/COMPRESS-16
 Project: Commons Compress
  Issue Type: Bug
  Components: Archivers
 Environment: I am using win xp sp3, but this should be platform 
 independent.
Reporter: Sam Smith
 Fix For: 1.4

 Attachments: 
 0001-Accept-GNU-tar-files-with-entries-over-8GB-in-size.patch, 
 0002-Allow-creating-tar-archives-with-files-over-8GB.patch, 
 0004-Prefer-octal-over-binary-size-representation.patch, ant-8GB-tar.patch, 
 patch-for-compress.txt







[jira] [Commented] (COMPRESS-168) getName of ZipArchiveEntry

2011-12-22 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174900#comment-13174900
 ] 

Stefan Bodewig commented on COMPRESS-168:
-

Any news?

 getName of ZipArchiveEntry
 --

 Key: COMPRESS-168
 URL: https://issues.apache.org/jira/browse/COMPRESS-168
 Project: Commons Compress
  Issue Type: Test
  Components: Archivers
Affects Versions: 1.2
 Environment: J2EE Environment with jdk 1.4
Reporter: Pavithra Kumar
 Attachments: TestZip.zip


 getName method of ZipArchiveEntry is not giving arabic file names. Instead of 
 that it gives some chunked characters.





[jira] [Commented] (COMPRESS-168) getName of ZipArchiveEntry

2011-12-14 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13169412#comment-13169412
 ] 

Stefan Bodewig commented on COMPRESS-168:
-

Windows compressed folders facility seems to use the platform's native encoding 
when creating ZIPs.
See http://commons.apache.org/compress/zip.html#encoding

Do you get correct file names if you use something like

new ZipFile(zipname, ENCODING)

where ENCODING is whatever Java calls your platform's native encoding.  I don't 
have any idea what that would be and wouldn't recognize correct Arabic 
characters either, so I didn't try it myself on your test archive.  One list I 
know of is 
http://docs.oracle.com/javase/1.5.0/docs/guide/intl/encoding.doc.html and 
ISO8859_6 or Cp1256 don't sound bad (but you probably know better than me).

 getName of ZipArchiveEntry
 --

 Key: COMPRESS-168
 URL: https://issues.apache.org/jira/browse/COMPRESS-168
 Project: Commons Compress
  Issue Type: Test
  Components: Archivers
Affects Versions: 1.2
 Environment: J2EE Environment with jdk 1.4
Reporter: Pavithra Kumar
 Attachments: TestZip.zip







[jira] [Commented] (COMPRESS-168) getName of ZipArchiveEntry

2011-12-12 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13167454#comment-13167454
 ] 

Stefan Bodewig commented on COMPRESS-168:
-

Where does the ZipArchiveEntry come from (do you create it yourself or read it 
from a ZipFile or a ZipArchiveInputStream)?

If you have read it from somewhere, is there any chance you could provide a 
small sample archive that doesn't work for you?


 getName of ZipArchiveEntry
 --

 Key: COMPRESS-168
 URL: https://issues.apache.org/jira/browse/COMPRESS-168
 Project: Commons Compress
  Issue Type: Test
  Components: Archivers
Affects Versions: 1.2
 Environment: J2EE Environment with jdk 1.4
Reporter: Pavithra Kumar






[jira] [Commented] (COMPRESS-165) Support writing entries > 8GiB in tar

2011-12-08 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165219#comment-13165219
 ] 

Stefan Bodewig commented on COMPRESS-165:
-

The star/GNU/BSD version has been added with svn revision 1211892

 Support writing entries > 8GiB in tar
 -

 Key: COMPRESS-165
 URL: https://issues.apache.org/jira/browse/COMPRESS-165
 Project: Commons Compress
  Issue Type: New Feature
  Components: Archivers
Affects Versions: 1.3
Reporter: Stefan Bodewig

 We already parse PAX headers and the star/GNU tar/BSD tar dialects used for 
 big entries.
 Similar to the way we handle long file names there should be a user option to 
 choose between error, star and posix (pax).  star is Jörg Schilling's tar, 
 which was the first one to use the binary size representation later adopted 
 by GNU and BSD tar as well.
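A possible shape for such an option is sketched below.  The enum, the method names and the string return values are hypothetical and only illustrate the three choices; the one firm detail is that an 11-digit octal field tops out at 8^11 - 1 bytes and that a PAX record's leading length counts its own digits.

```java
import java.nio.charset.StandardCharsets;

public class BigNumberSketch {
    /** Hypothetical names for the three proposed behaviours. */
    public enum Mode { ERROR, STAR, POSIX }

    /** Largest value an 11-digit octal size field can hold: 8^11 - 1. */
    public static final long MAX_OCTAL = 077777777777L;

    /**
     * Builds one PAX extended header record, "LEN key=value\n", where LEN
     * is the length of the whole record in bytes including LEN itself.
     */
    public static String paxRecord(String key, String value) {
        int len = key.length() + value.length() + 3; // space, '=', newline
        String line = len + " " + key + "=" + value + "\n";
        int actual = line.getBytes(StandardCharsets.UTF_8).length;
        // Adding the digits of LEN may change LEN; repeat until stable.
        while (len != actual) {
            len = actual;
            line = len + " " + key + "=" + value + "\n";
            actual = line.getBytes(StandardCharsets.UTF_8).length;
        }
        return line;
    }

    /** Describes how a size would be written under the given mode. */
    public static String chooseRepresentation(long size, Mode mode) {
        if (size <= MAX_OCTAL) {
            return "octal";
        }
        switch (mode) {
            case STAR:
                return "binary";
            case POSIX:
                return "pax:" + paxRecord("size", Long.toString(size));
            default:
                throw new IllegalArgumentException(
                    "size " + size + " exceeds " + MAX_OCTAL);
        }
    }

    public static void main(String[] args) {
        long tenGiB = 10L * 1024 * 1024 * 1024;
        System.out.println(chooseRepresentation(MAX_OCTAL, Mode.ERROR)); // octal
        System.out.println(chooseRepresentation(tenGiB, Mode.STAR));     // binary
        System.out.print(paxRecord("size", Long.toString(tenGiB)));      // 20 size=10737418240
    }
}
```

Defaulting to the error behaviour would mirror the long file name handling, where the caller has to opt in to a non-portable extension.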





[jira] [Commented] (COMPRESS-163) Unable to extract a file larger than 8GB from a Posix-format tar archive

2011-12-08 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165222#comment-13165222
 ] 

Stefan Bodewig commented on COMPRESS-163:
-

After applying John's other patches from COMPRESS-16 with 1211892 for 
COMPRESS-165 it now is the output stream that throws if a big file is added 
(and star extensions haven't been enabled).  So I've removed the adjustSize 
again and changed setSize as this patch suggested.

 Unable to extract a file larger than 8GB from a Posix-format tar archive
 

 Key: COMPRESS-163
 URL: https://issues.apache.org/jira/browse/COMPRESS-163
 Project: Commons Compress
  Issue Type: Bug
  Components: Compressors
Affects Versions: 1.3
 Environment: The tar archive used for testing was created by GNU tar, 
 but the problem will occur with any Posix-formatted tar file containing files 
 over 8GB in size.
Reporter: John Kodis
Priority: Minor
 Fix For: 1.4

 Attachments: 
 0003-Allow-reading-large-files-from-Posix-tar-archives.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 An attempt to read a posix-format tar archive containing a file in excess of 
 8^11 bytes in size will fail with a Size out of range illegal argument 
 exception.





[jira] [Commented] (COMPRESS-165) Support writing entries > 8GiB in tar

2011-12-08 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165240#comment-13165240
 ] 

Stefan Bodewig commented on COMPRESS-165:
-

PAX headers are in with svn revision 1211931 - only need to add docs to resolve 
this.

 Support writing entries > 8GiB in tar
 -

 Key: COMPRESS-165
 URL: https://issues.apache.org/jira/browse/COMPRESS-165
 Project: Commons Compress
  Issue Type: New Feature
  Components: Archivers
Affects Versions: 1.3
Reporter: Stefan Bodewig

 We already parse PAX headers and the star/GNU tar/BSD tar dialects used for
 big entries.
 Similar to the way we handle long file names there should be a user option to
 choose between error, star and posix or pax.  star is Jörg Schilling's tar,
 which was the first one to use the binary size representation later adopted
 by GNU and BSD tar as well.





[jira] [Commented] (COMPRESS-166) Support POSIX/Pax variant for long file names in tar

2011-12-08 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13165297#comment-13165297
 ] 

Stefan Bodewig commented on COMPRESS-166:
-

Code and tests are in with svn revision 1211943 - just needs docs.

 Support POSIX/Pax variant for long file names in tar
 

 Key: COMPRESS-166
 URL: https://issues.apache.org/jira/browse/COMPRESS-166
 Project: Commons Compress
  Issue Type: Improvement
  Components: Archivers
Affects Versions: 1.3
Reporter: Stefan Bodewig

 Once we add support for writing Pax headers for COMPRESS-165 it will be 
 pretty easy to support the same headers for long file names as well.
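
For reference, a pax extended header is a sequence of records of the form "<length> <key>=<value>\n", where the decimal length counts the whole record including its own digits. The sketch below formats one such record; the class name PaxRecord is hypothetical and this is not the code that went into Commons Compress. The keys "path" (long file names) and "size" (big entries) are the ones relevant to COMPRESS-166 and COMPRESS-165.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not the Commons Compress API. */
final class PaxRecord {
    /** Builds one "<length> <key>=<value>\n" record; length is the record size in bytes. */
    static String format(String key, String value) {
        String body = " " + key + "=" + value + "\n";
        int bodyLength = body.getBytes(StandardCharsets.UTF_8).length;
        int total = bodyLength;
        // The length field counts its own digits, so iterate until it is stable.
        while (true) {
            int candidate = bodyLength + Integer.toString(total).length();
            if (candidate == total) {
                break;
            }
            total = candidate;
        }
        return total + body;
    }

    public static void main(String[] args) {
        System.out.print(format("path", "some/very/long/file/name.txt"));
        System.out.print(format("size", Long.toString(10L * 1024 * 1024 * 1024)));
    }
}
```

The loop matters when adding the digits pushes the total across a power of ten, for example a 9-byte body that ends up as an 11-byte record.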





[jira] [Commented] (COMPRESS-165) Support writing entries > 8GiB in tar

2011-12-07 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164339#comment-13164339
 ] 

Stefan Bodewig commented on COMPRESS-165:
-

COMPRESS-16 contains patches that enable the star method of writing such 
entries.

 Support writing entries > 8GiB in tar
 -

 Key: COMPRESS-165
 URL: https://issues.apache.org/jira/browse/COMPRESS-165
 Project: Commons Compress
  Issue Type: New Feature
  Components: Archivers
Affects Versions: 1.3
Reporter: Stefan Bodewig

 We already parse PAX headers and the star/GNU tar/BSD tar dialects used for
 big entries.
 Similar to the way we handle long file names there should be a user option to
 choose between error, star and posix or pax.  star is Jörg Schilling's tar,
 which was the first one to use the binary size representation later adopted
 by GNU and BSD tar as well.





[jira] [Commented] (COMPRESS-163) Unable to extract a file larger than 8GB from a Posix-format tar archive

2011-12-06 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163640#comment-13163640
 ] 

Stefan Bodewig commented on COMPRESS-163:
-

I may be missing it, but is the code that actually reads the PAX header and 
applies the size read available as well?


 Unable to extract a file larger than 8GB from a Posix-format tar archive
 

 Key: COMPRESS-163
 URL: https://issues.apache.org/jira/browse/COMPRESS-163
 Project: Commons Compress
  Issue Type: Bug
  Components: Compressors
Affects Versions: 1.3
 Environment: The tar archive used for testing was created by GNU tar, 
 but the problem will occur with any Posix-formatted tar file containing files 
 over 8GB in size.
Reporter: John Kodis
Priority: Minor
 Fix For: 1.4

 Attachments: 
 0003-Allow-reading-large-files-from-Posix-tar-archives.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 An attempt to read a posix-format tar archive containing a file in excess of 
 8^11 bytes in size will fail with a Size out of range illegal argument 
 exception.





[jira] [Commented] (COMPRESS-163) Unable to extract a file larger than 8GB from a Posix-format tar archive

2011-12-06 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163814#comment-13163814
 ] 

Stefan Bodewig commented on COMPRESS-163:
-

Sorry, I somehow missed the fact that we already had code that was parsing PAX 
headers.

I agree the code should be corrected.

 Unable to extract a file larger than 8GB from a Posix-format tar archive
 

 Key: COMPRESS-163
 URL: https://issues.apache.org/jira/browse/COMPRESS-163
 Project: Commons Compress
  Issue Type: Bug
  Components: Compressors
Affects Versions: 1.3
 Environment: The tar archive used for testing was created by GNU tar, 
 but the problem will occur with any Posix-formatted tar file containing files 
 over 8GB in size.
Reporter: John Kodis
Priority: Minor
 Fix For: 1.4

 Attachments: 
 0003-Allow-reading-large-files-from-Posix-tar-archives.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 An attempt to read a posix-format tar archive containing a file in excess of 
 8^11 bytes in size will fail with a Size out of range illegal argument 
 exception.





[jira] [Commented] (COMPRESS-16) unable to extract a TAR file that contains an entry which is 10 GB in size

2011-12-05 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-16?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162702#comment-13162702
 ] 

Stefan Bodewig commented on COMPRESS-16:


Read-support for the GNU version is in with svn revision 1210386

Before I look into write support I'll need to reshuffle a few things so we can
have testcases.  The current run-it Maven profile is not really sufficient as it
would mean you'd have to run all ZIP ITs as well.

 unable to extract a TAR file that contains an entry which is 10 GB in size
 --

 Key: COMPRESS-16
 URL: https://issues.apache.org/jira/browse/COMPRESS-16
 Project: Commons Compress
  Issue Type: Bug
  Components: Archivers
 Environment: I am using win xp sp3, but this should be platform 
 independent.
Reporter: Sam Smith
 Attachments: 
 0001-Accept-GNU-tar-files-with-entries-over-8GB-in-size.patch, 
 0002-Allow-creating-tar-archives-with-files-over-8GB.patch, 
 ant-8GB-tar.patch, patch-for-compress.txt


 I made a TAR file which contains a file entry where the file is 10 GB in size.
 When I attempt to extract the file using TarInputStream, it fails with the 
 following stack trace:
   java.io.IOException: unexpected EOF with 24064 bytes unread
   at 
 org.apache.commons.compress.archivers.tar.TarInputStream.read(TarInputStream.java:348)
   at 
 org.apache.commons.compress.archivers.tar.TarInputStream.copyEntryContents(TarInputStream.java:388)
 So, TarInputStream does not seem to support large (> 8 GB?) files.
 Here is something else to note: I created that TAR file using TarOutputStream 
 , which did not complain when asked to write a 10 GB file into the TAR file, 
 so I assume that TarOutputStream has no file size limits?  That, or does it 
 silently create corrupted TAR files (which would be the worst situation of 
 all...)?





[jira] [Commented] (COMPRESS-164) Cannot Read Winzip Archives With Unicode Extra Fields

2011-12-05 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13162869#comment-13162869
 ] 

Stefan Bodewig commented on COMPRESS-164:
-

It was easier and cleaner to fix in Ant's code base where nobody cares for the 
order of entries from the central directory.

See svn revision 1210522

 Cannot Read Winzip Archives With Unicode Extra Fields
 -

 Key: COMPRESS-164
 URL: https://issues.apache.org/jira/browse/COMPRESS-164
 Project: Commons Compress
  Issue Type: Bug
  Components: Archivers
Affects Versions: 1.3
 Environment: Windows 7, Oracle JDK 6
Reporter: Volker Leidl
 Fix For: 1.4

 Attachments: UTF8ZipFilesTest.patch, ZipFile.patch


 I have a zip file created with WinZip containing Unicode extra fields. Upon 
 attempting to extract it with 
 org.apache.commons.compress.archivers.zip.ZipFile, ZipFile.getInputStream() 
 returns null for ZipArchiveEntries previously retrieved with 
 ZipFile.getEntry() or even ZipFile.getEntries(). See UTF8ZipFilesTest.patch 
 in the attachments for a test case exposing the bug. The original test case 
 stopped short of trying to read the entries, that's why this wasn't flagged 
 up before. 
 The problem lies in the fact that inside ZipFile.java entries are stored in a 
 HashMap. However, at one point after populating the HashMap, the unicode 
 extra fields are read, which leads to a change of the ZipArchiveEntry name, 
 and therefore a change of its hash code. Because of this, subsequent gets on 
 the HashMap fail to retrieve the original values.
 ZipFile.patch contains an (admittedly simple-minded) fix for this problem by 
 reconstructing the entries HashMap after the Unicode extra fields have been 
 parsed. The purpose of this patch is mainly to show that the problem is 
 indeed what I think, rather than providing a well-designed solution.
 The patches have been tested against revision 1210416.
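
The failure mode described above can be reproduced without any zip code. This is a minimal sketch with a hypothetical Entry class standing in for ZipArchiveEntry (whose hash code depends on its name); it is not the ZipFile implementation. It shows the lookup failing after the key is renamed and succeeding again once the map is rebuilt, which is the idea behind ZipFile.patch.

```java
import java.util.HashMap;
import java.util.Map;

final class MutableKeyDemo {
    /** Hypothetical stand-in for an entry whose name, and thus hash code, can change. */
    static final class Entry {
        private String name;

        Entry(String name) {
            this.name = name;
        }

        void setName(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Entry && ((Entry) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    /** Renames the key after insertion and reports whether the lookup still works. */
    static boolean lookupSurvivesRename() {
        Map<Entry, Long> offsets = new HashMap<>();
        Entry entry = new Entry("a");
        offsets.put(entry, 42L);
        entry.setName("\u00e4"); // e.g. the name taken from a Unicode extra field
        return offsets.get(entry) != null;
    }

    /** Same scenario, but the map is rebuilt after the rename so hashes are recomputed. */
    static boolean lookupSurvivesRebuild() {
        Map<Entry, Long> offsets = new HashMap<>();
        Entry entry = new Entry("a");
        offsets.put(entry, 42L);
        entry.setName("\u00e4");
        Map<Entry, Long> rebuilt = new HashMap<>(offsets);
        return rebuilt.get(entry) != null;
    }

    public static void main(String[] args) {
        System.out.println("after rename:  " + lookupSurvivesRename());
        System.out.println("after rebuild: " + lookupSurvivesRebuild());
    }
}
```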





[jira] [Commented] (COMPRESS-137) TarArchiveEntry.getFile() always returns null + no way to get an InputStream from TarArchiveInputStream similar to what you do with (java.util.zip.ZipFile())..getInpu

2011-11-25 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13157177#comment-13157177
 ] 

Stefan Bodewig commented on COMPRESS-137:
-

You can wrap the stream in something like the BoundedInputStream found as 
nested class in ZipFile or more conveniently in commons-io.
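
A rough sketch of what such a wrapper does, using only the JDK. BoundedStream here is a hypothetical, stripped-down class written for this example; it is neither the nested class in ZipFile nor the commons-io one, both of which are more complete. The ByteArrayInputStream stands in for the archive stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical minimal bounded view onto another stream. */
final class BoundedStream extends FilterInputStream {
    private long remaining;

    BoundedStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1;
        }
        int b = in.read();
        if (b >= 0) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = in.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(in.available(), remaining);
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    /** Ends this view but deliberately leaves the underlying stream open. */
    @Override
    public void close() {
        remaining = 0;
    }

    /** Drains a stream into an ASCII string; helper for the demo only. */
    static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "entry-one|entry-two".getBytes(StandardCharsets.US_ASCII);
        InputStream archive = new ByteArrayInputStream(data);
        System.out.println(readAll(new BoundedStream(archive, 9)));
        System.out.println(readAll(archive));
    }
}
```

The consumer can read or close the bounded view freely without running past the entry or closing the archive stream underneath it.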

 TarArchiveEntry.getFile() always returns null + no way to get an InputStream 
 from TarArchiveInputStream similar to what you do with 
 (java.util.zip.ZipFile())..getInputStream(ZipEntry);
 

 Key: COMPRESS-137
 URL: https://issues.apache.org/jira/browse/COMPRESS-137
 Project: Commons Compress
  Issue Type: Bug
  Components: Compressors
Affects Versions: 1.1
 Environment: $ uname -a
 Linux Microknoppix 2.6.31.6 #4 SMP PREEMPT Tue Nov 10 19:11:11 CET 2009 i686 
 GNU/Linux
 $ java -version
 java version 1.6.0_16
 Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
 Java HotSpot(TM) Client VM (build 14.2-b01, mixed mode, sharing)
 $ echo $CLASSPATH
 /media/sdb3/prjx/Java/JUtils:/media/sdb3/prjx/Java/JUtils/jars/commons-compress-1.1.jar:.
Reporter: Albretch Mueller
Assignee: Torsten Curdt
Priority: Critical
  Labels: zip_through_XMLReader
 Fix For: 1.1


 ~ 
  this is a test run using httpd-2.2.19.tar[.gz,bz2] to show what I mean
 ~ 
  lbrtchx
 ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ 
 ~ ~ ~ 
 $ wget http://apache.cyberuse.com//httpd/httpd-2.2.19.tar.gz
 --2011-06-27 11:21:46--  http://apache.cyberuse.com//httpd/httpd-2.2.19.tar.gz
 Resolving apache.cyberuse.com... 174.132.149.89
 Connecting to apache.cyberuse.com|174.132.149.89|:80... connected.
 HTTP request sent, awaiting response... 200 OK
 Length: 7113418 (6.8M) [application/x-gzip]
 Saving to: `httpd-2.2.19.tar.gz'
 100%[===...===] 7,113,418   296K/s   in 24s 
 2011-06-27 11:22:10 (285 KB/s) - `httpd-2.2.19.tar.gz' saved [7113418/7113418]
 $ wget http://apache.cyberuse.com//httpd/httpd-2.2.19.tar.bz2
 --2011-06-27 11:22:19--  
 http://apache.cyberuse.com//httpd/httpd-2.2.19.tar.bz2
 Resolving apache.cyberuse.com... 174.132.149.89
 Connecting to apache.cyberuse.com|174.132.149.89|:80... connected.
 HTTP request sent, awaiting response... 200 OK
 Length: 5322082 (5.1M) [application/x-bzip2]
 Saving to: `httpd-2.2.19.tar.bz2'
 100%[===...===] 5,322,082   256K/s   in 25s 
 2011-06-27 11:22:44 (207 KB/s) - `httpd-2.2.19.tar.bz2' saved 
 [5322082/5322082]
 $ wget http://www.apache.org/dist/httpd/httpd-2.2.19.tar.gz.md5
 --2011-06-27 11:22:51--  
 http://www.apache.org/dist/httpd/httpd-2.2.19.tar.gz.md5
 Resolving www.apache.org... 140.211.11.131
 Connecting to www.apache.org|140.211.11.131|:80... connected.
 HTTP request sent, awaiting response... 200 OK
 Length: 54 [text/plain]
 Saving to: `httpd-2.2.19.tar.gz.md5'
 100%[===...===] 54  --.-K/s   in 0s  
 2011-06-27 11:22:51 (4.23 MB/s) - `httpd-2.2.19.tar.gz.md5' saved [54/54]
 $ wget http://www.apache.org/dist/httpd/httpd-2.2.19.tar.bz2.md5
 --2011-06-27 11:23:02--  
 http://www.apache.org/dist/httpd/httpd-2.2.19.tar.bz2.md5
 Resolving www.apache.org... 140.211.11.131
 Connecting to www.apache.org|140.211.11.131|:80... connected.
 HTTP request sent, awaiting response... 200 OK
 Length: 55 [text/plain]
 Saving to: `httpd-2.2.19.tar.bz2.md5'
 100%[===...===] 55  --.-K/s   in 0s  
 2011-06-27 11:23:02 (4.91 MB/s) - `httpd-2.2.19.tar.bz2.md5' saved [55/55]
 $ ls -l httpd-2.2.19.tar.*
 -rw-r--r-- 1 knoppix knoppix 5322082 May 21 18:58 httpd-2.2.19.tar.bz2
 -rw-r--r-- 1 knoppix knoppix  55 May 21 18:58 httpd-2.2.19.tar.bz2.md5
 -rw-r--r-- 1 knoppix knoppix 7113418 May 21 18:58 httpd-2.2.19.tar.gz
 -rw-r--r-- 1 knoppix knoppix  54 May 21 18:58 httpd-2.2.19.tar.gz.md5
 $ md5sum -b httpd-2.2.19.tar.bz2
 832f96a6ec4b8fc7cf49b9efd4e89060 *httpd-2.2.19.tar.bz2
 $ md5sum -b httpd-2.2.19.tar.gz
 e9f5453e1e4d7aeb0e7ec7184c6784b5 *httpd-2.2.19.tar.gz
 $ cat *.md5
 832f96a6ec4b8fc7cf49b9efd4e89060 *httpd-2.2.19.tar.bz2
 e9f5453e1e4d7aeb0e7ec7184c6784b5 *httpd-2.2.19.tar.gz
 ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ 
 ~ ~ ~ 
 /*
  TarArchiveEntry.getFile() always returns null + no way to get an InputStream 
 from TarArchiveInputStream similar to what you do with 
 (java.util.zip.ZipFile())..getInputStream(ZipEntry);
 */
 import org.apache.commons.compress.compressors.bzip2.*;
 import org.apache.commons.compress.compressors.gzip.*;
 import org.apache.commons.compress.archivers.tar.*;
 import org.apache.commons.compress.utils.*;
 import org.apache.commons.compress.archivers.*;
 import java.io.*;
 

[jira] [Commented] (COMPRESS-161) bzip2 decompression terminates after 900000 bytes

2011-11-09 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147117#comment-13147117
 ] 

Stefan Bodewig commented on COMPRESS-161:
-

Good to know trunk is actually buildable outside of my machine and our CI 
systems.

The "patches welcome" statement is some sort of standard response in OSS land; 
I didn't seriously expect you to provide a patch.  But I don't have the time or 
itch to scratch (or current knowledge TBH) to provide one myself right now 
either.

 bzip2 decompression terminates after 900000 bytes
 -

 Key: COMPRESS-161
 URL: https://issues.apache.org/jira/browse/COMPRESS-161
 Project: Commons Compress
  Issue Type: Bug
  Components: Compressors
Affects Versions: 1.3
 Environment: Windows7 64bit JDK7u1,2
Reporter: Hans horn
Priority: Critical
 Fix For: 1.4

 Attachments: INT1_aminey.inp.bz2


 bzip2 decompression terminates (w/o error) after 900000 bytes
 int nBytes = 0;
 try {
   InputStream iin = new BZip2CompressorInputStream(new 
 FileInputStream("bzip2 compressed file that was uncompressed > 900000 bytes in 
 size"));
   int data = iin.read();
   while (data != -1) {
     System.out.print((char) data); ++nBytes;
     data = iin.read();
   }
 } catch (IOException iox) { /**/ }
 System.out.println("#Bytes read " + nBytes);
 prints: #Bytes read 900000





[jira] [Commented] (COMPRESS-162) BZip2CompressorInputStream still stops after 900,000 decompressed bytes of large compressed file

2011-11-09 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147118#comment-13147118
 ] 

Stefan Bodewig commented on COMPRESS-162:
-

Thank you for checking.

 BZip2CompressorInputStream still stops after 900,000 decompressed bytes of 
 large compressed file
 

 Key: COMPRESS-162
 URL: https://issues.apache.org/jira/browse/COMPRESS-162
 Project: Commons Compress
  Issue Type: Bug
  Components: Compressors
Affects Versions: 1.3
 Environment: Linux (Fedora Cores 13 [2.6.34.9-69.fc13.i686.PAE] and 
 15, at latest 'yum upgrade' as of 7 Nov 2011), Sun Java 1.6.0_22
Reporter: Andrew Pavlin
 Fix For: 1.4


 Attempting to unzip the planet-110921.osm.bz2 file downloaded directly from 
 planet.OpenStreetMaps.org aborts after exactly 900,000 bytes are uncompressed. 
 The uncompressed content looks like valid XML, and causes my application's 
 parser to blow up with XML syntax errors due to missing closing tags. Tried 
 using the example code to just uncompress, and got the same exact behavior.
 Uncompressing the same file planet-110921.osm.bz2 (19357793489 bytes long 
 compressed) with the Linux bzip2 command-line utility 
 (bzip2-1.0.6-1.fc13.i686.rpm) succeeds and produces a valid (and enormous) 
 XML file that can be successfully parsed.
 Tried getting a subversion snapshot of the commons-compress trunk on 7 Nov 
 2011 and replacing the org.apache.commons.compress.compressors.bzip2 package 
 in the commons-compress-1.3.jar with compiled code from the trunk (Subversion 
 log reported that the fix for COMPRESS-146 (?) was in). Still the same 
 failure.





[jira] [Commented] (COMPRESS-146) BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should treat this as EOS

2011-11-09 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147139#comment-13147139
 ] 

Stefan Bodewig commented on COMPRESS-146:
-

I've updated the documentation with svn revision 1199823.

It would be good to have testcases with concatenated streams, I'll look into 
creating some.

Yes, I think we should change the defaults with 2.0.  Deprecations won't help 
in the light of our factory that people may use instead of using the 
constructors directly.  Adding a new flag to the factory method looks wrong 
since there are formats (pack200) supported by the factory that don't know 
anything about concatenated streams.

 BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should 
 treat this as EOS
 

 Key: COMPRESS-146
 URL: https://issues.apache.org/jira/browse/COMPRESS-146
 Project: Commons Compress
  Issue Type: Bug
  Components: Compressors
 Environment: all
Reporter: Dmitriy Smirnov
Priority: Critical
  Labels: 0x177245385090
 Fix For: 1.4

 Attachments: bzip2-concatenated.patch

   Original Estimate: 4h
  Remaining Estimate: 4h

 BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should 
 treat this as EOS
 This error occurs mostly on large files, as a sudden EOF somewhere in the 
 middle of the file.
 An example of data from archived file:
 $ cat fastq.ax.bz2 | od -t x1 | grep -A 1 '17 72 45'
 22711660 d0 ff b6 01 20 10 ff ff 17 72 45 38 50 90 2e ff
 22711700 b2 d3 42 5a 68 39 31 41 59 26 53 59 84 3c 41 75
 --
 24637020 c5 49 ff 19 80 49 20 7f ff 17 72 45 38 50 90 a4
 24637040 a8 ac bd 42 5a 68 39 31 41 59 26 53 59 0d 9a b4
 --
 40302720 ff b1 24 80 10 ff ff 17 72 45 38 50 90 24 cb c5
 40302740 90 42 5a 68 39 31 41 59 26 53 59 42 05 ae 5e 05
 .
 Suggested solution:
 private void initBlock() throws IOException {
 char magic0 = bsGetUByte();
 char magic1 = bsGetUByte();
 char magic2 = bsGetUByte();
 char magic3 = bsGetUByte();
 char magic4 = bsGetUByte();
 char magic5 = bsGetUByte();
 if( magic0 == 0x17 && magic1 == 0x72 && magic2 == 0x45
  && magic3 == 0x38 && magic4 == 0x50 && magic5 == 0x90 ) 
   
 {
   if( complete() ) // end of file);
   {
   return;
   } else
   {
   magic0 = bsGetUByte();
 magic1 = bsGetUByte();
 magic2 = bsGetUByte();
 magic3 = bsGetUByte();
 magic4 = bsGetUByte();
 magic5 = bsGetUByte();
   }
 } 
 if (magic0 != 0x31 || // '1'
magic1 != 0x41 || // 'A'
magic2 != 0x59 || // 'Y'
magic3 != 0x26 || // '&'
magic4 != 0x53 || // 'S'
magic5 != 0x59 // 'Y'
) {
 this.currentState = EOF;
 throw new IOException("bad block header");
 } else {
 this.storedBlockCRC = bsGetInt();
 this.blockRandomised = bsR(1) == 1;
 /**
  * Allocate data here instead in constructor, so we do not 
 allocate
  * it if the input file is empty.
  */
 if (this.data == null) {
 this.data = new Data(this.blockSize100k);
 }
 // currBlockNo++;
 getAndMoveToFrontDecode();
 this.crc.initialiseCRC();
 this.currentState = START_BLOCK_STATE;
 }
 }
 private boolean 
 complete() throws IOException 
 { 
   boolean result = false;
 this.storedCombinedCRC = bsGetInt();
 try
 {
 if (in.available() == 0 ) 
 {
 throw new IOException( "EOF" );
 }
 checkMagicChar('B', "first");
 checkMagicChar('Z', "second");
 checkMagicChar('h', "third");
 int blockSize = this.in.read();
 if ((blockSize < '1') || (blockSize > '9')) {
 throw new IOException("Stream is not BZip2 formatted: illegal "
   + "blocksize " + (char) blockSize);
 }
 this.blockSize100k = blockSize - '0';
 this.bsLive = 0;
 this.bsBuff = 0;
 } catch( IOException e )
 {
   this.currentState = EOF;
   
   result = true;
 }
 
 this.data = null;
 if (this.storedCombinedCRC != this.computedCombinedCRC) {
 throw new IOException("BZip2 CRC error");
 }
 

[jira] [Commented] (COMPRESS-162) BZip2CompressorInputStream still stops after 900,000 decompressed bytes of large compressed file

2011-11-08 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13146178#comment-13146178
 ] 

Stefan Bodewig commented on COMPRESS-162:
-

Andrew, are you using the two-arg constructor for BZip2CompressorInputStream?  
Concatenated files are not supported by default but only when you ask for it.

 BZip2CompressorInputStream still stops after 900,000 decompressed bytes of 
 large compressed file
 

 Key: COMPRESS-162
 URL: https://issues.apache.org/jira/browse/COMPRESS-162
 Project: Commons Compress
  Issue Type: Bug
  Components: Compressors
Affects Versions: 1.3
 Environment: Linux (Fedora Cores 13 [2.6.34.9-69.fc13.i686.PAE] and 
 15, at latest 'yum upgrade' as of 7 Nov 2011), Sun Java 1.6.0_22
Reporter: Andrew Pavlin

 Attempting to unzip the planet-110921.osm.bz2 file downloaded directly from 
 planet.OpenStreetMaps.org aborts after exactly 900,000 bytes are uncompressed. 
 The uncompressed content looks like valid XML, and causes my application's 
 parser to blow up with XML syntax errors due to missing closing tags. Tried 
 using the example code to just uncompress, and got the same exact behavior.
 Uncompressing the same file planet-110921.osm.bz2 (19357793489 bytes long 
 compressed) with the Linux bzip2 command-line utility 
 (bzip2-1.0.6-1.fc13.i686.rpm) succeeds and produces a valid (and enormous) 
 XML file that can be successfully parsed.
 Tried getting a subversion snapshot of the commons-compress trunk on 7 Nov 
 2011 and replacing the org.apache.commons.compress.compressors.bzip2 package 
 in the commons-compress-1.3.jar with compiled code from the trunk (Subversion 
 log reported that the fix for COMPRESS-146 (?) was in). Still the same 
 failure.





[jira] [Commented] (COMPRESS-161) bzip2 decompression terminates after 900000 bytes

2011-11-08 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13146179#comment-13146179
 ] 

Stefan Bodewig commented on COMPRESS-161:
-

Hans, what kind of XZ problems did you have to work around?  trunk should be 
building out of the box.

WRT parallel bzip2 - no concrete plans, patches welcome ;-)

 bzip2 decompression terminates after 900000 bytes
 -

 Key: COMPRESS-161
 URL: https://issues.apache.org/jira/browse/COMPRESS-161
 Project: Commons Compress
  Issue Type: Bug
  Components: Compressors
Affects Versions: 1.3
 Environment: Windows7 64bit JDK7u1,2
Reporter: Hans horn
Priority: Critical
 Fix For: 1.4

 Attachments: INT1_aminey.inp.bz2


 bzip2 decompression terminates (w/o error) after 900000 bytes
 int nBytes = 0;
 try {
   InputStream iin = new BZip2CompressorInputStream(new 
 FileInputStream("bzip2 compressed file that was uncompressed > 900000 bytes in 
 size"));
   int data = iin.read();
   while (data != -1) {
     System.out.print((char) data); ++nBytes;
     data = iin.read();
   }
 } catch (IOException iox) { /**/ }
 System.out.println("#Bytes read " + nBytes);
 prints: #Bytes read 900000





[jira] [Commented] (COMPRESS-146) BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should treat this as EOS

2011-11-08 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13146180#comment-13146180
 ] 

Stefan Bodewig commented on COMPRESS-146:
-

yes, we probably want all three formats to be consistent here.

I'm not sure what the danger of changing the default really would be; I 
vaguely recall people complaining about GZIPInputStream after JDK7 added 
support for concatenated streams (I may be totally wrong on this, though).
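
For comparison, this is how the JDK class behaves with concatenated members. It is a small sketch using only java.util.zip; the class name ConcatenatedGzipDemo and its helper methods are made up for this example. It assumes a JDK whose GZIPInputStream reads concatenated members, as mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ConcatenatedGzipDemo {
    /** Compresses the text into one complete gzip member. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    /** Concatenates two gzip members and decodes the result with a single stream. */
    static String decodeConcatenated(String first, String second) {
        try {
            ByteArrayOutputStream joined = new ByteArrayOutputStream();
            joined.write(gzip(first));
            joined.write(gzip(second));

            ByteArrayOutputStream plain = new ByteArrayOutputStream();
            try (GZIPInputStream in =
                     new GZIPInputStream(new ByteArrayInputStream(joined.toByteArray()))) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    plain.write(buf, 0, n);
                }
            }
            return new String(plain.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeConcatenated("first member, ", "second member"));
    }
}
```

A reader that stops at the first end-of-stream marker would return only the first member here, which is the behaviour the bzip2 reports in this thread describe.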

 BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should 
 treat this as EOS
 

 Key: COMPRESS-146
 URL: https://issues.apache.org/jira/browse/COMPRESS-146
 Project: Commons Compress
  Issue Type: Bug
  Components: Compressors
 Environment: all
Reporter: Dmitriy Smirnov
Priority: Critical
  Labels: 0x177245385090
 Fix For: 1.4

 Attachments: bzip2-concatenated.patch

   Original Estimate: 4h
  Remaining Estimate: 4h

 BZip2CompressorInputStream always treats 0x177245385090 as EOF, but should 
 treat this as EOS
 This error occurs mostly on large files, as a sudden EOF somewhere in the 
 middle of the file.
 An example of data from archived file:
 $ cat fastq.ax.bz2 | od -t x1 | grep -A 1 '17 72 45'
 22711660 d0 ff b6 01 20 10 ff ff 17 72 45 38 50 90 2e ff
 22711700 b2 d3 42 5a 68 39 31 41 59 26 53 59 84 3c 41 75
 --
 24637020 c5 49 ff 19 80 49 20 7f ff 17 72 45 38 50 90 a4
 24637040 a8 ac bd 42 5a 68 39 31 41 59 26 53 59 0d 9a b4
 --
 40302720 ff b1 24 80 10 ff ff 17 72 45 38 50 90 24 cb c5
 40302740 90 42 5a 68 39 31 41 59 26 53 59 42 05 ae 5e 05
 .
 Suggested solution:
 private void initBlock() throws IOException {
 char magic0 = bsGetUByte();
 char magic1 = bsGetUByte();
 char magic2 = bsGetUByte();
 char magic3 = bsGetUByte();
 char magic4 = bsGetUByte();
 char magic5 = bsGetUByte();
 if( magic0 == 0x17 && magic1 == 0x72 && magic2 == 0x45
  && magic3 == 0x38 && magic4 == 0x50 && magic5 == 0x90 ) 
   
 {
   if( complete() ) // end of file);
   {
   return;
   } else
   {
   magic0 = bsGetUByte();
 magic1 = bsGetUByte();
 magic2 = bsGetUByte();
 magic3 = bsGetUByte();
 magic4 = bsGetUByte();
 magic5 = bsGetUByte();
   }
 } 
 if (magic0 != 0x31 || // '1'
magic1 != 0x41 || // 'A'
magic2 != 0x59 || // 'Y'
magic3 != 0x26 || // '&'
magic4 != 0x53 || // 'S'
magic5 != 0x59 // 'Y'
) {
 this.currentState = EOF;
 throw new IOException("bad block header");
 } else {
 this.storedBlockCRC = bsGetInt();
 this.blockRandomised = bsR(1) == 1;
 /**
  * Allocate data here instead in constructor, so we do not 
 allocate
  * it if the input file is empty.
  */
 if (this.data == null) {
 this.data = new Data(this.blockSize100k);
 }
 // currBlockNo++;
 getAndMoveToFrontDecode();
 this.crc.initialiseCRC();
 this.currentState = START_BLOCK_STATE;
 }
 }
 private boolean 
 complete() throws IOException 
 { 
   boolean result = false;
 this.storedCombinedCRC = bsGetInt();
 try
 {
 if (in.available() == 0 ) 
 {
 throw new IOException( "EOF" );
 }
 checkMagicChar('B', "first");
 checkMagicChar('Z', "second");
 checkMagicChar('h', "third");
 int blockSize = this.in.read();
 if ((blockSize < '1') || (blockSize > '9')) {
 throw new IOException("Stream is not BZip2 formatted: illegal "
   + "blocksize " + (char) blockSize);
 }
 this.blockSize100k = blockSize - '0';
 this.bsLive = 0;
 this.bsBuff = 0;
 } catch( IOException e )
 {
   this.currentState = EOF;
   
   result = true;
 }
 
 this.data = null;
 if (this.storedCombinedCRC != this.computedCombinedCRC) {
 throw new IOException(BZip2 CRC error);
 }
 this.computedCombinedCRC = 0;
 return result;
 }
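
 The hex dump above shows a new stream signature (42 5a 68 39 = "BZh9") followed by the block magic 31 41 59 26 53 59 shortly after each end-of-stream magic 17 72 45 38 50 90. A minimal sketch of how such stream starts could be located in a byte buffer is given below. The class Bzip2StreamScanner and its methods are hypothetical names made up for illustration and are not part of Commons Compress. The scan only looks at byte-aligned positions, which is enough for the start of a concatenated stream but not for blocks inside a stream, and a match inside compressed data is theoretically possible.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Commons Compress. */
final class Bzip2StreamScanner {
    // Stream signature "BZh"; the byte after it is the block-size digit '1'..'9'.
    private static final byte[] STREAM_MAGIC = {'B', 'Z', 'h'};
    // Block header magic 0x314159265359 that follows the stream signature.
    private static final byte[] BLOCK_MAGIC = {0x31, 0x41, 0x59, 0x26, 0x53, 0x59};

    /** Returns the byte offsets at which a stream header plus first block header starts. */
    static List<Integer> findStreamStarts(byte[] data) {
        List<Integer> offsets = new ArrayList<>();
        int headerLength = STREAM_MAGIC.length + 1 + BLOCK_MAGIC.length;
        for (int i = 0; i + headerLength <= data.length; i++) {
            if (matches(data, i, STREAM_MAGIC)
                    && data[i + 3] >= '1' && data[i + 3] <= '9'
                    && matches(data, i + 4, BLOCK_MAGIC)) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    /** Checks whether pattern occurs in data starting at index from. */
    private static boolean matches(byte[] data, int from, byte[] pattern) {
        for (int j = 0; j < pattern.length; j++) {
            if (data[from + j] != pattern[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Second row of the middle hex dump excerpt above.
        byte[] row = {
            (byte) 0xa8, (byte) 0xac, (byte) 0xbd, 0x42, 0x5a, 0x68, 0x39, 0x31,
            0x41, 0x59, 0x26, 0x53, 0x59, 0x0d, (byte) 0x9a, (byte) 0xb4
        };
        System.out.println(findStreamStarts(row)); // prints [3]
    }
}
```

 More than one reported offset suggests that the file is a concatenation of several bzip2 streams, which is the case the suggested solution tries to handle.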


[jira] [Commented] (COMPRESS-161) bzip2 decompression terminates after 900000 bytes

2011-11-07 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13145550#comment-13145550
 ] 

Stefan Bodewig commented on COMPRESS-161:
-

I can confirm the limit with the file you have attached.  At the same time I 
can easily uncompress even larger files completely so there must be something 
specific to the archive you are using.

Any hints on how you have created it?

 bzip2 decompression terminates after 900000 bytes
 -

 Key: COMPRESS-161
 URL: https://issues.apache.org/jira/browse/COMPRESS-161
 Project: Commons Compress
  Issue Type: Bug
  Components: Compressors
Affects Versions: 1.3
 Environment: Windows7 64bit JDK7u1,2
Reporter: Hans horn
Priority: Critical
 Fix For: 1.3

 Attachments: INT1_aminey.inp.bz2


 bzip2 decompression terminates (w/o error) after 900000 bytes
 int nBytes = 0;
 try {
   InputStream iin = new BZip2CompressorInputStream(new 
 FileInputStream("bzip2 compressed file that was uncompressed > 900000 bytes in 
 size"));
   int data = iin.read();
   while (data != -1) {
     System.out.print((char) data); ++nBytes;
     data = iin.read();
   }
 } catch (IOException iox) { /**/ }
 System.out.println("#Bytes read " + nBytes);
 prints: #Bytes read 900000
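
 The snippet above swallows any IOException, so a genuine read error would look the same as a silent early end of the stream. A minimal sketch of a byte counter that lets exceptions propagate is shown below. ByteCounter and countBytes are hypothetical names, and a ByteArrayInputStream stands in for the bzip2 stream so the example runs with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for illustration only. */
final class ByteCounter {
    /** Reads the stream to its end and returns the number of bytes seen. */
    static long countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // read(byte[]) returns -1 only at end of stream; errors surface as IOException.
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in input; in the reported scenario this would be the decompressing stream.
        try (InputStream in = new ByteArrayInputStream(new byte[1_000_000])) {
            System.out.println("#Bytes read " + countBytes(in));
        }
    }
}
```

 Passing a BZip2CompressorInputStream wrapped around the attached archive instead of the stand-in stream should make it clear whether the stream ends at 900000 bytes with or without an exception.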





[jira] [Commented] (COMPRESS-111) support for lzma files

2011-11-07 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13145564#comment-13145564
 ] 

Stefan Bodewig commented on COMPRESS-111:
-

I think support for the old lzma format would be beneficial for all people 
having to deal with legacy archives, so at least read-only support would be 
great.

Then again adding write support won't hurt either.  We could recommend people 
use XZ instead inside the docs, of course.


 support for lzma files
 --

 Key: COMPRESS-111
 URL: https://issues.apache.org/jira/browse/COMPRESS-111
 Project: Commons Compress
  Issue Type: New Feature
  Components: Compressors
Affects Versions: 1.0
Reporter: maurel jean francois
 Attachments: compress-trunk-lzmaRev0.patch, 
 compress-trunk-lzmaRev1.patch


 Adding support for compressing and decompressing files with the LZMA algorithm 
 (Lempel-Ziv-Markov chain algorithm)
 (see 
 http://markmail.org/search/?q=list%3Aorg.apache.commons.users/#query:list%3Aorg.apache.commons.users%2F+page:1+mid:syn4uuvbzusevtko+state:results)





[jira] [Commented] (COMPRESS-161) bzip2 decompression terminates after 900000 bytes

2011-11-07 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13145592#comment-13145592
 ] 

Stefan Bodewig commented on COMPRESS-161:
-

COMPRESS-146 has got a patch attached that I'll look into.  If it fixes the 
problem with your archive I'm going to commit it.

 bzip2 decompression terminates after 900000 bytes
 -

 Key: COMPRESS-161
 URL: https://issues.apache.org/jira/browse/COMPRESS-161
 Project: Commons Compress
  Issue Type: Bug
  Components: Compressors
Affects Versions: 1.3
 Environment: Windows7 64bit JDK7u1,2
Reporter: Hans horn
Priority: Critical
 Fix For: 1.3

 Attachments: INT1_aminey.inp.bz2


 bzip2 decompression terminates (w/o error) after 900000 bytes
 int nBytes = 0;
 try {
   InputStream iin = new BZip2CompressorInputStream(new 
 FileInputStream("bzip2 compressed file that was uncompressed > 900000 bytes in 
 size"));
   int data = iin.read();
   while (data != -1) {
     System.out.print((char) data); ++nBytes;
     data = iin.read();
   }
 } catch (IOException iox) { /**/ }
 System.out.println("#Bytes read " + nBytes);
 prints: #Bytes read 900000





[jira] [Commented] (COMPRESS-156) XZ compression support

2011-11-02 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142247#comment-13142247
 ] 

Stefan Bodewig commented on COMPRESS-156:
-

Your patch is in as svn revision 1196665, thanks!

I'll have to add docs (including changes and adding Lasse as contributor) and 
at least one testcase but have to run now.

 XZ compression support
 --

 Key: COMPRESS-156
 URL: https://issues.apache.org/jira/browse/COMPRESS-156
 Project: Commons Compress
  Issue Type: New Feature
  Components: Compressors
Reporter: Lasse Collin
 Fix For: 1.4

 Attachments: xz_support.patch








[jira] [Commented] (COMPRESS-36) Add Zip64 Suport

2011-10-18 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13129805#comment-13129805
 ] 

Stefan Bodewig commented on COMPRESS-36:


As of svn revision 1185722 the zips needed for integration tests are part of a 
.tar.bz2 archive in svn trunk (which is about two MB in size and expands to 
more than 100 MB of highly redundant zips).

I'll remove the zips from my home dir on people.a.o shortly.

 Add Zip64 Suport
 

 Key: COMPRESS-36
 URL: https://issues.apache.org/jira/browse/COMPRESS-36
 Project: Commons Compress
  Issue Type: New Feature
  Components: Archivers
Reporter: Christian Grobmeier
Assignee: Stefan Bodewig
 Fix For: 1.3

 Attachments: 5GB_of_Zeros.zip, 5GB_of_Zeros_7ZIP.zip, 
 5GB_of_Zeros_PKZip.zip, 5GB_of_Zeros_WinZip.zip, 
 5GB_of_Zeros_WindowsCompressedFolders.zip, 5GB_of_Zeros_jar.zip, 
 zip64-sample.zip


 Add Zip64 support. This will make it possible to handle zip files > 2 GB. 
 Planned for compress 1.1
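
 As background on the size limit: the classic ZIP format stores sizes and offsets in 4-byte fields and the entry count in a 2-byte field, and the all-ones values 0xFFFFFFFF and 0xFFFF signal that the real value lives in the Zip64 structures. The sketch below illustrates that check. Zip64Limits and its methods are hypothetical names and not the Commons Compress API. The 2 GB figure in the description probably reflects code that keeps these fields in signed 32-bit integers, while the format itself reaches just under 4 GiB.

```java
/** Hypothetical helper for illustration only; not the Commons Compress API. */
final class Zip64Limits {
    /** All-ones value of a 4-byte field; in a ZIP header it means "see the Zip64 data". */
    static final long ZIP64_MAGIC = 0xFFFFFFFFL;
    /** All-ones value of the 2-byte entry-count field. */
    static final int ZIP64_MAGIC_SHORT = 0xFFFF;

    /** True if any of an entry's sizes or its header offset no longer fits the 4-byte fields. */
    static boolean needsZip64(long uncompressedSize, long compressedSize, long headerOffset) {
        return uncompressedSize >= ZIP64_MAGIC
            || compressedSize >= ZIP64_MAGIC
            || headerOffset >= ZIP64_MAGIC;
    }

    /** True if the number of entries no longer fits the 2-byte count field. */
    static boolean needsZip64ForEntryCount(int entries) {
        return entries >= ZIP64_MAGIC_SHORT;
    }

    public static void main(String[] args) {
        long fiveGiB = 5L << 30; // size of the data in the attached 5GB_of_Zeros archives
        System.out.println(needsZip64(fiveGiB, 1024L, 0L));
    }
}
```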





[jira] [Commented] (COMPRESS-158) Empty directories missing in zip archive

2011-10-08 Thread Stefan Bodewig (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/COMPRESS-158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123628#comment-13123628
 ] 

Stefan Bodewig commented on COMPRESS-158:
-

Hi Daniel,

I'm completely unable to reproduce the problem.  Just to double check: I've 
downloaded and compiled your CompressionUtil class, downloaded and extracted 
the test.zip (which creates a test directory), and created and compiled the 
following trivial class:

{code}
import org.clerezza.tools.backupfelixcache.CompressionUtil;
import java.io.File;

public class Driver {
    public static void main(String[] args) throws Throwable {
        // Zip the extracted "test" directory into output.zip.
        CompressionUtil.zip(new File("test"), new File("output.zip"));
    }
}
{code}

I'll attach the resulting output.zip created on Ubuntu 10.04 with 

{noformat}
stefan@birdy:~/Desktop/compress-158$ java -version
java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.9) (6b20-1.9.9-0ubuntu1~10.04.2)
OpenJDK Client VM (build 19.0-b09, mixed mode, sharing)
{noformat}

As you can see it contains everything your original test.zip contained as well.

 Empty directories missing in zip archive
 

 Key: COMPRESS-158
 URL: https://issues.apache.org/jira/browse/COMPRESS-158
 Project: Commons Compress
  Issue Type: Bug
Affects Versions: 1.2
 Environment: Java 1.6, Ubuntu Linux (ext4 fs)
Reporter: Daniel Spicar
Priority: Minor
 Attachments: CompressionUtil.java, output.zip, test.zip


 When zipping a directory that contains several files and subdirectories of 
 which some can be empty, I am missing empty directories. When using a tar 
 archive format empty directories are present.
 I have found https://issues.apache.org/jira/browse/COMPRESS-105 which 
 describes a similar issue, however I am unable to reproduce the solution 
 suggested there. 
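
 For comparison, an empty directory is represented in a zip archive by an entry whose name ends with "/" and which carries no data. The sketch below shows this with the JDK's java.util.zip classes, used here instead of Commons Compress only to keep the example self-contained. EmptyDirZipDemo and its methods are hypothetical names. It is not a reconstruction of CompressionUtil.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical demo for illustration only. */
final class EmptyDirZipDemo {
    /** Builds an in-memory zip holding one empty directory and one file. */
    static byte[] zipWithEmptyDir() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            // A directory entry: the name ends with '/' and no data is written.
            zip.putNextEntry(new ZipEntry("test/empty/"));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("test/file.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Lists the entry names of an in-memory zip in archive order. */
    static List<String> listEntries(byte[] archive) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                names.add(e.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(listEntries(zipWithEmptyDir()));
    }
}
```

 If the code that walks the directory tree only adds entries for files, empty directories never get such an entry and are missing from the archive.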
