Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Commons Wiki" for 
change notification.

The following page has been changed by KenTanaka:
http://wiki.apache.org/jakarta-commons/ExtractAndDecompressGzipFiles

The comment on the change is:
Found a more direct method of extracting files and simplified the code example

------------------------------------------------------------------------------
  = Overview =
  Use VFS to read the content of a compressed (gz) file inside a tar file: 
extract the tar file's objects and, if they are gzip files, decompress them. 
Directory structure in the tar file is not preserved; the contents are 
written to the same location regardless of directory hierarchy (for the 
purposes of this example, all objects in the tar file have unique names, so 
there are no file name conflicts).
  
+ Use a two-phase approach.
+  1. look at each of the files in the tar file
+  2. if it's a directory, recursively process it, otherwise
+     * if it's a non-gzipped file, extract it to a file
+     * if it's a gzipped file, decompress gzipped content to file
- Use a multiple step approach.
-  1. extract gzipped file from tar file
-  2. decompress gzipped content to a temporary directory
-  3. move decompressed content to desired destination
-  4. remove temporary directory
-  5. remove gzipped file
- 
- There should be a cleaner, more direct route. Maybe someone more familiar 
with VFS can post better code.
  
  Conceptually there is a tar file:
  {{{
  archive.tar
   +- tardir/
       +- content.txt.gz
+      +- non-gzip.txt
  }}}
- I'd like to end up with an uncompressed file "content.txt". 
+ I'd like to end up with the uncompressed files "content.txt" and 
"non-gzip.txt". 
  
  = Sample data file =
  Create this sample {{{archive.tar}}} file with some (unix) commands along the 
lines of:
  {{{
  ls -l > content.txt
  gzip content.txt
+ ls -l > non-gzip.txt
  mkdir tardir
- mv content.txt.gz tardir
+ mv content.txt.gz non-gzip.txt tardir
  tar cvf archive.tar tardir
  rm -r tardir
  }}}
- The content of the {{{content.txt}}} file is just a directory listing, dump 
in anything you want here.
+ The contents of the {{{content.txt}}} and {{{non-gzip.txt}}} files are just 
directory listings; dump in anything you want here.
- For this example the sample {{{archive.tar}}} is located in the 
{{{/extra/data/tryVfs}}} directory. You can see that hardcoded in the java 
example below. The {{{content.txt}}} file will be extracted into the same 
location.
+ For this example the sample {{{archive.tar}}} is located in the 
{{{/extra/data/tryVfs}}} directory. You can see that hardcoded in the java 
example below. The {{{content.txt}}} and {{{non-gzip.txt}}} files will be 
extracted into the same location.
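
For comparison, the two-phase approach from the Overview can be sketched with only the Java standard library. This is not the VFS code from this page: the JDK has no tar support, so the sketch assumes the archive has already been unpacked (for example with {{{tar xf archive.tar}}}) and walks the resulting directory instead. The class name {{{FlattenAndGunzip}}} and its methods are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper; not part of the wiki example or of Commons VFS. */
final class FlattenAndGunzip {

    /** Phase 1: look at each entry under dir, recursing into directories. */
    static void process(Path dir, Path dest) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry)) {
                    process(entry, dest);   // directory: process it recursively
                } else {
                    extract(entry, dest);   // regular file: go to phase 2
                }
            }
        }
    }

    /** Phase 2: decompress gzipped files, copy everything else unchanged. */
    static void extract(Path file, Path dest) throws IOException {
        String name = file.getFileName().toString();
        if (name.endsWith(".gz")) {
            // Drop the ".gz" suffix and write the decompressed bytes.
            Path target = dest.resolve(name.substring(0, name.length() - 3));
            try (InputStream in = new GZIPInputStream(Files.newInputStream(file))) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } else {
            Files.copy(file, dest.resolve(name), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

As in the VFS example, the directory hierarchy is flattened into {{{dest}}}, so this only behaves well when all file names are unique. The destination should also lie outside the directory being walked, otherwise freshly written files may be visited again.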
  
  = pom.xml Project file =
  This example uses Maven2. There is a '''{{{pom.xml}}}''' to define the project
@@ -67, +66 @@

                      </descriptorRefs>
                      <archive>
                          <manifest>
-                             
<mainClass>gov.noaa.eds.tryVfs.MultiStep</mainClass>
+                             
<mainClass>gov.noaa.eds.tryVfs.ExtractFromGzipInTar</mainClass>
                          </manifest>
                      </archive>
                  </configuration>
@@ -91, +90 @@

  }}}
  
  = Source Code =
- Content of '''{{{src/main/java/gov/noaa/eds/tryVfs/MultiStep.java}}}'''
+ Content of 
'''{{{src/main/java/gov/noaa/eds/tryVfs/ExtractFromGzipInTar.java}}}'''
  {{{
  /*
-  * MultiStep.java
+  * ExtractFromGzipInTar.java
   */
  package gov.noaa.eds.tryVfs;
  
@@ -116, +115 @@

   * the purposes of this example, all objects in the tar file have unique 
names,
   * so there are no file name conflicts).
   *
-  * Use a multiple step approach.
-  * 1. extract gzipped file from tar file
-  * 2. decompress gzipped content to a temporary directory
-  * 3. move decompressed content to desired destination
-  * 4. remove temporary directory
-  * 5. remove gzipped file
-  *
-  * There should be a cleaner more direct route, but I haven't discovered it 
yet.
-  * 
-  * @author ktanaka
+  * @author Ken Tanaka
   */
- public class MultiStep {
+ public class ExtractFromGzipInTar 
+ {
      FileSystemManager fsManager = null;
      static String extractDirname = "/extra/data/tryVfs";
-     LocalFile extractDir = null;
      
      /**
       * Extract files from a tar file. If a file in the archive is gzipped,
       * write its decompressed content instead of the gzipped file.
       * @param args command line arguments are currently not used
       */
-     public static void main( String[] args ) {
+     public static void main( String[] args )
-         MultiStep msExtract = new MultiStep();
-         
+     {
+         ExtractFromGzipInTar extract = new ExtractFromGzipInTar();
-         try {
+         
+         try {
-             msExtract.fsManager = VFS.getManager();
+             extract.fsManager = VFS.getManager();
          } catch (FileSystemException ex) {
              throw new RuntimeException("failed to get fsManager from VFS", 
ex);
          }
          
-         try {
-             msExtract.extractDir = (LocalFile) 
msExtract.fsManager.resolveFile("file://"
-                     + extractDirname);
-             if (! msExtract.extractDir.exists()) {
-                 msExtract.extractDir.createFolder();
-             }
-         } catch (FileSystemException ex) {
-             throw new RuntimeException("failed to prepare extract directory " 
-                     + extractDirname, ex);
-         }
+         
+         /* Create a tarFile FileObject to connect to the tarfile on disk */
-         
-         
-         /* Create a tarFile object */
          FileObject tarFile;
          try {
+             String tarName = new String("tar:file://" + extractDirname + 
"/archive.tar");
-             System.out.println("Resolve tar file:");
+             System.out.println("Resolve " + tarName);
-             tarFile = msExtract.fsManager.resolveFile(
+             tarFile = extract.fsManager.resolveFile(tarName);
-                     "tar:/extra/data/tryVfs/archive.tar");
              
              FileName tarFileName = tarFile.getName();
              System.out.println("  Path     : " + tarFileName.getPath());
@@ -181, +161 @@

          }
          
          for (FileObject f : children) {
-             msExtract.processChild(f);
+             extract.processChild(f);
-         }
-         
+         }
      } // main( String[] args )
      
      private void processChild(FileObject f) {
@@ -196, +175 @@

                  }
              } else {
                  FileName fname = f.getName();
-                 String extractName = new String(this.extractDir.getName() + 
"/"
+                 String extractName = new String("file://" + extractDirname + 
"/"
                          + fname.getBaseName());
                  System.out.println("Extracting " + extractName);
                  LocalFile extractFile = (LocalFile) 
this.fsManager.resolveFile(extractName);
-                 extractFile.copyFrom(f, new AllFileSelector());
                  
                  // if the file is gzipped, decompress it
                  if (extractFile.getName().getExtension().equals("gz")) {
                      System.out.println("Decompressing " + extractName);
+                     
+                     // The uncompressed filename we seek
+                     // content.txt
+                     String fileName = 
extractFile.getName().getBaseName().replaceAll("\\.gz$", "");
+                     
+                     // Build the direct path to the uncompressed content of 
the 
+                     // gzip file in the tar file.
+                     // 
gz:tar:file:///archive.tar!/tardir/content.txt.gz!content.txt
-                     String gzName = new String("gz://" + 
extractFile.getName().getPath());
+                     String gzName = new String("gz:" + fname.getURI() + "!" + 
fileName);
-                     System.out.println("gzName=" + gzName);
                      FileObject gzFile = this.fsManager.resolveFile(gzName);
-                     String fileName = 
extractFile.getName().getBaseName().replaceAll(".gz$", "");
                      
                      // The decompressed path we want
-                     String decompName = new String(this.extractDir.getName() 
+ "/" 
+                     String decompName = new String("file://" + extractDirname 
+ "/" 
                              + fileName);
+                     LocalFile decompFile = (LocalFile) 
this.fsManager.resolveFile(decompName);
-                     
-                     // A temporary Directory
-                     String tmpDirname = new String(this.extractDir.getName() 
+ "/" 
-                             + fileName + ".tmp");
-                     
-                     // A temporary file path
-                     String tmpFilename = new String(tmpDirname + "/" + 
fileName);
                      
                      // Some debug lines
                      System.out.println("fileName   =" + fileName);
                      System.out.println("decompName =" + decompName);
-                     System.out.println("tmpDirname =" + tmpDirname);
+                     System.out.println("gzName=" + gzName);
-                     System.out.println("tmpFilename=" + tmpFilename);
-                     
-                     // Extracting from gzip file ends up with a directory 
containing what
-                     // we want.
+                     
-                     LocalFile tmpDir = (LocalFile) 
this.fsManager.resolveFile(tmpDirname);
+                     // Extracting
-                     tmpDir.copyFrom(gzFile, new 
FileTypeSelector(FileType.FILE));
+                     decompFile.copyFrom(gzFile, new 
FileTypeSelector(FileType.FILE));
-                     
+                 } else {
+                     // just extract the non-gzip file
-                     // Move the uncompressed file to the location desired.
-                     LocalFile tmpFile = (LocalFile) 
this.fsManager.resolveFile(tmpFilename);
-                     LocalFile decompFile = (LocalFile) 
this.fsManager.resolveFile(decompName);
-                     tmpFile.moveTo(decompFile);
-                     
-                     // Delete the temporary directory.
-                     tmpDir.delete(new AllFileSelector());
-                     
-                     // Delete the gzip file now that we have the uncompressed 
version.
-                     // Note that the plain file FileObject (extractFile) is 
used 
-                     // for deleting instead of the gzip FileObject (gzFile).
-                     extractFile.delete(new AllFileSelector());
+                     extractFile.copyFrom(f, new AllFileSelector());
                  }
              }
          } catch (FileSystemException ex) {
@@ -269, +234 @@

  
  == Sample Output ==
  {{{
- Nov 6, 2007 2:38:56 PM org.apache.commons.vfs.VfsLog info
+ Nov 7, 2007 12:22:01 PM org.apache.commons.vfs.VfsLog info
  INFO: Using "/tmp/vfs_cache" as temporary files store.
- Resolve tar file:
+ Resolve tar:file:///extra/data/tryVfs/archive.tar
    Path     : /
    URI      : tar:file:///extra/data/tryVfs/archive.tar!/
+ Extracting file:///extra/data/tryVfs/non-gzip.txt
  Extracting file:///extra/data/tryVfs/content.txt.gz
  Decompressing file:///extra/data/tryVfs/content.txt.gz
- gzName=gz:///extra/data/tryVfs/content.txt.gz
  fileName   =content.txt
  decompName =file:///extra/data/tryVfs/content.txt
+ 
gzName=gz:tar:file:///extra/data/tryVfs/archive.tar!/tardir/content.txt.gz!content.txt
- tmpDirname =file:///extra/data/tryVfs/content.txt.tmp
- tmpFilename=file:///extra/data/tryVfs/content.txt.tmp/content.txt
  }}}
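
The {{{gzName}}} line in the output above is a layered URI: a {{{gz:}}} layer wrapped around a {{{tar:}}} layer wrapped around a {{{file://}}} path, with {{{!}}} separating each layer from the path inside it. A minimal sketch of how such a string is assembled follows. It is plain string handling and does not call VFS; the class {{{LayeredUri}}} and method {{{gzInTarUri}}} are hypothetical names, and whether a given VFS version resolves the resulting URI is not something this sketch checks.

```java
/** Hypothetical helper that only builds the URI string; it does not touch VFS. */
final class LayeredUri {

    /**
     * @param tarPath   absolute local path of the tar file
     * @param entryPath path of the gzipped entry inside the tar, starting with "/"
     * @return a gz-in-tar URI of the shape shown in the sample output
     */
    static String gzInTarUri(String tarPath, String entryPath) {
        // Base name of the entry, e.g. "content.txt.gz".
        String entryName = entryPath.substring(entryPath.lastIndexOf('/') + 1);
        // Name of the decompressed content, e.g. "content.txt".
        String inner = entryName.replaceAll("\\.gz$", "");
        return "gz:tar:file://" + tarPath + "!" + entryPath + "!" + inner;
    }

    public static void main(String[] args) {
        System.out.println(gzInTarUri("/extra/data/tryVfs/archive.tar",
                                      "/tardir/content.txt.gz"));
    }
}
```

For the sample archive this builds the same string as the {{{gzName}}} value printed above.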
- In addition to the {{{archive.tar}}} file, there should now be a 
{{{content.txt}}} file in the same location.
+ In addition to the {{{archive.tar}}} file, there should now be 
{{{content.txt}}} and {{{non-gzip.txt}}} files in the same location.
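
One detail worth noting when deriving {{{content.txt}}} from {{{content.txt.gz}}}: the first argument of {{{String.replaceAll}}} is a regular expression, in which an unescaped dot matches any character. The sketch below contrasts the escaped and unescaped patterns. The class name {{{GzSuffix}}} and the file name {{{logs-tgz}}} are made up for the demonstration.

```java
/** Hypothetical demo class; the names here are illustrative only. */
final class GzSuffix {

    /** Strip a trailing ".gz"; the escaped dot matches only a literal '.'. */
    static String stripGz(String baseName) {
        return baseName.replaceAll("\\.gz$", "");
    }

    public static void main(String[] args) {
        System.out.println(stripGz("content.txt.gz"));          // content.txt
        System.out.println(stripGz("logs-tgz"));                // logs-tgz (unchanged)
        // Unescaped dot: ".gz$" also matches "tgz" at the end of the name.
        System.out.println("logs-tgz".replaceAll(".gz$", ""));  // logs-
    }
}
```

In the example on this page the pattern is only applied to names whose extension is already known to be {{{gz}}}, so both forms give the same result there; the escaped form simply states the intent more precisely.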
  
