I'm reading a URL which is a .gz file, and decompressing
it.  This works, but it seems far too complex.  Yet
none of the "wrapping" you might expect to work
actually does.  You can't wrap a GzipFile around
an HTTP connection, because GzipFile, reasonably enough,
needs random access, and tries to do "seek" and "tell".
Nor is the GzipFile object itself fully general; it fails
on "readline", though it accepts "read". (There's no good reason
for that.) So I had to make a second copy of the data.

                                John Nagle

import gzip
import tempfile
import urllib2

def readurl(url):
    # TIMEOUTSECS (network timeout in seconds) is assumed to be defined elsewhere.
    if url.endswith(".gz"):
        nd = urllib2.urlopen(url, timeout=TIMEOUTSECS)
        td1 = tempfile.TemporaryFile()              # temp copy of the compressed file
        td1.write(nd.read())                        # fetch the whole file over the network
        nd.close()                                  # done with the network connection
        td2 = tempfile.TemporaryFile()              # temp copy of the decompressed file
        td1.seek(0)                                 # rewind so GzipFile can read from the start
        gd = gzip.GzipFile(fileobj=td1, mode="rb")  # wrap unzip around the compressed copy
        td2.write(gd.read())                        # decompress into the second temp file
        td1.close()                                 # done with the compressed copy
        td2.seek(0)                                 # rewind
        return td2                                  # file object holding the decompressed data
    else:
        return urllib2.urlopen(url, timeout=TIMEOUTSECS)
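
A leaner variant is possible, sketched below but not tested against the original
setup: zlib.decompressobj() with a wbits value of 16 + MAX_WBITS understands the
gzip header, so the stream can be decompressed as it arrives from the socket,
with no seekable file object and only one temporary copy.  The function name
readurl_streaming and the timeout default are placeholders, not from the
original post.

import tempfile
import urllib2
import zlib

def readurl_streaming(url, timeout=60):               # name and default are placeholders
    nd = urllib2.urlopen(url, timeout=timeout)
    if not url.endswith(".gz"):
        return nd                                     # plain file: hand back the response as-is
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS: expect a gzip header
    td = tempfile.TemporaryFile()                     # single temp file, holds decompressed data
    while True:
        chunk = nd.read(65536)                        # pull the compressed stream in blocks
        if not chunk:
            break
        td.write(decomp.decompress(chunk))            # decompress each block as it arrives
    td.write(decomp.flush())                          # drain anything zlib still buffers
    nd.close()                                        # done with the network connection
    td.seek(0)                                        # rewind so the caller reads from the start
    return td                                         # a real file: read() and readline() both work

Since td is an ordinary temporary file, it supports readline() as well as
read(), so the second copy in the original is no longer needed.  Buffering the
compressed bytes in cStringIO.StringIO and handing that to GzipFile would also
work, since StringIO is seekable, at the cost of holding the whole download in
memory.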