> Date: Wed, 28 Oct 2009 20:31:43 +0100
> From: Peter Schmidtke <pschmid...@mmb.pcb.ub.es>
> Subject: [Numpy-discussion] reading gzip compressed files using
>       numpy.fromfile
> To: numpy-discussion@scipy.org
> Message-ID: <fc345224bfa26132e9474287e32e0...@mmb.pcb.ub.es>
> Content-Type: text/plain; charset="UTF-8"
>
> Dear Numpy Mailing List Readers,
>
> I have a quite simple problem, for what I did not find a solution for now.
> I have a gzipped file lying around that has some numbers stored in it and I
> want to read them into a numpy array as fast as possible but only a bunch
> of data at a time.
> So I would like to use numpy's fromfile function.
>
> For now I have somehow the following code :
>
>
>
>         f=gzip.open( "myfile.gz", "r" )
>         xyz=npy.fromfile(f,dtype="float32",count=400)
>
>
> So I would read 400 entries from the file, keep it open, process my data,
> come back and read the next 400 entries. If I do this, numpy is complaining
> that the file handle f is not a normal file handle :
> IOError: first argument must be an open file
>
> but in fact it is a zlib file handle. But gzip gives access to the normal
> filehandle through f.fileobj.
>
> So I tried xyz=npy.fromfile(f.fileobj,dtype="float32",count=400)
>
> But there I get just meaningless values (not the actual data) and when I
> specify the sep=" " argument for npy.fromfile I get just .1 and nothing
> else.
>
> Can you tell me why and how to fix this problem? I know that I could read
> everything to memory, but these files are rather big, so I simply have to
> avoid this.
>
> Thanks in advance.
>
>
> --
>
> Peter Schmidtke
>
> ----------------------
> PhD Student at the Molecular Modeling and Bioinformatics Group
> Dep. Physical Chemistry
> Faculty of Pharmacy
> University of Barcelona
>
>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 28 Oct 2009 14:33:11 -0500
> From: Robert Kern <robert.k...@gmail.com>
> Subject: Re: [Numpy-discussion] reading gzip compressed files using
>       numpy.fromfile
> To: Discussion of Numerical Python <numpy-discussion@scipy.org>
> Message-ID:
>       <3d375d730910281233r5cadd0fcubea14676a3a97...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> On Wed, Oct 28, 2009 at 14:31, Peter Schmidtke <pschmid...@mmb.pcb.ub.es>
> wrote:
>> Dear Numpy Mailing List Readers,
>>
>> I have a quite simple problem, for what I did not find a solution for now.
>> I have a gzipped file lying around that has some numbers stored in it and I
>> want to read them into a numpy array as fast as possible but only a bunch
>> of data at a time.
>> So I would like to use numpy's fromfile function.
>>
>> For now I have somehow the following code :
>>
>>
>>
>>         f=gzip.open( "myfile.gz", "r" )
>>         xyz=npy.fromfile(f,dtype="float32",count=400)
>>
>>
>> So I would read 400 entries from the file, keep it open, process my data,
>> come back and read the next 400 entries. If I do this, numpy is complaining
>> that the file handle f is not a normal file handle :
>> IOError: first argument must be an open file
>>
>> but in fact it is a zlib file handle. But gzip gives access to the normal
>> filehandle through f.fileobj.
>
> np.fromfile() requires a true file object, not just a file-like
> object. np.fromfile() works by grabbing the FILE* pointer underneath
> and using C system calls to read the data, not by calling the .read()
> method.
>
>> So I tried xyz=npy.fromfile(f.fileobj,dtype="float32",count=400)
>>
>> But there I get just meaningless values (not the actual data) and when I
>> specify the sep=" " argument for npy.fromfile I get just .1 and nothing
>> else.
>
> This is reading the compressed data, not the data that you want.
>
>> Can you tell me why and how to fix this problem? I know that I could read
>> everything to memory, but these files are rather big, so I simply have to
>> avoid this.
>
> Read in reasonably-sized chunks of bytes at a time, and use
> np.fromstring() to create arrays from them.
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>   -- Umberto Eco
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 28 Oct 2009 13:26:41 -0700
> From: Christopher Barker <chris.bar...@noaa.gov>
> Subject: Re: [Numpy-discussion] reading gzip compressed files using
>       numpy.fromfile
> To: Discussion of Numerical Python <numpy-discussion@scipy.org>
> Message-ID: <4ae8a901.3060...@noaa.gov>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> Robert Kern wrote:
>>> f=gzip.open( "myfile.gz", "r" )
>>> xyz=npy.fromfile(f,dtype="float32",count=400)
>
>> Read in reasonably-sized chunks of bytes at a time, and use
>> np.fromstring() to create arrays from them.
>
> Something like:
>
> count = 400
> xyz = np.fromstring(f.read(count*4), dtype=np.float32)
>
> should work (untested...)
>
> -Chris
>
>
>
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> chris.bar...@noaa.gov
>
>
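
For the archive, the chunked reading suggested above could be sketched
roughly as follows. This is only a sketch under a few assumptions: the
helper name read_chunks is made up for illustration, the file is assumed
to hold raw float32 values in native byte order with no header, and
np.frombuffer() is swapped in for np.fromstring() because newer NumPy
releases deprecate np.fromstring() for binary data.

```python
import gzip

import numpy as np


def read_chunks(path, count=400, dtype=np.float32):
    """Yield arrays of up to `count` items from a gzip-compressed binary file.

    `read_chunks` is a hypothetical helper, not part of NumPy or gzip.
    Assumes the decompressed stream is a flat run of `dtype` values.
    """
    itemsize = np.dtype(dtype).itemsize
    # gzip.open() returns a file-like object; we call its .read() ourselves
    # instead of handing it to np.fromfile(), which needs a real file.
    with gzip.open(path, "rb") as f:
        while True:
            buf = f.read(count * itemsize)  # decompressed bytes
            # A short read should only occur at end of file; drop any
            # trailing bytes that do not form a complete item.
            usable = len(buf) - (len(buf) % itemsize)
            if usable == 0:
                break
            # frombuffer() gives a read-only view of the bytes; call
            # .copy() on the result if a writable array is needed.
            yield np.frombuffer(buf[:usable], dtype=dtype)
```

Looping with "for xyz in read_chunks("myfile.gz"):" then processes 400
entries at a time while keeping only one chunk in memory.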
Thanks Robert and Chris... indeed I managed to read it quite fast this way.

++

Peter Schmidtke

----------------------
PhD Student at the Molecular Modeling and Bioinformatics Group
Dep. Physical Chemistry
Faculty of Pharmacy
University of Barcelona

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion