[Numpy-discussion] fast numpy.fromfile skipping data chunks

Andrea Cimatoribus Wed, 13 Mar 2013 06:45:43 -0700

Hi everybody, I hope this has not been discussed before; I couldn't find a
solution elsewhere.
I need to read some binary data, and I am using numpy.fromfile to do it.
Since the files are huge and reading them whole would make me run out of
memory, I need to skip some records while reading (the data were recorded at
high frequency, so basically I want to subsample while reading).
At the moment I have the code below, which I compile with Cython. Although it
is significantly faster than the pure Python version, the function is still
much slower than numpy.fromfile, and it only reads one kind of data (uint32 in
this case), because I do not know how else to define the array type in
advance. I have basically no experience with Cython or C, so I am a bit stuck.
How can I make this more efficient and, if possible, more generic?
Thanks


import numpy as np
#For cython!
cimport numpy as np
from libc.stdint cimport uint32_t

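def cffskip32(fid, int count=1, int skip=0):
    # Read up to `count` uint32 values from the open binary file `fid`,
    # seeking `skip` bytes forward after each value that is read.
    cdef int k = 0
    cdef np.ndarray[uint32_t, ndim=1] data = np.zeros(count, dtype=np.uint32)

    if skip < 0:
        raise ValueError("skip must be non-negative")
    while k < count:
        chunk = np.fromfile(fid, count=1, dtype=np.uint32)
        if chunk.size == 0:
            # End of file: keep only the values read so far.
            data = data[:k]
            break
        data[k] = chunk[0]
        fid.seek(skip, 1)
        k += 1
    return data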
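
For comparison, here is a minimal sketch of the same access pattern (keep one
item, ignore `skip` bytes, repeat) without a Python-level call per item. It
assumes the data sit in a regular file that can be memory-mapped and that the
file is addressed by path rather than through an open file object.
`fromfile_skip` and its parameters are hypothetical names for illustration,
not part of NumPy, and this has not been timed against the Cython version.

```python
import os
import numpy as np

def fromfile_skip(path, dtype=np.uint32, count=-1, skip=0, offset=0):
    """Keep one item of `dtype`, ignore `skip` bytes, repeat (hypothetical helper)."""
    dtype = np.dtype(dtype)
    if skip < 0:
        raise ValueError("skip must be non-negative")
    stride = dtype.itemsize + skip          # bytes from one kept item to the next
    avail = os.path.getsize(path) - offset  # bytes available after `offset`
    if avail < dtype.itemsize:
        return np.empty(0, dtype=dtype)
    # The last kept item needs only `itemsize` bytes, not a full stride.
    n = (avail - dtype.itemsize) // stride + 1
    if count >= 0:
        n = min(n, count)
    # Map the file as raw bytes; nothing is read until it is touched.
    raw = np.memmap(path, dtype=np.uint8, mode="r")
    # Strided view over the mapped bytes that lands on the kept items only.
    view = np.ndarray(shape=(n,), dtype=dtype, buffer=raw,
                      offset=offset, strides=(stride,))
    # Copy just the kept items into an ordinary in-memory array.
    return np.array(view)
```

With a file of twelve uint32 values, `fromfile_skip(path, skip=8)` would keep
every third value, and passing `dtype=np.float64` reads doubles with no other
change, since the element size is taken from the dtype.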
