On 10/25/2012 08:17 AM, Dag Sverre Seljebotn wrote:
> On 10/24/2012 09:00 PM, Michael Aye wrote:
>> As numpy.fromfile seems to require full file object functionalities
>> like seek, I can not use it with the sys.stdin pipe.
>> So how could I stream a binary pipe directly into numpy?
>> I can imagine storing the data in a string and use StringIO but the
>> files are 3.6 GB large, just the binary, and that will most likely be
>> much more as a string object.
>
> A Python 2 string is just a bytes object and would take 3.6 GB as well
> (or did you mean in text encoding?)
>
>> Reading binary files on disk is NOT the problem, I would like to avoid
>> the temporary file if possible.
>
> Read in chunks? Something like
>
> 1) Create array arr
>
> 2)
>
> arr_bytes = arr.view(np.uint8).reshape(np.prod(arr.shape))
> # check that modifying arr_bytes modifies arr,
> # if not, work with reshape arguments
>
> 3)
>
> while not done:
>     arr_bytes[i:i + chunk_size] = f.read(chunk_size)
>     ...
>
> Alternatively, one could write some C or Cython code to read directly
> into the NumPy array buffer, which avoids an extra copy over the memory
> bus of the data. (Since unfortunately it doesn't look like "fromfile"
> has an out argument.)

Actually, as long as you keep chunk_size on the order of 1 MB or so, the
per-chunk Python overhead probably doesn't matter, and each chunk fits in
cache, so the extra copy never has to go over the memory bus. A C solution
may therefore be overkill.

Dag Sverre
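
P.S. For what it's worth, the chunked read described above could be sketched roughly as follows. This is only a sketch under a few assumptions: the final shape and dtype are known up front, the stream is a binary file object (in Python 3 that would be sys.stdin.buffer rather than sys.stdin), and the name read_into_array is made up for illustration. Each chunk is wrapped with np.frombuffer before the assignment, since assigning a raw bytes object to a uint8 slice is not expected to work as-is.

```python
import io
import numpy as np


def read_into_array(stream, shape, dtype, chunk_size=1 << 20):
    """Fill a new array of the given shape/dtype from a binary stream.

    Hypothetical helper: assumes shape and dtype are known in advance.
    """
    arr = np.empty(shape, dtype=dtype)
    # Flat uint8 view that shares memory with arr (arr is C-contiguous
    # because it was just created by np.empty).
    arr_bytes = arr.reshape(-1).view(np.uint8)
    total = arr_bytes.size
    pos = 0
    while pos < total:
        chunk = stream.read(min(chunk_size, total - pos))
        if not chunk:
            raise EOFError("stream ended after %d of %d bytes" % (pos, total))
        # A pipe may return fewer bytes than requested, so advance by
        # the number of bytes actually received.
        arr_bytes[pos:pos + len(chunk)] = np.frombuffer(chunk, dtype=np.uint8)
        pos += len(chunk)
    return arr
```

With a pipe this would be called as something like read_into_array(sys.stdin.buffer, (n,), np.float32), where n is the number of values you expect. The default chunk_size of 1 MB follows the estimate above and is worth tuning for your machine.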