On Thu, Apr 3, 2008 at 3:30 PM, Nicolas Bigaouette <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have a C program which outputs large (~GB) files. It is a simple binary
> dump of an array of structure containing 9 doubles. You can see this as a
> double 1D array of size 9*Stot (Stot being the allocated size of the array
> of structure). The 1D array represents a 3D array (Sx * Sy * Sz = Stot)
> containing 9 values per cell.
>
> I want to read these files in the most efficient way possible, and I would
> like to have your insight on this.
>
> Right now, the fastest way I found was:
> imzeros = zeros((Sy,Sz),dtype=float64,order='C')
> imex = imshow(imzeros)
> f = open(filename, 'rb')
> data = numpy.fromfile(file=f, dtype=numpy.float64, count=9*Stot)
> mask_Ex = numpy.arange(6,9*Stot,9)

This is something you can do much, much more efficiently by using a slice
instead of indexing with an integer array.

> Ex = data[mask].reshape((Sz,Sy,Sx), order='C').transpose()
> imex.set_array(squeeze(Ex3D[:,:,z]))
>
> The arrays will be big, so everything should be well optimized. I have
> multiple questions:
>
> 1) Should I change this:
> Ex = data[mask].reshape((Sz,Sy,Sx), order='C').transpose()
> imex.set_array(squeeze(Ex3D[:,:,z]))
> to:
> imex.set_array(squeeze(data[mask].reshape((Sz,Sy,Sx),
> order='C').transpose()[:,:,z]))
> I mean, if I don't use a temporary variable, will it be faster or less
> memory hungry?

No. The temporary exists whether you give it a name or not. If you use
data[6::9] instead of data[mask], you won't be using any extra memory at all.
The arrays will just be views into the original array.

> 2) If not, does the operation "Ex = " update the variable data or create
> another one?

It just reassigns the name "Ex" to a different object specified on the
right-hand side of the assignment. The relevant question is whether the
expression on the right-hand side takes up more memory.

> Ideally I would like to only update it. Maybe this would be
> better:
>
> Ex[:,:,:] = data[mask].reshape((Sz,Sy,Sx), order='C').transpose()
> Would it?

If you use data[6::9] instead of data[mask], you should just use "Ex = " since
no new memory will be used on the RHS.

> 3) The machine where the code will be run might be big-endian. Is there a
> way for python to read the big-endian file and "translate" it automatically
> to little-endian? Something like "numpy.fromfile(file=f,
> dtype=numpy.float64, count=9*Stot, endianness='big')"?

dtype=numpy.dtype('>f8')

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco
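
For reference, the two suggestions in the reply (a strided slice instead of an
integer index array, and an explicit big-endian dtype) can be tried on a tiny
stand-in file. This is only a sketch under stated assumptions: the grid sizes,
the temporary file, and the names NVALS, EX_OFFSET, Ex_copy and Ex_view are
made up for illustration; the offset 6 and the stride 9 come from the
arange(6, 9*Stot, 9) call in the question.

```python
import os
import tempfile

import numpy as np

# Hypothetical small grid; the real files hold Sx*Sy*Sz cells and are ~GB.
Sx, Sy, Sz = 4, 3, 2
Stot = Sx * Sy * Sz
NVALS = 9      # doubles per cell, as described in the question
EX_OFFSET = 6  # position of Ex inside each cell, from arange(6, 9*Stot, 9)

# Write a big-endian test file standing in for the C program's binary dump.
values = np.arange(NVALS * Stot, dtype=np.float64)
path = os.path.join(tempfile.mkdtemp(), "dump.bin")
values.astype(">f8").tofile(path)

# Read with an explicit big-endian dtype; this works on any host byte order.
data = np.fromfile(path, dtype=np.dtype(">f8"), count=NVALS * Stot)

# Indexing with an integer array copies the selected elements...
mask_Ex = np.arange(EX_OFFSET, NVALS * Stot, NVALS)
Ex_copy = data[mask_Ex].reshape((Sz, Sy, Sx), order="C").transpose()

# ...while the strided slice, the reshape and the transpose are views of data.
Ex_view = data[EX_OFFSET::NVALS].reshape((Sz, Sy, Sx), order="C").transpose()

same_values = np.array_equal(Ex_copy, Ex_view)
view_shares = np.shares_memory(Ex_view, data)
copy_shares = np.shares_memory(Ex_copy, data)
print(same_values, view_shares, copy_shares)  # True True False
```

Both expressions give the same (Sx, Sy, Sz) array, but only the slice-based one
shares memory with data. If native byte order is needed later on,
Ex_view.astype(numpy.float64) would copy just the Ex values rather than the
whole file.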