On Apr 23, 8:22 pm, Ole Streicher <ole-usenet-s...@gmx.net> wrote:
> Hi,
>
> for my application, I need to use quite large data arrays
> (100,000 x 4,000 values) of floating point numbers where I need fast
> row-wise and column-wise access (main case: return a column with the sum
> over a number of selected rows, and vice versa).
>
> I would use a numpy array for that, but they seem to be
> memory-resident. So one of these arrays would use about 1.6 GB of
> memory, which is far too much. So I was thinking about a memory-mapped
> file for that. As far as I understand, there is one in numpy.
>
> For this, I have two questions:
>
> 1. Are "numpy.memmap" arrays unlimited in size (resp. only limited
> by the maximal file size)? And do they count against the system's memory
> limit (~3 GB for 32-bit systems)?
>
> 2. Since I need row-wise as well as column-wise access, simply using
> a big array as a memory-mapped file will probably lead to very poor
> performance, since one of the two access patterns would need to read
> values scattered around the whole file. Are there any "plug and play"
> solutions for that? If not: what would be the best way to solve this
> problem? Probably one needs to use something like the "Morton layout"
> for the data. Would one then build a subclass of memmap (or ndarray?)
> that implements this specific layout? How would one do that? (Sorry, I
> am still a beginner with respect to Python.)
The Morton layout wastes space if the matrix is not square, and your
100K x 4K matrix is very non-square. Looks like you might want to use,
e.g., 25 Morton arrays, each 4K x 4K.

Cheers,
John
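
To make that concrete, here is a minimal sketch of what the tiled layout
could look like. It assumes float32 data (matching the 1.6 GB figure),
tiles padded from 4,000 to 4,096 per side so that Morton indexing gets
power-of-two dimensions (about 5% wasted space), and made-up file names;
treat it as an illustration of the indexing scheme, not a drop-in
solution:

import numpy as np

TILE = 4096            # next power of two above 4000; the pad cells are wasted
BITS = 12              # 2**12 == 4096
ROWS_PER_TILE = 4000   # logical rows stored in each tile
N_TILES = 25           # 25 * 4000 == 100,000 rows

def morton_index(i, j, bits=BITS):
    """Interleave the bits of row i and column j into one Z-order offset.

    Works elementwise on numpy integer arrays as well as scalars, so a
    whole row or column of offsets can be computed at once.
    """
    z = 0
    for b in range(bits):
        z |= ((i >> b) & 1) << (2 * b + 1)   # row bits at odd positions
        z |= ((j >> b) & 1) << (2 * b)       # column bits at even positions
    return z

# One flat memmap per tile; 25 tiles stacked along the row axis cover
# the full logical 100,000 x 4,000 matrix (64 MB per tile file).
tiles = [np.memmap('tile_%02d.dat' % t, dtype=np.float32,
                   mode='w+', shape=(TILE * TILE,))
         for t in range(N_TILES)]

def get(i, j):
    """Read element (i, j) of the logical matrix."""
    t, r = divmod(i, ROWS_PER_TILE)
    return tiles[t][morton_index(r, j)]

def row_sum(i, cols):
    """Sum the selected columns of row i with one vectorized lookup."""
    t, r = divmod(i, ROWS_PER_TILE)
    offsets = morton_index(r, np.asarray(cols))
    return tiles[t][offsets].sum()

A column sum has to visit all 25 tiles, but within each tile the Z-order
keeps values from nearby rows and columns close together on disk, which
is the point of the layout. Two caveats: element-at-a-time access from
pure Python is slow, so compute the offsets as an array and fancy-index
the memmap, as row_sum does; and on a 32-bit system all the mappings
together still have to fit in the process address space, so you may need
to open tiles on demand rather than keeping all 25 mapped at once.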