On 06/27/2011 05:55 PM, Mark Wiebe wrote:
> First I'd like to thank everyone for all the feedback you're providing,
> clearly this is an important topic to many people, and the discussion
> has helped clarify the ideas for me. I've renamed and updated the NEP,
> then placed it into the master NumPy repository so it has a more
> permanent home here:
>
> https://github.com/numpy/numpy/blob/master/doc/neps/missing-data.rst
One thing to think about is the presence of SSE/AVX instructions, which could change some of the memory/speed trade-offs here. The newest Intel-platform CPUs can do 256-bit operations, which translates to a theoretical factor-of-8 speedup for in-cache single-precision data (eight 32-bit floats per register), and the instruction set is designed to allow future expansion to 512- or 1024-bit registers. I feel one should take care not to design oneself into a corner where this can't (eventually) be leveraged.

1) The shuffle instructions take a single control byte that determines how data is moved around within 128-bit registers. One could probably implement fast IGNORE-style NA with a separate mask using 1 byte per 16 bytes of data (with 4- or 8-byte elements). OTOH, I'm not sure whether a mask with 1 byte per element would be as fast (but I don't know much about this and haven't looked at the details).

2) The alternative, "Parameterized Data Type Which Adds Additional Memory for the NA Flag", would mean that contiguous arrays with NAs/IGNOREs could not use vector instructions, or would require a mess of copying data in and out before operating on it. This really seems like the worst of all possibilities to me.

(FWIW, my vote is in favour of both NA-using-NaN and IGNORE-using-explicit-masks, keeping the two as entirely separate worlds to avoid confusion.)

Dag Sverre
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
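A minimal NumPy sketch (not part of the original message) contrasting the memory layouts discussed above: NA-using-NaN, IGNORE with a separate byte-per-element mask, and a per-element NA flag stored next to each value. The structured dtype in (c) is only an assumed stand-in for the NEP's proposed parameterized NA dtype, which NumPy does not provide; `flagged_dtype` and the other names are hypothetical. The sketch does not measure SIMD speed. It only shows that (a) and (b) keep the values in a contiguous float32 buffer, while (c) forces a strided view.

```python
import numpy as np

# Hypothetical example data: 8 single-precision values, two of them "missing".
values = np.arange(8, dtype=np.float32)
missing = np.zeros(8, dtype=bool)
missing[[2, 5]] = True

# (a) NA-using-NaN: the missing marker lives inside the float bit pattern,
#     so the data stays a plain contiguous float32 array.
na_nan = values.copy()
na_nan[missing] = np.nan
sum_nan = np.nansum(na_nan)

# (b) IGNORE-using-explicit-mask: a separate boolean array (1 byte per element)
#     next to untouched contiguous data; the reduction selects with np.where.
keep = ~missing
sum_mask = np.where(keep, values, np.float32(0)).sum()

# (c) Assumed stand-in for a parameterized NA dtype: each value is
#     interleaved in memory with its own 1-byte NA flag.
flagged_dtype = np.dtype([("value", np.float32), ("na", np.uint8)])
flagged = np.zeros(8, dtype=flagged_dtype)
flagged["value"] = values
flagged["na"] = missing
value_view = flagged["value"]  # strided view into the interleaved records

# Layouts (a) and (b) step through the values 4 bytes at a time, which suits
# packed SIMD loads; layout (c) steps 5 bytes at a time and is not contiguous.
print(values.strides, value_view.strides)  # → (4,) (5,)
print(value_view.flags["C_CONTIGUOUS"])    # → False
```

Passing `align=True` to `np.dtype` would pad each record in (c) to 8 bytes, but the value field would still have an 8-byte stride. That means a vectorized kernel would need a gather or a copy into a contiguous buffer, which is the "copying in and out" cost mentioned in point 2.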