On 10/14/2015 01:23 AM, Nadav Horesh wrote:
>
> I have binary files ranging in size from a few MB to 1 GB, which I read and
> process as memory-mapped files (via np.memmap). Until numpy 1.9, creating a
> recarray on an existing file (without reading its content) was instantaneous;
> now it takes ~6 seconds (system: Arch Linux on Sandy Bridge). The top of a
> profile (using ipython %prun) is:
>
>  ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
>      21    3.037    0.145    4.266    0.203  _internal.py:372(_check_field_overlap)
> 3713431    1.663    0.000    1.663    0.000  _internal.py:366(<genexpr>)
> 3713750    0.790    0.000    0.790    0.000  {range}
> 3713709    0.406    0.000    0.406    0.000  {method 'update' of 'set' objects}
>     322    0.320    0.001    1.984    0.006  {method 'extend' of 'list' objects}
>
> Nadav.
Hi Nadav,

The slowdown is due to a PR I introduced to add safety checks to views of structured arrays (to prevent segfaults involving object fields), and will hopefully be fixed quickly. It is being discussed here: https://github.com/numpy/numpy/issues/6467

Also, I do not think the problem is with memmap itself: as far as I have tested, memmap creation is still fast. Most likely what is slowing your script down is subsequent access to the fields of the array, which is what has regressed. Is that right?

Allan
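To check whether it is really field access rather than memmap creation, a minimal sketch like the following can separate the two timings. The dtype and file here are made up for illustration; substitute your real file and record layout:

```python
import os
import tempfile
import time

import numpy as np

# Hypothetical structured dtype; your real files will differ.
dt = np.dtype([('a', 'f8'), ('b', 'i4'), ('c', 'S8')])

# Write a small binary file to map (a stand-in for your data files).
fd, path = tempfile.mkstemp()
os.close(fd)
np.zeros(100000, dtype=dt).tofile(path)

t0 = time.time()
m = np.memmap(path, dtype=dt, mode='r')  # memmap creation only
t1 = time.time()
col = m['a']                             # field access (the suspected slow path)
t2 = time.time()

print("memmap creation: %.6f s" % (t1 - t0))
print("field access:    %.6f s" % (t2 - t1))

del m  # release the map before deleting the file
os.remove(path)
```

If the second number dominates, that would confirm the regression is in the structured-array view checks rather than in memmap itself.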