On Sat, Nov 21, 2015 at 8:54 PM, G Jones <glenn.calt...@gmail.com> wrote:
> Hi,
> Using the latest numpy from anaconda (1.10.1) on Python 2.7, I found that
> the following code works OK if npackets = 2, but acts bizarrely if npackets
> is large (2**12):
>
> -----------
>
> npackets = 2**12
> dlen = 2048
> PacketType = np.dtype([('timestamp', 'float64'),
>                        ('pkts', np.dtype(('int8', (npackets, dlen)))),
>                        ('data', np.dtype(('int8', (npackets*dlen,)))),
>                        ])
>
> b = np.zeros((1,), dtype=PacketType)
>
> b['timestamp']  # Should return array([0.0])
>
> ----------------
>
> Specifically, if npackets is large, i.e. 2**12 or 2**16, trying to access
> b['timestamp'] results in 100% CPU usage while the memory consumption is
> increasing by hundreds of MB per second. When I interrupt, I find the
> traceback in numpy/core/_internal.pyc : _get_all_field_offsets
> Since it seems to work for small values of npackets, I suspect that if I
> had the memory and time, the access to b['timestamp'] would eventually
> return, so I think the issue is that the algorithm doesn't scale well with
> record dtypes made up of lots of bytes.
> Looking on Github, I can see this code has been in flux recently, but I
> can't quite tell if the issue I'm seeing is addressed by the issues being
> discussed and tackled there.

This should be fixed in 1.10.2. 1.10.2rc1 is up on sourceforge if you want to test it.

Chuck
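For anyone who wants to check the fix, here is a self-contained version of the reproduction above (just the quoted snippet with the missing "import numpy as np" added). On 1.10.1 the final line hangs for large npackets; on a fixed version it should return right away:

-----------

import numpy as np

npackets = 2**12  # fine for small values such as 2; problematic when large on 1.10.1
dlen = 2048

# Structured dtype: one scalar field plus two large int8 subarray fields.
PacketType = np.dtype([('timestamp', 'float64'),
                       ('pkts', np.dtype(('int8', (npackets, dlen)))),
                       ('data', np.dtype(('int8', (npackets*dlen,)))),
                       ])

b = np.zeros((1,), dtype=PacketType)

# On 1.10.1 this field access spins at 100% CPU and keeps allocating memory
# for large npackets; with the fix it should simply return array([ 0.]).
print(b['timestamp'])

----------------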
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion