Thanks for the help, I'll test out this simple example.
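
Something along these lines, I expect (a rough sketch of my own, combining the two suggestions quoted below; the sample row, the `==` comparison, and the final conversion to a float array are my additions, not taken verbatim from the quoted code):

    import numpy as np

    # Sample row as read from a file; empty strings mark missing values.
    row = [1, 2, '', 4, 5, '', 7, 8, 9, 10]

    # Variant of the quoted list comprehension: build a new list,
    # replacing '' with None (using == rather than "is" for robustness).
    cleaned = [None if z == '' else z for z in row]

    # The quoted in-place version: fast when missing values are rare.
    def myfunc(x):
        while '' in x:
            x[x.index('')] = None
        return x

    cleaned_inplace = myfunc(list(row))  # copy so the original row is untouched

    # None is converted to nan when building a float array.
    arr = np.array(cleaned, dtype=float)
    print(arr)  # nan marks the missing entries
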
On Tue, Nov 10, 2009 at 2:28 PM, Keith Goodman <kwgood...@gmail.com> wrote:
> On Tue, Nov 10, 2009 at 11:14 AM, Keith Goodman <kwgood...@gmail.com> wrote:
> > On Tue, Nov 10, 2009 at 10:53 AM, Darryl Wallace
> > <darryl.wall...@prosensus.ca> wrote:
> >> I currently do as you suggested. But when the dataset size becomes large,
> >> it gets to be quite slow due to the overhead of python looping.
> >
> > Are you using a for loop? If so, something like this might be faster:
> >
> >>> x = [1, 2, '', 3, 4, 'String']
> >>> from numpy import nan
> >>> [(z, nan)[type(z) is str] for z in x]
> > [1, 2, nan, 3, 4, nan]
> >
> > I use something similar in my code, so I'm interested to see if anyone
> > can speed things up using python or numpy, or both. I run it on each
> > row of the file, replacing '' with None. Here's the benchmark code:
> >
> >>> x = [1, 2, '', 4, 5, '', 7, 8, 9, 10]
> >>> timeit [(z, None)[z is ''] for z in x]
> > 100000 loops, best of 3: 2.32 µs per loop
>
> If there are few missing values (my use case), this seems to be faster:
>
> def myfunc(x):
>     while '' in x:
>         x[x.index('')] = None
>     return x
>
> >> timeit myfunc(x)
> 1000000 loops, best of 3: 697 ns per loop
>
> Note that it works in place.

--
______________________________________
Darryl Wallace: Project Leader
ProSensus Inc.
McMaster Innovation Park
175 Longwood Road South, Suite 301
Hamilton, Ontario, L8P 0A1 Canada (GMT -05:00)
Tel: 1-905-528-9136
Fax: 1-905-546-1372
Web site: http://www.prosensus.ca/
______________________________________
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion