On Tue, Nov 10, 2009 at 11:14 AM, Keith Goodman <kwgood...@gmail.com> wrote:
> On Tue, Nov 10, 2009 at 10:53 AM, Darryl Wallace
> <darryl.wall...@prosensus.ca> wrote:
>> I currently do as you suggested.  But when the dataset size becomes large,
>> it gets to be quite slow due to the overhead of python looping.
>
> Are you using a for loop? If so, something like this might be faster:
>
>>> x = [1, 2, '', 3, 4, 'String']
>>> from numpy import nan
>>> [(z, nan)[type(z) is str] for z in x]
>   [1, 2, nan, 3, 4, nan]
>
> I use something similar in my code, so I'm interested to see if anyone
> can speed things up using python or numpy, or both. I run it on each
> row of the file replacing '' with None. Here's the benchmark code:
>
>>> x = [1, 2, '', 4, 5, '', 7, 8, 9, 10]
>>> timeit [(z, None)[z is ''] for z in x]
> 100000 loops, best of 3: 2.32 µs per loop

If there are few missing values (my use case), this seems to be faster:

def myfunc(x):
    while '' in x:
        x[x.index('')] = None
    return x

>> timeit myfunc(x)
1000000 loops, best of 3: 697 ns per loop

Note that it works in place: the list passed in is modified and returned.
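
One caveat on the timing: because the function modifies x in place, only the first timeit call has anything to replace, and every later call just scans a list that no longer contains ''. A fairer comparison probably needs a fresh copy of the row on each call. Here is a minimal sketch of that, using the standard timeit module instead of the IPython magic. The names replace_empty_inplace, replace_empty_copy and row are made up for illustration, and the comparison uses == rather than is, since identity tests against a string literal depend on interning. No timings are claimed; they will vary by machine and Python version.

```python
import timeit

def replace_empty_inplace(x):
    """Replace every '' in the list x with None, modifying x in place."""
    while '' in x:
        x[x.index('')] = None
    return x

def replace_empty_copy(x):
    """Return a new list with '' replaced by None; x is left untouched."""
    return [None if z == '' else z for z in x]

row = [1, 2, '', 4, 5, '', 7, 8, 9, 10]

# The list comprehension leaves its input alone.
converted = replace_empty_copy(row)

# The in-place version returns the same list object it was given.
work = list(row)
result = replace_empty_inplace(work)

# Time the in-place version on a fresh copy each call; otherwise only the
# first call does any replacing and later calls just scan the list.
n = 100000
t_inplace = timeit.timeit(lambda: replace_empty_inplace(list(row)), number=n)
t_copy = timeit.timeit(lambda: replace_empty_copy(row), number=n)
print("in place (with copy): %.0f ns per call" % (t_inplace / n * 1e9))
print("list comprehension:   %.0f ns per call" % (t_copy / n * 1e9))
```

The list(row) copy adds its own cost to the in-place timing, so this only shows whether the in-place approach still wins once the input actually contains missing values on every call.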
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
