Re: Optimizing if statement check over a numpy value

2015-07-23 Thread MRAB

On 2015-07-23 10:21, Heli Nix wrote:

Dear all,

I have the following piece of code. I am reading a numpy dataset from an hdf5 
file and I am changing values to a new value if they equal 1.

The condition (if id not in myList:) is true about 90 percent of the time and 
false about 10 percent of the time.

with h5py.File(inputFile, 'r') as f1:
    with h5py.File(inputFile2, 'w') as f2:
        ds=f1["MyDataset"].value
        myList=[list of Indices that must not be given the new_value]

        new_value=1e-20
        for index,val in np.ndenumerate(ds):
            if val==1.0 :
                id=index[0]+1
                if id not in myList:
                    ds[index]=new_value

        dset1 = f2.create_dataset("Cell Ids", data=cellID_ds)
        dset2 = f2.create_dataset("Porosity", data=poros_ds)

My numpy array has 16M elements and the script takes 9 hrs to run. If I 
comment out my if statement (if id not in myList:) it only takes 5 minutes to 
run.

Is there any way that I can optimize this if statement?

Thank you very much in advance for your help.

Best Regards,


When checking for presence in a list, it has to check every entry. The
time taken is proportional to the length of the list.

The time taken to check for presence in a set, however, is constant on
average, regardless of the size of the set.

Replace the list myList with a set.
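
A minimal sketch of that change, keeping the loop from the original post. The
small array and the ids in my_list are made-up stand-ins for the real data,
and the names excluded and cell_id are introduced here only for illustration.
The set is built once, before the loop.

```python
import numpy as np

# Hypothetical stand-ins for the real dataset and exclusion list.
ds = np.array([1.0, 2.0, 1.0, 1.0, 5.0])
my_list = [1, 3]             # 1-based ids that must keep their value
excluded = set(my_list)      # build the set once, outside the loop

new_value = 1e-20
for index, val in np.ndenumerate(ds):
    if val == 1.0:
        cell_id = index[0] + 1
        if cell_id not in excluded:   # hash lookup instead of a list scan
            ds[index] = new_value
```

The loop still visits every element in Python, so this only removes the cost
of the membership test, not the cost of the loop itself.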

--
https://mail.python.org/mailman/listinfo/python-list


Re: Optimizing if statement check over a numpy value

2015-07-23 Thread Laura Creighton
Take a look at the sorted collection recipe:
http://code.activestate.com/recipes/577197-sortedcollection/

You want myList to be a sorted list.  You want lookups to be fast.

See if that improves things enough for you.  It may be possible to
have better speedups if instead of myList you write myTree and store
the values in a tree, depending on what the values of id are --  it
could be completely useless for you, as well.
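
As a rough illustration of fast lookups in a sorted list, here is a sketch
using the standard library's bisect module rather than the recipe linked
above. The helper name contains_sorted and the sample ids are assumptions
made for this example.

```python
from bisect import bisect_left

def contains_sorted(sorted_ids, target):
    """Return True if target occurs in the ascending list sorted_ids."""
    pos = bisect_left(sorted_ids, target)   # binary search, O(log n)
    return pos < len(sorted_ids) and sorted_ids[pos] == target

# Hypothetical ids; sort once up front, then reuse for every lookup.
my_list = sorted([42, 7, 19, 3])
found = contains_sorted(my_list, 19)
missing = contains_sorted(my_list, 20)
```

Note that sorting alone does not help: the plain "in" operator still scans a
sorted list from the start, so the lookup has to go through bisect.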

Laura




Re: Optimizing if statement check over a numpy value

2015-07-23 Thread Jeremy Sanders
Heli Nix wrote:

> Is there any way that I can optimize this if statement.

Array processing is much faster in numpy. Maybe this is close to what you 
want:

import numpy as N
# input data
vals = N.array([42, 1, 5, 3.14, 53, 1, 12, 11, 1])
# list of indices (positions in vals) to exclude
exclude = [1]
# convert to a boolean array
exclbool = N.zeros(vals.shape, dtype=bool)
exclbool[exclude] = True
# do replacement
ones = vals==1.0
# Note: ~ is numpy.logical_not
vals[ones & (~exclbool)] = 1e-20

I think you'll have to convert your HDF array into a numpy array first, 
using numpy.array().
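
One possible way to adapt the snippet above to the loop in the original post,
where the id is the first index plus one. This is only a sketch: the function
name replace_ones and the sample arrays are invented here, and it assumes the
excluded ids are 1-based positions along the first axis.

```python
import numpy as np

def replace_ones(ds, excluded_ids, new_value=1e-20):
    """Replace 1.0 entries in place, except where the 1-based id is excluded."""
    excl = np.zeros(ds.shape[0], dtype=bool)
    excl[np.asarray(excluded_ids, dtype=int) - 1] = True   # ids are 1-based
    # Reshape so the per-id flag broadcasts over any trailing axes.
    keep = (~excl).reshape((-1,) + (1,) * (ds.ndim - 1))
    ds[(ds == 1.0) & keep] = new_value
    return ds

# Hypothetical 1-D and 2-D inputs.
ds1 = replace_ones(np.array([1.0, 2.0, 1.0, 1.0, 5.0]), [1, 3])
ds2 = replace_ones(np.array([[1.0, 1.0], [1.0, 3.0]]), [2])
```

The whole replacement is a single masked assignment, so there is no Python
level loop over the elements.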

Jeremy




Re: Optimizing if statement check over a numpy value

2015-07-29 Thread Heli Nix
Dear all, 

I tried the sorted Python list, but this did not really help the runtime. 

I haven't had time to check the sorted collections.  I solved my runtime 
problem by using Jeremy's script from earlier in this thread. 

It was a lifesaver and it is amazing how powerful numpy is. Thanks a lot, 
Jeremy, for this. By the way, I did not have to do any array conversion. The 
array read from the hdf5 file using h5py is already a numpy array. 

The runtime over an array of around 16M reduced from around 12 hours (previous 
script) to 3 seconds using numpy on the same machine. 


Thanks a lot for your help,