On 2015-07-23 10:21, Heli Nix wrote:
Dear all,

I have the following piece of code. I am reading a numpy dataset from an hdf5 file and changing values to a new value if they equal 1. There is a 90 percent chance that (if id not in myList:) is true, and 10 percent of the time it is false.

    with h5py.File(inputFile, 'r') as f1:
        with h5py.File(inputFile2, 'w') as f2:
            ds=f1["MyDataset"].value
            myList=[list of Indices that must not be given the new_value]
            new_value=1e-20
            for index,val in np.ndenumerate(ds):
                if val==1.0 :
                    id=index[0]+1
                    if id not in myList:
                        ds[index]=new_value
            dset1 = f2.create_dataset("Cell Ids", data=cellID_ds)
            dset2 = f2.create_dataset("Porosity", data=poros_ds)

My numpy array has 16M entries and the code takes 9 hours to run. If I comment out my if statement (if id not in myList:), it takes only 5 minutes to run.

Is there any way that I can optimize this if statement?

Thank you very much in advance for your help.

Best Regards,
When checking for presence in a list, Python has to compare the value against the entries one by one, so the time taken is proportional to the length of the list. Checking for presence in a set, however, takes constant time on average, regardless of the size of the set.

Replace the list myList with a set.
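A minimal sketch of that suggestion, in case it helps. The array, the ids and the names protected, out_loop and out_vec are made up for illustration and are not from the original post; the small array stands in for the 16M-entry dataset. The first variant keeps the original loop and only swaps the list for a set. The second is an alternative not mentioned above: it builds a boolean mask with np.isin and avoids the Python-level loop altogether, assuming (as in the original code) that the id is the first index plus one.

```python
import numpy as np

# Hypothetical stand-ins for the poster's data (not from the original post).
ds = np.array([1.0, 0.5, 1.0, 1.0, 2.0, 1.0])
my_list = [1, 4]      # 1-based ids that must NOT be given new_value
new_value = 1e-20

# Variant 1: same loop as the original, but membership is tested
# against a set (constant time on average) instead of a list.
protected = set(my_list)
out_loop = ds.copy()
for index, val in np.ndenumerate(out_loop):
    if val == 1.0 and (index[0] + 1) not in protected:
        out_loop[index] = new_value

# Variant 2: no Python loop. Build a mask of the entries to change.
ids = np.arange(1, ds.shape[0] + 1)      # id = first index + 1
row_ok = ~np.isin(ids, my_list)          # True where the id is not protected
# Reshape so the per-row flag broadcasts over any trailing dimensions.
row_ok = row_ok.reshape((-1,) + (1,) * (ds.ndim - 1))
mask = (ds == 1.0) & row_ok
out_vec = ds.copy()
out_vec[mask] = new_value

print(out_loop)
print(out_vec)
```

Both variants should give the same array. On data of the size described, the masked version is likely to be much faster than either loop, but that is worth timing on the real dataset.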
