Re: Numpy outlier removal

MRAB Sun, 06 Jan 2013 15:25:18 -0800

On 2013-01-06 22:33, Hans Mulder wrote:

On 6/01/13 20:44:08, Joseph L. Casale wrote:

I have a dataset that consists of a dict with text descriptions and values
that are integers. If required, I collect the values into a list, create a
numpy array, and run it through a simple routine:

    data[abs(data - mean(data)) < m * std(data)]

where m is the number of standard deviations to include.

The problem is that I lose track of which values were removed, so when the
processed average is returned, the original display of the dataset is
misleading because it still includes the removed key/value pairs.

Does anyone know how I can maintain the relationship, so that when I
exclude a value, it is also removed from the dict?
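
A minimal sketch of one way to keep that relationship, assuming the values
live in a plain dict mapping keys to numbers: pull the keys and values out
in the same order, compute the boolean mask from the routine above on the
values, and apply the same mask to the keys. The function name
`filter_outliers` and the sample data are hypothetical, for illustration only.

```python
import numpy as np


def filter_outliers(values_by_key, m=2.0):
    """Hypothetical helper: split a dict of numbers into (kept, removed).

    Keys and values are extracted in the same order, so the boolean mask
    computed on the values lines up position by position with the keys.
    """
    keys = list(values_by_key)
    values = np.array([values_by_key[k] for k in keys], dtype=float)

    # Same test as the routine in the post: keep points that lie
    # within m standard deviations of the mean.
    mask = np.abs(values - values.mean()) < m * values.std()

    kept = {k: values_by_key[k] for k, keep in zip(keys, mask) if keep}
    removed = {k: values_by_key[k] for k, keep in zip(keys, mask) if not keep}
    return kept, removed


counts = {"a": 10, "b": 11, "c": 9, "d": 10, "e": 100}
kept, removed = filter_outliers(counts, m=1.5)
print(sorted(kept))     # ['a', 'b', 'c', 'd']
print(sorted(removed))  # ['e']
```

With only five points the outlier itself inflates the standard deviation,
which is why the example uses m=1.5; with m=2 nothing would be removed here.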


Assuming your data and the dictionary are keyed by a common set of keys:

for key in descriptions:
     if abs(data[key] - mean(data)) >= m * std(data):
         del data[key]
         del descriptions[key]

It's generally a bad idea to modify a collection over which you're
iterating. It's better to, say, make a list of what you're going to
delete and then iterate over that list to make the deletions:

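from numpy import mean, std

# Compute the statistics once, over the values, before deleting anything.
values = list(data.values())
centre = mean(values)
threshold = m * std(values)

deletions = []

for key in descriptions:
    if abs(data[key] - centre) >= threshold:
        deletions.append(key)

# Second pass: the loop above has finished, so deleting is now safe.
for key in deletions:
    del data[key]
    del descriptions[key]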
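
A small demonstration of why the two-pass approach matters (the function
names are hypothetical): deleting from a dict while a for loop is iterating
over it makes Python raise "RuntimeError: dictionary changed size during
iteration", whereas collecting the keys first and deleting afterwards works.

```python
def delete_while_iterating(d, bad):
    # The first pattern quoted above: delete from the dict being iterated over.
    for key in d:
        if key in bad:
            del d[key]


def delete_after_iterating(d, bad):
    # The two-pass pattern: collect the keys first, delete in a second pass.
    deletions = [key for key in d if key in bad]
    for key in deletions:
        del d[key]


d1 = {"a": 1, "b": 2, "c": 3}
try:
    delete_while_iterating(d1, {"b"})
except RuntimeError as exc:
    print("RuntimeError:", exc)  # RuntimeError: dictionary changed size during iteration

d2 = {"a": 1, "b": 2, "c": 3}
delete_after_iterating(d2, {"b"})
print(d2)  # {'a': 1, 'c': 3}
```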

--
http://mail.python.org/mailman/listinfo/python-list
