numpy and filtering (was: Fastest way to store ints and floats on disk)

Laszlo Nagy Fri, 08 Aug 2008 04:08:29 -0700

Attached is an example program that only requires numpy. At the end I have two numpy arrays:


rdims:


[[3 1 1]
[0 0 4]
[1 3 0]
[2 2 0]
[3 3 3]
[0 0 2]]


rmeas:

[[100000.0 254.0]
[40000.0 200.0]
[50000.0 185.0]
[5000.0 160.0]
[150000.0 260.0]
[20000.0 180.0]]

I would like to use numpy to compute statistics, for example the mean value of the prices:


>>> rmeas[:,0] # Prices of cars

array([100000.0, 40000.0, 50000.0, 5000.0, 150000.0, 20000.0],dtype=float96)

>>> rmeas[:,0].mean() # Mean price
60833.3333333333333321

However, I only want to do this for 'color=yellow' or 'year=2003,make=Ford' etc. I wonder if there is a built-in numpy method that can select rows using a set of values, e.g. create a view of the original array, or a new array, that contains only the filtered rows. I know how to do it from Python with iterators, but I wonder if there is a better way to do it in numpy. (I'm new to numpy, so please forgive me if this is a dumb question.)
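To make the question concrete, here is a minimal sketch of the kind of filtering meant, using numpy boolean masks. It is written in Python 3 syntax and rebuilds the two sample arrays by hand. The index codes (YELLOW = 3 and so on) are assumptions read off the sample rdims output above; they depend on the order in which the set of dimension values happened to be listed, so they may differ on another run. float64 is used instead of float96, which is not available on every platform.

```python
import numpy as np

# Sample arrays copied from the output shown above.
rdims = np.array([[3, 1, 1],
                  [0, 0, 4],
                  [1, 3, 0],
                  [2, 2, 0],
                  [3, 3, 3],
                  [0, 0, 2]], dtype=np.uint32)
rmeas = np.array([[100000.0, 254.0],
                  [40000.0, 200.0],
                  [50000.0, 185.0],
                  [5000.0, 160.0],
                  [150000.0, 260.0],
                  [20000.0, 180.0]], dtype=np.float64)

# ASSUMPTION: index codes as they appear in the sample output above.
YELLOW, BLUE = 3, 0   # column 0: Color
YEAR_2005 = 3         # column 1: Year
FORD = 0              # column 2: Make

# One condition: rows where Color == Yellow.
yellow_mask = rdims[:, 0] == YELLOW
yellow_mean_price = rmeas[yellow_mask, 0].mean()

# Two conditions combined with '&' (the parentheses are required).
combo_mask = (rdims[:, 1] == YEAR_2005) & (rdims[:, 2] == FORD)
combo_prices = rmeas[combo_mask, 0]

# A set of allowed values for one column.
set_mask = np.isin(rdims[:, 0], [YELLOW, BLUE])
set_mean_price = rmeas[set_mask, 0].mean()

print(yellow_mean_price)
print(combo_prices)
print(set_mean_price)
```

Note that indexing with a boolean mask returns a new array (a copy), not a view of the original.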


Thanks,

  Laszlo

import numpy

columns = ['Color','Year','Make','Price','VMax']
dimension_columns = [0,1,2]
measure_columns = [3,4]
data = [
    ['Yellow',     '2000',     'Ferrari',     100000.,    254.],
    ['Blue',       '2003',     'Volvo',        40000.,    200.],
    ['Black',      '2005',     'Ford',         50000.,    185.],
    ['Red',        '1990',     'Ford',          5000.,    160.],
    ['Yellow',     '2005',     'Lamborgini',  150000.,    260.],
    ['Blue',       '2003',     'Suzuki',       20000.,    180.],
]
print "Original data"
print "---------------"
for row in data:
    print row

# Create dimension values list
dimensions = []
for colindex in dimension_columns:
    dimensions.append({
        'name':columns[colindex],
        'colindex':colindex,
        'values': list(set(  map( lambda row: row[colindex], data )   )),
    })

print "Dimensions"
print "---------------"
for d in dimensions:
    print d

# Create a numpy array from dimensions
nrows = len(data)
ncols = len(dimension_columns)
rdims = numpy.empty( (nrows,ncols), dtype=numpy.uint32 )
for rindex,row in enumerate(data):
    for dindex,cindex in enumerate(dimension_columns):
        dimension = dimensions[dindex]
        # Store at the dimension's position (dindex), not the source column index
        rdims[rindex,dindex] = dimension['values'].index(row[cindex])
print "Dimension value indexes"
print "-----------------------"
print rdims

# Create numpy array from values
nrows = len(data)
ncols = len(measure_columns)
rmeas = numpy.empty( (nrows,ncols), dtype=numpy.float96 )
for rindex,row in enumerate(data):
    for mindex,cindex in enumerate(measure_columns):
        rmeas[rindex,mindex] = row[cindex]
print "Measure values"
print "-----------------------"
print rmeas
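As a side note, the dimension index array could probably also be built without the nested loops. The sketch below (Python 3 syntax) assumes numpy.unique with return_inverse=True is acceptable for the encoding. Unlike the set-based version in the program, it assigns codes in sorted order, so the numbers differ from the sample output above. The names dim_values and codes are introduced here for illustration only.

```python
import numpy as np

data = [
    ['Yellow', '2000', 'Ferrari',    100000., 254.],
    ['Blue',   '2003', 'Volvo',       40000., 200.],
    ['Black',  '2005', 'Ford',        50000., 185.],
    ['Red',    '1990', 'Ford',         5000., 160.],
    ['Yellow', '2005', 'Lamborgini', 150000., 260.],
    ['Blue',   '2003', 'Suzuki',      20000., 180.],
]
dimension_columns = [0, 1, 2]

dim_values = []  # sorted distinct values per dimension (hypothetical name)
codes = []       # integer code of every row, per dimension (hypothetical name)
for colindex in dimension_columns:
    column = np.array([row[colindex] for row in data])
    # uniq holds the sorted distinct values; inverse[i] is the position
    # of column[i] within uniq.
    uniq, inverse = np.unique(column, return_inverse=True)
    dim_values.append(uniq.tolist())
    codes.append(inverse)

# One column per dimension, one row per record.
rdims = np.column_stack(codes).astype(np.uint32)
print(dim_values)
print(rdims)
```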

