Attached is an example program that requires only numpy. At the
end I have two numpy arrays:
rdims:
[[3 1 1]
[0 0 4]
[1 3 0]
[2 2 0]
[3 3 3]
[0 0 2]]
rmeas:
[[100000.0 254.0]
[40000.0 200.0]
[50000.0 185.0]
[5000.0 160.0]
[150000.0 260.0]
[20000.0 180.0]]
I would like to use numpy to compute statistics, for example the mean
value of the prices:
>>> rmeas[:,0] # Prices of cars
array([100000.0, 40000.0, 50000.0, 5000.0, 150000.0, 20000.0],
dtype=float96)
>>> rmeas[:,0].mean() # Mean price
60833.3333333333333321
However, I only want to do this for rows matching 'color=yellow' or
'year=2003, make=Ford' etc. Is there a built-in numpy method that can
filter rows by a set of values, e.g. create a view of the original
array, or a new array, that contains only the matching rows? I know
how to do it in Python with iterators, but I wonder if there is a
better way to do it in numpy. (I'm new to numpy, so please forgive me
if this is a dumb question.)
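A minimal sketch of one possible approach, assuming boolean-mask indexing fits the need: compare a column of rdims against an index value to get a mask, then index rmeas with that mask. The index constants below are read off the rdims output shown above (they depend on the order of each dimension's value list), float64 stands in for float96 for portability, and all variable names are illustrative only. The sample data has no row with year=2003 and make=Ford, so the combined filter uses year=2005 instead.

```python
import numpy as np

# Arrays copied from the output shown above (float64 used for portability).
rdims = np.array([[3, 1, 1],
                  [0, 0, 4],
                  [1, 3, 0],
                  [2, 2, 0],
                  [3, 3, 3],
                  [0, 0, 2]], dtype=np.uint32)
rmeas = np.array([[100000.0, 254.0],
                  [40000.0, 200.0],
                  [50000.0, 185.0],
                  [5000.0, 160.0],
                  [150000.0, 260.0],
                  [20000.0, 180.0]])

# Assumption: index values as they appear in the rdims output above.
BLUE, YELLOW = 0, 3   # column 0 (Color)
YEAR_2005 = 3         # column 1 (Year)
FORD = 0              # column 2 (Make)

# Single condition: mean price of yellow cars.
yellow_mask = rdims[:, 0] == YELLOW
yellow_mean_price = rmeas[yellow_mask, 0].mean()

# Several conditions combined with & (parentheses are required).
ford_2005_mask = (rdims[:, 1] == YEAR_2005) & (rdims[:, 2] == FORD)
ford_2005_rows = rmeas[ford_2005_mask]

# Membership in a set of values: blue or yellow cars.
blue_or_yellow_mask = np.isin(rdims[:, 0], [BLUE, YELLOW])
blue_or_yellow_mean_price = rmeas[blue_or_yellow_mask, 0].mean()

print(yellow_mean_price)
print(ford_2005_rows)
print(blue_or_yellow_mean_price)
```

Indexing with a boolean mask returns a new array (a copy), not a view of the original, so modifying the filtered result leaves rmeas unchanged.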
Thanks,
Laszlo
import numpy

columns = ['Color','Year','Make','Price','VMax']
dimension_columns = [0,1,2]
measure_columns = [3,4]
data = [
    ['Yellow', '2000', 'Ferrari', 100000., 254.],
    ['Blue', '2003', 'Volvo', 40000., 200.],
    ['Black', '2005', 'Ford', 50000., 185.],
    ['Red', '1990', 'Ford', 5000., 160.],
    ['Yellow', '2005', 'Lamborgini', 150000., 260.],
    ['Blue', '2003', 'Suzuki', 20000., 180.],
]

print("Original data")
print("---------------")
for row in data:
    print(row)

# Create dimension values list: one entry per dimension column, holding
# the distinct values of that column (the order of a set is arbitrary).
dimensions = []
for colindex in dimension_columns:
    dimensions.append({
        'name': columns[colindex],
        'colindex': colindex,
        'values': list(set(row[colindex] for row in data)),
    })
print("Dimensions")
print("---------------")
for d in dimensions:
    print(d)

# Create a numpy array from dimensions: each cell is the position of the
# row's value in the matching dimension's 'values' list.
nrows = len(data)
ncols = len(dimension_columns)
rdims = numpy.empty( (nrows,ncols), dtype=numpy.uint32 )
for rindex,row in enumerate(data):
    for dindex,cindex in enumerate(dimension_columns):
        dimension = dimensions[dindex]
        # Write to column dindex of rdims; cindex is the column in data.
        rdims[rindex,dindex] = dimension['values'].index(row[cindex])
print("Dimension value indexes")
print("-----------------------")
print(rdims)

# Create numpy array from values.
# Note: numpy.float96 exists only on platforms with a 96-bit long double;
# numpy.longdouble is the portable name.
nrows = len(data)
ncols = len(measure_columns)
rmeas = numpy.empty( (nrows,ncols), dtype=numpy.float96 )
for rindex,row in enumerate(data):
    for mindex,cindex in enumerate(measure_columns):
        rmeas[rindex,mindex] = row[cindex]
print("Measure values")
print("-----------------------")
print(rmeas)
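As an aside, the dimension-index step in the program might also be written without the nested Python loops. The sketch below assumes numpy.unique with return_inverse=True is an acceptable substitute for the set/list.index approach. It is not the method the program uses, and because numpy.unique sorts the values, the resulting indexes differ from the rdims output shown earlier. The names dimension_values, index_columns and rdims_sorted are hypothetical.

```python
import numpy as np

data = [
    ['Yellow', '2000', 'Ferrari', 100000., 254.],
    ['Blue', '2003', 'Volvo', 40000., 200.],
    ['Black', '2005', 'Ford', 50000., 185.],
    ['Red', '1990', 'Ford', 5000., 160.],
    ['Yellow', '2005', 'Lamborgini', 150000., 260.],
    ['Blue', '2003', 'Suzuki', 20000., 180.],
]
dimension_columns = [0, 1, 2]

dimension_values = []   # sorted distinct values, one array per dimension
index_columns = []      # per-row index into the matching values array
for colindex in dimension_columns:
    column = np.array([row[colindex] for row in data])
    # values: sorted unique entries; inverse: position of each row's value
    values, inverse = np.unique(column, return_inverse=True)
    dimension_values.append(values)
    index_columns.append(inverse)

# Stack the per-dimension index vectors side by side: shape (nrows, ncols).
rdims_sorted = np.column_stack(index_columns)

# Look up an index by name, then decode a column back to its names.
yellow_index = list(dimension_values[0]).index('Yellow')
decoded_colors = dimension_values[0][rdims_sorted[:, 0]]

print(rdims_sorted)
print(yellow_index)
print(decoded_colors)
```

Keeping the sorted values array next to the index array makes it possible to translate in both directions: from a name to its index for filtering, and from indexes back to names for display.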