On Wed, Apr 4, 2012 at 4:17 PM, Abhishek Pratap wrote:
> close to a 900K points using DBSCAN algo. My input is a list of ~900k
> tuples each having two points (x,y) coordinates. I am converting them
> to numpy array and passing them to pdist method of
> scipy.spatial.distance for calculating distance between each point.

I think pdist creates an array that is sum(range(num_points)) in size,
i.e. one distance for every pair of points.

That's going to be pretty darn big:

404999550000 elements

I think that's about 3 terabytes:

In [41]: sum(range(900000)) / 1024. / 1024 / 1024 / 1024 * 8
Out[41]: 2.946759559563361

(for 64 bit floats)

> I think the error has something to do with the default double dtype
> of numpy array of pdist function.

You *may* be able to get it to use float32 -- but as you can see, that
would only halve the size, so it probably won't help enough!

You'll need a different approach!

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
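
For what it's worth, the size estimate in the reply can be sketched as a small helper. This is only an illustration under stated assumptions: condensed_size and memory_tib are hypothetical names (not part of scipy), and the sketch assumes pdist returns one distance per unordered pair of points, n*(n-1)/2 in total, which is what sum(range(n)) works out to.

```python
def condensed_size(n_points):
    """Number of pairwise distances for n_points observations.

    Hypothetical helper: equals sum(range(n_points)), i.e. n*(n-1)/2.
    """
    return n_points * (n_points - 1) // 2


def memory_tib(n_points, bytes_per_element=8):
    """Approximate size of the distance array in TiB (1024**4 bytes).

    bytes_per_element is 8 for 64 bit floats, 4 for 32 bit floats.
    """
    return condensed_size(n_points) * bytes_per_element / 1024.0 ** 4


n = 900000
print(condensed_size(n))            # 404999550000
print(round(memory_tib(n, 8), 2))   # 2.95 (64 bit floats)
print(round(memory_tib(n, 4), 2))   # 1.47 (32 bit floats)
```

Even at 4 bytes per element the array is still about 1.5 TiB, so switching dtype alone would not make the full pairwise computation fit in memory.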