Hey guys, I am new to both Python and, even more so, to numpy. I am trying to cluster close to 900K points using the DBSCAN algorithm. My input is a list of ~900k tuples, each holding the (x, y) coordinates of one point. I am converting the list to a numpy array and passing it to the pdist function of scipy.spatial.distance to calculate the distance between every pair of points.
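To make the distance step concrete, here is a minimal pure-Python sketch of what I understand pdist to return with its default Euclidean metric: one distance per unordered pair of points, stored as a flat "condensed" vector. The helper name condensed_euclidean and the sample points are made up for illustration; this is a stand-in, not scipy's implementation.

```python
from itertools import combinations
from math import hypot


def condensed_euclidean(points):
    """Hypothetical stand-in for pdist(points) with the Euclidean metric."""
    # One distance per unordered pair (i < j), visited in row-major pair
    # order: (0,1), (0,2), ..., (1,2), ...
    return [hypot(ax - bx, ay - by)
            for (ax, ay), (bx, by) in combinations(points, 2)]


# Made-up sample points, not taken from the real data.
sample = [(0, 0), (3, 4), (6, 8), (0, 8)]
distances = condensed_euclidean(sample)
print(len(distances))
print(distances)
```

For m points the result has m * (m - 1) / 2 entries, so 4 points give 6 distances.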
Here is some size info on my numpy array:

shape of input array : (828575, 2)
Size : 6872000 bytes

I think the error has something to do with the default double dtype of the array that the pdist function allocates. I would appreciate it if you could help me debug this. I am sure I am overlooking something naive here. See the traceback below.

MemoryError                               Traceback (most recent call last)
/house/homedirs/a/apratap/Dropbox/dev/ipython/<ipython-input-83-ee29361b7276> in <module>()
     36
     37 print cleaned_senseBam
---> 38 cluster_pet_points_per_chromosome(sense_bamFile)

/house/homedirs/a/apratap/Dropbox/dev/ipython/<ipython-input-83-ee29361b7276> in cluster_pet_points_per_chromosome(bamFile)
     30     print 'Size of list points is %d' % sys.getsizeof(points)
     31     print 'Size of numpy array is %d' % sys.getsizeof(points_array)
---> 32     cluster_points_DBSCAN(points_array)
     33     #print points_array
     34

/house/homedirs/a/apratap/Dropbox/dev/ipython/<ipython-input-72-77005d7cd900> in cluster_points_DBSCAN(data_numpy_array)
      9 def cluster_points_DBSCAN(data_numpy_array):
     10     #eucledian distance calculation
---> 11     D = distance.pdist(data_numpy_array)
     12     S = distance.squareform(D)
     13     H = 1 - S/np.max(S)

/house/homedirs/a/apratap/playground/software/epd-7.2-2-rh5-x86_64/lib/python2.7/site-packages/scipy/spatial/distance.pyc in pdist(X, metric, p, w, V, VI)
   1155
   1156     m, n = s
-> 1157     dm = np.zeros((m * (m - 1) / 2,), dtype=np.double)
   1158
   1159     wmink_names = ['wminkowski', 'wmi', 'wm', 'wpnorm']
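For reference, here is a rough back-of-the-envelope sketch of how much memory the failing line appears to request. It only redoes the arithmetic from the np.zeros call in the traceback, assuming 8 bytes per np.double element; the helper name pdist_memory_bytes is made up, and the squareform figure assumes the full m x m matrix is materialized on the next line.

```python
def pdist_memory_bytes(m, itemsize=8):
    """Estimate bytes for pdist's condensed output and the squareform matrix."""
    n_pairs = m * (m - 1) // 2      # length of the condensed distance vector
    condensed = n_pairs * itemsize  # assumes 8 bytes per np.double element
    square = m * m * itemsize       # squareform expands to a full m x m array
    return n_pairs, condensed, square


m = 828575  # number of points, taken from the array shape reported above
n_pairs, condensed, square = pdist_memory_bytes(m)
print('pairs: %d' % n_pairs)
print('condensed vector: %.2f TiB' % (condensed / 1024.0 ** 4))
print('squareform matrix: %.2f TiB' % (square / 1024.0 ** 4))
```

If this arithmetic is right, the condensed vector alone would need terabytes, which would explain a MemoryError regardless of the input array's own size.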