Dear Sirs,

I'm trying to use NumPy to solve a speed problem in Python. I need to perform agglomerative clustering as a first step before k-means clustering. My problem is that I'm using a very large list in Python and the script takes more than 9 minutes to process all the information, so I'm trying to use NumPy to create a matrix instead. I'm reading the vectors from a text file and I end up with an array of 115 x 2634 float elements. How can I create this structure with NumPy?
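For reference, one way such a structure might be built is sketched below. It assumes, as in the code that follows, that each line of the file holds a label followed by whitespace-separated numbers; the helper name `load_vectors` and the sample data are hypothetical and only there for illustration.

```python
import io

import numpy as np


def load_vectors(vecfile):
    """Parse lines of the form '<label> <v1> <v2> ...' into a 2-D float array.

    The first field of every line is skipped, mirroring the list[1:] slice
    used in the pure-Python version.
    """
    rows = []
    for line in vecfile:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or label-only lines
        rows.append([float(x) for x in fields[1:]])
    # All rows must have the same length for this to become a 2-D array.
    return np.array(rows, dtype=float)


# Hypothetical sample standing in for the real vector file.
sample = io.StringIO("doc1 1.0 2.0 3.0\ndoc2 4.0 5.0 6.0\n")
matrix = load_vectors(sample)
print(matrix.shape)
```

With the real file, the resulting array would have one row per document vector, and `matrix.shape` would give the row and column counts directly.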
Here is my code in Python:

    # Read each document vector into a matrix
    doclist = []
    matrix = []
    list = []
    for line in vecfile:
        list = line.split()
        for elem in range(1, len(list)):
            list[elem] = float(list[elem])
        matrix.append(list[1:])
    vecfile.close()

    # Matrix dimensions
    rows = len(matrix)
    columns = len(matrix[0])

    # Read the desired number of final clusters
    numclust = int(input('Input the desired number of clusters: '))

    # Clustering process
    clust = rows
    ind = [-1, -1]
    list_j = []
    list_k = []
    while (clust > numclust):
        min = 2147483647
        print('Number of Clusters %d \n' % clust)
        # Find the 2 most similar vectors in the file
        for j in range(0, clust):
            list_j = matrix[j]
            for k in range(j+1, clust):
                list_k = matrix[k]
                dist = 0
                for e in range(0, columns):
                    result = list_j[e] - list_k[e]
                    dist += result * result
                if (dist < min):
                    ind[0] = j
                    ind[1] = k
                    min = dist
        # Combine the two most similar vectors by averaging them
        for e in range(0, columns):
            matrix[ind[0]][e] = (matrix[ind[0]][e] + matrix[ind[1]][e]) / 2.0
        clust = clust - 1
        # Move up all the remaining vectors
        for k in range(ind[1], (rows - 1)):
            for e in range(0, columns):
                matrix[k][e] = matrix[k+1][e]
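A rough sketch of how the loops above might be expressed with NumPy array operations follows. It is an assumption-laden illustration, not a tested replacement: the function names `closest_pair` and `agglomerate` are hypothetical, the squared distances are computed through the identity |a - b|^2 = |a|^2 + |b|^2 - 2 a.b rather than element by element, so floating-point rounding (and therefore tie-breaking between near-equal pairs) may differ slightly from the pure-Python version, and `numclust` is assumed to be at least 1.

```python
import numpy as np


def closest_pair(data):
    """Return (j, k) with j < k for the two rows with the smallest
    squared Euclidean distance."""
    sq_norms = np.einsum('ij,ij->i', data, data)  # |row|^2 for every row
    # |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, for all pairs at once
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * np.dot(data, data.T)
    # Mask the diagonal and lower triangle so only pairs with j < k remain
    dist[np.tril_indices(len(data))] = np.inf
    j, k = np.unravel_index(np.argmin(dist), dist.shape)
    return int(j), int(k)


def agglomerate(matrix, numclust):
    """Merge the two closest rows repeatedly until numclust rows remain."""
    if numclust < 1:
        raise ValueError("numclust must be at least 1")
    data = np.array(matrix, dtype=float)  # work on a copy
    while len(data) > numclust:
        j, k = closest_pair(data)
        data[j] = (data[j] + data[k]) / 2.0  # average the two vectors
        data = np.delete(data, k, axis=0)    # drop row k, shifting the rest up
    return data


# Tiny hypothetical data set: two obvious groups of two points each.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 12.0]]
clusters = agglomerate(points, 2)
print(clusters)
```

The pairwise distance table here has only rows x rows entries, so for 115 vectors it stays small regardless of the number of columns; the per-element work is left to NumPy instead of the Python interpreter.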
_______________________________________________
Numpy-discussion mailing list
[EMAIL PROTECTED]
http://projects.scipy.org/mailman/listinfo/numpy-discussion