Dear Sirs,
I'm trying to use NumPy to solve a speed problem in Python. I need to
perform agglomerative clustering as a first step before k-means clustering.
My problem is that I'm using a very large list in Python, and the script
takes more than 9 minutes to process all the information, so I'm trying to
use NumPy to create a matrix instead.
I'm reading the vectors from a text file and I end up with an array of
115*2634 float elements. How can I create this structure with NumPy?
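
To make the question concrete, here is a minimal sketch of how such a structure might be built. It assumes every line of the file is a label followed by whitespace-separated numbers, which is how the code further down treats it. The helper name `load_vectors` and the sample lines are made up for illustration.

```python
import numpy as np

def load_vectors(lines):
    """Build a 2-D float array from lines of the form 'label v1 v2 ...'."""
    # Skip blank lines, drop the first field (assumed to be a label),
    # and convert the remaining fields to floats.
    parsed = [[float(x) for x in line.split()[1:]]
              for line in lines if line.strip()]
    return np.array(parsed, dtype=float)

# Tiny stand-in for the real file: 3 documents with 4 values each.
sample = ["doc1 1.0 2.0 3.0 4.0",
          "doc2 0.5 0.5 0.5 0.5",
          "doc3 4.0 3.0 2.0 1.0"]
matrix = load_vectors(sample)
rows, columns = matrix.shape
```

An open file object can be passed in place of `sample`, since iterating over a file also yields one line at a time.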

Here is my code in Python:
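   #Read each document vector into a matrix (a list of rows)
   doclist = []
   matrix = []
   for line in vecfile:
       fields = line.split()
       #Skip the first field and convert the remaining ones to floats
       matrix.append([float(elem) for elem in fields[1:]])
   vecfile.close()

   #Dimensions used by the clustering loop
   rows = len(matrix)
   columns = len(matrix[0])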

   #Read the desired number of final clusters
   numclust = int(input('Input the desired number of clusters: '))

   #Clustering process
   clust = rows
   ind = [-1, -1]
   while clust > numclust:
       min_dist = float('inf')
       print('Number of Clusters %d \n' % clust)
       #Find the 2 most similar vectors in the file
       for j in range(0, clust):
           list_j = matrix[j]
           for k in range(j+1, clust):
               list_k = matrix[k]
               #Squared Euclidean distance between vectors j and k
               dist = 0
               for e in range(0, columns):
                   result = list_j[e] - list_k[e]
                   dist += result * result
               if dist < min_dist:
                   ind[0] = j
                   ind[1] = k
                   min_dist = dist

       #Combine the two most similar vectors by averaging them
       for e in range(0, columns):
           matrix[ind[0]][e] = (matrix[ind[0]][e] + matrix[ind[1]][e]) / 2.0
       clust = clust - 1

       #Move up all the remaining vectors
       for k in range(ind[1], (rows - 1)):
           for e in range(0, columns):
               matrix[k][e] = matrix[k+1][e]
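
For comparison, here is a rough sketch of how the same loop might look once the data is in a NumPy array. It is only an illustration under stated assumptions, not a tested replacement: the function names `closest_pair` and `agglomerate` are hypothetical, and the distances are computed with the identity ||a-b||^2 = ||a||^2 + ||b||^2 - 2*a.b, so floating-point rounding could in principle pick a different pair than the pure-Python loops when two distances are nearly equal.

```python
import numpy as np

def closest_pair(m):
    """Return (j, k) with j < k, the indices of the two closest rows of m."""
    # Squared norms of every row.
    sq = np.einsum('ij,ij->i', m, m)
    # All pairwise squared Euclidean distances in one shot.
    dist = sq[:, None] + sq[None, :] - 2.0 * m.dot(m.T)
    # Mask the diagonal and lower triangle so each pair is seen once.
    dist[np.tril_indices(len(m))] = np.inf
    j, k = np.unravel_index(np.argmin(dist), dist.shape)
    return int(j), int(k)

def agglomerate(m, numclust):
    """Merge the closest pair of rows (by averaging) until numclust remain."""
    m = np.array(m, dtype=float)  # work on a copy
    while len(m) > numclust:
        j, k = closest_pair(m)
        m[j] = (m[j] + m[k]) / 2.0   # combine the two vectors
        m = np.delete(m, k, axis=0)  # drop the merged row
    return m

# Small made-up example: two obvious groups of two points each.
data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
merged = agglomerate(data, 2)
```

The three nested Python loops are replaced by one matrix product per merge step, which is where most of the time should be saved. The price is a temporary rows-by-rows distance matrix, which is small for 115 vectors.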