K-means: recluster data with given cluster centers

Dear R users,
I have several large data sets, and additional data sets will be created over time. I want to cluster all of them in an identical way with the k-means algorithm.

With the first data set I find my cluster centers and save them to a file [1]. This first data set is huge, so the cluster centers are guaranteed to converge. Afterwards I load the saved cluster centers and use k-means to cluster all other data sets with the same centers [2].

I tried this, but in the reclustering step I now get the following error message:

"Error: empty cluster: try a better set of initial centers"

An empty cluster (one with no data points) should not be a problem; it can happen because the new data sets can be smaller. What am I doing wrong? Is there another way to cluster new data in the same way as the old data sets?

Thanks
Peter

[1] R code to find the cluster centers and save them to a file

#---INITIAL CLUSTERING TO FIND CLUSTER CENTERS
# LOAD LIB
library(cluster)

# LOAD DATA
data_unclean <- read.table("dataset1.dat")
data.matrix <- as.matrix(data_unclean)

# CLUSTER
Nclust <- 100   # number of cluster centers
Imax <- 200     # maximum number of iterations for convergence of the clustering
set.seed(100)   # set seed of the random number generator
init <- sample(nrow(data.matrix), Nclust)   # row indices of the initial Nclust prototypes
km <- kmeans(data.matrix, centers = data.matrix[init, ], iter.max = Imax)

# WRITE OUT CLUSTER CENTERS
km$centers   # print cluster centers (columns: dimension components; rows: clusters)
km$size      # print number of data points in each cluster
clusterCenters <- km$centers
save(file = "clusterCenters.RData", list = "clusterCenters")
# example: also write the centers as plain text
write.table(km$centers, file = "clusterCenters.dat", sep = ",", col.names = FALSE, row.names = FALSE)

[2] R code to recluster new data

#---RECLUSTER NEW DATA WITH GIVEN CLUSTER CENTERS
# LOAD LIB, SET PARAMETERS
library(cluster)
loopStart <- 0
loopEnd <- 10

# LOAD CLUSTER CENTERS
load("clusterCenters.RData")   # restores the object clusterCenters

# LOOP OVER TRAJ AND RECLUSTER THEM
for (ii in loopStart:loopEnd) {
  # DEFINE FILENAMES (input follows the dataset1.dat naming used in [1])
  filenameInput <- paste("dataset", ii, ".dat", sep = "")
  filenameOutput <- paste("dataset", ii, "datClusters", sep = "")
  print(filenameInput)
  print(filenameOutput)

  # LOAD DATA
  data_unclean <- read.table(filenameInput)
  data.matrix <- as.matrix(data_unclean)

  # RECLUSTER DATA (this is the call that raises the "empty cluster" error)
  kmRecluster <- kmeans(data.matrix, centers = clusterCenters, iter.max = 1)
  print(kmRecluster$size)   # explicit print() is needed inside a for loop

  # WRITE OUT CLUSTERS FOR EACH DATA SET
  write.table(kmRecluster$cluster, file = filenameOutput, sep = ",", col.names = FALSE, row.names = FALSE)
}
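
One possible way to express the reclustering step in [2] is a plain nearest-center assignment that never calls kmeans(), so a cluster that receives no points is not an error. The following is only a minimal sketch, assuming Euclidean distance. The helper name assign_to_centers and the toy data are hypothetical and are not taken from the scripts above.

```r
# Hypothetical helper (not from base R or the 'cluster' package):
# assign each row of x to the nearest row of centers (squared Euclidean distance).
assign_to_centers <- function(x, centers) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  stopifnot(ncol(x) == ncol(centers))
  # Squared distances via ||x||^2 + ||c||^2 - 2 * x.c
  # (result: one row per data point, one column per center)
  d2 <- outer(rowSums(x^2), rowSums(centers^2), "+") - 2 * x %*% t(centers)
  # Index of the closest center for every data point
  unname(apply(d2, 1, which.min))
}

# Toy data (made up for illustration): three fixed centers, three new points
centers <- rbind(c(0, 0), c(10, 10), c(100, 100))
newdata <- rbind(c(1, 1), c(9, 11), c(0.5, -0.5))

cl <- assign_to_centers(newdata, centers)
# Cluster sizes, including zero counts for centers that received no points
sizes <- tabulate(cl, nbins = nrow(centers))
print(cl)
print(sizes)
```

In this sketch the centers stay exactly as saved, whereas kmeans() with iter.max=1 would also move them. The distance matrix has one entry per data point and center, so very large data sets may need to be processed in chunks.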