Re: [R] PAM Clustering
Thank you very much. I am so grateful for your kindness in answering my questions.

Regards.

On Thu, Aug 17, 2017 at 8:50 PM, Germano Rossi wrote:

> Sorry, I never use pam. In the help, you can see that pam requires a
> data frame OR a dissimilarity matrix. If diss=FALSE, then "euclidean" is
> used. So I interpret that a dissimilarity matrix is generated
> automatically.
>
> The problem may be in your data. Indeed,
>
> pam(ruspini, 4)$diss
>
> writes a dissimilarity matrix, while
>
> pam(MYdata,10)$diss
>
> writes NULL.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] PAM Clustering
Sorry, I never use pam. In the help, you can see that pam requires a data frame OR a dissimilarity matrix. If diss=FALSE, then "euclidean" is used. So I interpret that a dissimilarity matrix is generated automatically.

The problem may be in your data. Indeed,

pam(ruspini, 4)$diss

writes a dissimilarity matrix, while

pam(MYdata,10)$diss

writes NULL.

2017-08-17 16:03 GMT+02:00 Sema Atasever:

> Dear Germano,
>
> Thank you for your fast reply.
>
> In the above code, MYdata is the actual data set.
>
> Don't we need to convert MYdata to a dissimilarity matrix using the
> code line pam(as.dist(MYdata), k = 10, diss = TRUE)?
>
> Regards.

--
Germano Rossi, Dipartimento di Psicologia, Universita' degli Studi di Milano Bicocca
Piazza dell'Ateneo Nuovo, 1 - 20126 Milano - Italy
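A hedged note on the $diss check in this message: in the cluster package, pam() has a keep.diss argument whose documented default is !diss && !cluster.only && n < 100. A NULL $diss may therefore only mean that the input has 100 or more rows, not that the data are faulty. The sketch below uses the ruspini data (75 rows) and a simulated matrix named big, which is made up for illustration and is not the poster's data.

```r
library(cluster)

data(ruspini)                                # 75 observations, ships with cluster
small_fit <- pam(ruspini, 4)
has_diss_small <- !is.null(small_fit$diss)   # dissimilarities kept because n < 100

set.seed(1)
big <- matrix(rnorm(150 * 3), ncol = 3)      # 150 simulated rows (hypothetical data)
big_fit <- pam(big, 3)
has_diss_big <- !is.null(big_fit$diss)       # dropped by default because n >= 100

# Ask for the dissimilarities explicitly
kept_fit <- pam(big, 3, keep.diss = TRUE)
has_diss_kept <- !is.null(kept_fit$diss)
```

If the dissimilarities are needed for a larger data set, keep.diss = TRUE keeps them at the cost of a larger result object.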
Re: [R] PAM Clustering
Dear Germano,

Thank you for your fast reply.

In the above code, MYdata is the actual data set.

Don't we need to convert MYdata to a dissimilarity matrix using the code line pam(as.dist(MYdata), k = 10, diss = TRUE)?

Regards.

On Thu, Aug 17, 2017 at 2:58 PM, Germano Rossi wrote:

> try this
>
> MYdata <- read.csv2("data.txt",dec='.')
> library(cluster)
> cluster.pam = pam(MYdata,10)
> table(cluster.pam$clustering)
> filenameclu = paste("clusters", ".txt")
> write.table(cluster.pam$clustering, file=filenameclu,sep=",")
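On the as.dist() question, a hedged sketch may help. as.dist() only re-labels a square matrix that already holds pairwise distances, whereas dist() computes distances between the rows of a data matrix. The example uses the ruspini data from the cluster package as a stand-in for MYdata, which is an assumption because the real data set is not available here.

```r
library(cluster)
data(ruspini)

d <- dist(ruspini)                 # Euclidean distances between the rows
fit_dist <- pam(d, k = 4)          # diss = TRUE is inferred for "dist" objects
fit_raw  <- pam(ruspini, k = 4)    # pam() computes Euclidean distances itself

# Both routes should give the same partition for this data set
same_partition <- all(fit_dist$clustering == fit_raw$clustering)

# as.dist() is meant for a square matrix that already contains distances
m <- as.matrix(d)                  # square matrix, one row and column per observation
d_again <- as.dist(m)
```

So for raw feature data, passing the data frame directly or passing dist(MYdata) are the two sensible options. as.dist(MYdata) would misread a features-by-columns table as if it were a distance matrix.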
Re: [R] PAM Clustering
Try this:

MYdata <- read.csv2("data.txt",dec='.')
library(cluster)
cluster.pam = pam(MYdata,10)
table(cluster.pam$clustering)
filenameclu = paste("clusters", ".txt")
write.table(cluster.pam$clustering, file=filenameclu,sep=",")

2017-08-17 10:28 GMT+02:00 Sema Atasever:

> Dear Authorized Sir / Madam,
>
> I have a data set in which each row indicates an amino acid and each
> column corresponds to a feature (539 features in total).
> I want to use PAM clustering with this data set.
>
> When I ran the R script, I got this error:
> Error in pam(d, 10) : x is not a numeric dataframe or matrix.
> Execution halted
>
> How can I fix this error? Is there a problem with my dataset?
>
> Thanks in advance.
>
> PAM clustering code:
>
> MYdata <- read.csv2("data.txt", dec = ".")
> attach(MYdata)
> d=as.matrix(MYdata)
> library(cluster)
> cluster.pam = pam(d,10)
> table(cluster.pam$clustering)
>
> filenameclu = paste("clusters", ".txt")
> write.table(cluster.pam$clustering, file=filenameclu,sep=",")

--
Germano Rossi, Dipartimento di Psicologia, Universita' degli Studi di Milano Bicocca
Piazza dell'Ateneo Nuovo, 1 - 20126 Milano - Italy
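One possible cause of the error, offered as a guess because the file itself is not shown, is a non-numeric column such as an amino acid label. as.matrix() on a data frame with mixed column types returns a character matrix, and pam() rejects that. The sketch below uses a small made-up data frame with hypothetical column names (residue, f1, f2) to show how to find such columns and cluster on the numeric features only.

```r
library(cluster)

# Hypothetical stand-in for MYdata: one label column plus numeric features
MYdata <- data.frame(
  residue = c("A", "R", "N", "D", "C", "Q"),
  f1 = c(0.10, 0.20, 0.15, 5.10, 5.30, 5.20),
  f2 = c(1.00, 1.10, 0.90, 7.00, 7.20, 6.90)
)

# Which columns are not numeric?
is_num <- vapply(MYdata, is.numeric, logical(1))
non_numeric <- names(MYdata)[!is_num]

# as.matrix() on mixed columns yields a character matrix, which pam() rejects
char_matrix <- as.matrix(MYdata)
res <- try(pam(char_matrix, 2), silent = TRUE)

# Keep the numeric features only and carry the labels as row names
features <- MYdata[, is_num, drop = FALSE]
rownames(features) <- MYdata$residue
fit <- pam(features, 2)
table(fit$clustering)
```

Separately, paste("clusters", ".txt") in the scripts above inserts a space and yields "clusters .txt". paste0("clusters", ".txt") gives the file name without the space.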
Re: [R] PAM Clustering
Hi Sema,

read.csv2 uses ',' as the decimal separator. Since '.' is used in your file, every column is read as character, which in turn makes pam complain that what you pass to the function isn't numeric.

Use

read.csv2("data.csv", dec = ".")

and it should work. You can also use is.numeric(d) (or mode(d)) to check that the matrix is numeric before you pass it to pam(). See ?read.table for more options.

There is a base function called 'data', so naming a variable data is a poor choice.

HTH
Ulrik

On Mon, 10 Jul 2017 at 17:25 Sema Atasever wrote:

> Dear Authorized Sir / Madam,
>
> I have an R script file that contains PAM clustering code.
>
> When I ran the R script, I got this error:
> Error in pam(d, 10) : x is not a numeric dataframe or matrix.
> Execution halted
>
> How can I fix this error?
>
> Thanks in advance.
>
> data.csv
> <https://drive.google.com/file/d/0B4rY6f4kvHeCcVpLRTQ5VDhDNUk/view?usp=drive_web>
>
> pam.R
> data <- read.csv2("data.csv")
> attach(data)
> d=as.matrix(data)
> library(cluster)
> cluster.pam = pam(d,10)
> table(cluster.pam$clustering)
>
> filenameclu = paste("clusters", ".txt")
> write.table(cluster.pam$clustering, file=filenameclu,sep=",")
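The decimal-separator effect described above can be reproduced with a small temporary file. This is a sketch under the assumption that the file is semicolon-separated with '.' decimals. The file contents and the column names f1 and f2 are made up for illustration.

```r
# Write a tiny semicolon-separated file that uses '.' as the decimal mark
tmp <- tempfile(fileext = ".csv")
writeLines(c("f1;f2", "1.5;2.25", "3.0;4.75", "5.5;6.125"), tmp)

default_read <- read.csv2(tmp)             # defaults: sep = ";", dec = ","
fixed_read   <- read.csv2(tmp, dec = ".")  # same separator, '.' decimals

# First class of every column under each reading
default_classes <- vapply(default_read, function(col) class(col)[1], character(1))
fixed_classes   <- vapply(fixed_read,   function(col) class(col)[1], character(1))

# Only the second reading gives a numeric matrix that pam() can use
numeric_matrix <- is.numeric(as.matrix(fixed_read))

unlink(tmp)
```

Checking the column classes (or str() of the data frame) right after reading is usually quicker than waiting for pam() to fail.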
Re: [R] pam() clustering for large data sets
Dear Lilia,

I'm not sure whether this is particularly helpful in your situation, but sometimes it is possible to emulate the same (or approximately the same) distance measure as Euclidean distance between points that are suitably rescaled and transformed. In this case, you can rescale and transform the original data from which you computed the distances and use clara, which then implicitly computes Euclidean distances. Of course, whether this works depends on the nature of your data and the distance measure that you want to use.

Another possibility is to draw a random subset of, say, 3,000 observations, run pam on it, and assign the remaining ones to their closest medoid "manually". Actually, this is roughly what clara does anyway.

Best regards,
Christian

On Mon, 16 May 2011, Lilia Nedialkova wrote:

> Hello everyone,
>
> I need to do k-medoids clustering for data which consists of 50,000
> observations. I have computed distances between the observations
> separately and tried to use those with pam().
>
> I got the "cannot allocate vector of length" error and I realize this
> job is too memory intensive. I am at a bit of a loss on what to do at
> this point.
>
> I can't use clara(), because I want to use the already computed
> distances.
>
> What is it that people do to perform clustering for such large data
> sets?
>
> I would greatly appreciate any form of suggestions that people may
> have.
>
> Thank you very much in advance.

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
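The subset-then-assign idea from this reply could look roughly like the sketch below. It runs on simulated data with Euclidean distance, both of which are assumptions made for illustration. With a custom distance measure, step 3 would call that measure instead. The variable names are hypothetical.

```r
library(cluster)

set.seed(42)
# Simulated data standing in for the large data set (kept small here)
x <- rbind(matrix(rnorm(2000, mean = 0), ncol = 2),
           matrix(rnorm(2000, mean = 6), ncol = 2))

# 1. Cluster a random subset; only its pairwise distances are needed
idx <- sample(nrow(x), 200)
sub_fit <- pam(dist(x[idx, ]), k = 2)

# 2. Map the medoids back to row numbers in the full data
medoid_rows <- idx[sub_fit$id.med]

# 3. Distance from every observation to each medoid (n x k, not n x n)
dist_to_medoids <- sapply(medoid_rows, function(m) {
  sqrt(rowSums(sweep(x, 2, x[m, ])^2))
})

# 4. Assign each observation to its closest medoid
assignment <- max.col(-dist_to_medoids, ties.method = "first")
table(assignment)
```

The memory saving comes from never forming the full n x n distance matrix: only the subset's distances and an n x k matrix of distances to the medoids are held at once.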