Hello,

Yes, that's right, it is a values matrix. Not a dissimilarity matrix.
i.e.

> str(iMatrix)
 num [1:23371, 1:56] -0.407 0.198 NA -0.133 NA ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:56] "-8100" "-7900" "-7700" "-7500" ...

For the snippet of checking for NAs, I get all TRUEs, so I have at least one NA in each column.

The part of the agnes documentation I was referring to is :

"In case of a matrix or data frame, each row corresponds to an observation, and each column corresponds to a variable. All variables must be numeric. Missing values (NAs) are allowed."

So, I'm under the impression it handles NAs on its own ?

- Dario.

---- Original message ----
>Date: Thu, 27 Jan 2011 12:53:27 +0000
>From: Gavin Simpson <gavin.simp...@ucl.ac.uk>
>Subject: Re: [R] agnes clustering and NAs
>To: Uwe Ligges <lig...@statistik.tu-dortmund.de>
>Cc: d.strbe...@garvan.org.au, r-help@r-project.org
>
>On Thu, 2011-01-27 at 10:45 +0100, Uwe Ligges wrote:
>>
>> On 27.01.2011 05:00, Dario Strbenac wrote:
>> > Hello,
>> >
>> > In the documentation for agnes in the package 'cluster', it says that NAs
>> > are allowed, and sure enough it works for a small example like :
>> >
>> >> m<- matrix(c(
>> > 1, 1, 1, 2,
>> > 1, NA, 1, 1,
>> > 1, 2, 2, 2), nrow = 3, byrow = TRUE)
>> >> agnes(m)
>> > Call: agnes(x = m)
>> > Agglomerative coefficient: 0.1614168
>> > Order of objects:
>> > [1] 1 2 3
>> > Height (summary):
>> >    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>> >   1.155   1.247   1.339   1.339   1.431   1.524
>> >
>> > Available components:
>> > [1] "order" "height" "ac" "merge" "diss" "call" "method" "data"
>> >
>> > But I have a large matrix (23371 rows, 50 columns) with some NAs in it and
>> > it runs for about a minute, then gives an error :
>> >
>> >> agnes(iMatrix)
>> > Error in agnes(iMatrix) :
>> > No clustering performed, NA-values in the dissimilarity matrix.
>> >
>> > I've also tried getting rid of rows with all NAs in them, and it still
>> > gave me the same error. Is this a bug in agnes() ?
>> > It doesn't seem to fulfil the claim made by its documentation.
>>
>>
>> I haven't looked in the file, but you need to get rid of all NA, or in
>> other words, all rows that contain *any* NA values.
>
>If one believes the documentation, then that only applies to the case
>where `x` is a dissimilarity matrix. `NA`s are allowed if x is the raw
>data matrix or data frame.
>
>The only way the OP could have gotten that error with the call shown is
>if iMatrix were not a dissimilarity matrix inheriting from class "dist",
>so `NA`s should be allowed.
>
>My guess would be that the OP didn't get rid of all the `NA`s.
>
>Dario: what does:
>
>sapply(iMatrix, function(x) any(is.na(x)))
>
>or if iMatrix is a matrix:
>
>apply(iMatrix, 2, function(x) any(is.na(x)))
>
>say?
>
>G
>
>> Uwe Ligges
>>
>>
>>
>> > The matrix I'm using can be obtained here :
>> > http://129.94.136.7/file_dump/dario/iMatrix.obj
>> >
>> > --------------------------------------
>> > Dario Strbenac
>> > Research Assistant
>> > Cancer Epigenetics
>> > Garvan Institute of Medical Research
>> > Darlinghurst NSW 2010
>> > Australia
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>
>--
>%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
> ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
> Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
> UK. WC1E 6BT.
>                               [w] http://www.freshwaters.org.uk
>%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>

--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia
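
One possible explanation, which nobody in this thread has confirmed against iMatrix or the agnes source, is that some pair of rows has no column in which both values are observed, so the dissimilarity for that pair cannot be computed even though neither row is entirely NA. The sketch below illustrates the idea on a hypothetical toy matrix, using base R's dist() as a stand-in for whatever agnes computes internally. The names m, col_has_na, row_all_na, d, observed, shared and no_overlap are made up for illustration.

```r
# Hypothetical toy data (NOT the poster's iMatrix): rows 1 and 2 have no
# column in which both are observed, but no row is entirely NA.
m <- matrix(c( 1, NA,  2, NA,
              NA,  3, NA,  4,
               1,  2,  3,  4), nrow = 3, byrow = TRUE)

# The column-wise check suggested in the thread, plus a row-wise
# check for rows made up only of NAs.
col_has_na <- apply(m, 2, function(x) any(is.na(x)))
row_all_na <- apply(m, 1, function(x) all(is.na(x)))

# stats::dist() drops the columns where either row of a pair is NA;
# if nothing is left for a pair, that distance is NA.
# Assumption: used here only as a stand-in for agnes' internal computation.
d <- as.matrix(dist(m))

# Count the jointly observed columns for every pair of rows, then list
# the pairs (upper triangle only) that share none.
observed <- ifelse(is.na(m), 0, 1)
shared <- tcrossprod(observed)
no_overlap <- which(shared == 0 & upper.tri(shared), arr.ind = TRUE)

print(col_has_na)
print(row_all_na)
print(d)
print(no_overlap)
```

On a matrix with 23371 rows, the full pairwise count in shared would be a 23371 x 23371 matrix, so it may be more practical to run the same check on blocks of rows.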