
Re: [R] agnes clustering and NAs

2011-01-29 Thread Dario Strbenac

Hello,

Thank you for the clarification about the NAs. For your interest, my end goal
was fortunately not to plot a dendrogram with 23371 elements, but just to use
the output of the clustering to re-order the rows of a matrix before plotting
it with image(). Since clara() and pam() are partitioning-based approaches, I
suppose I could instead stay with hclust() after removing the offending rows,
so that I have the ordering position of each gene, not its cluster membership.
I have 12 GB of RAM on my 64-bit system, so the running time should be my only
problem.
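A minimal sketch of the plan above, not code from the thread: `x` is a simulated, hypothetical stand-in for iMatrix, and the filtering rule (keep rows observed in more than half of the columns) is an assumption, chosen because any two such rows must share at least one observed column, so dist() cannot return NA for them.

```r
set.seed(1)
## Hypothetical stand-in for iMatrix: 200 rows x 10 columns, about 20% NA
x <- matrix(rnorm(200 * 10), nrow = 200)
x[sample(length(x), 0.2 * length(x))] <- NA

## Assumed filtering rule: keep rows observed in more than half of the columns.
## Two such rows always share at least one observed column (pigeonhole),
## so none of the remaining pairwise distances can be NA.
keep <- rowSums(!is.na(x)) > ncol(x) / 2
xk <- x[keep, , drop = FALSE]

d <- dist(xk)                       # Euclidean; NA columns skipped pairwise
stopifnot(!anyNA(d))

hc <- hclust(d, method = "average")
ord <- hc$order                     # leaf order: a permutation of 1:nrow(xk)

## image() puts matrix rows on the x axis, so transpose to show rows as rows
image(t(xk[ord, , drop = FALSE]), main = "Rows reordered by hclust leaf order")
```

`hc$order` is a leaf ordering, not a membership vector; `cutree(hc, k)` would give cluster memberships if they were ever needed.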

- Dario.

 Original message 
>Date: Fri, 28 Jan 2011 12:34:26 +0100
>From: Martin Maechler   
>Subject: Re: [R] agnes clustering and NAs  
>To: gavin.simp...@ucl.ac.uk
>Cc: d.strbe...@garvan.org.au, r-help@r-project.org, Uwe Ligges 
>
>
>>>>>> Gavin Simpson 
>>>>>> on Fri, 28 Jan 2011 09:23:05 + writes:
>
>> On Fri, 2011-01-28 at 10:00 +1100, Dario Strbenac wrote:
>>> Hello,
>>> 
>>> Yes, that's right, it is a values matrix. Not a dissimilarity matrix.
>>> 
>>> i.e.
>>> 
>>> > str(iMatrix)
>>> num [1:23371, 1:56] -0.407 0.198 NA -0.133 NA ...
>>> - attr(*, "dimnames")=List of 2
>>> ..$ : NULL
>>> ..$ : chr [1:56] "-8100" "-7900" "-7700" "-7500" ...
>
>Ok, so in the end you want to draw a dendrogram for  23'371
>observational units, really ?
>
>I think I would not use a hierarchical clustering method for so
>many units, but rather  clara() or maybe pam() or then model
>based or other methods, rather than fully hierarchical ones
>...
>but yes, that's not the issue here, and see further down ...
>
>BTW:  The object 'iMatrix' you provided for download has only 50
>  columns, not 56...
>>> 
>>> For the snippet of checking for NAs, I get all TRUEs, so I have at
>>> least one NA in each column.
>
>GS> Sorry, my bad. Try this:
>
>GS> apply(iMatrix, 1, function(x) all(is.na(x)))
>
>GS> will check that you have no fully `NA` rows.
>
>GS> Also look at str(iMatrix) for potential problems.
>
>GS> Finally, try:
>
>GS> out <- dist(iMatrix)
>GS> any(is.na(out))
>
>GS> should repeat what agnes is doing to compute the
>GS> dissimilarity matrix.  If that returns TRUE, go and find
>GS> which samples are giving NA dissimilarity and why.
>
>GS> The issue is not NA in the input data, but that your
>GS> input data is leading to NA in the computed
>GS> dissimilarities. This might be due to NA's in your input
>GS> data, where a pair of samples has no common set of data
>GS> for example.
>
>Yes, that's right on spot, thank you Gavin.
>
>This is indeed true:
>It *does* allow for NA's (in the data matrix), but if the
>pattern of NA's is such that the dissimilarity between two
>observations becomes undefined, namely e.g. if they have no
>common non-missings, then ``that's too much''.
>
>In general, I'd recommend using
>  dm <- daisy(,...)
>trying methods that are better with NAs, e.g. Gower's metric,
>until dm has {nearly} no NAs,
>and then figure out some imputation to replace all NA's in   dm
>by "reasonable values",
>then do clustering with the resulting dissimilarity "matrix" dm.
>
>HOWEVER, in your case, dm would correspond to a
> 23371 x 23371 dissimilarity matrix;
>stored as a double precision matrix (on a 64-bit platform),
>that's an object of size 4.4 GBytes, not very convenient to work
>with.
>As a dissimilarity object it will only be about half of that size,
>but that's still ``a bit large''.
>As I said above, for such data, I would never do fully
>hierarchical clustering,
>but rather something else.
>
>Martin Maechler, ETH Zurich
>
>
>GS> HTH
>GS> G
>
>>> The part of the agnes documentation I was referring to is :
>>> 
>>> "In case of a matrix or data frame, each row corresponds to an
>>> observation, and each column corresponds to a variable. All variables must be
>>> numeric.  Missing values (NAs) are allowed."
>>> 
>>> So, I'm under the impression it handles NAs on its own ?
>>> 
>>> - Dario.
>>> 
>>>  Original message 
>>> >Date: Thu, 27 Jan 2011 12:53:27 +
>>> >From: Gavin Simpson   
>>> >Subject:

Re: [R] agnes clustering and NAs

2011-01-28 Thread Martin Maechler

>>>>> Gavin Simpson 
>>>>> on Fri, 28 Jan 2011 09:23:05 + writes:

> On Fri, 2011-01-28 at 10:00 +1100, Dario Strbenac wrote:
>> Hello,
>> 
>> Yes, that's right, it is a values matrix. Not a dissimilarity matrix.
>> 
>> i.e.
>> 
>> > str(iMatrix)
>> num [1:23371, 1:56] -0.407 0.198 NA -0.133 NA ...
>> - attr(*, "dimnames")=List of 2
>> ..$ : NULL
>> ..$ : chr [1:56] "-8100" "-7900" "-7700" "-7500" ...

Ok, so in the end you want to draw a dendrogram for  23'371
observational units, really ?

I think I would not use a hierarchical clustering method for so
many units, but rather  clara() or maybe pam() or then model
based or other methods, rather than fully hierarchical ones
...
but yes, that's not the issue here, and see further down ...
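For comparison, a minimal sketch of the partitioning route suggested above: clara() from the cluster package works on subsamples, so it never forms the full n x n dissimilarity matrix. The simulated two-group data, `k = 2` and `samples = 20` are assumptions for illustration only.

```r
library(cluster)

set.seed(42)
## Hypothetical large data set: 5000 rows, 10 columns, two shifted groups
x <- rbind(matrix(rnorm(2500 * 10), ncol = 10),
           matrix(rnorm(2500 * 10, mean = 3), ncol = 10))

## clara() runs pam() on several random subsamples and assigns every row
## to the nearest medoid of the best subsample, so it avoids building the
## full n x n dissimilarity matrix
cl <- clara(x, k = 2, samples = 20)
table(cl$clustering)     # cluster sizes
```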

BTW:  The object 'iMatrix' you provided for download has only 50
  columns, not 56...
>> 
>> For the snippet of checking for NAs, I get all TRUEs, so I have at least
>> one NA in each column.

GS> Sorry, my bad. Try this:

GS> apply(iMatrix, 1, function(x) all(is.na(x)))

GS> will check that you have no fully `NA` rows.

GS> Also look at str(iMatrix) for potential problems.

GS> Finally, try:

GS> out <- dist(iMatrix)
GS> any(is.na(out))

GS> should repeat what agnes is doing to compute the
GS> dissimilarity matrix.  If that returns TRUE, go and find
GS> which samples are giving NA dissimilarity and why.

GS> The issue is not NA in the input data, but that your
GS> input data is leading to NA in the computed
GS> dissimilarities. This might be due to NA's in your input
GS> data, where a pair of samples has no common set of data
GS> for example.

Yes, that's right on spot, thank you Gavin.

This is indeed true:
It *does* allow for NA's (in the data matrix), but if the
pattern of NA's is such that the dissimilarity between two
observations becomes undefined, namely e.g. if they have no
common non-missings, then ``that's too much''.
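One way to see which pairs are meant, as a sketch rather than anything from the cluster package: count, for every pair of rows, how many columns are observed in both. A count of zero marks a pair whose dissimilarity is undefined. The toy matrix is hypothetical, and for 23371 rows the resulting n x n count matrix would itself be several GB.

```r
## Hypothetical toy data: rows 1 and 2 have no observed column in common
x <- rbind(c(1, NA, 3, NA),
           c(NA, 2, NA, 4),
           c(1,  2, 3,  4))

obs <- 1 * !is.na(x)          # 1 where a value is observed, 0 where it is NA
shared <- tcrossprod(obs)     # shared[i, j] = columns observed in both row i and row j
shared

## Pairs (i < j) with nothing in common: their dissimilarity is undefined
which(shared == 0 & upper.tri(shared), arr.ind = TRUE)
```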

In general, I'd recommend using
  dm <- daisy(,...)
trying methods that are better with NAs, e.g. Gower's metric,
until dm has {nearly} no NAs,
and then figure out some imputation to replace all NA's in   dm
by "reasonable values",
then do clustering with the resulting dissimilarity "matrix" dm.
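A minimal sketch of that recipe on a hypothetical toy matrix, using daisy() and agnes() from the cluster package. The imputation rule (replace each undefined dissimilarity with the largest observed one) is an assumption made here for illustration; the advice above leaves the choice of "reasonable values" open.

```r
library(cluster)

## Hypothetical toy data: rows 1 and 2 share no observed column,
## so their Gower dissimilarity is undefined
x <- rbind(c(1, NA, 3, NA),
           c(NA, 2, NA, 4),
           c(2,  3, 1,  5),
           c(4,  1, 2,  2))

dm <- daisy(x, metric = "gower")   # "dissimilarity" object, may contain NA
n_na <- sum(is.na(dm))             # number of undefined pairs

## Assumed imputation rule (one choice among many): treat undefined pairs
## as maximally dissimilar by using the largest observed dissimilarity
dm[is.na(dm)] <- max(dm, na.rm = TRUE)

ag <- agnes(dm, method = "average")   # dm inherits from "dist", so diss = TRUE
ag$order
```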

HOWEVER, in your case, dm would correspond to a
 23371 x 23371 dissimilarity matrix;
stored as a double precision matrix (on a 64-bit platform),
that's an object of size 4.4 GBytes, not very convenient to work
with.
As a dissimilarity object it will only be about half of that size,
but that's still ``a bit large''.
As I said above, for such data, I would never do fully
hierarchical clustering,
but rather something else.
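The size estimate above can be checked with a little arithmetic, sketched here in R: a double takes 8 bytes, a full matrix stores n^2 of them, and a "dist" object stores only the n(n-1)/2 entries below the diagonal. These figures cover the raw numbers only and ignore attributes and any temporary copies made during clustering.

```r
n <- 23371                        # number of rows (genes) in iMatrix
bytes_per_double <- 8

full_bytes <- n^2 * bytes_per_double              # full n x n matrix
dist_bytes <- n * (n - 1) / 2 * bytes_per_double  # "dist" keeps only one triangle

full_bytes / 1e9   # about 4.37 GB
dist_bytes / 1e9   # about 2.18 GB
```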

Martin Maechler, ETH Zurich

GS> HTH
GS> G

>> The part of the agnes documentation I was referring to is :
>> 
>> "In case of a matrix or data frame, each row corresponds to an
>> observation, and each column corresponds to a variable. All variables must be
>> numeric.  Missing values (NAs) are allowed."
>> 
>> So, I'm under the impression it handles NAs on its own ?
>> 
>> - Dario.
>> 
>>  Original message 
>> >Date: Thu, 27 Jan 2011 12:53:27 +
>> >From: Gavin Simpson   
>> >Subject: Re: [R] agnes clustering and NAs  
>> >To: Uwe Ligges 
>> >Cc: d.strbe...@garvan.org.au, r-help@r-project.org
>> >
>> >On Thu, 2011-01-27 at 10:45 +0100, Uwe Ligges wrote:
>> >> 
>> >> On 27.01.2011 05:00, Dario Strbenac wrote:
>> >> > Hello,
>> >> >
>> >> >> > In the documentation for agnes in the package 'cluster', it says
>> >> >> > that NAs are allowed, and sure enough it works for a small example like :
>> >> >
>> >> >> m<- matrix(c(
>> >> > 1, 1, 1, 2,
>> >> > 1, NA, 1, 1,
>> >> > 1, 2, 2, 2), nrow = 3, byrow = TRUE)
>> >> >> agnes(m)
>> >> > Call:agnes(x = m)
>> >> > Agglomerative coefficient:  0.1614168
>> >> > Order of objects:
>> >> > [1] 1 2 3
>> >> > Height (summary):
>> >> >> >    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>> >> >> >   1.155   1.247   1.339   1.339   1.431   1.524
>> >> >
>> >> > Ava

Re: [R] agnes clustering and NAs

2011-01-28 Thread Gavin Simpson

On Fri, 2011-01-28 at 10:00 +1100, Dario Strbenac wrote:
> Hello,
> 
> Yes, that's right, it is a values matrix. Not a dissimilarity matrix.
> 
> i.e.
> 
> > str(iMatrix)
>  num [1:23371, 1:56] -0.407 0.198 NA -0.133 NA ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : NULL
>   ..$ : chr [1:56] "-8100" "-7900" "-7700" "-7500" ...
> 
> For the snippet of checking for NAs, I get all TRUEs, so I have at least one 
> NA in each column.

Sorry, my bad. Try this:

apply(iMatrix, 1, function(x) all(is.na(x)))

will check that you have no fully `NA` rows.
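A vectorised variant of this check, as a sketch on a hypothetical toy matrix: `rowSums(is.na(x))` counts the NAs in each row, which gives the same answer as the apply() call without calling an R function once per row, and `any()` collapses the result to a single TRUE/FALSE.

```r
## Hypothetical toy matrix: row 2 is entirely NA
x <- rbind(c(1, NA, 3),
           c(NA, NA, NA),
           c(4, 5, NA))

all_na <- rowSums(is.na(x)) == ncol(x)   # TRUE for rows with no observed value
any(all_na)                              # single TRUE/FALSE answer
which(all_na)                            # indices of the fully NA rows

x_clean <- x[!all_na, , drop = FALSE]    # drop them before clustering
```

Dropping fully NA rows is necessary but not sufficient: two partially observed rows can still have no column in common.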

Also look at str(iMatrix) for potential problems.

Finally, try:

out <- dist(iMatrix)
any(is.na(out))

should repeat what agnes is doing to compute the dissimilarity matrix.
If that returns TRUE, go and find which samples are giving NA
dissimilarity and why.

The issue is not NA in the input data, but that your input data is
leading to NA in the computed dissimilarities. This might be due to NA's
in your input data, where a pair of samples has no common set of data
for example.
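A self-contained illustration of this point, using a hypothetical toy matrix: dist() skips columns where either row is NA (scaling up the sum for the columns it does use) and returns NA only when a pair has no usable columns left. The last step shows one way to list the offending pairs.

```r
## Hypothetical toy data: rows "a" and "b" have no observed column in common
x <- rbind(a = c(1, NA, 3, NA),
           b = c(NA, 2, NA, 4),
           c = c(1, 2, 3, 4))

d <- dist(x)    # Euclidean; columns with an NA in either row are skipped
anyNA(d)        # TRUE: d(a, b) has no columns left to use

## Locate the offending pairs. as.matrix() is fine for a toy example, but
## for 23371 rows it would allocate the full ~4.4 GB matrix.
dm <- as.matrix(d)
bad <- which(is.na(dm) & upper.tri(dm), arr.ind = TRUE)
data.frame(row1 = rownames(dm)[bad[, "row"]],
           row2 = colnames(dm)[bad[, "col"]])
```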

HTH

G

> The part of the agnes documentation I was referring to is :
> 
> "In case of a matrix or data frame, each row corresponds to an observation, 
> and each column corresponds to a variable. All variables must be numeric.  
> Missing values (NAs) are allowed."
> 
> So, I'm under the impression it handles NAs on its own ?
> 
> - Dario.
> 
>  Original message 
> >Date: Thu, 27 Jan 2011 12:53:27 +
> >From: Gavin Simpson   
> >Subject: Re: [R] agnes clustering and NAs  
> >To: Uwe Ligges 
> >Cc: d.strbe...@garvan.org.au, r-help@r-project.org
> >
> >On Thu, 2011-01-27 at 10:45 +0100, Uwe Ligges wrote:
> >> 
> >> On 27.01.2011 05:00, Dario Strbenac wrote:
> >> > Hello,
> >> >
> >> > In the documentation for agnes in the package 'cluster', it says that 
> >> > NAs are allowed, and sure enough it works for a small example like :
> >> >
> >> >> m<- matrix(c(
> >> > 1, 1, 1, 2,
> >> > 1, NA, 1, 1,
> >> > 1, 2, 2, 2), nrow = 3, byrow = TRUE)
> >> >> agnes(m)
> >> > Call:agnes(x = m)
> >> > Agglomerative coefficient:  0.1614168
> >> > Order of objects:
> >> > [1] 1 2 3
> >> > Height (summary):
> >> >> >    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> >> >> >   1.155   1.247   1.339   1.339   1.431   1.524
> >> >
> >> > Available components:
> >> > [1] "order"  "height" "ac" "merge"  "diss"   "call"   "method" "data"
> >> >
> >> > But I have a large matrix (23371 rows, 50 columns) with some NAs in it 
> >> > and it runs for about a minute, then gives an error :
> >> >
> >> >> agnes(iMatrix)
> >> > Error in agnes(iMatrix) :
> >> >No clustering performed, NA-values in the dissimilarity matrix.
> >> >
> >> > I've also tried getting rid of rows with all NAs in them, and it still 
> >> > gave me the same error. Is this a bug in agnes() ? It doesn't seem to 
> >> > fulfil the claim made by its documentation.
> >> 
> >> 
> >> I haven't looked in the file, but you need to get rid of all NA, or in 
> >> other words, all rows that contain *any* NA values.
> >
> >If one believes the documentation, then that only applies to the case
> >where `x` is a dissimilarity matrix. `NA`s are allowed if x is the raw
> >data matrix or data frame.
> >
> >The only way the OP could have gotten that error with the call shown is
> >if iMatrix were not a dissimilarity matrix inheriting from class "dist",
> >so `NA`s should be allowed.
> >
> >My guess would be that the OP didn't get rid of all the `NA`s.
> >
> >Dario: what does:
> >
> >sapply(iMatrix, function(x) any(is.na(x)))
> >
> >or if iMatrix is a matrix:
> >
> >apply(iMatrix, 2, function(x) any(is.na(x)))
> >
> >say?
> >
> >G
> >
> >> Uwe Ligges
> >> 
> >> 
> >> 
> >> > The matrix I'm using can be obtained here :
> >> > http://129.94.136.7/file_dump/dario/iMatrix.obj
> >> >
> >> > --
> >> > Dario Strbenac
> >> > Research Assistant
> >> > Cancer Epigenetics
> >> > Garvan Institute of Medical Research

Re: [R] agnes clustering and NAs

2011-01-27 Thread Dario Strbenac

Hello,

Yes, that's right, it is a values matrix. Not a dissimilarity matrix.

i.e.

> str(iMatrix)
 num [1:23371, 1:56] -0.407 0.198 NA -0.133 NA ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:56] "-8100" "-7900" "-7700" "-7500" ...

For the snippet of checking for NAs, I get all TRUEs, so I have at least one NA 
in each column.

The part of the agnes documentation I was referring to is :

"In case of a matrix or data frame, each row corresponds to an observation, and 
each column corresponds to a variable. All variables must be numeric.  Missing 
values (NAs) are allowed."

So, I'm under the impression it handles NAs on its own ?

- Dario.

 Original message 
>Date: Thu, 27 Jan 2011 12:53:27 +
>From: Gavin Simpson   
>Subject: Re: [R] agnes clustering and NAs  
>To: Uwe Ligges 
>Cc: d.strbe...@garvan.org.au, r-help@r-project.org
>
>On Thu, 2011-01-27 at 10:45 +0100, Uwe Ligges wrote:
>> 
>> On 27.01.2011 05:00, Dario Strbenac wrote:
>> > Hello,
>> >
>> > In the documentation for agnes in the package 'cluster', it says that NAs 
>> > are allowed, and sure enough it works for a small example like :
>> >
>> >> m<- matrix(c(
>> > 1, 1, 1, 2,
>> > 1, NA, 1, 1,
>> > 1, 2, 2, 2), nrow = 3, byrow = TRUE)
>> >> agnes(m)
>> > Call:agnes(x = m)
>> > Agglomerative coefficient:  0.1614168
>> > Order of objects:
>> > [1] 1 2 3
>> > Height (summary):
>> >> >    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>> >> >   1.155   1.247   1.339   1.339   1.431   1.524
>> >
>> > Available components:
>> > [1] "order"  "height" "ac" "merge"  "diss"   "call"   "method" "data"
>> >
>> > But I have a large matrix (23371 rows, 50 columns) with some NAs in it and 
>> > it runs for about a minute, then gives an error :
>> >
>> >> agnes(iMatrix)
>> > Error in agnes(iMatrix) :
>> >No clustering performed, NA-values in the dissimilarity matrix.
>> >
>> > I've also tried getting rid of rows with all NAs in them, and it still 
>> > gave me the same error. Is this a bug in agnes() ? It doesn't seem to 
>> > fulfil the claim made by its documentation.
>> 
>> 
>> I haven't looked in the file, but you need to get rid of all NA, or in 
>> other words, all rows that contain *any* NA values.
>
>If one believes the documentation, then that only applies to the case
>where `x` is a dissimilarity matrix. `NA`s are allowed if x is the raw
>data matrix or data frame.
>
>The only way the OP could have gotten that error with the call shown is
>if iMatrix were not a dissimilarity matrix inheriting from class "dist",
>so `NA`s should be allowed.
>
>My guess would be that the OP didn't get rid of all the `NA`s.
>
>Dario: what does:
>
>sapply(iMatrix, function(x) any(is.na(x)))
>
>or if iMatrix is a matrix:
>
>apply(iMatrix, 2, function(x) any(is.na(x)))
>
>say?
>
>G
>
>> Uwe Ligges
>> 
>> 
>> 
>> > The matrix I'm using can be obtained here :
>> > http://129.94.136.7/file_dump/dario/iMatrix.obj
>> >
>> > --
>> > Dario Strbenac
>> > Research Assistant
>> > Cancer Epigenetics
>> > Garvan Institute of Medical Research
>> > Darlinghurst NSW 2010
>> > Australia
>> >
>> > __
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide 
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> 
>
>-- 
>%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Dr. Gavin Simpson [t] +44 (0)20 7679 0522
> ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
> Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
> Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
> UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
>%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>


--
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia

Re: [R] agnes clustering and NAs

2011-01-27 Thread Gavin Simpson

On Thu, 2011-01-27 at 10:45 +0100, Uwe Ligges wrote:
> 
> On 27.01.2011 05:00, Dario Strbenac wrote:
> > Hello,
> >
> > In the documentation for agnes in the package 'cluster', it says that NAs 
> > are allowed, and sure enough it works for a small example like :
> >
> >> m<- matrix(c(
> > 1, 1, 1, 2,
> > 1, NA, 1, 1,
> > 1, 2, 2, 2), nrow = 3, byrow = TRUE)
> >> agnes(m)
> > Call:agnes(x = m)
> > Agglomerative coefficient:  0.1614168
> > Order of objects:
> > [1] 1 2 3
> > Height (summary):
> >    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> >   1.155   1.247   1.339   1.339   1.431   1.524
> >
> > Available components:
> > [1] "order"  "height" "ac" "merge"  "diss"   "call"   "method" "data"
> >
> > But I have a large matrix (23371 rows, 50 columns) with some NAs in it and 
> > it runs for about a minute, then gives an error :
> >
> >> agnes(iMatrix)
> > Error in agnes(iMatrix) :
> >No clustering performed, NA-values in the dissimilarity matrix.
> >
> > I've also tried getting rid of rows with all NAs in them, and it still gave 
> > me the same error. Is this a bug in agnes() ? It doesn't seem to fulfil the 
> > claim made by its documentation.
> 
> 
> I haven't looked in the file, but you need to get rid of all NA, or in 
> other words, all rows that contain *any* NA values.

If one believes the documentation, then that only applies to the case
where `x` is a dissimilarity matrix. `NA`s are allowed if x is the raw
data matrix or data frame.

The only way the OP could have gotten that error with the call shown is
if iMatrix were not a dissimilarity matrix inheriting from class "dist",
so `NA`s should be allowed.

My guess would be that the OP didn't get rid of all the `NA`s.

Dario: what does:

sapply(iMatrix, function(x) any(is.na(x)))

or if iMatrix is a matrix:

apply(iMatrix, 2, function(x) any(is.na(x)))

say?
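A small illustration of the difference between the two checks above, with a hypothetical toy matrix: sapply() treats a matrix as a plain vector, so it tests each element separately, while apply(..., 2, ...) works column by column. colSums() gives the same per-column answer without an explicit function.

```r
m <- matrix(c(1, NA, 3, 4, 5, NA), nrow = 2)   # hypothetical 2 x 3 matrix

length(sapply(m, function(x) any(is.na(x))))   # 6: one result per element, not per column
apply(m, 2, function(x) any(is.na(x)))          # one value per column
colSums(is.na(m)) > 0                           # vectorised equivalent of the apply() call
```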

G

> Uwe Ligges
> 
> 
> 
> > The matrix I'm using can be obtained here :
> > http://129.94.136.7/file_dump/dario/iMatrix.obj
> >
> > --
> > Dario Strbenac
> > Research Assistant
> > Cancer Epigenetics
> > Garvan Institute of Medical Research
> > Darlinghurst NSW 2010
> > Australia
> >

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

Re: [R] agnes clustering and NAs

2011-01-27 Thread Uwe Ligges

On 27.01.2011 05:00, Dario Strbenac wrote:
> Hello,
>
> In the documentation for agnes in the package 'cluster', it says that NAs are
> allowed, and sure enough it works for a small example like :
>
>> m<- matrix(c(
> 1, 1, 1, 2,
> 1, NA, 1, 1,
> 1, 2, 2, 2), nrow = 3, byrow = TRUE)
>> agnes(m)
> Call:agnes(x = m)
> Agglomerative coefficient:  0.1614168
> Order of objects:
> [1] 1 2 3
> Height (summary):
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>   1.155   1.247   1.339   1.339   1.431   1.524
>
> Available components:
> [1] "order"  "height" "ac" "merge"  "diss"   "call"   "method" "data"
>
> But I have a large matrix (23371 rows, 50 columns) with some NAs in it and it
> runs for about a minute, then gives an error :
>
>> agnes(iMatrix)
> Error in agnes(iMatrix) :
>    No clustering performed, NA-values in the dissimilarity matrix.
>
> I've also tried getting rid of rows with all NAs in them, and it still gave me
> the same error. Is this a bug in agnes() ? It doesn't seem to fulfil the claim
> made by its documentation.

I haven't looked in the file, but you need to get rid of all NA, or in 
other words, all rows that contain *any* NA values.
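For completeness, a sketch of this approach using base R's complete.cases() on a hypothetical toy matrix. When NAs are widespread it can discard a large share of the data, so counting the dropped rows first is a cheap sanity check.

```r
## Hypothetical toy matrix with NAs in two of its three rows
x <- rbind(c(1, NA, 3),
           c(4, 5, 6),
           c(NA, 8, 9))

cc <- complete.cases(x)          # TRUE only for rows without any NA
sum(!cc)                         # how many rows would be discarded
x_complete <- x[cc, , drop = FALSE]
```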


Uwe Ligges

> The matrix I'm using can be obtained here :
> http://129.94.136.7/file_dump/dario/iMatrix.obj
>
> --
> Dario Strbenac
> Research Assistant
> Cancer Epigenetics
> Garvan Institute of Medical Research
> Darlinghurst NSW 2010
> Australia

[R] agnes clustering and NAs

2011-01-26 Thread Dario Strbenac

Hello,

In the documentation for agnes in the package 'cluster', it says that NAs are 
allowed, and sure enough it works for a small example like :

> m <- matrix(c(
1, 1, 1, 2,
1, NA, 1, 1,
1, 2, 2, 2), nrow = 3, byrow = TRUE)
> agnes(m)
Call:agnes(x = m) 
Agglomerative coefficient:  0.1614168 
Order of objects:
[1] 1 2 3
Height (summary):
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.155   1.247   1.339   1.339   1.431   1.524 

Available components:
[1] "order"  "height" "ac" "merge"  "diss"   "call"   "method" "data"

But I have a large matrix (23371 rows, 50 columns) with some NAs in it and it 
runs for about a minute, then gives an error :

> agnes(iMatrix)
Error in agnes(iMatrix) : 
  No clustering performed, NA-values in the dissimilarity matrix.

I've also tried getting rid of rows with all NAs in them, and it still gave me 
the same error. Is this a bug in agnes() ? It doesn't seem to fulfil the claim 
made by its documentation.

The matrix I'm using can be obtained here :
http://129.94.136.7/file_dump/dario/iMatrix.obj

--
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia
