Re: [R] Cluster analysis with missing data

2009-07-14 Thread Bill.Venables
vegdist() in the vegan package optionally allows pairwise deletion of missing 
values when computing dissimilarities.  The result can be used as the first 
argument to hclust().

('Caveat emptor', of course.)
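
A minimal sketch of that route, assuming the 'vegan' package is installed. The toy matrix, the Euclidean method and the average linkage are illustrative choices, not something prescribed in this thread:

```r
library(vegan)

# Hypothetical toy data: 6 cases, 4 variables, two values missing
set.seed(1)
x <- matrix(runif(24, 1, 10), nrow = 6,
            dimnames = list(paste0("case", 1:6), paste0("v", 1:4)))
x[2, 3] <- NA
x[5, 1] <- NA

# na.rm = TRUE drops missing values pair by pair rather than dropping whole cases
d <- vegdist(x, method = "euclidean", na.rm = TRUE)

# A pair of cases with no variable in common would still yield NA,
# and hclust() cannot work with NA dissimilarities
stopifnot(!anyNA(d))

# The dissimilarity object is the first argument to hclust()
hc <- hclust(d, method = "average")
cutree(hc, k = 2)
```

Each dissimilarity is then based only on the variables observed in both cases of the pair, so different pairs may rest on different variables.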

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of 
Hollix [holger.steinm...@web.de]
Sent: 14 July 2009 16:42
To: r-help@r-project.org
Subject: [R]  Cluster analysis with missing data

Hi folks,

I tried hclust for the first time. Unfortunately, it doesn't seem to
work with missing data in my data file. I found no information about
how to handle missing data.

Omitting all cases with missing values is not really an option, as I
would lose too many cases.

Thanks in advance
Holger

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Cluster analysis with missing data

2009-07-14 Thread Gavin Simpson
On Mon, 2009-07-13 at 23:42 -0700, Hollix wrote:
> Hi folks,
>
> I tried hclust for the first time. Unfortunately, it doesn't seem to
> work with missing data in my data file. I found no information about
> how to handle missing data.
>
> Omitting all cases with missing values is not really an option, as I
> would lose too many cases.

Holger,

hclust takes a dissimilarity matrix as input, not your data, so the
problem is in finding an appropriate dissimilarity/distance coefficient
that handles missing data.

One such measure is Gower's coefficient, which is implemented in the
function 'daisy' in the recommended package 'cluster'. Try:

require(cluster)
?daisy

to read about it.
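
As a rough sketch of how this could look in practice (the data frame, its column names and the average linkage are made up for illustration):

```r
library(cluster)

# Hypothetical mixed-type data with missing values in every column
df <- data.frame(
  height = c(1.62, 1.75, NA, 1.80, 1.68, 1.71),
  weight = c(55, 72, 80, NA, 61, 68),
  group  = factor(c("a", "b", "b", "a", NA, "a"))
)

# Gower's coefficient uses, for each pair of cases, only the variables
# that are observed in both of them
d <- daisy(df, metric = "gower")

# The result can be passed to hclust() like any other dissimilarity object
hc <- hclust(d, method = "average")
cutree(hc, k = 2)
```

Here every pair of rows shares at least one observed variable, so no dissimilarity is NA. With sparser data that needs checking before calling hclust.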

Also, 'vegdist' in package 'vegan' can ignore missing values pairwise.
See ?vegdist after loading 'vegan', in particular the 'na.rm'
argument.

Whether either of these (i.e. the resulting dissimilarities) makes sense
for your particular problem is another matter...
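
One possible sanity check, sketched in base R with made-up data, is to count how many variables each pair of cases has in common, since a dissimilarity resting on very few shared variables may not mean much. The object names here are hypothetical:

```r
# Hypothetical toy data: 6 cases, 4 variables, two values missing
set.seed(1)
x <- matrix(runif(24, 1, 10), nrow = 6,
            dimnames = list(paste0("case", 1:6), paste0("v", 1:4)))
x[2, 3] <- NA
x[5, 1] <- NA

# 1 where a value is observed, 0 where it is missing
obs <- (!is.na(x)) * 1

# shared[i, j] = number of variables observed in both case i and case j
shared <- obs %*% t(obs)
shared

# Smallest number of shared variables over all pairs of distinct cases;
# a count of 0 would mean no dissimilarity can be computed for that pair
min(shared[upper.tri(shared)])
```

In this toy example, case2 and case5 have only two of the four variables in common, which is the smallest overlap among all pairs.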

HTH

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
