Re: [R-sig-eco] On The Choice of a Classification Approach

francois gillet Thu, 04 Nov 2021 09:14:04 -0700

Dear Alexandre,

To build a predictive model of functional groups of species based on a set of 
traits, you can first cluster the species from the species x traits table, 
then fit a classification tree that uses the cluster memberships as the 
response and the same traits as predictors.
Here is an example with trait data from New Zealand vascular plant species, 
using the mvpart() function in the mvpart package:


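library(FD)      # gowdis() and the tussock data set
library(mvpart)  # multivariate regression and classification trees
# Gower dissimilarity matrix for mixed (quantitative and qualitative) traits
gd <- gowdis(tussock$trait)
# Ward hierarchical clustering
gc <- hclust(gd, method = "ward.D2")
plot(gc, hang = -1)
rect.hclust(gc, k = 6)
# Cut the dendrogram into 6 clusters (plant functional types)
fg <- cutree(gc, k = 6)
# Classification tree: cluster membership explained by the traits
tra.ct <- mvpart(as.factor(fg) ~ ., data = tussock$trait)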

You get a decision tree with threshold values for the discriminating 
qualitative or quantitative traits.
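Here is a minimal sketch of the same two-step idea that does not need mvpart. It assumes only packages that normally ship with R: cluster::daisy() for the Gower dissimilarity (swapped in for FD::gowdis()) and rpart for the tree. The trait table is simulated and purely hypothetical (three made-up traits for 60 made-up species). The last step shows why the thresholds matter: species that were never clustered can be assigned to a group with predict(), using only their trait values.

```r
library(cluster)  # daisy(): Gower dissimilarity (stand-in for FD::gowdis)
library(rpart)    # classification trees; ships with R

# Hypothetical simulated trait table: 60 made-up species, 3 numeric traits
set.seed(1)
traits <- data.frame(
  height = c(rnorm(20, 0.3, 0.05), rnorm(20, 1.5, 0.2), rnorm(20, 4, 0.5)),
  sla    = c(rnorm(20, 25, 2),     rnorm(20, 15, 2),    rnorm(20, 8, 1)),
  seed   = c(rnorm(20, 0.5, 0.1),  rnorm(20, 2, 0.3),   rnorm(20, 5, 0.5))
)
rownames(traits) <- paste0("sp", seq_len(nrow(traits)))

# Step 1: Gower dissimilarity + Ward clustering, cut into 3 groups
gd <- daisy(traits, metric = "gower")
gc <- hclust(gd, method = "ward.D2")
fg <- factor(cutree(gc, k = 3))

# Step 2: classification tree with cluster membership as the response
dat  <- data.frame(fg = fg, traits)
tree <- rpart(fg ~ ., data = dat, method = "class",
              control = rpart.control(minsplit = 5))
print(tree)  # split lines read like "trait< value" or "trait>=value"

# Step 3: assign hypothetical new species using only their trait values
new_sp <- data.frame(height = c(0.25, 3.8), sla = c(26, 7.5), seed = c(0.4, 5.2))
pred <- predict(tree, newdata = new_sp, type = "class")
print(pred)
```

The species are clustered once; the tree then turns the cluster memberships into explicit cut-offs on the traits, which is what makes the classification usable for species outside the original table.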

Unfortunately, and for obscure reasons, mvpart has not been available from CRAN 
for years. However, you can still install this great package from the archived 
sources mirrored on GitHub:
devtools::install_github("cran/mvpart", force = TRUE)
You can also use the more limited rpart::rpart() function instead.
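One thing mvpart() adds over a bare rpart() call is the automatic choice of tree size by cross-validation; as far as I know its default is the 1-SE rule (xv = "1se"). A rough rpart-only equivalent is sketched below, assuming the 1-SE rule is what you want. The traits are simulated and hypothetical, and Euclidean distance on standardized traits stands in for the Gower dissimilarity used above.

```r
library(rpart)

# Hypothetical simulated traits for 75 species (illustration only)
set.seed(2)
traits <- data.frame(
  height = c(rnorm(25, 0.3, 0.1), rnorm(25, 1.5, 0.3), rnorm(25, 4, 0.8)),
  sla    = c(rnorm(25, 25, 3),    rnorm(25, 15, 3),    rnorm(25, 8, 2))
)

# Groups from Ward clustering (Euclidean on standardized traits, not Gower)
fg  <- factor(cutree(hclust(dist(scale(traits)), method = "ward.D2"), k = 3))
dat <- data.frame(fg = fg, traits)

# Grow a large tree; rpart stores 10-fold cross-validated error in cptable
full <- rpart(fg ~ ., data = dat, method = "class",
              control = rpart.control(cp = 0, minsplit = 5, xval = 10))
cpt <- full$cptable

# 1-SE rule: smallest tree whose CV error is within one SE of the minimum
best   <- which.min(cpt[, "xerror"])
limit  <- cpt[best, "xerror"] + cpt[best, "xstd"]
pick   <- which(cpt[, "xerror"] <= limit)[1]
pruned <- prune(full, cp = cpt[pick, "CP"])
print(pruned)
```

Rows of cptable are ordered from the smallest to the largest tree, so the first row under the limit gives the most parsimonious tree that is still statistically indistinguishable from the best one.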

Best,

François



----- Original Message -----
From: "Alexandre F. Souza" <alexsouza.cb.ufrn...@gmail.com>
To: "r-sig-ecology" <r-sig-ecology@r-project.org>
Sent: Monday, 1 November 2021 20:37:20
Subject: [R-sig-eco] On The Choice of a Classification Approach

Hello,

I am trying to find a method to cluster species based on their quantitative
traits and, at the same time, obtain a threshold value for each node in the
decision tree. My difficulty is that my dependent variable is the list of
species names, with each species appearing as a single row with no repetition.
All explanatory variables are quantitative. As far as I understand,
classification trees need a dependent variable with repeated levels, as in
the iris dataset, in which each species appears several times. All the
examples of classification trees I found use such a dependent variable,
but I do not have one except for the species names. MRT uses a species-by-
location matrix as the dependent variable, and traditional hierarchical cluster
analyses do cluster species but do not use quantitative data for that aim,
nor do they produce threshold values. I can run a non-hierarchical cluster
analysis such as k-means, but it does not generate threshold values either.
My concern is that, without threshold values, any classification I produce
will be restricted to the studied species and will not be applicable to other
species that may be found in the studied region, which would be a strong
limitation on the use of such a classification.

Thank you very much in advance for any ideas.

Regards,

Alexandre


-- 
Dr. Alexandre F. Souza
Associate Professor
Head, Department of Ecology
Universidade Federal do Rio Grande do Norte
CB, Department of Ecology
Campus Universitário - Lagoa Nova
59072-970 - Natal, RN - Brasil
lattes: lattes.cnpq.br/7844758818522706
http://www.esferacientifica.com.br
https://www.youtube.com/user/alexfadigas
http://www.docente.ufrn.br/alexsouza
orcid.org/0000-0001-7468-3631


_______________________________________________
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology