Re: [R-sig-phylo] WG: Re: Re: MCMCglmm for categorical data with more than 2 levels - prior specification?

Jarrod Hadfield Thu, 08 Aug 2013 06:38:47 -0700

Hi,

The IJ prior (or posterior) implies that the variance in each probability is constant and that probabilities of different outcomes are mutually independent, conditional on the constraint that they must sum to one. To see why, let V be the covariance matrix of log-contrasts (either at the phylogenetic or residual level) then:


V[1,1] = VAR(LP_2-LP_1)
       = VAR(LP_2)+VAR(LP_1)-2COV(LP_2,LP_1)

and

V[1,2] = COV(LP_2-LP_1, LP_3-LP_1)
       = COV(LP_2, LP_3)-COV(LP_2,LP_1)-COV(LP_3,LP_1)+VAR(LP_1)

where LP_i = log(Pr(nominal[i])) from previous emails, and LP_1 is the log probability for the baseline category. If we would like to have a prior where VAR(LP_i) is constant (VAR(LP)) for all i, and COV(LP_i,LP_j) = 0 for all i not equal to j, then:


V[1,1] = 2*VAR(LP)

and

V[1,2] = VAR(LP)

so a sensible prior is proportional to an I+J matrix where I is the identity matrix and J a unit matrix (a matrix of all ones).
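
The derivation above can be checked numerically. The following is a minimal sketch in base R, assuming 4 categories (so 3 log-contrasts against the baseline) and a hypothetical common variance s2 for each log probability; the names k, s2, Sigma and D are illustrative and are not part of MCMCglmm.

```r
# Assumption: k = 4 categories, independent log probabilities with common variance s2.
k  <- 4
s2 <- 1.5                      # hypothetical VAR(LP)
Sigma <- s2 * diag(k)          # COV(LP_i, LP_j) = 0 for i != j

# Contrast matrix: row i maps (LP_1, ..., LP_k) to LP_{i+1} - LP_1.
D <- cbind(-1, diag(k - 1))

# Covariance matrix of the log-contrasts.
V <- D %*% Sigma %*% t(D)

# The same matrix written as VAR(LP) * (I + J).
IJ <- diag(k - 1) + matrix(1, k - 1, k - 1)
print(V)
print(all.equal(V, s2 * IJ))
```

The diagonal of V is 2*VAR(LP) and every off-diagonal element is VAR(LP), which is the I+J pattern described above.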

My guess is that the mixing/convergence problems are due to numerical issues if this is the same dataset that your other post (comp.gee not converging) refers to. Check out the latent variables as I have already suggested - do their absolute values exceed 25? If so you need to find out why (very high phylogenetic heritability, extreme category problems for the fixed effects etc.)


Cheers,

Jarrod

Quoting Sereina Graber <sereina.gra...@gmx.ch> on Thu, 8 Aug 2013 15:02:20 +0200 (CEST):




Hi Jarrod, hi all,

I am still struggling with that MCMCglmm function:

First, in the course notes I have read that, for some reason which
should become clearer later on in the text, the IJ matrix is used for the
prior of the residuals and the random effects in the multinomial
model. Why this matrix in particular?

Second, probably a very stupid question: if the model did not
converge, you have to run it longer, so increase the number of
iterations, right? However, when I increase the number of
iterations (from 12,000 to 100,000), there are still trends
in the time series plots. What can I do then? What else might be the
problem here? Also related to that, in the last email you wrote
that there might be a problem due to my small effective sample sizes;
however, it also seems that those do not increase with an increasing
number of iterations.

I would be very thankful for any help.

Cheers,
Sereina

SENT: Friday, 02 August 2013 at 14:54
FROM: "Jarrod Hadfield" <j.hadfi...@ed.ac.uk>
TO: "Sereina Graber" <sereina.gra...@gmx.ch>
CC: r-sig-phylo@r-project.org
SUBJECT: Re: Aw: Re: [R-sig-phylo] WG: Re: Re: MCMCglmm for
categorical data with more than 2 levels - prior specification?
Hi,

They are the effect of the covariates on the probability of being in
the categories 2,3,4 versus category 1. Note that your effective
sample sizes are very small which means mixing is a problem and you
need to run it for longer. Numerical/Inferential problems can also
occur if the joint distribution of the predictors and the outcomes
results in `extreme categorical problems'. You then might want to
follow Gelman's advice on priors for fixed effects. See the function
gelman.prior.
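
To make the "categories 2,3,4 versus category 1" interpretation concrete, here is a minimal base R sketch of the multinomial-logit inverse link. It converts a vector of latent-scale log-contrasts into category probabilities conditional on those latent values. The function name latent_to_prob and the numbers are hypothetical, and the sketch deliberately ignores the residual and phylogenetic variances, so it does not give marginal predicted probabilities from a fitted MCMCglmm model.

```r
# Convert latent-scale log-contrasts (categories 2..k versus baseline 1)
# into category probabilities, conditional on those latent values.
latent_to_prob <- function(l) {
  # l[j] = log(Pr(category j+1)) - log(Pr(category 1))
  e <- exp(c(0, l))            # the baseline contrast is zero by construction
  e / sum(e)
}

l <- c(0.5, -1.0, 0.2)         # hypothetical values, not estimates from the thread
p <- latent_to_prob(l)
print(round(p, 3))

# Round trip: the log-contrasts are recovered from the probabilities.
print(all.equal(log(p[-1]) - log(p[1]), l))
```

A positive coefficient for trait j therefore raises the probability of that category relative to the baseline, not necessarily relative to the other non-baseline categories.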

Cheers,

Jarrod

Quoting Sereina Graber <sereina.gra...@gmx.ch> on Fri, 2 Aug 2013
14:48:44 +0200 (CEST):

Great, thanks a lot! Then I have one last question: how should I
interpret the following output of the location effects? The first
three lines, I guess, represent the intercepts of categories 2 to 4, but
how should I interpret the rest, which involve the two covariates lnBrain
(continuous) and binary (binary)?

With the following model...

myMCMC.phyl <- MCMCglmm(nominal ~ trait-1 + trait:lnBrain + trait:binary,
                        random = ~us(trait):species, rcov = ~us(trait):units,
                        pedigree = bird.tree,
                        data = bird.data, family = "categorical",
                        prior = Prior.phyl6)

...I got the following location effects:

 Location effects: nominal ~ trait - 1 + trait:lnBrain + trait:binary

                       post.mean  l-95% CI  u-95% CI eff.samp  pMCMC
traitnominal.2           5.59844   4.49565   6.90609    9.676 <0.001 ***
traitnominal.3          -4.12383  -5.58366  -2.65665    7.794 <0.001 ***
traitnominal.4          -1.70863  -2.86831  -0.38491   12.770  0.006 **
traitnominal.2:lnBrain  -0.08244  -2.10570   1.57463    3.228  0.880
traitnominal.3:lnBrain  -1.29069  -3.36790   1.08456    3.790  0.376
traitnominal.4:lnBrain  -0.53814  -2.76265   1.67985    3.859  0.762
traitnominal.2:binary2  -9.59263 -16.21345  -3.88906    3.403 <0.001 ***
traitnominal.3:binary2  13.37745   9.26769  19.93064    4.247 <0.001 ***
traitnominal.4:binary2   8.61585   3.82747  15.54171    3.446 <0.001 ***
---

Best & thank you so much for your help!

SENT: Friday, 02 August 2013 at 13:55
FROM: "Jarrod Hadfield" <j.hadfi...@ed.ac.uk>
TO: "sereina.graber" <sereina.gra...@gmx.ch>
CC: r-sig-phylo@r-project.org
SUBJECT: Re: [R-sig-phylo] WG: Re: Aw: Re: MCMCglmm for categorical
data with more than 2 levels - prior specification?
Hi,

1.) There is no difference between the arguments pedigree=bird.tree
and "ginverse = list(species=Ainv)" where Ainv is defined by
"Ainv=inverseA(bird.tree)$Ainv". The latter argument was added after
the first version in order to provide more flexibility (for example if
multiple phylogenies are to be fitted).

2.) and 4.) You have also fixed the phylogenetic covariance matrix in
the prior (by using fix=1). You should remove the fix=1 if you want to
actually estimate it rather than fix it. You should also add trait as
a main effect to allow the traits to have different intercepts. It's
hard to know what to recommend regarding prior information, but you
could start perhaps with V=IJ and nu low (see CourseNotes).
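
As an illustration only, a prior along these lines for the 4-category response might be built as in the sketch below. The 1/k scaling of I+J and the value of nu are assumptions made for the example (the advice above only says "V=IJ and nu low"), and the name prior_IJ is hypothetical; the CourseNotes should be consulted for actual recommendations.

```r
k  <- 4                                   # number of categories in `nominal`

# I + J for the k - 1 log-contrasts; the 1/k scaling is an assumption.
IJ <- (1 / k) * (diag(k - 1) + matrix(1, k - 1, k - 1))

prior_IJ <- list(
  # Residual covariance cannot be estimated for categorical data, so it stays fixed.
  R = list(V = IJ, fix = 1),
  # No fix = 1 here, so the phylogenetic covariance matrix is estimated.
  # nu = k - 1 is an illustrative low value, not a recommendation.
  G = list(G1 = list(V = IJ, nu = k - 1))
)
str(prior_IJ)
```

Such a list would be passed as the prior argument in place of Prior.phyl3 in the call quoted further down the thread.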

3.) The number of traits is one less than the number of categories, so
for a binary response there is only one trait. This is because if you
know the probability of being in one state (Pr(A)), you already know
the probability of being in the other state (1-Pr(A)). The covariance
matrix specification in the prior should therefore be 1x1 not 2x2. You
should also drop trait from the models and just have ~species, ~units
etc.
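
The dimension rule in point 3 can be sketched in base R as follows. The helper n_latent_traits and the small factors standing in for the binary and nominal columns of bird.data are hypothetical; the nu value for G is simply carried over from the priors quoted in this thread and is not a recommendation.

```r
# Number of latent traits for a categorical response = number of categories - 1.
n_latent_traits <- function(y) nlevels(factor(y)) - 1

# Hypothetical responses standing in for `binary` and `nominal` in bird.data.
binary  <- factor(c("a", "b", "a", "b"))
nominal <- factor(c("w", "x", "y", "z"))

d_bin <- n_latent_traits(binary)     # prior V should be d_bin x d_bin
d_nom <- n_latent_traits(nominal)    # prior V should be d_nom x d_nom
print(c(binary = d_bin, nominal = d_nom))

# Matching prior for the binary case: 1x1 matrices, with ~species and ~units
# (no trait terms) in the random and rcov formulae.
prior_bin <- list(R = list(V = diag(d_bin), fix = 1),
                  G = list(G1 = list(V = diag(d_bin), nu = 0.002)))
str(prior_bin)
```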

Cheers,

Jarrod

Quoting "sereina.graber" <sereina.gra...@gmx.ch> on Fri, 02 Aug 2013
12:54:00 +0200:




-------- Original Message --------
Subject: Re: Aw: Re: [R-sig-phylo] MCMCglmm for categorical data
with more than 2 levels - prior specification?
From: Jarrod Hadfield <j.hadfi...@ed.ac.uk>
To: Sereina Graber <sereina.gra...@gmx.ch>
CC:



Quoting Sereina Graber <sereina.gra...@gmx.ch> on Fri, 2 Aug 2013
12:12:41 +0200 (CEST):



Hi Jarrod,

Thanks a lot for those helpful tips. However, now I ran into some
further problems:

1.) What is the difference between the arguments pedigree=bird.tree
and "ginverse = list(species=Ainv)" where Ainv is defined by
"Ainv=inverseA(bird.tree)$Ainv"?

2.) Now the nominal model seems to work with the following code. Did I
implement it correctly?

Prior.phyl3 = list(R = list(V = diag(3), fix=1),
                   G = list(G1 = list(V = diag(3), fix=1)))
myMCMC.phyl <- MCMCglmm(nominal ~ trait:lnBrain,
                        random = ~us(trait):species, rcov = ~us(trait):units,
                        ginverse = list(species=Ainv),
                        data = bird.data, family = "categorical",
                        prior = Prior.phyl3)

However, if I try the same code with an adjusted prior specification
for a binary response variable

Prior.phyl31 = list(R = list(V = diag(2), fix=1),
                    G = list(G1 = list(V = diag(2), fix=1)))
myMCMC.phyl <- MCMCglmm(binary ~ trait:lnBrain,
                        random = ~us(trait):species, rcov = ~us(trait):units,
                        pedigree = bird.tree,
                        data = bird.data, family = "categorical",
                        prior = Prior.phyl31)

Then I always get the error:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels

What is the problem here?

3.) And about prior specifications: I found out that depending on what
prior you specify, the results are completely different. So for the
example above, I used an identity matrix for the variance-covariance
matrix, so the covariances are equal to zero. In my case now, having 4
levels in my nominal response variable, should I put in some covariances
or not, and if yes, how do I choose them? What kind of prior would you
take?
And for G: G is the covariance structure of the random effects, but
aren't the (co)variances of the random effects actually specified by
the vcv of the tree? So why is that a 3x3 matrix? For the
(co)variance structure of B: if I have two covariates, and I simulated
them to be correlated with r=0.5, what covariances in B can I use?
Would 0.5 be appropriate assuming SD=1?

Thanks a lot & Best,
Sereina

SENT: Friday, 02 August 2013 at 11:03
FROM: "Jarrod Hadfield" <j.hadfi...@ed.ac.uk>
TO: "Sereina Graber" <sereina.gra...@gmx.ch>
CC: r-sig-phylo@r-project.org
SUBJECT: Re: [R-sig-phylo] MCMCglmm for categorical data with more
than 2 levels - prior specification?
Hi Sereina,

You should not get that error message when you do not specify a prior
- but if you do, can you let me know.

For the prior you specified you get the error message because
us(trait):units is specifying a 3x3 covariance matrix, and yet your
prior, R=list(V=1,nu=0.002), is specifying a 1x1 matrix. V should be a
3x3 matrix, but note that the residual covariance matrix with
categorical data cannot be estimated from the data. For this reason
most people would not fit a weak prior (i.e. nu=0.002) but fit a very
strong prior (fixing it at some value a priori using fix=1 in the
prior specification). The choice of residual covariance matrix is
arbitrary - the results can always be expressed in a way that does not
depend on the choice of residual covariance matrix (See the
CourseNotes).

The fixed and random effect formulae are also a bit odd. This type of
model is essentially equivalent to a trivariate model where the three
traits (on the latent scale) are the differences on a log scale
between the probability of being in categories 2,3 or 4 compared to
category 1:

log(Pr(nominal[2]))-log(Pr(nominal[1]))
log(Pr(nominal[3]))-log(Pr(nominal[1]))
log(Pr(nominal[4]))-log(Pr(nominal[1]))

where nominal[1] is called the baseline category. You can change the
baseline category by reordering the factor levels in nominal.
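
Reordering the factor levels to change the baseline can be done with base R's relevel(), as in this minimal sketch. The small factor and the probabilities are hypothetical stand-ins for the nominal response, used only to show which contrasts result.

```r
# Hypothetical 4-level response standing in for `nominal`.
nominal <- factor(c("A", "B", "C", "D", "B", "A"))
print(levels(nominal)[1])                  # current baseline category

# Make "C" the baseline by moving it to the first factor level.
nominal2 <- relevel(nominal, ref = "C")
print(levels(nominal2))

# Log-contrasts against the new baseline for hypothetical probabilities,
# given in the order of levels(nominal2).
p <- c(C = 0.4, A = 0.3, B = 0.2, D = 0.1)
contrasts_vs_baseline <- log(p[-1]) - log(p[1])
print(contrasts_vs_baseline)
```

After releveling, the three latent traits are the contrasts of the remaining categories against the new first level.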

By having ~animal in the random formula you are assuming that a) the
phylogenetic variance for each contrast is equal and b) that the
correlation between the phylogenetic effects is one. This may make
sense in some models and with some types of base-line category, but
not generally I think. us(trait):animal allows the phylogenetic
variances to differ over the traits and for each pair of traits to
have a unique phylogenetic correlation. There are also other variance
structures that can be fitted that are somewhere between these two
extremes.

For the same reason you probably want to have trait specific
intercepts and trait specific regression coefficients for the
covariates. This can be achieved by having:

~ trait-1+trait:lnBrain + trait:binary.x

I remove the global intercept (-1) because I find the model output
easier to interpret, but it is not necessary.

You need to be careful with this type of model on this type of data,
because generally there is not much information from data on extant
taxa about the parameters of comparative analyses, particularly when
the data are categorical. This means that priors, even ones that
appear innocuous such as flat priors, may have a substantial influence
on the posterior. In addition, numerical problems may exist in
categorical models when the posterior distribution for the
phylogenetic intra-class correlations has support in regions close to
one (either because the true value is close to one, or because the
posterior distribution is very wide because the data are not very
informative). This can be checked by saving the latent variables
(pL=TRUE in the call to MCMCglmm) and making sure that the absolute
values of the latent variables do not regularly exceed 20. Lastly,
mixing may be (very) poor so you may have to wait an inordinate amount
of time to completely sample the posterior.
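
The latent-variable check described above could be sketched as follows. The helper check_latent is hypothetical; it expects an iterations-by-latent-variables matrix such as as.matrix(model$Liab) from a model fitted with pL=TRUE. Because no fitted model is available here, the example runs on simulated numbers that only stand in for real output.

```r
# Summarise how often stored latent variables are extreme in absolute value.
# `liab`: iterations x latent-variables matrix, e.g. as.matrix(model$Liab).
check_latent <- function(liab, threshold = 20) {
  liab <- as.matrix(liab)
  c(max_abs        = max(abs(liab)),
    prop_exceeding = mean(abs(liab) > threshold))
}

# Simulated stand-in for stored latent variables (not real model output).
set.seed(1)
fake_liab <- matrix(rnorm(1000 * 5, mean = 0, sd = 8), nrow = 1000)
print(check_latent(fake_liab))
```

A proportion well above zero would suggest the numerical problems mentioned above; the threshold argument can be raised or lowered as needed.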

Cheers,

Jarrod

Doing this is fine: you can always rescale the model parameters post
analysis

Quoting Sereina Graber <sereina.gra...@gmx.ch> on Fri, 2 Aug 2013
10:17:58 +0200 (CEST):

Hi all,

I am doing a phylogenetic analysis using the MCMCglmm package with the
phylogenetic tree as the pedigree (Hadfield & Nakagawa 2010). I have a
categorical response variable ("nominal") with more than 2 categories
(4 categories in total) and a continuous and a binary explanatory
variable. My model:

mod<-MCMCglmm(nominal ~ lnBrain + binary.x, random= ~animal,
family="categorical",rcov=~us(trait):units, prior=prior4,
data=bird.data, pedigree=bird.tree)

Now the following error message always appears if I do not
specify any priors, thus using the default:

Error in priorformat(if (NOpriorG) { :
V is the wrong dimension for some prior$G/prior$R elements

However, I then tried different priors which didn't work, because I
would have the wrong dimensions in the prior... Can anyone help me
with how I have to specify the priors correctly? What dimensions do I
need? My priors:

var2<-cbind(c(1e+08,0.1,0.1), c(0.1,1e+08,0.1),c(0.1,0.1,1e+08))
prior4<-list(R=list(V=1,nu=0.002), G=list(G1=list(V=1,
nu=0.002)),B=list(mu=rep(0,3), V=var2))

These priors lead to the error:

Error in priorformat(if (NOpriorG) { :
V is the wrong dimension for some prior$G/prior$R elements

For any help I am very grateful.

Best

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.





_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
