Re: [R-sig-phylo] VCV based on admixture?

2024-03-19 Thread Cecile Ane
Hi Jon,
There are now comparative methods that can use admixture graphs (or 
phylogenetic networks), beyond trees.
See this tutorial:
https://cecileane.github.io/networkPCM-workshop/topic-overview.html
or the software documentation section on comparative methods:
http://crsl4.github.io/PhyloNetworks.jl/latest/man/trait_tree/

Here are papers that used these methods (among others):
Pavón-Vázquez,  Brennan & Keogh (2021)https://doi.org/10.1093/sysbio/syaa102
Rurik et al. (2024) https://doi.org/10./tpj.16630

Cécile

---
Cécile Ané, Professor (she/elle)
Departments of Statistics and of Botany
University of Wisconsin - Madison
www.stat.wisc.edu/~ane/
statistical consulting: https://stat.wisc.edu/statistical-consulting/

On Mar 19,2024, at 12:36 pm, Jon 
mailto:jonsmit...@gmail.com>> wrote:

Hello all,

I've become involved in a small student project looking at spatial
variation in a very, very young group. There's discordance between the
mtDNA and nDNA trees, and good estimates of admixture based on population
genetic studies, between the different lineages (both mt and n lineages).

I'm modeling size differences between lineages and attempting to detect
clear shifts in both average body size and sexual dimorphism. I don't see
any reason *per se* that a covariance structure based on admixture rates
(rather than strictly from the tree) wouldn't work, and it would *seem*to
allow for a not-strictly-bifurcating population history I suspect is the
case here...but I can't find anyone who has done something similar!

Am I blind and there's some obvious paper I'm missing where covariance
between lineages was determined from something like admixture rates, or is
it a silly idea and I should just use the tree as an "average" covariance
and call it a day?

Thanks!

--Jon

--
Jonathan S. Mitchell 

Assistant Professor
Coe College

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - 
R-sig-phylo@r-project.org
https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-sig-phylo__;!!Mak6IKo!Lh7vQMPyaY0U_spRXkqStOHgek-hllxtAbir7CFnuuBjmcUg1qJoaff7tl9KG7xGoVyXLA11qXBSLTqW8BBq$
Searchable archive at 
https://urldefense.com/v3/__http://www.mail-archive.com/r-sig-phylo@r-project.org/__;!!Mak6IKo!Lh7vQMPyaY0U_spRXkqStOHgek-hllxtAbir7CFnuuBjmcUg1qJoaff7tl9KG7xGoVyXLA11qXBSLcKTMl0m$


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] VCV based on admixture?

2024-03-19 Thread Cecile Ane
Hi Jon,
There are now comparative methods that can use admixture graphs (or 
phylogenetic networks), beyond trees.
See this tutorial:
https://cecileane.github.io/networkPCM-workshop/topic-overview.html
or the software documentation section on comparative methods:
http://crsl4.github.io/PhyloNetworks.jl/latest/man/trait_tree/

Here are papers that used these methods (among others):
Pavón-Vázquez,  Brennan & Keogh (2021)https://doi.org/10.1093/sysbio/syaa102
Rurik et al. (2024) https://doi.org/10./tpj.16630

Cécile

---
Cécile Ané, Professor (she/elle)
Departments of Statistics and of Botany
University of Wisconsin - Madison
www.stat.wisc.edu/~ane/
statistical consulting: https://stat.wisc.edu/statistical-consulting/

On Mar 19,2024, at 12:36 pm, Jon 
mailto:jonsmit...@gmail.com>> wrote:

Hello all,

I've become involved in a small student project looking at spatial
variation in a very, very young group. There's discordance between the
mtDNA and nDNA trees, and good estimates of admixture based on population
genetic studies, between the different lineages (both mt and n lineages).

I'm modeling size differences between lineages and attempting to detect
clear shifts in both average body size and sexual dimorphism. I don't see
any reason *per se* that a covariance structure based on admixture rates
(rather than strictly from the tree) wouldn't work, and it would *seem*to
allow for a not-strictly-bifurcating population history I suspect is the
case here...but I can't find anyone who has done something similar!

Am I blind and there's some obvious paper I'm missing where covariance
between lineages was determined from something like admixture rates, or is
it a silly idea and I should just use the tree as an "average" covariance
and call it a day?

Thanks!

--Jon

--
Jonathan S. Mitchell 

Assistant Professor
Coe College

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - 
R-sig-phylo@r-project.org
https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-sig-phylo__;!!Mak6IKo!Lh7vQMPyaY0U_spRXkqStOHgek-hllxtAbir7CFnuuBjmcUg1qJoaff7tl9KG7xGoVyXLA11qXBSLTqW8BBq$
Searchable archive at 
https://urldefense.com/v3/__http://www.mail-archive.com/r-sig-phylo@r-project.org/__;!!Mak6IKo!Lh7vQMPyaY0U_spRXkqStOHgek-hllxtAbir7CFnuuBjmcUg1qJoaff7tl9KG7xGoVyXLA11qXBSLcKTMl0m$


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] VCV based on admixture?

2024-03-19 Thread Cecile Ane
Hi Jon,
There are now comparative methods that can use admixture graphs (or 
phylogenetic networks), beyond trees.
See this tutorial:
https://cecileane.github.io/networkPCM-workshop/topic-overview.html
or the software documentation section on comparative methods:
http://crsl4.github.io/PhyloNetworks.jl/latest/man/trait_tree/

Here are papers that used these methods (among others):
Pavón-Vázquez,  Brennan & Keogh (2021)https://doi.org/10.1093/sysbio/syaa102
Rurik et al. (2024) https://doi.org/10./tpj.16630

Cécile

---
Cécile Ané, Professor (she/elle)
Departments of Statistics and of Botany
University of Wisconsin - Madison
www.stat.wisc.edu/~ane/
statistical consulting: https://stat.wisc.edu/statistical-consulting/

On Mar 19,2024, at 12:36 pm, Jon 
mailto:jonsmit...@gmail.com>> wrote:

Hello all,

I've become involved in a small student project looking at spatial
variation in a very, very young group. There's discordance between the
mtDNA and nDNA trees, and good estimates of admixture based on population
genetic studies, between the different lineages (both mt and n lineages).

I'm modeling size differences between lineages and attempting to detect
clear shifts in both average body size and sexual dimorphism. I don't see
any reason *per se* that a covariance structure based on admixture rates
(rather than strictly from the tree) wouldn't work, and it would *seem*to
allow for a not-strictly-bifurcating population history I suspect is the
case here...but I can't find anyone who has done something similar!

Am I blind and there's some obvious paper I'm missing where covariance
between lineages was determined from something like admixture rates, or is
it a silly idea and I should just use the tree as an "average" covariance
and call it a day?

Thanks!

--Jon

--
Jonathan S. Mitchell 

Assistant Professor
Coe College

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - 
R-sig-phylo@r-project.org
https://urldefense.com/v3/__https://stat.ethz.ch/mailman/listinfo/r-sig-phylo__;!!Mak6IKo!Lh7vQMPyaY0U_spRXkqStOHgek-hllxtAbir7CFnuuBjmcUg1qJoaff7tl9KG7xGoVyXLA11qXBSLTqW8BBq$
Searchable archive at 
https://urldefense.com/v3/__http://www.mail-archive.com/r-sig-phylo@r-project.org/__;!!Mak6IKo!Lh7vQMPyaY0U_spRXkqStOHgek-hllxtAbir7CFnuuBjmcUg1qJoaff7tl9KG7xGoVyXLA11qXBSLcKTMl0m$


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Model Selection and PGLS

2021-07-08 Thread Cecile Ane
Alternatively, you could use Rphylopars 
(https://github.com/ericgoolsby/Rphylopars), so you don’t need to code 
predictions by hand.
I included a reproducible example below, using a single variable. You could use 
multiple variables to take advantage of the correlation between variables for 
better predictions.
Cecile.

input tree and data (20 taxa):

tree <- 
read.tree(text="(((t2:1.07,(((t19:0.06,t20:0.06):0.25,t10:0.31):0.71,t3:1.01):0.05):0.03,(t17:0.13,t18:0.13):0.45,(t13:0.26,t14:0.26):0.32):0.24,((t15:0.19,t16:0.19):0.12,t9:0.31):0.52):0.17,(t6:0.53,t7:0.53):0.46):0.09,t1:1.08):0.02):0.29,(((t8:0.53,(t11:0.29,t12:0.29):0.24):0.01,t5:0.54):0.45,t4:0.99):0.4);")
dat <- data.frame(
  species = 
c("t2","t19","t20","t10","t3","t17","t18","t13","t14","t15","t16","t9","t6","t7","t1","t8","t11","t12","t5","t4"),
  trait = 
c(1.93,0.598,1.559,-0.775,-2.445,2.109,NA,3.465,2.161,-1.474,0.274,-0.026,4.419,2.9,2.795,-2.005,-3.425,-1.711,-1.686,NA)
)
row.names(dat) = dat$species

Now analysis with a Brownian motion (other models are available):

library(Rphylopars)
p_BM <- phylopars(dat, tree)
p_BM$anc_recon[1:20,] # predicted species means
cbind(dat[,1:2], prediction=p_BM$anc_recon[1:20,]) # to compare predictions 
with observations
p_BM$anc_var[1:20,] # variances of each prediction: 0 when observed
p_BM$anc_recon[1:20,] - sqrt(p_BM$anc_var[1:20,])*1.96 # lower end of 95% CI
p_BM$anc_recon[1:20,] + sqrt(p_BM$anc_var[1:20,])*1.96 # upper end of 95% CI


output showing predictions:

species  trait prediction
t2   t2  1.930  1.930
t19 t19  0.598  0.598
...
t18 t18 NA  2.0566536
t13 t13  3.465  3.465
...

On Jul 7, 2021, at 6:33 PM, 
neovenatori...@gmail.com wrote:

Dear All,

Just sending a message to see if my last email went through. The code for
the PGLS still doesn't seem to work: it's giving results contradictory to
what would be expected based on the data (e.g., mu should be extremely
negative for Monotremata, but it's very variable among the three species
and many other species exhibit higher mus in that direction, adding the
correction factor actually makes the prediction worse despite high Pagel's
lambda). I attached a reprex version of my code with the data that
reproduced the result, but I was concerned it didn't go through because
when I've tried to attach files to the r-sig-phylo list previously they are
often unable to go through. I checked the r-sig-phylo archive to see if it
successfully went through, but it isn't listed there, which is one reason I
am concerned my message did not get sent properly.

Just emailing to ask, sorry if this is a bother.

Sincerely,
Russell


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Model Selection and PGLS

2021-07-01 Thread Cecile Ane


On Jun 30, 2021, at 11:20 PM, 
neovenatori...@gmail.com wrote:

This is something that seems really, really concerning because if there is a 
method of using phylogenetic covariance to adjust the position of new data 
points it seems like a lot of workers don’t know these methods exist, to the 
point that even published papers overlook it.

I am surprised that researchers don’t know about these methods: they are 
basically just like ancestral state reconstruction, but for new tips in the 
tree. Everyone knows about “ancestral state reconstruction”. But if few people 
recognize that the same methodology can be applied to new taxa, then it’s a 
problem indeed.

Ancestral state “reconstruction” could be better called ancestral state 
“estimation” or “prediction”. In a likelihood framework, we can also give 
prediction intervals, for internal (ancestral) nodes in the tree as well as new 
tips. Not all software handle new taxa, because it requires extending the tree, 
or accepting a tree in which not all taxa have data. Taxa with missing data can 
then be predicted, like any other node.

Cecile.


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Model Selection and PGLS

2021-06-29 Thread Cecile Ane
Hi Russel,

What you see is the large uncertainty in “ancestral” states, which is part of 
the intercept here. The linear relationship that you overlaid on top of your 
data is the relationship predicted at the root of the tree (as if such a thing 
existed!). There is a lot of uncertainty about the intercept, but much less 
uncertainty in the slope. It looks like the slope is not affected by the 
inclusion or exclusion of monotremes. (for one possible reference on the 
greater precision in the slope versus the intercept, there’s this: 
http://dx.doi.org/10.1214/13-AOS1105 for the BM).

My second cent is that the phylogenetic predictions should be stable. The 
uncertainty in the intercept —and the large effect of including monotremes on 
the intercept— should not affect predictions, so long as you know for which 
species you want to make a prediction. If you want to make prediction for a 
species in a small clade “far” from monotremes, say, then the prediction is 
probably quite stable, even if you include monotremes: this is because the 
phylogenetic prediction should use the phylogenetic relationships for the 
species to be predicted. A prediction that uses the linear relationship at the 
root and ignores the placement of the species would be the worst-case scenario: 
for a mammal species with a completely unknown placement within mammals.

There’s probably a number of software that do phylogenetic prediction. I know 
of Rphylopars and PhyloNetworks.

my 2 cents…
Cecile

---
Cécile Ané, Professor (she/her)
H. I. Romnes Faculty Fellow
Departments of Statistics and of Botany
University of Wisconsin - Madison
www.stat.wisc.edu/~ane/

CALS statistical consulting lab: https://calslab.cals.wisc.edu/stat-consulting/



On Jun 29, 2021, at 5:37 PM, 
neovenatori...@gmail.com wrote:

Dear All,

So this is the main problem I'm facing (see attached figure, which should be 
small enough to post). When I calculate the best-fit line under a Brownian 
model, this produces a best-fit line that more or less bypasses the 
distribution of the data altogether. I did some testing and found that this 
result was driven solely by the presence of Monotremata, resulting in the model 
heavily downweighting all of the phylogenetic variation within Theria in favor 
of the deep divergence between Monotremata and Theria. Excluding Monotremata 
produces a PGLS fit that's comparable enough to the OLS and OU model fit to be 
justifiable (though I can't just throw out Monotremata for the sake of throwing 
it out).

I am planning to do a more theoretical investigation into the effect of 
Monotremata on the PGLS fit in a future study, but right now what I am trying 
to do is perform a study in which I use this data to construct a regression 
model that can be used to predict new data. Which is why I am trying to use AIC 
to potentially justify going with OLS or an OU model over a Brownian model. 
From a practical perspective the Brownian model is almost unusable because it 
produces systematically biased estimates with high error rates when applied to 
new data (error rate is roughly double that of both the OLS and OU model). This 
is especially the case because the data must be back-transformed into an 
arithmetic scale to be useable, and thus a seemingly minor difference in 
regression models results in a massive difference in predicted values. However, 
I need some objective test to show that OLS fits the data better than the 
Brownian model, hence why I was going with AIC. Overall, OLS does seem to 
outperform the Brownian model on average, but the variation in AIC is so high 
it is hard to interpret this.

This is kind of why I am leery of assuming a null Brownian model. A Brownian 
model, if anything, does not seem to accurately model the relationship between 
variables.

This is why I am having trouble figuring out how to do model selection. Just 
going with accuracy statistics like percent error or standard error of the 
estimate OLS is better from a purely practical sense (it doesn't work for the 
monotreme taxa, but it turns out that estimate error in the monotremes is only 
decreased by 10% in a Brownian model when it overestimates mass by nearly 75%, 
so the improvement really isn't worth it and using this for monotremes isn't 
recommended in the first place), but the reviewers are expressing skepticism 
over the fact that the Brownian model produces less useable results. And I'm 
not entirely sure the best way to go about the PGLS if using one of the 
birth-death trees isn't ideal, perhaps what Dr. Upham says about using the DNA 
tree might work better.

Ironically, an OU model might be argued to better fit the data, despite the 
concerns that Dr. Bapst mentioned. Looking at the distribution of signal even 
though signal is not random, it is more accurately described as most taxa 
hewing to a stable equilibrium with rapid, high magnitude shifts at certain 

Re: [R-sig-phylo] phylogenetic correlation analysis

2021-06-13 Thread Cecile Ane
Rphylopars is an option too (https://github.com/ericgoolsby/Rphylopars)



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Odd behavior of phylogenetic regressions

2019-09-06 Thread Cecile Ane
Hi Søren,
Your second best model (M4) has a fairly good AIC (~3 units above the best 
model) and has both of your predictors. If one of your study’s goal is to 
quantify the effect of both of your predictors, then it’s a good reason to use 
your model M4. Having them both together is important if they are correlated.
An alternative to using Wald tests is a parametric bootstrap (“boot” option in 
phyloglm). It takes longer, but it might be more reliable.
Cécile

On Sep 6, 2019, at 4:09 AM, Søren Faurby 
mailto:soren.fau...@bios.au.dk>> wrote:

Thanks for the comment Cecile

You are completely correct that Climbing and Rainforest is correlated which 
explains the "odd" behavior when both are added to the model.
It may just be a case of Walds test giving a much worse approximation than the 
AIC, but I was just surprised in this case because the delta AIC for the model 
with rainforest instead of climbing as a predictor was ~18 which I thought 
would be big enough that the two methodologies ought to produce the same result.
It is simple for me to just move forward selecting the best model based on LRT, 
but then I am not sure how to best estimate the strength of any given predictor 
in the resulting best model if I cannot rely on the  Walds test or the 
estimated effect sizes from the summary call in this case?

Cheers, Søren




-
Søren Faurby
http://antonelli-lab.net<http://antonelli-lab.net/>
________
Fra: Cecile Ane mailto:cecile@wisc.edu>>
Sendt: 3. september 2019 01:04
Til: r-sig-phylo@r-project.org<mailto:r-sig-phylo@r-project.org> 
mailto:r-sig-phylo@r-project.org>>; Søren Faurby 
mailto:soren.fau...@bios.au.dk>>
Emne: Re: [R-sig-phylo] Odd behavior of phylogenetic regressions

I suspect that “Climbing” and “Rainforest” are correlated with each other, so 
one of both lose significance when both are included in the model.

Also, the Wald-type tests in logistic regression are not exact, and they 
typically give a different p-value than a likelihood ratio test. Both Wald 
tests and LRT are approximate, but the Wald-type p-values are typically worse 
approximations.

Cécile

On Sep 2, 2019, at 11:28 AM, Søren Faurby 
mailto:soren.fau...@bios.au.dk>> wrote:

Dear list,

I am currently working on a project looking at the predictors of spinyness in 
palms

Spinyness is coded binary. I have a number of potential predictors and want to 
identify the best set of predictors. I am using "phyloglm" from the "phylolm" 
package.

I am currently using the method=�logistic_MPLE� method which as I describe 
below gives odd results. method=�logistic_IG10� fails to converge.

AIC and the estimated significance of predictors from Walds test seems oddly 
inconsistent, one predictor produced a much lower AIC but is found to be a 
non-significant predictor. Another is found to be highly significant but 
produces a much higher AIC. I am aware that Walds test are only expected to be 
approximately correct but the degree of difference surprise me. What is the 
cause of this inconsistent behavior and more importantly, how can I under these 
circumstances see what is the best predictor(s) in my data?


My problem is illustrated by four potential  models
M1= phyloglm(Spines~1, dat=DATA, phy=TREE, method="logistic_MPLE")

M2= phyloglm(Spines~ Climbing, dat=DATA, phy=TREE, method="logistic_MPLE")

M3= phyloglm(Spines~ Rainforest, dat=DATA, phy=TREE, method="logistic_MPLE")

M4= phyloglm(Spines~ Climbing + Rainforest, dat=DATA, phy=TREE, 
method="logistic_MPLE")

AIC(M1)= 568.17; AIC(M2)= 546.94; AIC(M3)= 528.68; AIC(M4)= 531.98

Based on AIC it is thus clear that the best model is M3, but I get surprised 
when I compare the results of the models

summary(M2)
 Estimate  StdErr z.value   p.value
(Intercept)  0.90537 1.43428  0.6312 0.5278839
Climbing 0.39713 0.11884  3.3417 0.0008327 ***
Note: Wald-type p-values for coefficients, conditional on alpha=0.004616504

summary(M3)
  Estimate  StdErr z.value   p.value
(Intercept)   1.065304  1.493856  0.7131 0.47577
Rainforest   -0.172741  0.090467 -1.9094 0.05621 .
Note: Wald-type p-values for coefficients, conditional on alpha=0.004046622

Summary(M4)
 Estimate  StdErr z.value   p.value
(Intercept)  0.66313 0.87768  0.7556 0.449916
Climbing 0.29763 0.10801  2.7555 0.005861 **
Rainforest   0.27519 0.18470  1.4899 0.136257
Note: Wald-type p-values for coefficients, conditional on alpha= 0.004563691


Hope someone can help me, S�ren


-
S�ren Faurby
http://antonelli-lab.net<http://antonelli-lab.net/>

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - 
R-sig-phylo@r-project.org<mailto:R-sig-phylo@r-project.org>
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] model averaging using brownie.lite

2019-09-05 Thread Cecile Ane
Thanks Brian, great review, as always!

To add one bit: this paper looks at the effective sample size that should be 
used for BIC, in the standard BM model (univariate).
https://projecteuclid.org/download/pdfview_1/euclid.aoas/1223908053

It gives a formula that depends on the tree shape and branch lengths. Like what 
Brian said: a pectinate tree would generally have a smaller effective sample 
size than a symmetric tree, for the same number of taxa. The general formula 
uses matrix, but the result should be something less than the number of taxa, 
and greater than: # branches stemming from the root * ratio (total tree height 
/ length of shortest branch stemming from the root). The effective sample size 
should also be at least (total tree length / total tree height) for an 
ultrametric tree. See end of section 2 for an example of BIC penalties using 
effectives sample sizes.

The bottom line is the same as what Brian said:
- it’s generally unknown what “sample size” should be used
- in cases when we know, the answer is complicated (it depends on the tree and 
on the model).

With multivariate data (multiple sites), the effective sample size for 
univariate data (like number of taxa or something smaller) should be multiplied 
by the number of sites, if the model assumes that sites are independent and 
share the same evolutionary parameters. (consistent with what Brian said).

Cécile

On Sep 5, 2019, at 9:55 AM, Brian O'Meara 
mailto:omeara.br...@gmail.com>> wrote:

Sample size is a weird thing in this area for AICc. For comparing DNA
models in something like ModelTest, number of sites is used, but for OU/BM
models, we typically use number of taxa. It's not resolved what's best.

Posada and Buckley (2004, https://doi.org/10.1080/10635150490522304) have a
discussion on this:

Both in the AICc and the BIC descriptions above, the total number of
characters was used as an estimate of sample size. However, effective
sample sizes in phylogenetic studies are poorly understood, and depend on
the quantity of interest (Churchill et al., 1992; Goldman, 1998; Morozov et
al., 2000). Characters in an alignment will often not be independent, so
using the total number of characters as a surrogate for sample size (Minin
et al., 2003; Posada and Crandall, 2001b) could be an overestimate. Using
only the number of variable sites as an estimate of sample size is a more
conservative approach, but could be an underestimate (note that all sites
are used when estimating base frequencies or the proportion of invariable
sites). Indeed, sample size also depends on the number of taxa.
Importantly, sample size can have an effect on the outcome of model
selection with the AICc. In our example above, if we were to use the number
of variable characters (301 sites) as the sample size, instead of the total
number of characters (1927 sites), the best AICc model would not change,
but the second and third AICc models would exchange their rankings.
Furthermore, because the LRT, the AIC, and the BIC strategies rely on large
sample asymptotics, it is also important to decide when a sample should be
considered small. Although the AICc was derived under Gaussian assumptions,
Burnham et al. (1994) found that this second order expression performed
well in product multinomial models for open population capture-recapture.
Burnham and Anderson (2003, p. 66) suggest using this correction when the
sample size is small compared to the number of adjustable parameters, n/K <
40. Alternatively, and because AICc converges to the AIC with increasing
n/K ratios, one could always use the AICc (D. Anderson, personal
communications). Phylogenetic characters are mostly discrete, and the
unconstrained model in phylogenetics is multinomial (Goldman, 1993). One
may think of an alignment of nucleotide characters as a large and sparse
contingency table with 4^T bins, where T is the number of taxa. For large
sample asymptotics to hold in a contingency table every cell should
contain, in general, more than 5 observations (see Agresti, 1990, p. 49,
244–250), which gives a rule of thumb of n/4^T > 5. Clearly, more research
is needed on sample size in phylogenetics.

Beaulieu et al. (2018, https://doi.org/10.1093/molbev/msy222; note my COI
as I'm an author on this) did some simulations on a codon model testing
different ways of counting sample size (number of sites, number of taxa,
number of sites * number of taxa, etc.) and found that number of cells in
the matrix (number of sites * number of taxa) seemed to work best to
approximate Kullback-Liebler distance. For univariate models like that used
in brownie.lite, number of cells is equal to number of taxa (since there's
only one column):

We note our use of AICc, as calculated in Burnham and Anderson (2002, p.
66) and as opposed to the standard AIC, in the above model comparisons. At
the outset of our study it was unclear what the appropriate sample size n
is when comparing models of sequence evolution. Building upon the 

Re: [R-sig-phylo] Odd behavior of phylogenetic regressions

2019-09-02 Thread Cecile Ane
I suspect that “Climbing” and “Rainforest” are correlated with each other, so 
one of both lose significance when both are included in the model.

Also, the Wald-type tests in logistic regression are not exact, and they 
typically give a different p-value than a likelihood ratio test. Both Wald 
tests and LRT are approximate, but the Wald-type p-values are typically worse 
approximations.

Cécile

On Sep 2, 2019, at 11:28 AM, Søren Faurby 
mailto:soren.fau...@bios.au.dk>> wrote:

Dear list,

I am currently working on a project looking at the predictors of spinyness in 
palms

Spinyness is coded binary. I have a number of potential predictors and want to 
identify the best set of predictors. I am using "phyloglm" from the "phylolm" 
package.

I am currently using the method=�logistic_MPLE� method which as I describe 
below gives odd results. method=�logistic_IG10� fails to converge.

AIC and the estimated significance of predictors from Walds test seems oddly 
inconsistent, one predictor produced a much lower AIC but is found to be a 
non-significant predictor. Another is found to be highly significant but 
produces a much higher AIC. I am aware that Walds test are only expected to be 
approximately correct but the degree of difference surprise me. What is the 
cause of this inconsistent behavior and more importantly, how can I under these 
circumstances see what is the best predictor(s) in my data?


My problem is illustrated by four potential  models
M1= phyloglm(Spines~1, dat=DATA, phy=TREE, method="logistic_MPLE")

M2= phyloglm(Spines~ Climbing, dat=DATA, phy=TREE, method="logistic_MPLE")

M3= phyloglm(Spines~ Rainforest, dat=DATA, phy=TREE, method="logistic_MPLE")

M4= phyloglm(Spines~ Climbing + Rainforest, dat=DATA, phy=TREE, 
method="logistic_MPLE")

AIC(M1)= 568.17; AIC(M2)= 546.94; AIC(M3)= 528.68; AIC(M4)= 531.98

Based on AIC it is thus clear that the best model is M3, but I get surprised 
when I compare the results of the models

summary(M2)
 Estimate  StdErr z.value   p.value
(Intercept)  0.90537 1.43428  0.6312 0.5278839
Climbing 0.39713 0.11884  3.3417 0.0008327 ***
Note: Wald-type p-values for coefficients, conditional on alpha=0.004616504

summary(M3)
  Estimate  StdErr z.value   p.value
(Intercept)   1.065304  1.493856  0.7131 0.47577
Rainforest   -0.172741  0.090467 -1.9094 0.05621 .
Note: Wald-type p-values for coefficients, conditional on alpha=0.004046622

Summary(M4)
 Estimate  StdErr z.value   p.value
(Intercept)  0.66313 0.87768  0.7556 0.449916
Climbing 0.29763 0.10801  2.7555 0.005861 **
Rainforest   0.27519 0.18470  1.4899 0.136257
Note: Wald-type p-values for coefficients, conditional on alpha= 0.004563691


Hope someone can help me, S�ren


-
S�ren Faurby
http://antonelli-lab.net

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] using color function in phylomorphospace

2019-03-09 Thread Cecile Ane
colors can be numeric. for example:
plot(1:36, 1:36, col=1:36)
Cécile

On Mar 8, 2019, at 8:42 PM, Liam Revell 
mailto:liam.rev...@umb.edu>> wrote:

Dear Amanda.

I'm a little confused by your code. If you want to use
control=list(col.edge) you need to set col.edge to a vector of colors,
but in your example col.edge is a numeric vector from 1 to 36.


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] estimate ancestral state with OUwie models

2018-06-14 Thread Cecile Ane
It is possible to transform the branch lengths of a non-ultrametric tree to 
make the OU process equivalent to a BM process (on the transformed branch 
lengths), but there’s an extra correction needed at the tips (i.e. need to 
transform the data too via a diagonal matrix). It makes likelihood calculations 
a lot faster, but I have not seen this used for ancestral state estimation. 
This approach is described in Ho & Ané (2014) 
https://doi.org/10.1093/sysbio/syu005.
It’s implemented in the R packages 
phylolm, 
Rphylopars, or 
l1ou.
Cécile

On Jun 14, 2018, at 8:25 AM, David Bapst 
mailto:dwba...@tamu.edu>> wrote:

Simone, others,

Perhaps I misunderstand, but I believe Slater (2014) shows that there is no
correct branch length transformation for a non-ultrametric tree under an OU
process. The correct transformation that Slater applies is Hansen's, yes,
but to the VCV matrix. Slater does point out there is one VCV-free approach
that does work for non-ultrametric trees, and that is the pruning-algorithm
that was at the time implemented in diversitree.

Some quotes from Slater (2014):

"The time-dependent decay in covariance that accrues after divergence under
OU results in the rather unique phenomenon that non-ultrametric trees
cannot be rescaled according to an OU process as ultrametric trees can."

"This is a necessary error, because a treelike structure cannot be
represented with different shared path lengths for pairs of descendents
that branch from the same ancestor. However, using Hansen’s (1997) formula
to directly transform V results in the correct pattern (Fig. 1d); Vt4,t1 is
higher than Vt4,t2 because the phylogenetic distance between t1 and t4, and
thus the time available for covariance decay, is lower as a result of t1
going extinct before the present. It is additionally worth noting that
branch length scaling results in underestimated variances for fossil taxa
(upper left diagonal of 1b, c, and d); branch length transformation reduces
the variance of t1 compared with that of the other species, while direct
matrix scaling increases the variance of the fossil taxon to a similar
value as in the extant species."

"For example, although problems with branch length rescaling render
Freckleton’s (2012) approach inappropriate, it is still possible to use
VCV-free approaches to rapidly compute likelihoods with non-ultrametric
data. The pruning algorithm used by FitzJohn (2012) in his DIVERSITREE
package, which is based on Felsenstein (1973, 1981), can be used to fit OU
models to non-ultrametric trees because it works on per branch basis,
rather than by scaling all branch lengths simultaneously."

Again, this is (I think) common knowledge that OU
branch-length-transformations cannot be safely applied to non-ultrametric
trees, in our post-Slater-2014 world. However, following Lucas's comment, I
wanted to make sure the record is correct for those who stumble on this
thread later. As Slater (2014) points out, us paleontologists are ever
increasingly users of phylogenetic comparative methods, and so as
methods-developers, we should always take into account the possibility that
a user's dataset may include fossil tips.

Cheers,
-Dave

PS: I've cc'd Graham so he can rebuke me if I'm way off-the-mark here.





On Tue, Jun 12, 2018 at 12:33 AM, Simone Blomberg 
mailto:s.blombe...@uq.edu.au>>
wrote:

This sounded wrong to me, as the OU process should be agnostic to the
dataset: There are no restrictions inherent in the OU process that apply
particularly to phylogenetic data, whether the tree is ultrametric or not.
I re-read Slater 2014 and it is clear that you can use branch length
transformations with OU, so long as you use the (correct) Hansen formula,
not the Butler-king formula, which does indeed require an ultrametric tree.

Cheers,

Simone.

Sent from my iPhone

On 12 Jun 2018, at 8:01 am, David Bapst 
mailto:dwba...@tamu.edu>> wrote:

Just to follow off what Lucas said, but please note you cannot rescale
branches of a phylogeny using an OU model when the tree is
non-ultrametric (such as when it contains extinct, fossil taxa as
tips). Slater (2014, MEE) discusses this more in a brief correction to
Slater (2013).

I don't know if anyone in this conversation has a non-ultrametric
tree, but I wanted to make that clear for anyone who stumbles on this
thread n the future using a google search.
-Dave



On Sun, Jun 10, 2018 at 12:25 PM, Lucas Jardim <
lucas.ljard...@gmail.com> wrote:
Hi Bruno,

You can transform the branches of your phylogeny using the estimated
parameters of OU models. Then, if those models describe the observed
data
adequatly, the transformed tree should model the observed data as a
Brownian motion model. So you can use an ancestral state reconstruction
based on Brownian motion model. However, I do not know if that is the
best

Re: [R-sig-phylo] Rescaling a cophenetic matrix based on the Early Burst Model?

2017-12-15 Thread Cecile Ane
Not in phylolm, because it’s faster to avoid manipulating this large matrix, if 
possible (and it is indeed possible for many purposes). If you can rephrase 
your calculations to use branching times (distance from a node to its 
descendant tips) or using the nodes’ distance to the root, then you could use 
functions like pruningwise.branching.times and pruningwise.distFromRoot in 
phylolm. Other folks could chime in, for tools in other packages.
Cécile

On Dec 15, 2017, at 3:55 PM, Max Farrell 
<maxwellfarr...@gmail.com<mailto:maxwellfarr...@gmail.com>> wrote:

Hi Cecile,

Thanks for the input - it looks like this function does perform the tree 
rescaling, and in about 1/4 of the time, so this could help speed up the code!

However, the function doesn't seem to return a covariance matrix. Is there a 
way to get the cophenetic matrix with the out put of this function? Otherwise 
I'm stuck using cophenetic again...

Max

Max

On Fri, Dec 15, 2017 at 4:36 PM, Cecile Ane 
<cecile@wisc.edu<mailto:cecile@wisc.edu>> wrote:
In the package phylolm, the function "transf.branch.lengths” might do what you 
need. It has an option model=“EB” for early burst.
Cécile

> On Dec 15, 2017, at 3:27 PM, Max Farrell 
> <maxwellfarr...@gmail.com<mailto:maxwellfarr...@gmail.com>> wrote:
>
> I have been using the rescale function from geiger for a link prediction
> model I recently helped develop (https://arxiv.org/abs/1707.08354). I'm now
> running this model on a much larger dataset and part of the code is
> computing cophenetic(rescale(phy)) with 'EB' rescaling many many times.
>
> We're finding that calling cophenetic() is the rate limiting step, and if
> we can avoid this function we expect to speed up our code by up to 4 times.
> This would be very useful as our simulation has been running for over 50
> days now...
>
> I was wondering if there is a way to do EB transformations on a
> phylogenetic distance matrix directly so as to avoid using the cophenetic
> function?
>
> Any help or insights would be greatly appreciated!
>
>   [[alternative HTML version deleted]]
>
> ___
> R-sig-phylo mailing list - 
> R-sig-phylo@r-project.org<mailto:R-sig-phylo@r-project.org>
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - 
R-sig-phylo@r-project.org<mailto:R-sig-phylo@r-project.org>
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Rescaling a cophenetic matrix based on the Early Burst Model?

2017-12-15 Thread Cecile Ane
In the package phylolm, the function "transf.branch.lengths” might do what you 
need. It has an option model=“EB” for early burst.
Cécile

> On Dec 15, 2017, at 3:27 PM, Max Farrell  wrote:
> 
> I have been using the rescale function from geiger for a link prediction
> model I recently helped develop (https://arxiv.org/abs/1707.08354). I'm now
> running this model on a much larger dataset and part of the code is
> computing cophenetic(rescale(phy)) with 'EB' rescaling many many times.
> 
> We're finding that calling cophenetic() is the rate limiting step, and if
> we can avoid this function we expect to speed up our code by up to 4 times.
> This would be very useful as our simulation has been running for over 50
> days now...
> 
> I was wondering if there is a way to do EB transformations on a
> phylogenetic distance matrix directly so as to avoid using the cophenetic
> function?
> 
> Any help or insights would be greatly appreciated!
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] predict using the PGLS model output

2017-10-09 Thread Cecile Ane
Rphylopars can make phylogenetic predictions: it’s on CRAN and on github 
https://github.com/ericgoolsby/Rphylopars
Cécile

On Oct 9, 2017, at 9:59 AM, Clara Grilo 
> wrote:

Hi everyone,

I run a phylogenetic generalised least squares model (PGLS) to analyze the
effect of demographic parameters on the risk of extinction for 70 species.

Now I would like to predict the extinction risk based on this model for the
remaining unstudied species (~4000 species).
I have to use the  phylogenetic tree for the 4000 species and applied the
previous model.

Are you aware of a code that make predictions based on the model and
another tree of species?

Thanks you in advance.
Best regards,
Clara

--
-
Clara Grilo

Centro Brasileiro de Estudos de Ecologia de Estradas (CBEE)
Departamento de Biologia
Universidade Federal de Lavras
37200 000 Minas Gerais
Brasil

Website: http://lattes.cnpq.br/2789017774255561
 http://cbee.ufla.br

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - 
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Fwd: Fw: Help with tree manipulation

2017-06-24 Thread Cecile Ane
Hi Monica,
Perhaps it's because your bootstrap values are stored as node labels, instead 
of edge labels. This problem is well documented here:
https://doi.org/10.1093/molbev/msx055
But perhaps it's a different issue in your case.
Cécile

On Jun 24, 2017, at 9:18 AM, Monica Carlsen 
> wrote:

Hi everyone,

I have a set of gene trees (resulting from RAxML) analyses that I would like to 
manipulate before inputting them into a coalescence analysis.

I have unrooted trees with bootstrap support as node labels. I would need to 
collapse branches with less than 50%BS, and root these trees.

I'm using a combination of packages (ape, phytools, phangorn) to achieve these 
results. When visualizing the trees (using ggtree), I noticed that the order of 
the steps makes a difference in the final output tree. See attached script, 
dataset and tree comparison PDF.

For example (in PDF), if I collapse branches first, then root the tree, I get 
tree # 3. But, if I do the rooting first, then collapsing branches, I get tree 
# 4, as well as warnings that "number of items to replace is not a multiple of 
replacement length" in some steps. Tree #1 is the original tree, unrooted, 
uncollapsed, straight from RAxML. Tree # 2 is the collapsed, but unrooted tree.

The differences in topology are so striking that they influence (a lot!) the 
final topology and support of the coalescence analysis.

I've read before that different software plots/uses the support-as-node-label 
in different ways, and that could create erroneous representations. But, I 
can't seem to understand which one is the correct one in my case.

Any thoughts on what could be happening here? what would be the order in which 
the steps should be performed? or any recommendations for different packages to 
use that could solve the issue?

Monica.

Monica Carlsen, PhD - Plant systematics, ecology and evolution
Peter Buck Postdoctoral Fellow | Smithsonian National Museum of Natural History 
| Washington, D.C.
Research Associate | Missouri Botanical Garden | St. Louis, MO




___
R-sig-phylo mailing list - 
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] How to set up the minimum limit of the imputated value in phylopars(), (RPhylopars package), in R

2017-06-18 Thread Cecile Ane
Hi Judit,
Negative predictions are surprising indeed if your observed data are all 
positive values, and if you have a single trait. Do you have multiple traits? 
If they are correlated, information on one trait can push imputation of another 
in the negative range. 

In any case, one option would be to log-transform the trait that has to take 
positive values. If it makes the trait more normally distributed, it’s 
generally a good thing to do. Then you can impute the missing values for the 
log-transformed trait, and finally transform back (take the exponential) if you 
need to.
Cécile

> On Jun 16, 2017, at 3:39 PM, Judit Mokos  wrote:
> 
> Dear R-sig-phlyo members,
> 
> I am using phylopars() in Rphylopars package in R to generate the missing
> values in a large dataset about animal body traits (eg body size). (
> https://www.rdocumentation.org/packages/Rphylopars/versions/0.2.9/topics/phylopars)
> This method is called imputation and what it does is to phylogenetically
> estimate this missing datas.
> 
> However the output of the imputation contains some negative values which
> make no sense because all the estimated trait have to be bigger than zero.
> I wonder how I can fix this issue or how to set up a minimum limit for the
> estimated values.
> 
> I'm not new in R but new in Rphylopars so maybe that question is pretty
> naive but I couldn't find the solution.
> 
> Thanks for your answer in advanced,
> 
> Judit
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] simulating gene trees on a given species tree and theta

2016-12-18 Thread Cecile Ane
to install phybase from the CRAN archive:

install.packages("https://cran.r-project.org/src/contrib/Archive/phybase/phybase_1.1.tar.gz;,
 repos = NULL, type="source")

Cécile

On Dec 18, 2016, at 4:26 AM, Liam J. Revell 
> wrote:

Hi John.

I believe this used to be possible using the package phybase. Unfortunately, 
this package was not being maintained & was removed from CRAN - however 
archived package versions can still be obtained (so it may be possible to 
install from source).

All the best, Liam

Liam J. Revell, Associate Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 12/18/2016 7:22 AM, john d wrote:
Dear colleagues,

I'm looking for an R package that will simulate gene trees onto a given
species tree and a value of theta. I tried TreeSim, but it would use a
provided tree as input.

I know that DendroPy does that, but I'd prefer using R.

Thanks!

Carlos

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - 
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


___
R-sig-phylo mailing list - 
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] get branch lengths from distances to root

2016-11-30 Thread Cecile Ane
One way would be to label the root to make it a leaf, and apply NJ.
Neighbor-joining reconstructs the original branch lengths (and topology) given 
tree-distances between the leaves.
Cécile

On Nov 30, 2016, at 10:30 AM, Juan Antonio Balbuena 
> wrote:


Hi all

This is a simple question:

I have a set of ultrametric trees with information on their topology but 
without branch lengths. On a separate file, I have the distances of each node 
and leaf to the root. Can someone instruct me on how to to compute the branch 
lengths of the tree with these two separate bits of information?

Many thanks in advance

Juan A. Balbuena

--
Dr. Juan A. Balbuena
Cavanilles Institute of Biodiversity and Evolutionary Biology
University of Valencia
http://www.uv.es/~balbuena
P.O. Box 22085
http://www.uv.es/cophylpaco
46071 Valencia, Spain
e-mail: j.a.balbu...@uv.estel. +34 963 543 658   
 fax +34 963 543 733

NOTE! For shipments by EXPRESS COURIER use the following street address:
C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.

___
R-sig-phylo mailing list - 
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Standard errors of theta estimates in OU models

2016-09-15 Thread Cecile Ane
I would say no: standard errors of theta values do not have a biological 
interpretation, in my opinion.
Going back to a sample from a single population and its mean: the standard 
deviation of the sample has a biological interpretation (e.g within-population 
variation), but the standard error does not. It has a statistical 
interpretation: about estimate uncertainty. Similarly with the OU model: the 
standard errors of theta estimates are also about uncertainty in these 
estimates. Even in the best case scenario when the model that you are using is 
correct, these standard errors would be affected by your sample size (i.e. go 
down with the number of tips in your tree), whereas a biological interpretation 
like plasticity should not be affected by the sample size.

On top of all this, a mismatch between the evolution process and the assumed OU 
model would affect the theta estimates and their standard errors. 

The new version of phylolm (on github, not on CRAN) has bootstrap-based 
standard errors, if you wanted to follow-up on Brian’s idea. It also has the 
possibility to add measurement error in the model when analyzing data
Cécile

> On Sep 15, 2016, at 9:11 PM, Brian O'Meara  wrote:
> 
> That's a really cool question. I don't know. I'm pretty sure that
> practically, the answer is no (esp when you consider all the issues -- any
> feasible OU is too simple a model to match reality, tree errors, data
> errors, etc. -- that also come into an analysis). However, I could imagine
> that in a perfect world you could look at expected standard errors created
> using parametric bootstrapping from the MLE of the parameters and see if
> the actual standard errors are different -- maybe it could reflect
> plasticity or some other factor. But I really don't know. You might look at
> some of the methods that can deal with intraspecific and interspecific
> processes to get at this sort of biological question (the Felsenstein 2008
> Contrasts revisited paper might be interesting as a start, though it's not
> OU and comes at this from a different direction), but it's fun to think
> about ways to go from what is treated as annoying noise to a meaningful
> signal.
> 
> Best,
> Brian
> 
> ___
> Brian O'Meara, http://www.brianomeara.info, especially Calendar
> , CV
> , and Feedback
> 
> 
> Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville
> Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville
> Associate Director for Postdoctoral Activities, National Institute for
> Mathematical & Biological Synthesis  (NIMBioS)
> Communication Director, Society of Systematic Biologists
> 
> On Thu, Sep 15, 2016 at 5:37 PM, Xiaojing Wei  wrote:
> 
>> Dear R-sig-phylo users,
>> 
>> I wonder if the standard errors of the theta estimates in OU models have
>> any biologically meaningful interpretations. For instance, could it give
>> some indication of plasticity in traits, or the optimal level of plasticity
>> in traits? Or does it simply reflect estimation uncertainty of the
>> phylogeny or the OU model?
>> 
>> Thanks a lot!
>> 
>> --
>> Xiaojing Wei
>> PhD student
>> Dept. of Ecology, Evolution, and Behavior
>> University of Minnesota
>> Rm. 211 Ecology Bld., 1987 Upper Buford Cir.
>> St. Paul, 55108
>> 
>>[[alternative HTML version deleted]]
>> 
>> ___
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> Searchable archive at http://www.mail-archive.com/r-
>> sig-ph...@r-project.org/
>> 
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] hierarchical model with phylogenetic dependence term

2014-12-22 Thread Cecile Ane

Hi Peter,

We have taken a similar approach in this paper:
http://onlinelibrary.wiley.com/doi/10./evo.12582/abstract
We also use a Bayesian approach, with a prior that allows us to 
integrate the ancestral state and the total variance analytically (no 
need for MCMC for those parameters). Our notation for your sigma_phy is 
lambda, like Pagel's, and we do use multiple individuals per species. 
For what it's worth.


Regarding the scaling of branch lengths, I agree with Joe that there is 
nothing particular about 1, other than providing an easier 
interpretation for the numerical value of the phylogenetic variance.

Cheers,
Cecile.


On 12/14/2014 01:03 PM, Peter Smits wrote:

Hi list,

I have a similar question to Edwin. I too am working with a hierarchical
Bayesian model, though I've implemented it in stan. I've included a
phylogenetic random effect term, which is modeled as being distributed as
multivariate normal with known covariance matrix up to a constant,
sigma_{phy}. This follows Lynch '91 Am Nat and Housworth et al. '04 Am Nat
by drawing on the similarity with the animal model from quantitative
genetics.

My question is about the scaling of the covariance matrix: is it necessary
to have the the diagonal terms satisfy x = 1 and all the off diagonals to
be 0 = x  1? I have a time scaled phylogeny from which I have my
covariance matrix, so currently all elements of the matrix are not scaled
so that the greatest distance from the root to a tip is 1. Currently, the
elements of the matrix are just the sum of the shared branch lengths. Is
this appropriate? Why or why not?

Any input would be much appreciated.

Cheers,

Peter Smits



--
Cecile Ane
Departments of Statistics and of Botany
University of Wisconsin - Madison
www.stat.wisc.edu/~ane/

CALS statistical consulting lab:
www.cals.wisc.edu/calslab/stat_consulting.php

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] hierarchical model with phylogenetic dependence term

2014-12-22 Thread Cecile Ane

Hi again Peter,

If matrix inversion and determinants are a bottleneck, there are 
multiple ways to avoid them (without assigning random effects to 
ancestral species), including Felsenstein (1973), FitzJohn (2012) and 
Freckleton (2012). The speed-up is huge if you have thousands of tips. 
This algorithm extends to a variety of models:

http://sysbio.oxfordjournals.org/cgi/reprint/syu005?ijkey=bIsHxa2dpqXCplckeytype=ref

Cecile.

On 12/22/2014 12:01 PM, Peter Smits wrote:

Cecile and Joe,

Thank you for the replies. That helped clear up a lot for me.

And thank you for the link Cecile. Estimating sigma_phy is quite a pain
because of all the matrix inversions. I've a tree of approx. 2000 tips,
so even using of a Cholesky decomposed V_{phy} doesn't buy me much time
at all.

Because the vector of means for the MVN in my work is all 0s, I've
written my own MVN sampler to try and minimize how many inversions or
determinants need to be calculated. The stan mailing list had a great
little discussion on how to speed up estimation of MVN random effects
with partially known covariance matrices
https://groups.google.com/forum/#!topic/stan-users/HcvYaDu71_Y and I
just followed that code. Now my posterior sampling at least gets off the
ground and finishes.

The stan list discussion possibly provides a useful starting template
for other people on this list trying to do similar things.

Cheers,

Peter

On Mon, Dec 22, 2014 at 11:32 AM, Cecile Ane a...@stat.wisc.edu wrote:

Hi Peter,

We have taken a similar approach in this paper:
http://onlinelibrary.wiley.__com/doi/10./evo.12582/__abstract
http://onlinelibrary.wiley.com/doi/10./evo.12582/abstract
We also use a Bayesian approach, with a prior that allows us to
integrate the ancestral state and the total variance analytically
(no need for MCMC for those parameters). Our notation for your
sigma_phy is lambda, like Pagel's, and we do use multiple
individuals per species. For what it's worth.

Regarding the scaling of branch lengths, I agree with Joe that there
is nothing particular about 1, other than providing an easier
interpretation for the numerical value of the phylogenetic variance.
Cheers,
Cecile.


On 12/14/2014 01:03 PM, Peter Smits wrote:

Hi list,

I have a similar question to Edwin. I too am working with a
hierarchical
Bayesian model, though I've implemented it in stan. I've included a
phylogenetic random effect term, which is modeled as being
distributed as
multivariate normal with known covariance matrix up to a constant,
sigma_{phy}. This follows Lynch '91 Am Nat and Housworth et al.
'04 Am Nat
by drawing on the similarity with the animal model from
quantitative
genetics.

My question is about the scaling of the covariance matrix: is it
necessary
to have the the diagonal terms satisfy x = 1 and all the off
diagonals to
be 0 = x  1? I have a time scaled phylogeny from which I have my
covariance matrix, so currently all elements of the matrix are
not scaled
so that the greatest distance from the root to a tip is 1.
Currently, the
elements of the matrix are just the sum of the shared branch
lengths. Is
this appropriate? Why or why not?

Any input would be much appreciated.

Cheers,

Peter Smits
--
Peter D Smits
Grad student
Committee on Evolutionary Biology
University of Chicago
psm...@uchicago.edu mailto:psm...@uchicago.edu
http://home.uchicago.edu/~psmits/home.html


--
Cecile Ane
Departments of Statistics and of Botany
University of Wisconsin - Madison
www.stat.wisc.edu/~ane/

CALS statistical consulting lab:
www.cals.wisc.edu/calslab/stat_consulting.php

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[R-sig-phylo] How to speed up pblm in picante

2014-03-08 Thread Cecile Ane

Hi Miguel, thanks for your correspondence, on and off the list!

It could be done with the approach in Ho  Ané (2014, Syst. Biol. 
http://sysbio.oxfordjournals.org/cgi/reprint/syu005?ijkey=bIsHxa2dpqXCplckeytype=ref 
)


Most specifically, some adjustments would be required. The large 
covariance matrix that slows things down is called W in Ives  Godfray 
(2006): W = kronecker(U,V) where U is from the host phylogeny (and its 
phylogenetic signal d_h), and V is from the parasitoid phylogeny (and 
its phylogenetic signal d_p). Our approach can be applied to both U and 
V. At several points in pblm, the determinant of W needs to be 
calculated. That's just det(U)*det(V) so we can get this fast with our 
approach. Then pblm needs things involving the inverse of W, which is 
the kronecker product of the inverse of U and the inverse of V. So these 
steps can be sped up too. If someone is interested in coding that, I 
would be happy to help or advise.

Cecile.

On 03/03/2014, Miguel Verdu miguel.ve...@uv.es wrote:

Hi all,

I am calculating the phylogenetic signal of
ecological interactions matrices with
pblm{picante] but the run time for medium sized
matrices (70 x 100) with a reasonable number of
iterations (nreps=2000) seems prohibitive. Has
anybody tried to speed up the process by parallelization or other procedures?
Thanks in advance

Miguel Verdú


***
Centro de Investigaciones sobre Desertificacion -CIDE-
(CSIC/UV/GV)
Carretera Moncada - Náquera, Km. 4,5
46113 Moncada (Valencia)
Spain
Tel +34 96 3424204
Fax +34 96 3424160
http://www.uv.es/verducam

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive athttp://www.mail-archive.com/r-sig-phylo@r-project.org/


--
Cecile Ane
Departments of Statistics and of Botany
University of Wisconsin - Madison
www.stat.wisc.edu/~ane/

CALS statistical consulting lab:
www.cals.wisc.edu/calslab/stat_consulting.php

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] compar.ou

2013-10-29 Thread Cecile Ane
FYI, we have some theory to explain why alpha has large standard errors 
and in which conditions. As Brian says, it comes with a flat likelihood 
with respect with alpha.
http://dx.doi.org/10.1214/13-AOS1105 or 
http://www.stat.wisc.edu/~ane/publis/2013HoAne_AoS.pdf


On 10/29/2013 09:46 AM, Brian O'Meara wrote:

In at least the OUwie paper we spent a lot of time doing simulations to
determine this empirically (this may have been examined in other papers,
too, though none come to mind). Alpha can be estimated, but sometimes with
scarily large standard errors (but not always). This property should hold
for related methods (it's a property of the likelihood surface for these
models). You can just plot the alpha values vs likelihood to get a sense of
this surface for your dataset (ideally, estimating the other parameters for
each fixed alpha, which is possible, though the kludge of holding the other
parameters at their MLE and just varying alpha will give you a sense of the
surface, though it's a bit non-conservative). We had code in OUwie to do a
contour plot of the likelihood surface but took it out (a small licensing
difference between akima (university license) and OUwie (GPL) led to OUwie
being pulled from CRAN by the CRAN maintainers until we resolved the
difference).

Brian


___
Brian O'Meara
Assistant Professor
Dept. of Ecology  Evolutionary Biology
U. of Tennessee, Knoxville
http://www.brianomeara.info

Students wanted: Applications due Dec. 15, annually
Postdoc collaborators wanted: Check NIMBioS' website
Calendar: http://www.brianomeara.info/calendars/omeara


On Tue, Oct 29, 2013 at 10:21 AM, sandra goutte gou...@mnhn.fr wrote:


Thank you Marguerite. Looking at OUwie and OUCH/SLOUCH, i see that alpha is
estimated along the other parameters, whereas in Hansen 1997 and other
papers it is suggested that this would lead to very large standard errors.
Is that problem resolved in these functions?

Best,
Sandra.


2013/10/26 Marguerite Butler mbutler...@gmail.com


Dear Sandra,

You might also want to look at the papers that go along with slouch,

ouch,

and ouwie. Here are some pdfs, along with some tutorials.

On my Rcompstart, the ouch stuff starts around page 165

Let me know if you need help with ouch.

Marguerite



On Oct 24, 2013, at 7:03 AM, sandra goutte gou...@mnhn.fr wrote:

Dear list,

My aim is to compare the fit of models for which *theta* is allowed to
change at different nodes (different combinations of 1 ,2 or 3 nodes). I
don't really understand the calculation of the deviance, but if i'm not
mistaken the difference between the deviances of 2 models follows a
*chi*square distribution with the difference of parameters in the
models as
degree of freedom. Now, how to compare two models with the same number of
optimum changes but at different nodes?

I am having a hard time understanding the details of the function with
regards to Hansen's (1997) paper. What is the relationship between the
deviance and the RSS from Hansen (1997)? Is it possible to calculate the
RSS from the output of compar.ou ?

I would appreciate any advice or insight on the subject!
Sandra.

--
PhD Student
Muséum National d'Histoire Naturelle
Département Systématique et Évolution
USM 601 / UMR 7205 Origine, Structure et Évolution de la Biodiversité
Reptiles  Amphibiens - Case Postale 30
25 rue Cuvier
F-75005 Paris

Tel :  +33 (0) 1 40 79 34 90
Mobile: +33 (0) 6 79 20 29 99

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/


  
  Marguerite A. Butler
  Associate Professor

  Department of Biology
University of Hawaii
2538 McCarthy Mall, Edmondson Hall 216
Honolulu HI 96822

Office: 808-956-4713
Dept: 808-956-8617
  Lab:  808-956-5867
FAX:   808-956-9812
  http://www.hawaii.edu/zoology/faculty/butler.html
http://www2.hawaii.edu/~mbutler
  http://www.hawaii.edu/zoology/



--
Cecile Ane
Departments of Statistics and of Botany
University of Wisconsin - Madison
www.stat.wisc.edu/~ane/

CALS statistical consulting lab:
www.cals.wisc.edu/calslab/stat_consulting.php

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Convert instantaneous transition rate to time.

2011-11-22 Thread Cecile Ane

Hi Simon,

One option could be to look at the expected number of substitutions over 
time t, and find the smallest time t for which you expect at least 1 
substitution. The idea here is that the first substitution is the one 
that most disrupts the signal.
Technically, if Q is your rate matrix and p is the stationary 
distribution of states associated with Q, then the average number of 
substitutions over time t is t * r where r is the average jumping rate:

r = sum over all states p(i) Q(i,i) in absolute value.
So the time to the first substitution is 1/r.

Cecile.

On 11/22/11 09:01, Simon Greenhill wrote:

Hi all,

What is the best way to convert an instantaneous transition rate (such as that 
given by geiger's `fitDiscrete` method) into a measure of stability over time?

So, I have a set of traits with a small number of states. I want to fit these 
onto a set of trees with branches proportional to substitutions. The 
instantaneous transition rate for these traits give me the relative stability, 
but I want to be able to quantify this over time e.g. what proportion of signal 
is remaining after time t? how could I represent this as a decay curve of 
signal over time?

Any ideas/hints would be much appreciated!

Simon


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Model-Selection vs. Finding Models that Fit Well

2011-01-31 Thread Cecile Ane
I don't find the white noise to be any good evolutionary scenario: it's 
nowhere continuous. It just reduces to the assumption of normal, 
independent observations at the tips. Nothing fancy, then :)

Cecile.

On 01/31/11 11:53, Luke Harmon wrote:

I agree with Dave here. White noise has two parameters, mean and variance, and - to me - 
is an interesting model to test. But I'm not sure it should be considered as a 
baseline.

One can link Brownian motion and white noise through the Ornstein-Uhlenbeck 
model - BM is OU with alpha (constraint) parameter equal to zero, and WN is OU 
with infinite alpha.

lh
On Jan 30, 2011, at 3:18 PM, David Bapst wrote:

...
--
Cecile Ane
Assistant Professor, University of Wisconsin - Madison
www.stat.wisc.edu/~ane/

CALS statistical consulting lab:
www.cals.wisc.edu/calslab/stat.html

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo