Re: [R-sig-phylo] dangerous tree

2015-03-10 Thread Ben Bolker
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

  (I tried to respond yesterday but Google thought I was sending spam
for some reason.)

  I totally agree that this is *not* high priority -- I just found it
while browsing through some other stuff and thought it should be
noted, since crashes are bugs by definition ...

  thanks
   Ben


On 15-03-10 11:13 AM, Emmanuel Paradis wrote:
 Hi,
 
 Liam is right.
 
 Some months ago, I wrote a function to help diagnose this kind of 
 problem. I've put it on GitHub:
 
 https://github.com/emmanuelparadis/checkValidPhylo
 
 However, in the present case, it won't tell you where is the
 problem in your Newick string.
 
 Best,
 
 Emmanuel
 
 Le 10/03/2015 02:45, Liam J. Revell a écrit :
 Hi Ben.
 
 Your Newick string has some issues  is possibly invalid. I
 believe there are commas missing after 0.0242997 and after
 0.475571. If the line-breaks are eliminated, phytools
 read.newick will read the string - but the result is a very
 peculiar tree. Whereas if I put commas in those two spots, I
 obtain an ultrametric tree from read.tree or read.newick.
 
 This may be beside the point if the issue is that this text
 string when read using read.tree crashes R rather than producing
 a more sensible error. This I too experienced.
 
 All the best, Liam
 
 Liam J. Revell, Assistant Professor of Biology University of
 Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ 
 email: liam.rev...@umb.edu blog: http://blog.phytools.org
 
 On 3/9/2015 9:02 PM, Ben Bolker wrote:
 I came across this in one of my example files.  Just thought I'd 
 report it:
 
 library(ape) sessionInfo() ## R Under development (unstable)
 (2015-02-11 r67792) ## Platform: i686-pc-linux-gnu (32-bit) ##
 Running under: Ubuntu precise (12.04.5 LTS) ## ... ## other
 attached packages: ## [1] ape_3.2
 
 treeText - (((8:0.102773,3:0.102773):0.0537417,((0:0.0242997 
 (7:0.0205236,2:0.0205236):0.00377606):0.0523697,4:0.0766694):0.0798455):0.475571

 
 
 ((9:0.0099204,5:0.0099204):0.202541,(1:0.00367543,6:0.00367543):0.208786):0.419624);

 
 
 tree - collapse.singles(read.tree(file=textConnection(treeText))) 
 plot(tree)
 
 *** caught segfault *** address 0x4d79c0e0, cause 'memory not
 mapped'
 
 Traceback: 1: .C(node_height, as.integer(Ntip), as.integer(Nnode), 
 as.integer(edge[, 1]), as.integer(edge[, 2]),
 as.integer(Nedge), as.double(yy)) 2: .nodeHeight(Ntip, Nnode,
 z$edge, Nedge, yy) 3: plot.phylo(tree) 4: plot(tree)
 
 
 ___ R-sig-phylo
 mailing list - R-sig-phylo@r-project.org 
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable
 archive at 
 http://www.mail-archive.com/r-sig-phylo@r-project.org/
 
 
 ___ R-sig-phylo
 mailing list - R-sig-phylo@r-project.org 
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable
 archive at 
 http://www.mail-archive.com/r-sig-phylo@r-project.org/
 
 
 
 
 
 
 ___ R-sig-phylo mailing
 list - R-sig-phylo@r-project.org 
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable
 archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)

iQEcBAEBAgAGBQJU/xjMAAoJEOCV5YRblxUH17kIAMRiAxIcMELv5C+xC2SyScfJ
OstMWkw5Z7jn1EdhhYP2dgFHxyO6HL3nL4OZlSGw2MYTW3dfC2UH2K1RCdGLkJVS
0pOm2LsnzLi+t5XwSkJ9vwotvVF7qSlKabFEhB1/8d0iwB9ff2+HurjOOccDkqh6
FQ6OGKrxdsrvYX9aT2dNwYAijUjKMJ1j0oHZhW3zchJah1L+JSbaK/vxw5X/Qc12
vZSLQr+q0GHMHVklZgO7Lvs9irz/mfoBFxwYRE4Lz2NEfyOOBat40GiBY5AjKj0j
vwapu4Q2kYvviQgpLgz80Susf42bBcGesApXXaofHFle/ZlSIcx57VXECDc8FlQ=
=THO4
-END PGP SIGNATURE-

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[R-sig-phylo] dangerous tree

2015-03-09 Thread Ben Bolker
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

  I came across this in one of my example files.  Just thought I'd
report it:

library(ape)
sessionInfo()
## R Under development (unstable) (2015-02-11 r67792)
## Platform: i686-pc-linux-gnu (32-bit)
## Running under: Ubuntu precise (12.04.5 LTS)
## ...
## other attached packages:
## [1] ape_3.2

treeText - (((8:0.102773,3:0.102773):0.0537417,((0:0.0242997
(7:0.0205236,2:0.0205236):0.00377606):0.0523697,4:0.0766694):0.0798455):0.475571
((9:0.0099204,5:0.0099204):0.202541,(1:0.00367543,6:0.00367543):0.208786):0.419624);
tree - collapse.singles(read.tree(file=textConnection(treeText)))
plot(tree)

 *** caught segfault ***
address 0x4d79c0e0, cause 'memory not mapped'

Traceback:
 1: .C(node_height, as.integer(Ntip), as.integer(Nnode),
as.integer(edge[, 1]), as.integer(edge[, 2]), as.integer(Nedge),
as.double(yy))
 2: .nodeHeight(Ntip, Nnode, z$edge, Nedge, yy)
 3: plot.phylo(tree)
 4: plot(tree)

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)

iQEcBAEBAgAGBQJU/kKhAAoJEOCV5YRblxUHZ0oIAINCaS7b6BK3LNnb36rGJqsM
8dtdWvXuVAN88h2EF7EnRbXm9dLQ3tyd/Ss7+mpATInohJ5vI7l42QecQYRzFw5P
1hqCGxuVpTwStXxEemlbvOQUf6iEaPYxBMFHkWXszOP2yesRt3k4qsij4YVhHKVi
HTqnR+2YTg0Rjm+gHk4kidU2mcMZl4mPwSWJamGnmve3Qw58mpB2tSzqhnJYS2XH
DEbrV1FEsZPGr7NnS01FrykDjB4gnRx+qiIul7Ym6N0ONc/q9Mt3Fewwi2K4MpyS
D5uaJXfUUKIr8MQq64sBGAjTw96ja4s20aqEl2yAf0tH4nGgMH+xa1n/8f8Cuo0=
=/Ft9
-END PGP SIGNATURE-

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] phytools - evaluating significance of pgls.Ives

2015-03-01 Thread Ben Bolker
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 15-03-01 11:40 PM, Simon Blomberg wrote:
 Am I missing something? The OP only has 8 species in the data set.
 I wouldn't put much store in fancy PCM modelling based on such a
 small data set. And 3 individuals per species is not enough for a
 good estimate of the within-species variance.
 
 Simon.

  Agree wholeheartedly with the first point -- but for the second,
isn't 24 rather than 8 the relevant number for estimating
within-species variance (since presumably we are assuming the same
variance within every species, thus we can effectively pool
within-species variation \across species for this purpose) ?

 
 Simon Blomberg, BSc (Hons), PhD, MAppStat, AStat. Senior Lecturer
 and Consultant Statistician School of Biological Sciences The
 University of Queensland St. Lucia Queensland 4072 Australia
 
 T: +61 7 3365 2506 email: S.Blomberg1_at_uq.edu.au 
 http://www.evolutionarystatistics.org
 
 Policies:
 
 1.  I will NOT analyse your data for you. 2.  Your deadline is
 your problem
 
 Basically, I'm not interested in doing research and I never have 
 been. I'm interested in understanding, which is quite a different 
 thing. - David Blackwell
 
  From: R-sig-phylo 
 [r-sig-phylo-boun...@r-project.org] on behalf of Anthony R Ives 
 [ari...@wisc.edu] Sent: Monday, March 02, 2015 2:14 PM To: Andrea 
 Berardi Cc: r-sig-phylo@r-project.org Subject: Re: [R-sig-phylo] 
 phytools - evaluating significance of pgls.Ives
 
 Andrea,
 
 I second Liam’s recommendation to use a LRT.
 
 For measurement error, the latest code I have in matlab is 
 MERegPHYSIGv2.m, which does both measurement error and an OU or 
 Pagel-lambda transform (see Johnson, M. T. J., A. R. Ives, J.
 Ahern, and J. P. Salminen. 2014. Macroevolution of plant defenses
 against herbivores in the evening primroses. New Phytologist
 203:267-279). Measurement-error models are always going to have
 difficulties at parameter boundaries; for example, if the assumed
 measurement error is large, it can exceed the observed variation in
 the data, which of course causes problems (statistical and
 logical).
 
 In MERegPHYSIGv2.m, I did a round or two of simulated annealing 
 first, before polishing the results with a Nelder-Mead optimizer.
 It seems like you could do the same with Liam’s code pretty easily
 by changing the method of optimization (using edit()). Before
 doing this, thought, I would take a careful look at your data and
 your estimates of measurement error. An easy diagnostic is to start
 with 10% of your estimated measurement standard errors and then
 increase slowly to 100%. When I have done this, I’ve been able to
 see problems when parameter values go awry. It is not a fail-safe
 diagnostic in any way, but it can help.
 
 Cheers, Tony
 
 
 
 Anthony Ragnar Ives Department of Zoology UW-Madison Madison, WI 
 53706 608-262-1519
 
 On Mar 1, 2015, at 9:42 PM, Liam J. Revell liam.rev...@umb.edu 
 wrote:
 
 Hi Andrea.
 
 This is not presently implemented, but since this is a
 likelihood method it would be straightforward to constrain to a
 slope of zero and then do a LR test. This would be probably be
 the easiest way to test a hypothesis about the regression.
 
 That being said, as noted in the function documentation, some 
 problems have been reported with the optimization algorithm for 
 this model, which is simple and thus may fail to find the ML 
 solution. Consequently, I would encourage you to look for other 
 implementations of the method so that you can be confident in
 your result. I'm not aware of one in R at this time.
 
 All the best, Liam
 
 Liam J. Revell, Assistant Professor of Biology University of 
 Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ 
 email: liam.rev...@umb.edu blog: http://blog.phytools.org
 
 On 3/1/2015 10:31 PM, Andrea Berardi wrote:
 Hi all,
 
 I'm just learning how to do PGLS analyses, and I'm looking for 
 advice on how to evaluate the significance of the regression
 fit using pgls.Ives in the phytools package. I'm using this
 function because it incorporates sampling error of species
 means, and my data has about 3 individuals per species, with 8
 species. My goal is to test whether a flower trait predicts the
 leaf trait, while controlling for shared ancestry. Here is the
 output from pgls.Ives:
 
 fit - pgls.Ives(Tree, Flower_trait, Leaf_trait) fit
 $beta [1] 96.3963098  0.1292656
 
 $sig2x [1] 22218901073
 
 $sig2y [1] 23027587
 
 $a [1] -10063.150  -1204.422
 
 $logL [1] -158.2337
 
 $convergence [1] 0
 
 $message [1] CONVERGENCE: REL_REDUCTION_OF_F = FACTR*EPSMCH
 
 I am also running pgls on species averages for the traits
 using the gls function in nlme and the corBrownian and
 corMartins functions in ape. But, we are interested in
 incorporating the within-species variation in our small
 dataset.
 
 Any suggestions would be welcome!
 
 Thanks for your help, Andrea
 
 ~~ Andrea Berardi, PhD Postdoctoral 

Re: [R-sig-phylo] fitting cuadratic model using gnls

2013-12-10 Thread Ben Bolker
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 13-12-10 12:57 PM, Theodore Garland Jr wrote:
 I don't know, but one thing to watch out for is a high correlation 
 between X and Xsquared. That gives multicollinearity and can lead 
 to numerical problems. One thing that is commonly done is: 1. 
 standardize the X variable (subtract mean and divide by standard 
 deviation) 2.  square that value and use it for the quadratic This 
 is referred to as an orthogonol polynomial.
 
 Cheers, Ted
 

  Yes.  Therefore, you would be (much) better off wit

 m3 - gls(duration ~ poly(latitude_used,2), data=data,
correlation = dataCor)

(A quadratic model is still a *linear* fit in the statistical sense.)

poly() will generate orthogonal polynomials by default, which will be
more stable and are more appropriate if you're planning to test the
significance of terms.  If it is important to you to have the
parameters interpretable as slope and curvature at the equator (i.e.
where latitude=0), you may be able to get away with

m3 - gls(duration ~ poly(latitude_used,2,raw=TRUE), data=data,
correlation = dataCor)



 Theodore Garland, Jr., Professor Department of Biology University 
 of California, Riverside Riverside, CA 92521 Office Phone:  (951) 
 827-3524 Wet Lab Phone:  (951) 827-5724 Dry Lab Phone:  (951) 
 827-4026 Home Phone:  (951) 328-0820 Skype:  theodoregarland 
 Facsimile:  (951) 827-4286 = Dept. office (not confidential)
 Email: tgarl...@ucr.edu 
 http://www.biology.ucr.edu/people/faculty/Garland.html 
 http://scholar.google.com/citations?hl=enuser=iSSbrhwJ
 
 Inquiry-based Middle School Lesson Plan: Born to Run: Artificial 
 Selection Lab 
 http://www.indiana.edu/~ensiweb/lessons/BornToRun.html
 
 Fail Lab: Episode One 
 http://testtube.com/faillab/zoochosis-episode-one-evolution 
 http://www.youtube.com/watch?v=c0msBWyTzU0
 
  From: 
 r-sig-phylo-boun...@r-project.org 
 [r-sig-phylo-boun...@r-project.org] on behalf of Jorge Sánchez 
 Gutiérrez [jorgesgutier...@unex.es] Sent: Tuesday, December 10, 
 2013 9:48 AM To: r-sig-phylo@r-project.org Subject: [R-sig-phylo] 
 fitting cuadratic model using gnls
 
 Hello everyone,
 
 
 
 I’m trying to fit nonlinear models using the function “gnls” in 
 ape. For example, I run a simple quadratic regression of moult 
 duration (duration) and latitude (latitude_used) as follows:
 
 
 
 mod - duration ~ latitude_used +I(latitude_used^2)
 
 init-list(alpha=1,beta=1)
 
 m3 - gnls(mod, data=data, start=init, correlation = dataCor)
 
 
 
 However, I get the following error message:
 
 
 
 “Error in gnls(mod, data=data, start = init, correlation =
 dataCor) :
 
 step halving factor reduced below minimum in NLS step”
 
 
 
 I’ve also tried with several correlation structures and different 
 starting values for estimation, but without success. I would be 
 very grateful if you can provide any suggestions.
 
 
 
 Thanks in advance,
 
 Jorge S. Gutiérrez
 
 
 [[alternative HTML version deleted]]
 
 
 ___ R-sig-phylo
 mailing list - R-sig-phylo@r-project.org 
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable 
 archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
 

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJSp2DoAAoJEOCV5YRblxUHn4gH/jYtxFo6aj2DlHAPw2FXufA4
CwtJmIHpWlnD5mhg5YqBgJNldn2XPxqx/qEf0mMzRgd2gAyIQiXd91WI6sqkaVhF
FmYSPzClqx14jZSlF64KE54l/8W4mp81rDyYnh+/SSebltCrX8Cv7NqYqvNDhYQC
M7i/avh0NNv/PZuDuUwbj95doQmdeUyeSlig0sxHaP14C7FxzvxODUBhcTB6yQzN
0wuK+iIaUKqSGm3jmcpY08oy0RJ34QfyH175UezRmSaE/RI+XCohhBbdtU5+WgRX
lBRYmwtCIGAJ+DNmhlnhE3u6ZFdnzhw95gYd1LGL3Sr+JZl1U+VIKfIjhlrTgLQ=
=lLum
-END PGP SIGNATURE-

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Extracting all interaction estimates from compar.gee output

2013-12-02 Thread Ben Bolker
On 13-12-02 07:00 AM, Nina Hobbhahn wrote:
 Dear fellow list users,
 
 how do I extract estimates for all possible combinations of effects in 
 significant interactions from compar.gee?
 
 To illustrate: I'm interested in flower-head size in males and females of 
 wind- or insect-pollinated species. My model looks like this:
 
 compar.gee(FHsize~poll+sex+poll*sex, phy=phy, data=data, family=gaussian)
 
 The output looks like this:
 
 Coefficients:
   Estimate S.E. t Pr(T  |t|)
 (Intercept)  244.51760 1.256536e+02  1.945966e+00   0.1285677
 pollW 54.77936 5.384660e+01  1.017323e+00   0.3703143
 sexM -29.03269 8.148837e-05 -3.562802e+05   0.000
 pollW:sexM  -134.18408 1.502553e-04 -8.930409e+05   0.000
 
 
 However, to illustrate this significant interaction I need estimates for
 
 pollW:sexM
 pollW:sexF
 pollI:sexM
 pollI:sexF
 
 How do I make compar.gee return these estimates?
 
 Many thanks,
 
 Nina

Try compar.gee(FHsize~poll:sex-1, phy=phy, data=data, family=gaussian)


 
 
 
 
 Nina Hobbhahn, PhD
 Claude Leon post-doctoral fellow
 Lab of Prof. S. D. Johnson
 School of Life Sciences
 University of KwaZulu-Natal
 Private Bag X01
 Scottsville, Pietermaritzburg, 3201
 South Africa
 
 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] phylobase::descendants issue

2013-04-27 Thread Ben Bolker
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

  Thanks for the heads-up.  I or one of the other phylobase developers
will try to take a look ...

  cheers
Ben Bolker


On 13-04-26 06:50 PM, Nick Matzke wrote:
 Hi all,
 
 Not sure if this is a bug, but I noticed weird behavior when
 loading a tree with APE and then converting to phylo4 to do
 descendants() -- not all the tips are produced.
 
 Example:
 
 
 library(ape) library(phylobase)
 
 trstr = 
 P_hawaiiensis_WaikamoiL1:0.9656850499,P_mauiensis_Eke:0.9656850499):0.7086257935,(P_fauriei2:1.230218511,P_hathewayi_1:1.230218511):0.4440923324):0.1767115552,(P_kaduana_PuuKukuiAS:1.851022399,P_mauiensis_PepeAS:1.851022399):0.0008897862802):0.3347375986,P_kaduana_HawaiiLoa:2.185759997):0.302349378,(P_greenwelliae07:1.131363255,P_greenwelliae907:1.131363255):1.35674612):1.689170274,P_mariniana_MauiNui:1.994011054,P_hawaiiensis_Makaopuhi:1.994011054):0.7328279804,P_mariniana_Oahu:2.726839034):0.2574151709,P_mariniana_Kokee2:2.984254205):0.4601084855,P_wawraeDL7428:3.444362691):0.732916959):0.7345185743,(P_grandiflora_Kal2:2.479300491,P_hobdyi_Kuia:2.479300491):2.432497733):0.2873119899,((P_hexandra_K1:2.363984189,P_hexandra_M:2.363984189):0.4630447802,P_hexandra_Oahu:2.826939991):2.372081244);

 
 
 # Load tree tr = read.tree(file=, text=trstr) tr
 
 # Convert to phylo4 tr4 = as(tr, phylo4)
 
 # Find root (node 20) rootnode = nodeId(tr4, type=root) rootnode
 
 # Only 3 tips! (there should be 19) tipnames = descendants(phy=tr4,
 node=rootnode, type=tips) tipnames
 
 
 
 
 The problem is here in descendants():
 
 if (phy@order == preorder) { edge - phy@edge } else { edge -
 reorder(phy, order = preorder)@edge } ancestor -
 as.integer(edge[, 1]) descendant - as.integer(edge[, 2]) isDes -
 .Call(descendants, node, ancestor, descendant)
 
 ...it works if I force phy@order to cladewise or unknown
 
 
 library(ape) library(phylobase)
 
 trstr = 
 P_hawaiiensis_WaikamoiL1:0.9656850499,P_mauiensis_Eke:0.9656850499):0.7086257935,(P_fauriei2:1.230218511,P_hathewayi_1:1.230218511):0.4440923324):0.1767115552,(P_kaduana_PuuKukuiAS:1.851022399,P_mauiensis_PepeAS:1.851022399):0.0008897862802):0.3347375986,P_kaduana_HawaiiLoa:2.185759997):0.302349378,(P_greenwelliae07:1.131363255,P_greenwelliae907:1.131363255):1.35674612):1.689170274,P_mariniana_MauiNui:1.994011054,P_hawaiiensis_Makaopuhi:1.994011054):0.7328279804,P_mariniana_Oahu:2.726839034):0.2574151709,P_mariniana_Kokee2:2.984254205):0.4601084855,P_wawraeDL7428:3.444362691):0.732916959):0.7345185743,(P_grandiflora_Kal2:2.479300491,P_hobdyi_Kuia:2.479300491):2.432497733):0.2873119899,((P_hexandra_K1:2.363984189,P_hexandra_M:2.363984189):0.4630447802,P_hexandra_Oahu:2.826939991):2.372081244);

 
 
 # Load tree tr = read.tree(file=, text=trstr) tr
 
 # Convert to phylo4 tr4 = as(tr, phylo4)
 
 # Looks like by default the tree is called preorder tr4@order
 
 # It works if I do this: tr4@order = cladewise
 
 # Find root (node 20) rootnode = nodeId(tr4, type=root) rootnode
 
 # Now this works tipnames = descendants(phy=tr4, node=rootnode,
 type=tips) tipnames
 
 
 
 
 # This also works:
 
 # Load tree tr = read.tree(file=, text=trstr) tr
 
 # Convert to phylo4 tr4 = as(tr, phylo4)
 
 # Looks like by default the tree is called preorder tr4@order
 
 # It works if I do this: tr4@order = unknown
 
 # Find root (node 20) rootnode = nodeId(tr4, type=root) rootnode
 
 # Now this works tipnames = descendants(phy=tr4, node=rootnode,
 type=tips) tipnames
 
 
 ...so at least there's a workaround...
 
 Cheers, Nick
 
 
 
ph
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with undefined - http://www.enigmail.net/

iQEcBAEBAgAGBQJRfD6oAAoJEOCV5YRblxUHwC0IAKFRYnR1RoYqxVf2GFQxc03R
HQAo5/N01tWBsTKZ7c4KdIk9HIhANu3F9v0trtwutUNXs/V7v3oa8QZFI9CC4Plk
2HIAox/rgwrX7yhlmKPf21KTMiiM5zHxtlCF63tkGCnwi/v/xf8IbFyqTd9ZH9NG
0xoeQTrhOLp8Lu0ARVWc1Ie01ujV+RXgJvv9RTJEhDSqeRKsYb46qhh4eFHjd4bE
qB/9P40Z4RUhV2gMXRkaPDWGWzXUhLrl1qvdPgSJ/5yQgns+g0I007452SALv3yV
mQ13Nkd/9Ze0mGUC2KjxlBiVmyDWWQQvKbJlA9eTHxZv3DPZgyg5YwiBDgf3Tsk=
=L06Y
-END PGP SIGNATURE-

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] read.nexus.data parser

2013-04-07 Thread Ben Bolker
On 13-04-05 01:29 PM, Jessica Sabo wrote:
 Hi All,
 
 I am wondering if there is anyway to increase the speed of the 
 read.nexus.data parser. Or if there is an alternative that is a
 faster nexus file data parser.
 
 THanks, Jess
 

  I don't know if it's faster or not, but there is ?readNexus in the
'ape' package.  Also see library(sos); findFn(read {nexus format})

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Question about model selection using compar.gee

2013-03-05 Thread Ben Bolker
 [taking the liberty of cc'ing this to r-sig-phylo, which is a better
place for such questions because (1) other people can answer (2) the
answers are publicly viewable and get archived]

On 13-03-05 10:44 AM, Javier Lenzi wrote:
 Dear Dr. Bolker,
 
 My name is Javier Lenzi and I am trying to perform a comparative analysis
 with seabirds (gulls). I am trying to implement a model selection approach
 using compar.gee in ape package and I have some problems. I was wondering
 if I could ask for your advice to deal with them. In case you could help me
 there some questions below that I would like to ask. I am sorry for
 bothering you, my skills in statistics are basic and I have to be fully
 autodidact in matters related to the comparative method.
 
 Thank you very much in advance,
 Best regards,
 Javier Lenzi
 
 
 I would like to ask if it is possible to perform a model selection approach
 using compar.gee, because the articles I have read that use compar.gee do
 not use this approach when they have multiple explanatory variables.
 
 Then, I have a set of candidate models for each life history trait and when
 I run compar.gee I get the QIC and the parameter estimates, but I can not
 find an index of goodness of fit for the model and the P-value. Is there
 any available? How should I proceed?
 
 I would also like to ask if there is any function similar to step() in
 order to perform model selection using GEEs?
 
 Other question I would like to ask you is how can I get the amount of
 variance that is explained by the phylogeny? I think it would be nice to
 analyze and present it as a result.

  Unfortunately none of these are simple questions.  Generalized
estimating equations are not really designed for model selection and
comparison approaches, as they don't operate in terms of likelihoods
(which are the standard currency of such selection and comparison).
Thus they don't have model-level p-values, just t-statistics and
p-values for the individual parameters.  If you are comfortable with QIC
you can use them in a standard information-theoretic multi-model
inference approach (i.e. decide to take the model with the best QIC, or
compute QIC weights and get parameter weights or averaged predictions).

  I strongly recommend against stepwise model selection: google
stepwise regression problems.  You can do it by hand if you need to ...

   Let's see if anyone else on the list has suggestions.

  good luck,
   Ben Bolker

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] problem installing phylobase package

2013-02-07 Thread Ben Bolker
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 13-02-07 08:42 AM, Marc Octavo wrote:
 Hi François,
 
 I have upgraded R from 2.15.0 to 2.15.2 and the problem is solved.
 Thanks

  Good to know.  My best guess is that there is an Rcpp version
mismatch; the current version of Rcpp depends on 2.15.1.


 
 On Mon, Feb 4, 2013 at 8:13 PM, François Michonneau  
 francois.michonn...@gmail.com wrote:
 
 
 Hi Marc,
 
 Can you provide me with some more information about your system
 (output of R.version, the versions of Rcpp, and of phylobase you
 are trying to install)?
 
 Thanks, -- François
 
 
 On Mon, Feb 4, 2013 at 6:56 AM, Marc Octavo
 marc.oct...@gmail.com wrote:
 
 Hi all,
 
 I'm trying to install the phylobase package without succes.
 I've got an error message:
 
 nxspublicblocks.cpp:686:6: error: ‘Rcerr’ is not a member of
 ‘Rcpp’
 
 These are all messages :
 
 * installing *source* package ‘phylobase’ ... ** package
 ‘phylobase’ successfully unpacked and MD5 sums checked ** libs 
 g++ -I/usr/local/lib64/R/include -DNDEBUG -I.
 -DHAVE_INTTYPES_H -DASSERTS_TO_EXCEPTIONS -I/usr/local/include 
 -I/usr/local/lib64/R/library/Rcpp/include   -fpic  -g -O2  -c
 GetNCL.cpp -o GetNCL.o gcc -std=gnu99
 -I/usr/local/lib64/R/include -DNDEBUG -I. -DHAVE_INTTYPES_H 
 -DASSERTS_TO_EXCEPTIONS -I/usr/local/include 
 -I/usr/local/lib64/R/library/Rcpp/include   -fpic  -g -O2
 -c ancestors.c -o ancestors.o gcc -std=gnu99
 -I/usr/local/lib64/R/include -DNDEBUG -I. -DHAVE_INTTYPES_H 
 -DASSERTS_TO_EXCEPTIONS -I/usr/local/include 
 -I/usr/local/lib64/R/library/Rcpp/include   -fpic  -g -O2
 -c descendants.c -o descendants.o g++
 -I/usr/local/lib64/R/include -DNDEBUG -I. -DHAVE_INTTYPES_H 
 -DASSERTS_TO_EXCEPTIONS -I/usr/local/include 
 -I/usr/local/lib64/R/library/Rcpp/include   -fpic  -g -O2
 -c nxsassumptionsblock.cpp -o nxsassumptionsblock.o 
 nxsassumptionsblock.cpp: In member function ‘void 
 NxsAssumptionsBlock::HandleTypeSet(NxsToken)’: 
 nxsassumptionsblock.cpp:1344:107: warning: converting ‘false’
 to pointer type ‘bool*’ [-Wconversion-null] 
 nxsassumptionsblock.cpp: In member function ‘void 
 NxsAssumptionsBlock::HandleCodeSet(NxsToken)’: 
 nxsassumptionsblock.cpp:1654:121: warning: converting ‘false’
 to pointer type ‘bool*’ [-Wconversion-null] 
 nxsassumptionsblock.cpp: In member function ‘void 
 NxsAssumptionsBlock::HandleCodonPosSet(NxsToken)’: 
 nxsassumptionsblock.cpp:1688:117: warning: converting ‘false’
 to pointer type ‘bool*’ [-Wconversion-null] 
 nxsassumptionsblock.cpp: In member function ‘void 
 NxsAssumptionsBlock::HandleCharPartition(NxsToken)’: 
 nxsassumptionsblock.cpp:1821:113: warning: converting ‘false’
 to pointer type ‘bool*’ [-Wconversion-null] 
 nxsassumptionsblock.cpp: In member function ‘void 
 NxsAssumptionsBlock::HandleCharSet(NxsToken)’: 
 nxsassumptionsblock.cpp:1846:107: warning: converting ‘false’
 to pointer type ‘bool*’ [-Wconversion-null] 
 nxsassumptionsblock.cpp: In member function ‘void 
 NxsAssumptionsBlock::HandleExSet(NxsToken)’: 
 nxsassumptionsblock.cpp:1893:105: warning: converting ‘false’
 to pointer type ‘bool*’ [-Wconversion-null] g++
 -I/usr/local/lib64/R/include -DNDEBUG -I. -DHAVE_INTTYPES_H 
 -DASSERTS_TO_EXCEPTIONS -I/usr/local/include 
 -I/usr/local/lib64/R/library/Rcpp/include   -fpic  -g -O2
 -c nxsblock.cpp -o nxsblock.o g++ -I/usr/local/lib64/R/include
 -DNDEBUG -I. -DHAVE_INTTYPES_H -DASSERTS_TO_EXCEPTIONS
 -I/usr/local/include 
 -I/usr/local/lib64/R/library/Rcpp/include   -fpic  -g -O2
 -c nxscharactersblock.cpp -o nxscharactersblock.o g++
 -I/usr/local/lib64/R/include -DNDEBUG -I. -DHAVE_INTTYPES_H 
 -DASSERTS_TO_EXCEPTIONS -I/usr/local/include 
 -I/usr/local/lib64/R/library/Rcpp/include   -fpic  -g -O2
 -c nxscxxdiscretematrix.cpp -o nxscxxdiscretematrix.o g++
 -I/usr/local/lib64/R/include -DNDEBUG -I. -DHAVE_INTTYPES_H 
 -DASSERTS_TO_EXCEPTIONS -I/usr/local/include 
 -I/usr/local/lib64/R/library/Rcpp/include   -fpic  -g -O2
 -c nxsdatablock.cpp -o nxsdatablock.o g++
 -I/usr/local/lib64/R/include -DNDEBUG -I. -DHAVE_INTTYPES_H 
 -DASSERTS_TO_EXCEPTIONS -I/usr/local/include 
 -I/usr/local/lib64/R/library/Rcpp/include   -fpic  -g -O2
 -c nxsdistancedatum.cpp -o nxsdistancedatum.o g++
 -I/usr/local/lib64/R/include -DNDEBUG -I. -DHAVE_INTTYPES_H 
 -DASSERTS_TO_EXCEPTIONS -I/usr/local/include 
 -I/usr/local/lib64/R/library/Rcpp/include   -fpic  -g -O2
 -c nxsdistancesblock.cpp -o nxsdistancesblock.o g++
 -I/usr/local/lib64/R/include -DNDEBUG -I. -DHAVE_INTTYPES_H 
 -DASSERTS_TO_EXCEPTIONS -I/usr/local/include 
 -I/usr/local/lib64/R/library/Rcpp/include   -fpic  -g -O2
 -c nxsemptyblock.cpp -o nxsemptyblock.o g++
 -I/usr/local/lib64/R/include -DNDEBUG -I. -DHAVE_INTTYPES_H 
 -DASSERTS_TO_EXCEPTIONS -I/usr/local/include 
 -I/usr/local/lib64/R/library/Rcpp/include   -fpic  -g -O2
 -c nxsexception.cpp -o nxsexception.o g++
 -I/usr/local/lib64/R/include -DNDEBUG -I. -DHAVE_INTTYPES_H 
 -DASSERTS_TO_EXCEPTIONS -I/usr/local/include 
 

Re: [R-sig-phylo] LL ratio test

2012-06-17 Thread Ben Bolker
On 12-06-17 07:35 AM, Joe Felsenstein wrote:
 
 Ben Bolker wrote:
 
  I'd like to chime in here ... again ready to be corrected by others.
 The description above doesn't match my understanding of AIC at all
 (although again I will be perfectly happy, and in fact really
 interested, if you can point me to a reference that lays out this
 justification for the AIC).  AIC is an estimate of the expected
 (Kullback-Leibler) distance between a hypothetical 'true model' and
 any specified model. 
 
 Thanks, that is very clear.  It is the correct description.  But that
 leaves me unsatisfied.  Consider, for example, the case where
 we have two models, M0 and M1.   Suppose that they are in
 fact nested, with degrees of freedom  d0  and  d1.
 
 Now we can apply Fisher's  Likelihood Ratio Test of Fisher as well
 as the AIC.  The LRT tells us that the expectation 

  ... under the null hypothesis where M0 (the simpler model) is true?

 of   
 2 log(L1) - 2 log(L0)   is   d1 - d0   (because it is distributed as a
 Chi-Square with that number of degrees of freedom).
 
 But the AIC tells us that the expectation is  2(d1 - d0).

  Maybe I'm missing something, but I don't see how the AIC tells us
something about the expectation of 2 log(L1) - 2 log(L0) ?  It gives us
the expectation of the Kullback-Leibler distance, which is something
like sum(p(i) log(p(i)/q(i)) where p(i) is the true distribution of
outcomes and q(i) is the predicted distribution of outcomes ... so it's
something more like a marginal log-likelihood difference rather than a
maximum log-likelihood difference ...
 
 This is, I gather, not a conflict because the assumption of the
 AIC is that neither M0 nor M1 is correct, but instead a model
 M' which has an infinite number of degrees of freedom is
 correct.   So both assertions are (formally) correct. 
 
 But what if  M0  was actually correct?  Are we supposed
 to use AIC?   

  I would say that if you want to find the _correct_ model, and you
think that the correct model might be in the set of your candidate
models, you ought to be using BIC/SIC. Along with Burnham and Anderson,
I think that in ecological and evolutionary analysis it is very unlikely
that the true model is *ever* in your set of candidate models (note that
I do disagree with them on a lot of things!) ...
 
 I also understand that AIC does not give us a distribution
 of the test statistic, LRT does.   For example, in the case of
 phylogenies that all have the same number of degrees of
 freedom, all AIC does is tell us to prefer the one with
 highest likelihood.

  Yes.  People have come up with various (often misused) rules of thumb
about how big an AIC difference is large (please don't say
significant), but it really boils down to understanding the
log-likelihood scale in a basic (non-probabilistic) way as promoted e.g.
by some of the pure likelihoodists (Edwards, Royall) -- a 2-point
difference on the log-likelihood scale corresponds to an eightfold
difference in log-likelihood, so would be reasonable choice for a cutoff
between small and large likelihood differences.  People often
sometimes reason that adding a single, useless parameter adds +2 to the
AIC, so a difference of 2 is equivalent to less than 1 effective
parameters'-worth of difference (hence small).
  (I think lots of people here, including you, already know this ...)

  If you want distributions of test statistics, I claim it makes the
most sense to work in a likelihood-ratio-test-based framework.  (I can
imagine that it would be possible to derive asymptotic distributions for
AICs, but I've never seen it done ...)

 Anyway, thanks for the clarification, which makes clear
 that the rationale of using the LRT to justify the AIC
 is incorrect.
 
 Joe
 
 Joe Felsenstein j...@gs.washington.edu mailto:j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA
 
 


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] 2 root edges in phylocom tree, but suggested fix in ape FAQ doesn't help

2012-01-12 Thread Ben Bolker
On 12-01-12 01:54 PM, Megan Bartlett wrote:
 Hi,
 
 I'm trying to use R package ape to manipulate the phylocom megatree,
 version R20091110. However, whenever I try to read in the tree with:
 
 read.tree(R20091110.new.txt)
 
 I get the error message:
 
 Error in if (sum(obj[[i]]$edge[, 1] == ROOT) == 1  dim(obj[[i]]$edge)[1]
  :
   missing value where TRUE/FALSE needed
 
 I know that the suggested fix is to delete ( and )euphyllophyte from
 the beginning and end of the file, but this doesn't solve the problem.
 Trying to read the new tree gives me exactly the same error message. What
 should I be doing instead?
 
 Thanks so much for your help!
 
 Best,
 
 Megan Bartlett
 
   [[alternative HTML version deleted]]
 

  I started to respond to this on r-h...@r-project.org ... it's best not
to cross-post if you can help it, or at least to say that you're
crossposting (and why).

  Based on this small experiment:

library(ape)
r -
read.tree(url(http://svn.phylodiversity.net/tot/megatrees/R20091110.new;))

r - read.tree(text=(((abc)def,(ghi)jkl),(mno)pqr);)
r - read.tree(text=((def,jkl),pqr);)

  it looks like ape's read.tree() doesn't like the internal-node
labeling convention that phylocom is using.  Someone else will probably
respond more helpfully ...

  Ben Bolker

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] ape processing time

2011-06-13 Thread Ben Bolker
On 11-06-13 12:02 PM, alfredo tello wrote:
 Hi Folks,
 
 Can anyone tell me how to roughly calculate the processing time required to
 compute a phylogenetic matrix (K80) with ape given a certain hardware? I
 just want to have a rough idea before pressing Go
 

  Try running a few tiny- to small-sized subsets to figure out roughly
how the times scale with tree size.

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] NEVERMIND -read.tree and read.nexus error

2011-02-20 Thread Ben Bolker
  On the other hand, if anyone has any ideas about a paranoid and
verbose Newick file-checker that gives verbose output about what
precisely might be wrong/non-conforming in a Newick file (lint for
Newick for the old-fashioned programmers) to help track this sort of
thing down, that would be probably be very helpful to many on the list ...


On 11-02-20 06:59 PM, James Meadow wrote:
 List administrator - Do not post this question.  I picked through the nwk
 file again and found a single misplaced space!  it is now fixed.
 
 Sorry about that,
 James
 
 On Sun, Feb 20, 2011 at 4:44 PM, James Meadow jfmea...@gmail.com wrote:
 
 Hello,

 I have seen this error reported in the help list, but the suggestions do
 not fix my problem. I am using a very recent version of ape, and I have also
 tried the patch given here:
 https://svn.mpl.ird.fr/ape/dev/ape/R/read.nexus.R

 I am trying to read.tree a large (300 taxa) nwk tree, and I continue to
 get this error:

 pltr-read.tree(plant_all_1.nwk)
   # There is apparently two root edges in your file: cannot read tree file.
   # Reading Newick file aborted at tree no.1

 I have tried it with .nex, and .nwk and both give the same error.  I can
 open this file in any other tree viewer (like FigTree and Mega...) with no
 problem, and the root is clearly separated.  I have looked through the .nwk
 file but cannot find where there might be a problem.  I have also tried
 removing the first and last ( ) and the tree will read, but then plotting
 the tree gives this error:

   # Error in plot.phylo(pltree) :
   # there are single (non-splitting) nodes in your tree; you may need to
 use collapse.singles()
   # Calls: plot - plot.phylo

 and collapse.singles() doesn't fix it.  I have tried to simulate smaller
 trees to get the error, but I can't seem to reproduce the situation.  Does
 ape have some sort of upper limit on tree size?

 The nwk file is attached.

 I apologise in advance if I am missing something simple.

 As a sidenote, I have been using ape and other r-phylo packages for only a
 short time, and I have all but abandoned most other phylo software in the
 process - very nice job developing these packages and to EP for a terrific
 book!  My dissertation is indebted to you.

 Thank you,
 James




 --
 James Meadow
 Land Resources and Environmental Sciences
 Montana State University
 (406) 370-7157
 jfmea...@gmail.com

 
 


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] GLM, GLMM, CAIC, mcmcGLMM - AIC model selection for phylogentic data

2011-01-17 Thread Ben Bolker
On 11-01-13 11:16 AM, Chris Mcowen wrote:
 The point of me doing this was i am new to the area so i wanted to
 make sure when i have my PhD viva / submit a paper i could defend my
 method - as you say
 some might be better than others.
 
 I am currently writing up and am unsure if i should make a chapter /
 paper out of comparing the results of the different methods or not.
 
 So, if i interpret what you are saying correctly, the reason that all
 the methods i tried (phylogenetic and not ) gave the same answer is
 data specific?  if this is the case, and given all the research in
 this area i imagine it is, i wonder why
 
 You can have a strong phylogenetic signal; that still (as you have
 demonstrated) doesn't mean that it will overturn the conclusions
 based on a non-phylogenetic analysis.
 
 The conclusion i would like to draw from this is why? Is there a
 pattern to this, could randomizations show why this occurs? Or is it
 something simple that i am making too complicated? I just don't
 understand how having a strong phylogenetic signal and using
 phylogenetic and no phylogenetic methods can give the same answer?
 And under what circumstances does this not hold true?
 
 Chris

  Here's a simple example. Suppose that you have species within genera
and genera within species, and that each level we have a polytomy --
that is, the genera all radiated in a single event, and then within each
genus the species radiated in a single event.  At each level (species
within genera and genera within families) there is a strong positive
correlation between trait X and trait Y.  If you plot the data in the
X-Y plane you'll see a series of clusters of points (corresponding to
different genera), but all the clusters will also be aligned along the
same linear trend. Because we have polytomies/stars at each level, the
results of taxonomy-based and phylogeny-based methods will be the same.
 Because the within- and between-genus patterns are the same,
phylogenetic and non-phylogenetic methods will give about the same
answers (although may differ in their effective number of degrees of
freedom/strength of inference).

  If instead the within-genus relationships were opposite (e.g. negative
correlations) then the phylogenetic and non-phylogenetic methods would
give different answers.

  Comparing different statistical approaches on a single data set is a
bit like making statistical inferences from a single data point.  In
order to compare methodologies you have to understand them at a
reasonably deep level.  If possible, use some mathematical analysis to
understand when the results will differ; otherwise, use your
understanding of the differences to give verbal or heuristic arguments
of when they will differ.  Simulations guided by this analysis will then
show the magnitude of the difference across a range of situations.

  cheers
Ben Bolker

 
 
 
 
 The fact that *for this analysis* all the methods used give the same 
 answer doesn't invalidate the general point that the different
 methods do different things and that some might be better than
 others.
 
 I guess I'm curious what conclusions you're drawing from this
 exercise:
 
 * am I doing something wrong, or missing something obvious? - I
 don't think so. * why does everyone make such a fuss about the
 differences? - Because they sometimes (just not in this case) have a
 big effect on the conclusions
 
 
 On 13 Jan 2011, at 16:04, Ben Bolker wrote:
 
 On 11-01-13 10:13 AM, Chris Mcowen wrote:
 Hi Ben,
 
 Thanks for the reply:
 
 The Pagel test showed that there is a strong phylogentic signal in
 my data.
 
 I don't believe i used
 reasonably minor differences in the way that you account for 
 correlation
 
 as i used a method without any account for the correlation  i.e
 GLM, which selected the same model set  and the same best model
 as GLMM the other phylogentic methods.
 
 I guess my central point is i spent a while researching methods to 
 deal with phylogenetic structure, and there are various schools of 
 thought of the best method. Some say using random effects in GLMM 
 models is not capable of dealing with phylogentic structure where
 as others suggest the CAIC method is the best as it actually uses
 the tree. So from these points of view there is actually a
 considerable difference in the method used?
 
 If you look at the model set selected based on AIC differences
 by each method, they are the same, therefore is any method really
 better for this type of investigation than another?
 
 OK, relatively minor was an overstatement (sorry I overlooked the 
 fact that you used a GLM as well).  You can have a strong
 phylogenetic signal; that still (as you have demonstrated) doesn't
 mean that it will overturn the conclusions based on a
 non-phylogenetic analysis.
 
 The fact that *for this analysis* all the methods used give the same 
 answer doesn't invalidate the general point that the different
 methods do different things and that some might be better

Re: [R-sig-phylo] GLM, GLMM, CAIC, mcmcGLMM - AIC model selection for phylogentic data

2011-01-13 Thread Ben Bolker
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 11-01-13 04:38 AM, Chris Mcowen wrote:
 
 Dear list,
 
 I am modelling the effect of various life history traits of species
 against their extinction rating.
 
 My data has a phylogentic signal (see table below) so to be
 statistically correct i worked with phylogentic independent contrasts
 
 
 From Purvis et al., 2000 phylogenetic analyses were necessary
 because of the pseudoreplication and, hence, elevated type I error
 rates that result from treating species as independent points when
 relevant variables show a phylogenetic pattern
 
 Variable  λ ( Pagel) IUCN extinction risk 
 0.47 Breeding system
 0.99 Endosperm0.96 Floral symmetry
 0.93 Fruit  0.97 Pollen
 dispersal 0.99 Seasonality
 0.52 Storage organ  0.85 Woodyness
 0.61
 
 There are various ways of doing this one method is to use the package
 CAIC (generated using compara- tive analysis by independent
 contrasts)  which utilizes the phylogeny to generate the independent
 contrasts. However as i was using it i found this: from Sodhi et al.,
 2008
 
 It was necessary to decompose the variance across species by coding
 the random-effects error structure of the GLMM as a hierarchical
 taxonomic (class/order/ family) effect (Blackburn  Duncan, 2001).
 We had insufficient replication within genera to include the genus
 in the nested random effect. Our method is more appropriate than
 the independent-contrasts approach (Purvis et al., 2000) in
 situations where a complete phylogeny of the study taxon is
 unavailable, when categorical variables are included in the
 analysis, and when model selection, rather than hypothesis testing,
 is the statistical paradigm being used.
 
 So i gave this a go - using GLMM and setting the order / family as
 random effects.
 
 I then came across mcmcGLMM which allows the use of a phylogentic
 tree to deal with the phylogenetic structure of the data set.
 
 So i gave this a go as well!
 
 Finally to be consistent i ran the model with no phylogenetic control
 - using GLM.
 
 My criteria for model selection was that of Burnham and Anderson -
 using AIC and AICML etc to select the most likely set of models.
 
 Interestingly i found that using all four methods the same model
 (based on AIC difference) was selected as the most likely?
 Furthermore, the pattern of AIC differences across the models
 reflected each other.

  I think your comparison is really worthwhile, but I don't see why this
is surprising at all -- perhaps I'm missing something.  Given that you
have reasonably strong effects of your predictors, reasonably minor
differences in the way that you account for correlation won't change the
qualitative conclusions.

  Ben Bolker



 
 I was wondering why this maybe? Does anybody know of similar studies
 comparing model selection from GLM and GLMM?
 
 I apologise if this is not the correct forum to post this question.
 
 I have attached links to two graphs that explain my question.
 
 Thanks Chris
 
 LINKS
 
 https://files.me.com/chrismcowen/hmptnq
 
 files.me.com/chrismcowen/t3xvk5
 
 ___ R-sig-phylo mailing
 list R-sig-phylo@r-project.org 
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEUEARECAAYFAk0vE44ACgkQc5UpGjwzenPPRwCYwge2SpvCiNaRQn7Q3NBW5NQM
JwCfcdqvYMkrPyh7IOY8mRDJNil9Rzc=
=oVYU
-END PGP SIGNATURE-

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] GLM, GLMM, CAIC, mcmcGLMM - AIC model selection for phylogentic data

2011-01-13 Thread Ben Bolker
On 11-01-13 10:13 AM, Chris Mcowen wrote:
 Hi Ben,
 
 Thanks for the reply:
 
 The Pagel test showed that there is a strong phylogentic signal in my
 data.
 
 I don't believe i used
 reasonably minor differences in the way that you account for
 correlation
 
 as i used a method without any account for the correlation  i.e GLM,
 which selected the same model set  and the same best model as GLMM
 the other phylogentic methods.
 
 I guess my central point is i spent a while researching methods to
 deal with phylogenetic structure, and there are various schools of
 thought of the best method. Some say using random effects in GLMM
 models is not capable of dealing with phylogentic structure where as
 others suggest the CAIC method is the best as it actually uses the
 tree. So from these points of view there is actually a considerable
 difference in the method used?
 
 If you look at the model set selected based on AIC differences by
 each method, they are the same, therefore is any method really better
 for this type of investigation than another?

 OK, relatively minor was an overstatement (sorry I overlooked the
fact that you used a GLM as well).  You can have a strong phylogenetic
signal; that still (as you have demonstrated) doesn't mean that it will
overturn the conclusions based on a non-phylogenetic analysis.

  The fact that *for this analysis* all the methods used give the same
answer doesn't invalidate the general point that the different methods
do different things and that some might be better than others.

  I guess I'm curious what conclusions you're drawing from this exercise:

 * am I doing something wrong, or missing something obvious?
 - I don't think so.
 * why does everyone make such a fuss about the differences?
 - Because they sometimes (just not in this case) have a big effect
on the conclusions

Ben Bolker

 
 Chris
 
 I think your comparison is really worthwhile, but I don't see why
 this is surprising at all -- perhaps I'm missing something.  Given
 that you have reasonably strong effects of your predictors,
 reasonably minor differences in the way that you account for
 correlation won't change the qualitative conclusion
 
 On 13 Jan 2011, at 15:00, Ben Bolker wrote:
 
 On 11-01-13 04:38 AM, Chris Mcowen wrote:
 
 Dear list,
 
 I am modelling the effect of various life history traits of
 species against their extinction rating.
 
 My data has a phylogentic signal (see table below) so to be 
 statistically correct i worked with phylogentic independent
 contrasts
 
 
 From Purvis et al., 2000 phylogenetic analyses were necessary 
 because of the pseudoreplication and, hence, elevated type I error 
 rates that result from treating species as independent points when 
 relevant variables show a phylogenetic pattern
 
 Variable » ( Pagel) IUCN extinction risk 
 0.47 Breeding system 
 0.99 Endosperm   0.96 Floral symmetry
 0.93 Fruit  0.97
 Pollen dispersal 0.99 Seasonality
 0.52 Storage organ  0.85
 Woodyness 0.61
 
 There are various ways of doing this one method is to use the
 package CAIC (generated using compara- tive analysis by
 independent contrasts)  which utilizes the phylogeny to generate
 the independent contrasts. However as i was using it i found this:
 from Sodhi et al., 2008
 
 It was necessary to decompose the variance across species by
 coding the random-effects error structure of the GLMM as a
 hierarchical taxonomic (class/order/ family) effect (Blackburn 
 Duncan, 2001). We had insufficient replication within genera to
 include the genus in the nested random effect. Our method is more
 appropriate than the independent-contrasts approach (Purvis et
 al., 2000) in situations where a complete phylogeny of the study
 taxon is unavailable, when categorical variables are included in
 the analysis, and when model selection, rather than hypothesis
 testing, is the statistical paradigm being used.
 
 So i gave this a go - using GLMM and setting the order / family as 
 random effects.
 
 I then came across mcmcGLMM which allows the use of a phylogentic 
 tree to deal with the phylogenetic structure of the data set.
 
 So i gave this a go as well!
 
 Finally to be consistent i ran the model with no phylogenetic
 control - using GLM.
 
 My criteria for model selection was that of Burnham and Anderson - 
 using AIC and AICML etc to select the most likely set of models.
 
 Interestingly i found that using all four methods the same model 
 (based on AIC difference) was selected as the most likely? 
 Furthermore, the pattern of AIC differences across the models 
 reflected each other.
 
 I think your comparison is really worthwhile, but I don't see why
 this is surprising at all -- perhaps I'm missing something.  Given
 that you have reasonably strong effects of your predictors

Re: [R-sig-phylo] Using OU model in PGLS

2011-01-05 Thread Ben Bolker
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 11-01-05 12:50 PM, Alanna Maltby wrote:
 Hi all
 
 I'm trying to run a PGLS using the function 'gls' in 'nlme', which works
 when I use corBrownian as the correlation structure, but not when I use
 corMartins. Typing:
 
 oumodel-corMartins(1, tree)
 pglsModel1-gls(BM~BW,correlation=oumodel,data=data2)
 
 Results in this:
 
 Error in recalc.corStruct(object[[i]], conLin) : 
   NA/NaN/Inf in foreign function call (arg 4)
 
 I've seen that other people have had this problem, but no solution to it
 - is it a bug?


  I see this
http://www.mail-archive.com/r-sig-phylo@r-project.org/msg00577.html

  Can you give us a reproducible example?

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk0kvS8ACgkQc5UpGjwzenPR8wCdF7BsvP3TfVReNRUW/+O8xuqf
UDEAnjZwZQfQsosSHOMV2Ly5+Y3odcjm
=4KoS
-END PGP SIGNATURE-

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo