Re: not exactly Re: [MORPHMET] PREPRINT: between-group principal components analysis (bgPCA)

2019-05-15 Thread Polly, P. David
Wow.  Andrea invited me to collaborate with him, Paul O'Higgins, and Jim Rohlf 
on the BG-PCA problem back in July, 2018.  Andrea then graciously invited Fred 
a couple weeks later.  Andrea drafted a manuscript by early September, a long 
time before GMAustria19.  Through November he politely hounded me and Fred 
because we were not keeping up with discussion and comments.  Feeling more a 
hindrance than a help, I withdrew so they could get on with it.  It never 
occurred to me to quickly publish a paper of my own on the same subject.
David


P. David Polly
Robert R. Shrock Professor
Earth and Atmospheric Sciences
(with affiliated appointments in Biology and Anthropology)
Indiana University
pdpo...@indiana.edu
https://pollylab.indiana.edu

On sabbatical leave 2018-19
Institute for Biospheric Studies
Yale University





On 15 May 2019, at 2:33 AM, andrea cardini <alcard...@gmail.com> wrote:

I have to correct Fred on this:
> we accelerated our writing. My paper was the first to be finished, probably 
> because it is a single-authored item by an emeritus with no other obligations,

No, WE did not accelerate the writing. We started a collaboration after my small 
finding, and we were supposed to work on this all together. At some stage we 
heard no more from Fred, and I suggested having two companion papers, but I NEVER 
got an answer from Fred.
Months later, Fred let us know he was presenting and discussing results 
(without ever asking me if I was OK with this). Finally, HE decided to go ahead 
on his own, submit, and announce it on this list (again letting me know only 
after he was done). This is an accurate reconstruction of the events. The other 
one is not, and Fred was not unaware that I wasn't OK with it: before the 
preprint he just announced, he had already made (again without ever asking) an 
informal presubmission to a journal, and the journal has my written complaint 
about it.

I leave it to the morphometric community to judge whether this is appropriate 
behaviour. Certainly it is not what I teach students, but possibly it is what a 
famous retired emeritus and one of the leaders of a scientific community can do.

All the best

Andrea

PS
On the technical side, as I never thought that CVA was the source of all evil 
and BG-PCA a simple solution, here too I agree that the method has some problems, 
but I am more than confident that it can still be WISELY applied in many cases. 
That small N (especially when one works with small differences) and large p 
(number of variables) are undesirable in very many types of analyses is 
written in every introductory textbook on multivariate stats (at least those 
written in simple non-mathematical language for non-numerically skilled people 
like me).
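
To make the small N / large p point concrete, here is a minimal sketch (an illustration only, not code from any of the manuscripts under discussion). It assumes just two groups, both drawn from the same random distribution, so that the between-group axis is simply the difference between the two group means. The function name and all sample sizes are made up for the example.

```python
import math
import random

def bgpca_separation(n_per_group, n_vars, seed=0):
    """Apparent separation of two groups along the between-group axis.

    Both groups are drawn from the same standard normal distribution, so any
    separation is an artifact. With two groups, a between-group PCA has a
    single axis: the difference between the two group means.
    """
    rng = random.Random(seed)
    groups = [
        [[rng.gauss(0.0, 1.0) for _ in range(n_vars)] for _ in range(n_per_group)]
        for _ in range(2)
    ]
    means = [
        [sum(row[j] for row in group) / n_per_group for j in range(n_vars)]
        for group in groups
    ]
    diff = [m0 - m1 for m0, m1 in zip(means[0], means[1])]
    norm = math.sqrt(sum(d * d for d in diff))
    axis = [d / norm for d in diff]

    # Project every specimen onto the between-group axis.
    scores = [
        [sum(x * u for x, u in zip(row, axis)) for row in group]
        for group in groups
    ]
    score_means = [sum(s) / n_per_group for s in scores]
    pooled_var = sum(
        (value - score_means[g]) ** 2 for g in range(2) for value in scores[g]
    ) / (2 * n_per_group - 2)
    # Difference between group means in units of pooled within-group SD.
    return (score_means[0] - score_means[1]) / math.sqrt(pooled_var)

few_vars = bgpca_separation(n_per_group=10, n_vars=2)
many_vars = bgpca_separation(n_per_group=10, n_vars=200)
print(f"p = 2:   separation = {few_vars:.2f} pooled SD")
print(f"p = 200: separation = {many_vars:.2f} pooled SD")
```

With the same N, the apparent separation should grow with p even though the two groups are identical by construction.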
In relation to this, there's a point I have raised many times over the years on 
this list and in some of my papers: one uses the specific landmarks required for 
her/his specific aim (I am indebted to Paul O'Higgins for teaching me this). 
Semilandmarks are a great tool but should be used when really needed and 
bearing in mind that almost inevitably p will become big and that might create 
problems. There are different views on this, including that having many points 
makes beautiful pictures: I agree, but most of the time that is probably not the 
aim of a biologist. However, there might be cases when even with small N 
semilandmarks might be a huge step forward, and possibly the best example I know 
is the virtual reconstruction of fossils (further analysis of those data may 
then be harder, because of very big p and small N).
I definitely share the frustration of many taxonomists and palaeontologists who 
often have very precious material and very small samples and want to get the 
most out of them. Regardless of p/N problems, estimates of means will then 
inevitably be inaccurate (and sometimes even biased, as the sample could consist 
of a few, maybe related, individuals of a rare species). Sometimes those means 
could be OKish (macroevolutionary analyses with very large differences?); most of 
the time they will be as accurate as trying to estimate the average body height 
of Italian men using a sample of 10 men from the same small region of Italy. 
Again, not my discovery: it's all in the introductory stats textbooks, but I 
myself too often forget about it.
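
The height analogy can be put into numbers with a small sketch. All values below are hypothetical (a within-population SD of 7 cm and a regional offset of 3 cm are invented for illustration, not real Italian data), and the function name is an assumption. It uses the textbook standard error of a mean, SD / sqrt(N), and combines it with the regional bias into a root-mean-square error.

```python
import math

def mean_estimate_error(population_sd, sample_size, regional_offset=0.0):
    """Return (standard error, RMSE) of a sample mean.

    regional_offset is the (hypothetical) difference between the mean of the
    sampled region and the mean of the whole population, i.e. the bias.
    """
    standard_error = population_sd / math.sqrt(sample_size)
    rmse = math.sqrt(regional_offset ** 2 + standard_error ** 2)
    return standard_error, rmse

# Hypothetical values: SD = 7 cm, region differs from the country by 3 cm.
se_small, rmse_small = mean_estimate_error(7.0, 10, regional_offset=3.0)
se_large, rmse_large = mean_estimate_error(7.0, 1000, regional_offset=3.0)

print(f"N = 10:   SE = {se_small:.2f} cm, RMSE = {rmse_small:.2f} cm")
print(f"N = 1000: SE = {se_large:.2f} cm, RMSE = {rmse_large:.2f} cm")
```

Under these assumptions a larger sample from the same region shrinks the standard error but leaves the regional bias untouched, so the total error never drops below the offset.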



--

Dr. Andrea Cardini
Researcher, Dipartimento di Scienze Chimiche e Geologiche, Università di Modena 
e Reggio Emilia, Via Campi, 103 - 41125 Modena - Italy
tel. 0039 059 2058472

Adjunct Associate Professor, Centre for Forensic Anthropology, The University 
of Western Australia, 35 Stirling Highway, Crawley WA 6009, Australia

E-mail address: alcard...@gmail.com, 
andrea.card...@unimore.it
WEBPAGE: https://sites.google.com/site/alcardini/home/main

FREE Yellow BOOK on Geometric Morphometrics: 
https://tinyurl.com/2013-Yellow-Book


Re: [MORPHMET] curiosity about ancestral shape reconstruction

2018-08-02 Thread Polly, P. David
My reply to Andrea the other day bounced from the list and Andrea only copied 
some of it in his reply so I'm reposting the whole thing in case anyone is 
interested.  (thanks Dennis Slice for helping me with the bounce).

Hi Andrea,

Using a Brownian motion model, ancestor reconstructions are essentially means 
of the tip taxa weighted by the branch lengths connecting them to the nodes.  
Branch length is arguably more important for the outcome than tree topology.  
In your example, B and C are likely to have much stronger influence on the 
reconstruction than A because of this.

Consider a scenario in which A, B, and C form a trichotomy and you are trying 
to reconstruct their ancestral trait value.  If they are all extant species 
they will contribute equally because their branch lengths will be the same; the 
ancestral estimate would therefore be the simple average of the three.  But if 
B and C are extinct, their branch lengths will be shorter and they will 
contribute proportionally more to the reconstruction.  If B and C lie very 
close in time to the node, A will have very little impact on the reconstruction.
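
The trichotomy case can be sketched in a few lines. This is an illustration under stated assumptions, not code from any particular package: it assumes a single trait evolving by Brownian motion on a star tree, for which the ancestral estimate is the mean of the tips weighted by the inverse of their branch lengths. The trait values, branch lengths, and function name are hypothetical.

```python
def star_ancestor(tip_values, branch_lengths):
    """Brownian-motion estimate for the ancestor of a star (polytomy) tree.

    Each tip is weighted by the inverse of the branch length joining it to
    the node, so tips on shorter branches count for more.
    """
    inverse = {tip: 1.0 / branch_lengths[tip] for tip in tip_values}
    total = sum(inverse.values())
    weights = {tip: inv / total for tip, inv in inverse.items()}
    estimate = sum(weights[tip] * tip_values[tip] for tip in tip_values)
    return estimate, weights

# Hypothetical trait values for the three tips.
traits = {"A": 10.0, "B": 4.0, "C": 1.0}

# All extant: equal branch lengths give the simple average of the three.
extant_estimate, extant_weights = star_ancestor(
    traits, {"A": 5.0, "B": 5.0, "C": 5.0}
)

# B and C extinct and close in time to the node: A has little influence.
fossil_estimate, fossil_weights = star_ancestor(
    traits, {"A": 5.0, "B": 0.5, "C": 0.5}
)

print(extant_estimate, extant_weights)
print(fossil_estimate, fossil_weights)
```

In the second call A's weight falls below 5%, so the estimate moves close to the average of B and C.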

Consider another scenario with relationships ((A,B), C) and you are 
reconstructing the ancestor of (A,B). If all three taxa are extant, A and B 
will have the strongest influence because the total branch length between C and 
the node of interest is longer.  However, if C lived only a short time after 
the base node and if the node (A,B) is deep (closer to the base of the tree 
than to its tips) then C will have a stronger influence on the ancestral 
reconstruction of node (A,B) than do the tip taxa A & B.
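
A rough sketch of this second scenario follows. It assumes that, under Brownian motion, the estimate at an internal node is the same as the root estimate obtained after rerooting the tree at that node. For three tips this reduces to weighting each tip by the inverse of its total path length to node (A,B). The branch lengths a, b, r, c and both function names are hypothetical.

```python
def node_weights(path_lengths):
    """Weights of each tip: inverse path length to the node, normalised to 1."""
    inverse = {tip: 1.0 / length for tip, length in path_lengths.items()}
    total = sum(inverse.values())
    return {tip: inv / total for tip, inv in inverse.items()}

def paths_from_ab_node(a, b, r, c):
    """Path lengths from node (A,B) to each tip in the tree ((A,B),C).

    a, b: branches from node (A,B) to tips A and B
    r:    branch from the base node to node (A,B)
    c:    branch from the base node to tip C
    """
    return {"A": a, "B": b, "C": r + c}

# All three extant (tree height 10), node (A,B) fairly shallow.
extant = node_weights(paths_from_ab_node(a=4.0, b=4.0, r=6.0, c=10.0))

# C lived shortly after the base node, and node (A,B) is deep in the tree.
early_c = node_weights(paths_from_ab_node(a=9.0, b=9.0, r=1.0, c=0.5))

print(extant)
print(early_c)
```

With these made-up numbers C carries about 11% of the weight in the first case and 75% in the second.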

I think part of your question involves the morphology of B and C being poorly 
estimated (as they might be if you are using mean values of quantitative traits 
for the tips of A, B, and C).  These estimates are also important.  If B & C 
are badly estimated, they will bias the ancestor reconstruction.  If they are 
connected to the nodes by long branches relative to well-estimated taxa, the 
effect will be minimal, but if they are connected by short branches their bias 
will overwhelm the contribution of better-estimated but more distant taxa.
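
Because the reconstruction is a weighted mean, an error in a tip value carries through to the ancestor in proportion to that tip's weight. The sketch below assumes the same inverse-branch-length weighting on a star tree as a simplification. The error sizes, branch lengths, and function name are hypothetical.

```python
def ancestor_error(tip_errors, branch_lengths):
    """Error passed to the ancestral estimate from errors in the tip values.

    The estimate is a weighted mean of the tips, so its error is the same
    weighted mean of the tip errors.
    """
    inverse = {tip: 1.0 / branch_lengths[tip] for tip in tip_errors}
    total = sum(inverse.values())
    return sum(inverse[tip] / total * tip_errors[tip] for tip in tip_errors)

# A is well estimated; B and C are each off by +2 trait units (hypothetical).
errors = {"A": 0.0, "B": 2.0, "C": 2.0}

# Badly estimated taxa on long branches: little of the error gets through.
long_branch_error = ancestor_error(errors, {"A": 1.0, "B": 10.0, "C": 10.0})

# Badly estimated taxa on short branches: nearly all of the error gets through.
short_branch_error = ancestor_error(errors, {"A": 10.0, "B": 1.0, "C": 1.0})

print(long_branch_error, short_branch_error)
```

With these numbers about one sixth of the tip error reaches the ancestor in the first case and about 95% in the second.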

With best wishes,
David



P. David Polly
Robert R. Shrock Professor
Department of Earth and Atmospheric Sciences
Adjunct Professor, Biology and Anthropology
Indiana University
1001 E. 10th Street
Bloomington, IN  47405-1405
pdpo...@indiana.edu
+1 (812) 855-7994
http://pages.iu.edu/~pdpolly/
