Information-Theoretic Approaches to Empirical Science     
Courses can be scheduled for late summer or fall.
Instructor:  David R. Anderson

These courses present a new science paradigm based on Information Theory.  
Kullback-Leibler information is the basis for model selection leading to 
Akaike’s Information Criterion (AIC).  The course deals with science 
philosophy as much as with data analysis and model selection.  The focus is on 
quantitative evidence for multiple science hypotheses.  This general 
approach includes ranking the science hypotheses; examination of the 
probability of hypothesis j, given the data; and evidence ratios.  Once 
these concepts have been presented, the discussion shifts to making formal 
inference from all the hypotheses and their models (multimodel inference).    
Additional details can be viewed at 

www.informationtheoryworkshop.com

Key Outcomes:  Attendees will have a good understanding of these new 
approaches and be able to perform analyses with their own data.  The 
computations required are quite simple once the parameter estimates have 
been obtained for each model.

Target Audience:  Graduate students, post-docs, faculty, and researchers 
in agencies and institutes.  Anyone whose work involves hypothesizing and 
modelling, and whose inferences are model based, will gain from this 
material.  

Background Required:  Attendees should have a decent background in 
statistical principles and modelling (this is NOT a modelling course).  The 
course focuses on science, science philosophy, information and evidence.  
The amount of mathematics or statistics presented in the course is 
relatively meager; however, without a good understanding of linear and 
nonlinear regression, least squares and maximum likelihood estimation, one 
will struggle to understand some of the material to be presented.

Why Take This Course?  A substantial paradigm shift is occurring in our 
science and resource management.  The past century relied on null hypothesis 
testing, asymptotic distributions of the test statistic, P-values and a 
ruling concerning “significant” or “not significant.”  Under this analysis 
paradigm a test statistic (T) is computed from the data.  The P-value is the 
focus of the analysis and is Prob{T or more extreme, given the null 
hypothesis}.  With this definition in mind, we can abbreviate slightly as 
Prob(X|Ho), where it is understood that X represents the data or more 
extreme (unobserved) data.

The null hypothesis (Ho) takes center stage but is often trivial or even 
silly.  The alternative hypothesis (HA) is not the subject of the test; 
“support” for the alternative occurs only if the P-value (for the null 
hypothesis) is low (often < 0.05).  Support for the alternative hypothesis 
comes by default, and only when Prob(data|Ho) is low.  

The proper interpretation of the P-value is quite strained: this might 
explain why so many people erroneously treat it as meaning something quite 
different (i.e., the probability that the null hypothesis is true).  This is 
not what is meant by a P-value.  

These traditional methods are being replaced by “information-theoretic” 
methods (and to a lesser extent, at least at this time, by a variety of 
Bayesian methods).  These approaches focus on an a priori set of plausible 
science hypotheses
                              H1, H2, …, HR .

Evidence for or against members of this set of “multiple working hypotheses” 
consists of (1) the likelihood of each hypothesis, given the data, L(Hj|X), 
or (2) a set of probabilities, Prob{H1, H2, …, HR, given the data}, or 
Prob(Hj|X).  These likelihoods and probabilities are direct evidence, where 
evidence = information = -entropy.

Simple evidence ratios allow a measure of the formal strength of evidence 
for any two science hypotheses.  Note the radical difference in the 
probability statements (above) stemming from either a P-value or the 
probability of hypothesis j.  Statistical inference should be about models 
and parameters, conditional on the data; P-values, however, are probability 
statements about the data, conditional on the null hypothesis.
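As a concrete illustration of how simple these computations are once each model has been fit, the sketch below (in Python, with made-up maximized log-likelihoods and parameter counts for three hypothetical models) computes AIC for each model, the Akaike weights interpretable as the probability of hypothesis j given the data and the model set, and the evidence ratio between the two best-supported models.  The three-model set and all numbers are illustrative assumptions, not course material.

```python
import math

# Illustrative (made-up) maximized log-likelihoods and parameter counts
# for three hypothetical models of the same data set.
log_likelihoods = [-204.3, -201.1, -200.8]
num_params = [3, 5, 7]

# AIC = -2 log(L) + 2K for each model.
aic = [-2 * ll + 2 * k for ll, k in zip(log_likelihoods, num_params)]

# Delta values: AIC differences from the best (minimum-AIC) model.
best = min(aic)
delta = [a - best for a in aic]

# Akaike weights: normalized relative likelihoods of the models;
# interpretable as Prob(Hj | data) within this model set.
raw = [math.exp(-d / 2) for d in delta]
weights = [r / sum(raw) for r in raw]

# Evidence ratio between the best model and the runner-up.
ranked = sorted(range(len(aic)), key=lambda j: aic[j])
er = weights[ranked[0]] / weights[ranked[1]]

for j, (a, d, w) in enumerate(zip(aic, delta, weights), start=1):
    print(f"H{j}: AIC={a:.1f}  delta={d:.2f}  weight={w:.3f}")
print(f"Evidence ratio (best vs. runner-up): {er:.2f}")
```

Note that everything after the first line of the computation uses only the fitted log-likelihoods and parameter counts, which is the sense in which the required computations are quite simple once each model has been estimated.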

These new approaches (including Bayesian methods) allow statistical 
inference to be based on all (or some) of the models in the a priori set, 
leading to a robust class of methods termed “multimodel inference.”  That 
is, the inference is based on all the models in the set.  Alternative 
science hypotheses take center stage in these approaches and will require 
much more attention than in the past century (where one started with an 
alternative and the null was merely “nothing” or the naïve position: thus, 
little science thinking was called for).  

The set of science hypotheses “evolves” through time as implausible 
hypotheses are eventually dropped from consideration, new hypotheses are 
added, and existing hypotheses are further refined.  Rapid progress in the 
theoretical or applied sciences can be realized as this set evolves, based 
on careful inferences from new data.  This is an exciting time to be in 
science or science-based management.  There are important philosophies 
involved here: these approaches go well beyond methods for just “data 
analysis.”

The course will make use of the textbook,

Anderson, D. R.  2008.  Model based inference in the life sciences:
   a primer on evidence. Springer, New York, NY. 184pp.

This book is included in the registration fee.

If you are interested in hosting a course at your location, please contact 
me.

David R. Anderson
August 13, 2013
quietander...@yahoo.com
