Information-Theoretic Approaches to Empirical Science

Courses can be scheduled late summer or fall.

Instructor: David R. Anderson
These courses present a new science paradigm based on information theory. Kullback-Leibler information is the basis for model selection leading to Akaike's Information Criterion (AIC). The course deals with science philosophy as much as with data analysis and model selection. The focus is on quantitative evidence for multiple science hypotheses. This general approach includes ranking the science hypotheses; examining the probability of hypothesis j, given the data; and computing evidence ratios. Once these concepts have been presented, the discussion shifts to making formal inference from all the hypotheses and their models (multimodel inference). Additional details can be viewed at www.informationtheoryworkshop.com

Key Outcomes: Attendees will gain a good understanding of these new approaches and be able to perform analyses with their own data. The computations required are quite simple once the parameter estimates have been obtained for each model.

Target Audience: Graduate students, post-docs, faculty, and researchers in various agencies and institutes. Anyone whose work involves hypothesizing and modelling, and whose inferences are model based, will gain from this material.

Background Required: Attendees should have a solid background in statistical principles and modelling (this is NOT a modelling course). The course focuses on science, science philosophy, information, and evidence. The amount of mathematics and statistics presented in the course is relatively modest; however, without a good understanding of linear and nonlinear regression, least squares, and maximum likelihood estimation, one will struggle with some of the material to be presented.

Why Take This Course? A substantial paradigm shift is occurring in our science and resource management. The past century relied on null hypothesis testing: asymptotic distributions of the test statistic, P-values, and a ruling of significant or not significant.
Under this analysis paradigm a test statistic (T) is computed from the data. The P-value is the focus of the analysis and is Prob{T or more extreme, given the null hypothesis}. With this definition in mind, we can abbreviate slightly: Prob{X|Ho}, where it is understood that X represents the data or more extreme (unobserved) data. The null hypothesis (Ho) takes center stage but is often trivial or even silly. The alternative hypothesis (HA) is not the subject of the test; support for the alternative comes only by default, when the P-value for the null hypothesis, Prob{data|Ho}, is low (often < 0.05). The proper interpretation of the P-value is quite strained; this might explain why so many people erroneously pretend it means something quite different (i.e., the probability that the null hypothesis is true). That is not what is meant by a P-value.

These traditional methods are being replaced by information-theoretic methods (and to a lesser extent, at least at this time, by a variety of Bayesian methods). These approaches focus on an a priori set of plausible science hypotheses H1, H2, ..., HR. Evidence for or against members of this set of multiple working hypotheses consists of (1) the likelihood of each hypothesis, given the data, L(Hj|X), or (2) a set of probabilities of the hypotheses, given the data, Prob{Hj|X}. These likelihoods and probabilities are direct evidence, where evidence = information = -entropy. Simple evidence ratios provide a formal measure of the strength of evidence for any two science hypotheses. Note the radical difference in the probability statements (above) stemming from a P-value versus the probability of hypothesis j: statistical inference should be about models and parameters, conditional on the data; P-values, however, are probability statements about the data, conditional on the null hypothesis.
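As a minimal numerical sketch of the quantities just described (this is an illustration, not course material): AIC is computed from each model's maximized log-likelihood, the model probabilities Prob{Hj|X} follow as the so-called Akaike weights, and an evidence ratio is simply the ratio of two weights. The log-likelihoods and parameter counts below are hypothetical, purely for demonstration.

```python
import math

def aic(log_likelihood, k):
    """Akaike's Information Criterion: AIC = -2 ln(L) + 2K,
    where K is the number of estimated parameters."""
    return -2.0 * log_likelihood + 2.0 * k

def akaike_weights(aics):
    """Model probabilities Prob{Hj|X} (Akaike weights):
    w_j = exp(-d_j/2) / sum_i exp(-d_i/2), with d_j = AIC_j - min AIC."""
    best = min(aics)
    terms = [math.exp(-(a - best) / 2.0) for a in aics]
    total = sum(terms)
    return [t / total for t in terms]

# Hypothetical maximized log-likelihoods and parameter counts for
# the models of three hypotheses H1, H2, H3 (illustrative values only).
fits = [(-204.3, 3), (-201.1, 5), (-200.9, 8)]
aics = [aic(ll, k) for ll, k in fits]
weights = akaike_weights(aics)

# Evidence ratio: formal strength of evidence for the best-supported
# hypothesis relative to its nearest competitor.
evidence_ratio = max(weights) / sorted(weights)[-2]
```

With these made-up numbers the second model carries about 73% of the weight and is roughly 3.3 times better supported than its nearest competitor; it is such ratios and probabilities, not the raw AIC values, that carry the evidential meaning.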
These new approaches (including Bayesian methods) allow statistical inference to be based on all (or some) of the models in the a priori set, leading to a robust class of methods termed multimodel inference: the inference is based on all the models in the set. Alternative science hypotheses take center stage in these approaches and require much more attention than in the past century (where one started with an alternative and the null was merely nothing, or the naïve position: thus, little science thinking was called for). The set of science hypotheses evolves through time as implausible hypotheses are eventually dropped from consideration, new hypotheses are added, and existing hypotheses are further refined. Rapid progress in the theoretical or applied sciences can be realized as this set evolves, based on careful inferences from new data. This is an exciting time to be in science or science-based management. There are important philosophies involved here: these approaches go well beyond methods for mere data analysis.

The course will make use of the textbook: Anderson, D. R. 2008. Model based inference in the life sciences: a primer on evidence. Springer, New York, NY. 184 pp. This book is included in the registration fee.

If you are interested in hosting a course at your location, please contact me.

David R. Anderson
August 13, 2013
quietander...@yahoo.com