[R] AIC function and Step function
I would like to figure out the equations for calculating "AIC" in both "step() function" and "AIC () function". They are different. Then I just type "step" in the R console, and found the "AIC" used in "step() function" is "extractAIC". I went to the R help, and found: "The criterion used is AIC = - 2*log L + k * edf, where L is the likelihood and edf the equivalent degrees of freedom (i.e., the number of free parameters for usual parametric models) of fit. For linear models with unknown scale (i.e., for lm and aov), -2log L is computed from the deviance and uses a different additive constant to logLik and hence AIC. If RSS denotes the (weighted) residual sum of squares then extractAIC uses for - 2log L the formulae RSS/s - n (corresponding to Mallows' Cp) in the case of known scale s and n log (RSS/n) for unknown scale. AIC only handles unknown scale and uses the formula n log (RSS/n) - n + n log 2π - sum log w where w are the weights." Now, my question is what code I should use to look at the exact calculation process in the AIC()function and extractAIC() function in R? Thanks! Dana -- View this message in context: http://www.nabble.com/AIC-function-and-Step-function-tp20728043p20728043.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] AIC function and Step function
ues ## for loglik edf <- length(fit$coef) loglik <- fit$loglik[length(fit$loglik)] c(edf, -2 * loglik + k * edf) } extractAIC.survreg <- function(fit, scale, k = 2, ...) { edf <- sum(fit$df) c(edf, -2 * fit$loglik[2] + k * edf) } extractAIC.glm <- function(fit, scale = 0, k = 2, ...) { n <- length(fit$residuals) edf <- n - fit$df.residual aic <- fit$aic c(edf, aic + (k-2) * edf) } extractAIC.lm <- function(fit, scale = 0, k = 2, ...) { n <- length(fit$residuals) edf <- n - fit$df.residual RSS <- deviance.lm(fit) dev <- if(scale > 0) RSS/scale - n else n * log(RSS/n) c(edf, dev + k * edf) } extractAIC.aov <- extractAIC.lm extractAIC.negbin <- function(fit, scale, k = 2, ...) { n <- length(fit$residuals) edf <- n - fit$df.residual c(edf, -fit$twologlik + k * edf) } HTH Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: smckinney +at+ bccrc +dot+ ca tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada -Original Message- From: [EMAIL PROTECTED] on behalf of Dana77 Sent: Thu 11/27/2008 6:11 PM To: r-help@r-project.org Subject: [R] AIC function and Step function I would like to figure out the equations for calculating "AIC" in both "step() function" and "AIC () function". They are different. Then I just type "step" in the R console, and found the "AIC" used in "step() function" is "extractAIC". I went to the R help, and found: "The criterion used is AIC = - 2*log L + k * edf, where L is the likelihood and edf the equivalent degrees of freedom (i.e., the number of free parameters for usual parametric models) of fit. For linear models with unknown scale (i.e., for lm and aov), -2log L is computed from the deviance and uses a different additive constant to logLik and hence AIC. If RSS denotes the (weighted) residual sum of squares then extractAIC uses for - 2log L the formulae RSS/s - n (corresponding to Mallows' Cp) in the case of known scale s and n log (RSS/n) for unknown scale. AIC only handles unknown scale and uses the formula n log (RSS/n) - n + n log 2? - sum log w where w are the weights." Now, my question is what code I should use to look at the exact calculation process in the AIC()function and extractAIC() function in R? Thanks! Dana -- View this message in context: http://www.nabble.com/AIC-function-and-Step-function-tp20728043p20728043.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] AIC function and Step function
Hi Dana, Many thanks to Christos Hatzis who sent me an offline response, pointing out the new functions that make this much easier than my last suggestions: methods() and getAnywhere() > methods("extractAIC") [1] extractAIC.aov* extractAIC.coxph* extractAIC.glm* extractAIC.lm* extractAIC.negbin* [6] extractAIC.survreg* Non-visible functions are asterisked > getAnywhere("extractAIC.coxph") A single object matching extractAIC.coxph was found It was found in the following places registered S3 method for extractAIC from namespace stats namespace:stats with value function (fit, scale, k = 2, ...) { edf <- length(fit$coef) loglik <- fit$loglik[length(fit$loglik)] c(edf, -2 * loglik + k * edf) } > Thank you Christos. That said, one of the advantages of getting the source code is that it has comments that are stripped out when the code is sourced into R e.g. from the command line > getAnywhere(AIC.default) A single object matching AIC.default was found It was found in the following places registered S3 method for AIC from namespace stats namespace:stats with value function (object, ..., k = 2) { ll <- if ("stats4" %in% loadedNamespaces()) stats4:::logLik else logLik if (length(list(...))) { object <- list(object, ...) val <- lapply(object, ll) val <- as.data.frame(t(sapply(val, function(el) c(attr(el, "df"), AIC(el, k = k) names(val) <- c("df", "AIC") Call <- match.call() Call$k <- NULL row.names(val) <- as.character(Call[-1]) val } else AIC(ll(object), k = k) } >From the source file # File src/library/stats/R/AIC.R # Part of the R package, http://www.R-project.org # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # A copy of the GNU General Public License is available at # http://www.r-project.org/Licenses/ Return the object's value of the Akaike Information Criterion (or "An Inf.. Crit..") AIC <- function(object, ..., k = 2) UseMethod("AIC") ## AIC for logLik objects AIC.logLik <- function(object, ..., k = 2) -2 * c(object) + k * attr(object, "df") AIC.default <- function(object, ..., k = 2) { ## AIC for various fitted objects --- any for which there's a logLik() method: ll <- if("stats4" %in% loadedNamespaces()) stats4:::logLik else logLik if(length(list(...))) {# several objects: produce data.frame object <- list(object, ...) val <- lapply(object, ll) val <- as.data.frame(t(sapply(val, function(el) c(attr(el, "df"), AIC(el, k = k) names(val) <- c("df", "AIC") Call <- match.call() Call$k <- NULL row.names(val) <- as.character(Call[-1]) val } else AIC(ll(object), k = k) } Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: smckinney +at+ bccrc +dot+ ca tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] AIC function and Step function
Thanks for kind help from Steven and Christos last time. Now I got new problem regarding the codes for calculating the "weights" (w) in "AIC () function". The original code is as below: > getAnywhere("logLik.lm") function (object, REML = FALSE, ...) { res <- object$residuals p <- object$rank N <- length(res) if (is.null(w <- object$weights)) { w <- rep.int(1, N) }else { excl <- w == 0 if (any(excl)) { res <- res[!excl] N <- length(res) w <- w[!excl] } } Now my question is, if I use "lm()" function to fit a multiple linear regression model, such as "mod.fit<-lm(formula = Y~ X1 + X2 + X3, data = set1)", what code could I use to extract the "weights" (w) out? or how to calculate the weights(w) shown in above codes? Thanks for your time and kind help! Dana Steven McKinney wrote: > > Hi Dana, > > Many thanks to Christos Hatzis who sent > me an offline response, pointing out the > new functions that make this much > easier than my last suggestions: > methods() and getAnywhere() > >> methods("extractAIC") > [1] extractAIC.aov* extractAIC.coxph* extractAIC.glm* > extractAIC.lm* extractAIC.negbin* > [6] extractAIC.survreg* > >Non-visible functions are asterisked >> getAnywhere("extractAIC.coxph") > A single object matching extractAIC.coxph was found > It was found in the following places > registered S3 method for extractAIC from namespace stats > namespace:stats > with value > > function (fit, scale, k = 2, ...) > { > edf <- length(fit$coef) > loglik <- fit$loglik[length(fit$loglik)] > c(edf, -2 * loglik + k * edf) > } > >> > > Thank you Christos. > > > That said, one of the advantages of getting > the source code is that it has comments that > are stripped out when the code is sourced into R > > e.g. from the command line > >> getAnywhere(AIC.default) > A single object matching AIC.default was found > It was found in the following places > registered S3 method for AIC from namespace stats > namespace:stats > with value > > function (object, ..., k = 2) > { > ll <- if ("stats4" %in% loadedNamespaces()) > stats4:::logLik > else logLik > if (length(list(...))) { > object <- list(object, ...) > val <- lapply(object, ll) > val <- as.data.frame(t(sapply(val, function(el) c(attr(el, > "df"), AIC(el, k = k) > names(val) <- c("df", "AIC") > Call <- match.call() > Call$k <- NULL > row.names(val) <- as.character(Call[-1]) > val > } > else AIC(ll(object), k = k) > } > > >>From the source file > > > # File src/library/stats/R/AIC.R > # Part of the R package, http://www.R-project.org > # > # This program is free software; you can redistribute it and/or modify > # it under the terms of the GNU General Public License as published by > # the Free Software Foundation; either version 2 of the License, or > # (at your option) any later version. > # > # This program is distributed in the hope that it will be useful, > # but WITHOUT ANY WARRANTY; without even the implied warranty of > # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > # GNU General Public License for more details. > # > # A copy of the GNU General Public License is available at > # http://www.r-project.org/Licenses/ > > Return the object's value of the Akaike Information Criterion > (or "An Inf.. Crit..") > > AIC <- function(object, ..., k = 2) UseMethod("AIC") > > ## AIC for logLik objects > AIC.logLik <- function(object, ..., k = 2) > -2 * c(object) + k * attr(object, "df") > > AIC.default <- function(object, ..., k = 2) > { > ## AIC for various fitted objects --- any for which there's a logLik() > method: > ll <- if("stats4" %in% loadedNamespaces()) stats4:::logLik else logLik > if(length(list(...))) {# several objects: produce data.frame > object <- list(object, ...) > val <- lapply(object, ll) > val <- as.data.frame(t(sapply(val, > function(el) > c(attr(el, "df"), AIC(el, k = k) > names(val) <- c("df", "AIC") > Call <- match.call() > Call$k <- NULL > row.names(val) <- as.character(Call[-1]) > val > } else AIC(ll(object), k = k) > } > > > > Steven McKinney > > Statistician > Molecular Oncology and Breast Cancer Program > British Columbia Cancer Research Centre > > email: smckinney +at+ bccrc +dot+ ca > > tel: 604-675-8000 x7561 > > BCCRC > Molecular Oncology > 675 West 10th Ave, Floor 4 > Vancouver B.C. > V5Z 1L3 > Canada > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducib
Re: [R] AIC function and Step function
On Sun, Nov 30, 2008 at 5:05 PM, Dana77 <[EMAIL PROTECTED]> wrote: > > Thanks for kind help from Steven and Christos last time. Now I got new > problem regarding the codes for calculating the "weights" (w) in "AIC () > function". > The original code is as below: > > getAnywhere("logLik.lm") > function (object, REML = FALSE, ...) > { >res <- object$residuals >p <- object$rank >N <- length(res) >if (is.null(w <- object$weights)) { >w <- rep.int(1, N) >}else { >excl <- w == 0 >if (any(excl)) { >res <- res[!excl] >N <- length(res) >w <- w[!excl] >} >} > > Now my question is, if I use "lm()" function to fit a multiple linear > regression model, such as "mod.fit<-lm(formula = Y~ X1 + X2 + X3, data = > set1)", what code could I use to extract the "weights" (w) out? or how to > calculate the weights(w) shown in above codes? mod.fit won't have weights because you didn't specify any through the weights argument to lm. If you had, you could extract them using the same technique used in the above code: w <- mod.fit$weights hth, Kingsford Jones > Thanks for your time and kind > help! > > Dana > > > > Steven McKinney wrote: >> >> Hi Dana, >> >> Many thanks to Christos Hatzis who sent >> me an offline response, pointing out the >> new functions that make this much >> easier than my last suggestions: >> methods() and getAnywhere() >> >>> methods("extractAIC") >> [1] extractAIC.aov* extractAIC.coxph* extractAIC.glm* >> extractAIC.lm* extractAIC.negbin* >> [6] extractAIC.survreg* >> >>Non-visible functions are asterisked >>> getAnywhere("extractAIC.coxph") >> A single object matching 'extractAIC.coxph' was found >> It was found in the following places >> registered S3 method for extractAIC from namespace stats >> namespace:stats >> with value >> >> function (fit, scale, k = 2, ...) >> { >> edf <- length(fit$coef) >> loglik <- fit$loglik[length(fit$loglik)] >> c(edf, -2 * loglik + k * edf) >> } >> >>> >> >> Thank you Christos. >> >> >> That said, one of the advantages of getting >> the source code is that it has comments that >> are stripped out when the code is sourced into R >> >> e.g. from the command line >> >>> getAnywhere(AIC.default) >> A single object matching 'AIC.default' was found >> It was found in the following places >> registered S3 method for AIC from namespace stats >> namespace:stats >> with value >> >> function (object, ..., k = 2) >> { >> ll <- if ("stats4" %in% loadedNamespaces()) >> stats4:::logLik >> else logLik >> if (length(list(...))) { >> object <- list(object, ...) >> val <- lapply(object, ll) >> val <- as.data.frame(t(sapply(val, function(el) c(attr(el, >> "df"), AIC(el, k = k) >> names(val) <- c("df", "AIC") >> Call <- match.call() >> Call$k <- NULL >> row.names(val) <- as.character(Call[-1]) >> val >> } >> else AIC(ll(object), k = k) >> } >> >> >>>From the source file >> >> >> # File src/library/stats/R/AIC.R >> # Part of the R package, http://www.R-project.org >> # >> # This program is free software; you can redistribute it and/or modify >> # it under the terms of the GNU General Public License as published by >> # the Free Software Foundation; either version 2 of the License, or >> # (at your option) any later version. >> # >> # This program is distributed in the hope that it will be useful, >> # but WITHOUT ANY WARRANTY; without even the implied warranty of >> # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> # GNU General Public License for more details. >> # >> # A copy of the GNU General Public License is available at >> # http://www.r-project.org/Licenses/ >> >> Return the object's value of the Akaike Information Criterion >> (or "An Inf.. Crit..") >> >> AIC <- function(object, ..., k = 2) UseMethod("AIC") >> >> ## AIC for logLik objects >> AIC.logLik <- function(object, ..., k = 2) >> -2 * c(object) + k * attr(object, "df") >> >> AIC.default <- function(object, ..., k = 2) >> { >> ## AIC for various fitted objects --- any for which there's a logLik() >> method: >> ll <- if("stats4" %in% loadedNamespaces()) stats4:::logLik else logLik >> if(length(list(...))) {# several objects: produce data.frame >> object <- list(object, ...) >> val <- lapply(object, ll) >> val <- as.data.frame(t(sapply(val, >> function(el) >> c(attr(el, "df"), AIC(el, k = k) >> names(val) <- c("df", "AIC") >> Call <- match.call() >> Call$k <- NULL >> row.names(val) <- as.character(Call[-1]) >> val >> } else AIC(ll(object), k = k) >> } >> >> >> >> Steven McKinney >> >> Statistician >> Molecular Oncology and Breast Cancer Program >> British Columbia Cancer Research Centre >> >> email
Re: [R] AIC function and Step function
Thank you, Kingsford. Then I am wondering if there are other ways to write R codes to calculate the "weights" ? Thanks! Dana Kingsford Jones wrote: > > On Sun, Nov 30, 2008 at 5:05 PM, Dana77 <[EMAIL PROTECTED]> wrote: >> >> Thanks for kind help from Steven and Christos last time. Now I got new >> problem regarding the codes for calculating the "weights" (w) in "AIC () >> function". >> The original code is as below: >> > getAnywhere("logLik.lm") >> function (object, REML = FALSE, ...) >> { >>res <- object$residuals >>p <- object$rank >>N <- length(res) >>if (is.null(w <- object$weights)) { >>w <- rep.int(1, N) >>}else { >>excl <- w == 0 >>if (any(excl)) { >>res <- res[!excl] >>N <- length(res) >>w <- w[!excl] >>} >>} >> >> Now my question is, if I use "lm()" function to fit a multiple linear >> regression model, such as "mod.fit<-lm(formula = Y~ X1 + X2 + X3, data = >> set1)", what code could I use to extract the "weights" (w) out? or how to >> calculate the weights(w) shown in above codes? > > > mod.fit won't have weights because you didn't specify any through the > weights argument to lm. If you had, you could extract them using the > same technique used in the above code: w <- mod.fit$weights > > hth, > > Kingsford Jones > > > > > > >> Thanks for your time and kind >> help! >> >> Dana >> >> >> >> Steven McKinney wrote: >>> >>> Hi Dana, >>> >>> Many thanks to Christos Hatzis who sent >>> me an offline response, pointing out the >>> new functions that make this much >>> easier than my last suggestions: >>> methods() and getAnywhere() >>> methods("extractAIC") >>> [1] extractAIC.aov* extractAIC.coxph* extractAIC.glm* >>> extractAIC.lm* extractAIC.negbin* >>> [6] extractAIC.survreg* >>> >>>Non-visible functions are asterisked getAnywhere("extractAIC.coxph") >>> A single object matching 'extractAIC.coxph' was found >>> It was found in the following places >>> registered S3 method for extractAIC from namespace stats >>> namespace:stats >>> with value >>> >>> function (fit, scale, k = 2, ...) >>> { >>> edf <- length(fit$coef) >>> loglik <- fit$loglik[length(fit$loglik)] >>> c(edf, -2 * loglik + k * edf) >>> } >>> >>> >>> Thank you Christos. >>> >>> >>> That said, one of the advantages of getting >>> the source code is that it has comments that >>> are stripped out when the code is sourced into R >>> >>> e.g. from the command line >>> getAnywhere(AIC.default) >>> A single object matching 'AIC.default' was found >>> It was found in the following places >>> registered S3 method for AIC from namespace stats >>> namespace:stats >>> with value >>> >>> function (object, ..., k = 2) >>> { >>> ll <- if ("stats4" %in% loadedNamespaces()) >>> stats4:::logLik >>> else logLik >>> if (length(list(...))) { >>> object <- list(object, ...) >>> val <- lapply(object, ll) >>> val <- as.data.frame(t(sapply(val, function(el) c(attr(el, >>> "df"), AIC(el, k = k) >>> names(val) <- c("df", "AIC") >>> Call <- match.call() >>> Call$k <- NULL >>> row.names(val) <- as.character(Call[-1]) >>> val >>> } >>> else AIC(ll(object), k = k) >>> } >>> >>> From the source file >>> >>> >>> # File src/library/stats/R/AIC.R >>> # Part of the R package, http://www.R-project.org >>> # >>> # This program is free software; you can redistribute it and/or modify >>> # it under the terms of the GNU General Public License as published by >>> # the Free Software Foundation; either version 2 of the License, or >>> # (at your option) any later version. >>> # >>> # This program is distributed in the hope that it will be useful, >>> # but WITHOUT ANY WARRANTY; without even the implied warranty of >>> # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >>> # GNU General Public License for more details. >>> # >>> # A copy of the GNU General Public License is available at >>> # http://www.r-project.org/Licenses/ >>> >>> Return the object's value of the Akaike Information Criterion >>> (or "An Inf.. Crit..") >>> >>> AIC <- function(object, ..., k = 2) UseMethod("AIC") >>> >>> ## AIC for logLik objects >>> AIC.logLik <- function(object, ..., k = 2) >>> -2 * c(object) + k * attr(object, "df") >>> >>> AIC.default <- function(object, ..., k = 2) >>> { >>> ## AIC for various fitted objects --- any for which there's a >>> logLik() >>> method: >>> ll <- if("stats4" %in% loadedNamespaces()) stats4:::logLik else >>> logLik >>> if(length(list(...))) {# several objects: produce data.frame >>> object <- list(object, ...) >>> val <- lapply(object, ll) >>> val <- as.data.frame(t(sapply(val, >>> function(el) >>> c(attr(el, "df"), AIC(el, k = k) >>> n