Re: [R] regression with proportion data
Your response variable is not binomial, it's a proportion. Try the betareg function in the betareg package, which more correctly assumes that your response variable is Beta distributed (but beware that 0 and 1 are not allowed). The syntax is the same as in a glm.

HTH
Ruben

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Georgiana May
Sent: Monday, 19 March 2012 15:06
To: r-help@r-project.org
Subject: [R] regression with proportion data

Hello, I want to determine the regression relationship between a proportion (y) and a continuous variable (x). Reading a number of sources (e.g. The R Book, Quick-R, help), I believe I should be able to specify the model as:

model <- glm(formula = proportion ~ x, family = binomial(link = "logit"))

This runs but gives me a warning:

Warning message: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

If I transform the proportion variable with log, it doesn't like that either (values not > 0).

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
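A minimal sketch of both routes on simulated data (all object names hypothetical): the base-R quasi-binomial fit tolerates non-integer proportions, and betareg is left in comments since it needs the package installed and the response strictly inside (0, 1):

```r
# Simulated proportion data (hypothetical): y strictly in (0, 1)
set.seed(1)
x <- runif(50)
y <- plogis(-1 + 2 * x + rnorm(50, sd = 0.3))

# Base R: a quasi-binomial GLM accepts non-integer proportions without
# the "non-integer #successes" warning
fit_qb <- glm(y ~ x, family = quasibinomial(link = "logit"))
coef(fit_qb)

# Beta regression (requires the betareg package; exact 0 and 1 not allowed):
# library(betareg)
# fit_beta <- betareg(y ~ x)
```

If counts of successes and trials are available, the usual weighted binomial GLM remains the more direct choice.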
Re: [R] Standard errors GLM
You have a conceptual problem, as pointed out by previous helpers. You don't have a standard error for the first level of your categorical variable because that level's effect is not estimated: it is used as a reference level against which the other levels of that categorical variable are estimated (the default in R). This is one way in which statisticians include categorical predictors in the regression framework, which was originally meant for relations between continuous quantitative variables. You might want to read about regression, factors, and contrasts. This paper about the issue is available online: M.J. Davis, 2010. Contrast coding in multiple regression analysis: strengths, weaknesses and utility of popular coding structures. Journal of Data Science 8:61-73.

HTH
Ruben

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of D_Tomas
Sent: Tuesday, 13 March 2012 14:39
To: r-help@r-project.org
Subject: [R] Standard errors GLM

Dear useRs, when the summary function is applied to a glm fit (e.g. Poisson), the parameter table reports the categorical variables assuming that the first level's estimate (in alphabetical order) is 0. What is the standard error for that level then? Are the standard errors calculated assuming a normal distribution? Many thanks,

--
View this message in context: http://r.789695.n4.nabble.com/Standard-errors-GLM-tp4469086p4469086.html
Sent from the R help mailing list archive at Nabble.com.
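The point about the reference level can be seen in a small sketch (hypothetical data): the first factor level is absorbed into the intercept, so summary() reports no row, and hence no standard error, for it:

```r
# Poisson GLM with a 3-level factor; level "a" (first alphabetically)
# becomes the reference and is absorbed into the (Intercept)
set.seed(2)
f <- factor(rep(c("a", "b", "c"), each = 20))
y <- rpois(60, lambda = c(a = 2, b = 4, c = 6)[as.character(f)])
fit <- glm(y ~ f, family = poisson)

rownames(coef(summary(fit)))  # "(Intercept)" "fb" "fc": no separate "fa" row

# The reference level can be changed, e.g. to "b", with relevel():
fit2 <- glm(y ~ relevel(f, ref = "b"), family = poisson)
```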
Re: [R] Calculation of standard error for a function
The deltamethod function in package msm may help (but bear in mind the warnings/admonitions/recommendations of other helpers).

HTH
Rubén
--
Rubén H. Roa-Ureta, Ph. D.
AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN

-----Original Message-----
From: r-help-boun...@r-project.org on behalf of Jun Shen
Sent: Fri 3/2/2012 10:47 PM
To: R-help
Subject: [R] Calculation of standard error for a function

Dear list, if I know the standard errors for k1 and k2, is there anything I can call in R to calculate the standard error of k1/k2? Thanks. Jun
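For the ratio k1/k2 the delta method gives Var(k1/k2) ≈ (se1/k2)^2 + (k1*se2/k2^2)^2 when the two estimates are independent. A base-R sketch with hypothetical numbers, and the equivalent msm call in a comment:

```r
# Hypothetical estimates and their standard errors
k1 <- 10; se1 <- 1
k2 <- 4;  se2 <- 0.5

ratio <- k1 / k2
# Delta-method SE of k1/k2, assuming cov(k1, k2) = 0
se_ratio <- sqrt((se1 / k2)^2 + (k1 * se2 / k2^2)^2)
c(ratio = ratio, se = se_ratio)

# With the msm package this is:
# msm::deltamethod(~ x1 / x2, mean = c(k1, k2), cov = diag(c(se1, se2)^2))
```

If the two estimates are correlated, the covariance term must be added to the variance (or put in the cov matrix passed to deltamethod).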
Re: [R] Constraint on one of parameters.
Read optimx's help. There are 'method', 'upper', and 'lower' arguments that will let you put bounds on parameters.

HTH
Rubén

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of FU-WEN LIANG
Sent: Thursday, 09 February 2012 23:56
To: r-help@r-project.org
Subject: [R] Constraint on one of parameters.

Dear all, I have a function to optimize for a set of parameters and want to set a constraint on only one parameter. What I want to do is estimate the parameters of a bivariate normal distribution where the correlation has to be between -1 and 1. Here is my function. Would you please advise how to revise it?

ex = function(s, prob, theta1, theta, xa, xb, xc, xd, t, delta) {
  expo1 = exp(s[3]*xa+s[4]*xb+s[5]*xc+s[6]*xd)
  expo2 = exp(s[9]*xa+s[10]*xb+s[11]*xc+s[12]*xd)
  expo3 = exp(s[15]*xa+s[16]*xb+s[17]*xc+s[18]*xd)
  expo4 = exp(s[21]*xa+s[22]*xb+s[23]*xc+s[24]*xd)
  expo5 = exp(s[27]*xa+s[28]*xb+s[29]*xc+s[30]*xd)
  nume1 = prob[1]*(s[2]^-s[1]*s[1]*t^(s[1]-1)*expo1)^delta*exp(-s[2]^-s[1]*t^s[1]*expo1)*
    theta1[1]^xa*(1-theta1[1])^(1-xa)*theta1[2]^xb*(1-theta1[2])^(1-xb)*
    (1+theta1[11]*(xa-theta1[1])*(xb-theta1[2])/sqrt(theta1[1]*(1-theta1[1]))/sqrt(theta1[2]*(1-theta1[2])))/
    (2*pi*theta[2]*theta[4]*sqrt(1-theta[21]^2))*exp(-2*(1-theta[21]^2))^(-1)*
    ((xc-theta[1])^2/theta[2]^2+(xd-theta[3])^2/theta[4]^2-2*theta[21]^2*(xc-theta[1])*(xd-theta[3])/(theta[2]*theta[4]))
  nume2 = prob[2]*(s[8]^-s[7]*s[7]*t^(s[7]-1)*expo2)^delta*exp(-s[8]^-s[7]*t^s[7]*expo2)*
    theta1[3]^xa*(1-theta1[3])^(1-xa)*theta1[4]^xb*(1-theta1[4])^(1-xb)*
    (1+theta1[11]*(xa-theta1[3])*(xb-theta1[4])/sqrt(theta1[3]*(1-theta1[3]))/sqrt(theta1[4]*(1-theta1[4])))/
    (2*pi*theta[6]*theta[8]*sqrt(1-theta[21]^2))*exp(-2*(1-theta[21]^2))^(-1)*
    ((xc-theta[5])^2/theta[6]^2+(xd-theta[7])^2/theta[8]^2-2*theta[21]^2*(xc-theta[5])*(xd-theta[7])/(theta[6]*theta[8]))
  nume3 = prob[3]*(s[14]^-s[13]*s[13]*t^(s[13]-1)*expo3)^delta*exp(-s[14]^-s[13]*t^s[13]*expo3)*
    theta1[5]^xa*(1-theta1[5])^(1-xa)*theta1[6]^xb*(1-theta1[6])^(1-xb)*
    (1+theta1[11]*(xa-theta1[5])*(xb-theta1[6])/sqrt(theta1[5]*(1-theta1[5]))/sqrt(theta1[6]*(1-theta1[6])))/
    (2*pi*theta[10]*theta[12]*sqrt(1-theta[21]^2))*exp(-2*(1-theta[21]^2))^(-1)*
    ((xc-theta[9])^2/theta[10]^2+(xd-theta[11])^2/theta[12]^2-2*theta[21]^2*(xc-theta[9])*(xd-theta[11])/(theta[10]*theta[12]))
  nume4 = prob[4]*(s[20]^-s[19]*s[19]*t^(s[19]-1)*expo4)^delta*exp(-s[20]^-s[19]*t^s[19]*expo4)*
    theta1[7]^xa*(1-theta1[7])^(1-xa)*theta1[8]^xb*(1-theta1[8])^(1-xb)*
    (1+theta1[11]*(xa-theta1[7])*(xb-theta1[8])/sqrt(theta1[7]*(1-theta1[7]))/sqrt(theta1[8]*(1-theta1[8])))/
    (2*pi*theta[14]*theta[16]*sqrt(1-theta[21]^2))*exp(-2*(1-theta[21]^2))^(-1)*
    ((xc-theta[13])^2/theta[14]^2+(xd-theta[15])^2/theta[16]^2-2*theta[21]^2*(xc-theta[13])*(xd-theta[15])/(theta[14]*theta[16]))
  nume5 = prob[5]*(s[26]^-s[25]*s[25]*t^(s[25]-1)*expo5)^delta*exp(-s[26]^-s[25]*t^s[25]*expo5)*
    theta1[9]^xa*(1-theta1[9])^(1-xa)*theta1[10]^xb*(1-theta1[10])^(1-xb)*
    (1+theta1[11]*(xa-theta1[9])*(xb-theta1[10])/sqrt(theta1[9]*(1-theta1[9]))/sqrt(theta1[10]*(1-theta1[10])))/
    (2*pi*theta[18]*theta[20]*sqrt(1-theta[21]^2))*exp(-2*(1-theta[21]^2))^(-1)*
    ((xc-theta[17])^2/theta[18]^2+(xd-theta[19])^2/theta[20]^2-2*theta[21]^2*(xc-theta[17])*(xd-theta[19])/(theta[18]*theta[20]))
  denom = nume1+nume2+nume3+nume4+nume5
  Ep1 = nume1/denom
  Ep2 = nume2/denom
  Ep3 = nume3/denom
  Ep4 = nume4/denom
  Ep5 = nume5/denom
  elogld =
    sum(Ep1*(-log(2*pi*theta[2]*theta[4]*sqrt(1-theta[21]^2))-(2*(1-theta[21]^2))^(-1)*((xc-theta[1])^2/theta[2]^2+(xd-theta[3])^2/theta[4]^2-2*theta[21]^2*(xc-theta[1])*(xd-theta[3])/(theta[2]*theta[4])))) +
    sum(Ep2*(-log(2*pi*theta[6]*theta[8]*sqrt(1-theta[21]^2))-(2*(1-theta[21]^2))^(-1)*((xc-theta[5])^2/theta[6]^2+(xd-theta[7])^2/theta[8]^2-2*theta[21]^2*(xc-theta[5])*(xd-theta[7])/(theta[6]*theta[8])))) +
    sum(Ep3*(-log(2*pi*theta[10]*theta[12]*sqrt(1-theta[21]^2))-(2*(1-theta[21]^2))^(-1)*((xc-theta[9])^2/theta[10]^2+(xd-theta[11])^2/theta[12]^2-2*theta[21]^2*(xc-theta[9])*(xd-theta[11])/(theta[10]*theta[12])))) +
    sum(Ep4*(-log(2*pi*theta[14]*theta[16]*sqrt(1-theta[21]^2))-(2*(1-theta[21]^2))^(-1)*((xc-theta[13])^2/theta[14]^2+(xd-theta[15])^2/theta[16]^2-2*theta[21]^2*(xc-theta[13])*(xd-theta[15])/(theta[14]*theta[16])))) +
    sum(Ep5*(-log(2*pi*theta[18]*theta[20]*sqrt(1-theta[21]^2))-(2*(1-theta[21]^2))^(-1)*((xc-theta[17])^2/theta[18]^2+(xd-theta[19])^2/theta[20]^2-2*theta[21]^2*(xc-theta[17])*(xd-theta[19])/(theta[18]*theta[20]))))
  return(-elogld)
}

normal = optimx(c(15.5,0.4,10,1.3,17.5,0.3,15,1.5,13.5,0.5,19.5,1.1,20.7,0.4,30,0.4,25,0.7,24.6,1.6,0.5), ex, s=s, prob=prob, theta1=theta1, xa=xa, xb=xb, xc=xc, xd=xd, t=t, delta=delta)

When I run this code I get: In log(2 * pi * theta[6] * theta[8] * sqrt(1 - theta[21]^2)) : NaNs produced, because (1-theta[21]^2) < 0. Would you please advise? Thank you very much.
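The bounds mechanism the reply points to can be seen on a toy objective. Base-R optim() takes the same method/lower/upper arguments as optimx, so this sketch (toy quadratic, hypothetical bounds on the second component, which plays the role of the correlation theta[21]) carries over to optimx unchanged:

```r
# Toy objective (hypothetical): unconstrained minimum at p = c(2, 2)
negll <- function(p) (p[1] - 2)^2 + (p[2] - 2)^2

# Bound only the second parameter to [-0.99, 0.99], as one would do for
# a correlation; the first parameter stays unbounded
res <- optim(c(0, 0), negll, method = "L-BFGS-B",
             lower = c(-Inf, -0.99), upper = c(Inf, 0.99))
res$par  # second component is clamped at the upper bound 0.99
```

An alternative that needs no bounds at all is to optimize an unconstrained parameter z and use tanh(z), which always lies in (-1, 1), as the correlation inside the objective.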
Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Frank Harrell
Sent: Friday, 27 January 2012 14:28
To: r-help@r-project.org
Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Ruben, I'm not sure you are understanding the ramifications of what Bert said. In addition you are implicitly making several assumptions:

Ruben: Frank, I guess we are going nowhere now. But thanks anyway. See below if you want.

1. model selection is needed (vs. fitting the full model and using shrinkage)

Ruben: Nonlinear mechanistic models that are too complex often just don't converge; they crash. There is no shrinkage to apply to a model that failed to converge.

2. model selection works in the absence of shrinkage

Ruben: I think you are assuming that shrinkage is necessary.

3. model selection can find the "right" model, and the features selected would be the same features selected if the data were slightly perturbed or a new sample taken

Ruben: I don't make this assumption. New data, new model.

4. AIC tells you something that P-values don't (unless using structured multiple-degree-of-freedom tests)

Ruben: It does.

5. parsimony is good

Ruben: It is.

None of these assumptions is true. Model selection without shrinkage (penalization) seems to offer benefits but this is largely a mirage.

Ruben: Have a good weekend!
Ruben

Rubén Roa wrote
> -----Original Message-----
> From: Bert Gunter [mailto:gunter.berton@]
> Sent: Thursday, 26 January 2012 21:20
> To: Rubén Roa
> CC: Ben Bolker; Frank Harrell
> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
>
> On Wed, Jan 25, 2012 at 11:39 PM, Rubén Roa <rroa@> wrote:
>> I think we have gone through this before.
>> First, the destruction of all aspects of statistical inference is not at stake, Frank Harrell's post notwithstanding.
>> Second, checking all pairs is a way to see for _all pairs_ which model outcompetes which in terms of predictive ability by 2 AIC units or more. Just sorting them by AIC does not give you that when successive models differ by less than 2 AIC units.
>> Third, I was not implying that AIC differences play the role of significance tests. I agree with you that model selection is better not understood as a proxy for or relative of significance-testing procedures.
>> Incidentally, when comparing many models the AIC is often inconclusive. If one is bent on selecting just _the_ model, then I check numerical optimization diagnostics such as the size of gradients, KKT criteria, and other issues such as standard errors of parameter estimates and the correlation matrix of parameter estimates.
>
> -- And the mathematical basis for this claim is ??
>
> --
> Ruben: In my area of work (building/testing/applying mechanistic nonlinear models of natural and economic phenomena) the issue of numerical optimization is a very serious one. It is often the case that a really good-looking model does not converge properly (that's why ADMB is so popular among us). So if the AIC is inconclusive, but one AIC-tied model yields reasonable-looking standard errors and low pairwise correlations between parameter estimates, while the other wasn't even able to produce a positive definite Hessian matrix (though it was able to maximize the log-likelihood), I think I have good reasons to select the properly converged model. I assume here that the lack of convergence is a symptom of a model inadequacy to the data that the AIC was not able to detect.
> ---
> Ruben: For some reason I don't find model averaging appealing. I guess deep in my heart I expect more from my model than just the best predictive ability.
>
> -- This is a religious, not a scientific statement, and has no place in any scientific discussion.
>
> --
> Ruben: Seriously, there is a wide and open place in scientific discussion for mechanistic model-building. I expect the models I build to be more than able predictors; I want them to capture some aspect of nature, to teach me something about nature, so I refrain from model averaging, which is an open admission that you don't care too much about what's really going on.
>
> -- The belief that certain data analysis practices -- standard or not -- somehow can be trusted to yield reliable scientific results in the face of clear theoretical (mathematical) and practical results to the contrary, while widespread, impedes and often thwarts the progress of science. Evidence-based medicine and clinical trials came about for a reason.
Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
-----Original Message-----
From: Bert Gunter [mailto:gunter.ber...@gene.com]
Sent: Thursday, 26 January 2012 21:20
To: Rubén Roa
CC: Ben Bolker; Frank Harrell
Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

On Wed, Jan 25, 2012 at 11:39 PM, Rubén Roa wrote:
> I think we have gone through this before.
> First, the destruction of all aspects of statistical inference is not at stake, Frank Harrell's post notwithstanding.
> Second, checking all pairs is a way to see for _all pairs_ which model outcompetes which in terms of predictive ability by 2 AIC units or more. Just sorting them by AIC does not give you that when successive models differ by less than 2 AIC units.
> Third, I was not implying that AIC differences play the role of significance tests. I agree with you that model selection is better not understood as a proxy for or relative of significance-testing procedures.
> Incidentally, when comparing many models the AIC is often inconclusive. If one is bent on selecting just _the_ model, then I check numerical optimization diagnostics such as the size of gradients, KKT criteria, and other issues such as standard errors of parameter estimates and the correlation matrix of parameter estimates.

-- And the mathematical basis for this claim is ??

--
Ruben: In my area of work (building/testing/applying mechanistic nonlinear models of natural and economic phenomena) the issue of numerical optimization is a very serious one. It is often the case that a really good-looking model does not converge properly (that's why ADMB is so popular among us). So if the AIC is inconclusive, but one AIC-tied model yields reasonable-looking standard errors and low pairwise correlations between parameter estimates, while the other wasn't even able to produce a positive definite Hessian matrix (though it was able to maximize the log-likelihood), I think I have good reasons to select the properly converged model. I assume here that the lack of convergence is a symptom of a model inadequacy to the data that the AIC was not able to detect.
---
Ruben: For some reason I don't find model averaging appealing. I guess deep in my heart I expect more from my model than just the best predictive ability.

-- This is a religious, not a scientific statement, and has no place in any scientific discussion.

--
Ruben: Seriously, there is a wide and open place in scientific discussion for mechanistic model-building. I expect the models I build to be more than able predictors; I want them to capture some aspect of nature, to teach me something about nature, so I refrain from model averaging, which is an open admission that you don't care too much about what's really going on.

-- The belief that certain data analysis practices -- standard or not -- somehow can be trusted to yield reliable scientific results in the face of clear theoretical (mathematical) and practical results to the contrary, while widespread, impedes and often thwarts the progress of science. Evidence-based medicine and clinical trials came about for a reason. I would encourage you to re-examine the basis of your scientific practice and the role that "magical thinking" plays in it.

Best,
Bert

--
Ruben: All right, Bert. I often re-examine the basis of my scientific praxis, but less often than I did before, I have to confess. I like to think it is because I am converging on the right praxis, so there are fewer issues to re-examine. But this problem of model selection is a tough one. Being a likelihoodist in inference naturally leads you to AIC-based model selection, Jim Lindsey being an influence too. Wanting your models to say something about nature is another strong conditioning factor. Put these two together and it becomes hard to do model averaging. And it has nothing to do with religion (yuck!).
Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
I think we have gone through this before. First, the destruction of all aspects of statistical inference is not at stake, Frank Harrell's post notwithstanding. Second, checking all pairs is a way to see for _all pairs_ which model outcompetes which in terms of predictive ability by 2 AIC units or more. Just sorting them by AIC does not give you that when successive models differ by less than 2 AIC units. Third, I was not implying that AIC differences play the role of significance tests. I agree with you that model selection is better not understood as a proxy for or relative of significance-testing procedures. Incidentally, when comparing many models the AIC is often inconclusive. If one is bent on selecting just _the_ model, then I check numerical optimization diagnostics such as the size of gradients, KKT criteria, and other issues such as standard errors of parameter estimates and the correlation matrix of parameter estimates. For some reason I don't find model averaging appealing. I guess deep in my heart I expect more from my model than just the best predictive ability.

Rubén

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Ben Bolker
Sent: Wednesday, 25 January 2012 15:41
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Rubén Roa azti.es> writes:
> A more 'manual' way to do it is this.
> If you have all your models named properly and well organized, say Model1, Model2, ..., Model47, each with a slot holding the AIC (glm produces a list with one component named 'aic'), then something like this:
> aic <- sapply(1:47, function(i) eval(as.name(paste("Model", i, sep = "")))$aic)
> x <- matrix(0, 1081, 3)
> x[, 1:2] <- t(combn(47, 2))
> x[, 3] <- abs(aic[x[, 1]] - aic[x[, 2]])
> will calculate all the 1081 AIC differences between paired comparisons of the 47 models. It may not be pretty but it works for me.
> An AIC difference of 2 or less is a tie; anything larger is evidence for the model with the lower AIC (Sakamoto et al., Akaike Information Criterion Statistics, KTK Scientific Publishers, Tokyo).

I wouldn't go quite as far as Frank Harrell did in his answer, but:

(1) it seems silly to do all pairwise comparisons of models; one of the big advantages of information-theoretic approaches is that you can just list the models in order of AIC ...

(2) the dredge() function from the MuMIn package may be useful for automating this sort of thing. There is also an AICtab function in the emdbook package.

(3) If you're really just interested in picking the single model with the best expected predictive capability (and all of the other assumptions of the AIC approach are met -- very large data set, good fit to the data, etc.), then just picking the model with the best AIC will work. It's when you start to make inferences on the individual parameters within that model, without taking account of the model selection process, that you run into trouble. If you are really concerned about good predictions then it may be better to do multi-model averaging *or* use some form of shrinkage estimator.

(4) The "delta-AIC < 2 means pretty much the same" rule of thumb is fine, but it drives me nuts when people use it as a quasi-significance-testing rule, as in "the simpler model is adequate if dAIC < 2". If you're going to work in the model selection arena, then don't follow hypothesis-testing procedures! A smaller AIC means lower expected K-L distance, period.

For the record, Brian Ripley has often expressed the (minority) opinion that AIC is *not* appropriate for comparing non-nested models (e.g. <http://tolstoy.newcastle.edu.au/R/help/06/02/21794.html>).

> -----Original Message-----
> From: r-help-bounces@r-project.org on behalf of Milan Bouchet-Valat
> Sent: Wed 1/25/2012 10:32 AM
> To: Jhope
> Cc: r-help@r-project.org
> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
> On Tuesday, 24 January 2012 at 20:41 -0800, Jhope wrote:
> > Hi R-listers,
> >
> > I have developed 47 GLM models with different combinations of interactions from 1 variable to 5 variables. I have manually made each model separately and put them into individual tables (organized by the number of variables) showing the AIC score. I want to compare all of these models.
> >
> > 1) What is the best way to compare various models with unique combinations and different numbers of variables?
> See ?step or ?stepAIC (from package MASS) if you want an automated way
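Ben's "just list the models in order of AIC" suggestion in a minimal sketch (toy data and model names hypothetical): sort the fitted models by AIC and report differences from the best, rather than all 1081 pairwise comparisons.

```r
set.seed(3)
d <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30))
mods <- list(m1 = glm(y ~ 1, data = d),
             m2 = glm(y ~ x1, data = d),
             m3 = glm(y ~ x1 + x2, data = d))
aic <- sort(sapply(mods, AIC))
data.frame(AIC = aic, dAIC = aic - aic[1])  # best model in the first row
```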
Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
A more 'manual' way to do it is this. If you have all your models named properly and well organized, say Model1, Model2, ..., Model47, each with a slot holding the AIC (glm produces a list with one component named 'aic'), then something like this:

aic <- sapply(1:47, function(i) eval(as.name(paste("Model", i, sep = "")))$aic)
x <- matrix(0, 1081, 3)
x[, 1:2] <- t(combn(47, 2))
x[, 3] <- abs(aic[x[, 1]] - aic[x[, 2]])

will calculate all the 1081 AIC differences between paired comparisons of the 47 models. It may not be pretty but it works for me. An AIC difference of 2 or less is a tie; anything larger is evidence for the model with the lower AIC (Sakamoto et al., Akaike Information Criterion Statistics, KTK Scientific Publishers, Tokyo).

HTH

Rubén H. Roa-Ureta, Ph. D.
AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN

-----Original Message-----
From: r-help-boun...@r-project.org on behalf of Milan Bouchet-Valat
Sent: Wed 1/25/2012 10:32 AM
To: Jhope
Cc: r-help@r-project.org
Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

On Tuesday, 24 January 2012 at 20:41 -0800, Jhope wrote:
> Hi R-listers,
>
> I have developed 47 GLM models with different combinations of interactions from 1 variable to 5 variables. I have manually made each model separately and put them into individual tables (organized by the number of variables) showing the AIC score. I want to compare all of these models.
>
> 1) What is the best way to compare various models with unique combinations and different numbers of variables?

See ?step or ?stepAIC (from package MASS) if you want an automated way of doing this.

> 2) Ideally I am trying to develop the simplest model. Even though adding another variable would lower the AIC, how do I judge whether it is worth including it in the model? How do I know when to stop?

This is a general statistical question, not specific to R. As a general rule, if adding a variable lowers the AIC by a significant margin, then it's worth including it. You should only stop when a variable increases the AIC. But this is assuming you consider the AIC a good indicator and you know what you're doing. There's plenty of literature on this subject.

> Definitions of Variables:
> HTL - distance to high tide line (continuous)
> Veg - distance to vegetation
> Aeventexhumed - event of exhumation
> Sector - number of measurements along the beach
> Rayos - major sections of beach (grouped sectors)
> TotalEggs - nest egg density
>
> Example of how all models were created:
> Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed, data=data.to.analyze, family=binomial)
> Model7.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family=binomial, data.to.analyze)
> Model21.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs, data.to.analyze, family=binomial)
> Model37.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial)

To extract the AICs of all these models, it's easier to put them in a list and get their AICs like this:

m <- list()
m$model2 <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed, data=data.to.analyze, family=binomial)
m$model3 <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family=binomial, data.to.analyze)
sapply(m, extractAIC)

Cheers
Re: [R] how to get inflection point in binomial glm
René,

Yes, to fit a re-parameterized logistic model I think you'd have to code the whole enchilada yourself, not relying on glm (and not nls() either, since nls() deals with least-squares minimization whereas here we want to minimize a minus log binomial likelihood). I did that and have the re-parameterized logistic model in a package I wrote for a colleague (the logistic fit is fully functional and documented there). My code, though, only considers one continuous predictor. If you want, I can email you this package and you can figure out how to deal with the categorical predictor. One thing I believe at this point is that you'd have to do the inference on the continuous predictor _conditional_ on certain level(s) of the categorical predictor.

Rubén

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of René Mayer
Sent: Thursday, 01 December 2011 20:34
To: David Winsemius
CC: r-help Help
Subject: Re: [R] how to get inflection point in binomial glm

Thanks David and Rubén!

@David: indeed, 15 betas; I forgot the interaction terms, thanks for correcting!

@Rubén: would the re-parameterization be done within nls()? How would one do this practically, including the factor predictor? And yes, within each group we can solve for Y=0:

0 = b0 + b1*X  =>  X = -b0/b1

but I was hoping there might be a more general solution for the case of multiple logistic regression.

René

Zitat von "David Winsemius":
>
> On Dec 1, 2011, at 8:24 AM, René Mayer wrote:
>
>> Dear All,
>>
>> I have a binomial response with one continuous predictor (d) and one factor (g) (8 levels, dummy-coded).
>>
>> glm(resp~d*g, data, family=binomial)
>>
>> Y=b0+b1*X1+b2*X2 ... b7*X7
>
> Dear Dr Mayer;
>
> I think it might be a bit more complex than that. I think you should get 15 betas rather than 8. Have you done it?
>
>> how can I get the inflection point per group, e.g., P(d)=.5
>
> Wouldn't that just be at d=1/beta in each group? (Thinking, perhaps naively, in the case of X=X1 that
>
> Pr[y==1]/(1-Pr[y==1]) = 1 = exp( beta*d*(X==X1) )  # all other terms = 0
>
> And taking the log of both sides, and then using "middle school" math to solve.
>
> Oh, wait. Muffed my first try on that for sure. Need to add back both the constant intercept and the baseline "d" coefficient for the non-b0 levels.
>
> Pr[y==1]/(1-Pr[y==1]) = 1 = exp( beta_0 + beta_d_0*d + beta_n + beta_d_n*d*(X==Xn) )
>
> And just
>
> Pr[y==1]/(1-Pr[y==1]) = 1 = exp( beta_0 + beta_d_0*d )  # for the reference level
>
> This felt like an exam question in my categorical analysis course 25 years ago. (Might have gotten partial credit for my first stab, depending on how forgiving the TA was that night.)
>
>> I would be grateful for any help.
>>
>> Thanks in advance,
>> René
>
> --
> David Winsemius, MD
> West Hartford, CT
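A minimal sketch of the approach offered above, fitting the (d50, d95)-parameterized logistic by direct minimization of the minus log binomial likelihood with optim(), on simulated data (the true values d50 = 5 and d95 = 8 are hypothetical), with standard errors read off the inverted Hessian:

```r
# Reparameterized logistic: P(d) = 1/(1 + exp(log(1/19)*(d - d50)/(d95 - d50)))
set.seed(5)
d <- runif(300, 0, 10)
y <- rbinom(300, 1, 1 / (1 + exp(log(1/19) * (d - 5) / (8 - 5))))

nll <- function(p) {                     # p = c(d50, d95)
  pr <- 1 / (1 + exp(log(1/19) * (d - p[1]) / (p[2] - p[1])))
  -sum(dbinom(y, 1, pr, log = TRUE))
}
fit <- optim(c(4, 9), nll, hessian = TRUE)
fit$par                         # estimates of d50 and d95
sqrt(diag(solve(fit$hessian)))  # their standard errors
```

With a categorical predictor one would, as suggested, repeat this fit within each group, or give each group its own (d50, d95) pair in a single likelihood.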
Re: [R] how to get inflection point in binomial glm
Assuming a logistic model, for each group solve for d at Y=0, or solve for d at p=0.5, where d is your continuous predictor, Y is the logit score, and p is the probability of success in the binomial model. After that you can get the standard error of the inflection point by Taylor series (aka the delta method). Another idea is to re-parameterize the logistic model to include the inflection point explicitly; then you get its estimate and standard error directly as a result of the fit. For example, disregarding the factor predictor g (or working within each group), the logistic model

P(d) = 1/(1 + exp(log(1/19)*(d - d50)/(d95 - d50)))

includes the inflection point directly (d50) and is a re-parameterized version of the usual logistic model

P(d) = 1/(1 + exp(b0 + b1*d))

where the parameters b0 and b1 have been replaced by d50 and d95, the values of the predictor at 50% probability (the inflection point) and at 95% probability, respectively.

HTH
Rubén

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of René Mayer
Sent: Thursday, 01 December 2011 14:25
To: r-help@r-project.org
Subject: [R] how to get inflection point in binomial glm

Dear All, I have a binomial response with one continuous predictor (d) and one factor (g) (8 levels, dummy-coded).

glm(resp~d*g, data, family=binomial)

Y=b0+b1*X1+b2*X2 ... b7*X7

How can I get the inflection point per group, e.g., P(d)=.5? I would be grateful for any help.

Thanks in advance,
René
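The first route (solve for d at p = 0.5, then the delta method for its standard error) in a sketch on simulated data with a single continuous predictor; the true inflection point of 5 is hypothetical:

```r
set.seed(4)
d <- runif(200, 0, 10)
resp <- rbinom(200, 1, plogis(d - 5))   # true inflection point at d = 5
fit <- glm(resp ~ d, family = binomial)
b <- coef(fit)

d50 <- -b[1] / b[2]                     # solve b0 + b1*d = 0

# Delta method: gradient of -b0/b1 with respect to (b0, b1)
g <- c(-1 / b[2], b[1] / b[2]^2)
se_d50 <- sqrt(t(g) %*% vcov(fit) %*% g)
c(d50 = unname(d50), se = as.numeric(se_d50))
```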
Re: [R] RV: Reporting a conflict between ADMB and Rtools on Windows systems
Thanks Gabor and Jan. The batch files solution seems the way to go. Will implement it! Rubén -Original Message- From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] Sent: Thursday, 17 November 2011 13:58 To: Rubén Roa CC: r-help@r-project.org Subject: Re: [R] RV: Reporting a conflict between ADMB and Rtools on Windows systems On Thu, Nov 17, 2011 at 3:54 AM, Rubén Roa wrote: > > I've just found that there is a conflict between tools used to build R > packages (Rtools) and ADMB due to the need to put Rtools compiler's location > in the PATH environmental variable to make Rtools work. > > On a Windows 7 64bit with Rtools installed I installed ADMB-IDE latest > version and although I could translate ADMB code to cpp code I could not > build the cpp code into an executable via ADMB-IDE's compiler. > > On another Windows machine, a Windows Vista 32bits with Rtools installed I > also installed the latest ADMB-IDE and this time it was not possible to > create the .obj file on the way to build the executable when building with > ADMB-IDE. On this machine I also have a previous ADMB version (6.0.1) that I > used to run from the DOS shell. This ADMB also failed to build the .obj file. > > Now, going to PATH, the location info to make Rtools is: > c:\Rtools\bin;c:\Rtools\MinGW\bin;c:\Rtools\MinGW64\bin;C:\Program > Files (x86)\MiKTeX 2.9\miktex\bin; If from this list I remove the > reference to the compiler c:\Rtools\MinGW\bin then ADMB works again. > So beware of this conflict. Suggestion of a solution will be > appreciated. Meanwhile, I run ADMB code in one computer and build R packages > with Rtools in another computer. The batchfiles Rcmd.bat, Rgui.bat temporarily add R and Rtools to your path by looking them up in the registry and then calling Rcmd.exe or Rgui.exe respectively. When R is finished the path is restored to what it was before. 
By using those it's not necessary to have either on your path. These are self-contained batch files with no dependencies, so they can simply be placed anywhere on the path in order to use them. For those and a few other batch files of interest to Windows users of R see: http://batchfiles.googlecode.com -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] RV: Reporting a conflict between ADMB and Rtools on Windows systems
From: Rubén Roa Sent: Thursday, 17 November 2011 9:53 To: 'us...@admb-project.org' Subject: Reporting a conflict between ADMB and Rtools on Windows systems Hi, I have to work under Windows, it's a company policy. I've just found that there is a conflict between the tools used to build R packages (Rtools) and ADMB, due to the need to put the Rtools compiler's location in the PATH environment variable to make Rtools work. On a Windows 7 64-bit machine with Rtools installed I installed the latest ADMB-IDE version, and although I could translate ADMB code to cpp code I could not build the cpp code into an executable via ADMB-IDE's compiler. On another Windows machine, a Windows Vista 32-bit with Rtools installed, I also installed the latest ADMB-IDE, and this time it was not possible to create the .obj file on the way to building the executable with ADMB-IDE. On this machine I also have a previous ADMB version (6.0.1) that I used to run from the DOS shell. This ADMB also failed to build the .obj file. Now, going to PATH, the location info to make Rtools work is: c:\Rtools\bin;c:\Rtools\MinGW\bin;c:\Rtools\MinGW64\bin;C:\Program Files (x86)\MiKTeX 2.9\miktex\bin; If from this list I remove the reference to the compiler c:\Rtools\MinGW\bin then ADMB works again. So beware of this conflict. Suggestions for a solution will be appreciated. Meanwhile, I run ADMB code on one computer and build R packages with Rtools on another computer. Best Ruben -- Dr. Ruben H. Roa-Ureta Senior Researcher, AZTI Tecnalia, Marine Research Division, Txatxarramendi Ugartea z/g, 48395, Sukarrieta, Bizkaia, Spain __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Can I tell about someone's academic cheating
-Original Message- From: r-help-boun...@r-project.org on behalf of YuHong Sent: Sun 10/2/2011 3:27 AM To: r-help@r-project.org Subject: [R] Can I tell about someone's academic cheating Hello, Can I tell about someone's academic cheating behavior in the mailing list? For I knew this person through this R mailing list. Thanks! Regards, Hong Yu -- You have to provide a reproducible example ... Rubén H. Roa-Ureta, Ph. D. AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Poor performance of "Optim"
-Original Message- From: r-help-boun...@r-project.org on behalf of yehengxin Sent: Sat 10/1/2011 8:12 AM To: r-help@r-project.org Subject: [R] Poor performance of "Optim" I used to consider using R and "Optim" to replace my commercial packages: Gauss and Matlab. But it turns out that "Optim" does not converge completely. The same data for Gauss and Matlab are converged very well. I see that there are too many packages based on "optim" and really doubt if they can be trusted! -- Considering that your post is pure whining without any evidence or reproducible example, and considering that you speak of 'data' being converged, methinks it's your fault: you can't control optim well enough to get sensible results. There are many ways to use optim, eh? You can pass on the gradients, you can use a variety of methods, you can increase the number of iterations, et cetera. Read optim's help, come back with a reproducible example, or quietly stick to your commercial software, leaving the whining to yourself. HTH Ruben -- Rubén H. Roa-Ureta, Ph. D. AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN -- View this message in context: http://r.789695.n4.nabble.com/Poor-performance-of-Optim-tp3862229p3862229.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Cannot allocate vector of size x
Yes, on a recent heavy-duty job (profiling the likelihood of the Tweedie power parameter for a relatively complex glm on a data frame with hundreds of thousands of rows) I had the "cannot allocate vector ..." error. I just closed and saved the main workspace, full of large objects, then did the profiling on a fresh and almost empty workspace (with only the strictly necessary objects), and it ran successfully. Then I loaded the two workspaces into one session and continued happily to the end of the job. :-) Dr. Ruben H. Roa-Ureta Senior Researcher, AZTI Tecnalia, Marine Research Division, Txatxarramendi Ugartea z/g, 48395, Sukarrieta, Bizkaia, Spain -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Jim Holtman Sent: Wednesday, 21 September 2011 12:11 To: Michael Haenlein CC: r-help@r-project.org Subject: Re: [R] Cannot allocate vector of size x How much memory do you have on your system? How large are the vectors you are creating? How many other large vectors do you have in memory? Remove all unused objects and do gc() to reclaim some of the memory. Remember all objects are in memory and you have to understand how large they are and how many you have. There is more information you have to provide and some more inspection you have to do. Sent from my iPad On Sep 21, 2011, at 5:53, Michael Haenlein wrote: > Dear all, > > I am running a simulation in which I randomly generate a series of > vectors to test whether they fulfill a certain condition. In most > cases, there is no problem. But from time to time, the (randomly) > generated vectors are too large for my system and I get the error > message: "Cannot allocate vector of size x". > > The problem is that in those cases my simulation stops and I have to > start it again manually. What I would like to do is to simply ignore > that the error happened (or probably report that it did) and then > continue with another (randomly) generated vector. 
> > So my question: Is there a way to avoid that R stops in such a case > and just restarts the program from the beginning as if nothing happened? > I hope I'm making myself clear here ... > > Thanks, > > Michael > >[[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
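The two-workspace workflow described in the reply above can be sketched as follows (file names are illustrative, not from the original thread):

```r
## Sketch of the two-workspace workflow (illustrative file names)
save.image("main_analysis.RData")    # save the heavy main workspace
rm(list = ls())                      # clear all objects
gc()                                 # return freed memory to the system
## ... run the memory-hungry job here with only the strictly
## necessary objects, then save its results:
save.image("profiling_results.RData")
## Later, load both workspaces into one session and continue:
load("main_analysis.RData")
load("profiling_results.RData")
```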
Re: [R] Cannot allocate vector of size x
Check one of the examples in ?try. It has this heading: ## run a simulation, keep only the results that worked. If your system is Windows, you can also try to increase the memory available to one application, in order to avoid the problem. Do a search for the "3GB switch". HTH Dr. Ruben H. Roa-Ureta Senior Researcher, AZTI Tecnalia, Marine Research Division, Txatxarramendi Ugartea z/g, 48395, Sukarrieta, Bizkaia, Spain -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Michael Haenlein Sent: Wednesday, 21 September 2011 11:54 To: r-help@r-project.org Subject: [R] Cannot allocate vector of size x Dear all, I am running a simulation in which I randomly generate a series of vectors to test whether they fulfill a certain condition. In most cases, there is no problem. But from time to time, the (randomly) generated vectors are too large for my system and I get the error message: "Cannot allocate vector of size x". The problem is that in those cases my simulation stops and I have to start it again manually. What I would like to do is to simply ignore that the error happened (or probably report that it did) and then continue with another (randomly) generated vector. So my question: Is there a way to avoid that R stops in such a case and just restarts the program from the beginning as if nothing happened? I hope I'm making myself clear here ... Thanks, Michael __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
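The pattern that ?try points to ("run a simulation, keep only the results that worked") can be sketched like this; the failing simulation function is a stand-in, not code from the original post:

```r
## Keep only the iterations that worked (stand-in simulation function)
one.run <- function() {
  x <- rnorm(1000)
  if (runif(1) < 0.2) stop("cannot allocate vector of size x")  # simulated failure
  mean(x)
}
res  <- lapply(1:100, function(i) try(one.run(), silent = TRUE))
ok   <- !sapply(res, inherits, "try-error")
good <- unlist(res[ok])   # results from the successful runs only
```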
Re: [R] Gradients in optimx
Well, I guess this makes it unnecessary for me to prepare a report with the CatDyn package. However, I am available to test a new optimx R-forge version with my package. Cheers Rubén -- Dr. Ruben H. Roa-Ureta Senior Researcher, AZTI Tecnalia, Marine Research Division, Txatxarramendi Ugartea z/g, 48395, Sukarrieta, Bizkaia, Spain -Original Message- From: John C Nash [mailto:nas...@uottawa.ca] Sent: Sunday, 4 September 2011 18:55 To: Ravi Varadhan CC: Rubén Roa; 'kathryn.lord2...@gmail.com'; 'r-help@r-project.org' Subject: Re: Gradients in optimx I've started to work on this again, and can confirm there seems to be some sort of bug in the gradient test at the beginning of the current R-forge version of optimx. It is not something obvious, and looks like a mixup in arguments to functions, which have been an issue since I've been trying to trap NaN and Inf returns. Worse, setting the control starttests = FALSE fails because there I inadvertently put the initial function calculation inside the block that does the tests. Sigh. Will try to get something done by end of this week. (This will be the R-forge version.) JN On 08/31/2011 09:31 AM, Ravi Varadhan wrote: > Hi Rubén, > > > > I am puzzled to note that the gradient check in "optimx" does not work > for you. Can you send me a reproducible example so that I can figure this > out? > > > > John - I think the best solution for now is to issue a "warning" > rather than an error message, when the numerical gradient is not > sufficiently close to the user-specified gradient. > > > > Best, > > Ravi. > > > > --- > > Ravi Varadhan, Ph.D. > > Assistant Professor, > > Division of Geriatric Medicine and Gerontology School of Medicine > Johns Hopkins University > > > > Ph. 
(410) 502-2619 > > email: rvarad...@jhmi.edu > > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] gradient function in OPTIMX
Hi, In my package CatDyn, which uses optimx, I included the gradients of 20 versions of the model involved. I estimate model parameters with numerical gradients, and at the final estimates I calculate the analytical gradients. In the simplest versions of the model the analytical gradients computed post hoc are almost identical to the numerical gradients. This shows that the analytical gradients (whose formulas were obtained with the CAS Maxima) are correct, at least for those simple versions of my model. However, if I try to pass the analytical gradients to optimx in a new optimization, I invariably get the error message that you got: "Gradient function might be wrong - check it!" This happens regardless of the method used (BFGS, spg, Rcgmin). Same as you, when I try to pass the gradients to optim, instead of optimx, the gradients are accepted and computed correctly, but then I can't use the other very nice features of optimx. I wanted to report this to Ravi and Prof. Nash but I haven't had the time for a full report with several examples and variations. So now that you report it, here I am, seconding you in calling attention to this apparent problem in optimx. Rubén H. Roa-Ureta, Ph. D. AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN -Original Message- From: r-help-boun...@r-project.org on behalf of Kathie Sent: Mon 8/29/2011 11:10 AM To: r-help@r-project.org Subject: [R] gradient function in OPTIMX Dear R users When I use OPTIM with BFGS, I've got a significant result without an error message. However, when I use OPTIMX with BFGS (or spg), I've got the following error message. > optimx(par=theta0, fn=obj.fy, gr=gr.fy, method="BFGS", > control=list(maxit=1)) Error: Gradient function might be wrong - check it! I checked and checked my gradient function line by line. I could not find anything wrong. Is it a bug or something? I prefer OPTIMX, so I'd like to know why. 
Thanks a lot in advance Regards, Kathryn Lord -- View this message in context: http://r.789695.n4.nabble.com/gradient-function-in-OPTIMX-tp3775791p3775791.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
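For reference, passing an analytical gradient to optim (which, per the messages above, accepts it where optimx complained) looks like this on a toy objective; the function below is illustrative, not the CatDyn likelihood:

```r
## Toy objective with its analytical gradient (illustrative only)
fn <- function(p) (p[1] - 1)^2 + 10 * (p[2] + 2)^2
gr <- function(p) c(2 * (p[1] - 1), 20 * (p[2] + 2))
fit <- optim(par = c(0, 0), fn = fn, gr = gr, method = "BFGS")
fit$par   # should be close to c(1, -2)
## A numerical cross-check of the gradient at the solution, e.g.:
## numDeriv::grad(fn, fit$par)   # should be near c(0, 0)
```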
Re: [R] PBSmapping, where is Ireland?!
Dear Ewan, I faced this problem and solved it by contacting the package authors, John Schnute and Rowan Haigh, rowan.ha...@dfo-mpo.gc.ca. Here is a function that solves the problem by displacing the Greenwich meridian to longitude 348, leaving Ireland to the right. This longitude does not span any land mass within the limits of the map, so it does not cause any disappearing land masses. The function loads the GSHHS data, so it takes some time, less than 1 min on my standard laptop, to run. Change the xlim and ylim values to get different fractions of Europe. Last time I contacted them (October 2010), the package authors were planning to add some comments about this in the PBSmapping user's guide. So you may find more info by digging into the user's guide, or else contact Rowan. HTH Rubén Dr. Ruben H. Roa-Ureta Senior Researcher, AZTI Tecnalia, Marine Research Division, Txatxarramendi Ugartea z/g, 48395, Sukarrieta, Bizkaia, Spain library(PBSmapping) Euromap <- function(path="C:/Temp", cutLon=348) { fnam <- paste(path,"gshhs_f.b",sep="/"); p1 <- importGSHHS(fnam,xlim=c(-20,360),ylim=c(30,80),level=1,n=0,xoff=0); z <- p1$X>cutLon; p1$X[z] <- p1$X[z]-360; NorthSeaHR <- thinPolys(p1, tol=0.1, filter=3) .initPBS() clr <- PBSval$PBSclr; xlim <- c(-18, 16) ylim <- c(32, 64) WEurope <- clipPolys(NorthSeaHR, xlim=xlim, ylim=ylim) par(mfrow=c(1,1),omi=c(0,0,0,0)) plotMap(WEurope, xlim=xlim, ylim=ylim, col=clr$land, bg=clr$sea, tck=-0.02,mgp=c(2,.75,0), cex=1.2, plt=c(.08,.98,.08,.98)) } Euromap(cutLon=348) -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Ewan Minter Sent: Tuesday, 16 August 2011 14:57 To: r-help@r-project.org Subject: [R] PBSmapping, where is Ireland?! Hi folks, I've been using 'PBSmapping' to make a map of Europe with some labels. I've been using the 'worldLL' PolyData, as my computer is too slow to make my own from the GSHHS files. 
The only p __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] CatDyn - Estimation of wild populations abundance
Dear ComRades, Package CatDyn 1.0-3 is now available on CRAN. The package allows the estimation of the absolute abundance of a wild population during a capture season using a nonlinear and recursive discrete-time structural model. The response variable is the catch, assumed a random variable produced by either a multiplicative stochastic process (lognormal distribution) or an additive stochastic process (normal distribution). The predictor variables are capture effort, assumed observed exactly, and stock abundance, a latent predictor. The data are high-frequency (e.g. daily) observations of catch and capture effort during the season. The structural model comes in five varieties. The simplest model assumes that all individuals that can be captured during the season were available at the start of the season, so during the season the abundance always decreases (a pure depletion model) due to natural mortality and capture mortality. The other versions include one, two, three or four positive perturbations, such as immigration of recruits, that occur at certain time steps during the season. The models also include nonlinear (power) relations between response and predictors. The models can be fit with natural mortality fixed at a given value or with this parameter estimated as well. Taking into account structural and stochastic options, CatDyn can fit up to 20 model versions to the same data. CatDyn uses the recently released optimx package for maximum flexibility in selecting and using numerical methods implemented in R. CatDyn also has analytical gradients, but in the current version these gradients are not yet passed to the optimizer; instead they are computed after convergence using numerical gradients, in order to compare analytical versus numerical gradients at the maximum likelihood estimates using various methods. 
CatDyn estimates all parameters on the log scale and uses function deltamethod() from package msm to return back-transformed estimate vectors and covariance matrices. CatDyn has already been used to estimate the abundance of several fish and invertebrate stocks harvested by industrial and artisanal fishing fleets around the world. A manuscript is under preparation with the first application (a squid stock from the Falkland Islands) and will be submitted to a marine science journal. Dr. Rubén H. Roa-Ureta, AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to find the likelihood of a null model in R
I know how to do what you want. However, the fact that you didn't bother to follow the rules of the list eliminates the initial, admittedly faint, enthusiasm to help you. Maybe someone else will guide you anyway. The rules are extremely simple. Read them at the end of this message. If you do, you'll probably find the answer to your question yourself. Dr. Rubén H. Roa-Ureta, AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN -Original Message- From: r-help-boun...@r-project.org on behalf of Partha Pratim PATTNAIK Sent: Mon 7/25/2011 1:16 PM To: r-help@r-project.org Subject: [R] How to find the likelihood of a null model in R Dear All, I am working on a dataset having the dependent variable as ordinal data (discrete data) and multiple independent variables. I need to find the likelihood for the NULL model, i.e. the model with only the dependent variable and all other independent variables as zero. Kindly let me know how to find the likelihood for a NULL model in R. Is there any specific function in R that can do this task? Please share if anyone has any information on this. Thanks and Regards Partha Pattnaik -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] please help! what are the different using log-link function and log transformation?
Funny, I was going to quote your paper with Dichmont in Fisheries Research regarding this, then I forgot to do it, but you yourself came out and posted the explanation. This forum is great! Rubén -Original Message- From: r-help-boun...@r-project.org on behalf of bill.venab...@csiro.au Sent: Sun 6/19/2011 2:36 PM To: gloryk...@hotmail.com; r-help@r-project.org Subject: Re: [R] please help! what are the different using log-link function and log transformation? The two commands you give below are certain to lead to very different results, because they are fitting very different models. The first is a gaussian model for the response with a log link, and constant variance. The second is a gaussian model for a log-transformed response and identity link. On the original scale this model would imply a constant coefficient of variation and hence a variance proportional to the square of the mean, and not constant. Your problem is not particularly an R issue, but a difficulty with understanding generalized linear models (and hence generalized additive models, which are based on them). Bill Venables. From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of pigpigmeow [gloryk...@hotmail.com] Sent: 19 June 2011 17:39 To: r-help@r-project.org Subject: [R] please help! what are the different using log-link function and log transformation? I'm new R-programming user, I need to use gam function. y <- gam(a ~ s(b), family = gaussian(link=log), data) y <- gam(log(a) ~ s(b), family = gaussian (link=identity), data) why [do] these two command [give different] results? I guess these two command results are same, but actally these two command results are different, Why? -- View this message in context: http://r.789695.n4.nabble.com/please-help-what-are-the-different-using-log-link-function-and-log-transformation-tp3608931p3608931.html Sent from the R help mailing list archive at Nabble.com. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] please help! what are the different using log-link function and log transformation?
The problem is not that you are new to R. This is a conceptual issue. Let y be the response variable and let {x_i} be a set of predictors. Your first model (identity response and log link) is saying that y=f(x_1)f(x_2)...f(x_n) + e, e~Normal(0,sigma) i.e. this is an additive observation-error model with constant variance. Your second model (log response and identity link) is saying that y=f(x_1)f(x_2)...f(x_n)u, u=exp(e), e~Normal(0,sigma) i.e. this is a multiplicative observation-error model with variance proportional to the square of the mean. Plot the response against the predictors and visually examine whether you have heteroscedasticity. If you do, use your second model; otherwise use the first one. One key to understanding these kinds of dichotomies is to realize that statistical models are composed of a process part and an observation part. In your models the process part is deterministic and multiplicative, but after that you still have two choices: make the random observation part additive (your first model) or multiplicative (your second model). Needless to say (but I am saying it anyway), these two models will give different results, at the very least because one assumes constant variance (your first model) whereas the other assumes a variance proportional to the square of the mean. In my experience with multiplicative process models, the random observation part will usually be multiplicative as well because of heteroscedasticity. HTH Rubén - Rubén H. Roa-Ureta, Ph. D. AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN -Original Message- From: r-help-boun...@r-project.org on behalf of pigpigmeow Sent: Sun 6/19/2011 9:39 AM To: r-help@r-project.org Subject: [R] please help! what are the different using log-link function and log transformation? I'm new R-programming user, I need to use gam function. y <- gam(a ~ s(b), family=gaussian(link=log), data) y <- gam(log(a) ~ s(b), family=gaussian(link=identity), data) why do these two commands give different results? 
I guess these two commands should give the same results, but actually they give different results. Why? -- View this message in context: http://r.789695.n4.nabble.com/please-help-what-are-the-different-using-log-link-function-and-log-transformation-tp3608931p3608931.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
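The conceptual difference explained above can be seen directly with simulated data; lm/glm are used here for brevity, and gam behaves the same way (all data and values illustrative):

```r
## Simulated data with multiplicative error (illustrative)
set.seed(2)
b <- runif(100, 1, 5)
a <- exp(0.5 + 0.3 * b) * exp(rnorm(100, 0, 0.2))
m1 <- glm(a ~ b, family = gaussian(link = "log"))  # additive error, log link
m2 <- lm(log(a) ~ b)                               # multiplicative error
## Compare fits on the original response scale:
## fitted(m1) versus exp(fitted(m2)); the estimates differ because
## the two models assume different error structures.
```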
Re: [R] logistic growth model
Write the growth formula in an R script. Define initial par values. Input the size and age data. Plot the size and age data as points. Plot the growth model with the initial par values as a line. Play with the initial par values until you see a good agreement between the model (the line) and the data (the points). Optimise. Re-plot. Plot a residual histogram. Plot a residual scatterplot. Plot a Q-Q residual plot. HTH Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On behalf of Renalda > Sent: Saturday, 4 June 2011 6:17 > To: r-help@r-project.org > Subject: [R] logistic growth model > > I want to fit a logistic growth model y = k*e^(b0+b1*age) / (1 + e^(b0+b1*age)). Can someone help on how to get > the initial coefficients b0 and b1? I need to estimate [them] in > order to do the regression analysis. When I run using b0=0.5 > and b1=3.4818, I get the following error > > 397443.8 : 0.5 3.4818 > Error in nls(Height ~ k * exp(b1 + b2 * Age)/(1 + exp(b1 + b2 > * Age)), : > singular gradient > "please tell me what is wrong with my initial values, and > how to get the initial values" > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
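The recipe above can be sketched end to end with plogis() (note that plogis(x) equals e^x/(1+e^x), the curve in the question); the data and all starting values here are illustrative:

```r
## Simulated size-at-age data (illustrative)
set.seed(3)
Age    <- runif(80, 0, 15)
Height <- 120 * plogis(-3 + 0.6 * Age) + rnorm(80, 0, 5)
## 1) Plot the data and a guessed curve; tune k, b0, b1 by eye
k0 <- max(Height); b0.ini <- -3; b1.ini <- 0.5
plot(Age, Height)
curve(k0 * plogis(b0.ini + b1.ini * x), add = TRUE)
## 2) Optimise from the visually tuned starting values
fit <- nls(Height ~ k * plogis(b0 + b1 * Age),
           start = list(k = k0, b0 = b0.ini, b1 = b1.ini))
## 3) Residual diagnostics
r <- residuals(fit)
hist(r); plot(fitted(fit), r); qqnorm(r); qqline(r)
```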
Re: [R] Building Custom GUIs for R
Check the PBSmodelling package. HTH Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On behalf of vioravis > Sent: Friday, 20 May 2011 8:52 > To: r-help@r-project.org > Subject: [R] Building Custom GUIs for R > > I am looking to build simple GUIs based on the R codes I > have. The main objective is to hide the scary R codes from > non-programming people and make it easier for them to try out > different inputs. > > For example, > > 1. The GUI will have means to upload a csv file which will be > read by the R code. > > 2. A button to preprocess data (carried out by a R function behind) > > 3. A button to build some models and run simulations > > 4. Space to display visual charts based on the simulations results > > 5. Option to save the results to a csv file or something similar. > > Are there any tools currently available that enable us to build > GUIs??? (MATLAB has a GUI builder that enables users to build custom GUIs). > > Can we make an exe of such a GUI (with the R code) and let > people use it without having to install R??? > > Any help on this would be much appreciated?? > > Thank you. > > Ravi > > > > > > > > -- > View this message in context: > http://r.789695.n4.nabble.com/Building-Custom-GUIs-for-R-tp353 > 7794p3537794.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nls problem with R
In addition to Andrew's advice, you should get more familiar with your nonlinear model. From what you wrote, as T2 tends to infinity, V2 tends to v0*(1-epi). There you have a baseline on the Y-axis towards which your model tends, and this will give you sensible starting values for v0 and epi. Also, as T2 tends to 0, V2 tends to v0*(1-epi+epi*exp(cl*t0)). There you have another, higher point on the Y-axis, and this one will give you additional sensible starting values for cl and t0. Plot the data and the predicted model with your initial values, and send the model-data combination to the optimizer only once you see that the predicted line is close to the observed response. V2 <- c(371000, 285000 ,156000, 20600, 4420, 3870, 5500 ) T2 <- c(0.3403 ,0.4181 ,0.4986 ,0.7451 ,1.0069 ,1.553, 1.333) #last value inserted for illustration. #nls2 <- nls(V2~v0*(1-epi+epi*exp(-cl*(T2-t0))),start=list(v0=10^7,epi=0.9 ,cl=6.2,t0=8.7)) v0.ini <- 10^7 epi.ini <- 0.9 cl.ini <- 6.2 t0.ini <- 8.7 V2.pred.ini <- v0.ini*(1-epi.ini+epi.ini*exp(-cl.ini*(T2-t0.ini))) plot(T2,V2) lines(T2,V2.pred.ini) As you can see, with your initial values the line doesn't even show on the plot. No wonder the gradients are singular. So go find better initial values by trial and error and check the results on the plot. Then the optimizer called by nls will finish the job (hopefully). Then repeat your plot, this time with the estimates instead of the initial values. This may get you started in the business of estimating nonlinear models. HTH Rubén ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -Mensaje original- > De: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] En nombre de Andrew Robinson > Enviado el: miércoles, 04 de mayo de 2011 9:15 > Para: sterlesser > CC: r-help@r-project.org > Asunto: Re: [R] nls problem with R > > The fact that T2 and V2 are of different lengths seems like a > likely culprit. 
Other than that, you need to find start > points that do not lead to a singular gradient. There are > several books that provide advice on obtaining initial > parameter estimates for non-linear models. Google Books > might help you. > > Cheers > > Andrew > > > > > On Tue, May 03, 2011 at 09:08:03PM -0700, sterlesser wrote: > > the original data are > > V2 =c(371000,285000 ,156000, 20600, 4420, 3870, 5500 ) T2=c( 0.3403 > > ,0.4181 ,0.4986 ,0.7451 ,1.0069 ,1.553) > > > nls2=nls(V2~v0*(1-epi+epi*exp(-cl*(T2-t0))),start=list(v0=10^7,epi=0.9 > > ,cl=6.2,t0=8.7)) > > after execution error occurs as below > > > > Error in nlsModel(formula, mf, start, wts) : > > singular gradient matrix at initial parameter estimates Error in > > nlsModel(formula, mf, start, wts) : > > singular gradient matrix at initial parameter estimates > In addition: > > Warning messages: > > 1: In lhs - rhs : > > longer object length is not a multiple of shorter object length > > 2: In .swts * attr(rhs, "gradient") : > > longer object length is not a multiple of shorter object length > > > > could anyone help me ?thansks > > > > -- > > View this message in context: > > > http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3494454.htm > > l Sent from the R help mailing list archive at Nabble.com. > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. 
> > -- > Andrew Robinson > Program Manager, ACERA > Department of Mathematics and StatisticsTel: > +61-3-8344-6410 > University of Melbourne, VIC 3010 Australia > (prefer email) > http://www.ms.unimelb.edu.au/~andrewpr Fax: > +61-3-8344-4599 > http://www.acera.unimelb.edu.au/ > > Forest Analytics with R (Springer, 2011) > http://www.ms.unimelb.edu.au/FAwR/ > Introduction to Scientific Programming and Simulation using R > (CRC, 2009): > http://www.ms.unimelb.edu.au/spuRs/ > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
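Rubén's trial-and-error recipe can be sketched as follows; the starting values below are illustrative guesses read off the model's limiting behaviour (plateau near v0*(1-epi), intercept near v0*(1-epi+epi*exp(cl*t0))), not fitted results.

```r
# Sketch of the workflow: guess starting values from the model's limits,
# overlay the implied curve on the data, and only then call nls().
V2 <- c(371000, 285000, 156000, 20600, 4420, 3870, 5500)
T2 <- c(0.3403, 0.4181, 0.4986, 0.7451, 1.0069, 1.553, 1.333)

# Illustrative guesses only: plateau of the data is ~5000 = v0*(1-epi)
start.ini <- list(v0 = 4e5, epi = 0.99, cl = 5, t0 = 0.3)
V2.pred.ini <- with(start.ini, v0*(1 - epi + epi*exp(-cl*(T2 - t0))))

plot(T2, V2)
o <- order(T2)
lines(T2[o], V2.pred.ini[o])  # the line now at least shows on the plot
```

With starting values like these the predicted curve tracks the data, and nls() has a fighting chance of converging.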
Re: [R] Simple lattice question
> -Mensaje original- > De: Peter Ehlers [mailto:ehl...@ucalgary.ca] > Enviado el: jueves, 31 de marzo de 2011 18:09 > Para: Rubén Roa > CC: r-help@r-project.org > Asunto: Re: [R] Simple lattice question > > On 2011-03-31 06:58, Rubén Roa wrote: > > Thanks Peters! > > > > Just a few minor glitches now: > > > > require(lattice) > > data<- > data.frame(SP=sort(rep(as.factor(c('A','B','C','D','E')),12)), > > x=rpois(60,10), > > y=rep(c(rep(0,4),rep(10,4),rep(20,4)),5), > > z=rep(1:4,15)) > > > xyplot(x~y|SP,data=data,groups=z,layout=c(2,3),pch=1:4,lty=1:4,col='black',type='b', > > key=list(x = .65, y = .75, corner = c(0, 0), points=TRUE, > > lines=TRUE, pch=1:4, lty=1:4, type='b', > > text=list(lab = as.character(unique(data$z))))) > > > > I have an extra column of symbols on the legend, > > > > and, > > > > would like to add a title to the legend. Such as 'main' for plots. > > > > Any suggestions? > > for key(), make 'lines' into a list: > > xyplot(x~y|SP,data=data,groups=z,layout=c(2,3), > pch=1:4,lty=1:4,col='black',type='b', > key=list(x = .65, y = .75, corner = c(0, 0), > title="title here", cex.title=.9, lines.title=3, > lines=list(pch=1:4, lty=1:4, type='b'), > text=list(lab = as.character(unique(data$z))))) > > Peter Ehlers ... that works. Thanks Peter (sorry I misspelled your name b4). The clean code is now: require(lattice) data <- data.frame(SP=sort(rep(as.factor(c('A','B','C','D','E')),12)), x=rpois(60,10), y=rep(c(rep(0,4),rep(10,4),rep(20,4)),5), z=rep(1:4,15)) xyplot(x~y|SP,data=data,groups=z,layout=c(2,3),pch=1:4,lty=1:4,col='black',type='b', key=list(x = .65, y = .75, corner = c(0, 0), lines=list(pch=1:4, lty=1:4, type='b'), title=expression(CO^2), text=list(lab = as.character(unique(data$z))))) David's code works too (thanks to you too!) 
and is somewhat shorter xyplot(x~y|SP, data=data,groups=z, layout=c(2,3), par.settings=simpleTheme(pch=1:4,lty=1:4,col='black'), type="b", auto.key = list(x = .6, y = .7, lines=TRUE, corner = c(0, 0))) but the lines and symbols are on different columns, and the line types look as if they were in bold. Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simple lattice question
Thanks Peters! Just a few minor glitches now: require(lattice) data <- data.frame(SP=sort(rep(as.factor(c('A','B','C','D','E')),12)), x=rpois(60,10), y=rep(c(rep(0,4),rep(10,4),rep(20,4)),5), z=rep(1:4,15)) xyplot(x~y|SP,data=data,groups=z,layout=c(2,3),pch=1:4,lty=1:4,col='black',type='b', key=list(x = .65, y = .75, corner = c(0, 0), points=TRUE, lines=TRUE, pch=1:4, lty=1:4, type='b', text=list(lab = as.character(unique(data$z))))) I have an extra column of symbols on the legend, and, would like to add a title to the legend. Such as 'main' for plots. Any suggestions? Rubén ____ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -Mensaje original- > De: Peter Ehlers [mailto:ehl...@ucalgary.ca] > Enviado el: jueves, 31 de marzo de 2011 15:41 > Para: Rubén Roa > CC: r-help@r-project.org > Asunto: Re: [R] Simple lattice question > > On 2011-03-31 03:39, Rubén Roa wrote: > > > > DeaR ComRades, > > > > require(lattice) > > data<- > data.frame(SP=sort(rep(as.factor(c('A','B','C','D','E')),12)), > > x=rpois(60,10), > > y=rep(c(rep(0,4),rep(10,4),rep(20,4)),5), > > z=rep(1:4,15)) > > > xyplot(x~y|SP,data=data,groups=z,layout=c(2,3),pch=1:4,lty=1:4,col='bl > > ack',type='b') > > > > How do I put a legend for the grouping variable in the > empty upper-right panel? > > See the help page for xyplot for an example using the iris data. > You just need to add something like > > auto.key = list(x = .6, y = .7, corner = c(0, 0)) > > to your lattice call. See the 'key' entry on the ?xyplot page. > > Peter Ehlers > > > > > TIA > > > > Rubén > > > > > __ > > __ > > > > Dr. 
Rubén Roa-Ureta > > AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g > > 48395 Sukarrieta (Bizkaia) > > SPAIN > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Simple lattice question
DeaR ComRades, require(lattice) data <- data.frame(SP=sort(rep(as.factor(c('A','B','C','D','E')),12)), x=rpois(60,10), y=rep(c(rep(0,4),rep(10,4),rep(20,4)),5), z=rep(1:4,15)) xyplot(x~y|SP,data=data,groups=z,layout=c(2,3),pch=1:4,lty=1:4,col='black',type='b') How do I put a legend for the grouping variable in the empty upper-right panel? TIA Rubén ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Degrees of freedom for lm in logLik and AIC
However, shouldn't _free parameters_ only be counted for degrees of freedom and for calculation of AIC? The sigma parameter is profiled out in a least-squares linear regression, so it's not free, it's not a dimension of the likelihood. Just wondering ... ____ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -Mensaje original- > De: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] En nombre de Frank Harrell > Enviado el: lunes, 28 de marzo de 2011 15:44 > Para: r-help@r-project.org > Asunto: Re: [R] Degrees of freedom for lm in logLik and AIC > > Thank you Peter. I didn't realize that was the convention used. > Frank > > > Peter Dalgaard-2 wrote: > > > > On Mar 28, 2011, at 05:36 , Frank Harrell wrote: > > > > > I have a question about the computation of the degrees of freedom > > in a linear model: > > > > > > x <- runif(20); y <- runif(20); f <- lm(y ~ x); logLik(f) > > 'log Lik.' -1.968056 (df=3) > > The 3 is coming > > from f$rank + 1. Shouldn't it be f$rank? This affects AIC(f). > > > > I count three parameters in a simple linear regression: alpha, beta, > > sigma. > > > > From a generic-likelihood point of view, I don't see how > you can omit > > the last one. > > > > -pd > > > > -- > > Peter Dalgaard > > Center for Statistics, Copenhagen Business School Solbjerg Plads 3, > > 2000 Frederiksberg, Denmark > > Phone: (+45)38153501 > > Email: pd@cbs.dk Priv: pda...@gmail.com > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. 
> > > > > - > Frank Harrell > Department of Biostatistics, Vanderbilt University > -- > View this message in context: > http://r.789695.n4.nabble.com/Degrees-of-freedom-for-lm-in-log > Lik-and-AIC-tp3410687p3411759.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
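Peter's convention is easy to verify directly: the df reported by logLik() on an lm fit counts the residual standard deviation sigma along with the regression coefficients.

```r
# df reported by logLik() on an lm fit is f$rank + 1: sigma is counted as a
# parameter along with the intercept and slope.
set.seed(1)
x <- runif(20); y <- runif(20)
f <- lm(y ~ x)
ll <- logLik(f)
attr(ll, "df")   # 3 = f$rank (intercept + slope) + 1 (sigma)
f$rank           # 2
```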
[R] R helps win competitions
DeaR ComRades, This is a quote from a News article in Science's 11-February issue, about competitions to model data: "For Chris Raimondi, a search-engine expert based in Baltimore, Maryland, and winner of the HIV-treatment competition, the Kaggle contest motivated him to hone his skills in a newly learned computer language called R, which he used to encode the winning data model. Raimondi also enjoys the competitive aspect of Kaggle challenges: "It was nice to be able to compare yourself with others; ... it became kind of addictive. ... I spent more time on this than I should." If you are interested read the full article here: http://www.sciencemag.org/content/331/6018/698.full Rubén ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R² for non-linear model
> -Mensaje original- > De: Kjetil Halvorsen [mailto:kjetilbrinchmannhalvor...@gmail.com] > Enviado el: jueves, 17 de marzo de 2011 16:19 > Para: Rubén Roa > CC: Alexx Hardt; r-help@r-project.org > Asunto: Re: [R] R² for non-linear model > > see inline. > > On Thu, Mar 17, 2011 at 4:58 AM, Rubén Roa wrote: > > Hi Alexx, > > > > I don't see any problem in comparing models based on > different distributions for the same data using the AIC, as > long as they have a different number of parameters and all > the constants are included. > > For example, you can compare distribution mixture models > with different number of components using the AIC. > > This is one example: > > Roa-Ureta. 2010. A Likelihood-Based Model of Fish Growth > With Multiple Length Frequency Data. Journal of Biological, > Agricultural and Environmental Statistics 15:416-429. > > Here is another example: > > www.education.umd.edu/EDMS/fac/Dayton/PCIC_JMASM.pdf > > Prof. Dayton writes above that one advantage of AIC over > hypothesis testing is: > > "(d) Considerations related to underlying distributions > for random > > variables can be incorporated into the > decision-making process > > rather than being treated as an assumption whose robustness > must be > > considered (e.g., models based on normal densities > and on log-normal densities can be compared)." > > My reading of this is that AIC can be used to compare models > with densities relative to the same dominating measure. > > Kjetil I think this is correct. It is probably not wise to use the AIC to compare distribution models based on the counting measure with distribution models based on the Lebesgue measure! Dr. 
Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R² for non-linear model
Hi Alexx, I don't see any problem in comparing models based on different distributions for the same data using the AIC, as long as they have a different number of parameters and all the constants are included. For example, you can compare distribution mixture models with different number of components using the AIC. This is one example: Roa-Ureta. 2010. A Likelihood-Based Model of Fish Growth With Multiple Length Frequency Data. Journal of Biological, Agricultural and Environmental Statistics 15:416-429. Here is another example: www.education.umd.edu/EDMS/fac/Dayton/PCIC_JMASM.pdf Prof. Dayton writes above that one advantage of AIC over hypothesis testing is: "(d) Considerations related to underlying distributions for random variables can be incorporated into the decision-making process rather than being treated as an assumption whose robustness must be considered (e.g., models based on normal densities and on log-normal densities can be compared)." Last, if you read Akaike's theorem you will see there is nothing precluding comparing models built on different distributional models. Here it is: " the expected (over the sample space and the space of parameter estimates) maximum log-likelihood of some data on a working model overshoots the expected (over the sample space only) maximum log-likelihood of the data under the true model that generated the data by exactly the number of parameters in the working model." A remarkable result. Rubén -Original Message- From: r-help-boun...@r-project.org on behalf of Alexx Hardt Sent: Wed 3/16/2011 7:42 PM To: r-help@r-project.org Subject: Re: [R] R² for non-linear model Am 16.03.2011 19:34, schrieb Anna Gretschel: > Am 16.03.2011 19:21, schrieb Alexx Hardt: >> And to be on-topic: Anna, as far as I know anova's are only useful to >> compare a submodel (e.g. with one less regressor) to another model. >> > thanks! i don't get it either what they mean by fortune... 
It's an R-package (and a pdf [1]) with collected quotes from the mailing list. Be careful with the suggestion to use AIC. If you wanted to compare two models using AICs, you need the same distribution (that is, the same distributional assumption) in both models. To my knowledge, there is no way to "compare" a gaussian model to an exponential one (except common sense), but my knowledge is very limited. [1] http://cran.r-project.org/web/packages/fortunes/vignettes/fortunes.pdf -- alexx@alexx-fett:~$ vi .emacs __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
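The kind of cross-distribution comparison discussed in this thread can be sketched with simulated data and MASS::fitdistr(); both fits below are full maximum likelihood with all constants included, and both densities are with respect to the same (Lebesgue) dominating measure, consistent with Kjetil's caveat. The data are made up for illustration.

```r
# Comparing Gaussian and log-normal fits to the same positive response via AIC.
library(MASS)
set.seed(42)
y <- rlnorm(500, meanlog = 1, sdlog = 0.8)   # simulated positive data
fit.norm  <- fitdistr(y, "normal")
fit.lnorm <- fitdistr(y, "lognormal")
AIC(fit.norm)    # larger
AIC(fit.lnorm)   # smaller: the log-normal is favoured, as it should be here
```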
Re: [R] How to produce glm graph
In addition to David's suggestion, you might want to examine the termplot function, ?termplot HTH -Original Message- From: r-help-boun...@r-project.org on behalf of David Winsemius Sent: Sat 11/20/2010 3:54 PM To: Sonja Klein Cc: r-help@r-project.org Subject: Re: [R] How to produce glm graph On Nov 20, 2010, at 4:27 AM, Sonja Klein wrote: > > I'm very new to R and modeling but need some help with visualization > of glms. [snip] > Is there a way to produce a graph in R that has these features? Of course. Modeling would be of little value without such capability. In R, regression functions produce an object with a particular class ("glm" in this case) and there is generally a predict method for each class. There is also a vector of fitted values within the object that may be accessed with the fitted() or fitted.values() functions. The predict.glm help page has a worked example. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
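A minimal sketch of both suggestions on a built-in dataset (mtcars stands in for Sonja's data, which we have not seen):

```r
# Two ways to visualise a fitted glm: termplot() for the partial effect on the
# link scale, and predict() on a grid of new data for a fitted-curve plot.
fit <- glm(am ~ hp, data = mtcars, family = binomial)
termplot(fit, se = TRUE)   # effect of hp on the logit scale, with SE bands

nd <- data.frame(hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 100))
p  <- predict(fit, newdata = nd, type = "response", se.fit = TRUE)
plot(nd$hp, p$fit, type = "l", ylab = "P(am = 1)")
lines(nd$hp, p$fit + 2*p$se.fit, lty = 2)   # rough pointwise bands
lines(nd$hp, p$fit - 2*p$se.fit, lty = 2)
```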
Re: [R] 0.3 is not 0.3, bug in seq() function?
Enrico, The same happens with other numbers/sequences. seq(0.1,0.9,0.1)[7]==0.7 [1] FALSE seq(0.1,1.3,0.1)[12]==1.2 [1] FALSE Rounding seems to fix it, round(seq(0.1,0.5,0.1),1)[3]==0.3 round(seq(0.1,0.9,0.1),1)[7]==0.7 round(seq(0.1,1.3,0.1),1)[12]==1.2 They all return TRUE. Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -Mensaje original- > De: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] En nombre de Enrico R. Crema > Enviado el: jueves, 28 de octubre de 2010 12:24 > Para: r-help@r-project.org > Asunto: [R] 0.3 is not 0.3, bug in seq() function? > > Dear List, > > I've been running a numerical simulation and I found this odd > error in my code where the which command could not identify > which rows of a column of data.frame were corresponding to > the value 0.3. There are 7 unique values in this column > (0.01,0.05,0.1,0.2,0.3,0.4,0.5), and this does not work only > for 0.3. So I looked at the column and manually tried to use > the which() command, and the results were all FALSE despite I > could see those number. So I recreated my sequence of number > and tested: > > seq(0.1,0.5,0.1)[3]==0.3 > > which gave me FALSE!!! All the other numbers > (0.1,0.2,0.4,0.5) give me TRUE, but 0.3 was not working. So I did: > > seq(0.1,0.5,0.1)[3]-0.3 > > which gave me 5.551115e-17. If you run a similar sequence like: > > seq(0.2,0.6,0.1)[2]==0.3 > > this will still give me FALSE. No, for my own purpose, I > fixed the problem in this way: > > zerothree=seq(0.1,0.5,0.1)[3] > which(data[,1]==zerothree) > > but I guess this bug is a bit of problem...Apologies if it is > the wrong place to post this bug, and apologies also if this > was a known issue. 
My version of R is : > > platform x86_64-pc-linux-gnu > arch x86_64 > os linux-gnu > system x86_64, linux-gnu > status > major 2 > minor 10.1 > year 2009 > month 12 > day14 > svn rev50720 > language R > version.string R version 2.10.1 (2009-12-14) > > > Many Thanks, > > Enrico > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
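The behaviour reported above is standard binary floating point, not a bug in seq(); besides the rounding workaround, the usual remedy is to compare with a tolerance, for instance via all.equal():

```r
# 0.1 has no exact binary representation, so accumulated sums drift slightly.
s <- seq(0.1, 0.5, 0.1)
s[3] == 0.3                    # FALSE
s[3] - 0.3                     # about 5.55e-17
isTRUE(all.equal(s[3], 0.3))   # TRUE: comparison with a tolerance
which(abs(s - 0.3) < 1e-8)     # 3: robust replacement for which(s == 0.3)
```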
Re: [R] Set value if else...
x <- data.frame(x=1:10) require(gtools) x$y <- ifelse(odd(x$x),0,1) HTH R. Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -Mensaje original- > De: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] En nombre de Joel > Enviado el: viernes, 15 de octubre de 2010 10:16 > Para: r-help@r-project.org > Asunto: [R] Set value if else... > > > Hi > > I want to set a variable to either 1 or 0 depending on an > value of a dataframe and then add this as a colum to the dataframe. > > This could be done with a loop but as we are able to do > questions on a complete row or colum without a loop it would > be sweet if it could be done. > > for example: > > table: > > Name Age > Joel 24 > agust 17 > maja40 > and so on... > > And what I want to do is a command that gives me > VoteRight<-1 if table$age>18 else set VoteRight to 0 > > Then use cbind or something to add it to the table. > > so i get > Name Age VoteRight > Joel 241 > agust 170 > maja401 > > And as I said before im guessing this can be done without a loop... > > //Joel > -- > View this message in context: > http://r.789695.n4.nabble.com/Set-value-if-else-tp2996667p2996667.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
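The same ifelse() pattern applied to Joel's age example (his data re-typed here) needs no extra package:

```r
# ifelse() vectorises the condition over the whole column; no loop needed.
d <- data.frame(Name = c("Joel", "agust", "maja"), Age = c(24, 17, 40))
d$VoteRight <- ifelse(d$Age > 18, 1, 0)   # VoteRight is 1, 0, 1

# Equivalent, and slightly more direct:
d$VoteRight <- as.integer(d$Age > 18)
```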
Re: [R] Memory limit problem
Hi, Probably Windows can't allocate enough contiguous free space. Try this: Find boot.ini (usually at the root c:\) Without changing anything else, add this line at the end of the script: multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional 3GB" /3GB /noexecute=optin /fastdetect Reboot. When prompted select the 3GB boot-up. Performance will decay but bigger objects could be saved to disk. Hope that this will be enough to get your code working. When finished re-boot and start with the normal boot-up. An example of a boot.ini script that I had to prepare for one big simulation work in an XP machine is at the end of my message. HTH Rubén [boot loader] timeout=30 default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS [operating systems] multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /noexecute=optin /fastdetect multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional 3GB" /3GB /noexecute=optin /fastdetect -Original Message- From: r-help-boun...@r-project.org on behalf of Tim Clark Sent: Tue 10/12/2010 5:49 AM To: r help r-help Cc: Tim Clark Subject: [R] Memory limit problem Dear List, I am trying to plot bathymetry contours around the Hawaiian Islands using the package rgdal and PBSmapping. I have run into a memory limit when trying to combine two fairly small objects using cbind(). I have increased the memory to 4GB, but am being told I can't allocate a vector of size 240 Kb. I am running R 2.11.1 on a Dell Optiplex 760 with Windows XP. I have pasted the error message and summaries of the objects below. Thanks for your help. 
Tim > xyz<-cbind(hi.to.utm,z=b.depth$z) Error: cannot allocate vector of size 240 Kb > memory.limit() [1] 4000 > memory.size() [1] 1971.68 > summary(hi.to.utm) Object of class SpatialPoints Coordinates: min max x 708745.5 923406.7 y 2046153.1 2327910.9 Is projected: TRUE proj4string : [+proj=utm +zone=4 +datum=NAD83 +ellps=GRS80 +towgs84=0,0,0] Number of points: 15328 > str(hi.to.utm) Formal class 'SpatialPoints' [package "sp"] with 3 slots ..@ coords : num [1:15328, 1:2] 708746 710482 712218 713944 715681 ... .. ..- attr(*, "dimnames")=List of 2 .. .. ..$ : NULL .. .. ..$ : chr [1:2] "x" "y" ..@ bbox : num [1:2, 1:2] 708746 2046153 923407 2327911 .. ..- attr(*, "dimnames")=List of 2 .. .. ..$ : chr [1:2] "x" "y" .. .. ..$ : chr [1:2] "min" "max" ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots .. .. ..@ projargs: chr " +proj=utm +zone=4 +datum=NAD83 +ellps=GRS80 +towgs84=0,0,0" > summary(b.depth) x y z Min. :-157.0 Min. :18.50 Min. :-5783 1st Qu.:-156.6 1st Qu.:18.98 1st Qu.:-4565 Median :-156.1 Median :19.80 Median :-3358 Mean :-156.1 Mean :19.73 Mean :-3012 3rd Qu.:-155.5 3rd Qu.:20.41 3rd Qu.:-1601 Max. :-155.0 Max. :21.00 Max. : 0 > str(b.depth) 'data.frame': 15328 obs. of 3 variables: $ x: num -157 -157 -157 -157 -157 ... $ y: num 21 21 21 21 21 ... $ z: num -110 -114 -110 -88 -76 -122 -196 -224 -240 -238 ... Tim Clark Marine Ecologist National Park of American Samoa __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [Fwd: RE: maximum likelihood problem]
If your problem was as you described it, you'd simply find the 1st derivative of your eq. w.r.t. k, equate to 0, and then solve for k (and check that the solution is a maximum). But I guess what you really want to do is to estimate k from data given your equation and _another model_ for the likelihood of the data. If that is the case then you haven't yet defined the model for the likelihood of the data as a function of your single parameter k. You have to define an R function that contains your process model (apparently your equation) and your likelihood model, which will be made from some probability model for the data. Try this to get help on how to write new functions: ?"function" After defining this function with the two models then you can maximize the likelihood of the data as a function of the process model parameter k. If there is only one parameter then it is recommended to use optimise(), instead of optim(). It is not necessary to maximize, just minimize the negative of the log(likelihood) once you have defined what this likelihood is. HTH Rubén -Original Message- From: r-help-boun...@r-project.org on behalf of mlar...@rsmas.miami.edu Sent: Sat 10/2/2010 3:11 AM To: r-help@r-project.org Subject: [R] [Fwd: RE: maximum likelihood problem] I forgot to add that I first gave a starting value for K. Nonlinear least squares won't work because my errors are not normally distributed. Any advide on my maximum likelihood function would be greatly appreciated. Original Message Subject: RE: [R] maximum likelihood problem From:"Ravi Varadhan" Date:Fri, October 1, 2010 5:10 pm To: mlar...@rsmas.miami.edu r-help@r-project.org -- Do you want to do a nonlinear least-squares estimation (which is MLE if the errors are Gaussian)? If so, you have to define a function that takes the parameter (k) and data matrix (LR, T, LM), as arguments, and returns a scalar, which is the residual sum of squares. Then you can optimize (minimize) that function. Ravi. 
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of mlar...@rsmas.miami.edu Sent: Friday, October 01, 2010 4:40 PM To: r-help@r-project.org Subject: [R] maximum likelihood problem I am trying to figure out how to run maximum likelihood in R. Here is my situation: I have the following equation: equation<-(1/LR-(exp(-k*T)*LM)*(1-exp(-k))) LR, T, and LM are vectors of data. I want to R to change the value of k to maximize the value of equation. My attempts at optim and optimize have been unsuccessful. Are these the recommended functions that I should use to maximize my equation? With optim I wanted the function to be maximized so I had to make the fnscale negative. Here is what I put: L<-optim(k,equation,control=(fnscale=-1)) My result: Error: could not find function "fn" Here is what I put for optimize: L<-optimise(equation,k,maximum=TRUE) My result: Error: 'xmin' not less than 'xmax' Any advise would be greatly appreciated. Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
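Rubén's recipe — one R function holding both the process model and a probability model for the data, minimised with optimise() — can be sketched with simulated data and an assumed log-normal error model; the poster's actual likelihood is unknown, so everything below is illustrative.

```r
# Minimal sketch: negative log-likelihood as a function of the single
# parameter k, under an ASSUMED log-normal error model and made-up data.
set.seed(1)
T  <- 1:10
LM <- runif(10, 1, 2)
k.true <- 0.5
LR <- exp(-k.true * T) * LM * exp(rnorm(10, sd = 0.1))   # simulated data

negll <- function(k) {
  mu <- log(exp(-k * T) * LM)                  # process model on the log scale
  -sum(dnorm(log(LR), mean = mu, sd = 0.1, log = TRUE))  # likelihood model
}

fit <- optimise(negll, interval = c(0.01, 2))  # one parameter: use optimise()
fit$minimum    # MLE of k; close to the true value 0.5 here
```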
Re: [R] geoR - likfit failure
Questions about geoR should be directed to R-SIG-GEO. Anyway, you should provide more info about your problem. Read the Posting Guide. Have you tried changing the model? Sometimes falling back from Matern to exponential or Gaussian allows successful convergence. HTH Rubén De: r-help-boun...@r-project.org en nombre de Mohammed Ouassou Enviado el: mié 04/08/2010 10:36 Para: R-help@r-project.org Asunto: [R] geoR - likfit failure Hi I'm using the geoR package to perform linear spatial interpolation (OK). The function likfit() fails to compute REML. The error message is: Error in solve.default(v$varcov, xmat). How can I find out that likfit() has failed, and retrieve the error message? Thank you so much for your help. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
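On the poster's side question — detecting that a call failed and retrieving the error message — the generic answer in R is tryCatch(); a sketch with stop() standing in for the failing likfit() call:

```r
# Trap an error instead of letting it abort the script; stop() here stands
# in for a likfit(...) call that fails with a singular-matrix error.
res <- tryCatch(
  stop("singular matrix"),        # replace with the real likfit(...) call
  error = function(e) e           # return the condition object on failure
)
if (inherits(res, "error")) {
  message("model fit failed: ", conditionMessage(res))
}
# otherwise res is the successful fit object
```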
Re: [R] ed50
> -----Original Message----- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] > On Behalf Of Dipa Hari > Sent: Monday, 12 July 2010 22:19 > To: r-help@r-project.org > Subject: [R] ed50 > > I am using a semiparametric model > library(mgcv) > sm1=gam(y~x1+s(x2),family=binomial, f) > > How should I find the standard error for ED50 for the above > model? ED50 = ( -sm1$coef[1]-f(x2)) / sm1$coef[2] > > f(x2) is the estimated value for the nonparametric term. > > Thanks Two ways: 1) Re-parameterise the model so that ED50 is an explicit parameter in the model, or 2) Taylor series (a.k.a. Delta method) using the standard errors of coef[1] and coef[2] and their correlation. HTH ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
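Option (2) can be sketched as follows. The gam object sm1 is the quoted fit; the value s of the smooth term at the point of interest is an assumption, and treating it as a fixed constant understates the uncertainty:

```r
# Delta-method sketch for SE(ED50), with ED50 = (-b0 - s)/b1.
# 'sm1' is the quoted gam fit; 's' (the smooth's value) is assumed fixed.
b <- coef(sm1)[1:2]            # intercept b0 and slope b1 of x1
V <- vcov(sm1)[1:2, 1:2]       # their covariance matrix
s <- 0.3                       # assumed value of the smooth at x2
ed50 <- (-b[1] - s) / b[2]
g <- c(-1/b[2], (b[1] + s)/b[2]^2)   # gradient of ED50 w.r.t. (b0, b1)
se.ed50 <- sqrt(as.numeric(t(g) %*% V %*% g))
```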
Re: [R] selection of optim parameters
> -----Original Message----- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] > On Behalf Of Fabian Gehring > Sent: Monday, 5 July 2010 21:53 > To: r-help@r-project.org > Subject: [R] selection of optim parameters > > Hi all, > > I am trying to rebuild the results of a study using a > different data set. I'm using about 450 observations. The > code I've written seems to work well, but I have some > trouble minimizing the negative of the log-likelihood > function using 5 free parameters. > > As starting values I am using the results of the paper I am rebuilding. > The system.time of the calculation of the function is about 0.65 sec. > Since the free parameters should be within some boundaries I > am using the following command: > > optim(fn=calculateLogLikelyhood, c(0.4, 2, 0.4, 8, 0.8), > lower=c(0,0,0,0,0), upper=c(1, 5, Inf, Inf, 1), control=list(trace=1, > maxit=1000)) > > Unfortunately the result doesn't seem to be reasonable. 3 of > the optimized parameters are on the boundaries. 1) Your parameters seem to vary over several orders of magnitude. The control argument has a parscale parameter that can be used to re-scale all parameters to the same order of magnitude. Alternatively, you could estimate the log of your parameters, say par=c(log(0.4), log(2), log(0.4), log(8), log(0.8)) (with equivalent changes in lower and upper), and in your function, instead of the parameter value, use exp(parameter value). That way the _numerical optimization_ occurs in log space whereas the _function evaluation_ occurs in the original space. This transformation approach means your parameter estimates and their hessian matrix (in case you are interested in the hessian) will be output in the transformed space, so estimates and their covariance matrix have to be back-transformed. For the estimates just use exp(), whereas for the covariance matrix you might have to use something like a Taylor series. 
2) Did you use "L-BFGS-B" in the method argument of optim()? This method admits box-constrained optimization whereas the default (which you seem to be using, Nelder-Mead) is unconstrained, as far as I know. > Unfortunately I don't have much experience using > optimization methods. > That's why I am asking you. > Do you have any hints on what should be taken into > account when doing such an optimization? > > Is there a good way to implement the boundaries in the code > (instead of doing it while optimizing)? I've read about > parscale in the help section. Unfortunately I don't really > know how to use it. And anyway, could this help? What other > points/controls should be taken into account? About using parscale, see Ravi Varadhan's post "Re: optim() not finding optimal values" of the 27th of June. HTH Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
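The two suggestions (log-scale parameters plus box constraints) can be combined in one call. A sketch, with the poster's unseen objective replaced by a placeholder and the bounds translated to log space (log(0) becomes -Inf):

```r
# Sketch: optimize in log space with L-BFGS-B box constraints.
# calculateLogLikelyhood is the poster's objective (not shown), so a
# quadratic placeholder stands in for it here.
calculateLogLikelyhood <- function(p) sum((p - c(0.5, 2, 1, 7, 0.7))^2)
negll.log <- function(logp) calculateLogLikelyhood(exp(logp))
start <- log(c(0.4, 2, 0.4, 8, 0.8))
fit <- optim(start, negll.log, method = "L-BFGS-B",
             lower = rep(-Inf, 5),                   # log(0) = -Inf
             upper = c(log(1), log(5), Inf, Inf, log(1)),
             control = list(trace = 1, maxit = 1000), hessian = TRUE)
exp(fit$par)   # estimates back on the original scale
```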
Re: [R] optim() not finding optimal values
Derek, As a general strategy, and as an alternative to parscale when using optim, you can just estimate the logarithm of your parameters. So in optim the par argument would contain the logarithm of your parameters, whereas in the model itself you would write exp(par) in place of the parameter. The purpose of this is to bring all parameters to a similar scale to aid the numerical algorithm in finding the optimum over several dimensions. Due to the functional invariance property of maximum likelihood estimates, your pars transformed back to the original scale are also the MLEs of the pars in your model. If you were using ADMB you'd get the standard errors of the pars in the original scale simply by declaring them of the sdreport_number class. With optim, you would get the standard errors of pars in the original scale post hoc by using a Taylor series (a.k.a. Delta method), which in this case is very simple since the transformation is just the exponential. In relation to your model/data combination, since you have only 17 years of data and just one series of cpue, and this is a rather common case, you may want to give the choice to set B0=K, i.e. equilibrium conditions at the start, in your function, to reduce the dimensionality of your profile likelihood approximation, thus helping the optimizer. HTH Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -----Original Message----- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] > On Behalf Of Derek Ogle > Sent: Saturday, 26 June 2010 22:28 > To: R (r-help@R-project.org) > Subject: [R] optim() not finding optimal values > > I am trying to use optim() to minimize a sum-of-squared > deviations function based upon four parameters. The basic > function is defined as ... 
> > SPsse <- function(par,B,CPE,SSE.only=TRUE) { > n <- length(B) # get number of > years of data > B0 <- par["B0"]# isolate B0 parameter > K <- par["K"] # isolate K parameter > q <- par["q"] # isolate q parameter > r <- par["r"] # isolate r parameter > predB <- numeric(n) > predB[1] <- B0 > for (i in 2:n) predB[i] <- > predB[i-1]+r*predB[i-1]*(1-predB[i-1]/K)-B[i-1] > predCPE <- q*predB > sse <- sum((CPE-predCPE)^2) > if (SSE.only) sse > else list(sse=sse,predB=predB,predCPE=predCPE) > } > > My call to optim() looks like this > > # the data > d <- data.frame(catch= > c(9,113300,155860,181128,198584,198395,139040,109969,71896 > ,59314,62300,65343,76990,88606,118016,108250,108674), > cpe=c(109.1,112.4,110.5,99.1,84.5,95.7,74.1,70.2,63.1,66.4,60. > 5,89.9,117.0,93.0,116.6,90.0,105.1)) > > pars <- c(80,100,0.0001,0.17) # put > all parameters into one vector > names(pars) <- c("B0","K","q","r") # > name the parameters > ( SPoptim <- optim(pars,SPsse,B=d$catch,CPE=d$cpe) )# run optim() > > > This produces parameter estimates, however, that are not at > the minimum value of the SPsse function. For example, these > parameter estimates produce a smaller SPsse, > > parsbox <- c(732506,1160771,0.0001484,0.4049) > names(parsbox) <- c("B0","K","q","r") > ( res2 <- SPsse(parsbox,d$catch,d$cpe,SSE.only=FALSE) ) > > Setting the starting values near the parameters shown in > parsbox even resulted in a movement away from (to a larger > SSE) those parameter values. > > ( SPoptim2 <- optim(parsbox,SPsse,B=d$catch,CPE=d$cpe) )# > run optim() > > > This "issue" most likely has to do with my lack of > understanding of optimization routines but I'm thinking that > it may have to do with the optimization method used, > tolerance levels in the optim algorithm, or the shape of the > surface being minimized. > > Ultimately I was hoping to provide an alternative method to > fisheries biologists who use Excel's solver routine. 
> > If anyone can offer any help or insight into my problem here > I would be greatly appreciative. Thank you in advance. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
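Using the SPsse() function and data frame d quoted above, one way to act on the scaling advice is optim()'s parscale control — a sketch; the starting values, the method choice, and the tightened reltol are assumptions, not the poster's:

```r
# Sketch: refit with parameters rescaled to comparable magnitudes.
# Assumes SPsse() and d from the quoted message are already defined.
pars <- c(B0 = 7e5, K = 1.2e6, q = 1.5e-4, r = 0.4)   # assumed starts
fit <- optim(pars, SPsse, B = d$catch, CPE = d$cpe, method = "BFGS",
             control = list(parscale = abs(pars), maxit = 5000,
                            reltol = 1e-12))
fit$par
```

With parscale set, a unit step in the optimizer's internal space moves each parameter by roughly its own magnitude, which helps when B0 and K are of order 1e5-1e6 while q is of order 1e-4.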
Re: [R] Problem with package installation
It was the Panda antivirus installed on the main corporate server of my company. This antivirus program was corrupting the zip files containing the packages for Windows systems. The problem was solved by including the CRAN repositories as trusted web pages. Best, Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -----Original Message----- > From: Uwe Ligges [mailto:lig...@statistik.tu-dortmund.de] > Sent: Monday, 21 June 2010 16:52 > To: Rubén Roa > CC: r-help@r-project.org > Subject: Re: [R] Problem with package installation > > No idea, do you have the right repository selected? Is there > any proxy in your network that may change the download data > for some reason? I have not observed that before and I > haven't seen similar reports before. > > Best, > Uwe Ligges > > On 21.06.2010 11:03, Rubén Roa wrote: > > > > Dear ComRades, > > > > I am having the "wrong MD5 checksums" error with every > package I try to install. > > It happened with R 2.11.0, then I updated to R 2.11.1, same thing. > > > > sessionInfo() > > R version 2.11.1 (2010-05-31) > > i386-pc-mingw32 > > > > locale: > > [1] LC_COLLATE=Spanish_Spain.1252 > LC_CTYPE=Spanish_Spain.1252 LC_MONETARY=Spanish_Spain.1252 > > [4] LC_NUMERIC=C LC_TIME=Spanish_Spain.1252 > > > > attached base packages: > > [1] stats graphics grDevices utils datasets methods base > > > > loaded via a namespace (and not attached): > > [1] tools_2.11.1 > > > > I tried changing the mirror, to no avail. 
Then I tried downloading > > zip files and installing from local zips but again I got the "wrong > > MD5 checksums" error > > > > utils:::menuInstallLocal() > > files R/MASS.rdb, data/Rdata.rdb, help/MASS.rdb have the wrong MD5 > > checksums > > > > In some cases, another error message pointed to a namespace > problem, for example: > > > > library(Formula) > > Error in get(Info[i, 1], envir = env) : > >internal error -3 in R_decompress1 > > Error: package/namespace load failed for 'Formula' > > > > Any hints? > > > > > __ > > __ > > > > Dr. Rubén Roa-Ureta > > AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g > > 48395 Sukarrieta (Bizkaia) > > SPAIN > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem with package installation
Dear ComRades, I am having the "wrong MD5 checksums" error with every package I try to install. It happened with R 2.11.0, then I updated to R 2.11.1, same thing. sessionInfo() R version 2.11.1 (2010-05-31) i386-pc-mingw32 locale: [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 LC_MONETARY=Spanish_Spain.1252 [4] LC_NUMERIC=C LC_TIME=Spanish_Spain.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_2.11.1 I tried changing the mirror, to no avail. Then I tried downloading zip files and installing from local zips, but again I got the "wrong MD5 checksums" error utils:::menuInstallLocal() files R/MASS.rdb, data/Rdata.rdb, help/MASS.rdb have the wrong MD5 checksums In some cases, another error message pointed to a namespace problem, for example: library(Formula) Error in get(Info[i, 1], envir = env) : internal error -3 in R_decompress1 Error: package/namespace load failed for 'Formula' Any hints? ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [ADMB Users] an alternative to R for nonlinear stat models
From: Chris Gast [mailto:cmg...@gmail.com] Sent: Thursday, 17 June 2010 22:32 To: Rubén Roa CC: r-help@r-project.org; us...@admb-project.org Subject: Re: [ADMB Users] an alternative to R for nonlinear stat models I spoke with my colleague who did most of the testing, and he has informed me that much of the hessian sensitivity actually came from a separate program (based on Numerical Recipes in C++ code) that did not use optim(), after having stopped using optim() due to speed issues. In my experience with optim, the reltol argument has proved important in this regard. Very small changes in the parameter estimates at the converged solution (influenced by reltol) can lead to different standard error estimates from inverting the hessian, especially for parameter estimates close to zero (as vulnerability coefficients can be in many models with such a feature). It is a limitation of the finite difference method for computing the hessian based on optimal parameter estimates. Chris If the problem originates in estimates being close to zero, a simple transformation (log or exp) might help. However, what I would really like to know is the performance of the different methods of the optim function. I guess the reltol parameter and other control parameters act differently depending on the method, whether Nelder-Mead, CG, etc. I am currently experimenting with CG and although it is slow for my model, it has always produced hessian matrices that were invertible and with all diagonals positive after inversion (unlike Nelder-Mead, the default). Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
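The method comparison discussed above can be set up on a toy objective — a sketch; the normal log-likelihood and the data are made up purely for illustration:

```r
# Sketch: same objective, two optim methods, compare hessian-based SEs.
set.seed(1)
x <- rnorm(200, mean = 5, sd = 2)
negll <- function(p) -sum(dnorm(x, p[1], exp(p[2]), log = TRUE))  # sd on log scale
fit.nm <- optim(c(0, 0), negll, method = "Nelder-Mead", hessian = TRUE)
fit.cg <- optim(c(0, 0), negll, method = "CG", hessian = TRUE)
sqrt(diag(solve(fit.nm$hessian)))   # SEs of (mean, log sd), Nelder-Mead
sqrt(diag(solve(fit.cg$hessian)))   # same quantities under CG
```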
Re: [R] [ADMB Users] an alternative to R for nonlinear stat models
From: users-boun...@admb-project.org [mailto:users-boun...@admb-project.org] On Behalf Of Chris Gast Sent: Wednesday, 16 June 2010 21:11 To: Arni Magnusson CC: r-help@r-project.org; us...@admb-project.org Subject: Re: [ADMB Users] an alternative to R for nonlinear stat models Hi Arni (and others), My dissertation work involves the use (and extension) of models of the same ilk (sometimes exactly the same) as those described by Nancy Gove and John Skalski in their 2002 article. I began with R, and moved to my own home-brewed C/C++ programs for the sake of speed when fitting models to real and simulated data. In addition, we found that the estimated standard errors (based on the inverse hessian output from optim()) were very sensitive to tolerance criteria--often changing orders of magnitude. Hi, Regarding the last bit, optim() has several methods (Nelder-Mead, simulated annealing, conjugate gradient, etc.). It would be interesting to know which method produced what result with the standard errors from the inverse hessian. Can you briefly elaborate? Thanks Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] adding year information to a scatter plot
> -----Original Message----- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] > On Behalf Of Ozan Bakis > Sent: Sunday, 2 May 2010 21:25 > To: r-help@r-project.org > Subject: [R] adding year information to a scatter plot > > Hi R users, > > I would like to add year information to each point in a > scatter plot. The following example shows what I mean: > > df=data.frame(year=c(2001,2002,2003),a=c(8,9,7),b=c(100,90,95)) > df > plot(b~a,data=df) > > Now, I would like to see which point belongs to 2001, 2002 > and 2003 in the above graphic. > > Thank you very much, > ozan Use text(), df <- data.frame(year=c(2001,2002,2003),a=c(8,9,7),b=c(100,90,95)) df plot(b~a,data=df,ylim=c(.98*min(df$b),1.02*max(df$b))) text(x=df$a,y=df$b-.01*max(df$b),lab=format(df$year)) ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] non linear estimation
> -----Original Message----- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] > On Behalf Of JamesHuang > Sent: Thursday, 29 April 2010 3:38 > To: r-help@r-project.org > Subject: Re: [R] non linear estimation > > > any suggestion? actually I just want to know if there is a > package for nonlinear estimation with restrictions, thanks. I > am new to R I do not know if there is any specific package for optimization with restrictions, but you can use optim with method="L-BFGS-B". This only lets you set bounds on single parameters, so for restrictions such as a+b<19 in Y=a+(b+c*x)*exp(-d*x) you could reduce your restrictions to single-parameter bounds (in your original mail you put that a>10, a+b<19, and b<3), or else you could think of some re-parameterization that would put a+b (and any other multi-parameter restriction) as a single parameter. Wait, is this homework? ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
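The re-parameterization idea can be sketched like this: introduce s = a+b as a parameter so the constraint a+b<19 becomes an upper bound on s. The data, starting values, and least-squares objective are all made up for illustration:

```r
# Sketch: enforce a+b < 19 in Y = a + (b + c*x)*exp(-d*x) by
# optimizing over (a, s, c, d) with s = a + b, so b = s - a.
set.seed(2)
x <- seq(0, 10, by = 0.1)
y <- 12 + (3 + 1*x) * exp(-0.3*x) + rnorm(length(x), 0, 0.2)  # fake data
fn <- function(p) {
  a <- p[1]; b <- p[2] - p[1]; c. <- p[3]; d <- p[4]  # p[2] is s = a+b
  sum((y - (a + (b + c.*x) * exp(-d*x)))^2)
}
optim(c(11, 16, 1, 0.2), fn, method = "L-BFGS-B",
      lower = c(10, -Inf, -Inf, 1e-6),   # a > 10
      upper = c(Inf, 19, Inf, Inf))      # s = a+b < 19
```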
Re: [R] Converting daily data series to monthly series
> -----Original Message----- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] > On Behalf Of zow...@ncst.go.ke > Sent: Wednesday, 21 April 2010 5:16 > To: r-help@r-project.org > Subject: [R] Converting daily data series to monthly series > > Hi Users, > I have a daily series of data from 1962 - 2000, with the data > for February 29th in leap years excluded, leaving 365 daily > values for each year. I wish to convert the daily series to > a monthly series. How can I do this using the zoo package or > any other package? > > Thanks > > ZABLONE OWITI > GRADUATE STUDENT > Nanjing University of Information, Science and Technology > College of International Education Let df be your data frame, #In case you have to format your data as Date before extracting the month df$date <- as.Date(df$date, "%d/%m/%Y") #Getting year, month and week from your correctly formatted date df$Year <- as.numeric(format(df$date, "%Y")) #Year df$Month <- as.numeric(format(df$date, "%m")) #Month df$Week <- as.numeric(format(df$date, "%W")) + 1 #Week Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
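Once Year and Month exist, the daily-to-monthly collapse itself can be done with aggregate() — a sketch on made-up data; the column name 'value' and the choice of mean (rather than sum) are assumptions:

```r
# Sketch: collapse a daily series to monthly means.
df <- data.frame(date = seq(as.Date("1962-01-01"), as.Date("1962-12-31"),
                            by = "day"))
df$value <- rnorm(nrow(df))                    # placeholder data
df$Year  <- as.numeric(format(df$date, "%Y"))
df$Month <- as.numeric(format(df$date, "%m"))
monthly <- aggregate(value ~ Year + Month, data = df, FUN = mean)
monthly <- monthly[order(monthly$Year, monthly$Month), ]
head(monthly)
```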
Re: [R] Truncated Normal Distribution and Truncated Pareto distribution
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Julia Cains Sent: Monday, 19 April 2010 9:22 To: r-help@r-project.org Subject: [R] Truncated Normal Distribution and Truncated Pareto distribution Dear R helpers, I have a bimodal dataset dealing with loss amounts. I have divided this dataset into two, with the bounds for the first dataset, i.e. dataset-A, being 5,000$ to 100,000$, while dataset-B deals with the losses exceeding 100,000$, i.e. dataset-B is left truncated. I need to fit a truncated normal distribution to dataset-A, having a lower bound of 5,000 and an upper bound of 100,000, while I need to fit a truncated Pareto for the losses exceeding 100,000$. Is there any package in R which will guide me to fit these two distributions, also giving KS (Kolmogorov-Smirnov) test and Anderson-Darling test results? Please guide Julia --- See library(MASS) ?fitdistr You can define your customized truncated density as a function in the densfun parameter of fitdistr. See also http://www.mail-archive.com/r-h...@stat.math.ethz.ch/msg34540.html http://www.mail-archive.com/r-h...@stat.math.ethz.ch/msg34548.html HTH Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN Only a man of Worth sees Worth in other men __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
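A sketch of the fitdistr() route for dataset-A: the truncated-normal density is written out by hand, and the placeholder data and starting values are assumptions:

```r
library(MASS)
# Sketch: fit a normal truncated to [5e3, 1e5] by maximum likelihood.
dtnorm <- function(x, mean, sd, a = 5e3, b = 1e5) {
  # normal density renormalized over the truncation interval [a, b]
  dnorm(x, mean, sd) / (pnorm(b, mean, sd) - pnorm(a, mean, sd))
}
set.seed(3)
lossA <- runif(500, 5e3, 1e5)     # placeholder for the real losses
fit <- fitdistr(lossA, dtnorm, start = list(mean = 4e4, sd = 3e4))
fit
```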
Re: [R] Scanning only specific columns into R from a VERY large file
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Josh B Sent: Saturday, 17 April 2010 0:12 To: R Help Subject: [R] Scanning only specific columns into R from a VERY large file Hi, I turn to you, the R Sages, once again for help. You've never let me down! (1) Please make the following toy files: x <- read.table(textConnection("var.1 var.2 var.3 var.1000 indv.1 1 5 9 7 indv.21 2 9 3 8"), header = TRUE) y <- read.table(textConnection("var.3 var.1000"), header = TRUE) write.csv(x, file = "x.csv") write.csv(y, file = "y.csv") (2) Pretend you are starting with the files "x.csv" and "y.csv." They come from another source -- an online database. Pretend that these files are much, much, much larger. Specifically: (a) Pretend that "x.csv" contains 1000 columns by 210,000 rows. (b) "y.csv" contains just header titles. Pretend that there are 90 header titles in "y.csv" in total. These header titles are a subset of the header titles in "x.csv." (3) What I want to do is scan (or import, or whatever the appropriate word is) only a subset of the columns from "x.csv" into R. Specifically, I only want to scan the columns of data from "x.csv" into R that are indicated in the file "y.csv." I still want to scan in all 210,000 rows from "x.csv," but only for the aforementioned columns listed in "y.csv." Can you guys recommend a strategy for me? I think I need to use the scan command, based on the hugeness of "x.csv," but I don't know what exactly to do. Specific code that gets the job done would be the most useful. Thank you very much in advance! Josh --- Try something like do.call("cbind",scan(file="yourfile.csv",sep=",",what=list(NULL,NULL,0,NULL,0,NULL,NULL,...,NULL),flush=TRUE)) You have to work out how to set up the list in the 'what' parameter to match the headers of 'y'. In the above, the only columns read are those indicated by a '0'. HTH Ruben Dr. 
Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
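An alternative sketch with read.csv(), where colClasses = "NULL" (the string, not the R NULL) tells R to skip a column entirely; the file names are the toy ones from the question:

```r
# Sketch: read only the columns of x.csv whose names appear in y.csv.
hdr  <- names(read.csv("x.csv", nrows = 1))   # all column names
keep <- names(read.csv("y.csv", nrows = 1))   # wanted column names
cls  <- ifelse(hdr %in% keep, NA, "NULL")     # NA = let R guess the type
x.sub <- read.csv("x.csv", colClasses = cls)
```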
Re: [R] From THE R BOOK -> Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Corrado Sent: Tuesday, 30 March 2010 16:52 To: r-help@r-project.org Subject: [R] From THE R BOOK -> Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm! Dear friends, I am testing glm as at page 514/515 of THE R BOOK by M. Crawley, that is, on proportion data. I use glm(y~x1+,family=binomial) where y is a proportion in (0,1) and x is a real number. I get the error: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm! But that is exactly what was suggested in the book, where there is no mention of a similar warning. Where am I going wrong? Here is the output: > glm(response.prepared~x,data=,family=binomial) Call: glm(formula = response.prepared ~ x, family = binomial, data = ) Coefficients: (Intercept) x -0.3603 0.4480 Degrees of Freedom: 510554 Total (i.e. Null); 510553 Residual Null Deviance: 24420 Residual Deviance: 23240 AIC: 700700 Warning message: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm! > Regards -- Corrado Topi PhD Researcher Global Climate Change and Biodiversity Area 18, Department of Biology University of York, York, YO10 5YW, UK Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk --- Are you perhaps misreading Crawley's book? A proportion would usually be modeled with the Beta distribution, not the binomial, which is for counts. If you are modeling a proportion, try the betareg function in the betareg package. HTH Ruben ____ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
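The suggested beta-regression route, as a sketch on simulated data (the betareg package must be installed; responses must lie strictly inside (0,1), as noted at the top of this thread):

```r
# Sketch: beta regression for a proportion response in (0,1).
library(betareg)
set.seed(4)
x <- runif(200)
y <- plogis(-0.4 + 0.5 * x + rnorm(200, 0, 0.3))  # strictly in (0,1)
fit <- betareg(y ~ x)   # same formula syntax as glm()
summary(fit)
```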
Re: [R] remove from the R mailing list
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of zoe zhang Sent: Tuesday, 30 March 2010 12:18 To: r-help@r-project.org Subject: [R] remove from the R mailing list I would like to be removed from the R mailing list. Thanks. --- Hi, Would you like me to remove you? Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] add information above bars of a barplot()
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Martin Batholdy Sent: Monday, 22 March 2010 15:53 To: r help Subject: [R] add information above bars of a barplot() hi, I have a barplot with six clusters of four bars each. Now I would like to add the exact value of each bar as a number above the bar. I hoped to get some tips here. I could simply add text at the different positions, but I don't understand how the margins on the x-axis are calculated (how can I get / calculate the x-ticks of a barplot?). Also I would like to code this flexibly enough so that it still works when I have more bars in each cluster. thanks for any suggestions! If you are barplotting x, bp <- barplot(x) text(x=bp, y=x, labels=format(x), pos=3) should get you closer to what you want; barplot() returns the x-coordinates of the bar midpoints. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
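Extended to the clustered case from the question — a sketch on made-up data; the 4 x 6 matrix and the 15% headroom in ylim are assumptions. Because barplot() returns midpoints in the same layout as the input matrix, the labelling code is independent of the number of bars per cluster:

```r
# Sketch: six clusters of four bars, each bar labelled with its value.
set.seed(5)
m <- matrix(sample(5:20, 24, replace = TRUE), nrow = 4, ncol = 6)
bp <- barplot(m, beside = TRUE, ylim = c(0, 1.15 * max(m)))
text(x = bp, y = m, labels = m, pos = 3, cex = 0.8)  # bp matches m's layout
```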
Re: [R] tm[,-1]
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of ManInMoon Sent: Thursday, 11 March 2010 12:22 To: r-help@r-project.org Subject: [R] tm[,-1] This does what I was hoping it would: aggregate(tm[,-1],by=list(tm[,10]),FUN="mean") but I don't know what "tm[,-1]" means (well, the -1 bit anyway). Does it somehow mean the whole matrix? --- No, it means the matrix 'tm' minus its 1st column: (a <- matrix(rnorm(16,4,5),4,4)) a[,-1] ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Tinn-R RGui Send problem
ComRades, On a recent Windows XP system that I have to use at work, I installed Tinn-R 2.3.4.4 and R 2.10.1. When I mark-select a piece of code, say summary(x), written in Tinn-R and send it to the RGui using the send button, the RGui console, instead of showing me my piece of code, shows something like source(.trPaths[5], echo=FALSE, max.deparse.length=150) The code is executed fine, but if I want to re-call it to the console, say to make a little change, using the Up arrow of the keyboard, I do not get the code but again the line quoted above. That's not very useful. Checking .trPaths[5] shows that Tinn-R is writing a temporary file in ..\\Program Data\\tmp\\selection.r, and that file contains my code and is being overwritten with every new piece of code that I select and send to the console. Does anyone know what to do to make the console re-call the selected piece of code instead of the call to the temporary file? I guess it is a Tinn-R issue. Thanks Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] tcltk
I doubt it. Guess why? Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Vasco Cadavez Sent: Monday, 8 March 2010 15:46 To: r-help@r-project.org Subject: [R] tcltk Hi, I'm trying to install tcltk in R-2.10.1; however, I get an error. Can someone help? thanks Vasco __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Random real numbers
Lower case: vignette("Runuran") Sorry, it's this clever email program. R. ____ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rubén Roa Sent: Tuesday, 2 March 2010 14:13 To: frederik vanhaelst; r-h...@stat.math.ethz.ch Subject: Re: [R] Random real numbers >From what distribution? If the uniform, runif(100,0,2*pi) If another, install package Runuran, and do this ?Runuran Vignette("Runuran") HTH ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of frederik vanhaelst Sent: Tuesday, 2 March 2010 11:52 To: r-h...@stat.math.ethz.ch Subject: [R] Random real numbers Hi, how could I generate random real numbers between 0 and 2*pi? Thanks, Frederik __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Random real numbers
From what distribution? If the uniform, runif(100,0,2*pi) If another, install package Runuran, and do this ?Runuran Vignette("Runuran") HTH ____ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of frederik vanhaelst Sent: Tuesday, 2 March 2010 11:52 To: r-h...@stat.math.ethz.ch Subject: [R] Random real numbers Hi, How could i generate random real numbers between 0 and 2*pi? Thanks, Frederik [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Plotmath: superscript on scalable delimiter?
ComRades, How do you put a superscript on a scalable delimiter? I want to put 'b' as the power of the expression in the following plot: t <- 1:25 K <- 0.2 y <- ((1-exp(-K*t))/(1-exp(-K*t)*exp(K)))^3 plot(t,y,"l",main="K=0.2, b=3") text(15,5,expression(bgroup("(",frac(1-e^-Kt,1-e^-Kt*e^K),")"))) Plotmath examples in demo(plotmath) do not include this. I've tried a few things to no avail and I did an RSiteSearch("superscript delimiter") with zero results. Thx Rubén ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
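One answer that seems to work (hedged sketch; only the `^` placement matters here, so the surrounding plot is a dummy): plotmath lets a superscript be applied directly to the result of bgroup(), so the power b can be attached to the scalable parenthesis itself.

```r
# Sketch: put a superscript on a scalable delimiter by exponentiating bgroup()
plot(0:1, 0:1, type = "n", ann = FALSE)  # dummy plotting region
text(0.5, 0.5,
     expression(bgroup("(", frac(1 - e^{-K*t}, 1 - e^{-K*t} * e^K), ")")^b))
```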
Re: [R] discriminant analysis
Beatriz Yannicelli wrote: Dear all: Is it possible to conduct a discriminant analysis in R with categorical and continuous variables as predictors? Beatriz Beatriz, Simply doing this in the R console: RSiteSearch("discriminant") yields many promising links. In particular, check documentation of package mda. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
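For concreteness, a hedged sketch with package mda (the poster's data are not available, so the iris data stand in; categorical predictors would simply enter the formula as factor columns):

```r
# Sketch: mixture discriminant analysis with package mda on a stand-in dataset
library(mda)
fit <- mda(Species ~ ., data = iris)
table(predict(fit, iris), iris$Species)  # confusion matrix on the training data
```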
Re: [R] MLE for bimodal distribution
_nico_ wrote: Hello everyone, I'm trying to use mle from package stats4 to fit a bi/multi-modal distribution to some data, but I have some problems with it. Here's what I'm doing (for a bimodal distribution): # Build some fake binormally distributed data, the procedure fails also with real data, so the problem isn't here data = c(rnorm(1000, 3, 0.5), rnorm(500, 5, 0.3)) # Just to check it's bimodal plot(density(data)) f = function(m, s, m2, s2, w) { -log( sum(w*dnorm(data, mean=m, sd=s)) + sum((1-w)*dnorm(data, mean=m2, sd=s2)) ) } res = mle(f, start=list("m"=3, "s"=0.5, "m2"=5, "s2"=0.35, "w"=0.6)) This gives an error: Error in optim(start, f, method = method, hessian = TRUE, ...) : non-finite finite-difference value [2] In addition: There were 50 or more warnings (use warnings() to see the first 50) And the warnings are about dnorm producing NaNs So, my questions are: 1) What does "non-finite finite-difference value" mean? My guess would be that an Inf was passed somewhere where a finite number was expected...? I'm not sure how optim works though... 2) Is the log likelihood function I wrote correct? Can I use the same type of function for 3 modes? 1) It means optim hit an Inf or NaN while computing finite-difference derivatives: nothing constrains s and s2 to stay positive or w to stay in (0,1), so dnorm produces NaNs and the likelihood blows up. 2) Not quite: you are taking the log of sums of densities over all observations, whereas a mixture log-likelihood is the sum over observations of the log of each observation's mixture density, -sum(log(w*dnorm(data,m,s) + (1-w)*dnorm(data,m2,s2))). The same form extends to three modes with two free weights. Even written correctly, mixtures are difficult to fit, as others have pointed out. As an alternative, you may discretise your variable so the mixture is fit to counts on a histogram using a multinomial likelihood. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
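A self-contained sketch of a per-observation mixture negative log-likelihood (hedged: fake data as in the post; box constraints via L-BFGS-B are one way to keep the standard deviations positive and the weight inside (0,1)):

```r
library(stats4)

set.seed(1)
# fake bimodal data, as in the original post
data <- c(rnorm(1000, 3, 0.5), rnorm(500, 5, 0.3))

# negative log-likelihood: log of each observation's mixture density, summed
f <- function(m, s, m2, s2, w) {
  -sum(log(w * dnorm(data, mean = m, sd = s) +
           (1 - w) * dnorm(data, mean = m2, sd = s2)))
}

# box constraints keep the sds positive and the weight inside (0, 1)
res <- mle(f, start = list(m = 3, s = 0.5, m2 = 5, s2 = 0.35, w = 0.6),
           method = "L-BFGS-B",
           lower = c(-Inf, 1e-6, -Inf, 1e-6, 1e-6),
           upper = c( Inf,  Inf,  Inf,  Inf, 1 - 1e-6))
coef(res)
```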
Re: [R] R in the NY Times
Zaslavsky, Alan M. wrote: This article is accompanied by nice pictures of Robert and Ross. Data Analysts Captivated by Power of R http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html Thanks for the heads up. The R morale is going through the roof! I've given three courses on R since the second half of 2007 here in Chile (geostatistics, Fisheries Libraries for R, and generalized linear models) and all three of my audiences (professionals working in academia, government, and private research institutions) were very much impressed by the power of R. I spent as much time on R itself as on the statistical topics, since students wanted to learn data management and graphics once they started to grasp the basic elements. R creators, Core Team, package creators and maintainers, and experts on the list, thanks so much for such great work and such an open attitude. You lead by example. Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] AD Model Builder now freely available
dave fournier wrote: Hi All, Following Mike Praeger's posting on this list, I'm happy to pass on that AD Model Builder is now freely available from the ADMB Foundation. http://admb-foundation.org/ Two areas where AD Model Builder would be especially useful to R users are multi-parameter smooth optimization, as in large-scale maximum likelihood estimation, and the analysis of general nonlinear random effects models. Cheers, Dave Thank you Dave, this is really great news! So now I can download ADMB and ADMB-RE for free? And use Mike's X2R package to connect my ADMB scripts with the power of R. Only one word: awesome! Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fitting data to a sigmoidal curve
Greg Snow wrote: Sarah, Doing: RSiteSearch('gompertz', restrict='functions') At the command prompt gives several promising results. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. And you can also do:
nobs <- length(data$salam.size.observed)
fn <- function(p){
  # p[1] = salam.size.inf, p[2] = G1, p[3] = G2
  salam.size.model <- p[1]*exp(-p[2]*exp(-p[3]*data$age.observed)) # Gompertz model
  squdiff <- (salam.size.model - data$salam.size.observed)^2 # vector of squared differences
  sigmasqu <- sum(squdiff)/nobs # nuisance sigma parameter profiled out
  (nobs/2)*log(sigmasqu) # profile likelihood as approx. to true likelihood
}
(salam.size.likfit <- nlm(fn, p = c(60, 5, 0.5), hessian = TRUE)) # replace 60, 5, 0.5 with your own starting values for salam.size.inf, G1, G2
where data is a dataframe with observed sizes and ages. Invert the Hessian to obtain measures of precision. Note also that if you know the size at birth then you can re-parameterize, put size at birth instead of asymptotic size (salam.size.inf), and reduce model complexity to two parameters (but of course the Gompertz formula changes). If the Gompertz model is not flexible enough, note that it is a particular case of another more flexible, and more complex, model, Schnute's, which I have found often provides a better explanation of growth data (as measured by the AIC), especially if you have sizes of very young individuals. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fitting a modified logistic with glm?
Mike Lawrence wrote: On Sat, Nov 8, 2008 at 3:59 PM, Rubén Roa-Ureta <[EMAIL PROTECTED]> wrote: ... The fit is for grouped data. ... As illustrated in my example code, I'm not dealing with data that can be grouped (x is a continuous random variable). Four points: 1) I showed you an approach that you can follow in order to fit a modified (custom-made) logistic model; it's up to you to follow this approach and modify it accordingly, as it was never my intention to give you the precise code for your problem (obviously). 2) If you do follow this approach and you keep your predictor continuous, you have to change the line of the likelihood definition. 3) If it is really true that your predictor is a *random* variable, then you do not have a simple glm (you may have a glmm). And 4) you can always make your continuous predictor categorical, for example, as when you make a histogram of said predictor. R. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fitting a modified logistic with glm?
Mike Lawrence wrote: Hi all, Where f(x) is a logistic function, I have data that follow: g(x) = f(x)*.5 + .5 How would you suggest I modify the standard glm(..., family='binomial') function to fit this? Here's an example of a clearly ill-advised attempt to simply use the standard glm(..., family='binomial') approach: # First generate some data #define the scale and location of the modified logistic to be fit location = 20 scale = 30 # choose some x values x = runif(200,-200,200) # generate some random noise to add to x in order to # simulate real-word measurement and avoid perfect fits x.noise = runif(length(x),-10,10) # define the probability of success for each x given the modified logistic prob.success = plogis(x+x.noise,location,scale)*.5 + .5 # obtain y, the observed success/failure at each x y = rep(NA,length(x)) for(i in 1:length(x)){ y[i] = sample( x = c(1,0) , size = 1 , prob = c(prob.success[i], 1-prob.success[i]) ) } #show the data and the source modified logistic plot(x,y) curve(plogis(x,location,scale)*.5 + .5,add=T) # Now try to fit the data #fit using standard glm and plot the result fit = glm(y~x,family='binomial') curve(plogis(fit$coefficients[1]+x*fit$coefficients[2])*.5+.5,add=T,col='red',lty=2) # It's clear that it's inappropriate to use the standard "glm(y~x,family='binomial')" # method to fit the modified logistic data, but what is the alternative? A custom-made fit function using nlm? Below, in the second line the logistic model has been re-parameterized to include p[1]=level of the predictor which predicts a 50% probability of success, and p[2]=level of the predictor which predicts a 95% probability of success. You can change the model in this line to your version. In the 4th line (negative log-likelihood) mat is a vector of 0s and 1s representing success and imat is a vector of 1s and 0s representing failure. The fit is for grouped data. 
fn <- function(p){
  # assumes grouped data already defined in the workspace:
  # l = predictor level of each group, count = observations per group,
  # mat = 0/1 success indicator, imat = 1 - mat (failure indicator)
  prop.est <- 1/(1+exp(log(1/19)*(l-p[1])/(p[2]-p[1]))) # estimated proportion of success
  iprop.est <- 1-prop.est # estimated proportion of failure
  -sum(count*(mat*log(prop.est)+imat*log(iprop.est))) # binomial negative log-likelihood
}
prop.lik <- nlm(fn,p=c(25.8,33.9),hessian=TRUE)
HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] descretizing xy data
Rubén Roa-Ureta wrote: [earlier solution using cut() and hist() quoted in full; see the original reply below] BTW, if you add the line below, text(x=y$X1,y=y$X4+.01,format(y$X2),cex=.5) you show the sample size at each proportion. R. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] descretizing xy data
Jon A wrote: Hello, I have a dataset with a continuous independent variable (fish length, range: 30-150 mm) and a binary response (foraging success, 0 or 1). I want to discretize fish length into 5 mm bins and give the proportion of individuals who successfully foraged in each each size bin. I have used the cut function to discretize the length values into my desired bins, but I can't figure out how to manipulate my response data in terms of the levels I've created. Any advice on how to achieve my task? Thanks in advance. You have the option of using catspec. Here is another, more transparent solution, using hist(). lb <- 30 ub <- 150 bk <- 5 x <- data.frame(cbind(runif(1000,lb,ub),rbinom(1000,1,0.75))) x$X3 <- cut(x$X1,breaks=(ub-lb)/bk,labels=FALSE) y <- data.frame(cbind(hist(x$X1,breaks=(ub-lb)/bk,plot=FALSE)$breaks[-1],hist(x$X1,breaks=(ub-lb)/bk,plot=FALSE)$counts,0)) for (i in 1:length(y$X1)) { for (j in 1:length(x$X1)) { if(identical(x$X3[j],i)) y$X3[i] <- y$X3[i]+x$X2[j] } } sum(x$X2) #check that counting was correct sum(y$X3) y$X4 <- ifelse(y$X3 > 0,y$X3/y$X2,NA) plot(y$X1,y$X4,pch=19) abline(v=hist(x$X1,breaks=(ub-lb)/bk,plot=FALSE)$breaks) HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Distributions Comparison
Igor Telezhinsky wrote: Dear all, I have recently found out about the R project. Could anyone tell me if it is possible to make the comparison of two distributions using R packages? The problem is TRICKY because I have the distributions of measurements and each measurement in the given distribution has its own error, which I need to take into account when comparing these distributions. If anyone knows it is possible with R, could you please tell me what package to use. I will really appreciate some hints in code as well. Thank you. Dear Igor, Welcome to the list and to R. Please read the posting guide and study your problem before attempting to seek help from us. Your question as it stands is too ambiguous to answer. Try to make a toy example or formulate it better and you may have better luck. Ruben __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot the chi square distribution and the histogram in the same graph
leo_wa wrote: In the previous post ,i ask how to plot the normal curve and the histogram in the same graph.if i want to know how to plot the chi square distribution to campare the data set ,how can i do that? You should make up your mind: is your random variable X, defined on (-inf,+inf), or Sum(X^2), which obviously cannot take negative values? They are quite different. Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
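If the data really are sums of squared standard normals, the comparison can be sketched like this (hedged: df = 3 is an arbitrary choice for illustration):

```r
set.seed(1)
# each observation is a sum of 3 squared standard normals, i.e. chi-squared with df = 3
x <- rowSums(matrix(rnorm(3000)^2, ncol = 3))
hist(x, prob = TRUE, breaks = 30)
curve(dchisq(x, df = 3), add = TRUE, col = "blue")
```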
Re: [R] how to plot the histogram and the curve in the same graph
leo_wa wrote: i want to plot the histogram and the curve in the same graph.if i have a set of data ,i plot the histogram and also want to see what distribution it was.So i want to plot the curve to know what distribution it like. To draw the curve of a distribution you first need an idea of which distribution it is. You cannot just draw the histogram and expect R to produce a curve of the best distribution to fit that histogram. But you can plot a kernel density curve, or fit a candidate distribution by maximum likelihood and overlay its density: x <- rnorm(1000,5,3) library(MASS) (x.normfit <- fitdistr(x,"normal")) hist(x,prob=TRUE) lines(density(x,na.rm=TRUE),col="red") # kernel density curve(dnorm(x,mean= x.normfit$estimate[1],sd= x.normfit$estimate[2]),col="blue",add=TRUE) # maximum likelihood estimate Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] distributions and glm
drbn wrote: Hello, I have seen that some papers do this: 1.) Group data by year (e.g. 35 years) 2.) Estimate the mean of the key variable through the distribution that fits better (some years is a normal distribution , others is a more skewed, gamma distribution, etc.) 3.) With these estimated means of each year do a GLM. I'd like to know if it is possible (to use these means in a GLM) or is a wrong idea. Thanks in advance David David, You can model functions of data, such as means, but you must be careful to carry over most of the uncertainty in the original data into the model. If you don't, for example if you let the model know only the values of the means, then you are actually assuming that these means were observed with absolute certainty instead of being estimated from the data. To carry over the uncertainty in the original data to your modeling you can use a Bayesian approach or you can use a marginal likelihood approach. A marginal likelihood is a true likelihood function not of the data, but of functions of the data, such as of maximum likelihood estimates. If your means per year were estimated using maximum likelihood (for example with fitdistr in package MASS) and your sample size is not too small, then you can use a normal marginal likelihood model for the means. Note however that each mean may come from a different distribution, so the full likelihood model for your data would be a mixture of normal distributions. You may not be able to use the pre-built glm function, so you may face the challenge of writing your own code. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
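A hedged sketch of the simplest version of this, with entirely hypothetical data: fit each year by maximum likelihood with fitdistr (MASS), then treat the yearly means under a normal marginal likelihood, which here amounts to a regression weighted by inverse squared standard errors.

```r
library(MASS)

set.seed(1)
years <- 1:35
# hypothetical raw data: one gamma sample per year with a drifting mean
dat <- lapply(years, function(yr) rgamma(50, shape = 2, rate = 0.5 + 0.01 * yr))

# maximum likelihood fit per year; the gamma mean is shape/rate
fits  <- lapply(dat, fitdistr, densfun = "gamma")
means <- sapply(fits, function(f) f$estimate["shape"] / f$estimate["rate"])

# crude standard error of each yearly mean (sample-based proxy)
ses <- sapply(dat, function(x) sd(x) / sqrt(length(x)))

# normal marginal likelihood for the means ~ weighted least squares
fit <- lm(means ~ years, weights = 1 / ses^2)
summary(fit)
```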
Re: [R] Question about glm using R
Jordi Garcia wrote: Good morning, I am using R to try to model the proportion of burned area in Portugal. The dependent variable is the proportion. The family used is binomial and the epsilon would be binary. I am not able to find the package to be used when the proportion (%) has to be used in glm. Could someone help me? I am using normal commands of glm.. for example: glm_5<- glm(formula=p~Precipitation, family=binomial(link=logit), data=dados) where p is the proportion of burned area, but this error message apperars: Warning message: non-integer #successes in a binomial glm! in: eval(expr, envir, enclos) That is why I think I am not using the proper glm package. Thank you very much in advance. Jordi Jordi, Your statistical model is wrong. The binomial family is for counts data (counts of successes given n trials), not for proportions. To model proportions, your family is the Beta family. I've modeled proportion response variables with function betareg of package betareg. If you want my example applications I can send you code and data off list. Reference: Ferrari and Cribari-Neto. 2004. Beta regression for modelling rates and proportions. Journal of Applied Statistics 31:799-815. Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
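A hedged sketch of the betareg route (hypothetical data stand in for the poster's dados; any exact 0s or 1s in p must first be squeezed into the open interval (0,1)):

```r
library(betareg)

set.seed(1)
# hypothetical stand-in for the poster's 'dados'
dados <- data.frame(Precipitation = runif(100, 0, 50))
mu <- plogis(-1 + 0.05 * dados$Precipitation)
dados$p <- rbeta(100, mu * 20, (1 - mu) * 20)  # proportions strictly in (0,1)

glm_5 <- betareg(p ~ Precipitation, data = dados)  # logit link is the default
summary(glm_5)
```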
Re: [R] plot - central limit theorem
Jörg Groß wrote: Hi, Is there a way to simulate a population with R and pull out m samples, each with n values for calculating m means? I need that kind of data to plot a graphic, demonstrating the central limit theorem and I don't know how to begin. So, perhaps someone can give me some tips and hints how to start and which functions to use. thanks for any help, joerg x <- runif(10000,0,1) y <- runif(10000,0,1) z <- runif(10000,0,1) u <- runif(10000,0,1) v <- runif(10000,0,1) w <- runif(10000,0,1) t <- x+y+z+u+v+w hist(t,breaks=100,xlab="sum of i.i.d r.v with finite mean and variance",ylab="Frequency",main="") Note the draws are vectorised (10000 values each), so the histogram of the 10000 sums shows the bell shape. Change runif for the corresponding function of the distribution of your choice. Not exactly what you wanted but it does verify the C.L.T. Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R graph with values incorporated
Prasanth wrote: Dear All: Greetings! By the way, is it possible to have a graph (say line graph) that shows "values" as well (say y-axis values within the graph)? One could do it in Excel. I am just wondering whether it is possible with R! x <- rnorm(100,2,3) y <- rnorm(100,2,3) plot(x,y,pch=19) text(x=x,y=y+.5,format(x,digits=1),cex=.5) [[alternative HTML version deleted]] <-- Read the posting guide. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] draw a function over a histogram
Daniela Garavaglia wrote: Sorry, I have some troubles with the graph device. How can I draw a function over a histogram? Thanks so much. Daniela, What function? Here is one example using density() and dnorm(): x <- rnorm(1000,2,2) hist(x,prob=TRUE) lines(density(x,na.rm=TRUE),col="red") library(MASS) y <- fitdistr(x,"normal") curve(dnorm(x,mean=y$estimate[1],sd=y$estimate[2]),add=TRUE) HTH R. [[alternative HTML version deleted]] <--- What about this?? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Coordinate systems for geostatistics in R
imicola wrote: Hi, I read somewhere that when carrying out geostatistical analysis in R you should not use latitude and longitude...can anyone expand on this a little for me, and what would be the best coordinate system to use? Not only in R. In most systems, the inter-point distances are assumed to be planar (distances over a Euclidean space), whereas latitude and longitude are spherical. I guess there could be a geostatistical analysis based on spherical distances, but why make things more complicated when projecting the spherical coordinates onto planar coordinates before the analysis produces a good approximation and simplifies the analysis significantly? I use UTM coordinates, and transform from geodetic to metric with Eino Uikkanen's GeoConv program (it's free). I have my data in a geographic coordinate system, WGS84, decimal degrees. Is this the wrong format for such analyses? If the distances are short, it is not so wrong, and the wrongness increases with increasing distance. I have also converted my data into the UTM projection and so have it in metres (ranging from 480,000 to 550,000 E and 170,000 to 230,000 N). If I was to use the UTM coordinates, should I be using the actual coordinates in metres, or should I convert this into an arbitrary coordinate system (i.e. from 0 - 1) somehow? It would be convenient to use km rather than m, so the range parameter would be closer in magnitude to the other parameters of the model. A very large range parameter in metres may cause numerical instability during minimization of the negative support function in likelihood-based models such as that implemented in geoR. I have noticed that running an analysis on the data gives different results depending on which type of system you use, so I want to make sure I have the correct system. I should also probably note that I am a geostatistical novice! 
Thanks, Bottom line: your geostatistical software is probably based on distance calculations on a Euclidean space, so it is wrong to input locations in spherical coordinates. HTH Ruben __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Geodata object border
imicola wrote: Sorry, this is probably quite an easy question, but I'm new to R and couldn't find the answer anywhere. I'm using geoR and geoRglm, but can't figure out how to get a border in my geodata object. Does this need to be defined when I'm importing my data, or afterwards, and how do I go about doing this? Thanks You can define the border previously and import it to R with read.table or read.csv, or you can define it from your geodata object in R. In the first case don't forget to close the polygon by repeating the first vertex at the end of the file. In the second case you can use a convex hull function in R, such as chull(), to define the polygon with your geodata object. Say your geodata object is called my.geo; then my.geo$coords would be the coordinates. Then try this,
bor <- chull(my.geo$coords)
bor <- c(bor, bor[1]) # close polygon by appending the first vertex
plot(my.geo$coords)
lines(my.geo$coords[bor,]) # index with bor, the vector of hull vertices
HTH Ruben __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] geoR : Passing arguments to "optim" when using "likfit"]
Mzabalazo Ngwenya wrote: Hi everyone ! I'm trying to fit a kriging model to a set of data. When I just run the "likfit" command I can obtain the results. However when I try to pass additional arguments to the optimization function "optim" I get errors. That is, I want to obtain the hessian matrix (hessian=TRUE). Here's a little example (1-D). Can anyone shed some light? Where am I going wrong? --- obs.points <-matrix(c(0.1,0,0.367,0,0.634,0,0.901,0),byrow=TRUE,nrow=4) MM1.fx <-MM1(obs.points[,1]) MM1.fx [1] 0.111 0.5789475 1.7272732 9.100 # Creating geoR object MM1.data <-as.geodata(cbind(obs.points,MM1.fx)) MM1.data $coords [1,] 0.100 0 [2,] 0.367 0 [3,] 0.634 0 [4,] 0.901 0 $data [1] 0.111 0.5789475 1.7272732 9.100 attr(,"class") [1] "geodata" # Starting values for MLE sta.vals =expand.grid(seq(1,20),seq(1,20)) # Maximum likelihood estimation of covariance parameters HERE IS THE PROBLEM MM1.fit <-likfit(MM1.data,ini.cov.pars=sta.vals,fix.nugget=TRUE,cov.model="gaussian",optim(hessian=TRUE)) Pass hessian=TRUE directly to likfit instead of wrapping it in a call to optim(): MM1.fit <-likfit(MM1.data,ini.cov.pars=sta.vals,fix.nugget=TRUE,cov.model="gaussian",hessian=TRUE) MM1.fit$info.minimisation.function$hessian For the observed Fisher information: solve(MM1.fit$info.minimisation.function$hessian) HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Solving for parameter values at likelihood points
ComRades, On Oct 16, 2006, I posted a question regarding how to find the parameter values that made the likelihood function take a value equal to 1/8th of the maximum likelihood value. There was no reply but I found a solution and would like to know if there is a better solution using the function uniroot(). I think with uniroot() I can get better parameter values at these bounds. This is my solution, using the function nearest() from package GenKern. The example concerns the marginal likelihood for the normal variance parameter when the mean is a nuisance parameter (see section 7.3 in Royall, 1997, Statistical evidence. A likelihood paradigm, Chapman & Hall). Rubén y<-c(-2.250357,10.723614,7.238959,9.179939,5.831603,9.301580,4.019388,9.777280,5.587761,3.600276,9.753381,5.154928,2.776350,9.940350,14.185712,4.281198,10.994180,9.333027,13.082535,2.067826,10.772551,-3.811529,5.446021,-5.333287,1.421544) n <- length(y) s.sq <- (1/(n-1))*sum((y-mean(y))^2) sigma.sq <- seq(1,100,0.1) Lik.M <- sigma.sq^(-(n-1)/2)*exp(-(n-1)*s.sq/(2*sigma.sq)) Rel.Lik.M <- Lik.M/max(Lik.M) plot(sigma.sq,Rel.Lik.M,type="l") abline(h=1/8) arrows(y0=0.4,y1=0.18,x0=0,x1=12) arrows(y0=0.4,y1=0.18,x0=62,x1=50) library(GenKern) cbind(sigma.sq,Rel.Lik.M)[Rel.Lik.M==1] #maximum likelihood estimate max.idx <- nearest(Rel.Lik.M,1,outside=TRUE) # vector index value at the maximum likelihood #finding the parameter value at 1/8th the max. likelihood to the left of the maximum cbind(sigma.sq[1:max.idx],Rel.Lik.M[1:max.idx])[nearest(Rel.Lik.M[1:max.idx],0.125),] #finding the parameter value at 1/8th the max. likelihood to the right of the maximum cbind(sigma.sq[max.idx:length(sigma.sq)],Rel.Lik.M[max.idx:length(sigma.sq)])[nearest(Rel.Lik.M[max.idx:length(sigma.sq)],0.125),] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
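On the uniroot() question raised above: since the relative marginal likelihood is unimodal with its maximum exactly at sigma^2 = s^2, uniroot() can solve Rel.Lik - 1/8 = 0 on each side of the maximum. A hedged sketch with the same data (bracketing intervals chosen from the plotted range 1 to 100):

```r
y <- c(-2.250357, 10.723614, 7.238959, 9.179939, 5.831603, 9.301580,
       4.019388, 9.777280, 5.587761, 3.600276, 9.753381, 5.154928,
       2.776350, 9.940350, 14.185712, 4.281198, 10.994180, 9.333027,
       13.082535, 2.067826, 10.772551, -3.811529, 5.446021, -5.333287,
       1.421544)
n <- length(y)
s.sq <- var(y)  # (1/(n-1)) * sum((y - mean(y))^2)

# relative marginal likelihood of sigma^2; the maximum is at v = s.sq
rel.lik <- function(v) {
  lik     <- v^(-(n - 1) / 2) * exp(-(n - 1) * s.sq / (2 * v))
  max.lik <- s.sq^(-(n - 1) / 2) * exp(-(n - 1) / 2)
  lik / max.lik
}

# 1/8 likelihood bounds, one root on each side of the maximum
lower <- uniroot(function(v) rel.lik(v) - 1/8, c(1, s.sq))$root
upper <- uniroot(function(v) rel.lik(v) - 1/8, c(s.sq, 100))$root
c(lower, upper)
```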
Re: [R] deleting variables
Richard Pearson wrote: ?rm Richard Ralf Goertz wrote: How can I automatically exclude one variable from being saved in the workspace when I quit an R session? The problem is I don't know how to erase a variable once it has been created. [...] Ralf More on the use of rm. If you want to delete many variables in one go, x<-runif(10) y<-runif(10) z<-runif(10) ls() #you check all the objects in your workspace #[1] "x" "y" "z" rm(list=your.list <- ls()[2:3]) #you selected to delete all those objects whose indices are between 2 and 3. rm("your.list") #remove the temporary list with variable names ls() [1] "x" HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R Newbie Question/Data Model
[EMAIL PROTECTED] wrote: > Given a data set and a set of predictors and a response in the data, > we would like to find a model that fits the data set best. > Suppose that we do not know what kind of model (linear, polynomial > regression,... ) might be good, we are wondering if there is R-package(s) > can auctomatically do this. > Otherwise, can you direct me, or point out reference(s), > basic steps to do this. Thanks. > > -james The best-fitting model for any data set is a model with many parameters; pushed to the extreme, one with infinitely many. However, any model with more parameters than data points has a negative number of degrees of freedom, and you do not want that. The best-fitting model subject to the constraint that the degrees of freedom be non-negative is the data itself, with zero degrees of freedom. The AIC tells you this too: the AIC for the model formed by the data itself is 2n, whereas the AIC for any model with negative degrees of freedom is > 2n. But I guess you want to make inference from sample to population. If that is indeed the case, then you should consider changing your focus from finding "a model that fits the data set best" to a model that best summarizes the information contained in your sample about the population the sample comes from. To do that, start by defining the nature of your response variable. What is the nature of the natural process generating this response variable? Is it continuous or discrete? Is it univariate or multivariate? Can it take negative and positive values? Can it take values of zero? After you have clarified the probabilistic model for the response variable, then you can start thinking about the mathematical relation between the response variable and the predictors. Is it linear or nonlinear? Are the predictors categorical or continuous? Read the posting guide, formulate a clear question, and maybe you will be given more specific help. 
Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
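The fit-versus-parsimony point above can be illustrated with a small simulated example (a hypothetical sketch; the variables and polynomial degrees are made up): the residual sum of squares always shrinks as parameters are added, but AIC charges for them.

```r
# Illustrative sketch on simulated data: a richer model always fits better
# in-sample, but AIC penalises the extra parameters.
set.seed(1)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.5)
fits <- list(linear = lm(y ~ x),
             poly5  = lm(y ~ poly(x, 5)),
             poly10 = lm(y ~ poly(x, 10)))
sapply(fits, deviance)   # residual sum of squares: never increases with degree
AIC(fits$linear, fits$poly5, fits$poly10)
```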
Re: [R] Modeling presence only data in R
milton ruser wrote:
> Dear All,
>
> I have a set of environmental maps and presence-only points for some species.
> How can I generate distribution models in R using these presence-only data?
> What packages and functions can I use to do so?
>
> Kind regards,
>
> Miltinho
> Brazil

If you have location data for each absence/presence (such as linear location data, or easting and northing), consider using geoRglm.

Christensen, O. F., and Ribeiro, P. J. 2002. geoRglm: a package for generalized linear spatial models. R News, 2: 26-28.

HTH
Rubén
Re: [R] finding an unknown distribution
andrea previtali wrote:
> Hi,
> I need to analyze the influences of several factors on a variable that is a
> measure of fecundity, consisting of 73 observations ranging from 0 to 5. The
> variable is continuous and highly positively skewed, and none of the typical
> transformations was able to normalize the data. Thus, I was thinking of
> analyzing these data using a generalized linear model, where I can specify a
> distribution other than normal. I'm thinking it may fit a gamma or
> exponential distribution. But I'm not sure if the data meet the assumptions
> of those distributions because their definitions are too complex for my
> understanding!

Roughly, the exponential distribution models a random variable describing the time/distance between two independent events that occur at the same constant rate. The gamma distribution models a random variable that can be thought of as the sum of exponential random variables. I don't think fecundity data, the count of reproductive cells, qualifies as a random variable to be modeled by either of these distributions. If the count of reproductive cells is very large, and you are modeling this count as a function of animal size, such as length, consider the lognormal distribution, since the count of cells grows multiplicatively (volumetrically) with increases in length. In that case you can model your response variable using glm with family=gaussian(link="log").

Rubén
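As a hedged sketch of that suggestion, on simulated data (the variable names and the generating model are made up for illustration, not from the original thread):

```r
# Simulate a positively skewed fecundity-like response that grows
# multiplicatively with body length, then fit a Gaussian GLM with log link.
set.seed(1)
length_cm <- runif(73, 10, 50)
fecundity <- exp(0.02 * length_cm) * rlnorm(73, meanlog = 0, sdlog = 0.3)
fit <- glm(fecundity ~ length_cm, family = gaussian(link = "log"))
summary(fit)  # coefficients are on the log scale of the mean response
```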
Re: [R] memory issues
I think any geostatistical program/R package would have trouble handling 12000 observations on a PC. The empirical variogram would be built from choose(12000, 2) pairs, nearly 72 million, and during kriging, if you didn't restrict the search neighbourhood, interpolation would involve solving a very large linear system, more so if you defined a fine grid. Try restricting the search neighbourhood, if you didn't, with maxdist and nmax.

Rubén

Prof Brian Ripley wrote:
> I think the clue is that the message you quote comes from gstat, which
> does not use R's memory allocator. It is gstat and not R that has failed
> to allocate memory.
>
> Try re-reading the help page for memory.size. 'max=T' does not indicate
> the limit (that is the job of memory.limit()), but the maximum 'used'
> (acquired from the OS) in that session. So 19Mb looks reasonable for R's
> usage.
>
> I don't understand the message from memory.limit() (and the formatting is
> odd). memory.limit() does not call max() (it is entirely internal), so I
> wonder if that really is the output from that command. (If you can
> reproduce it, please let us have precise reproduction instructions.)
>
> There isn't much point in increasing the memory limit from the default
> 1.5Gb on a 2Gb XP machine. The problem is that the user address space
> limit is 2Gb and fragmentation means that you are unlikely to be able to
> get over 1.5Gb unless you have very many small objects, in which case R
> will run very slowly. In any case, that is not the issue here.
>
> On Wed, 16 Apr 2008, Dave Depew wrote:
>
>> Hi all,
>> I've read the R for Windows FAQ and am a little confused re:
>> memory.limit and memory.size
>>
>> To start using R 2.6.2 on WinXP, 2GB RAM, I have the command line "sdi
>> --max-mem-size=2047M"
>> Once the Rgui is open, memory.limit() returns 2047, memory.size()
>> returns 11.315, and memory.size(max=T) returns 19.615
>>
>> Shouldn't memory.size(max=T) return 2047?
>>
>> Upon running several operations involving kriging (gstat package,
>> original data file 3 variables, 12000 observations)
>> the program runs out of memory
>>
>> "memory.c", line 57: can't allocate memory in function m_get()
>> Error in predict.gstat(fit.uk, newdata = EcoSAV.grid.clip.spxdf,
>> debug.level = -2, :
>> m_get
>>
>> Immediately following this,
>>
>> memory.limit() returns [1] -Inf
>> Warning message:
>> In memory.limit() : no non-missing arguments
>> to max; returning -Inf
>>
>> memory.size() returns 24.573.
>>
>> memory.size(max=T) returns 46.75
>>
>> To my untrained eye, it appears that R is not being allowed access to
>> the full memory limit specified in the cmd line... if this is the case,
>> how does one ensure that R is getting access to the full allotment of RAM?
>> Any insight is appreciated...
>>
>>> sessionInfo()
>> R version 2.6.2 (2008-02-08)
>> i386-pc-mingw32
>>
>> locale:
>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>> States.1252;LC_MONETARY=English_United
>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats graphics grDevices datasets tcltk utils
>> methods base
>>
>> other attached packages:
>> [1] maptools_0.7-7 foreign_0.8-23 gstat_0.9-43 rgdal_0.5-24
>> lattice_0.17-4 sp_0.9-23 svSocket_0.9-5 svIO_0.9-5
>> R2HTML_1.58 svMisc_0.9-5 svIDE_0.9-5
>>
>> loaded via a namespace (and not attached):
>> [1] grid_2.6.2 tools_2.6.2
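A hedged sketch of restricting the kriging search neighbourhood in gstat, using the meuse example data shipped with the sp package (the variogram model and the nmax/maxdist values here are illustrative, not tuned for the poster's data; requires the gstat and sp packages):

```r
library(sp)
library(gstat)
data(meuse);      coordinates(meuse) <- ~x+y
data(meuse.grid); coordinates(meuse.grid) <- ~x+y; gridded(meuse.grid) <- TRUE
v <- variogram(log(zinc) ~ 1, meuse)
m <- fit.variogram(v, vgm(1, "Sph", 900, 1))
# nmax: use at most 40 nearest observations per prediction location;
# maxdist: ignore observations farther than 1000 m (local kriging)
k <- krige(log(zinc) ~ 1, meuse, meuse.grid, model = m,
           nmax = 40, maxdist = 1000)
```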
Re: [R] Confidence intervals of log transformed data
tom soyer wrote:
> Hi,
>
> I have a general statistics question on calculating the confidence interval
> of log-transformed data.
>
> I log-transformed both x and y, and regressed the transformed y on the
> transformed x: lm(log(y)~log(x)), and I get the following relationship:
>
> log(y) = alpha + beta * log(x), with se as the standard error of residuals
>
> My question is how do I calculate the confidence interval in the original
> scale of x and y? Should I use [...]

A confidence interval for the mean of Y? If that is the case: when you transformed Y to logY and ran a regression assuming normal deviates, you were in fact assuming that Y is lognormally distributed. Your interval must be asymmetric, reflecting the shape of the lognormal. The lognormal mean is lambda = exp(mu + 0.5*sigma^2), where mu and sigma^2 are the parameters of the normal variate logY. A confidence interval for lambda is

Lower bound = exp(mean(logY) + 0.5*var(logY) + sd(logY)*H_alpha/sqrt(n-1))
Upper bound = exp(mean(logY) + 0.5*var(logY) + sd(logY)*H_(1-alpha)/sqrt(n-1))

where the quantiles H_alpha and H_(1-alpha) are quantiles of the distribution of linear combinations of the normal mean and variance (Land, 1971, Ann. Math. Stat. 42:1187-1205; Land, 1975, Sel. Tables Math. Stat. 3:385-419). Alternatively, you can model Y = p1*X^p2 directly, with p1 = exp(your alpha) and p2 = beta, using a lognormal likelihood, and predict the mean of Y with the fitted model (I'm guessing here). It could be useful to check Crow and Shimizu, Lognormal Distributions: Theory and Practice, 1988, Dekker, NY.

HTH
Rubén
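As a rough numerical sketch of the point estimate (Land's exact H quantiles come from his tables and are not in base R, so this substitutes the approximate Cox method for the interval; the data here are simulated):

```r
# Point estimate of the lognormal mean exp(mu + sigma^2/2) and an
# approximate (Cox-method) confidence interval around it.
set.seed(42)
y  <- rlnorm(50, meanlog = 1, sdlog = 0.5)
ly <- log(y)
n  <- length(ly); m <- mean(ly); v <- var(ly)
lambda_hat <- exp(m + v / 2)                 # estimate of E[Y]
se <- sqrt(v / n + v^2 / (2 * (n - 1)))      # Cox approximation, log scale
ci <- exp(m + v / 2 + c(-1, 1) * qnorm(0.975) * se)
```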
Re: [R] mixture distribution analisys
Antonio Olinto wrote:
> Hello,
>
> I'm analyzing some fish length-frequency data to assess relative age and
> growth information, and I would like to do something different from
> FiSAT / ELEFAN analysis.
>
> I found a package named MIX at http://www.math.mcmaster.ca/peter/mix/mix.html
> but it's compiled for R 2.0.0
>
> So I have two questions:
> Is there any problem to run this package in R 2.7.0? - probably yes ...
> And, is there any other package in R that can be used to analyze and to
> separate mixture distributions?
>
> Thanks for any help. Best regards,
>
> Antonio Olinto
> Sao Paulo Fisheries Institute
> Brazil

This problem is not too difficult; maybe you can write your own code. Say you enter the number of fish you measured, n, and a two-column dataframe with the midpoint of each length class in the first column (call it l) and the observed relative frequency of each length class in the second column (call it obs_prob). Then, using the multinomial likelihood for the number of fish in each length class as the random variable (the approach in Peter Macdonald's MIX), minimize the negative log-likelihood

neg_log_likelihood = -n*sum(obs_prob*log(pred_prob))

where

pred_prob = (p1*0.3989423/s1)*exp(-0.5*((l-m1)/s1)^2)
          + (p2*0.3989423/s2)*exp(-0.5*((l-m2)/s2)^2)
          + (p3*0.3989423/s3)*exp(-0.5*((l-m3)/s3)^2)

and s1, s2, s3, m1, m2, m3, p1 and p2 (with p3 = 1 - p1 - p2) are the parameters. Do you know the number of components (cohorts) in the mixture? In the model above it is three. If you don't, try several models with different numbers of components, and the AIC may tell you which one is the best working model. Don't use the likelihood ratio test, as one of the p parameters is on the boundary of the parameter space if the null is true. Also, you should bound the parameters (m1 < m2 < m3).
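A hedged sketch of that negative log-likelihood in R, for a three-component mixture (parameter names follow the text; the data objects l, obs_prob and n, and the starting values in the example call, are assumed inputs, not from the original thread):

```r
# Multinomial negative log-likelihood for a 3-component normal mixture
# over length classes. m = means, s = sds, p = proportions (p3 = 1-p1-p2).
nll <- function(par, l, obs_prob, n) {
  m <- par[1:3]
  s <- exp(par[4:6])                 # optimize log-sds so the sds stay positive
  p <- c(par[7], par[8], 1 - par[7] - par[8])
  pred_prob <- p[1] * dnorm(l, m[1], s[1]) +
               p[2] * dnorm(l, m[2], s[2]) +
               p[3] * dnorm(l, m[3], s[3])
  pred_prob <- pred_prob / sum(pred_prob)   # normalise over the length classes
  -n * sum(obs_prob * log(pred_prob))
}
# Example call with made-up starting values:
# fit <- optim(c(20, 35, 50, log(2), log(2), log(2), 0.3, 0.3), nll,
#              l = l, obs_prob = obs_prob, n = n, method = "BFGS")
```

Optimizing the log of the standard deviations is one simple way to keep them positive without a constrained optimizer; the ordering constraint m1 < m2 < m3 can be enforced through starting values or a reparameterization.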
[R] comparing length-weight regressions
[EMAIL PROTECTED] wrote:
> I'd like to compare length-weight regressions among years. Any information
> would be appreciated.
>
> a. gray
> fisheries consultant

Your message is rather cryptic for a general statistical audience, whereas I'm sure in a fisheries group everybody would understand what you want. Use a glm with family=gaussian(link="log") for all data sets together (in original units) with year as a factor, then run the model again ignoring the year factor, and compare the two fits using anova(model1, model2) for a likelihood-ratio-style test, or inspect the AICs for a non-frequentist model selection method.

R.
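A hedged sketch of that comparison on simulated length-weight data (all variable names, coefficients, and years are illustrative):

```r
# Fit the length-weight relation with and without a year effect,
# then compare the nested Gaussian log-link GLMs.
set.seed(7)
d <- data.frame(year = factor(rep(2001:2003, each = 50)),
                len  = runif(150, 20, 80))
d$wt <- exp(-4 + 3 * log(d$len)) * rlnorm(150, meanlog = 0, sdlog = 0.1)
full    <- glm(wt ~ log(len) * year, family = gaussian(link = "log"), data = d)
reduced <- glm(wt ~ log(len),        family = gaussian(link = "log"), data = d)
anova(reduced, full, test = "F")  # does the year factor improve the fit?
AIC(reduced, full)                # information-theoretic comparison
```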
Re: [R] analyzing binomial data with spatially correlated errors
Roger Bivand wrote:
> Ben Bolker (ufl.edu) writes:
>
>> Jean-Baptiste Ferdy (univ-montp2.fr) writes:
>>
>>> Dear R users,
>>>
>>> I want to explain binomial data by a series of fixed effects. My
>>> problem is that my binomial data are spatially correlated. Naively,
>>> I thought I could find something similar to gls to analyze such
>>> data. After some reading, I decided that lmer is probably the tool
>>> I need. The model I want to fit would look like
>>>
> (...)
>> You could *almost* use glmmPQL from the MASS package,
>> which allows you to fit any lme model structure
>> within a GLM 'wrapper', but as far as I know it wraps only lme (which
>> requires at least one random effect) and not gls.
>
> The trick used in:
>
> Dormann, C. F., McPherson, J. M., Araujo, M. B., Bivand, R.,
> Bolliger, J., Carl, G., Davies, R. G., Hirzel, A., Jetz, W.,
> Kissling, W. D., Kühn, I., Ohlemüller, R., Peres-Neto, P. R.,
> Reineking, B., Schröder, B., Schurr, F. M. & Wilson, R. J. (2007):
> Methods to account for spatial autocorrelation in the analysis of
> species distributional data: a review. Ecography 30: 609-628
>
> (see online supplement), is to add a constant term "group", and set
> random=~1|group. The specific use with a binomial family there is for
> a (0,1) response, rather than a two-column matrix.
>
>> You could try gee or geoRglm -- neither trivially easy, I think ...
>
> The same paper includes a GEE adaptation, but for a specific spatial
> configuration rather than a general one.
>
> Roger Bivand

I suggest you also check out the package geoRglm, with which you can model binomial and Poisson spatially correlated data. I used it to model spatially correlated binomial data, but without covariates, i.e. without your fixed effects, so my model was a logistic regression with the intercept only (reference below). But I understand that you can add covariates and use them to estimate the fixed (non-random) part of the model.
Here is the geoRglm webpage: http://www.daimi.au.dk/~olefc/geoRglm/

This approach tackles the problem from the point of view of geostatistics rather than mixed models, but I believe the likelihood-based geostatistical model is the same as a generalized linear mixed model in which the spatial correlation over distance plays the role of the random effect. In SAS you can do the same with the glimmix macro, from the generalized linear mixed models point of view, because it implements a correlation term that lets you specify typical spatial correlation functions such as the Gaussian and exponential, particular cases of the Matérn family.

Rubén

Roa-Ureta, R. and E.N. Niklitschek (2007) Biomass estimation from surveys with likelihood-based geostatistics. ICES Journal of Marine Science 64:1723-1734
Re: [R] Reading data into R
BEP wrote:
> Hello all,
>
> I am working with a very large data set in R, and I have no interest in
> reviving my SAS skills. To do this, I will need to drop unwanted variables
> given the size of the data file. The most common strategy seems to be
> subsetting the data after it is read into R. Unfortunately, given the size
> of the data set, I can't get the file read and then subsequently do the
> subset procedure. I would be appreciative of help on the following:
>
> 1. What are the possibilities of reading in just a small set of variables
> during the statement (or another 'read' statement)? That is,
> is it possible to specify just the variables that I want to keep?
>
> 2. Can I randomly select a set of observations during the 'read' statement?
>
> I have searched various R resources for this information, so if I am simply
> overlooking a key resource on this issue, pointing that out to me would be
> greatly appreciated.
>
> Thanks in advance.
>
> Brian

Check this for input of specific columns from a large data matrix:

mysubsetdata <- do.call("cbind", scan(file='location and name of your file', what=list(NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,0,NULL,0,NULL,NULL), flush=TRUE))

This will input only columns 9 and 11 (the non-NULL entries of the what list) into 'mysubsetdata'. With this method you can work out the way to select any subset of columns.

HTH
Rubén
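The same trick, written as a self-contained sketch (the 13-column layout is assumed, and the file is a temporary one created just for illustration):

```r
# Read only columns 9 and 11 of a whitespace-separated 13-column file.
tmp <- tempfile()
write.table(matrix(1:26, nrow = 2), tmp, row.names = FALSE, col.names = FALSE)
cols <- vector("list", 13)        # NULL entries tell scan() to skip that field
cols[[9]] <- 0; cols[[11]] <- 0   # 0 means "read this field as numeric"
mysub <- do.call("cbind", scan(tmp, what = cols, flush = TRUE))
# mysub now holds only the two selected columns
```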