Re: [R] regression with proportion data
Your response variable is not binomial, it's a proportion. Try the betareg function in the betareg package, which more correctly assumes that your response variable is Beta distributed (but beware that 0 and 1 are not allowed). The syntax is the same as in a glm.

HTH
Ruben

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Georgiana May
Sent: Monday, 19 March 2012 15:06
To: r-help@r-project.org
Subject: [R] regression with proportion data

Hello, I want to determine the regression relationship between a proportion (y) and a continuous variable (x). Reading a number of sources (e.g. The R Book, Quick-R, help), I believe I should be able to specify the model as:

model <- glm(formula = proportion ~ x, family = binomial(link = "logit"))

This runs but gives me a warning:

Warning message: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!

If I transform the proportion variable with log, it doesn't like that either (values not > 0).

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
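A minimal sketch of both routes on simulated data (all object names hypothetical): the base-R quasi-binomial fit tolerates non-integer proportions, and betareg is left in comments since it needs the package installed and the response strictly inside (0, 1):

```r
# Simulated proportion data (hypothetical): y strictly in (0, 1)
set.seed(1)
x <- runif(50)
y <- plogis(-1 + 2 * x + rnorm(50, sd = 0.3))

# Base R: a quasi-binomial GLM accepts non-integer proportions without
# the "non-integer #successes" warning
fit_qb <- glm(y ~ x, family = quasibinomial(link = "logit"))
coef(fit_qb)

# Beta regression (requires the betareg package; exact 0 and 1 not allowed):
# library(betareg)
# fit_beta <- betareg(y ~ x)
```

If counts of successes and trials are available, the usual weighted binomial GLM remains the more direct choice.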
Re: [R] Standard errors GLM
You have a conceptual problem, as pointed out by previous helpers. You don't have a standard error for the first level of your categorical variable because that level's effect is not estimated: it is used as a reference level against which the other levels of that categorical variable are estimated (the default in R). This is one way in which statisticians include categorical predictors in the regression framework, which was originally meant for relations between continuous quantitative variables. You might want to read about regression, factors, and contrasts. This paper about the issue is available online: M.J. Davis, 2010. Contrast coding in multiple regression analysis: strengths, weaknesses and utility of popular coding structures. Journal of Data Science 8:61-73.

HTH
Ruben

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of D_Tomas
Sent: Tuesday, 13 March 2012 14:39
To: r-help@r-project.org
Subject: [R] Standard errors GLM

Dear useRs, when the summary function is applied to a glm fit (e.g. Poisson), the parameter table reports the categorical variables assuming that the first level's estimate (in alphabetical order) is 0. What is the standard error for that level then? Are the standard errors calculated assuming a normal distribution? Many thanks,

--
View this message in context: http://r.789695.n4.nabble.com/Standard-errors-GLM-tp4469086p4469086.html
Sent from the R help mailing list archive at Nabble.com.
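The point about the reference level can be seen in a small sketch (hypothetical data): the first factor level is absorbed into the intercept, so summary() reports no row, and hence no standard error, for it:

```r
# Poisson GLM with a 3-level factor; level "a" (first alphabetically)
# becomes the reference and is absorbed into the (Intercept)
set.seed(2)
f <- factor(rep(c("a", "b", "c"), each = 20))
y <- rpois(60, lambda = c(a = 2, b = 4, c = 6)[as.character(f)])
fit <- glm(y ~ f, family = poisson)

rownames(coef(summary(fit)))  # "(Intercept)" "fb" "fc": no separate "fa" row

# The reference level can be changed, e.g. to "b", with relevel():
fit2 <- glm(y ~ relevel(f, ref = "b"), family = poisson)
```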
Re: [R] Calculation of standard error for a function
The deltamethod function in package msm may help (but bear in mind the warnings/admonitions/recommendations of other helpers).

HTH
Rubén
--
Rubén H. Roa-Ureta, Ph. D.
AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN

-----Original Message-----
From: r-help-boun...@r-project.org on behalf of Jun Shen
Sent: Fri 3/2/2012 10:47 PM
To: R-help
Subject: [R] Calculation of standard error for a function

Dear list, if I know the standard errors for k1 and k2, is there anything I can call in R to calculate the standard error of k1/k2? Thanks. Jun
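For the ratio k1/k2 the delta method gives Var(k1/k2) ≈ (se1/k2)^2 + (k1*se2/k2^2)^2 when the two estimates are independent. A base-R sketch with hypothetical numbers, and the equivalent msm call in a comment:

```r
# Hypothetical estimates and their standard errors
k1 <- 10; se1 <- 1
k2 <- 4;  se2 <- 0.5

ratio <- k1 / k2
# Delta-method SE of k1/k2, assuming cov(k1, k2) = 0
se_ratio <- sqrt((se1 / k2)^2 + (k1 * se2 / k2^2)^2)
c(ratio = ratio, se = se_ratio)

# With the msm package this is:
# msm::deltamethod(~ x1 / x2, mean = c(k1, k2), cov = diag(c(se1, se2)^2))
```

If the two estimates are correlated, the covariance term must be added to the variance (or put in the cov matrix passed to deltamethod).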
Re: [R] Constraint on one of parameters.
Read optimx's help. There are 'method', 'upper', and 'lower' arguments that will let you put bounds on parameters.

HTH
Rubén

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of FU-WEN LIANG
Sent: Thursday, 09 February 2012 23:56
To: r-help@r-project.org
Subject: [R] Constraint on one of parameters.

Dear all, I have a function to optimize for a set of parameters and want to set a constraint on only one parameter. What I want to do is estimate the parameters of a bivariate normal distribution where the correlation has to be between -1 and 1. Here is my function. Would you please advise how to revise it?

ex = function(s, prob, theta1, theta, xa, xb, xc, xd, t, delta) {
  expo1 = exp(s[3]*xa+s[4]*xb+s[5]*xc+s[6]*xd)
  expo2 = exp(s[9]*xa+s[10]*xb+s[11]*xc+s[12]*xd)
  expo3 = exp(s[15]*xa+s[16]*xb+s[17]*xc+s[18]*xd)
  expo4 = exp(s[21]*xa+s[22]*xb+s[23]*xc+s[24]*xd)
  expo5 = exp(s[27]*xa+s[28]*xb+s[29]*xc+s[30]*xd)
  nume1 = prob[1]*(s[2]^-s[1]*s[1]*t^(s[1]-1)*expo1)^delta*exp(-s[2]^-s[1]*t^s[1]*expo1)*
    theta1[1]^xa*(1-theta1[1])^(1-xa)*theta1[2]^xb*(1-theta1[2])^(1-xb)*
    (1+theta1[11]*(xa-theta1[1])*(xb-theta1[2])/sqrt(theta1[1]*(1-theta1[1]))/sqrt(theta1[2]*(1-theta1[2])))/
    (2*pi*theta[2]*theta[4]*sqrt(1-theta[21]^2))*exp(-2*(1-theta[21]^2))^(-1)*
    ((xc-theta[1])^2/theta[2]^2+(xd-theta[3])^2/theta[4]^2-2*theta[21]^2*(xc-theta[1])*(xd-theta[3])/(theta[2]*theta[4]))
  nume2 = prob[2]*(s[8]^-s[7]*s[7]*t^(s[7]-1)*expo2)^delta*exp(-s[8]^-s[7]*t^s[7]*expo2)*
    theta1[3]^xa*(1-theta1[3])^(1-xa)*theta1[4]^xb*(1-theta1[4])^(1-xb)*
    (1+theta1[11]*(xa-theta1[3])*(xb-theta1[4])/sqrt(theta1[3]*(1-theta1[3]))/sqrt(theta1[4]*(1-theta1[4])))/
    (2*pi*theta[6]*theta[8]*sqrt(1-theta[21]^2))*exp(-2*(1-theta[21]^2))^(-1)*
    ((xc-theta[5])^2/theta[6]^2+(xd-theta[7])^2/theta[8]^2-2*theta[21]^2*(xc-theta[5])*(xd-theta[7])/(theta[6]*theta[8]))
  nume3 = prob[3]*(s[14]^-s[13]*s[13]*t^(s[13]-1)*expo3)^delta*exp(-s[14]^-s[13]*t^s[13]*expo3)*
    theta1[5]^xa*(1-theta1[5])^(1-xa)*theta1[6]^xb*(1-theta1[6])^(1-xb)*
    (1+theta1[11]*(xa-theta1[5])*(xb-theta1[6])/sqrt(theta1[5]*(1-theta1[5]))/sqrt(theta1[6]*(1-theta1[6])))/
    (2*pi*theta[10]*theta[12]*sqrt(1-theta[21]^2))*exp(-2*(1-theta[21]^2))^(-1)*
    ((xc-theta[9])^2/theta[10]^2+(xd-theta[11])^2/theta[12]^2-2*theta[21]^2*(xc-theta[9])*(xd-theta[11])/(theta[10]*theta[12]))
  nume4 = prob[4]*(s[20]^-s[19]*s[19]*t^(s[19]-1)*expo4)^delta*exp(-s[20]^-s[19]*t^s[19]*expo4)*
    theta1[7]^xa*(1-theta1[7])^(1-xa)*theta1[8]^xb*(1-theta1[8])^(1-xb)*
    (1+theta1[11]*(xa-theta1[7])*(xb-theta1[8])/sqrt(theta1[7]*(1-theta1[7]))/sqrt(theta1[8]*(1-theta1[8])))/
    (2*pi*theta[14]*theta[16]*sqrt(1-theta[21]^2))*exp(-2*(1-theta[21]^2))^(-1)*
    ((xc-theta[13])^2/theta[14]^2+(xd-theta[15])^2/theta[16]^2-2*theta[21]^2*(xc-theta[13])*(xd-theta[15])/(theta[14]*theta[16]))
  nume5 = prob[5]*(s[26]^-s[25]*s[25]*t^(s[25]-1)*expo5)^delta*exp(-s[26]^-s[25]*t^s[25]*expo5)*
    theta1[9]^xa*(1-theta1[9])^(1-xa)*theta1[10]^xb*(1-theta1[10])^(1-xb)*
    (1+theta1[11]*(xa-theta1[9])*(xb-theta1[10])/sqrt(theta1[9]*(1-theta1[9]))/sqrt(theta1[10]*(1-theta1[10])))/
    (2*pi*theta[18]*theta[20]*sqrt(1-theta[21]^2))*exp(-2*(1-theta[21]^2))^(-1)*
    ((xc-theta[17])^2/theta[18]^2+(xd-theta[19])^2/theta[20]^2-2*theta[21]^2*(xc-theta[17])*(xd-theta[19])/(theta[18]*theta[20]))
  denom = nume1+nume2+nume3+nume4+nume5
  Ep1 = nume1/denom
  Ep2 = nume2/denom
  Ep3 = nume3/denom
  Ep4 = nume4/denom
  Ep5 = nume5/denom
  elogld =
    sum(Ep1*(-log(2*pi*theta[2]*theta[4]*sqrt(1-theta[21]^2))-(2*(1-theta[21]^2))^(-1)*((xc-theta[1])^2/theta[2]^2+(xd-theta[3])^2/theta[4]^2-2*theta[21]^2*(xc-theta[1])*(xd-theta[3])/(theta[2]*theta[4])))) +
    sum(Ep2*(-log(2*pi*theta[6]*theta[8]*sqrt(1-theta[21]^2))-(2*(1-theta[21]^2))^(-1)*((xc-theta[5])^2/theta[6]^2+(xd-theta[7])^2/theta[8]^2-2*theta[21]^2*(xc-theta[5])*(xd-theta[7])/(theta[6]*theta[8])))) +
    sum(Ep3*(-log(2*pi*theta[10]*theta[12]*sqrt(1-theta[21]^2))-(2*(1-theta[21]^2))^(-1)*((xc-theta[9])^2/theta[10]^2+(xd-theta[11])^2/theta[12]^2-2*theta[21]^2*(xc-theta[9])*(xd-theta[11])/(theta[10]*theta[12])))) +
    sum(Ep4*(-log(2*pi*theta[14]*theta[16]*sqrt(1-theta[21]^2))-(2*(1-theta[21]^2))^(-1)*((xc-theta[13])^2/theta[14]^2+(xd-theta[15])^2/theta[16]^2-2*theta[21]^2*(xc-theta[13])*(xd-theta[15])/(theta[14]*theta[16])))) +
    sum(Ep5*(-log(2*pi*theta[18]*theta[20]*sqrt(1-theta[21]^2))-(2*(1-theta[21]^2))^(-1)*((xc-theta[17])^2/theta[18]^2+(xd-theta[19])^2/theta[20]^2-2*theta[21]^2*(xc-theta[17])*(xd-theta[19])/(theta[18]*theta[20]))))
  return(-elogld)
}

normal = optimx(c(15.5,0.4,10,1.3,17.5,0.3,15,1.5,13.5,0.5,19.5,1.1,20.7,0.4,30,0.4,25,0.7,24.6,1.6,0.5), ex, s=s, prob=prob, theta1=theta1, xa=xa, xb=xb, xc=xc, xd=xd, t=t, delta=delta)

When I run this code I get: In log(2 * pi * theta[6] * theta[8] * sqrt(1 - theta[21]^2)) : NaNs produced, because (1-theta[21]^2) < 0. Would you please advise? Thank you very much.
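The bounds mechanism the reply points to can be seen on a toy objective. Base-R optim() takes the same method/lower/upper arguments as optimx, so this sketch (toy quadratic, hypothetical bounds on the second component, which plays the role of the correlation theta[21]) carries over to optimx unchanged:

```r
# Toy objective (hypothetical): unconstrained minimum at p = c(2, 2)
negll <- function(p) (p[1] - 2)^2 + (p[2] - 2)^2

# Bound only the second parameter to [-0.99, 0.99], as one would do for
# a correlation; the first parameter stays unbounded
res <- optim(c(0, 0), negll, method = "L-BFGS-B",
             lower = c(-Inf, -0.99), upper = c(Inf, 0.99))
res$par  # second component is clamped at the upper bound 0.99
```

An alternative that needs no bounds at all is to optimize an unconstrained parameter z and use tanh(z), which always lies in (-1, 1), as the correlation inside the objective.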
Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Frank Harrell
Sent: Friday, 27 January 2012 14:28
To: r-help@r-project.org
Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Ruben, I'm not sure you are understanding the ramifications of what Bert said. In addition you are implicitly making several assumptions:

Ruben: Frank, I guess we are going nowhere now. But thanks anyway. See below if you want.

1. model selection is needed (vs. fitting the full model and using shrinkage)

Ruben: Nonlinear mechanistic models that are too complex often just don't converge; they crash. There is no shrinkage to apply to a model that failed to converge.

2. model selection works in the absence of shrinkage

Ruben: I think you are assuming that shrinkage is necessary.

3. model selection can find the "right" model, and the features selected would be the same features selected if the data were slightly perturbed or a new sample taken

Ruben: I don't make this assumption. New data, new model.

4. AIC tells you something that P-values don't (unless using structured multiple-degree-of-freedom tests)

Ruben: It does.

5. parsimony is good

Ruben: It is.

None of these assumptions is true. Model selection without shrinkage (penalization) seems to offer benefits but this is largely a mirage.

Ruben: Have a good weekend!
Ruben

Rubén Roa wrote
> -----Original Message-----
> From: Bert Gunter [mailto:gunter.berton@]
> Sent: Thursday, 26 January 2012 21:20
> To: Rubén Roa
> CC: Ben Bolker; Frank Harrell
> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
>
> On Wed, Jan 25, 2012 at 11:39 PM, Rubén Roa <rroa@> wrote:
>> I think we have gone through this before.
>> First, the destruction of all aspects of statistical inference is not at stake, Frank Harrell's post notwithstanding.
>> Second, checking all pairs is a way to see for _all pairs_ which model outcompetes which in terms of predictive ability by 2 AIC units or more. Just sorting them by AIC does not give you that when successive models differ by less than 2 AIC units.
>> Third, I was not implying that AIC differences play the role of significance tests. I agree with you that model selection is better not understood as a proxy for or relative of significance-testing procedures.
>> Incidentally, when comparing many models the AIC is often inconclusive. If one is bent on selecting just _the_ model, then I check numerical optimization diagnostics such as the size of gradients, KKT criteria, and other issues such as standard errors of parameter estimates and the correlation matrix of parameter estimates.
>
> -- And the mathematical basis for this claim is ??
>
> --
> Ruben: In my area of work (building/testing/applying mechanistic nonlinear models of natural and economic phenomena) the issue of numerical optimization is a very serious one. It is often the case that a really good-looking model does not converge properly (that's why ADMB is so popular among us). So if the AIC is inconclusive, but one AIC-tied model yields reasonable-looking standard errors and low pairwise correlations between parameter estimates, while the other wasn't even able to produce a positive definite Hessian matrix (though it was able to maximize the log-likelihood), I think I have good reasons to select the properly converged model. I assume here that the lack of convergence is a symptom of a model inadequacy to the data that the AIC was not able to detect.
> ---
> Ruben: For some reason I don't find model averaging appealing. I guess deep in my heart I expect more from my model than just the best predictive ability.
>
> -- This is a religious, not a scientific statement, and has no place in any scientific discussion.
>
> --
> Ruben: Seriously, there is a wide and open place in scientific discussion for mechanistic model-building. I expect the models I build to be more than able predictors; I want them to capture some aspect of nature, to teach me something about nature, so I refrain from model averaging, which is an open admission that you don't care too much about what's really going on.
>
> -- The belief that certain data analysis practices -- standard or not -- somehow can be trusted to yield reliable scientific results in the face of clear theoretical (mathematical) and practical results to the contrary, while widespread, impedes and often thwarts the progress of science. Evidence-based medicine and clinical trials came about for a reason.
Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
-----Original Message-----
From: Bert Gunter [mailto:gunter.ber...@gene.com]
Sent: Thursday, 26 January 2012 21:20
To: Rubén Roa
CC: Ben Bolker; Frank Harrell
Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

On Wed, Jan 25, 2012 at 11:39 PM, Rubén Roa wrote:
> I think we have gone through this before.
> First, the destruction of all aspects of statistical inference is not at stake, Frank Harrell's post notwithstanding.
> Second, checking all pairs is a way to see for _all pairs_ which model outcompetes which in terms of predictive ability by 2 AIC units or more. Just sorting them by AIC does not give you that when successive models differ by less than 2 AIC units.
> Third, I was not implying that AIC differences play the role of significance tests. I agree with you that model selection is better not understood as a proxy for or relative of significance-testing procedures.
> Incidentally, when comparing many models the AIC is often inconclusive. If one is bent on selecting just _the_ model, then I check numerical optimization diagnostics such as the size of gradients, KKT criteria, and other issues such as standard errors of parameter estimates and the correlation matrix of parameter estimates.

-- And the mathematical basis for this claim is ??

--
Ruben: In my area of work (building/testing/applying mechanistic nonlinear models of natural and economic phenomena) the issue of numerical optimization is a very serious one. It is often the case that a really good-looking model does not converge properly (that's why ADMB is so popular among us). So if the AIC is inconclusive, but one AIC-tied model yields reasonable-looking standard errors and low pairwise correlations between parameter estimates, while the other wasn't even able to produce a positive definite Hessian matrix (though it was able to maximize the log-likelihood), I think I have good reasons to select the properly converged model. I assume here that the lack of convergence is a symptom of a model inadequacy to the data that the AIC was not able to detect.
---
Ruben: For some reason I don't find model averaging appealing. I guess deep in my heart I expect more from my model than just the best predictive ability.

-- This is a religious, not a scientific statement, and has no place in any scientific discussion.

--
Ruben: Seriously, there is a wide and open place in scientific discussion for mechanistic model-building. I expect the models I build to be more than able predictors; I want them to capture some aspect of nature, to teach me something about nature, so I refrain from model averaging, which is an open admission that you don't care too much about what's really going on.

-- The belief that certain data analysis practices -- standard or not -- somehow can be trusted to yield reliable scientific results in the face of clear theoretical (mathematical) and practical results to the contrary, while widespread, impedes and often thwarts the progress of science. Evidence-based medicine and clinical trials came about for a reason. I would encourage you to re-examine the basis of your scientific practice and the role that "magical thinking" plays in it.

Best,
Bert

--
Ruben: All right, Bert. I often re-examine the basis of my scientific praxis, but less often than I did before, I have to confess. I like to think it is because I am converging on the right praxis, so there are fewer issues to re-examine. But this problem of model selection is a tough one. Being a likelihoodist in inference naturally leads you to AIC-based model selection, Jim Lindsey being an influence too. Wanting your models to say something about nature is another strong conditioning factor. Put these two together and it becomes hard to do model averaging. And it has nothing to do with religion (yuck!).
Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
I think we have gone through this before. First, the destruction of all aspects of statistical inference is not at stake, Frank Harrell's post notwithstanding. Second, checking all pairs is a way to see for _all pairs_ which model outcompetes which in terms of predictive ability by 2 AIC units or more. Just sorting them by AIC does not give you that when successive models differ by less than 2 AIC units. Third, I was not implying that AIC differences play the role of significance tests. I agree with you that model selection is better not understood as a proxy for or relative of significance-testing procedures. Incidentally, when comparing many models the AIC is often inconclusive. If one is bent on selecting just _the_ model, then I check numerical optimization diagnostics such as the size of gradients, KKT criteria, and other issues such as standard errors of parameter estimates and the correlation matrix of parameter estimates. For some reason I don't find model averaging appealing. I guess deep in my heart I expect more from my model than just the best predictive ability.

Rubén

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Ben Bolker
Sent: Wednesday, 25 January 2012 15:41
To: r-h...@stat.math.ethz.ch
Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Rubén Roa azti.es> writes:
> A more 'manual' way to do it is this.
> If you have all your models named properly and well organized, say Model1, Model2, ..., Model47, each with a slot holding the AIC (glm produces a list with one component named 'aic'), then something like this:
> aic <- sapply(1:47, function(i) eval(as.name(paste("Model", i, sep = "")))$aic)
> x <- matrix(0, 1081, 3)
> x[, 1:2] <- t(combn(47, 2))
> x[, 3] <- abs(aic[x[, 1]] - aic[x[, 2]])
> will calculate all the 1081 AIC differences between paired comparisons of the 47 models. It may not be pretty but it works for me.
> An AIC difference of 2 or less is a tie; anything larger is evidence for the model with the lower AIC (Sakamoto et al., Akaike Information Criterion Statistics, KTK Scientific Publishers, Tokyo).

I wouldn't go quite as far as Frank Harrell did in his answer, but:

(1) it seems silly to do all pairwise comparisons of models; one of the big advantages of information-theoretic approaches is that you can just list the models in order of AIC ...

(2) the dredge() function from the MuMIn package may be useful for automating this sort of thing. There is also an AICtab function in the emdbook package.

(3) If you're really just interested in picking the single model with the best expected predictive capability (and all of the other assumptions of the AIC approach are met -- very large data set, good fit to the data, etc.), then just picking the model with the best AIC will work. It's when you start to make inferences on the individual parameters within that model, without taking account of the model selection process, that you run into trouble. If you are really concerned about good predictions then it may be better to do multi-model averaging *or* use some form of shrinkage estimator.

(4) The "delta-AIC < 2 means pretty much the same" rule of thumb is fine, but it drives me nuts when people use it as a quasi-significance-testing rule, as in "the simpler model is adequate if dAIC < 2". If you're going to work in the model selection arena, then don't follow hypothesis-testing procedures! A smaller AIC means lower expected K-L distance, period.

For the record, Brian Ripley has often expressed the (minority) opinion that AIC is *not* appropriate for comparing non-nested models (e.g. <http://tolstoy.newcastle.edu.au/R/help/06/02/21794.html>).

> -----Original Message-----
> From: r-help-bounces@r-project.org on behalf of Milan Bouchet-Valat
> Sent: Wed 1/25/2012 10:32 AM
> To: Jhope
> Cc: r-help@r-project.org
> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
> On Tuesday, 24 January 2012 at 20:41 -0800, Jhope wrote:
> > Hi R-listers,
> >
> > I have developed 47 GLM models with different combinations of interactions from 1 variable to 5 variables. I have manually made each model separately and put them into individual tables (organized by the number of variables) showing the AIC score. I want to compare all of these models.
> >
> > 1) What is the best way to compare various models with unique combinations and different numbers of variables?
> See ?step or ?stepAIC (from package MASS) if you want an automated way
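Ben's "just list the models in order of AIC" suggestion in a minimal sketch (toy data and model names hypothetical): sort the fitted models by AIC and report differences from the best, rather than all 1081 pairwise comparisons.

```r
set.seed(3)
d <- data.frame(y = rnorm(30), x1 = rnorm(30), x2 = rnorm(30))
mods <- list(m1 = glm(y ~ 1, data = d),
             m2 = glm(y ~ x1, data = d),
             m3 = glm(y ~ x1 + x2, data = d))
aic <- sort(sapply(mods, AIC))
data.frame(AIC = aic, dAIC = aic - aic[1])  # best model in the first row
```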
Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?
A more 'manual' way to do it is this. If you have all your models named properly and well organized, say Model1, Model2, ..., Model47, each with a slot holding the AIC (glm produces a list with one component named 'aic'), then something like this:

aic <- sapply(1:47, function(i) eval(as.name(paste("Model", i, sep = "")))$aic)
x <- matrix(0, 1081, 3)
x[, 1:2] <- t(combn(47, 2))
x[, 3] <- abs(aic[x[, 1]] - aic[x[, 2]])

will calculate all the 1081 AIC differences between paired comparisons of the 47 models. It may not be pretty but it works for me. An AIC difference of 2 or less is a tie; anything larger is evidence for the model with the lower AIC (Sakamoto et al., Akaike Information Criterion Statistics, KTK Scientific Publishers, Tokyo).

HTH

Rubén H. Roa-Ureta, Ph. D.
AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN

-----Original Message-----
From: r-help-boun...@r-project.org on behalf of Milan Bouchet-Valat
Sent: Wed 1/25/2012 10:32 AM
To: Jhope
Cc: r-help@r-project.org
Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

On Tuesday, 24 January 2012 at 20:41 -0800, Jhope wrote:
> Hi R-listers,
>
> I have developed 47 GLM models with different combinations of interactions from 1 variable to 5 variables. I have manually made each model separately and put them into individual tables (organized by the number of variables) showing the AIC score. I want to compare all of these models.
>
> 1) What is the best way to compare various models with unique combinations and different numbers of variables?

See ?step or ?stepAIC (from package MASS) if you want an automated way of doing this.

> 2) Ideally I am trying to develop the simplest model. Even though adding another variable would lower the AIC, how do I judge whether it is worth including it in the model? How do I know when to stop?

This is a general statistical question, not specific to R. As a general rule, if adding a variable lowers the AIC by a significant margin, then it's worth including it. You should only stop when a variable increases the AIC. But this is assuming you consider the AIC a good indicator and you know what you're doing. There's plenty of literature on this subject.

> Definitions of Variables:
> HTL - distance to high tide line (continuous)
> Veg - distance to vegetation
> Aeventexhumed - event of exhumation
> Sector - number of measurements along the beach
> Rayos - major sections of beach (grouped sectors)
> TotalEggs - nest egg density
>
> Example of how all models were created:
> Model2.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed, data=data.to.analyze, family=binomial)
> Model7.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family=binomial, data.to.analyze)
> Model21.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs, data.to.analyze, family=binomial)
> Model37.glm <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg:TotalEggs:Aeventexhumed, data.to.analyze, family=binomial)

To extract the AICs of all these models, it's easier to put them in a list and get their AICs like this:

m <- list()
m$model2 <- glm(cbind(Shells, TotalEggs-Shells) ~ Aeventexhumed, data=data.to.analyze, family=binomial)
m$model3 <- glm(cbind(Shells, TotalEggs-Shells) ~ HTL:Veg, family=binomial, data.to.analyze)
sapply(m, extractAIC)

Cheers
Re: [R] how to get inflection point in binomial glm
René,

Yes, to fit a re-parameterized logistic model I think you'd have to code the whole enchilada yourself, not relying on glm (and not nls() either, since nls() deals with least-squares minimization whereas here we want to minimize a minus log binomial likelihood). I did that and have the re-parameterized logistic model in a package I wrote for a colleague (the logistic fit is fully functional and documented there). My code, though, only considers one continuous predictor. If you want, I can email you this package and you can figure out how to deal with the categorical predictor. One thing I believe at this point is that you'd have to do the inference on the continuous predictor _conditional_ on certain level(s) of the categorical predictor.

Rubén

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of René Mayer
Sent: Thursday, 01 December 2011 20:34
To: David Winsemius
CC: r-help Help
Subject: Re: [R] how to get inflection point in binomial glm

Thanks David and Rubén!

@David: indeed, 15 betas; I forgot the interaction terms, thanks for correcting!

@Rubén: would the re-parameterization be done within nls()? How would one do this practically, including the factor predictor? And yes, within each group we can solve for Y=0:

0 = b0 + b1*X  =>  X = -b0/b1

but I was hoping there might be a more general solution for the case of multiple logistic regression.

René

Zitat von "David Winsemius":
>
> On Dec 1, 2011, at 8:24 AM, René Mayer wrote:
>
>> Dear All,
>>
>> I have a binomial response with one continuous predictor (d) and one factor (g) (8 levels, dummy-coded).
>>
>> glm(resp~d*g, data, family=binomial)
>>
>> Y=b0+b1*X1+b2*X2 ... b7*X7
>
> Dear Dr Mayer;
>
> I think it might be a bit more complex than that. I think you should get 15 betas rather than 8. Have you done it?
>
>> how can I get the inflection point per group, e.g., P(d)=.5
>
> Wouldn't that just be at d=1/beta in each group? (Thinking, perhaps naively, in the case of X=X1 that
>
> Pr[y==1]/(1-Pr[y==1]) = 1 = exp( beta*d*(X==X1) )  # all other terms = 0
>
> And taking the log of both sides, and then using "middle school" math to solve.
>
> Oh, wait. Muffed my first try on that for sure. Need to add back both the constant intercept and the baseline "d" coefficient for the non-b0 levels.
>
> Pr[y==1]/(1-Pr[y==1]) = 1 = exp( beta_0 + beta_d_0*d + beta_n + beta_d_n*d*(X==Xn) )
>
> And just
>
> Pr[y==1]/(1-Pr[y==1]) = 1 = exp( beta_0 + beta_d_0*d )  # for the reference level
>
> This felt like an exam question in my categorical analysis course 25 years ago. (Might have gotten partial credit for my first stab, depending on how forgiving the TA was that night.)
>
>> I would be grateful for any help.
>>
>> Thanks in advance,
>> René
>
> --
> David Winsemius, MD
> West Hartford, CT
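A minimal sketch of the approach offered above, fitting the (d50, d95)-parameterized logistic by direct minimization of the minus log binomial likelihood with optim(), on simulated data (the true values d50 = 5 and d95 = 8 are hypothetical), with standard errors read off the inverted Hessian:

```r
# Reparameterized logistic: P(d) = 1/(1 + exp(log(1/19)*(d - d50)/(d95 - d50)))
set.seed(5)
d <- runif(300, 0, 10)
y <- rbinom(300, 1, 1 / (1 + exp(log(1/19) * (d - 5) / (8 - 5))))

nll <- function(p) {                     # p = c(d50, d95)
  pr <- 1 / (1 + exp(log(1/19) * (d - p[1]) / (p[2] - p[1])))
  -sum(dbinom(y, 1, pr, log = TRUE))
}
fit <- optim(c(4, 9), nll, hessian = TRUE)
fit$par                         # estimates of d50 and d95
sqrt(diag(solve(fit$hessian)))  # their standard errors
```

With a categorical predictor one would, as suggested, repeat this fit within each group, or give each group its own (d50, d95) pair in a single likelihood.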
Re: [R] how to get inflection point in binomial glm
Assuming a logistic model, for each group solve for d at Y=0, or solve for d at p=0.5, where d is your continuous predictor, Y is the logit score, and p is the probability of success in the binomial model. After that you can get the standard error of the inflection point by Taylor series (aka the delta method). Another idea is to re-parameterize the logistic model to include the inflection point explicitly; then you get its estimate and standard error directly as a result of the fit. For example, disregarding the factor predictor g (or working within each group), the logistic model

P(d) = 1/(1 + exp(log(1/19)*(d - d50)/(d95 - d50)))

includes the inflection point directly (d50) and is a re-parameterized version of the usual logistic model

P(d) = 1/(1 + exp(b0 + b1*d))

where the parameters b0 and b1 have been replaced by d50 and d95, the values of the predictor at 50% probability (the inflection point) and at 95% probability, respectively.

HTH
Rubén

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of René Mayer
Sent: Thursday, 01 December 2011 14:25
To: r-help@r-project.org
Subject: [R] how to get inflection point in binomial glm

Dear All, I have a binomial response with one continuous predictor (d) and one factor (g) (8 levels, dummy-coded).

glm(resp~d*g, data, family=binomial)

Y=b0+b1*X1+b2*X2 ... b7*X7

How can I get the inflection point per group, e.g., P(d)=.5? I would be grateful for any help.

Thanks in advance,
René
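The first route (solve for d at p = 0.5, then the delta method for its standard error) in a sketch on simulated data with a single continuous predictor; the true inflection point of 5 is hypothetical:

```r
set.seed(4)
d <- runif(200, 0, 10)
resp <- rbinom(200, 1, plogis(d - 5))   # true inflection point at d = 5
fit <- glm(resp ~ d, family = binomial)
b <- coef(fit)

d50 <- -b[1] / b[2]                     # solve b0 + b1*d = 0

# Delta method: gradient of -b0/b1 with respect to (b0, b1)
g <- c(-1 / b[2], b[1] / b[2]^2)
se_d50 <- sqrt(t(g) %*% vcov(fit) %*% g)
c(d50 = unname(d50), se = as.numeric(se_d50))
```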
Re: [R] RV: Reporting a conflict between ADMB and Rtools on Windows systems
Thanks Gabor and Jan. The batch files solution seems the way to go. Will implement it! Rubén -Original Message- From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] Sent: Thursday, 17 November 2011 13:58 To: Rubén Roa CC: r-help@r-project.org Subject: Re: [R] RV: Reporting a conflict between ADMB and Rtools on Windows systems On Thu, Nov 17, 2011 at 3:54 AM, Rubén Roa wrote: > > I've just found that there is a conflict between tools used to build R > packages (Rtools) and ADMB due to the need to put Rtools compiler's location > in the PATH environmental variable to make Rtools work. > > On a Windows 7 64bit with Rtools installed I installed ADMB-IDE latest > version and although I could translate ADMB code to cpp code I could not > build the cpp code into an executable via ADMB-IDE's compiler. > > On another Windows machine, a Windows Vista 32bits with Rtools installed I > also installed the latest ADMB-IDE and this time it was not possible to > create the .obj file on the way to build the executable when building with > ADMB-IDE. On this machine I also have a previous ADMB version (6.0.1) that I > used to run from the DOS shell. This ADMB also failed to build the .obj file. > > Now, going to PATH, the location info to make Rtools is: > c:\Rtools\bin;c:\Rtools\MinGW\bin;c:\Rtools\MinGW64\bin;C:\Program > Files (x86)\MiKTeX 2.9\miktex\bin; If from this list I remove the > reference to the compiler c:\Rtools\MinGW\bin then ADMB works again. > So beware of this conflict. Suggestion of a solution will be > appreciated. Meanwhile, I run ADMB code in one computer and build R packages > with Rtools in another computer. The batchfiles Rcmd.bat, Rgui.bat temporarily add R and Rtools to your path by looking them up in the registry and then calling Rcmd.exe or Rgui.exe respectively. When R is finished the path is restored to what it was before. 
By using those it's not necessary to have either on your path. These are self-contained batch files with no dependencies, so they can simply be placed anywhere on the path in order to use them. For those and a few other batch files of interest to Windows users of R see: http://batchfiles.googlecode.com -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] RV: Reporting a conflict between ADMB and Rtools on Windows systems
From: Rubén Roa Sent: Thursday, 17 November 2011 9:53 To: 'us...@admb-project.org' Subject: Reporting a conflict between ADMB and Rtools on Windows systems Hi, I have to work under Windows, it's a company policy. I've just found that there is a conflict between the tools used to build R packages (Rtools) and ADMB, due to the need to put the Rtools compiler's location in the PATH environment variable to make Rtools work. On a Windows 7 64-bit machine with Rtools installed I installed the latest ADMB-IDE version, and although I could translate ADMB code to cpp code I could not build the cpp code into an executable via ADMB-IDE's compiler. On another Windows machine, a Windows Vista 32-bit with Rtools installed, I also installed the latest ADMB-IDE, and this time it was not possible to create the .obj file on the way to building the executable with ADMB-IDE. On this machine I also have a previous ADMB version (6.0.1) that I used to run from the DOS shell. This ADMB also failed to build the .obj file. Now, going to PATH, the location info to make Rtools work is: c:\Rtools\bin;c:\Rtools\MinGW\bin;c:\Rtools\MinGW64\bin;C:\Program Files (x86)\MiKTeX 2.9\miktex\bin; If from this list I remove the reference to the compiler c:\Rtools\MinGW\bin then ADMB works again. So beware of this conflict. Suggestions for a solution will be appreciated. Meanwhile, I run ADMB code on one computer and build R packages with Rtools on another computer. Best Ruben -- Dr. Ruben H. Roa-Ureta Senior Researcher, AZTI Tecnalia, Marine Research Division, Txatxarramendi Ugartea z/g, 48395, Sukarrieta, Bizkaia, Spain __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Can I tell about someone's academic cheating
-Original Message- From: r-help-boun...@r-project.org on behalf of YuHong Sent: Sun 10/2/2011 3:27 AM To: r-help@r-project.org Subject: [R] Can I tell about someone's academic cheating Hello, Can I tell about someone's academic cheating behavior in the mailing list? For I knew this person through this R mailing list. Thanks! Regards, Hong Yu -- You have to provide a reproducible example ... Rubén H. Roa-Ureta, Ph. D. AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Poor performance of "Optim"
-Original Message- From: r-help-boun...@r-project.org on behalf of yehengxin Sent: Sat 10/1/2011 8:12 AM To: r-help@r-project.org Subject: [R] Poor performance of "Optim" I used to consider using R and "Optim" to replace my commercial packages: Gauss and Matlab. But it turns out that "Optim" does not converge completely. The same data for Gauss and Matlab are converged very well. I see that there are too many packages based on "optim" and really doubt if they can be trusted! -- Considering that your post is pure whining without any evidence or reproducible example, and considering that you speak of 'data' being converged, methinks it's your fault: you can't control optim well enough to get sensible results. There are many ways to use optim, eh? You can pass on the gradients, you can use a variety of methods, you can increase the number of iterations, et cetera. Read optim's help, come back with a reproducible example, or quietly stick to your commercial software, leaving the whining to yourself. HTH Ruben -- Rubén H. Roa-Ureta, Ph. D. AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN -- View this message in context: http://r.789695.n4.nabble.com/Poor-performance-of-Optim-tp3862229p3862229.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Cannot allocate vector of size x
Yes, on a recent heavy-duty job (profiling the likelihood of the Tweedie power parameter for a relatively complex glm on a data frame with hundreds of thousands of rows) I had the "cannot allocate vector ..." error. I just closed and saved the main workspace, full of large objects, then did the profiling on a fresh and almost empty workspace (with only the strictly necessary objects), and it ran successfully. Then I loaded the two workspaces into one session and continued happily to the end of the job. :-) Dr. Ruben H. Roa-Ureta Senior Researcher, AZTI Tecnalia, Marine Research Division, Txatxarramendi Ugartea z/g, 48395, Sukarrieta, Bizkaia, Spain -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Jim Holtman Sent: Wednesday, 21 September 2011 12:11 To: Michael Haenlein CC: r-help@r-project.org Subject: Re: [R] Cannot allocate vector of size x How much memory do you have on your system? How large are the vectors you are creating? How many other large vectors do you have in memory? Remove all unused objects and do gc() to reclaim some of the memory. Remember all objects are in memory and you have to understand how large they are and how many you have. There is more information you have to provide and some more inspection you have to do. Sent from my iPad On Sep 21, 2011, at 5:53, Michael Haenlein wrote: > Dear all, > > I am running a simulation in which I randomly generate a series of > vectors to test whether they fulfill a certain condition. In most > cases, there is no problem. But from time to time, the (randomly) > generated vectors are too large for my system and I get the error > message: "Cannot allocate vector of size x". > > The problem is that in those cases my simulation stops and I have to > start it again manually. What I would like to do is to simply ignore > that the error happened (or probably report that it did) and then > continue with another (randomly) generated vector. 
> > So my question: Is there a way to avoid that R stops in such a case > and just restarts the program from the beginning as if nothing happened? > I hope I'm making myself clear here ... > > Thanks, > > Michael > >[[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
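The two-workspace workflow described in the reply above can be sketched as follows (file names are illustrative, not from the original thread):

```r
## Sketch of the two-workspace workflow (illustrative file names)
save.image("main_analysis.RData")    # save the heavy main workspace
rm(list = ls())                      # clear all objects
gc()                                 # return freed memory to the system
## ... run the memory-hungry job here with only the strictly
## necessary objects, then save its results:
save.image("profiling_results.RData")
## Later, load both workspaces into one session and continue:
load("main_analysis.RData")
load("profiling_results.RData")
```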
Re: [R] Cannot allocate vector of size x
Check one of the examples in ?try. It has this heading: ## run a simulation, keep only the results that worked. If your system is Windows, you can also try to increase the memory available to one application, in order to avoid the problem. Do a search for the "3GB switch". HTH Dr. Ruben H. Roa-Ureta Senior Researcher, AZTI Tecnalia, Marine Research Division, Txatxarramendi Ugartea z/g, 48395, Sukarrieta, Bizkaia, Spain -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Michael Haenlein Sent: Wednesday, 21 September 2011 11:54 To: r-help@r-project.org Subject: [R] Cannot allocate vector of size x Dear all, I am running a simulation in which I randomly generate a series of vectors to test whether they fulfill a certain condition. In most cases, there is no problem. But from time to time, the (randomly) generated vectors are too large for my system and I get the error message: "Cannot allocate vector of size x". The problem is that in those cases my simulation stops and I have to start it again manually. What I would like to do is to simply ignore that the error happened (or probably report that it did) and then continue with another (randomly) generated vector. So my question: Is there a way to avoid that R stops in such a case and just restarts the program from the beginning as if nothing happened? I hope I'm making myself clear here ... Thanks, Michael __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
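The pattern that ?try points to ("run a simulation, keep only the results that worked") can be sketched like this; the failing simulation function is a stand-in, not code from the original post:

```r
## Keep only the iterations that worked (stand-in simulation function)
one.run <- function() {
  x <- rnorm(1000)
  if (runif(1) < 0.2) stop("cannot allocate vector of size x")  # simulated failure
  mean(x)
}
res  <- lapply(1:100, function(i) try(one.run(), silent = TRUE))
ok   <- !sapply(res, inherits, "try-error")
good <- unlist(res[ok])   # results from the successful runs only
```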
Re: [R] Gradients in optimx
Well, I guess this makes it unnecessary for me to prepare a report with the CatDyn package. However, I am available to test a new optimx R-forge version with my package. Cheers Rubén -- Dr. Ruben H. Roa-Ureta Senior Researcher, AZTI Tecnalia, Marine Research Division, Txatxarramendi Ugartea z/g, 48395, Sukarrieta, Bizkaia, Spain -Original Message- From: John C Nash [mailto:nas...@uottawa.ca] Sent: Sunday, 4 September 2011 18:55 To: Ravi Varadhan CC: Rubén Roa; 'kathryn.lord2...@gmail.com'; 'r-help@r-project.org' Subject: Re: Gradients in optimx I've started to work on this again, and can confirm there seems to be some sort of bug in the gradient test at the beginning of the current R-forge version of optimx. It is not something obvious, and looks like a mixup in arguments to functions, which have been an issue since I've been trying to trap NaN and Inf returns. Worse, setting the control starttests = FALSE fails because there I inadvertently put the initial function calculation inside the block that does the tests. Sigh. Will try to get something done by end of this week. (This will be the R-forge version.) JN On 08/31/2011 09:31 AM, Ravi Varadhan wrote: > Hi Rubén, > > > > I am puzzled to note that the gradient check in "optimx" does not work > for you. Can you send me a reproducible example so that I can figure this > out? > > > > John - I think the best solution for now is to issue a "warning" > rather than an error message, when the numerical gradient is not > sufficiently close to the user-specified gradient. > > > > Best, > > Ravi. > > > > --- > > Ravi Varadhan, Ph.D. > > Assistant Professor, > > Division of Geriatric Medicine and Gerontology School of Medicine > Johns Hopkins University > > > > Ph. 
(410) 502-2619 > > email: rvarad...@jhmi.edu > > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] gradient function in OPTIMX
Hi, In my package CatDyn, which uses optimx, I included the gradients of 20 versions of the model involved. I estimate model parameters with numerical gradients, and at the final estimates I calculate the analytical gradients. In the simplest versions of the model the analytical gradients computed post hoc are almost identical to the numerical gradients. This shows that the analytical gradients (whose formulas were obtained with the CAS Maxima) are correct, at least for those simple versions of my model. However, if I try to pass the analytical gradients to optimx in a new optimization, I invariably get the error message that you got: "Gradient function might be wrong - check it!" This happens regardless of the method used (BFGS, spg, Rcgmin). Same as you, when I try to pass the gradients to optim, instead of optimx, the gradients are accepted and computed correctly, but then I can't use the other very nice features of optimx. I wanted to report this to Ravi and Prof. Nash but I haven't had the time for a full report with several examples and variations. So now that you report it, here I am, seconding you in calling attention to this apparent problem in optimx. Rubén H. Roa-Ureta, Ph. D. AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN -Original Message- From: r-help-boun...@r-project.org on behalf of Kathie Sent: Mon 8/29/2011 11:10 AM To: r-help@r-project.org Subject: [R] gradient function in OPTIMX Dear R users When I use OPTIM with BFGS, I've got a significant result without an error message. However, when I use OPTIMX with BFGS (or spg), I've got the following error message. > optimx(par=theta0, fn=obj.fy, gr=gr.fy, method="BFGS", > control=list(maxit=1)) Error: Gradient function might be wrong - check it! I checked and checked my gradient function line by line. I could not find anything wrong. Is it a bug or something? I prefer OPTIMX, so I'd like to know why. 
Thanks a lot in advance Regards, Kathryn Lord -- View this message in context: http://r.789695.n4.nabble.com/gradient-function-in-OPTIMX-tp3775791p3775791.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
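For reference, passing an analytical gradient to optim (which, per the messages above, accepts it where optimx complained) looks like this on a toy objective; the function below is illustrative, not the CatDyn likelihood:

```r
## Toy objective with its analytical gradient (illustrative only)
fn <- function(p) (p[1] - 1)^2 + 10 * (p[2] + 2)^2
gr <- function(p) c(2 * (p[1] - 1), 20 * (p[2] + 2))
fit <- optim(par = c(0, 0), fn = fn, gr = gr, method = "BFGS")
fit$par   # should be close to c(1, -2)
## A numerical cross-check of the gradient at the solution, e.g.:
## numDeriv::grad(fn, fit$par)   # should be near c(0, 0)
```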
Re: [R] PBSmapping, where is Ireland?!
Dear Ewan, I faced this problem and solved it by contacting the package authors, John Schnute and Rowan Haigh, rowan.ha...@dfo-mpo.gc.ca. Here is a function that solves the problem by displacing the Greenwich meridian to longitude 348, leaving Ireland to the right. This longitude does not span any land mass within the limits of the map, so it does not cause any disappearing land masses. The function loads the GSHHS data, so it takes some time, less than 1 min on my standard laptop, to run. Change the xlim and ylim values to get different fractions of Europe. Last time I contacted them (October 2010), the package authors were planning to add some comments about this in the PBSmapping user's guide. So you may find more info by digging into the user's guide, or else contact Rowan. HTH Rubén Dr. Ruben H. Roa-Ureta Senior Researcher, AZTI Tecnalia, Marine Research Division, Txatxarramendi Ugartea z/g, 48395, Sukarrieta, Bizkaia, Spain library(PBSmapping) Euromap <- function(path="C:/Temp", cutLon=348) { fnam <- paste(path,"gshhs_f.b",sep="/"); p1 <- importGSHHS(fnam,xlim=c(-20,360),ylim=c(30,80),level=1,n=0,xoff=0); z <- p1$X>cutLon; p1$X[z] <- p1$X[z]-360; NorthSeaHR <- thinPolys(p1, tol=0.1, filter=3) .initPBS() clr <- PBSval$PBSclr; xlim <- c(-18, 16) ylim <- c(32, 64) WEurope <- clipPolys(NorthSeaHR, xlim=xlim, ylim=ylim) par(mfrow=c(1,1),omi=c(0,0,0,0)) plotMap(WEurope, xlim=xlim, ylim=ylim, col=clr$land, bg=clr$sea, tck=-0.02,mgp=c(2,.75,0), cex=1.2, plt=c(.08,.98,.08,.98)) } Euromap(cutLon=348) -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of Ewan Minter Sent: Tuesday, 16 August 2011 14:57 To: r-help@r-project.org Subject: [R] PBSmapping, where is Ireland?! Hi folks, I've been using 'PBSmapping' to make a map of Europe with some labels. I've been using the 'worldLL' PolyData, as my computer is too slow to make my own from the GSHHS files. 
The only p __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] [R-pkgs] CatDyn - Estimation of wild populations abundance
Dear ComRades, Package CatDyn 1.0-3 is now available on CRAN. The package allows the estimation of the absolute abundance of a wild population during a capture season using a nonlinear and recursive discrete-time structural model. The response variable is the catch, assumed a random variable produced by either a multiplicative stochastic process (lognormal distribution) or an additive stochastic process (normal distribution). The predictor variables are capture effort, assumed observed exactly, and stock abundance, a latent predictor. The data are high-frequency (e.g. daily) observations of catch and capture effort during the season. The structural model comes in five varieties. The simplest model assumes that all individuals that can be captured during the season were available at the start of the season, so during the season the abundance always decreases (a pure depletion model) due to natural mortality and capture mortality. The other versions include one, two, three or four positive perturbations, such as immigration of recruits, that occur at certain time steps during the season. The models also include nonlinear (power) relations between response and predictors. The models can be fit with natural mortality fixed at a given value or with this parameter estimated as well. Taking into account structural and stochastic options, CatDyn can fit up to 20 model versions to the same data. CatDyn uses the recently released optimx package for maximum flexibility in selecting and using numerical methods implemented in R. CatDyn also has analytical gradients, but in the current version these gradients are not yet passed to the optimizer; instead they are computed after convergence using numerical gradients, in order to compare analytical versus numerical gradients at the maximum likelihood estimates using various methods. 
CatDyn estimates all parameters on the log scale and uses function deltamethod() from package msm to return back-transformed estimate vectors and covariance matrices. CatDyn has already been used to estimate the abundance of several fish and invertebrate stocks harvested by industrial and artisanal fishing fleets around the world. A manuscript is under preparation with the first application (a squid stock from the Falkland Islands) and will be submitted to a marine science journal. Dr. Rubén H. Roa-Ureta, AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN ___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to find the likelihood of a null model in R
I know how to do what you want. However, the fact that you didn't bother to follow the rules of the list eliminates the initial, admittedly faint, enthusiasm to help you. Maybe someone else will guide you anyway. The rules are extremely simple. Read them at the end of this message. If you do, you'll probably find the answer to your question yourself. Dr. Rubén H. Roa-Ureta, AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN -Original Message- From: r-help-boun...@r-project.org on behalf of Partha Pratim PATTNAIK Sent: Mon 7/25/2011 1:16 PM To: r-help@r-project.org Subject: [R] How to find the likelihood of a null model in R Dear All, I am working on a dataset having the dependent variable as ordinal data (discrete data) and multiple independent variables. I need to find the likelihood for the NULL model, i.e. the model with only the dependent variable and all other independent variables as zero. Kindly let me know how to find the likelihood for a NULL model in R. Is there any specific function in R that can do this task? Please share if anyone has any information on this. Thanks and Regards Partha Pattnaik -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] please help! what are the different using log-link function and log transformation?
Funny, I was going to quote your paper with Dichmont in Fisheries Research regarding this, then I forgot to do it, but you yourself came out and posted the explanation. This forum is great! Rubén -Original Message- From: r-help-boun...@r-project.org on behalf of bill.venab...@csiro.au Sent: Sun 6/19/2011 2:36 PM To: gloryk...@hotmail.com; r-help@r-project.org Subject: Re: [R] please help! what are the different using log-link function and log transformation? The two commands you give below are certain to lead to very different results, because they are fitting very different models. The first is a gaussian model for the response with a log link, and constant variance. The second is a gaussian model for a log-transformed response and identity link. On the original scale this model would imply a constant coefficient of variation and hence a variance proportional to the square of the mean, and not constant. Your problem is not particularly an R issue, but a difficulty with understanding generalized linear models (and hence generalized additive models, which are based on them). Bill Venables. From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On Behalf Of pigpigmeow [gloryk...@hotmail.com] Sent: 19 June 2011 17:39 To: r-help@r-project.org Subject: [R] please help! what are the different using log-link function and log transformation? I'm new R-programming user, I need to use gam function. y <- gam(a ~ s(b), family = gaussian(link=log), data) y <- gam(log(a) ~ s(b), family = gaussian (link=identity), data) why [do] these two command [give different] results? I guess these two command results are same, but actally these two command results are different, Why? -- View this message in context: http://r.789695.n4.nabble.com/please-help-what-are-the-different-using-log-link-function-and-log-transformation-tp3608931p3608931.html Sent from the R help mailing list archive at Nabble.com. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] please help! what are the different using log-link function and log transformation?
The problem is not that you are new to R. This is a conceptual issue. Let y be the response variable and let {x_i} be a set of predictors. Your first model (identity response and log link) is saying that y=f(x_1)f(x_2)...f(x_n) + e, e~Normal(0,sigma) i.e. this is an additive observation-error model with constant variance. Your second model (log response and identity link) is saying that y=f(x_1)f(x_2)...f(x_n)u, u=exp(e), e~Normal(0,sigma) i.e. this is a multiplicative observation-error model with variance proportional to the square of the mean. Plot the response against the predictors and visually examine whether you have heteroscedasticity. If you do, use your second model; otherwise use the first one. One key to understanding these kinds of dichotomies is to realize that statistical models are composed of a process part and an observation part. In your models the process part is deterministic and multiplicative, but after that you still have two choices: make the random observation part additive (your first model) or multiplicative (your second model). Needless to say (but I am saying it anyway), these two models will give different results, at the very least because one assumes constant variance (your first model) whereas the other assumes a variance proportional to the square of the mean. In my experience with multiplicative process models, the random observation part will usually be multiplicative as well because of heteroscedasticity. HTH Rubén - Rubén H. Roa-Ureta, Ph. D. AZTI Tecnalia, Txatxarramendi Ugartea z/g, Sukarrieta, Bizkaia, SPAIN -Original Message- From: r-help-boun...@r-project.org on behalf of pigpigmeow Sent: Sun 6/19/2011 9:39 AM To: r-help@r-project.org Subject: [R] please help! what are the different using log-link function and log transformation? I'm new R-programming user, I need to use gam function. y <- gam(a ~ s(b), family=gaussian(link=log), data) y <- gam(log(a) ~ s(b), family=gaussian(link=identity), data) why do these two commands give different results? 
I guess these two commands should give the same results, but actually they give different results. Why? -- View this message in context: http://r.789695.n4.nabble.com/please-help-what-are-the-different-using-log-link-function-and-log-transformation-tp3608931p3608931.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
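The conceptual difference explained above can be seen directly with simulated data; lm/glm are used here for brevity, and gam behaves the same way (all data and values illustrative):

```r
## Simulated data with multiplicative error (illustrative)
set.seed(2)
b <- runif(100, 1, 5)
a <- exp(0.5 + 0.3 * b) * exp(rnorm(100, 0, 0.2))
m1 <- glm(a ~ b, family = gaussian(link = "log"))  # additive error, log link
m2 <- lm(log(a) ~ b)                               # multiplicative error
## Compare fits on the original response scale:
## fitted(m1) versus exp(fitted(m2)); the estimates differ because
## the two models assume different error structures.
```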
Re: [R] logistic growth model
Write the growth formula in an R script. Define initial par values. Input the size and age data. Plot the size and age data as points. Plot the growth model with the initial par values as a line. Play with the initial par values until you see a good agreement between the model (the line) and the data (the points). Optimise. Re-plot. Plot a residual histogram. Plot a residual scatterplot. Plot a Q-Q residual plot. HTH Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On behalf of Renalda > Sent: Saturday, 4 June 2011 6:17 > To: r-help@r-project.org > Subject: [R] logistic growth model > > I want to fit a logistic growth model y = k*e^(b0+b1*age) / (1 + e^(b0+b1*age)). Can someone help on how to get > the initial coefficients b0 and b1? I need to estimate [them] in > order to do the regression analysis. When I run using b0=0.5 > and b1=3.4818, I get the following error > > 397443.8 : 0.5 3.4818 > Error in nls(Height ~ k * exp(b1 + b2 * Age)/(1 + exp(b1 + b2 > * Age)), : > singular gradient > "please tell me what is wrong with my initial values, and > how to get the initial values" > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
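The recipe above can be sketched end to end with plogis() (note that plogis(x) equals e^x/(1+e^x), the curve in the question); the data and all starting values here are illustrative:

```r
## Simulated size-at-age data (illustrative)
set.seed(3)
Age    <- runif(80, 0, 15)
Height <- 120 * plogis(-3 + 0.6 * Age) + rnorm(80, 0, 5)
## 1) Plot the data and a guessed curve; tune k, b0, b1 by eye
k0 <- max(Height); b0.ini <- -3; b1.ini <- 0.5
plot(Age, Height)
curve(k0 * plogis(b0.ini + b1.ini * x), add = TRUE)
## 2) Optimise from the visually tuned starting values
fit <- nls(Height ~ k * plogis(b0 + b1 * Age),
           start = list(k = k0, b0 = b0.ini, b1 = b1.ini))
## 3) Residual diagnostics
r <- residuals(fit)
hist(r); plot(fitted(fit), r); qqnorm(r); qqline(r)
```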
Re: [R] Building Custom GUIs for R
Check the PBSmodelling package. HTH Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On behalf of vioravis > Sent: Friday, 20 May 2011 8:52 > To: r-help@r-project.org > Subject: [R] Building Custom GUIs for R > > I am looking to build simple GUIs based on the R codes I > have. The main objective is to hide the scary R codes from > non-programming people and make it easier for them to try out > different inputs. > > For example, > > 1. The GUI will have means to upload a csv file which will be > read by the R code. > > 2. A button to preprocess data (carried out by a R function behind) > > 3. A button to build some models and run simulations > > 4. Space to display visual charts based on the simulations results > > 5. Option to save the results to a csv file or something similar. > > Are there any tools currently available that enable us to build > GUIs??? (MATLAB has a GUI builder that enables users to build custom GUIs). > > Can we make an exe of such a GUI (with the R code) and let > people use it without having to install R??? > > Any help on this would be much appreciated?? > > Thank you. > > Ravi > > > > > > > > -- > View this message in context: > http://r.789695.n4.nabble.com/Building-Custom-GUIs-for-R-tp353 > 7794p3537794.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] nls problem with R
In addition to Andrew's advice, you should get more familiar with your nonlinear model. From what you wrote, as T2 tends to infinity, V2 tends to v0*(1-epi). There you have a baseline on the Y-axis towards which your model tends, and this will give you sensible starting values for v0 and epi. Also, as T2 tends to 0, V2 tends to v0*(1-epi+epi*exp(cl*t0)). There you have another, higher point on the Y-axis, and this one will give you additional sensible starting values for cl and t0. Plot the data and the predicted model with your initial values, and send the model-data combination to the optimizer only once you see that the predicted line is close to the observed response. V2 <- c(371000, 285000 ,156000, 20600, 4420, 3870, 5500 ) T2 <- c(0.3403 ,0.4181 ,0.4986 ,0.7451 ,1.0069 ,1.553, 1.333) #last value inserted for illustration. #nls2 <- nls(V2~v0*(1-epi+epi*exp(-cl*(T2-t0))),start=list(v0=10^7,epi=0.9 ,cl=6.2,t0=8.7)) v0.ini <- 10^7 epi.ini <- 0.9 cl.ini <- 6.2 t0.ini <- 8.7 V2.pred.ini <- v0.ini*(1-epi.ini+epi.ini*exp(-cl.ini*(T2-t0.ini))) plot(T2,V2) lines(T2,V2.pred.ini) As you can see, with your initial values the line doesn't even show on the plot. No wonder the gradients are singular. So go find better initial values by trial and error and check the results on the plot. Then the optimizer called by nls will finish the job (hopefully). Then repeat your plot, this time with the estimates instead of the initial values. This may get you started in the business of estimating nonlinear models. HTH Rubén ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -Mensaje original- > De: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] En nombre de Andrew Robinson > Enviado el: miércoles, 04 de mayo de 2011 9:15 > Para: sterlesser > CC: r-help@r-project.org > Asunto: Re: [R] nls problem with R > > The fact that T2 and V2 are of different lengths seems like a > likely culprit. 
Other than that, you need to find start > points that do not lead to a singular gradient. There are > several books that provide advice on obtaining initial > parameter estimates for non-linear models. Google Books > might help you. > > Cheers > > Andrew > > > > > On Tue, May 03, 2011 at 09:08:03PM -0700, sterlesser wrote: > > the original data are > > V2 =c(371000,285000 ,156000, 20600, 4420, 3870, 5500 ) T2=c( 0.3403 > > ,0.4181 ,0.4986 ,0.7451 ,1.0069 ,1.553) > > > nls2=nls(V2~v0*(1-epi+epi*exp(-cl*(T2-t0))),start=list(v0=10^7,epi=0.9 > > ,cl=6.2,t0=8.7)) > > after execution error occurs as below > > > > Error in nlsModel(formula, mf, start, wts) : > > singular gradient matrix at initial parameter estimates Error in > > nlsModel(formula, mf, start, wts) : > > singular gradient matrix at initial parameter estimates > In addition: > > Warning messages: > > 1: In lhs - rhs : > > longer object length is not a multiple of shorter object length > > 2: In .swts * attr(rhs, "gradient") : > > longer object length is not a multiple of shorter object length > > > > could anyone help me ?thansks > > > > -- > > View this message in context: > > > http://r.789695.n4.nabble.com/nls-problem-with-R-tp3494454p3494454.htm > > l Sent from the R help mailing list archive at Nabble.com. > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. 
> > -- > Andrew Robinson > Program Manager, ACERA > Department of Mathematics and StatisticsTel: > +61-3-8344-6410 > University of Melbourne, VIC 3010 Australia > (prefer email) > http://www.ms.unimelb.edu.au/~andrewpr Fax: > +61-3-8344-4599 > http://www.acera.unimelb.edu.au/ > > Forest Analytics with R (Springer, 2011) > http://www.ms.unimelb.edu.au/FAwR/ > Introduction to Scientific Programming and Simulation using R > (CRC, 2009): > http://www.ms.unimelb.edu.au/spuRs/ > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
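Rubén's trial-and-error recipe can be sketched as follows; the starting values below are illustrative guesses read off the model's limiting behaviour (plateau near v0*(1-epi), intercept near v0*(1-epi+epi*exp(cl*t0))), not fitted results.

```r
# Sketch of the workflow: guess starting values from the model's limits,
# overlay the implied curve on the data, and only then call nls().
V2 <- c(371000, 285000, 156000, 20600, 4420, 3870, 5500)
T2 <- c(0.3403, 0.4181, 0.4986, 0.7451, 1.0069, 1.553, 1.333)

# Illustrative guesses only: plateau of the data is ~5000 = v0*(1-epi)
start.ini <- list(v0 = 4e5, epi = 0.99, cl = 5, t0 = 0.3)
V2.pred.ini <- with(start.ini, v0*(1 - epi + epi*exp(-cl*(T2 - t0))))

plot(T2, V2)
o <- order(T2)
lines(T2[o], V2.pred.ini[o])  # the line now at least shows on the plot
```

With starting values like these the predicted curve tracks the data, and nls() has a fighting chance of converging.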
Re: [R] Simple lattice question
> -Mensaje original- > De: Peter Ehlers [mailto:ehl...@ucalgary.ca] > Enviado el: jueves, 31 de marzo de 2011 18:09 > Para: Rubén Roa > CC: r-help@r-project.org > Asunto: Re: [R] Simple lattice question > > On 2011-03-31 06:58, Rubén Roa wrote: > > Thanks Peters! > > > > Just a few minor glitches now: > > > > require(lattice) > > data<- > data.frame(SP=sort(rep(as.factor(c('A','B','C','D','E')),12)), > > x=rpois(60,10), > > y=rep(c(rep(0,4),rep(10,4),rep(20,4)),5), > > z=rep(1:4,15)) > > > xyplot(x~y|SP,data=data,groups=z,layout=c(2,3),pch=1:4,lty=1:4,col='black',type='b', > > key=list(x = .65, y = .75, corner = c(0, 0), points=TRUE, > > lines=TRUE, pch=1:4, lty=1:4, type='b', > > text=list(lab = as.character(unique(data$z))))) > > > > I have an extra column of symbols on the legend, > > > > and, > > > > would like to add a title to the legend. Such as 'main' for plots. > > > > Any suggestions? > > for key(), make 'lines' into a list: > > xyplot(x~y|SP,data=data,groups=z,layout=c(2,3), > pch=1:4,lty=1:4,col='black',type='b', > key=list(x = .65, y = .75, corner = c(0, 0), > title="title here", cex.title=.9, lines.title=3, > lines=list(pch=1:4, lty=1:4, type='b'), > text=list(lab = as.character(unique(data$z))))) > > Peter Ehlers ... that works. Thanks Peter (sorry I misspelled your name b4). The clean code is now: require(lattice) data <- data.frame(SP=sort(rep(as.factor(c('A','B','C','D','E')),12)), x=rpois(60,10), y=rep(c(rep(0,4),rep(10,4),rep(20,4)),5), z=rep(1:4,15)) xyplot(x~y|SP,data=data,groups=z,layout=c(2,3),pch=1:4,lty=1:4,col='black',type='b', key=list(x = .65, y = .75, corner = c(0, 0), lines=list(pch=1:4, lty=1:4, type='b'), title=expression(CO^2), text=list(lab = as.character(unique(data$z))))) David's code works too (thanks to you too!) 
and is somewhat shorter xyplot(x~y|SP, data=data,groups=z, layout=c(2,3), par.settings=simpleTheme(pch=1:4,lty=1:4,col='black'), type="b", auto.key = list(x = .6, y = .7, lines=TRUE, corner = c(0, 0))) but the lines and symbols are on different columns, and the line types look as if they were in bold. Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simple lattice question
Thanks Peters! Just a few minor glitches now: require(lattice) data <- data.frame(SP=sort(rep(as.factor(c('A','B','C','D','E')),12)), x=rpois(60,10), y=rep(c(rep(0,4),rep(10,4),rep(20,4)),5), z=rep(1:4,15)) xyplot(x~y|SP,data=data,groups=z,layout=c(2,3),pch=1:4,lty=1:4,col='black',type='b', key=list(x = .65, y = .75, corner = c(0, 0), points=TRUE, lines=TRUE, pch=1:4, lty=1:4, type='b', text=list(lab = as.character(unique(data$z))))) I have an extra column of symbols on the legend, and, would like to add a title to the legend. Such as 'main' for plots. Any suggestions? Rubén ____ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -Mensaje original- > De: Peter Ehlers [mailto:ehl...@ucalgary.ca] > Enviado el: jueves, 31 de marzo de 2011 15:41 > Para: Rubén Roa > CC: r-help@r-project.org > Asunto: Re: [R] Simple lattice question > > On 2011-03-31 03:39, Rubén Roa wrote: > > > > DeaR ComRades, > > > > require(lattice) > > data<- > data.frame(SP=sort(rep(as.factor(c('A','B','C','D','E')),12)), > > x=rpois(60,10), > > y=rep(c(rep(0,4),rep(10,4),rep(20,4)),5), > > z=rep(1:4,15)) > > > xyplot(x~y|SP,data=data,groups=z,layout=c(2,3),pch=1:4,lty=1:4,col='bl > > ack',type='b') > > > > How do I put a legend for the grouping variable in the > empty upper-right panel? > > See the help page for xyplot for an example using the iris data. > You just need to add something like > > auto.key = list(x = .6, y = .7, corner = c(0, 0)) > > to your lattice call. See the 'key' entry on the ?xyplot page. > > Peter Ehlers > > > > > TIA > > > > Rubén > > > > > __ > > __ > > > > Dr. 
Rubén Roa-Ureta > > AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g > > 48395 Sukarrieta (Bizkaia) > > SPAIN > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Simple lattice question
DeaR ComRades, require(lattice) data <- data.frame(SP=sort(rep(as.factor(c('A','B','C','D','E')),12)), x=rpois(60,10), y=rep(c(rep(0,4),rep(10,4),rep(20,4)),5), z=rep(1:4,15)) xyplot(x~y|SP,data=data,groups=z,layout=c(2,3),pch=1:4,lty=1:4,col='black',type='b') How do I put a legend for the grouping variable in the empty upper-right panel? TIA Rubén ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Degrees of freedom for lm in logLik and AIC
However, shouldn't _free parameters_ only be counted for degrees of freedom and for calculation of AIC? The sigma parameter is profiled out in a least-squares linear regression, so it's not free, it's not a dimension of the likelihood. Just wondering ... ____ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -Mensaje original- > De: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] En nombre de Frank Harrell > Enviado el: lunes, 28 de marzo de 2011 15:44 > Para: r-help@r-project.org > Asunto: Re: [R] Degrees of freedom for lm in logLik and AIC > > Thank you Peter. I didn't realize that was the convention used. > Frank > > > Peter Dalgaard-2 wrote: > > > > On Mar 28, 2011, at 05:36 , Frank Harrell wrote: > > > > > I have a question about the computation of the degrees of freedom > > in a linear model: > > > > > > x <- runif(20); y <- runif(20); f <- lm(y ~ x); logLik(f) > > 'log Lik.' -1.968056 (df=3) > > The 3 is coming > > from f$rank + 1. Shouldn't it be f$rank? This affects AIC(f). > > > > I count three parameters in a simple linear regression: alpha, beta, > > sigma. > > > > From a generic-likelihood point of view, I don't see how > you can omit > > the last one. > > > > -pd > > > > -- > > Peter Dalgaard > > Center for Statistics, Copenhagen Business School Solbjerg Plads 3, > > 2000 Frederiksberg, Denmark > > Phone: (+45)38153501 > > Email: pd@cbs.dk Priv: pda...@gmail.com > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. 
> > > > > - > Frank Harrell > Department of Biostatistics, Vanderbilt University > -- > View this message in context: > http://r.789695.n4.nabble.com/Degrees-of-freedom-for-lm-in-log > Lik-and-AIC-tp3410687p3411759.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
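Peter's convention is easy to verify directly: the df reported by logLik() on an lm fit counts the residual standard deviation sigma along with the regression coefficients.

```r
# df reported by logLik() on an lm fit is f$rank + 1: sigma is counted as a
# parameter along with the intercept and slope.
set.seed(1)
x <- runif(20); y <- runif(20)
f <- lm(y ~ x)
ll <- logLik(f)
attr(ll, "df")   # 3 = f$rank (intercept + slope) + 1 (sigma)
f$rank           # 2
```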
[R] R helps win competitions
DeaR ComRades, This is a quote from a News article in Science's 11-February issue, about competitions to model data: "For Chris Raimondi, a search-engine expert based in Baltimore, Maryland, and winner of the HIV-treatment competition, the Kaggle contest motivated him to hone his skills in a newly learned computer language called R, which he used to encode the winning data model. Raimondi also enjoys the competitive aspect of Kaggle challenges: "It was nice to be able to compare yourself with others; ... it became kind of addictive. ... I spent more time on this than I should." If you are interested read the full article here: http://www.sciencemag.org/content/331/6018/698.full Rubén ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R² for non-linear model
> -Mensaje original- > De: Kjetil Halvorsen [mailto:kjetilbrinchmannhalvor...@gmail.com] > Enviado el: jueves, 17 de marzo de 2011 16:19 > Para: Rubén Roa > CC: Alexx Hardt; r-help@r-project.org > Asunto: Re: [R] R² for non-linear model > > see inline. > > On Thu, Mar 17, 2011 at 4:58 AM, Rubén Roa wrote: > > Hi Alexx, > > > > I don't see any problem in comparing models based on > different distributions for the same data using the AIC, as > long as they have a different number of parameters and all > the constants are included. > > For example, you can compare distribution mixture models > with different number of components using the AIC. > > This is one example: > > Roa-Ureta. 2010. A Likelihood-Based Model of Fish Growth > With Multiple Length Frequency Data. Journal of Biological, > Agricultural and Environmental Statistics 15:416-429. > > Here is another example: > > www.education.umd.edu/EDMS/fac/Dayton/PCIC_JMASM.pdf > > Prof. Dayton writes above that one advantage of AIC over > hypothesis testing is: > > "(d) Considerations related to underlying distributions > for random > > variables can be incorporated into the > decision-making process > > rather than being treated as an assumption whose robustness > must be > > considered (e.g., models based on normal densities > and on log-normal densities can be compared)." > > My reading of this is that AIC can be used to compare models > with densities relative to the same dominating measure. > > Kjetil I think this is correct. It is probably not wise to use the AIC to compare distribution models based on the counting measure with distribution models based on the Lebesgue measure! Dr. 
Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R² for non-linear model
Hi Alexx, I don't see any problem in comparing models based on different distributions for the same data using the AIC, as long as they have a different number of parameters and all the constants are included. For example, you can compare distribution mixture models with different number of components using the AIC. This is one example: Roa-Ureta. 2010. A Likelihood-Based Model of Fish Growth With Multiple Length Frequency Data. Journal of Biological, Agricultural and Environmental Statistics 15:416-429. Here is another example: www.education.umd.edu/EDMS/fac/Dayton/PCIC_JMASM.pdf Prof. Dayton writes above that one advantage of AIC over hypothesis testing is: "(d) Considerations related to underlying distributions for random variables can be incorporated into the decision-making process rather than being treated as an assumption whose robustness must be considered (e.g., models based on normal densities and on log-normal densities can be compared)." Last, if you read Akaike's theorem you will see there is nothing precluding comparing models built on different distributional models. Here it is: " the expected (over the sample space and the space of parameter estimates) maximum log-likelihood of some data on a working model overshoots the expected (over the sample space only) maximum log-likelihood of the data under the true model that generated the data by exactly the number of parameters in the working model." A remarkable result. Rubén -Original Message- From: r-help-boun...@r-project.org on behalf of Alexx Hardt Sent: Wed 3/16/2011 7:42 PM To: r-help@r-project.org Subject: Re: [R] R² for non-linear model Am 16.03.2011 19:34, schrieb Anna Gretschel: > Am 16.03.2011 19:21, schrieb Alexx Hardt: >> And to be on-topic: Anna, as far as I know anova's are only useful to >> compare a submodel (e.g. with one less regressor) to another model. >> > thanks! i don't get it either what they mean by fortune... 
It's an R-package (and a pdf [1]) with collected quotes from the mailing list. Be careful with the suggestion to use AIC. If you wanted to compare two models using AICs, you need the same distribution (that is, the same distributional assumption) in both models. To my knowledge, there is no way to "compare" a gaussian model to an exponential one (except common sense), but my knowledge is very limited. [1] http://cran.r-project.org/web/packages/fortunes/vignettes/fortunes.pdf -- alexx@alexx-fett:~$ vi .emacs __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
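The kind of cross-distribution comparison discussed in this thread can be sketched with simulated data and MASS::fitdistr(); both fits below are full maximum likelihood with all constants included, and both densities are with respect to the same (Lebesgue) dominating measure, consistent with Kjetil's caveat. The data are made up for illustration.

```r
# Comparing Gaussian and log-normal fits to the same positive response via AIC.
library(MASS)
set.seed(42)
y <- rlnorm(500, meanlog = 1, sdlog = 0.8)   # simulated positive data
fit.norm  <- fitdistr(y, "normal")
fit.lnorm <- fitdistr(y, "lognormal")
AIC(fit.norm)    # larger
AIC(fit.lnorm)   # smaller: the log-normal is favoured, as it should be here
```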
Re: [R] How to produce glm graph
In addition to David's suggestion, you might want to examine the termplot function, ?termplot HTH -Original Message- From: r-help-boun...@r-project.org on behalf of David Winsemius Sent: Sat 11/20/2010 3:54 PM To: Sonja Klein Cc: r-help@r-project.org Subject: Re: [R] How to produce glm graph On Nov 20, 2010, at 4:27 AM, Sonja Klein wrote: > > I'm very new to R and modeling but need some help with visualization > of glms. [snip] > Is there a way to produce a graph in R that has these features? Of course. Modeling would be of little value without such capability. In R, regression functions produce an object with a particular class ("glm" in this case) and there is generally a predict method for each class. There is also a vector of fitted values within the object that may be accessed with the fitted() or fitted.values() functions. The predict.glm help page has a worked example. -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
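A minimal sketch of both suggestions on a built-in dataset (mtcars stands in for Sonja's data, which we have not seen):

```r
# Two ways to visualise a fitted glm: termplot() for the partial effect on the
# link scale, and predict() on a grid of new data for a fitted-curve plot.
fit <- glm(am ~ hp, data = mtcars, family = binomial)
termplot(fit, se = TRUE)   # effect of hp on the logit scale, with SE bands

nd <- data.frame(hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 100))
p  <- predict(fit, newdata = nd, type = "response", se.fit = TRUE)
plot(nd$hp, p$fit, type = "l", ylab = "P(am = 1)")
lines(nd$hp, p$fit + 2*p$se.fit, lty = 2)   # rough pointwise bands
lines(nd$hp, p$fit - 2*p$se.fit, lty = 2)
```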
Re: [R] 0.3 is not 0.3, bug in seq() function?
Enrico, The same happens with other numbers/sequences. seq(0.1,0.9,0.1)[7]==0.7 [1] FALSE seq(0.1,1.3,0.1)[12]==1.2 [1] FALSE Rounding seems to fix it, round(seq(0.1,0.5,0.1),1)[3]==0.3 round(seq(0.1,0.9,0.1),1)[7]==0.7 round(seq(0.1,1.3,0.1),1)[12]==1.2 They all return TRUE. Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -Mensaje original- > De: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] En nombre de Enrico R. Crema > Enviado el: jueves, 28 de octubre de 2010 12:24 > Para: r-help@r-project.org > Asunto: [R] 0.3 is not 0.3, bug in seq() function? > > Dear List, > > I've been running a numerical simulation and I found this odd > error in my code where the which command could not identify > which rows of a column of data.frame were corresponding to > the value 0.3. There are 7 unique values in this column > (0.01,0.05,0.1,0.2,0.3,0.4,0.5), and this does not work only > for 0.3. So I looked at the column and manually tried to use > the which() command, and the results were all FALSE despite I > could see those number. So I recreated my sequence of number > and tested: > > seq(0.1,0.5,0.1)[3]==0.3 > > which gave me FALSE!!! All the other numbers > (0.1,0.2,0.4,0.5) give me TRUE, but 0.3 was not working. So I did: > > seq(0.1,0.5,0.1)[3]-0.3 > > which gave me 5.551115e-17. If you run a similar sequence like: > > seq(0.2,0.6,0.1)[2]==0.3 > > this will still give me FALSE. No, for my own purpose, I > fixed the problem in this way: > > zerothree=seq(0.1,0.5,0.1)[3] > which(data[,1]==zerothree) > > but I guess this bug is a bit of problem...Apologies if it is > the wrong place to post this bug, and apologies also if this > was a known issue. 
My version of R is : > > platform x86_64-pc-linux-gnu > arch x86_64 > os linux-gnu > system x86_64, linux-gnu > status > major 2 > minor 10.1 > year 2009 > month 12 > day14 > svn rev50720 > language R > version.string R version 2.10.1 (2009-12-14) > > > Many Thanks, > > Enrico > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
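The behaviour reported above is standard binary floating point, not a bug in seq(); besides the rounding workaround, the usual remedy is to compare with a tolerance, for instance via all.equal():

```r
# 0.1 has no exact binary representation, so accumulated sums drift slightly.
s <- seq(0.1, 0.5, 0.1)
s[3] == 0.3                    # FALSE
s[3] - 0.3                     # about 5.55e-17
isTRUE(all.equal(s[3], 0.3))   # TRUE: comparison with a tolerance
which(abs(s - 0.3) < 1e-8)     # 3: robust replacement for which(s == 0.3)
```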
Re: [R] Set value if else...
x <- data.frame(x=1:10) require(gtools) x$y <- ifelse(odd(x$x),0,1) HTH R. Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -Mensaje original- > De: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] En nombre de Joel > Enviado el: viernes, 15 de octubre de 2010 10:16 > Para: r-help@r-project.org > Asunto: [R] Set value if else... > > > Hi > > I want to set a variable to either 1 or 0 depending on an > value of a dataframe and then add this as a colum to the dataframe. > > This could be done with a loop but as we are able to do > questions on a complete row or colum without a loop it would > be sweet if it could be done. > > for example: > > table: > > Name Age > Joel 24 > agust 17 > maja40 > and so on... > > And what I want to do is a command that gives me > VoteRight<-1 if table$age>18 else set VoteRight to 0 > > Then use cbind or something to add it to the table. > > so i get > Name Age VoteRight > Joel 241 > agust 170 > maja401 > > And as I said before im guessing this can be done without a loop... > > //Joel > -- > View this message in context: > http://r.789695.n4.nabble.com/Set-value-if-else-tp2996667p2996667.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
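The same ifelse() pattern applied to Joel's age example (his data re-typed here) needs no extra package:

```r
# ifelse() vectorises the condition over the whole column; no loop needed.
d <- data.frame(Name = c("Joel", "agust", "maja"), Age = c(24, 17, 40))
d$VoteRight <- ifelse(d$Age > 18, 1, 0)   # VoteRight is 1, 0, 1

# Equivalent, and slightly more direct:
d$VoteRight <- as.integer(d$Age > 18)
```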
Re: [R] Memory limit problem
Hi, Probably Windows can't allocate enough contiguous free space. Try this: Find boot.ini (usually at the root c:\) Without changing anything else, add this line at the end of the script: multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional 3GB" /3GB /noexecute=optin /fastdetect Reboot. When prompted select the 3GB boot-up. Performance will decay but bigger objects could be saved to disk. Hope that this will be enough to get your code working. When finished re-boot and start with the normal boot-up. An example of a boot.ini script that I had to prepare for one big simulation work in an XP machine is at the end of my message. HTH Rubén [boot loader] timeout=30 default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS [operating systems] multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /noexecute=optin /fastdetect multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional 3GB" /3GB /noexecute=optin /fastdetect -Original Message- From: r-help-boun...@r-project.org on behalf of Tim Clark Sent: Tue 10/12/2010 5:49 AM To: r help r-help Cc: Tim Clark Subject: [R] Memory limit problem Dear List, I am trying to plot bathymetry contours around the Hawaiian Islands using the package rgdal and PBSmapping. I have run into a memory limit when trying to combine two fairly small objects using cbind(). I have increased the memory to 4GB, but am being told I can't allocate a vector of size 240 Kb. I am running R 2.11.1 on a Dell Optiplex 760 with Windows XP. I have pasted the error message and summaries of the objects below. Thanks for your help. 
Tim > xyz<-cbind(hi.to.utm,z=b.depth$z) Error: cannot allocate vector of size 240 Kb > memory.limit() [1] 4000 > memory.size() [1] 1971.68 > summary(hi.to.utm) Object of class SpatialPoints Coordinates: min max x 708745.5 923406.7 y 2046153.1 2327910.9 Is projected: TRUE proj4string : [+proj=utm +zone=4 +datum=NAD83 +ellps=GRS80 +towgs84=0,0,0] Number of points: 15328 > str(hi.to.utm) Formal class 'SpatialPoints' [package "sp"] with 3 slots ..@ coords : num [1:15328, 1:2] 708746 710482 712218 713944 715681 ... .. ..- attr(*, "dimnames")=List of 2 .. .. ..$ : NULL .. .. ..$ : chr [1:2] "x" "y" ..@ bbox : num [1:2, 1:2] 708746 2046153 923407 2327911 .. ..- attr(*, "dimnames")=List of 2 .. .. ..$ : chr [1:2] "x" "y" .. .. ..$ : chr [1:2] "min" "max" ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slots .. .. ..@ projargs: chr " +proj=utm +zone=4 +datum=NAD83 +ellps=GRS80 +towgs84=0,0,0" > summary(b.depth) x y z Min. :-157.0 Min. :18.50 Min. :-5783 1st Qu.:-156.6 1st Qu.:18.98 1st Qu.:-4565 Median :-156.1 Median :19.80 Median :-3358 Mean :-156.1 Mean :19.73 Mean :-3012 3rd Qu.:-155.5 3rd Qu.:20.41 3rd Qu.:-1601 Max. :-155.0 Max. :21.00 Max. : 0 > str(b.depth) 'data.frame': 15328 obs. of 3 variables: $ x: num -157 -157 -157 -157 -157 ... $ y: num 21 21 21 21 21 ... $ z: num -110 -114 -110 -88 -76 -122 -196 -224 -240 -238 ... Tim Clark Marine Ecologist National Park of American Samoa __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [Fwd: RE: maximum likelihood problem]
If your problem was as you described it, you'd simply find the 1st derivative of your eq. w.r.t. k, equate to 0, and then solve for k (and check that the solution is a maximum). But I guess what you really want to do is to estimate k from data given your equation and _another model_ for the likelihood of the data. If that is the case then you haven't yet defined the model for the likelihood of the data as a function of your single parameter k. You have to define an R function that contains your process model (apparently your equation) and your likelihood model, which will be made from some probability model for the data. Try this to get help on how to write new functions: ?"function" After defining this function with the two models then you can maximize the likelihood of the data as a function of the process model parameter k. If there is only one parameter then it is recommended to use optimise(), instead of optim(). It is not necessary to maximize, just minimize the negative of the log(likelihood) once you have defined what this likelihood is. HTH Rubén -Original Message- From: r-help-boun...@r-project.org on behalf of mlar...@rsmas.miami.edu Sent: Sat 10/2/2010 3:11 AM To: r-help@r-project.org Subject: [R] [Fwd: RE: maximum likelihood problem] I forgot to add that I first gave a starting value for K. Nonlinear least squares won't work because my errors are not normally distributed. Any advide on my maximum likelihood function would be greatly appreciated. Original Message Subject: RE: [R] maximum likelihood problem From:"Ravi Varadhan" Date:Fri, October 1, 2010 5:10 pm To: mlar...@rsmas.miami.edu r-help@r-project.org -- Do you want to do a nonlinear least-squares estimation (which is MLE if the errors are Gaussian)? If so, you have to define a function that takes the parameter (k) and data matrix (LR, T, LM), as arguments, and returns a scalar, which is the residual sum of squares. Then you can optimize (minimize) that function. Ravi. 
-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of mlar...@rsmas.miami.edu Sent: Friday, October 01, 2010 4:40 PM To: r-help@r-project.org Subject: [R] maximum likelihood problem I am trying to figure out how to run maximum likelihood in R. Here is my situation: I have the following equation: equation<-(1/LR-(exp(-k*T)*LM)*(1-exp(-k))) LR, T, and LM are vectors of data. I want to R to change the value of k to maximize the value of equation. My attempts at optim and optimize have been unsuccessful. Are these the recommended functions that I should use to maximize my equation? With optim I wanted the function to be maximized so I had to make the fnscale negative. Here is what I put: L<-optim(k,equation,control=(fnscale=-1)) My result: Error: could not find function "fn" Here is what I put for optimize: L<-optimise(equation,k,maximum=TRUE) My result: Error: 'xmin' not less than 'xmax' Any advise would be greatly appreciated. Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
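Rubén's recipe — one R function holding both the process model and a probability model for the data, minimised with optimise() — can be sketched with simulated data and an assumed log-normal error model; the poster's actual likelihood is unknown, so everything below is illustrative.

```r
# Minimal sketch: negative log-likelihood as a function of the single
# parameter k, under an ASSUMED log-normal error model and made-up data.
set.seed(1)
T  <- 1:10
LM <- runif(10, 1, 2)
k.true <- 0.5
LR <- exp(-k.true * T) * LM * exp(rnorm(10, sd = 0.1))   # simulated data

negll <- function(k) {
  mu <- log(exp(-k * T) * LM)                  # process model on the log scale
  -sum(dnorm(log(LR), mean = mu, sd = 0.1, log = TRUE))  # likelihood model
}

fit <- optimise(negll, interval = c(0.01, 2))  # one parameter: use optimise()
fit$minimum    # MLE of k; close to the true value 0.5 here
```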
Re: [R] geoR - likfit failure
Questions about geoR should be directed to R-SIG-GEO. Anyway, you should provide more info about your problem. Read the Posting Guide. Have you tried changing the model? Sometimes falling back from Matern to exponential or Gaussian allows successful convergence. HTH Rubén De: r-help-boun...@r-project.org en nombre de Mohammed Ouassou Enviado el: mié 04/08/2010 10:36 Para: R-help@r-project.org Asunto: [R] geoR - likfit failure Hi I'm using the geoR package to perform linear spatial interpolation (OK). The function likfit() fails to compute REML. The error message is: Error in solve.default(v$varcov, xmat). How can I find out that likfit() has failed, and retrieve the error message? Thank you so much for your help. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
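On the poster's side question — detecting that a call failed and retrieving the error message — the generic answer in R is tryCatch(); a sketch with stop() standing in for the failing likfit() call:

```r
# Trap an error instead of letting it abort the script; stop() here stands
# in for a likfit(...) call that fails with a singular-matrix error.
res <- tryCatch(
  stop("singular matrix"),        # replace with the real likfit(...) call
  error = function(e) e           # return the condition object on failure
)
if (inherits(res, "error")) {
  message("model fit failed: ", conditionMessage(res))
}
# otherwise res is the successful fit object
```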
Re: [R] ed50
> -----Original Message----- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] > On Behalf Of Dipa Hari > Sent: Monday, 12 July 2010 22:19 > To: r-help@r-project.org > Subject: [R] ed50 > > I am using a semiparametric model > library(mgcv) > sm1=gam(y~x1+s(x2),family=binomial, f) > > How should I find the standard error for ED50 for the above > model? ED50 = ( -sm1$coef[1]-f(x2)) / sm1$coef[2] > > f(x2) is the estimated value for the nonparametric term. > > Thanks Two ways: 1) Re-parameterise the model so that ED50 is an explicit parameter in the model, or 2) Taylor series (a.k.a. Delta method) using the standard errors of coef[1] and coef[2] and their correlation. HTH ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
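Option (2) can be sketched as follows. The gam object sm1 is the quoted fit; the value s of the smooth term at the point of interest is an assumption, and treating it as a fixed constant understates the uncertainty:

```r
# Delta-method sketch for SE(ED50), with ED50 = (-b0 - s)/b1.
# 'sm1' is the quoted gam fit; 's' (the smooth's value) is assumed fixed.
b <- coef(sm1)[1:2]            # intercept b0 and slope b1 of x1
V <- vcov(sm1)[1:2, 1:2]       # their covariance matrix
s <- 0.3                       # assumed value of the smooth at x2
ed50 <- (-b[1] - s) / b[2]
g <- c(-1/b[2], (b[1] + s)/b[2]^2)   # gradient of ED50 w.r.t. (b0, b1)
se.ed50 <- sqrt(as.numeric(t(g) %*% V %*% g))
```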
Re: [R] selection of optim parameters
> -----Original Message----- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] > On Behalf Of Fabian Gehring > Sent: Monday, 5 July 2010 21:53 > To: r-help@r-project.org > Subject: [R] selection of optim parameters > > Hi all, > > I am trying to rebuild the results of a study using a > different data set. I'm using about 450 observations. The > code I've written seems to work well, but I have some > trouble minimizing the negative of the log-likelihood > function using 5 free parameters. > > As starting values I am using the results of the paper I am rebuilding. > The system.time of the calculation of the function is about 0.65 sec. > Since the free parameters should be within some boundaries I > am using the following command: > > optim(fn=calculateLogLikelyhood, c(0.4, 2, 0.4, 8, 0.8), > lower=c(0,0,0,0,0), upper=c(1, 5, Inf, Inf, 1), control=list(trace=1, > maxit=1000)) > > Unfortunately the result doesn't seem to be reasonable. 3 of > the optimized parameters are on the boundaries. 1) Your parameters seem to vary over several orders of magnitude. The control argument has a parscale parameter that can be used to re-scale all parameters to the same order of magnitude. Alternatively, you could estimate the log of your parameters, say par=c(log(0.4), log(2), log(0.4), log(8), log(0.8)) (with equivalent changes in lower and upper), and in your function, instead of the parameter value, use exp(parameter value). That way the _numerical optimization_ occurs in log space whereas the _function evaluation_ occurs in the original space. This transformation approach means your parameter estimates and their hessian matrix (in case you are interested in the hessian) will be output in the transformed space, so estimates and their covariance matrix have to be back-transformed. For the estimates just use exp(), whereas for the covariance matrix you might have to use something like a Taylor series. 
2) Did you use "L-BFGS-B" in the method argument of optim()? This method admits box-constrained optimization whereas the default (which you seem to be using, Nelder-Mead) is unconstrained, as far as I know. > Unfortunately I don't have much experience using > optimization methods. > That's why I am asking you. > Do you have any hints on what should be taken into > account when doing such an optimization? > > Is there a good way to implement the boundaries in the code > (instead of doing it while optimizing)? I've read about > parscale in the help section. Unfortunately I don't really > know how to use it. And anyway, could this help? What other > points/controls should be taken into account? About using parscale, see Ravi Varadhan's post "Re: optim() not finding optimal values" of the 27th of June. HTH Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
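The two suggestions (log-scale parameters plus box constraints) can be combined in one call. A sketch, with the poster's unseen objective replaced by a placeholder and the bounds translated to log space (log(0) becomes -Inf):

```r
# Sketch: optimize in log space with L-BFGS-B box constraints.
# calculateLogLikelyhood is the poster's objective (not shown), so a
# quadratic placeholder stands in for it here.
calculateLogLikelyhood <- function(p) sum((p - c(0.5, 2, 1, 7, 0.7))^2)
negll.log <- function(logp) calculateLogLikelyhood(exp(logp))
start <- log(c(0.4, 2, 0.4, 8, 0.8))
fit <- optim(start, negll.log, method = "L-BFGS-B",
             lower = rep(-Inf, 5),                   # log(0) = -Inf
             upper = c(log(1), log(5), Inf, Inf, log(1)),
             control = list(trace = 1, maxit = 1000), hessian = TRUE)
exp(fit$par)   # estimates back on the original scale
```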
Re: [R] optim() not finding optimal values
Derek, As a general strategy, and as an alternative to parscale when using optim, you can just estimate the logarithm of your parameters. So in optim the par argument would contain the logarithm of your parameters, whereas in the model itself you would write exp(par) in place of the parameter. The purpose of this is to bring all parameters to a similar scale to aid the numerical algorithm in finding the optimum over several dimensions. Due to the functional invariance property of maximum likelihood estimates, your pars transformed back to the original scale are also the MLEs of the pars in your model. If you were using ADMB you'd get the standard errors of the pars in the original scale simply by declaring them of the sdreport_number class. With optim, you would get the standard errors of pars in the original scale post hoc by using a Taylor series (a.k.a. Delta method), which in this case is very simple since the transformation is just the exponential. In relation to your model/data combination, since you have only 17 years of data and just one series of cpue, and this is a rather common case, you may want to give the choice to set B0=K, i.e. equilibrium conditions at the start, in your function, to reduce the dimensionality of your profile likelihood approximation, thus helping the optimizer. HTH Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -----Original Message----- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] > On Behalf Of Derek Ogle > Sent: Saturday, 26 June 2010 22:28 > To: R (r-help@R-project.org) > Subject: [R] optim() not finding optimal values > > I am trying to use optim() to minimize a sum-of-squared > deviations function based upon four parameters. The basic > function is defined as ... 
> > SPsse <- function(par,B,CPE,SSE.only=TRUE) { > n <- length(B) # get number of > years of data > B0 <- par["B0"]# isolate B0 parameter > K <- par["K"] # isolate K parameter > q <- par["q"] # isolate q parameter > r <- par["r"] # isolate r parameter > predB <- numeric(n) > predB[1] <- B0 > for (i in 2:n) predB[i] <- > predB[i-1]+r*predB[i-1]*(1-predB[i-1]/K)-B[i-1] > predCPE <- q*predB > sse <- sum((CPE-predCPE)^2) > if (SSE.only) sse > else list(sse=sse,predB=predB,predCPE=predCPE) > } > > My call to optim() looks like this > > # the data > d <- data.frame(catch= > c(9,113300,155860,181128,198584,198395,139040,109969,71896 > ,59314,62300,65343,76990,88606,118016,108250,108674), > cpe=c(109.1,112.4,110.5,99.1,84.5,95.7,74.1,70.2,63.1,66.4,60. > 5,89.9,117.0,93.0,116.6,90.0,105.1)) > > pars <- c(80,100,0.0001,0.17) # put > all parameters into one vector > names(pars) <- c("B0","K","q","r") # > name the parameters > ( SPoptim <- optim(pars,SPsse,B=d$catch,CPE=d$cpe) )# run optim() > > > This produces parameter estimates, however, that are not at > the minimum value of the SPsse function. For example, these > parameter estimates produce a smaller SPsse, > > parsbox <- c(732506,1160771,0.0001484,0.4049) > names(parsbox) <- c("B0","K","q","r") > ( res2 <- SPsse(parsbox,d$catch,d$cpe,SSE.only=FALSE) ) > > Setting the starting values near the parameters shown in > parsbox even resulted in a movement away from (to a larger > SSE) those parameter values. > > ( SPoptim2 <- optim(parsbox,SPsse,B=d$catch,CPE=d$cpe) )# > run optim() > > > This "issue" most likely has to do with my lack of > understanding of optimization routines but I'm thinking that > it may have to do with the optimization method used, > tolerance levels in the optim algorithm, or the shape of the > surface being minimized. > > Ultimately I was hoping to provide an alternative method to > fisheries biologists who use Excel's solver routine. 
> > If anyone can offer any help or insight into my problem here > I would be greatly appreciative. Thank you in advance. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
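Using the SPsse() function and data frame d quoted above, one way to act on the scaling advice is optim()'s parscale control — a sketch; the starting values, the method choice, and the tightened reltol are assumptions, not the poster's:

```r
# Sketch: refit with parameters rescaled to comparable magnitudes.
# Assumes SPsse() and d from the quoted message are already defined.
pars <- c(B0 = 7e5, K = 1.2e6, q = 1.5e-4, r = 0.4)   # assumed starts
fit <- optim(pars, SPsse, B = d$catch, CPE = d$cpe, method = "BFGS",
             control = list(parscale = abs(pars), maxit = 5000,
                            reltol = 1e-12))
fit$par
```

With parscale set, a unit step in the optimizer's internal space moves each parameter by roughly its own magnitude, which helps when B0 and K are of order 1e5-1e6 while q is of order 1e-4.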
Re: [R] Problem with package installation
It was the Panda antivirus installed on the main corporate server of my company. This antivirus program was corrupting the zip files containing the packages for Windows systems. The problem was solved by including the CRAN repositories as trusted web pages. Best, Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN > -----Original Message----- > From: Uwe Ligges [mailto:lig...@statistik.tu-dortmund.de] > Sent: Monday, 21 June 2010 16:52 > To: Rubén Roa > CC: r-help@r-project.org > Subject: Re: [R] Problem with package installation > > No idea, do you have the right repository selected? Is there > any proxy in your network that may change the download data > for some reason? I have not observed that before and I > haven't seen similar reports before. > > Best, > Uwe Ligges > > On 21.06.2010 11:03, Rubén Roa wrote: > > > > Dear ComRades, > > > > I am having the "wrong MD5 checksums" error with every > package I try to install. > > It happened with R 2.11.0, then I updated to R 2.11.1, same thing. > > > > sessionInfo() > > R version 2.11.1 (2010-05-31) > > i386-pc-mingw32 > > > > locale: > > [1] LC_COLLATE=Spanish_Spain.1252 > LC_CTYPE=Spanish_Spain.1252 LC_MONETARY=Spanish_Spain.1252 > > [4] LC_NUMERIC=C LC_TIME=Spanish_Spain.1252 > > > > attached base packages: > > [1] stats graphics grDevices utils datasets methods base > > > > loaded via a namespace (and not attached): > > [1] tools_2.11.1 > > > > I tried changing the mirror, to no avail. 
Then I tried downloading > > zip files and installing from local zips but again I got the "wrong > > MD5 checksums" error > > > > utils:::menuInstallLocal() > > files R/MASS.rdb, data/Rdata.rdb, help/MASS.rdb have the wrong MD5 > > checksums > > > > In some cases, another error message pointed to a namespace > problem, for example: > > > > library(Formula) > > Error in get(Info[i, 1], envir = env) : > >internal error -3 in R_decompress1 > > Error: package/namespace load failed for 'Formula' > > > > Any hints? > > > > > __ > > __ > > > > Dr. Rubén Roa-Ureta > > AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g > > 48395 Sukarrieta (Bizkaia) > > SPAIN > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem with package installation
Dear ComRades, I am having the "wrong MD5 checksums" error with every package I try to install. It happened with R 2.11.0, then I updated to R 2.11.1, same thing. sessionInfo() R version 2.11.1 (2010-05-31) i386-pc-mingw32 locale: [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 LC_MONETARY=Spanish_Spain.1252 [4] LC_NUMERIC=C LC_TIME=Spanish_Spain.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_2.11.1 I tried changing the mirror, to no avail. Then I tried downloading zip files and installing from local zips, but again I got the "wrong MD5 checksums" error utils:::menuInstallLocal() files R/MASS.rdb, data/Rdata.rdb, help/MASS.rdb have the wrong MD5 checksums In some cases, another error message pointed to a namespace problem, for example: library(Formula) Error in get(Info[i, 1], envir = env) : internal error -3 in R_decompress1 Error: package/namespace load failed for 'Formula' Any hints? ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [ADMB Users] an alternative to R for nonlinear stat models
From: Chris Gast [mailto:cmg...@gmail.com] Sent: Thursday, 17 June 2010 22:32 To: Rubén Roa CC: r-help@r-project.org; us...@admb-project.org Subject: Re: [ADMB Users] an alternative to R for nonlinear stat models I spoke with my colleague who did most of the testing, and he has informed me that much of the hessian sensitivity actually came from a separate program (based on Numerical Recipes in C++ code) that did not use optim(), after having stopped using optim() due to speed issues. In my experience with optim, the reltol argument has proved important in this regard. Very small changes in the parameter estimates at the converged solution (influenced by reltol) can lead to different standard error estimates from inverting the hessian, especially for parameter estimates close to zero (as vulnerability coefficients can be in many models with such a feature). It is a limitation of the finite difference method for computing the hessian based on optimal parameter estimates. Chris If the problem originates in estimates being close to zero, a simple transformation (log or exp) might help. However, what I would really like to know is the performance of the different methods of the optim function. I guess the reltol parameter and other control parameters act differently depending on the method, whether Nelder-Mead, CG, etc. I am currently experimenting with CG and although it is slow for my model, it has always produced hessian matrices that were invertible and with all diagonals positive after inversion (unlike Nelder-Mead, the default). Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
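The method comparison discussed above can be set up on a toy objective — a sketch; the normal log-likelihood and the data are made up purely for illustration:

```r
# Sketch: same objective, two optim methods, compare hessian-based SEs.
set.seed(1)
x <- rnorm(200, mean = 5, sd = 2)
negll <- function(p) -sum(dnorm(x, p[1], exp(p[2]), log = TRUE))  # sd on log scale
fit.nm <- optim(c(0, 0), negll, method = "Nelder-Mead", hessian = TRUE)
fit.cg <- optim(c(0, 0), negll, method = "CG", hessian = TRUE)
sqrt(diag(solve(fit.nm$hessian)))   # SEs of (mean, log sd), Nelder-Mead
sqrt(diag(solve(fit.cg$hessian)))   # same quantities under CG
```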
Re: [R] [ADMB Users] an alternative to R for nonlinear stat models
From: users-boun...@admb-project.org [mailto:users-boun...@admb-project.org] On Behalf Of Chris Gast Sent: Wednesday, 16 June 2010 21:11 To: Arni Magnusson CC: r-help@r-project.org; us...@admb-project.org Subject: Re: [ADMB Users] an alternative to R for nonlinear stat models Hi Arni (and others), My dissertation work involves the use (and extension) of models of the same ilk (sometimes exactly the same) as those described by Nancy Gove and John Skalski in their 2002 article. I began with R, and moved to my own home-brewed C/C++ programs for the sake of speed when fitting models to real and simulated data. In addition, we found that the estimated standard errors (based on the inverse hessian output from optim()) were very sensitive to tolerance criteria--often changing orders of magnitude. Hi, Regarding the last bit, optim() has several methods (Nelder-Mead, simulated annealing, conjugate gradient, etc.). It would be interesting to know which method produced what result with the standard errors from the inverse hessian. Can you briefly elaborate? Thanks Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] adding year information to a scatter plot
> -----Original Message----- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] > On Behalf Of Ozan Bakis > Sent: Sunday, 2 May 2010 21:25 > To: r-help@r-project.org > Subject: [R] adding year information to a scatter plot > > Hi R users, > > I would like to add year information to each point in a > scatter plot. The following example shows what I mean: > > df=data.frame(year=c(2001,2002,2003),a=c(8,9,7),b=c(100,90,95)) > df > plot(b~a,data=df) > > Now, I would like to see which point belongs to 2001, 2002 > and 2003 in the above graphic. > > Thank you very much, > ozan Use text(), df <- data.frame(year=c(2001,2002,2003),a=c(8,9,7),b=c(100,90,95)) df plot(b~a,data=df,ylim=c(.98*min(df$b),1.02*max(df$b))) text(x=df$a,y=df$b-.01*max(df$b),lab=format(df$year)) ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] non linear estimation
> -----Original Message----- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] > On Behalf Of JamesHuang > Sent: Thursday, 29 April 2010 3:38 > To: r-help@r-project.org > Subject: Re: [R] non linear estimation > > > any suggestion? actually I just want to know if there is a > package for nonlinear estimation with restrictions, thanks. I > am new to R I do not know if there is any specific package for optimization with restrictions, but you can use optim with method="L-BFGS-B". This only lets you set bounds on single parameters, so for restrictions such as a+b<19 in Y=a+(b+c*x)*exp(-d*x) you could reduce your restrictions to single-parameter bounds (in your original mail you put that a>10, a+b<19, and b<3), or else you could think of some re-parameterization that would put a+b (and any other multi-parameter restriction) as a single parameter. Wait, is this homework? ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
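The re-parameterization idea can be sketched like this: introduce s = a+b as a parameter so the constraint a+b<19 becomes an upper bound on s. The data, starting values, and least-squares objective are all made up for illustration:

```r
# Sketch: enforce a+b < 19 in Y = a + (b + c*x)*exp(-d*x) by
# optimizing over (a, s, c, d) with s = a + b, so b = s - a.
set.seed(2)
x <- seq(0, 10, by = 0.1)
y <- 12 + (3 + 1*x) * exp(-0.3*x) + rnorm(length(x), 0, 0.2)  # fake data
fn <- function(p) {
  a <- p[1]; b <- p[2] - p[1]; c. <- p[3]; d <- p[4]  # p[2] is s = a+b
  sum((y - (a + (b + c.*x) * exp(-d*x)))^2)
}
optim(c(11, 16, 1, 0.2), fn, method = "L-BFGS-B",
      lower = c(10, -Inf, -Inf, 1e-6),   # a > 10
      upper = c(Inf, 19, Inf, Inf))      # s = a+b < 19
```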
Re: [R] Converting daily data series to monthly series
> -----Original Message----- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] > On Behalf Of zow...@ncst.go.ke > Sent: Wednesday, 21 April 2010 5:16 > To: r-help@r-project.org > Subject: [R] Converting daily data series to monthly series > > Hi Users, > I have a daily series of data from 1962 - 2000, with the data > for February 29th in leap years excluded, leaving 365 daily > values for each year. I wish to convert the daily series to > a monthly series. How can I do this using the zoo package or > any other package? > > Thanks > > ZABLONE OWITI > GRADUATE STUDENT > Nanjing University of Information, Science and Technology > College of International Education Let df be your data frame, #In case you have to format your data as Date before extracting the month df$date <- as.Date(df$date, "%d/%m/%Y") #Getting year, month and week from your correctly formatted date df$Year <- as.numeric(format(df$date, "%Y")) #Year df$Month <- as.numeric(format(df$date, "%m")) #Month df$Week <- as.numeric(format(df$date, "%W")) + 1 #Week Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
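Once Year and Month exist, the daily-to-monthly collapse itself can be done with aggregate() — a sketch on made-up data; the column name 'value' and the choice of mean (rather than sum) are assumptions:

```r
# Sketch: collapse a daily series to monthly means.
df <- data.frame(date = seq(as.Date("1962-01-01"), as.Date("1962-12-31"),
                            by = "day"))
df$value <- rnorm(nrow(df))                    # placeholder data
df$Year  <- as.numeric(format(df$date, "%Y"))
df$Month <- as.numeric(format(df$date, "%m"))
monthly <- aggregate(value ~ Year + Month, data = df, FUN = mean)
monthly <- monthly[order(monthly$Year, monthly$Month), ]
head(monthly)
```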
Re: [R] Truncated Normal Distribution and Truncated Pareto distribution
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Julia Cains Sent: Monday, 19 April 2010 9:22 To: r-help@r-project.org Subject: [R] Truncated Normal Distribution and Truncated Pareto distribution Dear R helpers, I have a bimodal dataset dealing with loss amounts. I have divided this dataset into two, with the bounds for the first dataset, i.e. dataset-A, being 5,000$ to 100,000$, while dataset-B deals with the losses exceeding 100,000$, i.e. dataset-B is left truncated. I need to fit a truncated normal distribution to dataset-A, having a lower bound of 5,000 and an upper bound of 100,000, while I need to fit a truncated Pareto for the losses exceeding 100,000$. Is there any package in R which will guide me to fit these two distributions, also giving KS (Kolmogorov-Smirnov) test and Anderson-Darling test results? Please guide Julia --- See library(MASS) ?fitdistr You can define your customized truncated density as a function in the densfun parameter of fitdistr. See also http://www.mail-archive.com/r-h...@stat.math.ethz.ch/msg34540.html http://www.mail-archive.com/r-h...@stat.math.ethz.ch/msg34548.html HTH Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN Only a man of Worth sees Worth in other men __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
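A sketch of the fitdistr() route for dataset-A: the truncated-normal density is written out by hand, and the placeholder data and starting values are assumptions:

```r
library(MASS)
# Sketch: fit a normal truncated to [5e3, 1e5] by maximum likelihood.
dtnorm <- function(x, mean, sd, a = 5e3, b = 1e5) {
  # normal density renormalized over the truncation interval [a, b]
  dnorm(x, mean, sd) / (pnorm(b, mean, sd) - pnorm(a, mean, sd))
}
set.seed(3)
lossA <- runif(500, 5e3, 1e5)     # placeholder for the real losses
fit <- fitdistr(lossA, dtnorm, start = list(mean = 4e4, sd = 3e4))
fit
```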
Re: [R] Scanning only specific columns into R from a VERY large file
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Josh B Sent: Saturday, 17 April 2010 0:12 To: R Help Subject: [R] Scanning only specific columns into R from a VERY large file Hi, I turn to you, the R Sages, once again for help. You've never let me down! (1) Please make the following toy files: x <- read.table(textConnection("var.1 var.2 var.3 var.1000 indv.1 1 5 9 7 indv.21 2 9 3 8"), header = TRUE) y <- read.table(textConnection("var.3 var.1000"), header = TRUE) write.csv(x, file = "x.csv") write.csv(y, file = "y.csv") (2) Pretend you are starting with the files "x.csv" and "y.csv." They come from another source -- an online database. Pretend that these files are much, much, much larger. Specifically: (a) Pretend that "x.csv" contains 1000 columns by 210,000 rows. (b) "y.csv" contains just header titles. Pretend that there are 90 header titles in "y.csv" in total. These header titles are a subset of the header titles in "x.csv." (3) What I want to do is scan (or import, or whatever the appropriate word is) only a subset of the columns from "x.csv" into R. Specifically, I only want to scan the columns of data from "x.csv" into R that are indicated in the file "y.csv." I still want to scan in all 210,000 rows from "x.csv," but only for the aforementioned columns listed in "y.csv." Can you guys recommend a strategy for me? I think I need to use the scan command, based on the hugeness of "x.csv," but I don't know what exactly to do. Specific code that gets the job done would be the most useful. Thank you very much in advance! Josh --- Try something like do.call("cbind",scan(file="yourfile.csv",sep=",",what=list(NULL,NULL,0,NULL,0,NULL,NULL,...,NULL),flush=TRUE)) You have to work out how to set up the list in the 'what' parameter to match the headers of 'y'. In the above, the only columns read are those indicated by a '0'. HTH Ruben Dr. 
Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
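An alternative sketch with read.csv(), where colClasses = "NULL" (the string, not the R NULL) tells R to skip a column entirely; the file names are the toy ones from the question:

```r
# Sketch: read only the columns of x.csv whose names appear in y.csv.
hdr  <- names(read.csv("x.csv", nrows = 1))   # all column names
keep <- names(read.csv("y.csv", nrows = 1))   # wanted column names
cls  <- ifelse(hdr %in% keep, NA, "NULL")     # NA = let R guess the type
x.sub <- read.csv("x.csv", colClasses = cls)
```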
Re: [R] From THE R BOOK -> Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm!
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Corrado Sent: Tuesday, 30 March 2010 16:52 To: r-help@r-project.org Subject: [R] From THE R BOOK -> Warning: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm! Dear friends, I am testing glm as at page 514/515 of THE R BOOK by M. Crawley, that is, on proportion data. I use glm(y~x1+,family=binomial) where y is a proportion in (0,1) and x is a real number. I get the error: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm! But that is exactly what was suggested in the book, where there is no mention of a similar warning. Where am I going wrong? Here is the output: > glm(response.prepared~x,data=,family=binomial) Call: glm(formula = response.prepared ~ x, family = binomial, data = ) Coefficients: (Intercept) x -0.3603 0.4480 Degrees of Freedom: 510554 Total (i.e. Null); 510553 Residual Null Deviance: 24420 Residual Deviance: 23240 AIC: 700700 Warning message: In eval(expr, envir, enclos) : non-integer #successes in a binomial glm! > Regards -- Corrado Topi PhD Researcher Global Climate Change and Biodiversity Area 18, Department of Biology University of York, York, YO10 5YW, UK Phone: + 44 (0) 1904 328645, E-mail: ct...@york.ac.uk --- Are you perhaps misreading Crawley's book? A proportion would usually be modeled with the Beta distribution, not the binomial, which is for counts. If you are modeling a proportion, try the betareg function in the betareg package. HTH Ruben ____ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
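The suggested beta-regression route, as a sketch on simulated data (the betareg package must be installed; responses must lie strictly inside (0,1), as noted at the top of this thread):

```r
# Sketch: beta regression for a proportion response in (0,1).
library(betareg)
set.seed(4)
x <- runif(200)
y <- plogis(-0.4 + 0.5 * x + rnorm(200, 0, 0.3))  # strictly in (0,1)
fit <- betareg(y ~ x)   # same formula syntax as glm()
summary(fit)
```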
Re: [R] remove from the R mailing list
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of zoe zhang Sent: Tuesday, 30 March 2010 12:18 To: r-help@r-project.org Subject: [R] remove from the R mailing list I would like to be removed from the R mailing list. Thanks. --- Hi, Would you like me to remove you? Rubén Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] add information above bars of a barplot()
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Martin Batholdy Sent: Monday, 22 March 2010 15:53 To: r help Subject: [R] add information above bars of a barplot() hi, I have a barplot with six clusters of four bars each. Now I would like to add the exact value of each bar as a number above the bar. I hoped to get some tips here. I could simply add text at the different positions, but I don't understand how the margins on the x-axis are calculated (how can I get / calculate the x-ticks of a barplot?). Also I would like to code this flexibly enough so that it still works when I have more bars in each cluster. thanks for any suggestions! If you are barplotting x, bp <- barplot(x) text(x=bp, y=x, labels=format(x), pos=3) should get you closer to what you want; barplot() returns the x-coordinates of the bar midpoints. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
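Extended to the clustered case from the question — a sketch on made-up data; the 4 x 6 matrix and the 15% headroom in ylim are assumptions. Because barplot() returns midpoints in the same layout as the input matrix, the labelling code is independent of the number of bars per cluster:

```r
# Sketch: six clusters of four bars, each bar labelled with its value.
set.seed(5)
m <- matrix(sample(5:20, 24, replace = TRUE), nrow = 4, ncol = 6)
bp <- barplot(m, beside = TRUE, ylim = c(0, 1.15 * max(m)))
text(x = bp, y = m, labels = m, pos = 3, cex = 0.8)  # bp matches m's layout
```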
Re: [R] tm[,-1]
-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of ManInMoon Sent: Thursday, 11 March 2010 12:22 To: r-help@r-project.org Subject: [R] tm[,-1] This does what I was hoping it would: aggregate(tm[,-1],by=list(tm[,10]),FUN="mean") but I don't know what "tm[,-1]" means (well, the -1 bit anyway). Does it somehow mean the whole matrix? --- No, it means the matrix 'tm' minus its 1st column: (a <- matrix(rnorm(16,4,5),4,4)) a[,-1] ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Tinn-R RGui Send problem
ComRades, On a recent Windows XP system that I have to use at work, I installed Tinn-R 2.3.4.4 and R 2.10.1. When I mark-select a piece of code, say summary(x), written in Tinn-R and send it to the RGui using the send button, the RGui console, instead of showing me my piece of code, shows something like source(.trPaths[5], echo=FALSE, max.deparse.length=150) The code is executed fine, but if I want to re-call it to the console, say to make a little change, using the Up arrow of the keyboard, I do not get the code but again the line quoted above. That's not very useful. Checking .trPaths[5] shows that Tinn-R is writing a temporary file in ..\\Program Data\\tmp\\selection.r, and that file contains my code and is being overwritten with every new piece of code that I select and send to the console. Does anyone know what to do to make the console re-call the selected piece of code instead of the call to the temporary file? I guess it is a Tinn-R issue. Thanks Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] tcltk
I doubt it. Guess why? Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Vasco Cadavez Sent: Monday, 8 March 2010 15:46 To: r-help@r-project.org Subject: [R] tcltk Hi, I'm trying to install tcltk in R-2.10.1; however, I get an error. Can someone help? thanks Vasco __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Random real numbers
Lower case: vignette("Runuran") Sorry, it's this clever email program. R. ____ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rubén Roa Sent: Tuesday, 2 March 2010 14:13 To: frederik vanhaelst; r-h...@stat.math.ethz.ch Subject: Re: [R] Random real numbers >From what distribution? If the uniform, runif(100,0,2*pi) If another, install package Runuran, and do this ?Runuran Vignette("Runuran") HTH ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of frederik vanhaelst Sent: Tuesday, 2 March 2010 11:52 To: r-h...@stat.math.ethz.ch Subject: [R] Random real numbers Hi, how could I generate random real numbers between 0 and 2*pi? Thanks, Frederik __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Random real numbers
From what distribution? If the uniform, runif(100,0,2*pi) If another, install package Runuran, and do this ?Runuran Vignette("Runuran") HTH ____ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) SPAIN -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On behalf of frederik vanhaelst Sent: Tuesday, 2 March 2010 11:52 To: r-h...@stat.math.ethz.ch Subject: [R] Random real numbers Hi, How could i generate random real numbers between 0 and 2*pi? Thanks, Frederik [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Plotmath: superscript on scalable delimiter?
ComRades, How do you put a superscript on a scalable delimiter? I want to put 'b' as the power of the expression in the following plot: t <- 1:25 K <- 0.2 y <- ((1-exp(-K*t))/(1-exp(-K*t)*exp(K)))^3 plot(t,y,"l",main="K=0.2, b=3") text(15,5,expression(bgroup("(",frac(1-e^-Kt,1-e^-Kt*e^K),")"))) Plotmath examples in demo(plotmath) do not include this. I've tried a few things to no avail and I did an RSiteSearch("superscript delimiter") with zero results. Thx Rubén ________ Dr. Rubén Roa-Ureta AZTI - Tecnalia / Marine Research Unit Txatxarramendi Ugartea z/g 48395 Sukarrieta (Bizkaia) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
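One answer that seems to work (hedged sketch; only the `^` placement matters here, so the surrounding plot is a dummy): plotmath lets a superscript be applied directly to the result of bgroup(), so the power b can be attached to the scalable parenthesis itself.

```r
# Sketch: put a superscript on a scalable delimiter by exponentiating bgroup()
plot(0:1, 0:1, type = "n", ann = FALSE)  # dummy plotting region
text(0.5, 0.5,
     expression(bgroup("(", frac(1 - e^{-K*t}, 1 - e^{-K*t} * e^K), ")")^b))
```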
Re: [R] discriminant analysis
Beatriz Yannicelli wrote: Dear all: Is it possible to conduct a discriminant analysis in R with categorical and continuous variables as predictors? Beatriz Beatriz, Simply doing this in the R console: RSiteSearch("discriminant") yields many promising links. In particular, check documentation of package mda. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
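For concreteness, a hedged sketch with package mda (the poster's data are not available, so the iris data stand in; categorical predictors would simply enter the formula as factor columns):

```r
# Sketch: mixture discriminant analysis with package mda on a stand-in dataset
library(mda)
fit <- mda(Species ~ ., data = iris)
table(predict(fit, iris), iris$Species)  # confusion matrix on the training data
```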
Re: [R] MLE for bimodal distribution
_nico_ wrote: Hello everyone, I'm trying to use mle from package stats4 to fit a bi/multi-modal distribution to some data, but I have some problems with it. Here's what I'm doing (for a bimodal distribution): # Build some fake binormally distributed data, the procedure fails also with real data, so the problem isn't here data = c(rnorm(1000, 3, 0.5), rnorm(500, 5, 0.3)) # Just to check it's bimodal plot(density(data)) f = function(m, s, m2, s2, w) { -log( sum(w*dnorm(data, mean=m, sd=s)) + sum((1-w)*dnorm(data, mean=m2, sd=s2)) ) } res = mle(f, start=list("m"=3, "s"=0.5, "m2"=5, "s2"=0.35, "w"=0.6)) This gives an error: Error in optim(start, f, method = method, hessian = TRUE, ...) : non-finite finite-difference value [2] In addition: There were 50 or more warnings (use warnings() to see the first 50) And the warnings are about dnorm producing NaNs So, my questions are: 1) What does "non-finite finite-difference value" mean? My guess would be that an Inf was passed somewhere where a finite number was expected...? I'm not sure how optim works though... 2) Is the log likelihood function I wrote correct? Can I use the same type of function for 3 modes? 1) It means optim hit an Inf or NaN while computing finite-difference derivatives: nothing constrains s and s2 to stay positive or w to stay in (0,1), so dnorm produces NaNs and the likelihood blows up. 2) Not quite: you are taking the log of sums of densities over all observations, whereas a mixture log-likelihood is the sum over observations of the log of each observation's mixture density, -sum(log(w*dnorm(data,m,s) + (1-w)*dnorm(data,m2,s2))). The same form extends to three modes with two free weights. Even written correctly, mixtures are difficult to fit, as others have pointed out. As an alternative, you may discretise your variable so the mixture is fit to counts on a histogram using a multinomial likelihood. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
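A self-contained sketch of a per-observation mixture negative log-likelihood (hedged: fake data as in the post; box constraints via L-BFGS-B are one way to keep the standard deviations positive and the weight inside (0,1)):

```r
library(stats4)

set.seed(1)
# fake bimodal data, as in the original post
data <- c(rnorm(1000, 3, 0.5), rnorm(500, 5, 0.3))

# negative log-likelihood: log of each observation's mixture density, summed
f <- function(m, s, m2, s2, w) {
  -sum(log(w * dnorm(data, mean = m, sd = s) +
           (1 - w) * dnorm(data, mean = m2, sd = s2)))
}

# box constraints keep the sds positive and the weight inside (0, 1)
res <- mle(f, start = list(m = 3, s = 0.5, m2 = 5, s2 = 0.35, w = 0.6),
           method = "L-BFGS-B",
           lower = c(-Inf, 1e-6, -Inf, 1e-6, 1e-6),
           upper = c( Inf,  Inf,  Inf,  Inf, 1 - 1e-6))
coef(res)
```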
Re: [R] R in the NY Times
Zaslavsky, Alan M. wrote: This article is accompanied by nice pictures of Robert and Ross. Data Analysts Captivated by Power of R http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html Thanks for the heads up. The R morale is going through the roof! I've given three courses on R since the second half of 2007 here in Chile (geostatistics, Fisheries Libraries for R, and generalized linear models) and all three of my audiences (professionals working in academia, government, and private research institutions) were very much impressed by the power of R. I spent as much time on R itself as on the statistical topics, since students wanted to learn data management and graphics once they started to grasp the basic elements. R creators, Core Team, package creators and maintainers, and experts on the list, thanks so much for such great work and such an open attitude. You lead by example. Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] AD Model Builder now freely available
dave fournier wrote: Hi All, Following Mike Praeger's posting on this list, I'm happy to pass on that AD Model Builder is now freely available from the ADMB Foundation. http://admb-foundation.org/ Two areas where AD Model Builder would be especially useful to R users are multi-parameter smooth optimization, as in large-scale maximum likelihood estimation, and the analysis of general nonlinear random effects models. Cheers, Dave Thank you Dave, this is really great news! So now I can download ADMB and ADMB-RE for free? And use Mike's X2R package to connect my ADMB scripts with the power of R. Only one word: awesome! Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fitting data to a sigmoidal curve
Greg Snow wrote: Sarah, Doing: RSiteSearch('gompertz', restrict='functions') At the command prompt gives several promising results. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. And you can also do:
nobs <- length(data$salam.size.observed)
fn <- function(p){
  # p[1] = salam.size.inf, p[2] = G1, p[3] = G2
  salam.size.model <- p[1]*exp(-p[2]*exp(-p[3]*data$age.observed)) # Gompertz model
  squdiff <- (salam.size.model - data$salam.size.observed)^2 # vector of squared differences
  sigmasqu <- sum(squdiff)/nobs # nuisance sigma parameter profiled out
  (nobs/2)*log(sigmasqu) # profile likelihood as approx. to true likelihood
}
(salam.size.likfit <- nlm(fn, p = c(60, 5, 0.5), hessian = TRUE)) # replace 60, 5, 0.5 with your own starting values for salam.size.inf, G1, G2
where data is a dataframe with observed sizes and ages. Invert the Hessian to obtain measures of precision. Note also that if you know the size at birth then you can re-parameterize, put size at birth instead of asymptotic size (salam.size.inf), and reduce model complexity to two parameters (but of course the Gompertz formula changes). If the Gompertz model is not flexible enough, note that it is a particular case of another more flexible, and more complex, model, Schnute's, which I have found often provides a better explanation of growth data (as measured by the AIC), especially if you have sizes of very young individuals. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fitting a modified logistic with glm?
Mike Lawrence wrote: On Sat, Nov 8, 2008 at 3:59 PM, Rubén Roa-Ureta <[EMAIL PROTECTED]> wrote: ... The fit is for grouped data. ... As illustrated in my example code, I'm not dealing with data that can be grouped (x is a continuous random variable). Four points: 1) I showed you an approach that you can follow in order to fit a modified (custom-made) logistic model; it's up to you to follow this approach and modify it accordingly, as it was never my intention to give you the precise code for your problem (obviously). 2) If you do follow this approach and you keep your predictor continuous, you have to change the line of the likelihood definition. 3) If it is really true that your predictor is a *random* variable, then you do not have a simple glm (you may have a glmm). And 4) you can always make your continuous predictor categorical, for example, as when you make a histogram of said predictor. R. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Fitting a modified logistic with glm?
Mike Lawrence wrote: Hi all, Where f(x) is a logistic function, I have data that follow: g(x) = f(x)*.5 + .5 How would you suggest I modify the standard glm(..., family='binomial') function to fit this? Here's an example of a clearly ill-advised attempt to simply use the standard glm(..., family='binomial') approach: # First generate some data #define the scale and location of the modified logistic to be fit location = 20 scale = 30 # choose some x values x = runif(200,-200,200) # generate some random noise to add to x in order to # simulate real-word measurement and avoid perfect fits x.noise = runif(length(x),-10,10) # define the probability of success for each x given the modified logistic prob.success = plogis(x+x.noise,location,scale)*.5 + .5 # obtain y, the observed success/failure at each x y = rep(NA,length(x)) for(i in 1:length(x)){ y[i] = sample( x = c(1,0) , size = 1 , prob = c(prob.success[i], 1-prob.success[i]) ) } #show the data and the source modified logistic plot(x,y) curve(plogis(x,location,scale)*.5 + .5,add=T) # Now try to fit the data #fit using standard glm and plot the result fit = glm(y~x,family='binomial') curve(plogis(fit$coefficients[1]+x*fit$coefficients[2])*.5+.5,add=T,col='red',lty=2) # It's clear that it's inappropriate to use the standard "glm(y~x,family='binomial')" # method to fit the modified logistic data, but what is the alternative? A custom-made fit function using nlm? Below, in the second line the logistic model has been re-parameterized to include p[1]=level of the predictor which predicts a 50% probability of success, and p[2]=level of the predictor which predicts a 95% probability of success. You can change the model in this line to your version. In the 4th line (negative log-likelihood) mat is a vector of 0s and 1s representing success and imat is a vector of 1s and 0s representing failure. The fit is for grouped data. 
fn <- function(p){
  # assumes grouped data already defined in the workspace:
  # l = predictor level of each group, count = observations per group,
  # mat = 0/1 success indicator, imat = 1 - mat (failure indicator)
  prop.est <- 1/(1+exp(log(1/19)*(l-p[1])/(p[2]-p[1]))) # estimated proportion of success
  iprop.est <- 1-prop.est # estimated proportion of failure
  -sum(count*(mat*log(prop.est)+imat*log(iprop.est))) # binomial negative log-likelihood
}
prop.lik <- nlm(fn,p=c(25.8,33.9),hessian=TRUE)
HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] descretizing xy data
Rubén Roa-Ureta wrote: [earlier solution using cut() and hist() quoted in full; see the original reply below] BTW, if you add the line below, text(x=y$X1,y=y$X4+.01,format(y$X2),cex=.5) you show the sample size at each proportion. R. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] descretizing xy data
Jon A wrote: Hello, I have a dataset with a continuous independent variable (fish length, range: 30-150 mm) and a binary response (foraging success, 0 or 1). I want to discretize fish length into 5 mm bins and give the proportion of individuals who successfully foraged in each each size bin. I have used the cut function to discretize the length values into my desired bins, but I can't figure out how to manipulate my response data in terms of the levels I've created. Any advice on how to achieve my task? Thanks in advance. You have the option of using catspec. Here is another, more transparent solution, using hist(). lb <- 30 ub <- 150 bk <- 5 x <- data.frame(cbind(runif(1000,lb,ub),rbinom(1000,1,0.75))) x$X3 <- cut(x$X1,breaks=(ub-lb)/bk,labels=FALSE) y <- data.frame(cbind(hist(x$X1,breaks=(ub-lb)/bk,plot=FALSE)$breaks[-1],hist(x$X1,breaks=(ub-lb)/bk,plot=FALSE)$counts,0)) for (i in 1:length(y$X1)) { for (j in 1:length(x$X1)) { if(identical(x$X3[j],i)) y$X3[i] <- y$X3[i]+x$X2[j] } } sum(x$X2) #check that counting was correct sum(y$X3) y$X4 <- ifelse(y$X3 > 0,y$X3/y$X2,NA) plot(y$X1,y$X4,pch=19) abline(v=hist(x$X1,breaks=(ub-lb)/bk,plot=FALSE)$breaks) HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Distributions Comparison
Igor Telezhinsky wrote: Dear all, I have recently found out about the R project. Could anyone tell me if it is possible to make the comparison of two distributions using R packages? The problem is TRICKY because I have the distributions of measurements and each measurement in the given distribution has its own error, which I need to take into account when comparing these distributions. If anyone knows it is possible with R, could you please tell me what package to use. I will really appreciate some hints in code as well. Thank you. Dear Igor, Welcome to the list and to R. Please read the posting guide and study your problem before attempting to seek help from us. Your question as it stands is too ambiguous to answer. Try to make a toy example or formulate it better and you may have better luck. Ruben __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] plot the chi square distribution and the histogram in the same graph
leo_wa wrote: In the previous post ,i ask how to plot the normal curve and the histogram in the same graph.if i want to know how to plot the chi square distribution to campare the data set ,how can i do that? You should make up your mind: is your random variable X, defined on (-inf,+inf), or Sum(X^2), which obviously cannot take negative values? They are quite different. Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
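If the data really are sums of squared standard normals, the comparison can be sketched like this (hedged: df = 3 is an arbitrary choice for illustration):

```r
set.seed(1)
# each observation is a sum of 3 squared standard normals, i.e. chi-squared with df = 3
x <- rowSums(matrix(rnorm(3000)^2, ncol = 3))
hist(x, prob = TRUE, breaks = 30)
curve(dchisq(x, df = 3), add = TRUE, col = "blue")
```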
Re: [R] how to plot the histogram and the curve in the same graph
leo_wa wrote: i want to plot the histogram and the curve in the same graph.if i have a set of data ,i plot the histogram and also want to see what distribution it was.So i want to plot the curve to know what distribution it like. To draw the curve of a distribution you first need an idea of which distribution it is. You cannot just draw the histogram and expect R to produce a curve of the best distribution to fit that histogram. But you can plot a kernel density curve, or fit a candidate distribution by maximum likelihood and overlay its density: x <- rnorm(1000,5,3) library(MASS) (x.normfit <- fitdistr(x,"normal")) hist(x,prob=TRUE) lines(density(x,na.rm=TRUE),col="red") # kernel density curve(dnorm(x,mean= x.normfit$estimate[1],sd= x.normfit$estimate[2]),col="blue",add=TRUE) # maximum likelihood estimate Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] distributions and glm
drbn wrote: Hello, I have seen that some papers do this: 1.) Group data by year (e.g. 35 years) 2.) Estimate the mean of the key variable through the distribution that fits better (some years is a normal distribution , others is a more skewed, gamma distribution, etc.) 3.) With these estimated means of each year do a GLM. I'd like to know if it is possible (to use these means in a GLM) or is a wrong idea. Thanks in advance David David, You can model functions of data, such as means, but you must be careful to carry over most of the uncertainty in the original data into the model. If you don't, for example if you let the model know only the values of the means, then you are actually assuming that these means were observed with absolute certainty instead of being estimated from the data. To carry over the uncertainty in the original data to your modeling you can use a Bayesian approach or you can use a marginal likelihood approach. A marginal likelihood is a true likelihood function not of the data, but of functions of the data, such as of maximum likelihood estimates. If your means per year were estimated using maximum likelihood (for example with fitdistr in package MASS) and your sample size is not too small, then you can use a normal marginal likelihood model for the means. Note however that each mean may come from a different distribution, so the full likelihood model for your data would be a mixture of normal distributions. You may not be able to use the pre-built glm function, so you may face the challenge of writing your own code. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
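A hedged sketch of the simplest version of this, with entirely hypothetical data: fit each year by maximum likelihood with fitdistr (MASS), then treat the yearly means under a normal marginal likelihood, which here amounts to a regression weighted by inverse squared standard errors.

```r
library(MASS)

set.seed(1)
years <- 1:35
# hypothetical raw data: one gamma sample per year with a drifting mean
dat <- lapply(years, function(yr) rgamma(50, shape = 2, rate = 0.5 + 0.01 * yr))

# maximum likelihood fit per year; the gamma mean is shape/rate
fits  <- lapply(dat, fitdistr, densfun = "gamma")
means <- sapply(fits, function(f) f$estimate["shape"] / f$estimate["rate"])

# crude standard error of each yearly mean (sample-based proxy)
ses <- sapply(dat, function(x) sd(x) / sqrt(length(x)))

# normal marginal likelihood for the means ~ weighted least squares
fit <- lm(means ~ years, weights = 1 / ses^2)
summary(fit)
```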
Re: [R] Question about glm using R
Jordi Garcia wrote: Good morning, I am using R to try to model the proportion of burned area in Portugal. The dependent variable is the proportion. The family used is binomial and the epsilon would be binary. I am not able to find the package to be used when the proportion (%) has to be used in glm. Could someone help me? I am using normal commands of glm.. for example: glm_5<- glm(formula=p~Precipitation, family=binomial(link=logit), data=dados) where p is the proportion of burned area, but this error message apperars: Warning message: non-integer #successes in a binomial glm! in: eval(expr, envir, enclos) That is why I think I am not using the proper glm package. Thank you very much in advance. Jordi Jordi, Your statistical model is wrong. The binomial family is for counts data (counts of successes given n trials), not for proportions. To model proportions, your family is the Beta family. I've modeled proportion response variables with function betareg of package betareg. If you want my example applications I can send you code and data off list. Reference: Ferrari and Cribari-Neto. 2004. Beta regression for modelling rates and proportions. Journal of Applied Statistics 31:799-815. Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
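A hedged sketch of the betareg route (hypothetical data stand in for the poster's dados; any exact 0s or 1s in p must first be squeezed into the open interval (0,1)):

```r
library(betareg)

set.seed(1)
# hypothetical stand-in for the poster's 'dados'
dados <- data.frame(Precipitation = runif(100, 0, 50))
mu <- plogis(-1 + 0.05 * dados$Precipitation)
dados$p <- rbeta(100, mu * 20, (1 - mu) * 20)  # proportions strictly in (0,1)

glm_5 <- betareg(p ~ Precipitation, data = dados)  # logit link is the default
summary(glm_5)
```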
Re: [R] plot - central limit theorem
Jörg Groß wrote: Hi, Is there a way to simulate a population with R and pull out m samples, each with n values for calculating m means? I need that kind of data to plot a graphic, demonstrating the central limit theorem and I don't know how to begin. So, perhaps someone can give me some tips and hints how to start and which functions to use. thanks for any help, joerg x <- runif(10000,0,1) y <- runif(10000,0,1) z <- runif(10000,0,1) u <- runif(10000,0,1) v <- runif(10000,0,1) w <- runif(10000,0,1) t <- x+y+z+u+v+w hist(t,breaks=100,xlab="sum of i.i.d r.v with finite mean and variance",ylab="Frequency",main="") Note the draws are vectorised (10000 values each), so the histogram of the 10000 sums shows the bell shape. Change runif for the corresponding function of the distribution of your choice. Not exactly what you wanted but it does verify the C.L.T. Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R graph with values incorporated
Prasanth wrote: Dear All: Greetings! By the way, is it possible to have a graph (say line graph) that shows "values" as well (say y-axis values within the graph)? One could do it in Excel. I am just wondering whether it is possible with R! x <- rnorm(100,2,3) y <- rnorm(100,2,3) plot(x,y,pch=19) text(x=x,y=y+.5,format(x,digits=1),cex=.5) [[alternative HTML version deleted]] <-- Read the posting guide. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] draw a function over a histogram
Daniela Garavaglia wrote: Sorry, I have some troubles with the graph device. How can I draw a function over a histogram? Thanks so much. Daniela, What function? Here is one example using density() and dnorm(): x <- rnorm(1000,2,2) hist(x,prob=TRUE) lines(density(x,na.rm=TRUE),col="red") library(MASS) y <- fitdistr(x,"normal") curve(dnorm(x,mean=y$estimate[1],sd=y$estimate[2]),add=TRUE) HTH R. [[alternative HTML version deleted]] <--- What about this?? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Coordinate systems for geostatistics in R
imicola wrote: Hi, I read somewhere that when carrying out geostatistical analysis in R you should not use latitude and longitude...can anyone expand on this a little for me, and what would be the best coordinate system to use? Not only in R. In most systems, the inter-point distances are assumed to be planar (distances over a Euclidean space), whereas latitude and longitude are spherical. I guess there could be a geostatistical analysis based on spherical distances, but why make things more complicated when projecting the spherical coordinates onto planar coordinates before the analysis produces a good approximation and simplifies the analysis significantly? I use UTM coordinates, and transform from geodetic to metric with Eino Uikkanen's GeoConv program (it's free). I have my data in a geographic coordinate system, WGS84, decimal degrees. Is this the wrong format for such analyses? If the distances are short, it is not so wrong, and the wrongness increases with increasing distance. I have also converted my data into the UTM projection and so have it in metres (ranging from 480,000 to 550,000 E and 170,000 to 230,000 N). If I was to use the UTM coordinates, should I be using the actual coordinates in metres, or should I convert this into an arbitrary coordinate system (i.e. from 0 - 1) somehow? It would be convenient to use km rather than m, so the range parameter would be closer in magnitude to the other parameters of the model. A very large range parameter in metres may cause numerical instability during minimization of the negative support function in likelihood-based models such as that implemented in geoR. I have noticed that running an analysis on the data gives different results depending on which type of system you use, so I want to make sure I have the correct system. I should also probably note that I am a geostatistical novice! 
Thanks, Bottom line: your geostatistical software is probably based on distance calculations on a Euclidean space, so it is wrong to input locations in spherical coordinates. HTH Ruben __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Geodata object border
imicola wrote: Sorry, this is probably quite an easy question, but I'm new to R and couldn't find the answer anywhere. I'm using geoR and geoRglm, but can't figure out how to get a border in my geodata object. Does this need to be defined when I'm importing my data, or afterwards, and how do I go about doing this? Thanks You can define the border previously and import it to R with read.table or read.csv, or you can define it from your geodata object in R. In the first case don't forget to close the polygon by repeating the first vertex at the end of the file. In the second case you can use a convex hull function in R, such as chull(), to define the polygon with your geodata object. Say your geodata object is called my.geo; then my.geo$coords would be the coordinates. Then try this,
bor <- chull(my.geo$coords)
bor <- c(bor, bor[1]) # close polygon by appending the first vertex
plot(my.geo$coords)
lines(my.geo$coords[bor,]) # index with bor, the vector of hull vertices
HTH Ruben __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] geoR : Passing arguments to "optim" when using "likfit"]
Mzabalazo Ngwenya wrote: Hi everyone ! I'm trying to fit a kriging model to a set of data. When I just run the "likfit" command I can obtain the results. However when I try to pass additional arguments to the optimization function "optim" I get errors. That is, I want to obtain the hessian matrix (hessian=TRUE). Here's a little example (1-D). Can anyone shed some light? Where am I going wrong? --- obs.points <-matrix(c(0.1,0,0.367,0,0.634,0,0.901,0),byrow=TRUE,nrow=4) MM1.fx <-MM1(obs.points[,1]) MM1.fx [1] 0.111 0.5789475 1.7272732 9.100 # Creating geoR object MM1.data <-as.geodata(cbind(obs.points,MM1.fx)) MM1.data $coords [1,] 0.100 0 [2,] 0.367 0 [3,] 0.634 0 [4,] 0.901 0 $data [1] 0.111 0.5789475 1.7272732 9.100 attr(,"class") [1] "geodata" # Starting values for MLE sta.vals =expand.grid(seq(1,20),seq(1,20)) # Maximum likelihood estimation of covariance parameters HERE IS THE PROBLEM MM1.fit <-likfit(MM1.data,ini.cov.pars=sta.vals,fix.nugget=TRUE,cov.model="gaussian",optim(hessian=TRUE)) Pass hessian=TRUE directly to likfit instead of wrapping it in a call to optim(): MM1.fit <-likfit(MM1.data,ini.cov.pars=sta.vals,fix.nugget=TRUE,cov.model="gaussian",hessian=TRUE) MM1.fit$info.minimisation.function$hessian For the observed Fisher information: solve(MM1.fit$info.minimisation.function$hessian) HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Solving for parameter values at likelihood points
ComRades, On Oct 16, 2006, I posted a question regarding how to find the parameter values that made the likelihood function take a value equal to 1/8th of the maximum likelihood value. There was no reply but I found a solution and would like to know if there is a better solution using the function uniroot(). I think with uniroot() I can get better parameter values at these bounds. This is my solution, using the function nearest() from package GenKern. The example concerns the marginal likelihood for the normal variance parameter when the mean is a nuisance parameter (see section 7.3 in Royall, 1997, Statistical evidence. A likelihood paradigm, Chapman & Hall). Rubén y<-c(-2.250357,10.723614,7.238959,9.179939,5.831603,9.301580,4.019388,9.777280,5.587761,3.600276,9.753381,5.154928,2.776350,9.940350,14.185712,4.281198,10.994180,9.333027,13.082535,2.067826,10.772551,-3.811529,5.446021,-5.333287,1.421544) n <- length(y) s.sq <- (1/(n-1))*sum((y-mean(y))^2) sigma.sq <- seq(1,100,0.1) Lik.M <- sigma.sq^(-(n-1)/2)*exp(-(n-1)*s.sq/(2*sigma.sq)) Rel.Lik.M <- Lik.M/max(Lik.M) plot(sigma.sq,Rel.Lik.M,type="l") abline(h=1/8) arrows(y0=0.4,y1=0.18,x0=0,x1=12) arrows(y0=0.4,y1=0.18,x0=62,x1=50) library(GenKern) cbind(sigma.sq,Rel.Lik.M)[Rel.Lik.M==1] #maximum likelihood estimate max.idx <- nearest(Rel.Lik.M,1,outside=TRUE) # vector index value at the maximum likelihood #finding the parameter value at 1/8th the max. likelihood to the left of the maximum cbind(sigma.sq[1:max.idx],Rel.Lik.M[1:max.idx])[nearest(Rel.Lik.M[1:max.idx],0.125),] #finding the parameter value at 1/8th the max. likelihood to the right of the maximum cbind(sigma.sq[max.idx:length(sigma.sq)],Rel.Lik.M[max.idx:length(sigma.sq)])[nearest(Rel.Lik.M[max.idx:length(sigma.sq)],0.125),] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
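On the uniroot() question raised above: since the relative marginal likelihood is unimodal with its maximum exactly at sigma^2 = s^2, uniroot() can solve Rel.Lik - 1/8 = 0 on each side of the maximum. A hedged sketch with the same data (bracketing intervals chosen from the plotted range 1 to 100):

```r
y <- c(-2.250357, 10.723614, 7.238959, 9.179939, 5.831603, 9.301580,
       4.019388, 9.777280, 5.587761, 3.600276, 9.753381, 5.154928,
       2.776350, 9.940350, 14.185712, 4.281198, 10.994180, 9.333027,
       13.082535, 2.067826, 10.772551, -3.811529, 5.446021, -5.333287,
       1.421544)
n <- length(y)
s.sq <- var(y)  # (1/(n-1)) * sum((y - mean(y))^2)

# relative marginal likelihood of sigma^2; the maximum is at v = s.sq
rel.lik <- function(v) {
  lik     <- v^(-(n - 1) / 2) * exp(-(n - 1) * s.sq / (2 * v))
  max.lik <- s.sq^(-(n - 1) / 2) * exp(-(n - 1) / 2)
  lik / max.lik
}

# 1/8 likelihood bounds, one root on each side of the maximum
lower <- uniroot(function(v) rel.lik(v) - 1/8, c(1, s.sq))$root
upper <- uniroot(function(v) rel.lik(v) - 1/8, c(s.sq, 100))$root
c(lower, upper)
```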
Re: [R] deleting variables
Richard Pearson wrote: ?rm Richard Ralf Goertz wrote: How can I automatically exclude one variable from being saved in the workspace when I quit an R session? The problem is I don't know how to erase a variable once it has been created. [...] Ralf More on the use of rm. If you want to delete many variables in one go, x<-runif(10) y<-runif(10) z<-runif(10) ls() #you check all the objects in your workspace #[1] "x" "y" "z" rm(list=your.list <- ls()[2:3]) #you selected to delete all those objects whose indices are between 2 and 3. rm("your.list") #remove the temporary list with variable names ls() [1] "x" HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R Newbie Question/Data Model
[EMAIL PROTECTED] wrote: > Given a data set and a set of predictors and a response in the data, > we would like to find a model that fits the data set best. > Suppose that we do not know what kind of model (linear, polynomial > regression,... ) might be good, we are wondering if there is R-package(s) > can auctomatically do this. > Otherwise, can you direct me, or point out reference(s), > basic steps to do this. Thanks. > > -james The best-fitting model for any data set is a model with many parameters; pushed to the extreme, one with infinitely many. However, any model with more parameters than data points has a negative number of degrees of freedom, and you do not want that. The best-fitting model subject to the constraint that the degrees of freedom be non-negative is the data itself, with zero degrees of freedom. The AIC tells you this too: the AIC for the model formed by the data itself is 2n, whereas the AIC for any model with negative degrees of freedom is > 2n. But I guess you want to make inference from sample to population. If that is indeed the case, then you should consider changing your focus from finding "a model that fits the data set best" to a model that best summarizes the information contained in your sample about the population the sample comes from. To do that, start by defining the nature of your response variable. What is the nature of the natural process generating this response variable? Is it continuous or discrete? Is it univariate or multivariate? Can it take negative and positive values? Can it take values of zero? After you have clarified the probabilistic model for the response variable, then you can start thinking about the mathematical relation between the response variable and the predictors. Is it linear or nonlinear? Are the predictors categorical or continuous? Read the posting guide, formulate a clear question, and maybe you will be given more specific help. 
Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
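The fit-versus-parsimony point above can be illustrated with a small simulated example (a hypothetical sketch; the variables and polynomial degrees are made up): the residual sum of squares always shrinks as parameters are added, but AIC charges for them.

```r
# Illustrative sketch on simulated data: a richer model always fits better
# in-sample, but AIC penalises the extra parameters.
set.seed(1)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.5)
fits <- list(linear = lm(y ~ x),
             poly5  = lm(y ~ poly(x, 5)),
             poly10 = lm(y ~ poly(x, 10)))
sapply(fits, deviance)   # residual sum of squares: never increases with degree
AIC(fits$linear, fits$poly5, fits$poly10)
```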
Re: [R] Modeling presence only data in R
milton ruser wrote:
> Dear All,
>
> I have a set of environmental maps and presence-only points for some species.
> How can I generate distribution models in R using these presence-only data?
> What packages and functions can I use to do so?
>
> Kind regards,
>
> Miltinho
> Brazil

If you have location data for each absence/presence (such as linear location data, or easting and northing), consider using geoRglm.

Christensen, O. F., and Ribeiro, P. J. 2002. geoRglm: a package for generalized linear spatial models. R News, 2: 26-28.

HTH
Rubén
Re: [R] finding an unknown distribution
andrea previtali wrote:
> Hi,
> I need to analyze the influences of several factors on a variable that is a
> measure of fecundity, consisting of 73 observations ranging from 0 to 5. The
> variable is continuous and highly positively skewed, and none of the typical
> transformations was able to normalize the data. Thus, I was thinking of
> analyzing these data using a generalized linear model, where I can specify a
> distribution other than normal. I'm thinking it may fit a gamma or
> exponential distribution. But I'm not sure if the data meet the assumptions
> of those distributions because their definitions are too complex for my
> understanding!

Roughly, the exponential distribution models a random variable describing the time/distance between two independent events that occur at the same constant rate. The gamma distribution models a random variable that can be thought of as the sum of exponential random variables. I don't think fecundity data, the count of reproductive cells, qualifies as a random variable to be modeled by either of these distributions. If the count of reproductive cells is very large, and you are modeling this count as a function of animal size, such as length, consider the lognormal distribution, since the count of cells grows multiplicatively (volumetrically) with increases in length. In that case you can model your response variable using glm with family=gaussian(link="log").

Rubén
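As a hedged sketch of that suggestion, on simulated data (the variable names and the generating model are made up for illustration, not from the original thread):

```r
# Simulate a positively skewed fecundity-like response that grows
# multiplicatively with body length, then fit a Gaussian GLM with log link.
set.seed(1)
length_cm <- runif(73, 10, 50)
fecundity <- exp(0.02 * length_cm) * rlnorm(73, meanlog = 0, sdlog = 0.3)
fit <- glm(fecundity ~ length_cm, family = gaussian(link = "log"))
summary(fit)  # coefficients are on the log scale of the mean response
```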
Re: [R] memory issues
I think any geostatistical program/R package would have trouble handling 12000 observations on a PC. The empirical variogram would be built from choose(12000, 2) pairs, nearly 72 million, and during kriging, if you didn't restrict the search neighbourhood, interpolation would involve solving a very large linear system, more so if you defined a fine grid. Try restricting the search neighbourhood, if you didn't, with maxdist and nmax.

Rubén

Prof Brian Ripley wrote:
> I think the clue is that the message you quote comes from gstat, which
> does not use R's memory allocator. It is gstat and not R that has failed
> to allocate memory.
>
> Try re-reading the help page for memory.size. 'max=T' does not indicate
> the limit (that is the job of memory.limit()), but the maximum 'used'
> (acquired from the OS) in that session. So 19Mb looks reasonable for R's
> usage.
>
> I don't understand the message from memory.limit() (and the formatting is
> odd). memory.limit() does not call max() (it is entirely internal), so I
> wonder if that really is the output from that command. (If you can
> reproduce it, please let us have precise reproduction instructions.)
>
> There isn't much point in increasing the memory limit from the default
> 1.5Gb on a 2Gb XP machine. The problem is that the user address space
> limit is 2Gb and fragmentation means that you are unlikely to be able to
> get over 1.5Gb unless you have very many small objects, in which case R
> will run very slowly. In any case, that is not the issue here.
>
> On Wed, 16 Apr 2008, Dave Depew wrote:
>
>> Hi all,
>> I've read the R for Windows FAQ and am a little confused re:
>> memory.limit and memory.size
>>
>> To start using R 2.6.2 on WinXP, 2GB RAM, I have the command line "sdi
>> --max-mem-size=2047M"
>> Once the Rgui is open, memory.limit() returns 2047, memory.size()
>> returns 11.315, and memory.size(max=T) returns 19.615
>>
>> Shouldn't memory.size(max=T) return 2047?
>>
>> Upon running several operations involving kriging (gstat package,
>> original data file 3 variables, 12000 observations)
>> the program runs out of memory
>>
>> "memory.c", line 57: can't allocate memory in function m_get()
>> Error in predict.gstat(fit.uk, newdata = EcoSAV.grid.clip.spxdf,
>> debug.level = -2, :
>> m_get
>>
>> Immediately following this,
>>
>> memory.limit() returns [1] -Inf
>> Warning message:
>> In memory.limit() : no non-missing arguments
>> to max; returning -Inf
>>
>> memory.size() returns 24.573.
>>
>> memory.size(max=T) returns 46.75
>>
>> To my untrained eye, it appears that R is not being allowed access to
>> the full memory limit specified in the cmd line... if this is the case,
>> how does one ensure that R is getting access to the full allotment of RAM?
>> Any insight is appreciated...
>>
>>> sessionInfo()
>> R version 2.6.2 (2008-02-08)
>> i386-pc-mingw32
>>
>> locale:
>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>> States.1252;LC_MONETARY=English_United
>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats graphics grDevices datasets tcltk utils
>> methods base
>>
>> other attached packages:
>> [1] maptools_0.7-7 foreign_0.8-23 gstat_0.9-43 rgdal_0.5-24
>> lattice_0.17-4 sp_0.9-23 svSocket_0.9-5 svIO_0.9-5
>> R2HTML_1.58 svMisc_0.9-5 svIDE_0.9-5
>>
>> loaded via a namespace (and not attached):
>> [1] grid_2.6.2 tools_2.6.2
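A hedged sketch of restricting the kriging search neighbourhood in gstat, using the meuse example data shipped with the sp package (the variogram model and the nmax/maxdist values here are illustrative, not tuned for the poster's data; requires the gstat and sp packages):

```r
library(sp)
library(gstat)
data(meuse);      coordinates(meuse) <- ~x+y
data(meuse.grid); coordinates(meuse.grid) <- ~x+y; gridded(meuse.grid) <- TRUE
v <- variogram(log(zinc) ~ 1, meuse)
m <- fit.variogram(v, vgm(1, "Sph", 900, 1))
# nmax: use at most 40 nearest observations per prediction location;
# maxdist: ignore observations farther than 1000 m (local kriging)
k <- krige(log(zinc) ~ 1, meuse, meuse.grid, model = m,
           nmax = 40, maxdist = 1000)
```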
Re: [R] Confidence intervals of log transformed data
tom soyer wrote:
> Hi,
>
> I have a general statistics question on calculating the confidence interval
> of log-transformed data.
>
> I log-transformed both x and y, and regressed the transformed y on the
> transformed x: lm(log(y)~log(x)), and I get the following relationship:
>
> log(y) = alpha + beta * log(x), with se as the standard error of residuals
>
> My question is how do I calculate the confidence interval in the original
> scale of x and y? Should I use [...]

A confidence interval for the mean of Y? If that is the case: when you transformed Y to logY and ran a regression assuming normal deviates, you were in fact assuming that Y is lognormally distributed. Your interval must be asymmetric, reflecting the shape of the lognormal. The lognormal mean is lambda = exp(mu + 0.5*sigma^2), where mu and sigma^2 are the parameters of the normal variate logY. A confidence interval for lambda is

Lower bound = exp(mean(logY) + 0.5*var(logY) + sd(logY)*H_alpha/sqrt(n-1))
Upper bound = exp(mean(logY) + 0.5*var(logY) + sd(logY)*H_(1-alpha)/sqrt(n-1))

where the quantiles H_alpha and H_(1-alpha) are quantiles of the distribution of linear combinations of the normal mean and variance (Land, 1971, Ann. Math. Stat. 42:1187-1205; Land, 1975, Sel. Tables Math. Stat. 3:385-419). Alternatively, you can model Y = p1*X^p2 directly, with p1 = exp(your alpha) and p2 = beta, using a lognormal likelihood, and predict the mean of Y with the fitted model (I'm guessing here). It could be useful to check Crow and Shimizu, Lognormal Distributions: Theory and Practice, 1988, Dekker, NY.

HTH
Rubén
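As a rough numerical sketch of the point estimate (Land's exact H quantiles come from his tables and are not in base R, so this substitutes the approximate Cox method for the interval; the data here are simulated):

```r
# Point estimate of the lognormal mean exp(mu + sigma^2/2) and an
# approximate (Cox-method) confidence interval around it.
set.seed(42)
y  <- rlnorm(50, meanlog = 1, sdlog = 0.5)
ly <- log(y)
n  <- length(ly); m <- mean(ly); v <- var(ly)
lambda_hat <- exp(m + v / 2)                 # estimate of E[Y]
se <- sqrt(v / n + v^2 / (2 * (n - 1)))      # Cox approximation, log scale
ci <- exp(m + v / 2 + c(-1, 1) * qnorm(0.975) * se)
```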
Re: [R] mixture distribution analisys
Antonio Olinto wrote:
> Hello,
>
> I'm analyzing some fish length-frequency data to assess relative age and
> growth information, and I would like to do something different from
> FiSAT / ELEFAN analysis.
>
> I found a package named MIX at http://www.math.mcmaster.ca/peter/mix/mix.html
> but it's compiled for R 2.0.0
>
> So I have two questions:
> Is there any problem to run this package in R 2.7.0? - probably yes ...
> And, is there any other package in R that can be used to analyze and to
> separate mixture distributions?
>
> Thanks for any help. Best regards,
>
> Antonio Olinto
> Sao Paulo Fisheries Institute
> Brazil

This problem is not too difficult; maybe you can write your own code. Say you enter the number of fish you measured, n, and a two-column dataframe with the midpoint of each length class in the first column (call it l) and the observed relative frequency of each length class in the second column (call it obs_prob). Then, using the multinomial likelihood for the number of fish in each length class as the random variable (the approach in Peter Macdonald's MIX), minimize the negative log-likelihood

neg_log_likelihood = -n*sum(obs_prob*log(pred_prob))

where

pred_prob = (p1*0.3989423/s1)*exp(-0.5*((l-m1)/s1)^2)
          + (p2*0.3989423/s2)*exp(-0.5*((l-m2)/s2)^2)
          + (p3*0.3989423/s3)*exp(-0.5*((l-m3)/s3)^2)

and s1, s2, s3, m1, m2, m3, p1 and p2 (with p3 = 1 - p1 - p2) are the parameters. Do you know the number of components (cohorts) in the mixture? In the model above it is three. If you don't, try several models with different numbers of components, and the AIC may tell you which one is the best working model. Don't use the likelihood ratio test, as one of the p parameters is on the boundary of the parameter space if the null is true. Also, you should bound the parameters (m1 < m2 < m3).
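A hedged sketch of that negative log-likelihood in R, for a three-component mixture (parameter names follow the text; the data objects l, obs_prob and n, and the starting values in the example call, are assumed inputs, not from the original thread):

```r
# Multinomial negative log-likelihood for a 3-component normal mixture
# over length classes. m = means, s = sds, p = proportions (p3 = 1-p1-p2).
nll <- function(par, l, obs_prob, n) {
  m <- par[1:3]
  s <- exp(par[4:6])                 # optimize log-sds so the sds stay positive
  p <- c(par[7], par[8], 1 - par[7] - par[8])
  pred_prob <- p[1] * dnorm(l, m[1], s[1]) +
               p[2] * dnorm(l, m[2], s[2]) +
               p[3] * dnorm(l, m[3], s[3])
  pred_prob <- pred_prob / sum(pred_prob)   # normalise over the length classes
  -n * sum(obs_prob * log(pred_prob))
}
# Example call with made-up starting values:
# fit <- optim(c(20, 35, 50, log(2), log(2), log(2), 0.3, 0.3), nll,
#              l = l, obs_prob = obs_prob, n = n, method = "BFGS")
```

Optimizing the log of the standard deviations is one simple way to keep them positive without a constrained optimizer; the ordering constraint m1 < m2 < m3 can be enforced through starting values or a reparameterization.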
[R] comparing length-weight regressions
[EMAIL PROTECTED] wrote:
> I'd like to compare length-weight regressions among years. Any information
> would be appreciated.
>
> a. gray
> fisheries consultant

Your message is rather cryptic for a general statistical audience, whereas I'm sure in a fisheries group everybody would understand what you want. Use a glm with family=gaussian(link="log") for all data sets together (in original units) with year as a factor, then run the model again ignoring the year factor, and compare the two fits using anova(model1, model2) for a likelihood-ratio-style test, or inspect the AICs for a non-frequentist model selection method.

R.
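A hedged sketch of that comparison on simulated length-weight data (all variable names, coefficients, and years are illustrative):

```r
# Fit the length-weight relation with and without a year effect,
# then compare the nested Gaussian log-link GLMs.
set.seed(7)
d <- data.frame(year = factor(rep(2001:2003, each = 50)),
                len  = runif(150, 20, 80))
d$wt <- exp(-4 + 3 * log(d$len)) * rlnorm(150, meanlog = 0, sdlog = 0.1)
full    <- glm(wt ~ log(len) * year, family = gaussian(link = "log"), data = d)
reduced <- glm(wt ~ log(len),        family = gaussian(link = "log"), data = d)
anova(reduced, full, test = "F")  # does the year factor improve the fit?
AIC(reduced, full)                # information-theoretic comparison
```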
Re: [R] analyzing binomial data with spatially correlated errors
Roger Bivand wrote:
> Ben Bolker (ufl.edu) writes:
>
>> Jean-Baptiste Ferdy (univ-montp2.fr) writes:
>>
>>> Dear R users,
>>>
>>> I want to explain binomial data by a series of fixed effects. My
>>> problem is that my binomial data are spatially correlated. Naively,
>>> I thought I could find something similar to gls to analyze such
>>> data. After some reading, I decided that lmer is probably the tool
>>> I need. The model I want to fit would look like
>>>
> (...)
>> You could *almost* use glmmPQL from the MASS package,
>> which allows you to fit any lme model structure
>> within a GLM 'wrapper', but as far as I know it wraps only lme (which
>> requires at least one random effect) and not gls.
>
> The trick used in:
>
> Dormann, C. F., McPherson, J. M., Araujo, M. B., Bivand, R.,
> Bolliger, J., Carl, G., Davies, R. G., Hirzel, A., Jetz, W.,
> Kissling, W. D., Kühn, I., Ohlemüller, R., Peres-Neto, P. R.,
> Reineking, B., Schröder, B., Schurr, F. M. & Wilson, R. J. (2007):
> Methods to account for spatial autocorrelation in the analysis of
> species distributional data: a review. Ecography 30: 609-628
>
> (see online supplement), is to add a constant term "group", and set
> random=~1|group. The specific use with a binomial family there is for
> a (0,1) response, rather than a two-column matrix.
>
>> You could try gee or geoRglm -- neither trivially easy, I think ...
>
> The same paper includes a GEE adaptation, but for a specific spatial
> configuration rather than a general one.
>
> Roger Bivand

I suggest you also check out the package geoRglm, with which you can model binomial and Poisson spatially correlated data. I used it to model spatially correlated binomial data, but without covariates, i.e. without your fixed effects, so my model was a logistic regression with the intercept only (reference below). But I understand that you can add covariates and use them to estimate the fixed (non-random) part of the model.
Here is the geoRglm webpage: http://www.daimi.au.dk/~olefc/geoRglm/

This approach tackles the problem from the point of view of geostatistics rather than mixed models, but I believe the likelihood-based geostatistical model is the same as a generalized linear mixed model in which the spatial correlation over distance plays the role of the random effect. In SAS you can do the same with the glimmix macro, from the generalized linear mixed models point of view, because it implements a correlation term that lets you specify typical spatial correlation functions such as the Gaussian and exponential, particular cases of the Matérn family.

Rubén

Roa-Ureta, R. and E.N. Niklitschek (2007) Biomass estimation from surveys with likelihood-based geostatistics. ICES Journal of Marine Science 64:1723-1734
Re: [R] Reading data into R
BEP wrote:
> Hello all,
>
> I am working with a very large data set in R, and I have no interest in
> reviving my SAS skills. To do this, I will need to drop unwanted variables
> given the size of the data file. The most common strategy seems to be
> subsetting the data after it is read into R. Unfortunately, given the size
> of the data set, I can't get the file read and then subsequently do the
> subset procedure. I would be appreciative of help on the following:
>
> 1. What are the possibilities of reading in just a small set of variables
> during the statement (or another 'read' statement)? That is,
> is it possible to specify just the variables that I want to keep?
>
> 2. Can I randomly select a set of observations during the 'read' statement?
>
> I have searched various R resources for this information, so if I am simply
> overlooking a key resource on this issue, pointing that out to me would be
> greatly appreciated.
>
> Thanks in advance.
>
> Brian

Check this for input of specific columns from a large data matrix:

mysubsetdata <- do.call("cbind", scan(file='location and name of your file', what=list(NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,0,NULL,0,NULL,NULL), flush=TRUE))

This will input only columns 9 and 11 (the non-NULL entries of the what list) into 'mysubsetdata'. With this method you can work out the way to select any subset of columns.

HTH
Rubén
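The same trick, written as a self-contained sketch (the 13-column layout is assumed, and the file is a temporary one created just for illustration):

```r
# Read only columns 9 and 11 of a whitespace-separated 13-column file.
tmp <- tempfile()
write.table(matrix(1:26, nrow = 2), tmp, row.names = FALSE, col.names = FALSE)
cols <- vector("list", 13)        # NULL entries tell scan() to skip that field
cols[[9]] <- 0; cols[[11]] <- 0   # 0 means "read this field as numeric"
mysub <- do.call("cbind", scan(tmp, what = cols, flush = TRUE))
# mysub now holds only the two selected columns
```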