Re: [R] discriminant analysis
Beatriz Yannicelli wrote: Dear all: Is it possible to conduct a discriminant analysis in R with categorical and continuous variables as predictors? Beatriz Beatriz, Simply doing this in the R console: RSiteSearch("discriminant") yields many promising links. In particular, check documentation of package mda. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
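A hedged sketch of the mixed-predictor case (my own illustration, not from package mda): R's formula interface dummy-codes factor predictors automatically, so a discriminant routine such as MASS::lda() accepts categorical and continuous predictors together. The data frame and variable names below are invented for the example.

```r
# Sketch: discriminant analysis with one continuous and one categorical predictor.
# Factors in the formula are dummy-coded via the model matrix automatically.
library(MASS)
set.seed(1)
d <- data.frame(
  grp  = factor(rep(c("A", "B"), each = 50)),         # class to be predicted
  cont = c(rnorm(50, 0), rnorm(50, 2)),               # continuous predictor
  cat  = factor(sample(c("low", "high"), 100, TRUE))  # categorical predictor
)
fit  <- lda(grp ~ cont + cat, data = d)
pred <- predict(fit)$class
mean(pred == d$grp)   # resubstitution accuracy
```

Whether the within-class normality assumption is sensible for dummy variables is a separate question; the more flexible methods in package mda are worth checking for that reason.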
Re: [R] MLE for bimodal distribution
_nico_ wrote: Hello everyone, I'm trying to use mle from package stats4 to fit a bi/multi-modal distribution to some data, but I have some problems with it. Here's what I'm doing (for a bimodal distribution): # Build some fake binormally distributed data; the procedure fails also with real data, so the problem isn't here data = c(rnorm(1000, 3, 0.5), rnorm(500, 5, 0.3)) # Just to check it's bimodal plot(density(data)) f = function(m, s, m2, s2, w) { -log( sum(w*dnorm(data, mean=m, sd=s)) + sum((1-w)*dnorm(data, mean=m2, sd=s2)) ) } res = mle(f, start=list("m"=3, "s"=0.5, "m2"=5, "s2"=0.35, "w"=0.6)) This gives an error: Error in optim(start, f, method = method, hessian = TRUE, ...) : non-finite finite-difference value [2] In addition: There were 50 or more warnings (use warnings() to see the first 50) And the warnings are about dnorm producing NaNs So, my questions are: 1) What does "non-finite finite-difference value" mean? My guess would be that an Inf was passed somewhere where a finite number was expected...? I'm not sure how optim works though... 2) Is the log likelihood function I wrote correct? Can I use the same type of function for 3 modes? The log-likelihood as written is not quite right: it takes the log of a single sum over all observations, whereas the mixture density should be formed per observation and the logs then summed, i.e. -sum(log(w*dnorm(data, m, s) + (1-w)*dnorm(data, m2, s2))). The NaN warnings arise because nothing keeps s and s2 positive or w inside (0, 1) during the optimization, and a NaN or Inf objective value is exactly what produces the "non-finite finite-difference value" error in optim. The same form extends to 3 modes with a second mixing weight. Even written correctly, mixtures are difficult to fit, as others have pointed out. As an alternative, you may discretise your variable so the mixture is fit to counts on a histogram using a multinomial likelihood. HTH Rubén
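A hedged sketch of the corrected mixture likelihood (my own illustration): the sds are optimised on the log scale and the weight through plogis(), so the optimiser can roam freely without generating NaNs.

```r
# Per-observation mixture density, summed on the log scale; BFGS from the
# same starting values the poster used.
set.seed(1)
data <- c(rnorm(1000, 3, 0.5), rnorm(500, 5, 0.3))
nll <- function(p) {
  w  <- plogis(p[5])                 # mixing weight kept in (0, 1)
  s1 <- exp(p[2]); s2 <- exp(p[4])   # sds kept positive
  -sum(log(w * dnorm(data, p[1], s1) + (1 - w) * dnorm(data, p[3], s2)))
}
fit <- optim(c(3, log(0.5), 5, log(0.35), qlogis(0.6)), nll,
             method = "BFGS", hessian = TRUE)
c(m = fit$par[1], s = exp(fit$par[2]),
  m2 = fit$par[3], s2 = exp(fit$par[4]), w = plogis(fit$par[5]))
```

With 1000 and 500 draws the recovered weight should be close to 2/3, and the means close to 3 and 5.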
Re: [R] R in the NY Times
Zaslavsky, Alan M. wrote: This article is accompanied by nice pictures of Robert and Ross. Data Analysts Captivated by Power of R http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html Thanks for the heads-up. R morale is going through the roof! I've given three courses on R since the second half of 2007 here in Chile (geostatistics, Fisheries Libraries for R, and generalized linear models) and all three audiences (professionals working in academia, government, and private research institutions) were very much impressed by the power of R. I spent as much time on R itself as on the statistical topics, since students wanted to learn data management and graphics once they started to grasp the basic elements. R creators, Core Team, package creators and maintainers, and experts on the list: thank you so much for such great work and such an open attitude. You lead by example. Rubén
Re: [R] AD Model Builder now freely available
dave fournier wrote: Hi All, Following Mike Praeger's posting on this list, I'm happy to pass on that AD Model Builder is now freely available from the ADMB Foundation. http://admb-foundation.org/ Two areas where AD Model Builder would be especially useful to R users are multi-parameter smooth optimization, as in large-scale maximum likelihood estimation, and the analysis of general nonlinear random effects models. Cheers, Dave Thank you Dave, this is really great news! So now I can download ADMB and ADMB-RE for free? And use Mike's X2R package to connect my ADMB scripts with the power of R. Only one word: awesome! Rubén
Re: [R] Fitting data to a sigmoidal curve
Greg Snow wrote: Sarah, Doing: RSiteSearch('gompertz', restrict='functions') At the command prompt gives several promising results. Hope this helps, -- Gregory (Greg) L. Snow Ph.D. And you can also do:

nobs <- length(data$salam.size.observed)
fn <- function(p){
  salam.size.model <- p[1]*exp(-p[2]*exp(-p[3]*data$age.observed)) # Gompertz model: p[1] = asymptotic size, p[2] = G1, p[3] = G2
  squdiff <- (salam.size.model - data$salam.size.observed)^2       # vector of squared differences
  sigmasqu <- sum(squdiff)/nobs                                    # nuisance sigma parameter profiled out
  (nobs/2)*log(sigmasqu)                                           # profile negative log-likelihood as approx. to true likelihood
}
(salam.size.likfit <- nlm(fn, p = c(size.inf.start, G1.start, G2.start), hessian = TRUE)) # supply your own starting values for asymptotic size, G1, G2

where data is a dataframe with observed sizes and ages. Invert the Hessian to obtain measures of precision. Note also that if you know the size at birth then you can re-parameterize, put size at birth instead of asymptotic size (salam.size.inf), and reduce model complexity to two parameters (but of course the Gompertz formula changes). If the Gompertz model is not flexible enough, note that it is a particular case of another more flexible, and more complex, model, Schnute's, which I have found often provides a better explanation of growth data (as measured by the AIC), especially if you have sizes of very young individuals. HTH Rubén
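The "invert the Hessian" step can be sketched on a toy objective with known curvature (my own illustration; the quadratic stands in for a negative log-likelihood so the true standard errors are known to be 0.5 and 1):

```r
# With hessian = TRUE, nlm() returns the Hessian of the minimized function
# at the optimum; for a negative log-likelihood its inverse approximates
# the covariance matrix of the estimates.
fn  <- function(p) sum((p - c(1, 2))^2 / c(0.5, 2))  # toy neg. log-likelihood
fit <- nlm(fn, p = c(0, 0), hessian = TRUE)
vc  <- solve(fit$hessian)   # approximate covariance matrix
se  <- sqrt(diag(vc))       # standard errors of the two parameters
se
```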
Re: [R] Fitting a modified logistic with glm?
Mike Lawrence wrote: On Sat, Nov 8, 2008 at 3:59 PM, Rubén Roa-Ureta <[EMAIL PROTECTED]> wrote: ... The fit is for grouped data. ... As illustrated in my example code, I'm not dealing with data that can be grouped (x is a continuous random variable). Four points: 1) I showed you an approach that you can follow in order to fit a modified (custom-made) logistic model; it's up to you to follow this approach and modify it accordingly. It was never my intention to give you the precise code for your problem (obviously). 2) If you do follow this approach and you keep your predictor continuous, you have to change the line defining the likelihood. 3) If it is really true that your predictor is a *random* variable, then you do not have a simple glm (you may have a glmm). 4) You can always make your continuous predictor categorical, for example as when you make a histogram of said predictor. R.
Re: [R] Fitting a modified logistic with glm?
Mike Lawrence wrote: Hi all, Where f(x) is a logistic function, I have data that follow: g(x) = f(x)*.5 + .5 How would you suggest I modify the standard glm(..., family='binomial') function to fit this? Here's an example of a clearly ill-advised attempt to simply use the standard glm(..., family='binomial') approach:

# First generate some data
# define the scale and location of the modified logistic to be fit
location = 20
scale = 30
# choose some x values
x = runif(200,-200,200)
# generate some random noise to add to x in order to
# simulate real-world measurement and avoid perfect fits
x.noise = runif(length(x),-10,10)
# define the probability of success for each x given the modified logistic
prob.success = plogis(x+x.noise,location,scale)*.5 + .5
# obtain y, the observed success/failure at each x
y = rep(NA,length(x))
for(i in 1:length(x)){
  y[i] = sample(x = c(1,0), size = 1, prob = c(prob.success[i], 1-prob.success[i]))
}
# show the data and the source modified logistic
plot(x,y)
curve(plogis(x,location,scale)*.5 + .5, add=T)
# Now try to fit the data
# fit using standard glm and plot the result
fit = glm(y~x, family='binomial')
curve(plogis(fit$coefficients[1]+x*fit$coefficients[2])*.5+.5, add=T, col='red', lty=2)
# It's clear that it's inappropriate to use the standard "glm(y~x,family='binomial')"
# method to fit the modified logistic data, but what is the alternative? A custom-made fit function using nlm?

Below, in the second line the logistic model has been re-parameterized to include p[1], the level of the predictor at which the probability of success is 50%, and p[2], the level at which it is 95%. You can change the model in this line to your version. In the fourth line (the negative log-likelihood), mat is a vector of 0s and 1s representing success and imat is a vector of 1s and 0s representing failure. The fit is for grouped data.
fn <- function(p){
  prop.est <- 1/(1+exp(log(1/19)*(l-p[1])/(p[2]-p[1]))) # estimated proportion of success at each predictor level l
  iprop.est <- 1-prop.est                               # estimated proportion of failure
  -sum(count*(mat*log(prop.est)+imat*log(iprop.est)))   # binomial negative log-likelihood
}
prop.lik <- nlm(fn, p=c(25.8,33.9), hessian=TRUE)

HTH Rubén
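For the ungrouped case raised later in the thread (a continuous predictor with 0/1 responses), a hedged, self-contained sketch of the same idea, fitting g(x) = 0.5*plogis(x, location, scale) + 0.5 by direct maximum likelihood (my own illustration, reusing the poster's simulation setup):

```r
# Simulate 0/1 data from the modified logistic, then maximise the Bernoulli
# likelihood with optim() instead of glm().
set.seed(1)
location <- 20; scale <- 30
x <- runif(200, -200, 200)
y <- rbinom(length(x), 1, plogis(x, location, scale) * 0.5 + 0.5)
negloglik <- function(p) {
  pr <- plogis(x, p[1], p[2]) * 0.5 + 0.5   # the modified logistic
  -sum(dbinom(y, 1, pr, log = TRUE))        # Bernoulli negative log-likelihood
}
fit <- optim(c(0, 10), negloglik, hessian = TRUE)
fit$par   # estimated location and scale
```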
Re: [R] descretizing xy data
Rubén Roa-Ureta wrote: Jon A wrote: Hello, I have a dataset with a continuous independent variable (fish length, range: 30-150 mm) and a binary response (foraging success, 0 or 1). I want to discretize fish length into 5 mm bins and give the proportion of individuals who successfully foraged in each size bin. I have used the cut function to discretize the length values into my desired bins, but I can't figure out how to manipulate my response data in terms of the levels I've created. Any advice on how to achieve my task? Thanks in advance. You have the option of using catspec. Here is another, more transparent solution, using hist(). Note that explicit bin edges are passed to both cut() and hist(), so the two binnings agree (a bare number of breaks is only a suggestion to hist()).

lb <- 30
ub <- 150
bk <- 5
brks <- seq(lb, ub, by = bk) # explicit 5-mm bin edges
x <- data.frame(X1 = runif(1000, lb, ub), X2 = rbinom(1000, 1, 0.75)) # lengths and 0/1 outcomes
x$X3 <- cut(x$X1, breaks = brks, labels = FALSE) # bin index of each fish
h <- hist(x$X1, breaks = brks, plot = FALSE)
y <- data.frame(X1 = h$breaks[-1], X2 = h$counts, X3 = 0) # upper edge, bin count, successes
for (i in 1:length(y$X1)) {
  for (j in 1:length(x$X1)) {
    if (identical(x$X3[j], i)) y$X3[i] <- y$X3[i] + x$X2[j]
  }
}
sum(x$X2) # check that counting was correct
sum(y$X3)
y$X4 <- ifelse(y$X2 > 0, y$X3/y$X2, NA) # proportion; NA only for empty bins
plot(y$X1, y$X4, pch = 19)
abline(v = h$breaks)

BTW, if you add the line below, text(x = y$X1, y = y$X4 + .01, format(y$X2), cex = .5) you show the sample size at each proportion. R.
Re: [R] descretizing xy data
Jon A wrote: Hello, I have a dataset with a continuous independent variable (fish length, range: 30-150 mm) and a binary response (foraging success, 0 or 1). I want to discretize fish length into 5 mm bins and give the proportion of individuals who successfully foraged in each size bin. I have used the cut function to discretize the length values into my desired bins, but I can't figure out how to manipulate my response data in terms of the levels I've created. Any advice on how to achieve my task? Thanks in advance. You have the option of using catspec. Here is another, more transparent solution, using hist(). Note that explicit bin edges are passed to both cut() and hist(), so the two binnings agree (a bare number of breaks is only a suggestion to hist()).

lb <- 30
ub <- 150
bk <- 5
brks <- seq(lb, ub, by = bk) # explicit 5-mm bin edges
x <- data.frame(X1 = runif(1000, lb, ub), X2 = rbinom(1000, 1, 0.75)) # lengths and 0/1 outcomes
x$X3 <- cut(x$X1, breaks = brks, labels = FALSE) # bin index of each fish
h <- hist(x$X1, breaks = brks, plot = FALSE)
y <- data.frame(X1 = h$breaks[-1], X2 = h$counts, X3 = 0) # upper edge, bin count, successes
for (i in 1:length(y$X1)) {
  for (j in 1:length(x$X1)) {
    if (identical(x$X3[j], i)) y$X3[i] <- y$X3[i] + x$X2[j]
  }
}
sum(x$X2) # check that counting was correct
sum(y$X3)
y$X4 <- ifelse(y$X2 > 0, y$X3/y$X2, NA) # proportion; NA only for empty bins
plot(y$X1, y$X4, pch = 19)
abline(v = h$breaks)

HTH Rubén
Re: [R] Distributions Comparison
Igor Telezhinsky wrote: Dear all, I have recently found out about the R project. Could anyone tell me if it is possible to make the comparison of two distributions using R packages? The problem is TRICKY because I have the distributions of measurements and each measurement in the given distribution has its own error, which I need to take into account when comparing these distributions. If anyone knows it is possible with R, could you please tell me what package to use. I will really appreciate some hints in code as well. Thank you. Dear Igor, Welcome to the list and to R. Please read the posting guide and study your problem before attempting to seek help from us. Your question as it stands is too ambiguous to answer. Try to make a toy example or formulate it better and you may have better luck. Ruben
Re: [R] plot the chi square distribution and the histogram in the same graph
leo_wa wrote: In a previous post I asked how to plot the normal curve and the histogram in the same graph. If I want to compare the data set against the chi-square distribution instead, how can I do that? You should make up your mind: is your random variable X, on (-inf, +inf), or Sum(X^2) (which obviously cannot take negative values)? They are quite different. Rubén
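Assuming the variable really is a non-negative sum of squares, a hedged sketch of the overlay (my own illustration; the df value is invented):

```r
# Histogram on the density scale with the chi-square density on top,
# plus a kernel density for comparison.
set.seed(1)
x <- rchisq(1000, df = 4)
hist(x, prob = TRUE, main = "", xlab = "x")
curve(dchisq(x, df = 4), col = "blue", add = TRUE)
lines(density(x, from = 0), col = "red")  # kernel density estimate
```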
Re: [R] how to plot the histogram and the curve in the same graph
leo_wa wrote: I want to plot the histogram and a curve in the same graph. If I have a set of data, I plot the histogram and also want to see what distribution it follows, so I want to plot a curve showing what distribution it resembles. To draw the curve of a distribution you should have an idea about the distribution: you can't just draw the histogram and expect R to produce the curve of the best-fitting distribution. But you can plot a kernel density curve.

x <- rnorm(1000,5,3)
library(MASS)
(x.normfit <- fitdistr(x,"normal"))
hist(x,prob=TRUE)
lines(density(x,na.rm=TRUE),col="red") # kernel density
curve(dnorm(x,mean=x.normfit$estimate[1],sd=x.normfit$estimate[2]),col="blue",add=TRUE) # maximum likelihood estimate

Rubén
Re: [R] distributions and glm
drbn wrote: Hello, I have seen that some papers do this: 1) Group data by year (e.g. 35 years). 2) Estimate the mean of the key variable through the distribution that fits best (some years a normal distribution, other years a more skewed gamma distribution, etc.). 3) With the estimated means of each year, fit a GLM. I'd like to know if it is possible (to use these means in a GLM) or if it is a wrong idea. Thanks in advance David David, You can model functions of data, such as means, but you must be careful to carry over most of the uncertainty in the original data into the model. If you don't (for example, if you let the model know only the values of the means), then you are actually assuming that these means were observed with absolute certainty instead of being estimated from the data. To carry the uncertainty in the original data over to your modeling you can use a Bayesian approach or a marginal likelihood approach. A marginal likelihood is a true likelihood function not of the data but of functions of the data, such as maximum likelihood estimates. If your means per year were estimated using maximum likelihood (for example with fitdistr in package MASS) and your sample size is not too small, then you can use a normal marginal likelihood model for the means. Note however that each mean may come from a different distribution, so the full likelihood model for your data would be a mixture of normal distributions. You may not be able to use the pre-built glm function, so you may face the challenge of writing your own code. HTH Rubén
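A hedged sketch of the simplest version of this idea (my own illustration with simulated data, all normal for brevity): estimate each year's mean by maximum likelihood with fitdistr(), then carry the standard error of every mean into the regression as an inverse-variance weight instead of treating the means as known constants.

```r
# Yearly ML means and their standard errors, then a weighted regression.
library(MASS)
set.seed(1)
years <- 1:35
sims  <- lapply(years, function(t) rnorm(50, mean = 1 + 0.05 * t, sd = 2))
fits  <- lapply(sims, fitdistr, densfun = "normal")
m     <- sapply(fits, function(f) f$estimate["mean"])  # yearly ML means
se    <- sapply(fits, function(f) f$sd["mean"])        # their standard errors
fit   <- lm(m ~ years, weights = 1/se^2)               # inverse-variance weights
coef(fit)
```

With distributions differing by year, the same scheme applies but each mean and standard error comes from its own fitted family.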
Re: [R] Question about glm using R
Jordi Garcia wrote: Good morning, I am using R to try to model the proportion of burned area in Portugal. The dependent variable is the proportion. The family used is binomial and the epsilon would be binary. I am not able to find the package to be used when the proportion (%) has to be used in glm. Could someone help me? I am using normal glm commands, for example: glm_5 <- glm(formula=p~Precipitation, family=binomial(link=logit), data=dados) where p is the proportion of burned area, but this error message appears: Warning message: non-integer #successes in a binomial glm! in: eval(expr, envir, enclos) That is why I think I am not using the proper glm package. Thank you very much in advance. Jordi Jordi, Your statistical model is wrong. The binomial family is for counts data (counts of successes given n trials), not for proportions. To model proportions, your family is the Beta family. I've modeled proportion response variables with function betareg of package betareg. If you want my example applications I can send you code and data off list. Reference: Ferrari and Cribari-Neto. 2004. Beta regression for modelling rates and proportions. Journal of Applied Statistics 31:799-815. Rubén
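A hedged sketch of the counts alternative (simulated data; the predictor name mirrors the question but everything else is invented): when the underlying counts behind each proportion are available, the binomial family applies directly via a two-column (successes, failures) response, and the non-integer warning disappears.

```r
# Binomial GLM on counts rather than raw proportions.
set.seed(1)
n      <- rpois(50, 30)                 # units surveyed per site
precip <- runif(50, 0, 100)             # hypothetical Precipitation values
burned <- rbinom(50, n, plogis(-2 + 0.03 * precip))
fit <- glm(cbind(burned, n - burned) ~ precip, family = binomial(link = logit))
coef(fit)
```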
Re: [R] plot - central limit theorem
Jörg Groß wrote: Hi, Is there a way to simulate a population with R and pull out m samples, each with n values, for calculating m means? I need that kind of data to plot a graphic demonstrating the central limit theorem, and I don't know how to begin. So perhaps someone can give me some tips and hints on how to start and which functions to use. thanks for any help, joerg

x <- runif(10000,0,1)
y <- runif(10000,0,1)
z <- runif(10000,0,1)
u <- runif(10000,0,1)
v <- runif(10000,0,1)
w <- runif(10000,0,1)
t <- x+y+z+u+v+w
hist(t,breaks=100,xlab="sum of i.i.d. r.v. with finite mean and variance",ylab="Frequency",main="")

Replace runif with the corresponding function of the distribution of your choice (note each call draws 10000 values; a single draw per variable would give nothing to plot). Not exactly what you wanted, but it does illustrate the C.L.T. Rubén
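What the poster literally asked for, m samples of n values each, can be sketched in two lines with replicate() (my own illustration; m and n are arbitrary):

```r
# m sample means of n uniform draws each; the histogram of the means is
# approximately normal with mean 1/2 and variance 1/(12n).
set.seed(1)
m <- 1000; n <- 30
means <- replicate(m, mean(runif(n)))
hist(means, breaks = 50, prob = TRUE,
     xlab = "sample mean", main = "CLT: means of 30 uniforms")
curve(dnorm(x, 0.5, sqrt(1/(12 * n))), col = "red", add = TRUE)
```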
Re: [R] R graph with values incorporated
Prasanth wrote: Dear All: Greetings! By the way, is it possible to have a graph (say a line graph) that shows "values" as well (say y-axis values within the graph)? One could do it in Excel. I am just wondering whether it is possible with R!

x <- rnorm(100,2,3)
y <- rnorm(100,2,3)
plot(x,y,pch=19)
text(x=x,y=y+.5,format(x,digits=1),cex=.5)

[[alternative HTML version deleted]] <-- Read the posting guide.
Re: [R] draw a function over a histogram
Daniela Garavaglia wrote: Sorry, I have some troubles with the graph device. How can I draw a function over a histogram? Thanks so much. Daniela, What function? Here is one example using density() and dnorm():

x <- rnorm(1000,2,2)
hist(x,prob=TRUE)
lines(density(x,na.rm=TRUE),col="red")
library(MASS)
y <- fitdistr(x,"normal")
curve(dnorm(x,mean=y$estimate[1],sd=y$estimate[2]),add=TRUE)

HTH R. [[alternative HTML version deleted]] <--- What about this??
Re: [R] Coordinate systems for geostatistics in R
imicola wrote: Hi, I read somewhere that when carrying out geostatistical analysis in R you should not use latitude and longitude...can anyone expand on this a little for me, and what would be the best coordinate system to use? Not only in R. In most systems, the inter-point distances are assumed to be planar (distances over a Euclidean space), whereas latitude and longitude are spherical. I guess there could be a geostatistical analysis based on spherical distances, but why make things more complicated when projecting the spherical coordinates onto planar coordinates before the analysis produces a good approximation and simplifies the analysis significantly? I use UTM coordinates, and transform from geodetic to metric with Eino Uikkanen's GeoConv program (it's free). I have my data in a geographic coordinate system, WGS84, decimal degrees. Is this the wrong format for such analyses? If the distances are short it is not so wrong, and the wrongness increases with increasing distance. I have also converted my data to the UTM projection and so have it in metres (ranging from 480,000 to 550,000 E and 170,000 to 230,000 N). If I was to use the UTM coordinates, should I be using the actual coordinates in metres, or should I convert this into an arbitrary coordinate system (i.e. from 0 - 1) somehow? It would be convenient to use km rather than m, so the range parameter would be closer in magnitude to the other parameters of the model. A very large range parameter in metres may cause numerical instability during minimization of the negative support function in likelihood-based models such as that implemented in geoR. I have noticed that running an analysis on the data gives different results depending on which type of system you use, so I want to make sure I have the correct system. I should also probably note that I am a geostatistical novice!
Thanks, Bottom line: your geostatistical software is probably based on distance calculations in a Euclidean space, so it is wrong to input locations in spherical coordinates. HTH Ruben
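The size of the error is easy to see with a base-R great-circle distance (a hedged illustration; the haversine function below is my own sketch, not from any package): at 60 degrees latitude, one degree of longitude covers only about half the ground distance of one degree of latitude, so treating degrees as planar units distorts distances badly.

```r
# Great-circle (haversine) distance in km between two lon/lat points.
hav <- function(lon1, lat1, lon2, lat2, R = 6371) {
  rad  <- pi/180
  dlon <- (lon2 - lon1) * rad
  dlat <- (lat2 - lat1) * rad
  a <- sin(dlat/2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon/2)^2
  2 * R * asin(sqrt(a))
}
hav(0, 60, 1, 60)  # one degree of longitude at 60 N: ~55.6 km
hav(0, 60, 0, 61)  # one degree of latitude: ~111.2 km
```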
Re: [R] Geodata object border
imicola wrote: Sorry, this is probably quite an easy question, but I'm new to R and couldn't find the answer anywhere. I'm using geoR and geoRglm, but can't figure out how to get a border in my geodata object. Does this need to be defined when I'm importing my data, or afterwards, and how do I go about doing this? Thanks You can define the border beforehand and import it to R with read.table or read.csv, or you can define it from your geodata object in R. In the first case don't forget to close the polygon by repeating the first vertex at the end of the file. In the second case you can use a convex hull function in R, such as chull(), to define the polygon from your geodata object. Say your geodata object is called my.geo; then my.geo$coords would be the coordinates. Then try this,

bor <- chull(my.geo$coords)
bor <- c(bor, bor[1]) # close polygon by appending the first vertex
plot(my.geo$coords)
lines(my.geo$coords[bor,])

HTH Ruben
Re: [R] geoR : Passing arguments to "optim" when using "likfit"]
Mzabalazo Ngwenya wrote: Hi everyone! I am trying to fit a kriging model to a set of data. When I just run the "likfit" command I can obtain the results. However, when I try to pass additional arguments to the optimization function "optim" I get errors. That is, I want to obtain the Hessian matrix (hessian=TRUE). Here's a little example (1-D). Can anyone shed some light? Where am I going wrong? ---

obs.points <- matrix(c(0.1,0,0.367,0,0.634,0,0.901,0),byrow=TRUE,nrow=4)
MM1.fx <- MM1(obs.points[,1])
MM1.fx
[1] 0.111 0.5789475 1.7272732 9.100
# Creating geoR object
MM1.data <- as.geodata(cbind(obs.points,MM1.fx))
MM1.data
$coords
[1,] 0.100 0
[2,] 0.367 0
[3,] 0.634 0
[4,] 0.901 0
$data
[1] 0.111 0.5789475 1.7272732 9.100
attr(,"class")
[1] "geodata"
# Starting values for MLE
sta.vals <- expand.grid(seq(1,20),seq(1,20))
# Maximum likelihood estimation of covariance parameters HERE IS THE PROBLEM
MM1.fit <- likfit(MM1.data,ini.cov.pars=sta.vals,fix.nugget=TRUE,cov.model="gaussian",optim(hessian=TRUE))

Pass the argument directly instead; likfit() forwards extra arguments to the minimisation function:

MM1.fit <- likfit(MM1.data,ini.cov.pars=sta.vals,fix.nugget=TRUE,cov.model="gaussian",hessian=TRUE)
MM1.fit$info.minimisation.function$hessian

For the asymptotic covariance matrix (the inverse of the observed Fisher information): solve(MM1.fit$info.minimisation.function$hessian) HTH Rubén
[R] Solving for parameter values at likelihood points
ComRades, On Oct 16, 2006, I posted a question regarding how to find the parameter values that make the likelihood function take a value equal to 1/8th of the maximum likelihood value. There was no reply, but I found a solution and would like to know if there is a better solution using the function uniroot(). I think with uniroot() I could get more accurate parameter values at these bounds. This is my solution, using the function nearest() from package GenKern. The example concerns the marginal likelihood for the normal variance parameter when the mean is a nuisance parameter (see section 7.3 in Royall, 1997, Statistical Evidence. A Likelihood Paradigm, Chapman & Hall). Rubén

y <- c(-2.250357,10.723614,7.238959,9.179939,5.831603,9.301580,4.019388,9.777280,5.587761,3.600276,9.753381,5.154928,2.776350,9.940350,14.185712,4.281198,10.994180,9.333027,13.082535,2.067826,10.772551,-3.811529,5.446021,-5.333287,1.421544)
n <- length(y)
s.sq <- (1/(n-1))*sum((y-mean(y))^2)
sigma.sq <- seq(1,100,0.1)
Lik.M <- sigma.sq^(-(n-1)/2)*exp(-(n-1)*s.sq/(2*sigma.sq))
Rel.Lik.M <- Lik.M/max(Lik.M)
plot(sigma.sq,Rel.Lik.M,type="l")
abline(h=1/8)
arrows(y0=0.4,y1=0.18,x0=0,x1=12)
arrows(y0=0.4,y1=0.18,x0=62,x1=50)
library(GenKern)
cbind(sigma.sq,Rel.Lik.M)[Rel.Lik.M==1,] #maximum likelihood estimate
max.idx <- nearest(Rel.Lik.M,1,outside=TRUE) # vector index value at the maximum likelihood
#finding the parameter value at 1/8th the max. likelihood to the left of the maximum
cbind(sigma.sq[1:max.idx],Rel.Lik.M[1:max.idx])[nearest(Rel.Lik.M[1:max.idx],0.125),]
#finding the parameter value at 1/8th the max. likelihood to the right of the maximum
cbind(sigma.sq[max.idx:length(sigma.sq)],Rel.Lik.M[max.idx:length(sigma.sq)])[nearest(Rel.Lik.M[max.idx:length(sigma.sq)],0.125),]
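A hedged sketch of the uniroot() alternative asked about above (my own attempt): the marginal likelihood is maximised at the sample variance, so solving log RelLik(v) = log(1/8) once on each side of that maximum gives the bounds directly, without a grid.

```r
# 1/8-likelihood bounds for the normal variance via uniroot().
y <- c(-2.250357, 10.723614, 7.238959, 9.179939, 5.831603, 9.301580,
       4.019388, 9.777280, 5.587761, 3.600276, 9.753381, 5.154928,
       2.776350, 9.940350, 14.185712, 4.281198, 10.994180, 9.333027,
       13.082535, 2.067826, 10.772551, -3.811529, 5.446021, -5.333287,
       1.421544)
n <- length(y)
s.sq <- var(y)                 # maximiser of the marginal likelihood
log.rel <- function(v) {       # log relative marginal likelihood
  (-(n - 1)/2) * log(v) - (n - 1) * s.sq/(2 * v) -
    ((-(n - 1)/2) * log(s.sq) - (n - 1)/2)
}
lo <- uniroot(function(v) log.rel(v) - log(1/8), c(0.5, s.sq))
hi <- uniroot(function(v) log.rel(v) - log(1/8), c(s.sq, 500))
c(lower = lo$root, upper = hi$root)
```

Working on the log scale avoids the underflow that the raw likelihood suffers far from the maximum.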
Re: [R] deleting variables
Richard Pearson wrote: ?rm Richard Ralf Goertz wrote: How can I automatically exclude one variable from being saved in the workspace when I quit an R session? The problem is I don't know how to erase a variable once it has been created. [...] Ralf More on the use of rm. If you want to delete many variables in one go:

x <- runif(10)
y <- runif(10)
z <- runif(10)
ls() # check all the objects in your workspace
#[1] "x" "y" "z"
rm(list=your.list <- ls()[2:3]) # delete the objects whose indices in ls() are 2 and 3 (here y and z)
rm("your.list") # remove the temporary vector of variable names
ls()
[1] "x"

HTH Rubén
Re: [R] R Newbie Question/Data Model
[EMAIL PROTECTED] wrote: > Given a data set and a set of predictors and a response in the data, > we would like to find a model that fits the data set best. > Suppose that we do not know what kind of model (linear, polynomial > regression, ...) might be good, we are wondering if there is R-package(s) > that can automatically do this. > Otherwise, can you direct me, or point out reference(s), > basic steps to do this. Thanks. > > -james A model can always be made to fit better by adding parameters, so taken to its extreme the "best-fitting model" would be one with an unlimited number of parameters. However, any model with more parameters than data points has a negative number of degrees of freedom, and you do not want that. The best-fitting model subject to the constraint that the degrees of freedom be non-negative is the data itself, with zero degrees of freedom. The AIC tells you this too: the AIC for the model formed by the data itself is 2n, whereas the AIC for any model with negative degrees of freedom is > 2n. But I guess you want to make inference from sample to population. If that is indeed the case, then you should consider changing your focus from finding "a model that fits the data set best" to a model that best summarizes the information contained in your sample about the population the sample comes from. To do that, start by defining the nature of your response variable. What is the nature of the natural process generating it? Is it continuous or discrete? Is it univariate or multivariate? Can it take negative and positive values? Can it take values of zero? After you have settled the probabilistic model for the response variable, you can start thinking about the mathematical relation between the response and the predictors. Is it linear or nonlinear? Are the predictors categorical or continuous? Read the posting guide, formulate a clear question, and maybe you will be given more specific help.
Rubén
Re: [R] Modeling presence only data in R
milton ruser wrote: > Dear All, > > I have a set of environmental maps and presence-only points for some species. > How can I generate distribution models in R using these presence-only data? > What packages and functions can I use to do so? > > Kind regards, > > Miltinho > Brazil If you have location data for each absence/presence (such as easting and northing) consider using geoRglm. Christensen, O. F., and Ribeiro, P. J. 2002. geoRglm: a package for generalized linear spatial models. R-NEWS, 2: 26–28. HTH Rubén
Re: [R] finding an unknown distribution
andrea previtali wrote: > Hi, > I need to analyze the influences of several factors on a variable that is a > measure of fecundity, consisting of 73 observations ranging from 0 to 5. The > variable is continuous and highly positively skewed, and none of the typical > transformations was able to normalize the data. Thus, I was thinking of > analyzing these data using a generalized linear model where I > can specify a distribution other than normal. I'm thinking it may fit a > gamma or exponential distribution. But I'm not sure if the data meet > the assumptions of those distributions because their definitions are > too complex for my understanding! Roughly, the exponential distribution is the model of a random variable describing the time/distance between two independent events that occur at the same constant rate. The gamma distribution is the model of a random variable that can be thought of as the sum of exponential random variables. I don't think fecundity data, the count of reproductive cells, qualifies as a random variable to be modeled by either of these distributions. If the count of reproductive cells is very large, and you are modeling this count as a function of animal size, such as length, you should consider the lognormal distribution, since the count of cells grows multiplicatively (volumetrically) as length increases. In that case you can model your response variable using glm with family=gaussian(link="log"). Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
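A minimal sketch of that suggestion, on simulated data (the names fec and len are hypothetical, not from the original post):

```r
# Hypothetical sketch: positively skewed fecundity modeled on the
# original scale with a log link, as suggested above (simulated data).
set.seed(7)
len <- runif(73, 5, 15)
fec <- exp(-1 + 0.3 * len) * exp(rnorm(73, 0, 0.4))  # multiplicative errors
fit <- glm(fec ~ len, family = gaussian(link = "log"))
summary(fit)$coefficients
```

The response stays on its original scale; only the mean is related to the predictors through the log link.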
Re: [R] memory issues
I think any geostatistical program/R package would have trouble handling 12000 observations on a PC. The empirical variogram would be built from choose(12000, 2), nearly 72 million pairs, and during kriging, if you didn't restrict the search neighbourhood, interpolation would involve solving a very big system, more so if you defined a fine grid. Try restricting the search neighbourhood, if you haven't, with maxdist and nmax. Rubén Prof Brian Ripley wrote: > I think the clue is that the message you quote comes from gstat, which > does not use R's memory allocator. It is gstat and not R that has failed > to allocate memory. > > Try re-reading the help page for memory.size. 'max=T' does not indicate > the limit (that is the job of memory.limit()), but the maximum 'used' > (acquired from the OS) in that session. So 19Mb looks reasonable for R's > usage. > > I don't understand the message from memory.limit() (and the formatting is > odd). memory.limit() does not call max() (it is entirely internal), so I > wonder if that really is the output from that command. (If you can > reproduce it, please let us have precise reproduction instructions.) > > There isn't much point in increasing the memory limit from the default > 1.5Gb on a 2Gb XP machine. The problem is that the user address space > limit is 2Gb and fragmentation means that you are unlikely to be able to > get over 1.5Gb unless you have very many small objects, in which case R > will run very slowly. In any case, that is not the issue here. > > On Wed, 16 Apr 2008, Dave Depew wrote: > >> Hi all, >> I've read the R for Windows FAQ and am a little confused re: >> memory.limit and memory.size >> >> to start using R 2.6.2 on WinXP, 2GB RAM, I have the command line "sdi >> --max-mem-size=2047M" >> Once the Rgui is open, memory.limit() returns 2047, memory.size() >> returns 11.315, and memory.size(max=T) returns 19.615 >> >> Shouldn't memory.size(max=T) return 2047? 
>> >> Upon running several operations involving kriging (gstat package, >> original data file 3 variables, 12000 observations) >> the program runs out of memory >> >> "memory.c", line 57: can't allocate memory in function m_get() >> Error in predict.gstat(fit.uk, newdata = EcoSAV.grid.clip.spxdf, >> debug.level = -2, : >> m_get >> >> Immediately following this, >> >> memory.limit() returns [1] -Inf >> Warning message: >> In memory.limit() : no non-missing arguments >> to max; returning -Inf >> >> memory.size() returns 24.573. >> >> memory.size(max=T) returns 46.75 >> >> To my untrained eye, it appears that R is not being allowed access to >> the full memory limit specified in the cmd line. If this is the case, >> how does one ensure that R is getting access to the full allotment of RAM? >> Any insight is appreciated... >> >> >>> sessionInfo() >> R version 2.6.2 (2008-02-08) >> i386-pc-mingw32 >> >> locale: >> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United >> States.1252;LC_MONETARY=English_United >> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252 >> >> attached base packages: >> [1] stats graphics grDevices datasets tcltk utils >> methods base >> >> other attached packages: >> [1] maptools_0.7-7 foreign_0.8-23 gstat_0.9-43 rgdal_0.5-24 >> lattice_0.17-4 sp_0.9-23 svSocket_0.9-5 svIO_0.9-5 >> R2HTML_1.58 svMisc_0.9-5 svIDE_0.9-5 >> >> loaded via a namespace (and not attached): >> [1] grid_2.6.2 tools_2.6.2 >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Confidence intervals of log transformed data
tom soyer wrote: > Hi > > I have a general statistics question on calculating confidence interval of > log transformed data. > > I log transformed both x and y, regressed the transformed y on transformed > x: lm(log(y)~log(x)), and I get the following relationship: > > log(y) = alpha + beta * log(x) with se as the standard error of residuals > > My question is how do I calculate the confidence interval in the original > scale of x and y? Should I use [...] Confidence interval for the mean of Y? If that is the case, when you transformed Y to logY and ran a regression assuming normal deviates, you were in fact assuming that Y is lognormally distributed. Your interval must be asymmetric, reflecting the shape of the lognormal. The lognormal mean is lambda=exp(mu + 0.5*sigma^2), where mu and sigma^2 are the parameters of the normal variate logY. A confidence interval for lambda is Lower Bound=exp(mean(logY)+0.5*var(logY)+sd(logY)*H_alpha/sqrt(n-1)) Upper Bound=exp(mean(logY)+0.5*var(logY)+sd(logY)*H_(1-alpha)/sqrt(n-1)) where the quantiles H_alpha and H_(1-alpha) are quantiles of the distribution of linear combinations of the normal mean and variance (Land, 1971, Ann. Math. Stat. 42:1187-1205, and Land, 1975, Sel. Tables Math. Stat. 3:385-419). Alternatively, you can model Y=p1*X^p2 directly, with p1=exp(your alpha) and p2=beta, using a lognormal likelihood, and predict the mean of Y with the fitted model (I'm guessing here). It could be useful to check Crow and Shimizu, Lognormal Distributions: Theory and Practice, 1988, Dekker, NY. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
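Land's H quantiles are tabulated rather than built into base R; as a substitute, here is a sketch using Cox's approximation for the lognormal mean (my assumption, not the exact Land interval referenced above), on simulated data:

```r
# Sketch: approximate CI for the lognormal mean via Cox's method,
# a simpler alternative to Land's exact H-quantile interval.
set.seed(42)
y  <- rlnorm(50, meanlog = 1, sdlog = 0.8)    # simulated lognormal data
ly <- log(y); n <- length(ly)
v  <- var(ly)
est <- mean(ly) + v / 2                       # log of the lognormal mean
se  <- sqrt(v / n + v^2 / (2 * (n - 1)))      # Cox approximate standard error
ci  <- exp(est + c(-1, 1) * qnorm(0.975) * se)
```

Note the back-transformed interval is asymmetric around exp(est), as the lognormal shape requires.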
Re: [R] mixture distribution analysis
Antonio Olinto wrote: > Hello, > > I'm analyzing some fish length-frequency data to assess relative age and > growth > information and I would like to do something different from FiSAT / ELEFAN > analysis. > > I found a package named MIX in http://www.math.mcmaster.ca/peter/mix/mix.html > but it's compiled for R 2.0.0 > > So I have two questions: > Is there any problem to run this package in R 2.7.0? - probably yes … > And, is there any other package in R that can be used to analyze and to > separate > mixture distributions? > > Thanks for any help. Best regards, > > Antonio Olinto > Sao Paulo Fisheries Institute > Brazil This problem is not too difficult; maybe you can write your own code. Say you enter the number of fish you measured, n, and a two-column dataframe with the midpoint of each length class in the first column (call it l) and the observed relative frequency of each length class in the second column (call it obs_prob). Then, using the multinomial likelihood for the number of fish in each length class as the random variable (the approach in Peter Macdonald's MIX), minimize log_likelihood = -n*sum(obs_prob*log(pred_prob)) where pred_prob = (p1/(s1*sqrt(2*pi)))*exp(-0.5*((l-m1)/s1)^2) + (p2/(s2*sqrt(2*pi)))*exp(-0.5*((l-m2)/s2)^2) + (p3/(s3*sqrt(2*pi)))*exp(-0.5*((l-m3)/s3)^2) and s1, s2, s3, m1, m2, m3, p1 and p2 (p3 = 1 - p1 - p2) are the parameters. Do you know the number of components (cohorts) in the mixture? In the model above it is three. If you don't know it, fit several models with different numbers of components, and the AIC may tell you which one is the best working model. Don't use the likelihood ratio test, as one of the p parameters is on the boundary of the parameter space if the null is true. Also, you should bound the parameters (m1 < m2 < m3). Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
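The recipe above can be sketched with base R's optim. Everything here (the simulated three-cohort data, class breaks, starting values, and the log/softmax reparameterization used to keep the s's positive and the p's summing to one) is illustrative, not from the original thread:

```r
# Sketch of the multinomial-likelihood mixture fit described above
# (simulated three-cohort length data; all names are made up).
set.seed(1)
x <- c(rnorm(300, 20, 2), rnorm(200, 30, 2.5), rnorm(100, 40, 3))
n <- length(x)
h <- hist(x, breaks = 25, plot = FALSE)       # bin into length classes
l <- h$mids; obs_prob <- h$counts / n
negll <- function(par) {
  m <- par[1:3]; s <- exp(par[4:6])           # log-sd keeps s > 0
  w <- exp(c(par[7:8], 0)); p <- w / sum(w)   # weights sum to 1
  pred <- p[1] * dnorm(l, m[1], s[1]) + p[2] * dnorm(l, m[2], s[2]) +
          p[3] * dnorm(l, m[3], s[3])
  pred <- pred / sum(pred)                    # renormalize over the classes
  -n * sum(obs_prob * log(pred))
}
fit <- optim(c(19, 29, 41, log(2), log(2), log(2), 1, 0.5),
             negll, control = list(maxit = 5000))
sort(fit$par[1:3])                            # estimated cohort means
```

Fitting models with two, three and four components and comparing their AICs (2*fit$value + 2*length(par)) implements the model-selection step suggested above.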
[R] comparing length-weight regressions
[EMAIL PROTECTED] wrote: > I'd like to compare length-weight regressions among years. Any information > would be appreciated. > > a. gray > fisheries consultant Your message is rather cryptic for a general statistical audience, whereas I'm sure in a fisheries group everybody would understand what you want. Use a glm with family=gaussian(link="log") for all data sets together (in original units) with year as a factor, then run the model again ignoring the year factor, and compare the two fits using anova(name of your glm objects) for a likelihood ratio test, or inspect the AICs for a non-frequentist model-selection method. R. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
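A hedged sketch of that comparison on simulated data (the data frame and column names are hypothetical):

```r
# Sketch: compare length-weight relations across years by dropping the
# year factor and testing the difference (simulated data).
set.seed(3)
lw <- data.frame(year   = factor(rep(2006:2008, each = 50)),
                 length = runif(150, 10, 60))
lw$weight <- exp(-4 + 2.9 * log(lw$length)) * exp(rnorm(150, 0, 0.1))
m1 <- glm(weight ~ log(length) * year, family = gaussian(link = "log"),
          data = lw)                    # year-specific curves
m0 <- glm(weight ~ log(length), family = gaussian(link = "log"),
          data = lw)                    # one common curve
anova(m0, m1, test = "F")               # does the year factor improve fit?
AIC(m0, m1)                             # non-frequentist comparison
```

Since the simulated years share one true curve, the year terms should add little here.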
Re: [R] analyzing binomial data with spatially correlated errors
Roger Bivand wrote: > Ben Bolker ufl.edu> writes: > >> Jean-Baptiste Ferdy univ-montp2.fr> writes: >> >>> Dear R users, >>> >>> I want to explain binomial data by a series of fixed effects. My >>> problem is that my binomial data are spatially correlated. Naively, >>> I thought I could find something similar to gls to analyze such >>> data. After some reading, I decided that lmer is probably the tool >>> I need. The model I want to fit would look like >>> > (...) >> You could *almost* use glmmPQL from the MASS package, >> which allows you to fit any lme model structure >> within a GLM 'wrapper', but as far as I know it wraps only lme (which >> requires at least one random effect) and not gls. >> > > The trick used in: > > Dormann, C. F., McPherson, J. M., Araujo, M. B., Bivand, R., > Bolliger, J., Carl, G., Davies, R. G., Hirzel, A., Jetz, W., > Kissling, W. D., Kühn, I., Ohlemüller, R., Peres-Neto, P. R., > Reineking, B., Schröder, B., Schurr, F. M. & Wilson, R. J. (2007): > Methods to account for spatial autocorrelation in the analysis of > species distributional data: a review. Ecography 30: 609–628 > > (see online supplement), is to add a constant term "group", and set > random=~1|group. The specific use with a binomial family there is for > a (0,1) response, rather than a two-column matrix. > >> You could try gee or geoRglm -- neither trivially easy, I think ... > > The same paper includes a GEE adaptation, but for a specific spatial > configuration rather than a general one. > > Roger Bivand > >> Ben Bolker I suggest you also check out the package geoRglm, where you can model binomial and Poisson spatially correlated data. I used it to model spatially correlated binomial data, but without covariates, i.e. without your fixed effects (so my model was a logistic regression with the intercept only; reference below). But I understand that you can add covariates and use them to estimate the fixed (non-random) effects. 
Here is the geoRglm webpage: http://www.daimi.au.dk/~olefc/geoRglm/ This approach tackles the problem from the point of view of geostatistics rather than mixed models, but I believe the likelihood-based geostatistical model is the same as a generalized linear mixed model where the spatial process, indexed by distance, is the random effect. In SAS you can do this with the GLIMMIX macro, from the generalized-linear-mixed-model point of view: it implements a correlation term, so you can specify typical spatial correlation functions such as the Gaussian and the exponential, particular cases of the Matérn family. Rubén Roa-Ureta, R. and E.N. Niklitschek (2007) Biomass estimation from surveys with likelihood-based geostatistics. ICES Journal of Marine Science 64:1723-1734 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reading data into R
BEP wrote: > Hello all, > > I am working with a very large data set into R, and I have no interest in > reviving my SAS skills. To do this, I will need to drop unwanted variables > given the size of the data file. The most common strategy seems to be > subsetting the data after it is read into R. Unfortunately, given the size > of the data set, I can't get the file read and then subsequently do the > subset procedure. I would be appreciative of help on the following: > > 1. What are the possibilities of reading in just a small set of variables > during the read statement? That is, > is it possible to specify just the variables that I want to keep? > > 2. Can I randomly select a set of observations during the 'read' statement? > > > I have searched various R resources for this information, so if I am simply > overlooking a key resource on this issue, pointing that out to me would be > greatly appreciated. > > Thanks in advance. > > Brian Check this for input of specific columns from a large data matrix: mysubsetdata<-do.call("cbind",scan(file='location and name of your file',what=list(NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,0,NULL,0,NULL,NULL),flush=TRUE)) This will input only columns 9 and 11 into 'mysubsetdata'. With this method you can work out the way to select specific columns. HTH Rubén __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
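A self-contained toy version of that scan() trick, using a temporary file with made-up numbers:

```r
# Toy illustration of column selection with scan(): a NULL in the
# 'what' list skips a field, a 0 reads it as numeric. Here only
# columns 2 and 4 of a 4-column file are kept.
tf <- tempfile()
write.table(matrix(1:20, ncol = 4), tf, row.names = FALSE, col.names = FALSE)
cols  <- scan(tf, what = list(NULL, 0, NULL, 0), flush = TRUE)
mysub <- do.call("cbind", cols)   # cbind drops the NULL list elements
```

Only the requested columns are ever held in memory, which is the point for very large files.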