Re: [R] discriminant analysis

2009-08-24 Thread Rubén Roa Ureta

Beatriz Yannicelli wrote:

Dear all:

Is it possible to conduct a discriminant analysis in R with categorical and
continuous variables as predictors?

Beatriz
  

Beatriz,
Simply doing this in the R console:
RSiteSearch("discriminant")
yields many promising links. In particular, check documentation of 
package mda.
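For a quick start, here is a sketch with package mda (the data and variable
names below are made up; the formula interface dummy-codes the factor
predictor automatically):

# install.packages("mda")   # if not yet installed
library(mda)
set.seed(1)
d <- data.frame(
  group = factor(rep(c("A", "B"), each = 50)),
  x1    = c(rnorm(50, 0), rnorm(50, 2)),                         # continuous predictor
  x2    = factor(sample(c("low", "high"), 100, replace = TRUE))  # categorical predictor
)
fit <- fda(group ~ x1 + x2, data = d)  # flexible discriminant analysis
fit
predict(fit, newdata = d)[1:5]         # predicted classes for the first rows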

HTH
Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] MLE for bimodal distribution

2009-04-08 Thread Rubén Roa-Ureta

_nico_ wrote:

Hello everyone,

I'm trying to use mle from package stats4 to fit a bi/multi-modal
distribution to some data, but I have some problems with it.
Here's what I'm doing (for a bimodal distribution):

# Build some fake binormally distributed data; the procedure fails also with
# real data, so the problem isn't here
data = c(rnorm(1000, 3, 0.5), rnorm(500, 5, 0.3))
# Just to check it's bimodal
plot(density(data))
f = function(m, s, m2, s2, w)
{
-log( sum(w*dnorm(data, mean=m, sd=s)) + sum((1-w)*dnorm(data, mean=m2,
sd=s2)) )
}

res = mle(f, start=list("m"=3, "s"=0.5, "m2"=5, "s2"=0.35, "w"=0.6))

This gives an error:
Error in optim(start, f, method = method, hessian = TRUE, ...) : 
  non-finite finite-difference value [2]

In addition: There were 50 or more warnings (use warnings() to see the first
50)
And the warnings are about dnorm producing NaNs

So, my questions are:
1) What does "non-finite finite-difference value" mean? My guess would be
that an Inf was passed somewhere where a finite number was expected...? I'm
not sure how optim works though...
2) Is the log likelihood function I wrote correct? Can I use the same type
of function for 3 modes?
  

I think it is correct but it is difficult to fit as others have pointed out.
As an alternative, you may discretise your variable so the mixture is 
fit to counts on a histogram using a multinomial likelihood.
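If it helps, here is a sketch of that alternative with simulated data (the
object names and starting values are illustrative only): bin the data, then
fit the two-component mixture to the bin counts with a multinomial negative
log-likelihood.

set.seed(1)
dat <- c(rnorm(1000, 3, 0.5), rnorm(500, 5, 0.3))
h   <- hist(dat, breaks = 40, plot = FALSE)
mid <- h$mids    # bin mid-points
obs <- h$counts  # observed counts per bin
nll <- function(p) {
  # p = c(m1, s1, m2, s2, w); bin probabilities from the mixture density at the mid-points
  pr <- p[5]*dnorm(mid, p[1], p[2]) + (1 - p[5])*dnorm(mid, p[3], p[4])
  pr <- pr/sum(pr)       # normalise so the bin probabilities sum to 1
  -sum(obs*log(pr))      # multinomial negative log-likelihood (up to a constant)
}
fit <- optim(c(3, 0.5, 5, 0.3, 0.6), nll, hessian = TRUE) # bounds could be added with method="L-BFGS-B"
fit$par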

HTH
Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R in the NY Times

2009-01-07 Thread Rubén Roa-Ureta

Zaslavsky, Alan M. wrote:

This article is accompanied by nice pictures of Robert and Ross.

Data Analysts Captivated by Power of R
  http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html
  

Thanks for the heads up. R morale is going through the roof!
I've given three courses on R since the second half of 2007 here in 
Chile (geostatistics, the Fisheries Library in R, and generalized linear 
models), and all three audiences (professionals working in academia, 
government, and private research institutions) were very much impressed 
by the power of R. I spent as much time on R itself as on the 
statistical topics, since students wanted to learn data management and 
graphics once they started to grasp the basic elements.
R creators, Core Team, package creators and maintainers, and experts on 
the list, thank you so much for such great work and such an open 
attitude. You lead by example.

Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] AD Model Builder now freely available

2008-11-25 Thread Rubén Roa-Ureta

dave fournier wrote:

Hi All,

Following Mike Prager's posting on this list,
I'm happy to pass on that AD Model Builder is now freely available from
the ADMB Foundation.

 http://admb-foundation.org/

Two areas where AD Model Builder would be especially useful to R users
are multi-parameter smooth optimization, as in large-scale maximum
likelihood estimation and the analysis of general nonlinear random
effects models.

Cheers,

 Dave
  

Thank you Dave, this is really great news!
So now I can download ADMB and ADMB-RE for free?
And use Mike's X2R package to connect my ADMB scripts with the power of R.
Only one word: awesome!
Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fitting data to a sigmoidal curve

2008-11-12 Thread Rubén Roa-Ureta

Greg Snow wrote:

Sarah,

Doing:

  

RSiteSearch('gompertz', restrict='functions')



At the command prompt gives several promising results.

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.

  

And you can also do:

nobs <- length(data$salam.size.observed)
fn <- function(p){
  # p[1] = salam.size.inf (asymptotic size), p[2] = G1, p[3] = G2
  salam.size.model <- p[1]*exp(-p[2]*exp(-p[3]*data$age.observed)) # Gompertz model
  squdiff  <- (salam.size.model - data$salam.size.observed)^2      # vector of squared differences
  sigmasqu <- sum(squdiff)/nobs                                    # nuisance sigma parameter profiled out
  (nobs/2)*log(sigmasqu)                                           # profile likelihood as approx. to true likelihood
}
(salam.size.likfit <- nlm(fn, p = c(init.salam.size.inf, init.G1, init.G2), hessian = TRUE))
# where init.salam.size.inf, init.G1 and init.G2 are your initial values for salam.size.inf, G1 and G2


where data is a dataframe with observed sizes and ages.
Invert Hessian to obtain measures of precision.
Note also that if you know the size at birth then you can 
re-parameterize, put size at birth instead of asymptotic size 
(salam.size.inf), and reduce model complexity to two parameters (but of 
course the Gompertz formula changes).
If the Gompertz model is not flexible enough, note that it is a 
particular case of another more flexible, and more complex model, 
Schnute's, which I have found often provides a better explanation of 
growth data (as measured by the AIC), especially if you have sizes of 
very young individuals.


HTH
Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fitting a modified logistic with glm?

2008-11-09 Thread Rubén Roa-Ureta

Mike Lawrence wrote:

On Sat, Nov 8, 2008 at 3:59 PM, Rubén Roa-Ureta <[EMAIL PROTECTED]> wrote:

  

...
The fit is for grouped data.
...




As illustrated in my example code, I'm not dealing with data that can be
grouped (x is a continuous random variable).
  

Four points:
1) I've shown you an approach that you can follow in order to fit a 
modified (custom-made) logistic model; it's up to you to follow this 
approach and adapt it accordingly. It was never my intention to give 
you the precise code for your problem (obviously),
2) If you do follow this approach and you keep your predictor 
continuous, you have to change the line of the likelihood definition,
3) If it is really true that your predictor is a *random* variable, then 
you do not have a simple glm (you may have a glmm), and
4) You can always make your continuous predictor categorical, for 
example, as when you make a histogram of said predictor (a small sketch 
of this follows below).
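As a sketch of point 4 (simulated data; the names are illustrative, not from
the original post):

x   <- runif(200, -200, 200)                   # continuous predictor
y   <- rbinom(200, 1, plogis(x, 20, 30)*0.5 + 0.5)
bin <- cut(x, breaks = 10)                     # make the predictor categorical
tab <- aggregate(data.frame(success = y, failure = 1 - y), by = list(bin = bin), FUN = sum)
tab                                            # grouped counts per bin, ready for a grouped-data likelihood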

R.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fitting a modified logistic with glm?

2008-11-08 Thread Rubén Roa-Ureta

Mike Lawrence wrote:

Hi all,

Where f(x) is a logistic function, I have data that follow:
g(x) = f(x)*.5 + .5

How would you suggest I modify the standard glm(..., family='binomial')
function to fit this? Here's an example of a clearly ill-advised attempt to
simply use the standard glm(..., family='binomial') approach:


# First generate some data

#define the scale and location of the modified logistic to be fit
location = 20
scale = 30

# choose some x values
x = runif(200,-200,200)

# generate some random noise to add to x in order to
# simulate real-world measurement and avoid perfect fits
x.noise = runif(length(x),-10,10)

# define the probability of success for each x given the modified logistic
prob.success = plogis(x+x.noise,location,scale)*.5 + .5

# obtain y, the observed success/failure at each x
y = rep(NA,length(x))
for(i in 1:length(x)){
y[i] = sample(
x = c(1,0)
, size = 1
, prob = c(prob.success[i], 1-prob.success[i])
)
}

#show the data and the source modified logistic
plot(x,y)
curve(plogis(x,location,scale)*.5 + .5,add=T)


# Now try to fit the data


#fit using standard glm and plot the result
fit = glm(y~x,family='binomial')
curve(plogis(fit$coefficients[1]+x*fit$coefficients[2])*.5+.5,add=T,col='red',lty=2)

# It's clear that it's inappropriate to use the standard "glm(y~x,family='binomial')"
# method to fit the modified logistic data, but what is the alternative?
  

A custom-made fit function using nlm?
Below, in the second line the logistic model has been re-parameterized 
to include p[1]=level of the predictor which predicts a 50% probability 
of success, and p[2]=level of the predictor which predicts a 95% 
probability of success. You can change the model in this line to your 
version. In the 4th line (negative log-likelihood) mat is a vector of 0s 
and 1s representing success and imat is a vector of 1s and 0s 
representing failure. The fit is for grouped data.

fn <- function(p){
 prop.est  <- 1/(1+exp(log(1/19)*(l-p[1])/(p[2]-p[1])))  # estimated proportion of success
 iprop.est <- 1-prop.est                                 # estimated proportion of failure
 negloglik <- -sum(count*(mat*log(prop.est)+imat*log(iprop.est))) # binomial likelihood, prob. model
 negloglik
}
prop.lik <- nlm(fn,p=c(25.8,33.9),hessian=TRUE)
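For clarity, this is the kind of grouped-data layout fn() assumes (all values
here are made up; each predictor level appears twice, once counting successes
and once counting failures):

l     <- c(20, 20, 30, 30, 40, 40)  # predictor level of each group
mat   <- c( 1,  0,  1,  0,  1,  0)  # 1 = this row counts successes
imat  <- 1 - mat                    # 1 = this row counts failures
count <- c( 3, 17,  9, 11, 18,  2)  # number of observations in each row

With these objects in the workspace, the nlm() call above can be run.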

HTH
Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] descretizing xy data

2008-11-04 Thread Rubén Roa-Ureta

Rubén Roa-Ureta wrote:

Jon A wrote:

Hello,
I have a dataset with a continuous independent variable (fish length, 
range:

30-150 mm) and a binary response (foraging success, 0 or 1). I want to
discretize fish length into 5 mm bins and give the proportion of 
individuals
who successfully foraged in each size bin. I have used the cut 
function
to discretize the length values into my desired bins, but I can't 
figure out
how to manipulate my response data in terms of the levels I've 
created. Any

advice on how to achieve my task?

Thanks in advance.
  

You have the option of using catspec.
Here is another, more transparent solution, using hist().
lb <- 30
ub <- 150
bk <- 5
x <- data.frame(cbind(runif(1000,lb,ub),rbinom(1000,1,0.75)))
x$X3 <- cut(x$X1,breaks=(ub-lb)/bk,labels=FALSE)
y <- data.frame(cbind(hist(x$X1,breaks=(ub-lb)/bk,plot=FALSE)$breaks[-1],
                      hist(x$X1,breaks=(ub-lb)/bk,plot=FALSE)$counts, 0))


for (i in 1:length(y$X1)) {
 for (j in 1:length(x$X1)) {
if(identical(x$X3[j],i)) y$X3[i] <- y$X3[i]+x$X2[j]
 }
}
sum(x$X2) #check that counting was correct
sum(y$X3)
y$X4 <- ifelse(y$X3 > 0,y$X3/y$X2,NA)
plot(y$X1,y$X4,pch=19)
abline(v=hist(x$X1,breaks=(ub-lb)/bk,plot=FALSE)$breaks)

BTW, if you add the line below,
text(x=y$X1,y=y$X4+.01,format(y$X2),cex=.5)
you show the sample size at each proportion.
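A shorter route to the same proportions, as a sketch using only base R and
simulated data (cut() to bin, tapply() to get the proportion per bin):

len     <- runif(1000, 30, 150)               # fish length in mm (simulated)
success <- rbinom(1000, 1, 0.75)              # foraging success 0/1 (simulated)
bin     <- cut(len, breaks = seq(30, 150, by = 5))
prop    <- tapply(success, bin, mean)         # proportion successful per 5 mm bin
n.bin   <- tapply(success, bin, length)       # sample size per bin
plot(seq(32.5, 147.5, by = 5), prop, pch = 19, xlab = "Length (mm)", ylab = "Proportion")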
R.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] descretizing xy data

2008-11-04 Thread Rubén Roa-Ureta

Jon A wrote:

Hello,
I have a dataset with a continuous independent variable (fish length, range:
30-150 mm) and a binary response (foraging success, 0 or 1). I want to
discretize fish length into 5 mm bins and give the proportion of individuals
who successfully foraged in each size bin. I have used the cut function
to discretize the length values into my desired bins, but I can't figure out
how to manipulate my response data in terms of the levels I've created. Any
advice on how to achieve my task?

Thanks in advance.
  

You have the option of using catspec.
Here is another, more transparent solution, using hist().
lb <- 30
ub <- 150
bk <- 5
x <- data.frame(cbind(runif(1000,lb,ub),rbinom(1000,1,0.75)))
x$X3 <- cut(x$X1,breaks=(ub-lb)/bk,labels=FALSE)
y <- data.frame(cbind(hist(x$X1,breaks=(ub-lb)/bk,plot=FALSE)$breaks[-1],
                      hist(x$X1,breaks=(ub-lb)/bk,plot=FALSE)$counts, 0))

for (i in 1:length(y$X1)) {
 for (j in 1:length(x$X1)) {
if(identical(x$X3[j],i)) y$X3[i] <- y$X3[i]+x$X2[j]
 }
}
sum(x$X2) #check that counting was correct
sum(y$X3)
y$X4 <- ifelse(y$X3 > 0,y$X3/y$X2,NA)
plot(y$X1,y$X4,pch=19)
abline(v=hist(x$X1,breaks=(ub-lb)/bk,plot=FALSE)$breaks)

HTH
Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Distributions Comparison

2008-10-28 Thread Rubén Roa-Ureta

Igor Telezhinsky wrote:

Dear all,

I have recently found out about the R project. Could anyone tell me if it is
possible to make the comparison of two distributions using R packages? The
problem is TRICKY because I have the distributions of measurements and each
measurement in the given distribution has its own error, which I need to
take into account when comparing these distributions. If anyone knows it is
possible with R, could you please tell me what package to use. I will really
appreciate some hints in code as well.

 


Thank you.
  

Dear Igor,
Welcome to the list and to R. Please read the posting guide and study 
your problem before attempting to seek help from us. Your question as it 
stands is impossibly ambiguous. Try to make a toy example or formulate 
it better and you may have better luck.

Ruben

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot the chi square distribution and the histogram in the same graph

2008-10-21 Thread Rubén Roa-Ureta

leo_wa wrote:

In the previous post ,i ask how to plot the  normal curve and the histogram
in the same graph.if i want to know how to plot the chi square distribution
to campare the data set ,how can i do that?
  
You should make up your mind: is your random variable X, defined on 
(-inf,+inf), or Sum(X^2), which obviously can't take negative values?
They are quite different.
Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to plot the histogram and the curve in the same graph

2008-10-21 Thread Rubén Roa-Ureta

leo_wa wrote:

i want to plot the histogram and the curve in the same graph.if i have a set
of data ,i plot the histogram and also want to see what distribution it
was.So i want to plot the curve to know what distribution it like.
  
  
To draw the curve of a distribution over your histogram you should have an 
idea about the distribution.
You can't just draw the histogram and expect R to find and draw the curve of 
the best-fitting distribution for you.

But you can plot a curve of a kernel density.

x <- rnorm(1000,5,3)
library(MASS)
(x.normfit <- fitdistr(x,"normal"))
hist(x,prob=TRUE)
lines(density(x,na.rm=TRUE),col="red") # kernel density
curve(dnorm(x,mean=x.normfit$estimate[1],sd=x.normfit$estimate[2]),col="blue",add=TRUE) # maximum likelihood estimate


Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] distributions and glm

2008-10-21 Thread Rubén Roa-Ureta

drbn wrote:

Hello,
I have seen that some papers do this:

1.) Group data by year (e.g. 35 years) 


2.) Estimate the mean of the key variable through the distribution that fits
better (some years is a normal distribution , others is a more skewed, gamma
distribution, etc.)

3.) With these estimated means of each year do a GLM.

I'd like to know if it is possible (to use these means in a GLM) or is a
wrong idea.

Thanks in advance

David
  

David,
You can model functions of data, such as means, but you must be careful 
to carry over most of the uncertainty in the original data into the 
model. If you don't, for example if you let the model know only the 
values of the means, then you are actually assuming that these means 
were observed with absolute certainty instead of being estimated from 
the data. To carry over the uncertainty in the original data to your 
modeling you can use a Bayesian approach or you can use a marginal 
likelihood approach. A marginal likelihood is a true likelihood function 
not of the data, but of functions of the data, such as of maximum 
likelihood estimates. If your means per year were estimated using 
maximum likelihood (for example with fitdistr in package MASS) and your 
sample size is not too small, then you can use a normal marginal 
likelihood model for the means. Note however that each mean may come 
from a different distribution so the full likelihood model for your data 
would be a mixture of normal distributions. You may not be able to use 
the pre-built glm function, so you may have to write your own code.
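A hedged sketch of the simplest version of that idea (object names are made
up; it treats each yearly estimate as normal with its estimated standard
error, which is only an approximation to the full marginal likelihood):

library(MASS)
# dat assumed: a data frame with the raw observations y and a year column
years <- split(dat$y, dat$year)
ests  <- t(sapply(years, function(v) {
  f <- fitdistr(v, "normal")
  c(mean = unname(f$estimate["mean"]), se = unname(f$sd["mean"]))
}))
yr  <- as.numeric(rownames(ests))
# weighted regression of the yearly means on year, weights = 1/variance of each estimate
fit <- lm(ests[, "mean"] ~ yr, weights = 1/ests[, "se"]^2)
summary(fit)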

HTH
Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Question about glm using R

2008-10-21 Thread Rubén Roa-Ureta

Jordi Garcia wrote:

Good morning,

I am using R to try to model the proportion of burned area in 
Portugal. The dependent variable is the proportion. The family used is 
binomial and the epsilon would be binary.


I am not able to find the package to be used when the proportion (%) 
has to be used in glm. Could someone help me?


I am using normal commands of glm.. for example:

glm_5<- glm(formula=p~Precipitation, family=binomial(link=logit), 
data=dados)


where p is the proportion of burned area, but this error message 
apperars:


Warning message:
non-integer #successes in a binomial glm! in: eval(expr, envir, enclos)

That is why I think I am not using the proper glm package.

Thank you very much in advance.

Jordi

Jordi,
Your statistical model is wrong. The binomial family is for count data 
(counts of successes out of n trials), not for proportions. To model 
proportions, your family is the Beta family. I've modeled proportion 
response variables with function betareg of package betareg. If you want 
my example applications I can send you code and data off list.
Reference: Ferrari and Cribari-Neto. 2004. Beta regression for modelling 
rates and proportions. Journal of Applied Statistics 31:799-815.
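A small illustration (a sketch only, with simulated data; betareg needs the
response strictly between 0 and 1):

# install.packages("betareg")
library(betareg)
dados <- data.frame(Precipitation = runif(100, 200, 1200))
dados$p <- plogis(2 - 0.004*dados$Precipitation + rnorm(100, 0, 0.3)) # fake proportions in (0,1)
glm_beta <- betareg(p ~ Precipitation, data = dados)
summary(glm_beta)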

Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] plot - central limit theorem

2008-10-15 Thread Rubén Roa-Ureta

Jörg Groß wrote:

Hi,


Is there a way to simulate a population with R and pull out m samples, 
each with n values

for calculating m means?

I need that kind of data to plot a graphic, demonstrating the central 
limit theorem

and I don't know how to begin.

So, perhaps someone can give me some tips and hints how to start and 
which functions to use.




thanks for any help,
joerg


x <- runif(10000,0,1) # 10000 replicates of each of the six i.i.d. draws (any large number will do)
y <- runif(10000,0,1)
z <- runif(10000,0,1)
u <- runif(10000,0,1)
v <- runif(10000,0,1)
w <- runif(10000,0,1)
t <- x+y+z+u+v+w      # each element of t is the sum of six i.i.d. uniforms
hist(t,breaks=100,xlab="sum of i.i.d. r.v. with finite mean and variance",ylab="Frequency",main="")


Change runif for the corresponding function of the distribution of your 
choice.

Not exactly what you wanted but it does verify the C.L.T.
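Closer to what you describe (a sketch: m samples of size n from a uniform
population, then a histogram of the m sample means):

m <- 1000; n <- 30
sample.means <- replicate(m, mean(runif(n, 0, 1))) # m sample means, each from n draws
hist(sample.means, breaks = 50, xlab = "sample mean", ylab = "Frequency", main = "")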
Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R graph with values incorporated

2008-08-28 Thread Rubén Roa-Ureta

Prasanth wrote:

Dear All:

 


Greetings!

By the way, is it possible to have a graph (say line graph) that shows
"values" as well (say y-axis values within the graph)? One could do it in
excel. I am just wondering whether it is possible with R!
  

x <- rnorm(100,2,3)
y <- rnorm(100,2,3)
plot(x,y,pch=19)
text(x=x,y=y+.5,format(x,digits=1),cex=.5)


[[alternative HTML version deleted]]  <-- Read the posting guide.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] draw a function over a histogram

2008-08-22 Thread Rubén Roa-Ureta

Daniela Garavaglia wrote:

Sorry, I have some troubles with the graph device.

How can I draw a function over a histogram?

Thank's so much.
  

Daniela,
What function?
Here is one example using density() and using dnorm()
x <- rnorm(1000,2,2)
hist(x,prob=TRUE)
lines(density(x,na.rm=TRUE),col="red")
library(MASS)
y <- fitdistr(x,"normal")
curve(dnorm(x,mean=y$estimate[1],sd=y$estimate[2]),add=TRUE)

HTH
R.


[[alternative HTML version deleted]] <--- What about this??




__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Coordinate systems for geostatistics in R

2008-08-22 Thread Rubén Roa-Ureta

imicola wrote:

Hi,

I read somewhere that when carrying out geostatistical analysis in R you
should not use latitude and longitude...can anyone expand on this a little
for me, and what would be the best coordinate system to use?
  
Not only in R. In most systems, the inter-point distances are assumed to 
be planar (distances over a Euclidean space), whereas latitude and 
longitude are spherical. I guess there could be a geostatistical 
analysis based on spherical distances, but why make things more 
complicated when projecting the spherical coordinates onto planar 
coordinates before the analysis gives a good approximation and simplifies 
the analysis significantly? I use UTM coordinates, and transform from 
geodetic to metric with Eino Uikkanen's GeoConv program (it's free).
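The projection can also be done inside R; here is a hedged sketch with
packages sp and rgdal (neither is mentioned above, and the coordinates and
UTM zone below are only placeholders):

library(sp)
library(rgdal)
pts <- data.frame(lon = c(-73.2, -73.1), lat = c(-41.5, -41.4)) # illustrative WGS84 positions
coordinates(pts) <- ~ lon + lat
proj4string(pts) <- CRS("+proj=longlat +datum=WGS84")
pts.utm <- spTransform(pts, CRS("+proj=utm +zone=18 +south +datum=WGS84 +units=km"))
coordinates(pts.utm) # easting/northing in km, convenient for geoR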



I have my data in a geographic coordinate system, WGS84, decimal
degreesis this the wrong format for such analyses?
  
If the distances are short it is not badly wrong, but the error 
increases with increasing distance.

I have also converted my data in the UTM projection and so have it in
metres(ranging from 480,000 to 550,000 E and 170,000 to 230,000 N).  


If I was to use the UTM coordinates, should I be using the actual
coordinates in metres, or should I convert this into an arbitrary coordinate
system (i.e. from 0 - 1) somehow?
It would be convenient to use km rather than m, so the range parameter 
would be closer in magnitude to the other parameters of the model. A 
very large range parameter in metres may cause numerical instability 
during minimization of the negative support function in likelihood-based 
models such as that implemented in geoR.
  


I have noticed that running an analysis on the data gives different results
depending on which type of system you use, so I want to make sure I have the
correct system.  I should also probably note that I am a geostatistical
novice!

Thanks,
  
Bottom line: your geostatistical software is probably based on distance 
calculations on a Euclidean space, so it is wrong to input locations in 
spherical coordinates.

HTH
Ruben

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Geodata object border

2008-08-12 Thread Rubén Roa-Ureta

imicola wrote:

Sorry, this is probably quite an easy question, but I'm new to R and couldn't
find the answer anywhere.

I'm using geoR and geoRglm, but can't figure out how to get a border in my
geodata object.  Does this need to be defined when I'm importing my data, or
afterwards, and how do I go about doing this?

Thanks
  
You can define the border previously and import it to R with read.table 
or read.csv or you can define it from your geodata object in R.
In the first case don't forget to close the polygon by repeating the 
first vertex at the end of the file.
In the second case you can use a convex hull function in R, such as 
chull(), to define the polygon with your geodata object. Say your 
geodata object is called my.geo, then my.geo$coords would be the 
coordinates. Then try this,

bor <- chull(my.geo$coords) # indices of the convex hull vertices
bor <- c(bor, bor[1]) # close the polygon by appending the first vertex
plot(my.geo$coords)
lines(my.geo$coords[bor,])
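If I remember correctly, several geoR functions will also pick up a border
stored inside the geodata object itself, so (hedged, and worth checking
against the geoR documentation) something like
my.geo$borders <- my.geo$coords[bor,]
may be enough for plot.geodata and the prediction functions to use it.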
HTH
Ruben

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] geoR : Passing arguments to "optim" when using "likfit"]

2008-06-26 Thread Rubén Roa-Ureta

Mzabalazo Ngwenya wrote:

Hi everyone !
 
I'm am trying to fit a kriging model to a set of data. When I just run 
the "likfit" command I can obtain the results. However when I try to 
pass additional arguements to the optimization function "optim" I get 
errors. That is I want to obtain the hessian matrix so matrix 
(hessian=TRUE).
 
Heres a little example( 1-D). Can anyone shed some light?  Where am I 
going wrong ?
 
---
 
 
 
obs.points 
<-matrix(c(0.1,0,0.367,0,0.634,0,0.901,0),byrow=TRUE,nrow=4)

MM1.fx <-MM1(obs.points[,1])
MM1.fx


[1] 0.111 0.5789475 1.7272732 9.100
# Creating geoR object
 

MM1.data <-as.geodata(cbind(obs.points,MM1.fx))
MM1.data


$coords
[1,] 0.100 0
[2,] 0.367 0
[3,] 0.634 0
[4,] 0.901 0
$data
[1] 0.111 0.5789475 1.7272732 9.100
attr(,"class")
[1] "geodata"
# Starting values for MLE
 

sta.vals =expand.grid(seq(1,20),seq(1,20))


# Maximum likelihood estimation of covariance parameters

 HERE IS THE PROBLEM 
 
MM1.fit <-likfit(MM1.data,ini.cov.pars=sta.vals,fix.nugget=TRUE,cov.model="gaussian",optim(hessian=TRUE))





MM1.fit <- likfit(MM1.data, ini.cov.pars=sta.vals, fix.nugget=TRUE, cov.model="gaussian", hessian=TRUE)

MM1.fit$info.minimisation.function$hessian
For the observed Fisher information:
solve(MM1.fit$info.minimisation.function$hessian)

HTH
Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Solving for parameter values at likelihood points

2008-05-06 Thread Rubén Roa-Ureta

ComRades,
On Oct 16, 2006, I posted a question regarding how to find the parameter 
values that made the likelihood function take a value equal to 1/8th of 
the maximum likelihood value. There was no reply but I found a solution 
and would like to know if there is a better solution using the function 
uniroot(). I think with uniroot() I can get better parameter values at 
these bounds.

This is my solution, using the function nearest() from package GenKern.
The example concerns the marginal likelihood for the normal variance 
parameter when the mean is a nuisance parameter (see section 7.3 in 
Royall, 1997, Statistical evidence. A likelihood paradigm, Chapman & Hall).

Rubén

y<-c(-2.250357,10.723614,7.238959,9.179939,5.831603,9.301580,4.019388,9.777280,5.587761,3.600276,9.753381,5.154928,2.776350,9.940350,14.185712,4.281198,10.994180,9.333027,13.082535,2.067826,10.772551,-3.811529,5.446021,-5.333287,1.421544)
n <- length(y)
s.sq <- (1/(n-1))*sum((y-mean(y))^2)
sigma.sq <- seq(1,100,0.1)
Lik.M <- sigma.sq^(-(n-1)/2)*exp(-(n-1)*s.sq/(2*sigma.sq))
Rel.Lik.M <- Lik.M/max(Lik.M)
plot(sigma.sq,Rel.Lik.M,type="l")
abline(h=1/8)
arrows(y0=0.4,y1=0.18,x0=0,x1=12)
arrows(y0=0.4,y1=0.18,x0=62,x1=50)
library(GenKern)
cbind(sigma.sq,Rel.Lik.M)[Rel.Lik.M==1,] # maximum likelihood estimate
max.idx <- nearest(Rel.Lik.M,1,outside=TRUE) # vector index value at the maximum likelihood
# finding the parameter value at 1/8th the max. likelihood to the left of the maximum
cbind(sigma.sq[1:max.idx],Rel.Lik.M[1:max.idx])[nearest(Rel.Lik.M[1:max.idx],0.125),]
# finding the parameter value at 1/8th the max. likelihood to the right of the maximum
cbind(sigma.sq[max.idx:length(sigma.sq)],Rel.Lik.M[max.idx:length(sigma.sq)])[nearest(Rel.Lik.M[max.idx:length(sigma.sq)],0.125),]
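And here is the uniroot() alternative I was after, as a sketch using the same
y, s.sq and n as above: the bounds are the roots of the relative likelihood
minus 1/8, bracketed on each side of the maximum at sigma.sq = s.sq (the
upper bracket 100 matches the grid used above).

rel.lik.diff <- function(sigma.sq) {
  lik     <- sigma.sq^(-(n-1)/2)*exp(-(n-1)*s.sq/(2*sigma.sq))
  max.lik <- s.sq^(-(n-1)/2)*exp(-(n-1)/2)  # likelihood at its maximum, sigma.sq = s.sq
  lik/max.lik - 1/8
}
lower.bound <- uniroot(rel.lik.diff, interval = c(1e-6, s.sq))$root # left of the maximum
upper.bound <- uniroot(rel.lik.diff, interval = c(s.sq, 100))$root  # right of the maximum
c(lower.bound, upper.bound)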

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] deleting variables

2008-04-29 Thread Rubén Roa-Ureta

Richard Pearson wrote:

?rm

Richard




Ralf Goertz wrote:

How can I automatically exclude one variable from being saved in the
workspace when I quit an R session? The problem is I don't know how to
erase a variable once it has been created.


[...]


Ralf


More on the use of rm. If you want to delete many variables in one go,

x<-runif(10)
y<-runif(10)
z<-runif(10)
ls() #you check all the objects in your workspace
#[1] "x" "y" "z"
rm(list=your.list <- ls()[2:3]) #you selected to delete all those objects whose indices are between 2 and 3
rm("your.list") #remove the temporary list with variable names
ls()
#[1] "x"

HTH
Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R Newbie Question/Data Model

2008-04-25 Thread Rubén Roa-Ureta
[EMAIL PROTECTED] wrote:
> Given a data set and a set of predictors and a response in the data,
> we would like to find a model that fits the data set best.
> Suppose that we do not know what kind of model (linear, polynomial
> regression,... ) might be good, we are wondering if there is R-package(s)
> can auctomatically do this.
> Otherwise, can you direct me, or point out reference(s),
> basic steps to do this. Thanks.
> 
> -james

The best-fitting model for any data set is a model with many parameters; 
taken to the extreme, the best-fitting model would have an infinite 
number of parameters. However, any model with more parameters than data 
points has a negative number of degrees of freedom, and you do not want 
that. The best-fitting model subject to the constraint that the number 
of degrees of freedom is non-negative is the data itself, with zero 
degrees of freedom.
The AIC tells you this too. The AIC for the model formed by the data 
itself is 2n, whereas the AIC for any model with negative degrees of 
freedom is > 2n.
But I guess you want to make inference from sample to population. If 
that is indeed the case, then you should consider changing your focus 
from finding "a model that fits the data set best" to a model that best 
summarizes the information contained in your sample about the population 
the sample comes from. To do that, start by defining the nature of your 
response variable. What is the nature of the natural process generating 
this response variable? Is it continuous or discrete? Is it univariate 
or multivariate? Can it take negative and positive values? Can it take 
values of zero? After you have clarified the probabilistic model for the 
response variable, then you can start thinking about the mathematical 
relation between the response variable and the predictors. Is it linear 
or nonlinear? Are the predictors categorical or continuous?
Read the posting guide, formulate a clear question, and maybe you will 
be given more specific help.
Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Modeling presence only data in R

2008-04-22 Thread Rubén Roa-Ureta
milton ruser wrote:
> Dear All,
> 
> I have a set of environental maps and presence-only points for some species.
> How can I generate distributions models on R using these presence-only data?
> What packages and functions can I use to do so?
> 
> Kind regards,
> 
> Miltinho
> Brazil

If you have location data for each absence/presence (such as linear 
location data, easting and northing) consider using geoRglm.
Christensen, O. F., and Ribeiro, P. J. 2002. geoRglm: a package for 
generalized linear spatial models. R-NEWS, 2: 26–28.
HTH
Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] finding an unknown distribution

2008-04-21 Thread Rubén Roa-Ureta
andrea previtali wrote:
> Hi,
> I need to analyze the influences of several factors on a variable that is a 
> measure of fecundity, consisting of 73 observations ranging from 0 to 5. The
> variable is continuous and highly positive skewed, none of the typical
> transformations was able to normalize the data. Thus, I was thinking in 
> analyzing these data using a generalized linear model where I
> can specify a distribution other than normal. I'm thinking it may fit a
> gamma or exponential distribution. But I'm not sure if the data meets
> the assumptions of those distributions because their definitions are
> too complex for my understanding!

Roughly, the exponential distribution is the model of a random variable 
describing the time/distance between two independent events that occur 
at the same constant rate. The gamma distribution is the model of a 
random variable that can be thought of as the sum of exponential random 
variables. I don't think fecundity data, the count of reproductive 
cells, qualifies as a random variable to be modeled by either of these 
distributions. If the count of reproductive cells is very large, and you 
are modeling this count as a function of animal size, such as length, 
you should consider the lognormal distribution, since the count of cells 
grows multiplicatively (volumetrically) with the increase in length. In 
that case you can model your response variable using glm with 
family=gaussian(link="log").
Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] memory issues

2008-04-16 Thread Rubén Roa-Ureta
I think any geostatistical program or R package would have trouble handling 
12000 observations on a PC. The empirical variogram would be built from 
the combinations of 12000 observations taken 2 at a time, nearly 72 million 
pairs, and during kriging, if you don't restrict the search neighbourhood, 
interpolation involves solving something very big, more so if you define a 
fine prediction grid. Try restricting the search neighbourhood, if you 
haven't already, with maxdist and nmax.
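A hedged sketch of that restriction in gstat (hypothetical object names, not
the objects in the message quoted below):

library(gstat)
library(sp)
# my.data: SpatialPointsDataFrame with variable z; my.grid: prediction grid; my.vgm: fitted variogram model
kr <- krige(z ~ 1, my.data, newdata = my.grid, model = my.vgm,
            nmax = 40,      # use at most the 40 nearest observations
            maxdist = 2000) # ignore observations farther away than 2000 m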

Rubén


Prof Brian Ripley wrote:
> I think the clue is that the message you quote comes from gstat, which 
> does not use R's memory allocator.  It is gstat and not R that has failed 
> to allocate memory.
> 
> Try re-reading the help page for memory.size.  'max=T' does not indicate 
> the limit (that is the job of memory.limit()), but the maximum 'used' 
> (acquired from the OS) in that session.  So 19Mb looks reasonable for R's 
> usage.
> 
> I don't understand the message from memory.limit() (and the formatting is 
> odd).  memory.limit() does not call max() (it is entirely internal), so I 
> wonder if that really is the output from that command.  (If you can 
> reproduce it, please let us have precise reproduction instructions.)
> 
> There isn't much point in increasing the memory limit from the default 
> 1.5Gb on a 2Gb XP machine.  The problem is that the user address space 
> limit is 2Gb and fragmentation means that you are unlikely to be able to 
> get over 1.5Gb unless you have very many small objects, in which case R 
> will run very slowly.  In any case, that is not the issue here.
> 
> On Wed, 16 Apr 2008, Dave Depew wrote:
> 
>> Hi all,
>> I've read the R for windows FAQ and am a little confused re:
>> memory.limit and memory.size
>>
>> to start using R 2.6.2 on WinXP, 2GB RAM, I have the command line "sdi
>> --max-mem-size=2047M"
>> Once the Rgui is open, memory.limit() returns 2047, memory.size()
>> returns 11.315, and memory.size(max=T) returns 19.615
>>
>> Shouldn't memory.size(max=T) return 2047?
>>
>> Upon running several operations involving kriging (gstat package,
>> original data file 3 variables, 12000 observations)
>> the program runs out of memory
>>
>> "memory.c", line 57: can't allocate memory in function m_get()
>> Error in predict.gstat(fit.uk, newdata = EcoSAV.grid.clip.spxdf,
>> debug.level = -2,  :
>>  m_get
>>
>> Immediately following this,
>>
>> memory.limit() returns [1] -Inf
>>Warning message:
>>In memory.limit() : no non-missing arguments
>> to max; returning -Inf
>>
>> memory.size() returns 24.573.
>>
>> memory.size(max=T) returns 46.75
>>
>> To my untrained eye, it appears that R is not being allowed access to
>> the full memory limit specified in the cmd lineif this is the case,
>> how does one ensure that R is getting access to the full allotment of RAM?
>> Any insight is appreciated...
>>
>>
>>> sessionInfo()
>> R version 2.6.2 (2008-02-08)
>> i386-pc-mingw32
>>
>> locale:
>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>> States.1252;LC_MONETARY=English_United
>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats graphics  grDevices datasets  tcltk utils
>> methods   base
>>
>> other attached packages:
>> [1] maptools_0.7-7 foreign_0.8-23 gstat_0.9-43   rgdal_0.5-24
>> lattice_0.17-4 sp_0.9-23  svSocket_0.9-5 svIO_0.9-5
>> R2HTML_1.58svMisc_0.9-5   svIDE_0.9-5
>>
>> loaded via a namespace (and not attached):
>> [1] grid_2.6.2  tools_2.6.2
>>
>> __
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Confidence intervals of log transformed data

2008-04-16 Thread Rubén Roa-Ureta
tom soyer wrote:
> Hi
> 
>  I have a general statistics question on calculating confidence interval of
> log transformed data.
> 
> I log transformed both x and y, regressed the transformed y on transformed
> x: lm(log(y)~log(x)), and I get the following relationship:
> 
> log(y) = alpha + beta * log(x) with se as the standard error of residuals
> 
> My question is how do I calculate the confidence interval in the original
> scale of x and y? Should I use

[...]

Confidence interval for the mean of Y? If that is the case, when you 
transformed Y to logY and ran a regression assuming normal deviates you 
were in fact assuming that Y distributes lognormally. Your interval must 
be asymmetric, reflecting the shape of the lognormal. The lognormal 
mean is lambda=exp(mu + 0.5*sigma^2), where mu and sigma^2 are the 
parameters of the normal variate logY. A confidence interval for lambda is
Lower Bound=exp(mean(logY)+0.5*var(logY)+sd(logY)*H_alpha/sqrt(n-1))
Upper Bound=exp(mean(logY)+0.5*var(logY)+sd(logY)*H_(1-alpha)/sqrt(n-1))
where the quantiles H_alpha and H_(1-alpha) are quantiles of the 
distribution of linear combinations of the normal mean and variance 
(Land, 1971, Ann. Math. Stat. 42:1187-1205, and Land, 1975, Sel. Tables 
Math. Stat. 3:385-419).
Alternatively, you can model directly
Y = p1*X^p2, with p1 = exp(your alpha) and p2 = beta,
with a lognormal likelihood and predict the mean of Y with the fitted 
model (I'm guessing here).
It could be useful to check Crow and Shimizu, Lognormal distributions. 
Theory and practice, 1988, Dekker, NY.
HTH
Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mixture distribution analisys

2008-04-10 Thread Rubén Roa-Ureta
Antonio Olinto wrote:
> Hello,
> 
> I'm analyzing some fish length-frequency data to access relative age and 
> growth
> information and I would like to do something different from FiSAT / ELEFAN 
> analysis.
> 
> I found a package named MIX in http://www.math.mcmaster.ca/peter/mix/mix.html
> but it's compiled for R 2.0.0
> 
> So I have two questions:
> Is there any problem to run this package in R 2.7.0? - probably yes …
> And, is there any other package in R that can be used to analyze and to 
> separate
> mixture distributions?
> 
> Thanks for any help. Best regards,
> 
> Antonio Olinto
> Sao Paulo Fisheries Institute
> Brazil

This problem is not too difficult. Maybe you can write your own code.
Say, you enter the number of fish you measured, n, and a two-column 
dataframe with the mid point of the length class in the first column 
(call it l) and the observed relative frequency of each length class in 
the second column (call it obs_prob). Then using the multinomial 
likelihood for the number of fish in each length class as the random 
variable (the approach in Peter Macdonald's Mix), minimize
neg_log_lik = -n*sum(obs_prob*log(pred_prob))
where
pred_prob = p1*dnorm(l, mean=m1, sd=s1) + p2*dnorm(l, mean=m2, sd=s2) + p3*dnorm(l, mean=m3, sd=s3)
and s1, s2, s3, m1, m2, m3, p1 and p2 (p3 = 1 - p1 - p2) are the parameters.
Do you know your number of components (cohorts) in the mixture?
In the model above it is three. If you don't know it, try several models 
with different number of components and the AIC may let you know which 
one is the best working model. Don't use the likelihood ratio test as 
one of the p parameters is on the boundary of parameter space if the 
null is true.
Also, you should bound the parameters (m1 < m2 < m3).

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] comparing length-weight regressions]

2008-03-20 Thread Rubén Roa-Ureta
[EMAIL PROTECTED] wrote:
> I'd like to compare length-weight regressions among years. Any information 
> would be appreciated.
> 
> a. gray
> fisheries consultant

Your message is rather cryptic for a general statistical audience,
whereas I'm sure in a fisheries group everybody would understand what
you want.
Use a glm with family=gaussian(link="log") for all data sets together (in
original units) with year as a factor, then run the model again ignoring
the year factor, and compare the two fits using anova() on your
glm objects for a likelihood ratio test, or inspect the AICs for a
non-frequentist model selection method.
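A minimal sketch of that comparison (hypothetical data frame lw with columns
weight, length and year; the interaction term is one reasonable way to let
both the intercept and the exponent vary by year):

fit.year <- glm(weight ~ log(length)*factor(year), family=gaussian(link="log"), data=lw)
fit.pool <- glm(weight ~ log(length), family=gaussian(link="log"), data=lw)
anova(fit.pool, fit.year, test="F")  # year effect on the length-weight relation
AIC(fit.pool, fit.year)              # non-frequentist comparison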
R.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] analyzing binomial data with spatially correlated errors

2008-03-20 Thread Rubén Roa-Ureta
Roger Bivand wrote:
> Ben Bolker  ufl.edu> writes:
> 
>> Jean-Baptiste Ferdy  univ-montp2.fr> writes:
>>
>>> Dear R users,
>>>
>>> I want to explain binomial data by a serie of fixed effects. My 
>>> problem is that my binomial data  are spatially correlated. Naively, 
>>> I thought I could found something similar to gls to analyze such
>>> data. After some reading, I decided that lmer is probably to tool
>>> I need. The model I want to fit would look like
>>>
> (...)
>> You could *almost* use glmmPQL from the MASS package,
>> which allows you to fit any lme model structure
>> within a GLM 'wrapper', but as far as I know it wraps only lme (
>> which requires at least one random effect) and not gls.
>>
> 
> The trick used in:
> 
> Dormann, C. F., McPherson, J. M., Araujo, M. B., Bivand, R.,
> Bolliger, J., Carl, G., Davies, R. G., Hirzel, A., Jetz, W., 
> Kissling, W. D., Kühn, I., Ohlemüller, R., Peres-Neto, P. R., 
> Reineking, B., Schröder, B., Schurr, F. M. & Wilson, R. J. (2007): 
> Methods to account for spatial autocorrelation in the analysis of 
> species distributional data: a review. Ecography 30: 609–628
> 
> (see online supplement), is to add a constant term "group", and set 
> random=~1|group. The specific use with a binomial family there is for 
> a (0,1) response, rather than a two-column matrix. 
> 
>>   You could try gee or geoRglm -- neither trivially easy, I think ...
> 
> The same paper includes a GEE adaptation, but for a specific spatial
> configuration rather than a general one.
> 
> Roger Bivand
> 
>>   Ben Bolker

I suggest you also check out the package geoRglm, where you can model 
binomial and Poisson spatially correlated data. I used it to model 
spatially correlated binomial data but without covariates, i.e. without 
your fixed effects (so my model was a logistic regression with the 
intercept only) (Reference below). But I understand that you can add 
covariates and use them to estimate the non-random set of predictors. 
Here is the geoRglm webpage:
http://www.daimi.au.dk/~olefc/geoRglm/
This approach would be like tackling the problem from the point of view 
of geostatistics, rather than from mixed models. But I believe the 
likelihood-based geostatistical model is the same as a generalized 
linear mixed model where the distance is the random effect.
In SAS you can do this with the glimmix macro, but from the point of 
view of generalized linear mixed models: it implements a correlation 
term, so you can specify typical spatial correlation functions such as 
the Gaussian and exponential, particular cases of the Matern family.

Rubén

Roa-Ureta, R. and E.N. Niklitschek (2007) Biomass estimation from 
surveys with likelihood-based geostatistics. ICES Journal of Marine 
Science 64:1723-1734

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reading data into R

2008-01-03 Thread Rubén Roa-Ureta
BEP wrote:
> Hello all,
> 
> I am working with a very large data set into R, and I have no interest in
> reviving my SAS skills.  To do this, I will need to drop unwanted variables
> given the size of the data file.  The most common strategy seems to be
> subsetting the data after it is read into R.  Unfortunately, given the size
> of the data set, I can't get the file read and then subsquently do the
> subset procedure.  I would be appreciative of help on the following:
> 
> 1.  What are the possibilities of reading in just a small set of variables
> during the  statement (or another 'read' statement)?  That is,
> is it possible specify just the variables that I want to keep?
> 
> 2.  Can I randomly select a set of observations during the 'read' statement?
> 
> 
> I have searched various R resources for this information, so if I am simply
> overlooking a key resource on this issue, pointing that out to me would be
> greatly appreciated.
> 
> Thanks in advance.
> 
> Brian

Check this for input of specific columns from a large data matrix:
mysubsetdata<-do.call("cbind",scan(file='location and name of your 
file',what=list(NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,0,NULL,0,NULL,NULL),flush=TRUE))
This will input only columns 10 and 11 into 'mysubsetdata'.
With this method you can work out the way to select random columns.
HTH
Rubén

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.