Re: [R] Fisher's r to z' transformation - help needed

2007-05-24 Thread Mike White
Duncan and Peter
[resent to include on R-Help]
Thank you for your help. It seems my data structure is not suitable for use
with Fisher's z' transformation.

The simulated data was intended to represent the outputs of several
instruments over time. Each row of dat represents the output from one
instrument on a particular day and each column represents a variable being
measured.  Each instrument's sensitivity is different and there may be a small
offset, so the output is effectively transformed as ax + b, where x is
the 'true' output and the values of a and b are not known.  Pearson's r was
therefore used to check the correlation between outputs.  I then want to
plot the r values on a control chart and set an upper warning line and
action line for a maximum acceptable value of 1-r, based on either
comparing each output with every other output or comparing each output to a
mean of the outputs.  I was then hoping to use Fisher's z' transformation to
set the usual warning and action lines based on a one-sided normal
distribution.

The only alternative I can think of is to use simulation to produce the r
distribution and then use the quantile function to set limits at chosen
probability levels, along the lines of the sketch below.  I would be grateful
for any help and advice you can provide.
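
A minimal sketch of that idea (the simulation recipe and the 0.95/0.999
probability levels are illustrative assumptions only):

## simulate an r distribution for correlated instrument outputs, then set
## the warning and action lines as upper quantiles of 1-r
set.seed(1)
r.sim <- replicate(10000, {x <- rnorm(10); cor(x, x + rnorm(10))})
lims <- quantile(1 - r.sim, probs = c(0.95, 0.999))
warn.line <- lims[1]    # upper warning line for 1-r
action.line <- lims[2]  # upper action line for 1-r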

Thanks
Mike White

- Original Message - 
From: "Duncan Murdoch" <[EMAIL PROTECTED]>
To: "Mike White" <[EMAIL PROTECTED]>
Cc: 
Sent: Wednesday, May 23, 2007 1:38 PM
Subject: Re: [R] Fisher's r to z' transformation - help needed


> On 5/23/2007 7:40 AM, Mike White wrote:
> > I am trying to use Fisher's z' transformation of the Pearson's r but the
> > standard error does not appear to be correct.  I have simulated an
> > example using the R code below.  The z' data appears to have a reasonably
> > normal distribution but the standard error given by the formula
> > 1/sqrt(N-3) (from http://davidmlane.com/hyperstat/A98696.html) gives
> > different results from sd(z).  Can anyone tell me where I am going wrong?
>
> Your simulation is very strange.  Why are you calculating the
> correlation of data with its own mean?
>
> Here's a simpler simulation that seems to confirm the approximation is
> reasonable:
>
>  > p <- 10
>  > sdx <- 1
>  > sdy <- 1
>  > x <- matrix(rnorm(1000*p, sd=sdx), 1000, p)
>  > y <- matrix(rnorm(1000*p, mean=x, sd=sdy), 1000, p)
>
> The true correlation is sdx/sqrt(sdx^2 + sdy^2), i.e. 0.71.
>
>  > r <- numeric(1000)
>  > for (i in 1:1000) r[i] <- cor(x[i,], y[i,])
>  > f <- 0.5*(log(1+r) - log(1-r))
>  > sd(f)
> [1] 0.3739086
>  > 1/sqrt(p-3)
> [1] 0.3779645
>
>  > p <- 5
>  > x <- matrix(rnorm(1000*p, sd=sdx), 1000, p)
>  > y <- matrix(rnorm(1000*p, mean=x, sd=sdy), 1000, p)
>  > r <- numeric(1000)
>  > for (i in 1:1000) r[i] <- cor(x[i,], y[i,])
>  > f <- 0.5*(log(1+r) - log(1-r))
>  > sd(f)
> [1] 0.6571383
>  > 1/sqrt(p-3)
> [1] 0.7071068
>
> Duncan Murdoch
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fisher's r to z' transformation - help needed

2007-05-23 Thread Mike White
I am trying to use Fisher's z' transformation of the Pearson's r but the
standard error does not appear to be correct.  I have simulated an example
using the R code below.  The z' data appears to have a reasonably normal
distribution but the standard error given by the formula 1/sqrt(N-3) (from
http://davidmlane.com/hyperstat/A98696.html) gives different results from
sd(z).  Can anyone tell me where I am going wrong?

library(amap)

## SIMULATED DATA #
p<-10
n<-3000
means<-1000*c(1:p)
SDs<-rep(100,p)
set.seed(1)
dat<-mapply(rnorm, mean=means, sd=SDs, n=n)
colnames(dat)<-paste("V",1:p, sep="")
rownames(dat)<-1:n
# calculate centroid of simulated data
dat.mean<-apply(dat,2,mean)
# calculated Pearson's r to centroid
r<-apply(dat,1,cor, y=dat.mean)
plot(density(r))
# Fisher's z' transformation
z<-0.5*log((1+r)/(1-r))
plot(density(z))
sd(z)
# [1] 0.2661779

1/sqrt(p-3)
# [1] 0.3779645

## alternatively use comparisons for all possible pairs
## Centred Pearson's r on raw data
r<-1-Dist(dat,"corr")
plot(density(r))
z<-0.5*log((1+r)/(1-r))
plot(density(z))
sd(z)
# [1] 0.2669787
1/sqrt(p-3)
# [1] 0.3779645

Many thanks
Mike White


[R] Mahalanobis distance and probability of group membership using Hotelling's T2 distribution

2007-02-20 Thread Mike White
I want to calculate the probability that a group will include a particular
point using the squared Mahalanobis distance to the centroid. I understand
that the squared Mahalanobis distance is distributed as chi-squared, but that
for a small number of random samples from a multivariate normal population
Hotelling's T2 (T-squared) distribution should be used.
I cannot find a function for Hotelling's T2 distribution in R (although from
a previous post I have been provided with functions for the Hotelling Test).
My understanding is that the Hotelling's T2 distribution is related to the F
distribution using the equation:
 T2(u,v) = F(u, v-u+1) * v*u/(v-u+1)
where u is the number of variables and v the number of group members.

I have written the R code below to compare the results from the chi-squared
distribution with the Hotelling's T2 distribution for probability of a
member being included within a group.
Please can anyone confirm whether or not this is the correct way to use
Hotelling's T2 distribution for probability of group membership. Also, when
testing a particular group member, is it preferable to leave that member out
when calculating the centre and covariance of the group for the Mahalanobis
distances?

Thanks
Mike White



## Hotelling T^2 distribution function
ph<-function(q, u, v, ...){
# q vector of quantiles as in function pf
# u number of independent variables
# v number of observations
if (!v > u+1) stop("v (observations) must be greater than u+1")
df1 <- u
df2 <- v-u+1
pf(q*df2/(v*u), df1, df2, ...)
}

# compare Chi-squared and Hotelling T^2 distributions for a group member
u<-3
v<-10
set.seed(1)
mat<-matrix(rnorm(v*u), nrow=v, ncol=u)
MD2<-mahalanobis(mat, center=colMeans(mat), cov=cov(mat))
d<-MD2[order(MD2)]
# select the member with the median distance from the centroid
dm<-d[length(d)/2]
1-ph(dm,u,v)# probability using Hotelling T^2 distribution
# [1] 0.6577069
1-pchisq(dm, u) # probability using Chi-squared distribution
# [1] 0.5538466
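
On the leave-one-out question, a minimal sketch of the variant I mean (my own
attempt, not a confirmed method; note the group size seen by ph() then drops
to v-1):

## leave-one-out squared Mahalanobis distances: recompute the centre and
## covariance without the member being tested
MD2.loo <- sapply(seq_len(nrow(mat)), function(i)
  mahalanobis(mat[i, ], center = colMeans(mat[-i, , drop = FALSE]),
              cov = cov(mat[-i, , drop = FALSE])))
1 - ph(median(MD2.loo), u, v - 1)  # probability for a mid-ranked member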


[R] Confidence intervals of quantiles

2007-02-05 Thread Mike White
Can anyone please tell me if there is a function to calculate confidence
intervals for the results of the quantile function?
Some of my data is normally distributed but some follows a skewed
distribution or a capped normal distribution.  Some of the data sets contain
about 700 values whereas others are smaller, with about 100-150 values, so I
would like to see how the confidence intervals change for the different
distributions and data sizes.
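
One approach I am considering is a bootstrap sketch along these lines (the
boot package and the 95th percentile are illustrative choices, not a
confirmed method):

library(boot)
set.seed(1)
x <- rnorm(700)   # stand-in for one of my data sets
b <- boot(x, statistic = function(d, i) quantile(d[i], probs = 0.95), R = 999)
boot.ci(b, type = "perc")   # percentile confidence interval for the quantile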

Thanks
Mike White


[R] Block clustering and largest square block in binary matrix

2006-11-29 Thread Mike White
Hi
I have a large binary square matrix (about 2500 x 2500) obtained from a
distance matrix on x (smaller simulation below), in which I want to find the
largest square block of 1's for all permutations of the rows and columns.  I
then want to remove the rows and columns corresponding to this block and
repeat the process, so that finally all the rows of data in x are assigned
to a particular square block or are isolates.  Does anyone know of a
function or algorithm that can do this?  I have looked at the blockmodel
function in the sna package but this does not appear to find the largest
block and then the next largest block etc. I have also search the email
archives without success.

rm(list=ls())
set.seed(1)
x<-rnorm(100,10,1)
d.mat<-as.matrix(dist(x))
d.mat[d.mat<=0.2]<- -1
d.mat[d.mat>0.2]<-0
d.mat<-d.mat*-1

## I have tried sorting by row and column sums, but this does not identify
## the largest square block in every situation
rs<-rowSums(d.mat)
cs<-colSums(d.mat)
d.mat1<-d.mat[order(rs), order(cs)]
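
The best I have managed so far is a greedy sketch (my own attempt, not
guaranteed to find the true largest block): since d.mat is symmetric with a
unit diagonal, an all-ones square block is a clique, which can be
approximated by repeatedly dropping the least-connected row/column.

greedy.block <- function(adj) {
  keep <- seq_len(nrow(adj))
  repeat {
    sub <- adj[keep, keep, drop = FALSE]
    if (all(sub == 1)) break                # remaining set is an all-ones block
    keep <- keep[-which.min(rowSums(sub))]  # drop the least-connected member
  }
  keep
}

## assign every row to a block (isolates end up as blocks of size 1)
remaining <- seq_len(nrow(d.mat))
blocks <- list()
while (length(remaining) > 0) {
  b <- remaining[greedy.block(d.mat[remaining, remaining, drop = FALSE])]
  blocks[[length(blocks) + 1]] <- b
  remaining <- setdiff(remaining, b)
}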

Any help would be much appreciated

Thanks
Mike White


[R] lapply with mahalanobis function

2006-04-05 Thread Mike White
Hi
I am trying to use lapply with a function that requires two list-type
arguments, but I cannot find a way to program it and could not find any
hints in the lapply help page or by searching the R-list.  As an example I
have used the
mahalanobis function with the iris data. Is there a way to replace the for
loop with the lapply function or similar?

data(iris)
set.seed(0)
sa<-sample(nrow(iris),2)
train<-iris[!(1:nrow(iris) %in% sa),]
test<-iris[sa,1:4]
Sp.data<-split(train[,1:4], train[,"Species"])

cov.mat<-lapply(Sp.data, cov)
centroids<-lapply(Sp.data, function(x) apply(x,2,mean))

## can the for loop be replaced by an lapply function?
result<-list()
for (i in 1:length(Sp.data)){
result[[i]]<-mahalanobis(test, center=centroids[[i]], cov=cov.mat[[i]])
}

result
[[1]]
135  40
620.9687898   0.6085936

[[2]]
  13540
 27.61433 104.97892

[[3]]
  135    40
 10.96607 167.81203
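
One candidate I have since thought of is mapply, which iterates over the two
lists in parallel (a sketch; it should reproduce the output above, but treat
it as untested):

result2 <- mapply(function(ctr, cv) mahalanobis(test, center = ctr, cov = cv),
                  centroids, cov.mat, SIMPLIFY = FALSE)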

Thanks
Mike White


Re: [R] Setting xlim in lattice plots

2006-03-16 Thread Mike White
Sundar
Thanks, that is what I needed, but is it possible to customise the xlim
ranges for individual panels?  If so, how do I convert the data frame of
range values to the list format required by the histogram function?

Mike
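
For the archives, the form I was looking for appears to be a plain list of
per-panel ranges passed as the limits component of scales$x (a sketch; I
believe lattice accepts this when relation = "free", but it is untested):

## build one c(lower, upper) pair per column/panel from the original data
limits.list <- lapply(mat, function(col) c(0, max(col)))
histogram(~ values | ind, data = mat2,
          scales = list(x = list(relation = "free", limits = limits.list)))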

- Original Message - 
From: "Sundar Dorai-Raj" <[EMAIL PROTECTED]>
To: "Mike White" <[EMAIL PROTECTED]>
Cc: 
Sent: Wednesday, March 15, 2006 5:08 PM
Subject: Re: [R] Setting xlim in lattice plots


>
>
> Mike White wrote:
> > I am having difficulty setting different xlim values in the lattice
> > histogram plot function.
> > An example is shown below.  I think I need to convert the limits
> > data.frame
> > to a list of paired values but don't know how. Any help would be
> > appreciated.
> >
> > library(lattice)
> > mat <- as.data.frame(matrix(abs(c(rnorm(100),
> > 10*rnorm(100),20*rnorm(100),30*rnorm(100))),ncol=4))
> > colnames(mat)<-c("C1","C2","C3","C4")
> > mat2<-stack(mat)
> >
> > limits<-cbind(rep(0, ncol(mat)),apply(mat,2,max))
> > histogram(~ values | ind, data=mat2, xlim=limits,
> > scales=list(x=list(relation="free")))
> >
> > Thanks
> > Mike White
> >
>
>
> I think you want a breaks=NULL in your call to histogram:
>
> library(lattice)
> set.seed(42)
> mat <- data.frame(C1 = abs(rnorm(100)),
>C2 = abs(10 * rnorm(100)),
>C3 = abs(20 * rnorm(100)),
>C4 = abs(30 * rnorm(100)))
> mat2 <- stack(mat)
> histogram(~ values | ind, data = mat2, breaks = NULL,
>scales = list(x = list(relation = "free")))
>
> HTH,
>
> --sundar
>


[R] Setting xlim in lattice plots

2006-03-15 Thread Mike White
I am having difficulty setting different xlim values in the lattice
histogram plot function.
An example is shown below.  I think I need to convert the limits data.frame
to a list of paired values but don't know how. Any help would be
appreciated.

library(lattice)
mat <- as.data.frame(matrix(abs(c(rnorm(100),
10*rnorm(100),20*rnorm(100),30*rnorm(100))),ncol=4))
colnames(mat)<-c("C1","C2","C3","C4")
mat2<-stack(mat)

limits<-cbind(rep(0, ncol(mat)),apply(mat,2,max))
histogram(~ values | ind, data=mat2, xlim=limits,
scales=list(x=list(relation="free")))

Thanks
Mike White


Re: [R] Unlink a directory with leading and trailing space

2006-01-04 Thread Mike White
Thanks for your suggestions.
I am using Windows 2000 Professional with R 2.1.0 and have replicated the
problem with the code below which creates the directory " testdir "

# pathin is the parent path and ends with /
pathtest<-paste(pathin, "testdir", "/")  # I forgot to include sep=""
dir.create(pathtest) # creates directory " testdir "

The directory cannot be deleted with the short name as below
setwd(pathin)
unlink("testdi~1")
unlink(" testdi~1")

From the console I have also tried
rmdir " testdir "
but without success.

Mike

- Original Message - 
From: "Prof Brian Ripley" <[EMAIL PROTECTED]>
To: "Duncan Murdoch" <[EMAIL PROTECTED]>
Cc: "Mike White" <[EMAIL PROTECTED]>; 
Sent: Wednesday, January 04, 2006 1:49 PM
Subject: Re: [R] Unlink a directory with leading and trailing space


> On Wed, 4 Jan 2006, Duncan Murdoch wrote:
>
> > On 1/3/2006 11:01 AM, Mike White wrote:
> >> Using paste without defining a separator to generate a directory name
for
> >> dir.create, I have inadvertently created a directory with a leading and
> >> trailing space. I cannot now delete this directory with unlink or from
> >> Windows explorer.  Any help deleting this directory would be
appreciated.
> >
> > What name did you end up with?  I don't have any problem removing the
> > directory " test " just by right-clicking and choosing Delete.  You can
> > also open a command window, and put the name in quotes, e.g.
> >
> > rmdir " test "
>
> But AFAICS Windows always drops trailing spaces.
>
> > dir.create(" test ")
> > dir()[1]
> [1] " test"
>
> and similarly at the command prompt.
>
> > I do see the problem in unlink().
>
> Which in turn is a problem in MSVCRT's _rmdir.  The usual trick here is
> to use short path names instead, and that works e.g.
>
>   unlink("TEST~1", recursive=TRUE)
>
> R-devel has a shortPathName() function, so you could do
>
>   unlink(shortPathName(" test "), recursive = TRUE)
>
> (but I've now fixed the source code to use the short name).
>
> -- 
> Brian D. Ripley,                  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>


[R] Unlink a directory with leading and trailing space

2006-01-04 Thread Mike White
Using paste without defining a separator to generate a directory name for
dir.create, I have inadvertently created a directory with a leading and
trailing space. I cannot now delete this directory with unlink or from
Windows explorer.  Any help deleting this directory would be appreciated.

Thanks
Mike White


[R] Plotting an ellipse in 3D

2005-09-09 Thread Mike White
I have been using the ellipse function from the car package and the
covariance matrix to draw an ellipse around a group of points to show the
confidence limits.  However, the points are actually represented by 3
variables so rather than plot each pairwise combination of variables in 2D I
would like to plot the 'ellipse' in 3D using the djmrgl package.  Can anyone
offer advice on how I can plot the surface of a 3D 'ellipse' using the
covariance matrix to define the shape, so that the points inside can also be
seen?
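
My current attempt is the sketch below (my own code, assuming the rgl
package; it draws the 95% chi-squared contour of the covariance with
transparency so the interior points stay visible):

library(rgl)
plot.cov.ellipsoid <- function(centre, cov, level = 0.95, n = 30) {
  r <- sqrt(qchisq(level, df = 3))   # Mahalanobis radius of the contour
  theta <- seq(0, pi, length.out = n)
  phi   <- seq(0, 2 * pi, length.out = n)
  ## unit sphere as three n x n coordinate matrices
  xs <- sin(theta) %o% cos(phi)
  ys <- sin(theta) %o% sin(phi)
  zs <- cos(theta) %o% rep(1, n)
  ## map the sphere onto the ellipsoid with the Cholesky factor of cov
  pts <- cbind(c(xs), c(ys), c(zs)) %*% chol(cov) * r
  surface3d(matrix(pts[, 1] + centre[1], n, n),
            matrix(pts[, 2] + centre[2], n, n),
            matrix(pts[, 3] + centre[3], n, n),
            alpha = 0.3, color = "steelblue")
}
## usage sketch, after plot3d(dat3) of a 3-column data matrix dat3:
## plot.cov.ellipsoid(colMeans(dat3), cov(dat3))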

Thanks
Mike White


[R] Re: Finding the right number of clusters

2005-05-23 Thread Mike White
Philip
I have been using the kgs function in the maptree package, although you
still need to decide on a weighting factor.

Mike White
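
A usage sketch for reference (argument names as I remember them from the
maptree documentation; alpha is the weighting factor):

library(maptree)
hc <- hclust(dist(USArrests))
pen <- kgs(hc, dist(USArrests), maxclust = 10)  # Kelley-Gardner-Sutcliffe penalty
plot(pen, type = "b")         # pick the cluster count at the minimum penalty
names(pen)[which.min(pen)]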


Re: [R] Help with predict.lm

2005-04-20 Thread Mike White
Ted
Thank you for giving me so much of your time to provide an explanation of
the likelihood ratio approach to the calibration problem.  It has certainly
provided me with a new insight into this problem. I will check out the
references you mentioned to get a better understanding of the statistics.
Out of interest I have also tried the function provided by David Lucy back
in March 2002 with a correction to the code and addition of the t-value.
The results are very close to those obtained by the likelihood ratio method,
although the output needs tidying up.


# R function to do classical calibration
# val is a data.frame of the new indep variables to predict dep

calib <- function(indep, dep, val, prob=0.95)
{

 # generate the model y on x and get out predicted values for x
reg <- lm(dep~indep)
xpre <- (val - coef(reg)[1])/coef(reg)[2]

# generate a confidence interval
yyat <- ((dep - predict(reg))^2)
sigyyat <- ((sum(yyat)/(length(dep)-2))^0.5)
term1 <- sigyyat/coef(reg)[2]
sigxxbar <- sum((indep - mean(indep))^2)
denom <- sigxxbar * ((coef(reg)[2])^2)
 t<-qt(p=(prob+(1-prob)/2), df=(length(dep)-2))
conf <- t*abs(((1+(1/length(dep))+(((val -
  mean(dep))^2)/denom))^0.5)*term1)

results<-list(Predicted=xpre, Conf.interval=c(xpre-conf, xpre+conf))
return(results)
}

conc<-seq(100, 280, 20) #  mg/l
abs<-c(1.064, 1.177, 1.303, 1.414, 1.534, 1.642, 1.744, 1.852, 1.936, 2.046)
# absorbance units

new.abs<-data.frame(abs=c(1.251, 1.324, 1.452))

calib(conc, abs, new.abs, prob=0.95)

$Predicted
   abs
1 131.2771
2 144.6649
3 168.1394

$Conf.interval
$Conf.interval$abs
[1] 124.4244 137.9334 161.5477

$Conf.interval$abs
[1] 138.1298 151.3964 174.7310

Thanks
Mike


- Original Message -
From: "Ted Harding" <[EMAIL PROTECTED]>
To: "Mike White" <[EMAIL PROTECTED]>
Cc: 
Sent: Wednesday, April 20, 2005 8:58 AM
Subject: RE: [R] Help with predict.lm


> Sorry, I was doing this too late last night!
> All stands as before except for the calculation at the end
> which is now corrected as follows:
>
> On 19-Apr-05 Ted Harding wrote:
> [code repeated for reference]
> > The following function implements the above expressions.
> > It is a very crude approach to solving the problem, and
> > I'm sure that a more thoughtful approach would lead more
> > directly to the answer, but at least it gets you there
> > eventually.
> >
> > ===
> >
> > R.calib <- function(x,y,X,Y){
> >   n<-length(x) ; mx<-mean(x) ; my<-mean(y) ;
> >   x<-(x-mx) ; y<-(y-my) ; X<-(X-mx) ; Y<-(Y-my)
> >
> >   ah<-0 ; bh<-(sum(x*y))/(sum(x^2)) ; Xh <- Y/bh
> >   sh2 <- (sum((y-ah-bh*x)^2))/(n+1)
> >
> >   D<-(n+1)*sum(x^2) + n*X^2
> >   at<-(Y*sum(x^2) - X*sum(x*y))/D; bt<-((n+1)*sum(x*y) + n*X*Y)/D
> >   st2<-(sum((y - at - bt*x)^2) + (Y - at - bt*X)^2)/(n+1)
> >
> >   R<-(sh2/st2)
> >
> >   F<-(n-2)*(1-R)/R
> >
> >   x<-(x+mx) ; y<-(y+my) ;
> >   X<-(X+mx) ; Y<-(Y+my) ; Xh<-(Xh+mx) ;
> >   PF<-(pf(F,1,(n-2)))
> >   list(x=x,y=y,X=X,Y=Y,R=R,F=F,PF=PF,
> >ahat=ah,bhat=bh,sh2=sh2,
> >atil=at,btil=bt,st2=st2,
> >Xhat=Xh)
> > }
> >
> > 
> >
> > Now lets take your original data and the first Y-value
> > in your list (namely Y = 1.251), and suppose you want
> > a 95% confidence interval for X. The X-value corresponding
> > to Y which you would get by regressing x (conc) on y (abs)
> > is X = 131.3813 so use this as a "starting value".
> >
> > So now run this function with x<-conc, y<-abs, and these values
> > of X and Y:
> >
> >   R.calib(x,y,131.3813,1.251)
> >
> > You get a long list of stuff, amongst which
> >
> >   $PF
> >   [1] 0.02711878
> >
> > and
> >
> >   $Xhat
> >   [1] 131.2771
> >
> > So now you know that Xhat (the MLE of X for that Y) = 131.2771
> > and the F-ratio probability is 0.027...
> >
> *> You want to push $PF upwards till it reaches 0.05, so work
> *> *outwards* in the X-value:
> WRONG!! Till it reaches ***0.95***
>
>   R.calib(x,y,125.,1.251)$PF
>   [1] 0.9301972
>
>   ...
>
>   R.calib(x,y,124.3510,1.251)$PF
>   [1] 0.949989
>
>
> > and you're there in that direction. Now go back to the MLE
> > and work out in the other direction:
> >
> >   R.calib(x,y,131.2771,1.251)$PF
> >   [1] 1.987305e-06
>
>   ...
>
>   R
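
NB the stepwise search above can be automated with uniroot; a sketch,
assuming the R.calib function and the conc/abs data defined earlier:

## solve PF = 0.95 for X instead of stepping by hand; the bracketing
## interval c(100, 131.2771) is read off from the runs above
lwr <- uniroot(function(X) R.calib(x, y, X, 1.251)$PF - 0.95,
               interval = c(100, 131.2771))$root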

Re: [R] Help with predict.lm

2005-04-19 Thread Mike White
Ted
I agree that with the example data the "inverse regression" approach would
be adequate, but it would be useful to have a function to predict the
results and confidence intervals correctly. I look forward to your solution
to the calibration problem.

In the mean time, I have looked at the calib function referred to by Andy
Liaw (http://finzi.psych.upenn.edu/R/Rhelp02a/archive/1202.html) but the
formula for the confidence intervals appears to be incorrect (at least
according to "Quality Assurance in Analytical Chemistry, W.Funk, V. Dammann
and G. Donnevert).  The brackets multiplied by term1 should have been raised
to the power 0.5 (square root), not 2 (squared).  Also the function needs to
be multiplied by the t value for the appropriate probability and degrees of
freedom.  My corrected code is shown below; any suggestions for calculating
t?


# R function to do classical calibration
# input the x and y vectors and the value of y to make an
# estimate for
# val is a value of dep
# returns the calibrated value of x and its
# single confidence interval about x
# form of calibration particularly suited to jackknifing
calib <- function(indep, dep, val)
{  # generate the model y on x and get out predicted values for x
reg <- lm(dep~indep)
xpre <- (val - coef(reg)[1])/coef(reg)[2]

# generate a confidence
yyat <- ((dep - predict(reg))^2)
sigyyat <- ((sum(yyat)/(length(dep)-2))^0.5)
term1 <- sigyyat/coef(reg)[2]
sigxxbar <- sum((indep - mean(indep))^2)
denom <- sigxxbar * ((coef(reg)[2])^2)
 t<- ## needs function to assign t value from t-table
conf <- t*abs(((1+(1/length(dep))+(((val -
mean(dep))^2)/denom))^0.5)*term1) ## squared term changed to square root
results<-list(Predicted=xpre, Confidence=conf)  ## returns results as a list
## to avoid a warning message
return(results)
}

Thanks
Mike White
- Original Message -----
From: "Ted Harding" <[EMAIL PROTECTED]>
To: "Mike White" <[EMAIL PROTECTED]>
Cc: 
Sent: Tuesday, April 19, 2005 3:20 PM
Subject: RE: [R] Help with predict.lm


> On 19-Apr-05 Mike White wrote:
> > Hi
> > I have measured the UV absorbance (abs) of 10 solutions
> > of a substance at known concentrations (conc) and have
> > used a linear model to plot a calibration graph with
> > confidence limits.  I now want to predict the concentration
> > of solutions with UV absorbance results given in the
> > new.abs data.frame, however predict.lm only appears to work
> > for new "conc" variables not new "abs" variables.
> >
> > I have search the help files and did find a similar problem
> > in June 2000, but unfortunately no solution was offered.
> > Any help and how to use predict.lm with the new "abs" data
> > to predict "conc" with confidence limits would be appreciated.
> >
> > conc<-seq(100, 280, 20) #  mg/l
> > abs<-c(1.064, 1.177, 1.303, 1.414, 1.534, 1.642, 1.744,
> >1.852, 1.936,2.046) # absorbance units
> > lm.calibration<-lm(abs ~ conc)
> > pred.w.plim <- predict(lm.calibration,  interval="prediction")
> > pred.w.clim <- predict(lm.calibration,  interval="confidence")
> > matplot(conc, cbind(pred.w.clim, pred.w.plim[,-1]),
> >lty=c(1,2,2,3,3), type="l", ylab="abs", xlab= "conc mg/l")
> > points(conc, abs, pch=21, col="blue")
> >
> > new.abs<-data.frame(abs=c(1.251, 1.324, 1.452))
> >
> > predict(calibration.lm, new.abs) # does not work
>
> Apart from the apparent typo "calibration.lm" which Matthias pointed
> out, this will not work anyway since "predict" predicts in the
> same direction as the model was fitted in the first place.
>
> You have fitted "abs ~ conc", so the lm object lm.calibration
> contains a representation of the model equivalent to
>
>   abs = a + b*conc
>
> When you use predict(calibration.lm, new.abs) it will expect
> the dataframe "new.abs" to contain values in a list named "conc".
> Since you have supplied a list named "abs", this is apparently
> ignored, and as a result the function uses the default dataframe
> which is the data encapsulated in the calibration.lm object.
>
> You can verify this by comparing the outputs of
>
>   predict(lm.calibration, new.abs)
>
> and
>
>   predict(lm.calibration)
>
> You will see that they are identical!
>
> Either way, "predict" predicts "abs" from "conc" by using the
> above equation. It does not attempt to invert the equation
> even if you call the "new" data

Re: [R] Help with predict.lm

2005-04-19 Thread Mike White
Andy
Thanks, the link was very helpful.  I haven't checked the code for the calib
function but it appears to work with my data.
[NB the return function generates a warning message with later versions of R
and needs to be amended to return a list]

In the last line of my code, calibration.lm should have been lm.calibration,
but it still does not work with predict.lm
I presume that reversing the variables in the linear model to force
predict.lm to work is statistically invalid, as the regression line will be
slightly different, i.e.

predict(lm(conc ~ abs), new.abs) # gives slightly different results than
the calib function

Thanks to all who replied,
Mike White
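
For anyone finding this in the archives, a self-contained sketch of the
comparison I mean (data repeated from the original post):

conc <- seq(100, 280, 20)
abs <- c(1.064, 1.177, 1.303, 1.414, 1.534, 1.642, 1.744, 1.852, 1.936, 2.046)
new.abs <- data.frame(abs=c(1.251, 1.324, 1.452))

predict(lm(conc ~ abs), new.abs)             # inverse regression
fit <- lm(abs ~ conc)                        # calibration fit
(new.abs$abs - coef(fit)[1])/coef(fit)[2]    # classical inversion, as in calib()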


- Original Message -
From: "Liaw, Andy" <[EMAIL PROTECTED]>
To: "'Mike White'" <[EMAIL PROTECTED]>; 
Sent: Tuesday, April 19, 2005 1:53 PM
Subject: RE: [R] Help with predict.lm


> Will this help?
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/1202.html
>
> [Found by RSiteSearch("calibration") in R-2.1.0.]
>
> Andy
>
> > From: Mike White
> >
> > Hi
> > I have measured the UV absorbance (abs) of 10 solutions of a
> > substance at
> > known concentrations (conc) and have used a linear model to plot a
> > calibration graph with confidence limits.  I now want to predict the
> > concentration of solutions with UV absorbance results given
> > in the  new.abs
> > data.frame, however predict.lm only appears to work for new
> > "conc" variables
> > not new "abs" variables.
> >
> > I have searched the help files and did find a similar problem
> > in June 2000,
> > but unfortunately no solution was offered.
> > Any help and how to use predict.lm with the new "abs" data to
> > predict "conc"
> > with confidence limits would be appreciated.
> >
> > conc<-seq(100, 280, 20) #  mg/l
> > abs<-c(1.064, 1.177, 1.303, 1.414, 1.534, 1.642, 1.744,
> > 1.852, 1.936,
> > 2.046) # absorbance units
> > lm.calibration<-lm(abs ~ conc)
> > pred.w.plim <- predict(lm.calibration,  interval="prediction")
> > pred.w.clim <- predict(lm.calibration,  interval="confidence")
> > matplot(conc, cbind(pred.w.clim, pred.w.plim[,-1]),
> >  lty=c(1,2,2,3,3), type="l", ylab="abs", xlab=
> > "conc mg/l")
> > points(conc, abs, pch=21, col="blue")
> >
> > new.abs<-data.frame(abs=c(1.251, 1.324, 1.452))
> >
> > predict(calibration.lm, new.abs) # does not work
> >
> >
> > Thanks
> > Mike White
> >


[R] Help with predict.lm

2005-04-19 Thread Mike White
Hi
I have measured the UV absorbance (abs) of 10 solutions of a substance at
known concentrations (conc) and have used a linear model to plot a
calibration graph with confidence limits.  I now want to predict the
concentration of solutions with UV absorbance results given in the  new.abs
data.frame, however predict.lm only appears to work for new "conc" variables
not new "abs" variables.

I have searched the help files and did find a similar problem in June 2000,
but unfortunately no solution was offered.
Any help and how to use predict.lm with the new "abs" data to predict "conc"
with confidence limits would be appreciated.

conc<-seq(100, 280, 20) #  mg/l
abs<-c(1.064, 1.177, 1.303, 1.414, 1.534, 1.642, 1.744, 1.852, 1.936,
2.046) # absorbance units
lm.calibration<-lm(abs ~ conc)
pred.w.plim <- predict(lm.calibration,  interval="prediction")
pred.w.clim <- predict(lm.calibration,  interval="confidence")
matplot(conc, cbind(pred.w.clim, pred.w.plim[,-1]),
 lty=c(1,2,2,3,3), type="l", ylab="abs", xlab= "conc mg/l")
points(conc, abs, pch=21, col="blue")

new.abs<-data.frame(abs=c(1.251, 1.324, 1.452))

predict(calibration.lm, new.abs) # does not work


Thanks
Mike White


Re: [R] Extracting a numeric prefix from a string

2005-02-02 Thread Mike White
Thanks for your contributions.  Jonne's solution (after sorting) works fine
for my purposes, but it would be useful to have a function that works for any
numeric prefix.  Another case to include would be a signed numeric:
x<-c("+12.3.abc", "-0.12xyz")

Mike
- Original Message -
From: "Peter Dalgaard" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "R user" <[EMAIL PROTECTED]>; ; "Mike White"
<[EMAIL PROTECTED]>
Sent: Monday, January 31, 2005 11:05 PM
Subject: Re: [R] Extracting a numeric prefix from a string


> (Ted Harding) <[EMAIL PROTECTED]> writes:
>
> > On 31-Jan-05 R user wrote:
> > > You could use something like
> > >
> > > y <- gsub('([0-9]+(.[0-9]+)?)?.*','\\1',x)
> > > as.numeric(y)
> > >
> > > But maybe there's a much nicer way.
> > >
> > > Jonne.
> >
> > I doubt it -- full marks for neat regexp footwork!
>
> Hmm, I'd have to deduct a few points for forgetting to escape the dot...
>
> > x <- "2a4"
> > y <- gsub('([0-9]+(.[0-9]+)?)?.*','\\1',x)
> > y
> [1] "2a4"
> >  as.numeric(y)
> [1] NA
> Warning message:
> NAs introduced by coercion
>
> and maybe a few more for using gsub() where sub() suffices.
>
> There are a few more nits to pick, since "2.", ".2", "2e-7" are also
> numbers, but ".", ".e-2" are not. In fact it seems quite hard even to
> handle all cases in, e.g.,
>
>  x <- c("2.2abc","2.def",".2ghi",".jkl")
>
> with a single regular expression. The first one that worked for me was
>
> > r <- regexpr('^(([0-9]+\\.?)|(\\.[0-9]+)|([0-9]+\\.[0-9]+))',x)
> > substr(x,r,r+attr(r,"match.length")-1)
> [1] "2.2" "2."  ".2"  ""
>
> but several "obvious" attempts had failed.
>
> The problem is that regular expressions try to find the
> longest match, but not necessary of subexpressions, so
>
> > sub('(([0-9]+\\.?)|(\\.[0-9]+)|([0-9]+\\.[0-9]+))?.*','\\1',x)
> [1] "2." "2." ".2" ""
>
> even though
>
> > sub('(([0-9]+\\.?)|(\\.[0-9]+)|([0-9]+\\.[0-9]+))','XXX',x)
> [1] "XXXabc" "XXXdef" "XXXghi" ".jkl"
>
> Actually, this one comes pretty close:
>
> > sub('([0-9]*(\\.[0-9]+)?)?.*','\\1',x)
> [1] "2.2" "2"   ".2"  ""
>
> It only loses a trailing dot which is immaterial in the present
> context. However, next try extending the RE to handle an exponent
> part...
>
> --
>O__   Peter Dalgaard Blegdamsvej 3
>   c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
>  (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
> ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
>


[R] Extracting a numeric prefix from a string

2005-01-31 Thread Mike White
Hi
Does anyone know if there is a function similar to as.numeric that will
extract a numeric prefix from a string as in the following examples?

x<-c(3, "abc", 5.67, "2.4a", "6a", "6b", "2.4.a", 3, "4.2a")
df.x<-data.frame(Code=x)
x.str<-levels(df.x[,1])
# required result from the function:
2.40 3.00 4.20 5.67 6.00 NA

Thanks
Mike White


[R] Ward clustering problem

2004-06-04 Thread Mike White
I have a training set of data for known classes with 5 observations of 12
variables for each class.  I want to use this information to classify new
data into classes which are known to be different to those in the training
set but each new class may contain one or more observations.  The
distribution of within class distances is expected to be similar for all
classes and this is found to be the case for the training data.  I have
tried using the maximum within class distance for the training data to set
the h variable in cutree for the clustered new data.  This appears to work
fine for "average" and "complete" clustering methods but not for the Ward
clustering method as the distance axis of the dendrogram does not directly
relate to the distances between observations.

Can anyone advise on how to optimise the h value of cutree when using the
Ward clustering method or is there a better approach to this type of
classification problem?
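
One workaround I am experimenting with (a sketch; here d is the dist object
for the new data and h.max the maximum within-class distance from the
training set) is to search over the number of clusters k instead of h:

hc <- hclust(d, method = "ward")
dm <- as.matrix(d)
max.within <- function(cl) max(sapply(unique(cl), function(g) {
  idx <- which(cl == g)
  if (length(idx) < 2) 0 else max(dm[idx, idx])
}))
## smallest k whose clusters all satisfy the training-set distance threshold
for (k in 1:nrow(dm)) if (max.within(cutree(hc, k = k)) <= h.max) break
k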

Thanks
Mike White


[R] Problem with mclust surfacePlot function

2004-05-06 Thread Mike White
I am trying to follow the mclust examples in "MCLUST: Software for Model
Based Clustering, Density Estimation and Discriminant Analysis" by Chris
Fraley and Adrian Raftery, but I cannot reproduce the density and
uncertainty surfaces for the Lansing Woods maples.  I am using R 1.8.1 with
the code below.  The same code works fine in S-Plus 6.2

Am I missing something or is this a bug?

Thanks
Mike White

library(mclust)
data(lansing) # R only
maples<-lansing[as.character(lansing[,"species"]) == "maple", -3]
maplesBIC <- EMclust(maples)
maplesModel <- summary(maplesBIC, maples)
plotMaples2 <- function(type, what, transformation)
{
 out <- do.call("surfacePlot", c(maplesModel, list(data=maples, type=type,
what=what,transformation=transformation)))
 invisible()
}
par(pty="s", mfrow=c(1,2))
plotMaples2(type="contour", what="density", transformation="log")
par(pty="s")
plotMaples2(type="contour", what="uncertainty", transformation = "log")


[R] predict.lda problem with posterior probabilities

2004-01-23 Thread Mike White
With predict.lda the posterior probabilities only relate to the existing
Class definitions.  This is fine for Class definitions like gender but it is
a problem when new data does not necessarily belong to an existing Class.
Is there a classification method that gives posterior probabilities for
Class membership and does not assume the new data must belong to one of the
existing Classes?  A new class would then be indicated by low posterior
probabilities for all the existing Classes.
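
The closest I have got is to score each new observation against every class
separately, using the Mahalanobis distance and a chi-squared tail probability
(a sketch with the iris data; these are per-class 'typicality' values rather
than true posterior probabilities):

data(iris)
classes <- split(iris[, 1:4], iris$Species)
typicality <- function(newdata, df = 4) sapply(classes, function(g)
  pchisq(mahalanobis(newdata, colMeans(g), cov(g)), df = df, lower.tail = FALSE))
typicality(iris[1, 1:4])  # a new, unseen class would score low for every class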

Thanks
Mike White


[R] Changing distance scale in plclust()

2003-12-01 Thread Mike White
I want to plot the cluster trees from the results of hclust() on different
datasets, but all with the same distance scale corresponding to the dataset
with the largest distance range.  However, plclust() does not accept ylim
values.  Does anyone know of a way around this problem?
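
One possible workaround (a sketch, untested): plot.dendrogram, unlike
plclust, accepts ylim, so the hclust result can be converted first:

hc <- hclust(dist(USArrests))              # stand-in for one of the datasets
plot(as.dendrogram(hc), ylim = c(0, 300))  # 300 = largest range over all datasets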

Mike White


Fw: [R] SIMCA algorithm implementation

2003-10-15 Thread Mike White
I have used PCA for data classification by visual examination of the 3D
scatter plot of the first 3 principal components.  I now want to use the
results to predict the class for new data.  I have used predict.princomp to
predict the scores and then visualise the results on a 3D scatter plot using
the rgl library.  However, is there an R function that will fit the new data
to the class assignments derived from PCA?  I think this is similar to what
the SIMCA algorithm does.

Thanks
Mike
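
The nearest I have got to the fitting step is the residual-distance sketch
below (my own rough reading of the SIMCA idea, not a reference
implementation): fit princomp per class, then score new data by the residual
sum of squares left after projecting onto the first k components.

simca.rss <- function(newdata, pc, k = 3) {
  x <- scale(as.matrix(newdata), center = pc$center, scale = pc$scale)
  scores <- x %*% pc$loadings[, 1:k]             # project onto the class model
  resid <- x - scores %*% t(pc$loadings[, 1:k])
  rowSums(resid^2)                               # orthogonal distance to model
}
## assign each new row to the class whose model leaves the smallest residual
models <- lapply(split(iris[, 1:4], iris$Species), princomp)
rss <- sapply(models, simca.rss, newdata = iris[1:5, 1:4])
colnames(rss)[apply(rss, 1, which.min)]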

- Original Message -
From: "Mike White" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, October 08, 2003 9:25 AM
Subject: Re: [R] SIMCA algorithm implementation


> Dear All
> Is there a SIMCA (Soft Independent Modelling Class Analogy) implementation
> in R, or does anyone know if it is possible to replicate the SIMCA
> algorithm using existing R functions?
>
> Thanks
> Mike White
>


Re: [R] SIMCA algorithm implementation

2003-10-08 Thread Mike White
Dear All
Is there a SIMCA (Soft Independent Modelling Class Analogy) implementation
in R, or does anyone know if it is possible to replicate the SIMCA algorithm
using existing R functions?

Thanks
Mike White


Re: [R] Rounding problem R vs Excel

2003-06-03 Thread Mike White
Duncan
If the numbers are not represented exactly, how does R resolve problems like
the one below?  Is there something that needs to be set up in the R
environment, like the number of significant figures?

> x<-4.145*100+0.5
> x
[1] 415
> floor(x)
[1] 414
> as.integer(x)
[1] 414
> trunc(x)
[1] 414
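
Printing more digits shows what is happening (a sketch; output approximate):

sprintf("%.17g", 4.145)              # "4.1449999999999996" or similar
sprintf("%.17g", 4.145*100 + 0.5)    # just under 415, hence floor() gives 414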

Mike White
- Original Message -
From: "Duncan Murdoch" <[EMAIL PROTECTED]>
To: "Hedderik van Rijn" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, June 03, 2003 2:40 AM
Subject: Re: [R] Rounding problem R vs Excel


> On Mon, 2 Jun 2003 20:50:20 -0400, you wrote:
>
> >Does this script do what you want?
> >
> >cround <- function(x,digits=0) {
> >   a <- ifelse(x>0,.5,-.5)
> >   if (digits==0) {
> > floor(x+a)
> >   } else {
> > m <- 10^digits
> > floor(x*m+a)/m
> >   }
> >}
>
> No, the problem is that R uses binary formats, and some numbers aren't
> representable there.  So for example,
>
> > cround(0.145,2)
> [1] 0.14
>
> because 0.145 isn't representable exactly, and is actually being
> represented as 0.14499... or something similar.
>
> Duncan Murdoch


[R] Rounding problem R vs Excel

2003-06-02 Thread Mike White
For numbers ending in 5 Excel always rounds up and I want to reproduce this in R 
rather than use the round function.
I have tried:

x<-as.integer(x*100+0.5)/100

and

x<-floor(x*100+0.5)/100

However, some values of x cause problems e.g

x<-floor(4.145*100+0.5)/100
4.14

I have tried breaking it down into steps and forcing the results to 10
significant figures in the following function, which seems to work

roundoff<-function(x){
a<-signif(x*100+0.5,digits=10)
roundoff<-floor(a)/100
roundoff
}
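
A slightly generalised sketch of the same signif-then-floor idea, with the
number of decimal places as an argument (positive x only; behaviour for
negative numbers is untested):

roundup <- function(x, digits = 2) {
  m <- 10^digits
  floor(signif(x*m + 0.5, digits = 10))/m  # nudge off the representation error
}
roundup(4.145)   # 4.15, matching Excel's round-half-up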

Is there a more sensible way of doing this for all numbers?

Thanks
Mike
