Re: [R] Fisher's r to z' transformation - help needed
Duncan and Peter [resent to include R-help]

Thank you for your help. It seems my data structure is not suitable for use
with Fisher's z' transformation. The simulated data was intended to represent
the outputs of several instruments over time. Each row of dat represents the
output from one instrument on a particular day, and each column represents a
variable being measured. Each instrument's sensitivity is different and there
may be a small offset, so the output is effectively transformed as a*x + b,
where x is the 'true' output and the values of a and b are not known.
Pearson's r was therefore used to check the correlation between outputs.

I then want to plot the r values on a control chart and set an upper warning
line and an action line for a maximum acceptable value of 1 - r, based either
on comparing each output with every other output or on comparing each output
with the mean of the outputs. I was hoping to use Fisher's z' transformation
to set the usual warning and action lines based on a one-sided normal
distribution. The only alternative I can think of is to use the simulation to
produce the r distribution and then use the quantile function to set limits
based on probability? I would be grateful for any help and advice you can
provide.

Thanks
Mike White

----- Original Message -----
From: "Duncan Murdoch" <[EMAIL PROTECTED]>
To: "Mike White" <[EMAIL PROTECTED]>
Sent: Wednesday, May 23, 2007 1:38 PM
Subject: Re: [R] Fisher's r to z' transformation - help needed

> On 5/23/2007 7:40 AM, Mike White wrote:
> > I am trying to use Fisher's z' transformation of Pearson's r, but the
> > standard error does not appear to be correct. I have simulated an
> > example using the R code below. The z' data appear to have a reasonably
> > normal distribution, but the standard error given by the formula
> > 1/sqrt(N-3) (from http://davidmlane.com/hyperstat/A98696.html) gives a
> > different result than sd(z). Can anyone tell me where I am going wrong?
>
> Your simulation is very strange. Why are you calculating the
> correlation of data with its own mean?
>
> Here's a simpler simulation that seems to confirm the approximation is
> reasonable:
>
> > p <- 10
> > sdx <- 1
> > sdy <- 1
> > x <- matrix(rnorm(1000*p, sd=sdx), 1000, p)
> > y <- matrix(rnorm(1000*p, mean=x, sd=sdy), 1000, p)
>
> The true correlation is sdx/sqrt(sdx^2 + sdy^2), i.e. 0.71.
>
> > r <- numeric(1000)
> > for (i in 1:1000) r[i] <- cor(x[i,], y[i,])
> > f <- 0.5*(log(1+r) - log(1-r))
> > sd(f)
> [1] 0.3739086
> > 1/sqrt(p-3)
> [1] 0.3779645
>
> > p <- 5
> > x <- matrix(rnorm(1000*p, sd=sdx), 1000, p)
> > y <- matrix(rnorm(1000*p, mean=x, sd=sdy), 1000, p)
> > r <- numeric(1000)
> > for (i in 1:1000) r[i] <- cor(x[i,], y[i,])
> > f <- 0.5*(log(1+r) - log(1-r))
> > sd(f)
> [1] 0.6571383
> > 1/sqrt(p-3)
> [1] 0.7071068
>
> Duncan Murdoch
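[Editor's note: a minimal sketch of the quantile-based alternative mentioned
above, not taken from the thread. The null model below (uncorrelated noise)
is only a placeholder; in practice r would be simulated from a model of
in-control instrument outputs, and the quantile levels are the usual
one-sided analogues of 2-sigma warning and 3-sigma action lines.]

set.seed(1)
r.sim <- replicate(10000, cor(rnorm(10), rnorm(10)))  # placeholder model
warn   <- quantile(1 - r.sim, 0.9772)   # one-sided 2-sigma warning line
action <- quantile(1 - r.sim, 0.9987)   # one-sided 3-sigma action line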
[R] Fisher's r to z' transformation - help needed
I am trying to use Fisher's z' transformation of Pearson's r, but the
standard error does not appear to be correct. I have simulated an example
using the R code below. The z' data appear to have a reasonably normal
distribution, but the standard error given by the formula 1/sqrt(N-3) (from
http://davidmlane.com/hyperstat/A98696.html) gives a different result than
sd(z). Can anyone tell me where I am going wrong?

library(amap)

## SIMULATED DATA
p <- 10
n <- 3000
means <- 1000*c(1:p)
SDs <- rep(100, p)
set.seed(1)
dat <- mapply(rnorm, mean=means, sd=SDs, n=n)
colnames(dat) <- paste("V", 1:p, sep="")
rownames(dat) <- 1:n

# calculate centroid of simulated data
dat.mean <- apply(dat, 2, mean)

# calculate Pearson's r to the centroid
r <- apply(dat, 1, cor, y=dat.mean)
plot(density(r))

# Fisher's z' transformation
z <- 0.5*log((1+r)/(1-r))
plot(density(z))
sd(z)
# [1] 0.2661779
1/sqrt(p-3)
# [1] 0.3779645

## alternatively use comparisons for all possible pairs
## centred Pearson's r on raw data
r <- 1 - Dist(dat, "corr")
plot(density(r))
z <- 0.5*log((1+r)/(1-r))
plot(density(z))
sd(z)
# [1] 0.2669787
1/sqrt(p-3)
# [1] 0.3779645

Many thanks
Mike White
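[Editor's note, not part of the original post: base R already provides
Fisher's transform and its inverse as hyperbolic functions, which avoids
hand-coding the logs.]

z <- atanh(r)      # identical to 0.5*log((1+r)/(1-r))
r.back <- tanh(z)  # inverse transformation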
[R] Mahalanobis distance and probability of group membership using Hotelling's T2 distribution
I want to calculate the probability that a group will include a particular
point, using the squared Mahalanobis distance to the centroid. I understand
that the squared Mahalanobis distance is distributed as chi-squared, but that
for a small number of random samples from a multivariate normal population
Hotelling's T2 (T squared) distribution should be used. I cannot find a
function for Hotelling's T2 distribution in R (although from a previous post
I have been provided with functions for the Hotelling test). My understanding
is that Hotelling's T2 distribution is related to the F distribution by the
equation:

T2(u,v) = F(u, v-u+1) * v*u/(v-u+1)

where u is the number of variables and v the number of group members. I have
written the R code below to compare the results from the chi-squared
distribution with those from Hotelling's T2 distribution for the probability
of a member being included within a group. Please can anyone confirm whether
or not this is the correct way to use Hotelling's T2 distribution for the
probability of group membership. Also, when testing a particular group
member, is it preferable to leave that member out when calculating the centre
and covariance of the group for the Mahalanobis distances?

Thanks
Mike White

## Hotelling T^2 distribution function
ph <- function(q, u, v, ...){
  # q vector of quantiles, as in function pf
  # u number of independent variables
  # v number of observations
  if (!v > u+1) stop("v must be greater than u+1")
  df1 <- u
  df2 <- v-u+1
  pf(q*df2/(v*u), df1, df2, ...)
}

# compare chi-squared and Hotelling T^2 distributions for a group member
u <- 3
v <- 10
set.seed(1)
mat <- matrix(rnorm(v*u), nrow=v, ncol=u)
MD2 <- mahalanobis(mat, center=colMeans(mat), cov=cov(mat))
d <- MD2[order(MD2)]
# select a point midway between nearest and furthest from centroid
dm <- d[length(d)/2]
1 - ph(dm, u, v)   # probability using Hotelling T^2 distribution
# [1] 0.6577069
1 - pchisq(dm, u)  # probability using chi-squared distribution
# [1] 0.5538466
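[Editor's note: a minimal sketch of the leave-one-out variant raised in the
closing question, recomputing the centre and covariance without the point
under test. This is an illustration, not an answer from the thread.]

# leave-one-out squared Mahalanobis distance for each row of mat
MD2.loo <- sapply(seq_len(nrow(mat)), function(i)
  mahalanobis(mat[i, ], center = colMeans(mat[-i, ]), cov = cov(mat[-i, ])))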
[R] Confidence intervals of quantiles
Can anyone please tell me if there is a function to calculate confidence
intervals for the results of the quantile function. Some of my data are
normally distributed, but some follow a skewed distribution or a capped
normal distribution. Some of the data sets contain about 700 values whereas
others are smaller, with about 100-150 values, so I would like to see how the
confidence intervals change for the different distributions and data sizes.

Thanks
Mike White
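[Editor's note: one distribution-free approach, not from the thread, is a
bootstrap percentile interval, which applies equally to the skewed and capped
cases. The sample x below is only a stand-in for one of the data sets.]

set.seed(1)
x <- rnorm(150)                     # stand-in data set of ~150 values
boot.q <- replicate(2000, quantile(sample(x, replace = TRUE), 0.9))
quantile(boot.q, c(0.025, 0.975))   # approximate 95% CI for the 90th percentile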
[R] Block clustering and largest square block in binary matrix
Hi
I have a large binary square matrix (about 2500 x 2500) obtained from a
distance matrix on x (smaller simulation below), in which I want to find the
largest square block of 1's over all permutations of the rows and columns. I
then want to remove the rows and columns corresponding to this block and
repeat the process, so that finally every row of data in x is assigned to a
particular square block or is an isolate. Does anyone know of a function or
algorithm that can do this? I have looked at the blockmodel function in the
sna package, but it does not appear to find the largest block, then the next
largest block, and so on. I have also searched the email archives without
success.

rm(list=ls())
set.seed(1)
x <- rnorm(100, 10, 1)
d.mat <- as.matrix(dist(x))
d.mat[d.mat <= 0.2] <- -1
d.mat[d.mat > 0.2] <- 0
d.mat <- d.mat * -1

## sorting by row and column sums does not identify the largest square
## block in every situation
rs <- rowSums(d.mat)
cs <- colSums(d.mat)
d.mat1 <- d.mat[order(rs), order(cs)]

Any help would be much appreciated
Thanks
Mike White
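[Editor's note: since d.mat is symmetric, a square block of 1's that survives
any row/column permutation is a clique in the graph whose adjacency matrix is
d.mat, so the task can be recast as repeated maximum-clique finding. A sketch
using the igraph package, not from the thread; clique finding is NP-hard, so
a 2500-node graph may be slow.]

library(igraph)
g <- graph_from_adjacency_matrix(d.mat, mode = "undirected", diag = FALSE)
blocks <- list()
while (vcount(g) > 0) {
  cl <- largest_cliques(g)[[1]]                 # one maximum clique = block
  blocks[[length(blocks) + 1]] <- V(g)$name[cl] # row names in this block
  g <- delete_vertices(g, cl)                   # drop the block and repeat
}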
[R] lapply with mahalanobis function
Hi
I am trying to use lapply with a function that requires two list-type
arguments, but I cannot find a way to program it and could not find any hints
under lapply or by searching the R list. As an example I have used the
mahalanobis function with the iris data. Is there a way to replace the for
loop with lapply or something similar?

data(iris)
set.seed(0)
sa <- sample(nrow(iris), 2)
train <- iris[!(1:nrow(iris) %in% sa), ]
test <- iris[sa, 1:4]
Sp.data <- split(train[, 1:4], train[, "Species"])
cov.mat <- lapply(Sp.data, cov)
centroids <- lapply(Sp.data, function(x) apply(x, 2, mean))

## can the for loop be replaced by an lapply function?
result <- list()
for (i in 1:length(Sp.data)){
  result[[i]] <- mahalanobis(test, center=centroids[[i]], cov=cov.mat[[i]])
}
result
[[1]]
        135          40
620.9687898   0.6085936

[[2]]
      135        40
 27.61433 104.97892

[[3]]
      135        40
 10.96607 167.81203

Thanks
Mike White
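[Editor's note: two ways to do this, not taken from the thread. Map() walks
the two lists in parallel; the lapply-over-indices form is equivalent.]

# Map() pairs centroids[[i]] with cov.mat[[i]] for each species
result <- Map(function(ctr, cv) mahalanobis(test, center = ctr, cov = cv),
              centroids, cov.mat)

# equivalent form using lapply over the indices
result <- lapply(seq_along(Sp.data), function(i)
  mahalanobis(test, center = centroids[[i]], cov = cov.mat[[i]]))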
Re: [R] Setting xlim in lattice plots
Sundar
Thanks, that is what I needed, but is it possible to customise the xlim
ranges for individual plots and, if so, how do I convert the data frame of
range values to the list format required by the histogram function?

Mike

----- Original Message -----
From: "Sundar Dorai-Raj" <[EMAIL PROTECTED]>
To: "Mike White" <[EMAIL PROTECTED]>
Sent: Wednesday, March 15, 2006 5:08 PM
Subject: Re: [R] Setting xlim in lattice plots

> Mike White wrote:
> > I am having difficulty setting different xlim values in the lattice
> > histogram plot function. An example is shown below. I think I need to
> > convert the limits data.frame to a list of paired values but don't
> > know how. Any help would be appreciated.
> >
> > library(lattice)
> > mat <- as.data.frame(matrix(abs(c(rnorm(100), 10*rnorm(100),
> >   20*rnorm(100), 30*rnorm(100))), ncol=4))
> > colnames(mat) <- c("C1","C2","C3","C4")
> > mat2 <- stack(mat)
> >
> > limits <- cbind(rep(0, ncol(mat)), apply(mat, 2, max))
> > histogram(~ values | ind, data=mat2, xlim=limits,
> >           scales=list(x=list(relation="free")))
> >
> > Thanks
> > Mike White
>
> I think you want a breaks=NULL in your call to histogram:
>
> library(lattice)
> set.seed(42)
> mat <- data.frame(C1 = abs(rnorm(100)),
>                   C2 = abs(10 * rnorm(100)),
>                   C3 = abs(20 * rnorm(100)),
>                   C4 = abs(30 * rnorm(100)))
> mat2 <- stack(mat)
> histogram(~ values | ind, data = mat2, breaks = NULL,
>           scales = list(x = list(relation = "free")))
>
> HTH,
>
> --sundar
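[Editor's note: to answer the follow-up directly, lattice accepts per-panel
limits as a list inside scales when relation = "free". A sketch assuming the
mat/mat2 objects from Sundar's snippet above; panel order follows the factor
levels of ind, i.e. the column order of mat.]

# one c(lower, upper) element per panel, built from the data frame of ranges
xlim.list <- lapply(mat, function(col) c(0, max(col)))
histogram(~ values | ind, data = mat2, breaks = NULL,
          scales = list(x = list(relation = "free", limits = xlim.list)))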
[R] Setting xlim in lattice plots
I am having difficulty setting different xlim values in the lattice histogram
plot function. An example is shown below. I think I need to convert the
limits data.frame to a list of paired values but don't know how. Any help
would be appreciated.

library(lattice)
mat <- as.data.frame(matrix(abs(c(rnorm(100), 10*rnorm(100),
  20*rnorm(100), 30*rnorm(100))), ncol=4))
colnames(mat) <- c("C1","C2","C3","C4")
mat2 <- stack(mat)

limits <- cbind(rep(0, ncol(mat)), apply(mat, 2, max))
histogram(~ values | ind, data=mat2, xlim=limits,
          scales=list(x=list(relation="free")))

Thanks
Mike White
Re: [R] Unlink a directory with leading and trailing space
Thanks for your suggestions. I am using Windows 2000 Professional with R
2.1.0 and have replicated the problem with the code below, which creates the
directory " testdir ".

# pathin is the parent path and ends with /
pathtest <- paste(pathin, "testdir", "/")   # I forgot to include sep=""
dir.create(pathtest)                        # creates directory " testdir "

The directory cannot be deleted with the short name as below:

setwd(pathin)
unlink("testdi~1")
unlink(" testdi~1")

From the console I have also tried rmdir " testdir " but without success.

Mike

----- Original Message -----
From: "Prof Brian Ripley" <[EMAIL PROTECTED]>
To: "Duncan Murdoch" <[EMAIL PROTECTED]>
Cc: "Mike White" <[EMAIL PROTECTED]>
Sent: Wednesday, January 04, 2006 1:49 PM
Subject: Re: [R] Unlink a directory with leading and trailing space

> On Wed, 4 Jan 2006, Duncan Murdoch wrote:
>
> > On 1/3/2006 11:01 AM, Mike White wrote:
> >> Using paste without defining a separator to generate a directory name
> >> for dir.create, I have inadvertently created a directory with a
> >> leading and trailing space. I cannot now delete this directory with
> >> unlink or from Windows Explorer. Any help deleting this directory
> >> would be appreciated.
> >
> > What name did you end up with? I don't have any problem removing the
> > directory " test " just by right-clicking and choosing Delete. You can
> > also open a command window, and put the name in quotes, e.g.
> >
> > rmdir " test "
>
> But AFAICS Windows always drops trailing spaces.
>
> > dir.create(" test ")
> > dir()[1]
> [1] " test"
>
> and similarly at the command prompt.
>
> I do see the problem in unlink().
>
> Which in turn is a problem in MSVCRT's _rmdir. The usual trick here is
> to use short path names instead, and that works, e.g.
>
> unlink("TEST~1", recursive=TRUE)
>
> R-devel has a shortPathName() function, so you could do
>
> unlink(shortPathName(" test "), recursive = TRUE)
>
> (but I've now fixed the source code to use the short name).
>
> --
> Brian D. Ripley, [EMAIL PROTECTED]
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK  Fax: +44 1865 272595
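[Editor's note: the accident can also be avoided at the source; base R's
file.path() joins path components with / and never introduces spaces.]

# safer construction of the target directory name
pathtest <- file.path(pathin, "testdir")   # no stray spaces possible
dir.create(pathtest)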
[R] Unlink a directory with leading and trailing space
Using paste without defining a separator to generate a directory name for
dir.create, I have inadvertently created a directory with a leading and
trailing space. I cannot now delete this directory with unlink or from
Windows Explorer. Any help deleting this directory would be appreciated.

Thanks
Mike White
[R] Plotting an ellipse in 3D
I have been using the ellipse function from the car package, together with
the covariance matrix, to draw an ellipse around a group of points to show
the confidence limits. However, the points are actually described by three
variables, so rather than plot each pairwise combination of variables in 2D
I would like to plot the 'ellipse' in 3D using the djmrgl package. Can
anyone offer advice on how to plot the surface of a 3D 'ellipse', using the
covariance matrix to define the shape, so that the points inside can still
be seen?

Thanks
Mike White
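[Editor's note: in current versions of the rgl package this is built in as
ellipse3d(), which turns a 3x3 covariance matrix into a quadric mesh; drawing
it with an alpha value keeps the enclosed points visible. The covariance
matrix S below is illustrative only.]

library(rgl)
set.seed(1)
S <- matrix(c(1, .6, .3, .6, 1, .5, .3, .5, 1), 3, 3)  # example covariance
pts <- matrix(rnorm(300), ncol = 3) %*% chol(S)
plot3d(pts)
shade3d(ellipse3d(cov(pts), centre = colMeans(pts), level = 0.95),
        col = "steelblue", alpha = 0.3)  # translucent 95% ellipsoid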
[R] Re: Finding the right number of clusters
Philip
I have been using the kgs function in the maptree package, although you
still need to decide on a weighting factor.

Mike White
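[Editor's note: a minimal sketch of kgs(); the weighting factor mentioned is
its alpha argument. The data set and clustering method are illustrative.]

library(maptree)
d <- dist(USArrests)
hc <- hclust(d, method = "average")
pen <- kgs(hc, d, alpha = 1, maxclust = 10)  # Kelley-Gardner-Sutcliffe penalty
names(pen)[which.min(pen)]                   # suggested number of clusters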
Re: [R] Help with predict.lm
Ted
Thank you for giving me so much of your time to provide an explanation of
the likelihood ratio approach to the calibration problem. It has certainly
provided me with a new insight into this problem. I will check out the
references you mentioned to get a better understanding of the statistics.

Out of interest I have also tried the function provided by David Lucy back
in March 2002, with a correction to the code and the addition of the
t-value. The results are very close to those obtained by the likelihood
ratio method, although the output needs tidying up.

# R function to do classical calibration
# val is a data.frame of the new indep variables to predict dep
calib <- function(indep, dep, val, prob=0.95) {
  # generate the model y on x and get out predicted values for x
  reg <- lm(dep ~ indep)
  xpre <- (val - coef(reg)[1])/coef(reg)[2]
  # generate a confidence interval
  yyat <- ((dep - predict(reg))^2)
  sigyyat <- ((sum(yyat)/(length(dep)-2))^0.5)
  term1 <- sigyyat/coef(reg)[2]
  sigxxbar <- sum((indep - mean(indep))^2)
  denom <- sigxxbar * ((coef(reg)[2])^2)
  t <- qt(p=(prob+(1-prob)/2), df=(length(dep)-2))
  conf <- t*abs(((1+(1/length(dep))+(((val - mean(dep))^2)/denom))^0.5)*term1)
  results <- list(Predicted=xpre, Conf.interval=c(xpre-conf, xpre+conf))
  return(results)
}

conc <- seq(100, 280, 20)  # mg/l
abs <- c(1.064, 1.177, 1.303, 1.414, 1.534, 1.642, 1.744, 1.852, 1.936,
         2.046)  # absorbance units
new.abs <- data.frame(abs=c(1.251, 1.324, 1.452))
calib(conc, abs, new.abs, prob=0.95)

$Predicted
       abs
1 131.2771
2 144.6649
3 168.1394

$Conf.interval
$Conf.interval$abs
[1] 124.4244 137.9334 161.5477   # lower limits

$Conf.interval$abs
[1] 138.1298 151.3964 174.7310   # upper limits

Thanks
Mike

----- Original Message -----
From: "Ted Harding" <[EMAIL PROTECTED]>
To: "Mike White" <[EMAIL PROTECTED]>
Sent: Wednesday, April 20, 2005 8:58 AM
Subject: RE: [R] Help with predict.lm

> Sorry, I was doing this too late last night!
> All stands as before except for the calculation at the end,
> which is now corrected as follows:
>
> On 19-Apr-05 Ted Harding wrote:
> [code repeated for reference]
> > The following function implements the above expressions.
> > It is a very crude approach to solving the problem, and
> > I'm sure that a more thoughtful approach would lead more
> > directly to the answer, but at least it gets you there
> > eventually.
> >
> > ===
> >
> > R.calib <- function(x,y,X,Y){
> >   n<-length(x) ; mx<-mean(x) ; my<-mean(y)
> >   x<-(x-mx) ; y<-(y-my) ; X<-(X-mx) ; Y<-(Y-my)
> >
> >   ah<-0 ; bh<-(sum(x*y))/(sum(x^2)) ; Xh <- Y/bh
> >   sh2 <- (sum((y-ah-bh*x)^2))/(n+1)
> >
> >   D<-(n+1)*sum(x^2) + n*X^2
> >   at<-(Y*sum(x^2) - X*sum(x*y))/D; bt<-((n+1)*sum(x*y) + n*X*Y)/D
> >   st2<-(sum((y - at - bt*x)^2) + (Y - at - bt*X)^2)/(n+1)
> >
> >   R<-(sh2/st2)
> >
> >   F<-(n-2)*(1-R)/R
> >
> >   x<-(x+mx) ; y<-(y+my)
> >   X<-(X+mx) ; Y<-(Y+my) ; Xh<-(Xh+mx)
> >   PF<-(pf(F,1,(n-2)))
> >   list(x=x,y=y,X=X,Y=Y,R=R,F=F,PF=PF,
> >        ahat=ah,bhat=bh,sh2=sh2,
> >        atil=at,btil=bt,st2=st2,
> >        Xhat=Xh)
> > }
> >
> > Now let's take your original data and the first Y-value
> > in your list (namely Y = 1.251), and suppose you want
> > a 95% confidence interval for X. The X-value corresponding
> > to Y which you would get by regressing x (conc) on y (abs)
> > is X = 131.3813, so use this as a "starting value".
> >
> > So now run this function with x<-conc, y<-abs, and these values
> > of X and Y:
> >
> > R.calib(x,y,131.3813,1.251)
> >
> > You get a long list of stuff, amongst which
> >
> > $PF
> > [1] 0.02711878
> >
> > and
> >
> > $Xhat
> > [1] 131.2771
> >
> > So now you know that Xhat (the MLE of X for that Y) = 131.2771
> > and the F-ratio probability is 0.027...
>
> *> You want to push $PF upwards till it reaches 0.05, so work
> *> *outwards* in the X-value:
> WRONG!! Till it reaches ***0.95***
>
> R.calib(x,y,125.,1.251)$PF
> [1] 0.9301972
> ...
> R.calib(x,y,124.3510,1.251)$PF
> [1] 0.949989
>
> > and you're there in that direction. Now go back to the MLE
> > and work out in the other direction:
> >
> > R.calib(x,y,131.2771,1.251)$PF
> > [1] 1.987305e-06
> > ...
> R
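[Editor's note: for readers landing here now, the chemCal package, if
available, implements this classical calibration interval directly. A
sketch assuming the conc/abs vectors defined above.]

library(chemCal)
m <- lm(abs ~ conc)
inverse.predict(m, 1.251)   # estimate and confidence interval for conc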
Re: [R] Help with predict.lm
Ted
I agree that with the example data the "inverse regression" approach would
be adequate, but it would be useful to have a function to predict the
results and confidence intervals correctly. I look forward to your solution
to the calibration problem.

In the meantime, I have looked at the calib function referred to by Andy
Liaw (http://finzi.psych.upenn.edu/R/Rhelp02a/archive/1202.html), but the
formula for the confidence intervals appears to be incorrect (at least
according to "Quality Assurance in Analytical Chemistry", W. Funk,
V. Dammann and G. Donnevert). The brackets multiplied by term1 should have
been raised to the power 0.5 (square root), not 2 (squared). Also, the
function needs to be multiplied by the t value for the appropriate
probability and degrees of freedom. My corrected code is shown below; any
suggestions for calculating t?

# R function to do classical calibration
# input the x and y vectors and the value of y to make an estimate for
# val is a value of dep
# returns the calibrated value of x and its single confidence interval
# about x
# form of calibration particularly suited to jackknifing
calib <- function(indep, dep, val) {
  # generate the model y on x and get out predicted values for x
  reg <- lm(dep ~ indep)
  xpre <- (val - coef(reg)[1])/coef(reg)[2]
  # generate a confidence interval
  yyat <- ((dep - predict(reg))^2)
  sigyyat <- ((sum(yyat)/(length(dep)-2))^0.5)
  term1 <- sigyyat/coef(reg)[2]
  sigxxbar <- sum((indep - mean(indep))^2)
  denom <- sigxxbar * ((coef(reg)[2])^2)
  t <- NA  ## needs a function to assign the t value from a t-table
  conf <- t*abs(((1+(1/length(dep))+(((val - mean(dep))^2)/denom))^0.5)*term1)
  ## squared term changed to square root
  results <- list(Predicted=xpre, Confidence=conf)
  ## returns results as a list to avoid a warning message
  return(results)
}

Thanks
Mike White

----- Original Message -----
From: "Ted Harding" <[EMAIL PROTECTED]>
To: "Mike White" <[EMAIL PROTECTED]>
Sent: Tuesday, April 19, 2005 3:20 PM
Subject: RE: [R] Help with predict.lm

> On 19-Apr-05 Mike White wrote:
> > Hi
> > I have measured the UV absorbance (abs) of 10 solutions
> > of a substance at known concentrations (conc) and have
> > used a linear model to plot a calibration graph with
> > confidence limits. I now want to predict the concentration
> > of solutions with UV absorbance results given in the
> > new.abs data.frame; however, predict.lm only appears to work
> > for new "conc" variables, not new "abs" variables.
> >
> > I have searched the help files and did find a similar problem
> > in June 2000, but unfortunately no solution was offered.
> > Any help on how to use predict.lm with the new "abs" data
> > to predict "conc" with confidence limits would be appreciated.
> >
> > conc<-seq(100, 280, 20) # mg/l
> > abs<-c(1.064, 1.177, 1.303, 1.414, 1.534, 1.642, 1.744,
> >        1.852, 1.936, 2.046) # absorbance units
> > lm.calibration<-lm(abs ~ conc)
> > pred.w.plim <- predict(lm.calibration, interval="prediction")
> > pred.w.clim <- predict(lm.calibration, interval="confidence")
> > matplot(conc, cbind(pred.w.clim, pred.w.plim[,-1]),
> >         lty=c(1,2,2,3,3), type="l", ylab="abs", xlab="conc mg/l")
> > points(conc, abs, pch=21, col="blue")
> >
> > new.abs<-data.frame(abs=c(1.251, 1.324, 1.452))
> >
> > predict(calibration.lm, new.abs) # does not work
>
> Apart from the apparent typo "calibration.lm" which Matthias pointed
> out, this will not work anyway, since "predict" predicts in the
> same direction as the model was fitted in the first place.
>
> You have fitted "abs ~ conc", so the lm object lm.calibration
> contains a representation of the model equivalent to
>
>   abs = a + b*conc
>
> When you use predict(calibration.lm, new.abs) it will expect
> the dataframe "new.abs" to contain values in a list named "conc".
> Since you have supplied a list named "abs", this is apparently
> ignored, and as a result the function uses the default dataframe,
> which is the data encapsulated in the calibration.lm object.
>
> You can verify this by comparing the outputs of
>
>   predict(lm.calibration, new.abs)
>
> and
>
>   predict(lm.calibration)
>
> You will see that they are identical!
>
> Either way, "predict" predicts "abs" from "conc" by using the
> above equation. It does not attempt to invert the equation
> even if you call the "new" data ...
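[Editor's note: the t value the function asks for comes from qt(), exactly as
the later follow-up in this thread uses. For a two-sided interval at level
prob, with dep the vector of calibration responses (abs above):]

prob <- 0.95
n <- length(dep)                          # number of calibration points
t <- qt(prob + (1 - prob)/2, df = n - 2)  # e.g. qt(0.975, 8) for 10 points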
Re: [R] Help with predict.lm
Andy
Thanks, the link was very helpful. I haven't checked the code for the calib
function but it appears to work with my data. [NB the return function
generates a warning message with later versions of R and needs to be amended
to return a list.]

In the last line of my code, calibration.lm should have been lm.calibration,
but it still does not work with predict.lm. I presume the solution of
reversing the variables in the linear model to force predict.lm to work is
an invalid statistical method, as the regression line will be slightly
different, i.e.

predict(lm(conc ~ abs), new.abs)  # gives slightly different results than
                                  # the calib function

Thanks to all who replied,
Mike White

----- Original Message -----
From: "Liaw, Andy" <[EMAIL PROTECTED]>
To: "'Mike White'" <[EMAIL PROTECTED]>
Sent: Tuesday, April 19, 2005 1:53 PM
Subject: RE: [R] Help with predict.lm

> Will this help?
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/1202.html
>
> [Found by RSiteSearch("calibration") in R-2.1.0.]
>
> Andy
>
> > From: Mike White
> >
> > Hi
> > I have measured the UV absorbance (abs) of 10 solutions of a
> > substance at known concentrations (conc) and have used a linear
> > model to plot a calibration graph with confidence limits.
> > [rest of original post snipped; it appears in full below]
[R] Help with predict.lm
Hi
I have measured the UV absorbance (abs) of 10 solutions of a substance at
known concentrations (conc) and have used a linear model to plot a
calibration graph with confidence limits. I now want to predict the
concentration of solutions with UV absorbance results given in the new.abs
data.frame; however, predict.lm only appears to work for new "conc"
variables, not new "abs" variables.

I have searched the help files and did find a similar problem in June 2000,
but unfortunately no solution was offered. Any help on how to use predict.lm
with the new "abs" data to predict "conc" with confidence limits would be
appreciated.

conc <- seq(100, 280, 20)  # mg/l
abs <- c(1.064, 1.177, 1.303, 1.414, 1.534, 1.642, 1.744, 1.852, 1.936,
         2.046)  # absorbance units
lm.calibration <- lm(abs ~ conc)
pred.w.plim <- predict(lm.calibration, interval="prediction")
pred.w.clim <- predict(lm.calibration, interval="confidence")
matplot(conc, cbind(pred.w.clim, pred.w.plim[,-1]),
        lty=c(1,2,2,3,3), type="l", ylab="abs", xlab="conc mg/l")
points(conc, abs, pch=21, col="blue")

new.abs <- data.frame(abs=c(1.251, 1.324, 1.452))

predict(calibration.lm, new.abs)  # does not work

Thanks
Mike White
Re: [R] Extracting a numeric prefix from a string
Thanks for your contributions. Jonne's solution (after sorting) works fine
for my purposes, but it would be useful to have a function that works for
any numeric prefix. Another case to include would be a signed numeric:

x <- c("+12.3.abc", "-0.12xyz")

Mike

----- Original Message -----
From: "Peter Dalgaard" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "R user" <[EMAIL PROTECTED]>; "Mike White" <[EMAIL PROTECTED]>
Sent: Monday, January 31, 2005 11:05 PM
Subject: Re: [R] Extracting a numeric prefix from a string

> (Ted Harding) <[EMAIL PROTECTED]> writes:
>
> > On 31-Jan-05 R user wrote:
> > > You could use something like
> > >
> > > y <- gsub('([0-9]+(.[0-9]+)?)?.*','\\1',x)
> > > as.numeric(y)
> > >
> > > But maybe there's a much nicer way.
> > >
> > > Jonne.
> >
> > I doubt it -- full marks for neat regexp footwork!
>
> Hmm, I'd have to deduct a few points for forgetting to escape the dot...
>
> > x <- "2a4"
> > y <- gsub('([0-9]+(.[0-9]+)?)?.*','\\1',x)
> > y
> [1] "2a4"
> > as.numeric(y)
> [1] NA
> Warning message:
> NAs introduced by coercion
>
> and maybe a few more for using gsub() where sub() suffices.
>
> There are a few more nits to pick, since "2.", ".2", "2e-7" are also
> numbers, but ".", ".e-2" are not. In fact it seems quite hard even to
> handle all cases in, e.g.,
>
> x <- c("2.2abc","2.def",".2ghi",".jkl")
>
> with a single regular expression. The first one that worked for me was
>
> > r <- regexpr('^(([0-9]+\\.?)|(\\.[0-9]+)|([0-9]+\\.[0-9]+))',x)
> > substr(x,r,r+attr(r,"match.length")-1)
> [1] "2.2" "2."  ".2"  ""
>
> but several "obvious" attempts had failed.
>
> The problem is that regular expressions try to find the longest match,
> but not necessarily of subexpressions, so
>
> > sub('(([0-9]+\\.?)|(\\.[0-9]+)|([0-9]+\\.[0-9]+))?.*','\\1',x)
> [1] "2." "2." ".2" ""
>
> even though
>
> > sub('(([0-9]+\\.?)|(\\.[0-9]+)|([0-9]+\\.[0-9]+))','XXX',x)
> [1] "XXXabc" "XXXdef" "XXXghi" ".jkl"
>
> Actually, this one comes pretty close:
>
> > sub('([0-9]*(\\.[0-9]+)?)?.*','\\1',x)
> [1] "2.2" "2"   ".2"  ""
>
> It only loses a trailing dot, which is immaterial in the present
> context. However, next try extending the RE to handle an exponent
> part...
>
> --
>    O__  ---- Peter Dalgaard             Blegdamsvej 3
>   c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
>  (*) \(*) -- University of Copenhagen   Denmark   Ph:  (+45) 35327918
> ~~~~~~~~~~ - ([EMAIL PROTECTED])                  FAX: (+45) 35327907
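[Editor's note: for the signed case raised above, Peter's regexpr pattern can
simply be given an optional leading sign. A sketch, not from the thread.]

x <- c("+12.3.abc", "-0.12xyz", "2.4a", ".jkl")
r <- regexpr('^[+-]?(([0-9]+\\.?)|(\\.[0-9]+)|([0-9]+\\.[0-9]+))', x)
as.numeric(substr(x, 1, attr(r, "match.length")))
# 12.30 -0.12  2.40    NA   (the non-match coerces to NA with a warning)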
[R] Extracting a numeric prefix from a string
Hi
Does anyone know if there is a function similar to as.numeric that will
extract a numeric prefix from a string, as in the following examples?

x <- c(3, "abc", 5.67, "2.4a", "6a", "6b", "2.4.a", 3, "4.2a")
df.x <- data.frame(Code=x)
x.str <- levels(df.x[,1])

# required function result: 2.40 3.00 4.20 5.67 6.00 NA

Thanks
Mike White
[R] Ward clustering problem
I have a training set of data for known classes, with 5 observations of 12
variables for each class. I want to use this information to classify new
data into classes which are known to be different from those in the training
set, but each new class may contain one or more observations. The
distribution of within-class distances is expected to be similar for all
classes, and this is found to be the case for the training data.

I have tried using the maximum within-class distance for the training data
to set the h argument of cutree for the clustered new data. This appears to
work fine for the "average" and "complete" clustering methods but not for
the Ward method, as the distance axis of the dendrogram does not directly
relate to the distances between observations. Can anyone advise on how to
optimise the h value of cutree when using the Ward clustering method, or is
there a better approach to this type of classification problem?

Thanks
Mike White
[R] Problem with mclust surfacePlot function
I am trying to follow the mclust examples in "MCLUST: Software for
Model-Based Clustering, Density Estimation and Discriminant Analysis" by
Chris Fraley and Adrian Raftery, but I cannot reproduce the density and
uncertainty surfaces for the Lansing Woods maples. I am using R 1.8.1 with
the code below. The same code works fine in S-Plus 6.2. Am I missing
something, or is this a bug?

Thanks
Mike White

library(mclust)
data(lansing)  # R only
maples <- lansing[as.character(lansing[,"species"]) == "maple", -3]
maplesBIC <- EMclust(maples)
maplesModel <- summary(maplesBIC, maples)

plotMaples2 <- function(type, what, transformation) {
  out <- do.call("surfacePlot", c(maplesModel,
    list(data=maples, type=type, what=what, transformation=transformation)))
  invisible()
}

par(pty="s", mfrow=c(1,2))
plotMaples2(type="contour", what="density", transformation="log")
par(pty="s")
plotMaples2(type="contour", what="uncertainty", transformation="log")
[R] predict.lda problem with posterior probabilities
With predict.lda the posterior probabilities only relate to the existing
class definitions. This is fine for class definitions like gender, but it is
a problem when new data do not necessarily belong to an existing class. Is
there a classification method that gives posterior probabilities for class
membership and does not assume the new data must belong to one of the
existing classes? A new class would then be indicated by low posterior
probabilities for all the existing classes.

Thanks
Mike White
[R] Changing distance scale in plclust()
I want to plot the cluster trees from the results of hclust() on different
datasets, but all with the same distance scale, corresponding to the dataset
with the largest distance range. However, plclust() does not accept ylim
values. Does anyone know of a way around this problem?

Mike White
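[Editor's note: one workaround, not from the thread, is to convert each
hclust result to a dendrogram, whose plot method does accept ylim; using the
same ylim for every tree gives the common scale. Data are illustrative.]

hc1 <- hclust(dist(USArrests))
plot(as.dendrogram(hc1), ylim = c(0, 300))  # same ylim for every dataset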
Fw: [R] SIMCA algorithm implementation
I have used PCA for data classification by visual examination of the 3D
scatter plot of the first three principal components. I now want to use the
results to predict the class of new data. I have used predict.princomp to
predict the scores and then visualised the results on a 3D scatter plot
using the rgl library. However, is there an R function that will fit the new
data to the class assignments derived from PCA? I think this is similar to
what the SIMCA algorithm does.

Thanks
Mike

----- Original Message -----
From: "Mike White" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, October 08, 2003 9:25 AM
Subject: Re: [R] SIMCA algorithm implementation

> Dear All
> Is there a SIMCA (Soft Independent Modelling of Class Analogy)
> implementation in R, or does anyone know if it is possible to replicate
> the SIMCA algorithm using existing R functions?
>
> Thanks
> Mike White
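[Editor's note: the core of SIMCA can be improvised from prcomp: fit a
separate PCA per class, then score new samples by their orthogonal residual
distance to each class model. A rough sketch, not a full SIMCA
implementation; the function name is illustrative and the acceptance
threshold is omitted.]

simca.dist <- function(train, newdata, ncomp = 2) {
  pc <- prcomp(train)                                # per-class PCA model
  sc <- predict(pc, newdata)[, 1:ncomp, drop = FALSE]
  recon <- sc %*% t(pc$rotation[, 1:ncomp])          # back-project the scores
  recon <- sweep(recon, 2, pc$center, "+")
  sqrt(rowSums((as.matrix(newdata) - recon)^2))      # residual distance
}

# distance of test samples to each class model; smallest = best-fitting class
d.class <- sapply(split(iris[, 1:4], iris$Species),
                  simca.dist, newdata = iris[1:5, 1:4])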
Re: [R] SIMCA algorithm implementation
Dear All
Is there a SIMCA (Soft Independent Modelling of Class Analogy)
implementation in R, or does anyone know if it is possible to replicate the
SIMCA algorithm using existing R functions?

Thanks
Mike White
Re: [R] Rounding problem R vs Excel
Duncan
If the numbers are not represented exactly, how does R resolve problems like
the one below? Is there something that needs to be set in the R environment,
like the number of significant figures?

> x <- 4.145*100+0.5
> x
[1] 415
> floor(x)
[1] 414
> as.integer(x)
[1] 414
> trunc(x)
[1] 414

Mike White

----- Original Message -----
From: "Duncan Murdoch" <[EMAIL PROTECTED]>
To: "Hedderik van Rijn" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, June 03, 2003 2:40 AM
Subject: Re: [R] Rounding problem R vs Excel

> On Mon, 2 Jun 2003 20:50:20 -0400, you wrote:
>
> > Does this script do what you want?
> >
> > cround <- function(x, digits=0) {
> >   a <- ifelse(x>0, .5, -.5)
> >   if (digits==0) {
> >     floor(x+a)
> >   } else {
> >     m <- 10^digits
> >     floor(x*m+a)/m
> >   }
> > }
>
> No, the problem is that R uses binary formats, and some numbers aren't
> representable there. So for example,
>
> > cround(0.145, 2)
> [1] 0.14
>
> because 0.145 isn't representable exactly, and is actually being
> represented as 0.1449999... or something similar.
>
> Duncan Murdoch
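[Editor's note: the stored values can be inspected directly, which makes the
floor() results above unsurprising. Printed digits are approximate.]

sprintf("%.17f", 4.145)            # prints something like "4.14499999999999957"
sprintf("%.17f", 4.145*100 + 0.5)  # just under 415, hence floor() gives 414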
[R] Rounding problem R vs Excel
For numbers ending in 5, Excel always rounds up, and I want to reproduce
this in R rather than use the round function. I have tried:

x <- as.integer(x*100 + 0.5)/100

and

x <- floor(x*100 + 0.5)/100

However, some values of x cause problems, e.g.

x <- floor(4.145*100 + 0.5)/100
# 4.14

I have tried breaking it down into steps and forcing the results to 10
significant figures in the following function, which seems to work:

roundoff <- function(x){
  a <- signif(x*100 + 0.5, digits=10)
  floor(a)/100
}

Is there a more sensible way of doing this for all numbers?

Thanks
Mike
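[Editor's note: a generalised version of the same idea, illustrative and not
from the thread: round half away from zero for any number of digits, using a
tiny tolerance to step over binary representation error.]

round.excel <- function(x, digits = 0) {
  m <- 10^digits
  # the sqrt(.Machine$double.eps) nudge absorbs cases like 4.145*100 = 414.4999...
  sign(x) * floor(abs(x) * m + 0.5 + sqrt(.Machine$double.eps)) / m
}
round.excel(4.145, 2)  # 4.15
round.excel(0.145, 2)  # 0.15, matching Excel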