[R] embedFonts() and bounding box
Dear R gurus

I have a question regarding the function embedFonts(). I assume that the call to gs inside that function changes the bounding box of the EPS file. Is that intentional? Do I have to pass some gs options explicitly to avoid it, and if yes, how?

Thank you very much for your help.

Best regards,
Christoph Buser

## R example
postscript("test.eps", width = 14, height = 8, onefile = FALSE,
           horizontal = FALSE, paper = "special")
plot(1:10)
dev.off()
embedFonts(file = "test.eps", outfile = "test1.eps")

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/
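[Editor's sketch, not from the original thread: embedFonts() has an 'options' argument whose contents are passed on to gs, so one untested avenue is to hand gs a flag that respects the EPS bounding box. "-dEPSCrop" is a standard ghostscript flag that crops output to the %%BoundingBox; whether it resolves this particular case is an assumption, not a verified fix.]

## Untested sketch: pass extra ghostscript flags through embedFonts()
embedFonts(file = "test.eps", outfile = "test1.eps",
           options = "-dEPSCrop")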
Re: [R] Preconditions for a variance analysis
Dear David

I'd not recommend the tests (for normality, equal variances) as they are described on your second link (Wikipedia). I would use graphical tools instead, such as the Tukey-Anscombe plot (residuals against fitted values), the quantile (normal) plot, and the leverage plot. See also ?plot.lm for short descriptions of these plots in R.

Best regards,
Christoph

Daniel Tahin writes:
> Thanks for your answer. I don't have the book, but found something on
> the web:
> http://www.basic.northwestern.edu/statguidefiles/oneway_anova.html
> and
> http://en.wikipedia.org/wiki/Analysis_of_variance#Assumptions
>
> Seems to be the same on both of the sites :-)
> Is this what was meant?
>
> Thanks again,
> Daniel
>
> > Dear David
> >
> > Yes. There are assumptions that should be verified in an
> > analysis of variance. Without checking them, the results are not
> > reliable. I'd recommend e.g.
> >
> > Robert O. Kuehl, Design of Experiments: Statistical Principles
> > of Research Design and Analysis, Duxbury Press, 2000
> >
> > You will find a chapter about assumptions and how to check them
> > by residual analysis. And also
> >
> > W. N. Venables and B. D. Ripley, Modern Applied Statistics
> > with S, Springer-Verlag, New York, 2002
> >
> > in which you find residual analysis and how to obtain it in R.
> >
> > Best regards,
> > Christoph
> >
> > Daniel Tahin writes:
> > > Hello everybody,
> > >
> > > I'm currently using the anova() test for a small data.frame of 40
> > > rows and 2 columns. It works well, but are there any preconditions
> > > for a valid variance analysis that I should consider?
> > >
> > > Thank you for your answer,
> > > Daniel
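[Editor's sketch with made-up data: the graphical checks recommended above take only a few lines in R.]

set.seed(1)
d <- data.frame(y = rnorm(40), group = gl(4, 10))
fit <- aov(y ~ group, data = d)
plot(fit, which = 1:2)  # Tukey-Anscombe plot and normal QQ plot
plot(fit, which = 5)    # residuals against leverage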
Re: [R] Preconditions for a variance analysis
Dear David

Yes. There are assumptions that should be verified in an analysis of variance. Without checking them, the results are not reliable. I'd recommend e.g.

Robert O. Kuehl, Design of Experiments: Statistical Principles of Research Design and Analysis, Duxbury Press, 2000

You will find a chapter about assumptions and how to check them by residual analysis. And also

W. N. Venables and B. D. Ripley, Modern Applied Statistics with S, Springer-Verlag, New York, 2002

in which you find residual analysis and how to obtain it in R.

Best regards,
Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/

Daniel Tahin writes:
> Hello everybody,
>
> I'm currently using the anova() test for a small data.frame of 40
> rows and 2 columns. It works well, but are there any preconditions for
> a valid variance analysis that I should consider?
>
> Thank you for your answer,
> Daniel
Re: [R] Lines connecting the boxes in a boxplot
Dear Arne

I'd recommend saving the information of your boxplots:

a <- boxplot(...)
str(a)

Then you have the information you need about your boxplots (e.g. the value of the median) and can use segments() to draw the lines you want.

Hope this helps.

Best regards,
Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/

Arne Brutschy writes:
> Hello,
>
> I'm currently using a boxplot to visualize data for three different
> models. As I have three models, I'm plotting three parallel boxplots
> for each factor.
>
> This works fine - what I need now is a line connecting the medians of
> each boxplot of each model. I want to do this in order to visualize
> the trend that one of the models exhibits. Basically, I want to plot a
> curve for each model (slightly offset on the x axis), with a boxplot
> on each datapoint.
>
> It's only an idea, and I don't know if it's not too confusing after
> adding the lines... Is it possible? Has anyone done this before?
>
> Sorry if this has been asked before or is a standard feature, I simply
> have no clue how to name the feature I want. Ergo: I cannot search
> for it.. :\
>
> Regards,
> Arne
>
> PS: this is my current code
>
> require(gplots)
> boxwex <- 0.15
>
> data <- read.table("all_runs_fitness.data")
> colnames(data) <- c("model", "matrix", "fitness")
>
> boxplot(fitness ~ matrix,
>         data = data, boxwex = boxwex, at = (1:7 - 0.2),
>         main = "Fitness for Matrix/Models", xlab = "Matrixtype",
>         ylab = "Fitness", ylim = c(20, 100),
>         subset = (model == "dyn"), col = "lightblue", xaxt = "n",
>         whisklty = 1)
> boxplot(fitness ~ matrix,
>         data = data, boxwex = boxwex, at = 1:7, add = TRUE,
>         subset = (model == "dl3"), col = "mistyrose", xaxt = "n",
>         whisklty = 1)
> boxplot(fitness ~ matrix,
>         data = data, boxwex = boxwex, at = (1:7 + 0.2), add = TRUE,
>         subset = (model == "dl4"), col = "lightcyan", xaxt = "n",
>         whisklty = 1)
>
> axis(1, 1:8 - 0.5, labels = FALSE)
> axis(1, 1:7, tck = FALSE, labels = levels(data[, 2]))
>
> smartlegend(x = "left", y = "bottom", inset = 0.01,
>             c("dyn", "dl3", "dl4"),
>             fill = c("lightblue", "mistyrose", "lightcyan"))
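[Editor's sketch with toy data, to make the advice concrete: boxplot() invisibly returns its statistics, and row 3 of the $stats matrix holds the medians. The real plot would repeat this per model, with the corresponding x offset.]

set.seed(1)
d <- data.frame(fitness = rnorm(140, mean = 60, sd = 10),
                matrix = gl(7, 20))
b <- boxplot(fitness ~ matrix, data = d, at = 1:7 - 0.2, boxwex = 0.15)
## connect the medians of this series with a line
lines(1:7 - 0.2, b$stats[3, ], col = "blue")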
Re: [R] Data transformation for chi-square test.
Dear Charlie

dat <- data.frame(id = 1:10,
                  member = c(4,2,3,5,6,2,4,6,3,4),
                  cost = c(320,150,420,330,540,310,169,647,347,567))
dat[, "costF"] <- cut(dat[, "cost"], breaks = seq(100, 700, by = 100))
table(dat[, "costF"], dat[, "member"])

This should create the table you'd like.

Best regards,
Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/

Charlie Chi writes:
> Dear all R users,
>
> I am an IT student with little statistical background and a new R user
> with only two months' experience. I have a dataset named medcost,
> imported with read.table(), for example as follows (the real dataset
> has 500 cases). The header id means case id, member means members in a
> family, and cost is what the family pays in medical costs every 6
> months.
>
> id member cost
> 1  4      320
> 2  2      150
> 3  3      420
> 4  5      330
> 5  6      540
> 6  2      310
> 7  4      169
> 8  6      647
> 9  3      347
> 10 4      567
>
> I would like to use this dataset with a chi-square analysis to see if
> there is any relationship between family members and medical cost (do
> more members in a family raise its medical cost?). I have found the
> package called stats, but I think I need to transform the dataset into
> a contingency table, as I read from books. I am not sure if I am
> correct; I think the table should look like:
>
>            member
> cost       [2] [3] [4] [5] [6] Total
> [0,100]     1   0   0   0   0     1
> [100,200]   0   0   1   0   0     1
> [200,300]   0   0   0   0   0     0
> [300,400]   1   1   1   1   0     4
> [400,500]   0   1   0   0   0     1
> [500,600]   0   0   1   0   1     2
> [600,700]   0   0   0   0   1     1
> Total       2   2   3   1   2    10
>
> I did try to use the method in chapter 5.0 of "An Introduction to R"
> to create a frequency table, but it did not work. I am wondering if
> anyone can help me with it? Thank you for your help.
>
> Regards,
> Charlie
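[Editor's addition, continuing the reply's code: once the contingency table exists, the test itself is a single call. With only 10 toy cases the expected cell counts are small, so chisq.test() will warn that the chi-squared approximation may be inaccurate.]

tab <- table(dat[, "costF"], dat[, "member"])
chisq.test(tab)  # test of independence between cost class and family size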
Re: [R] prop.test or chisq.test ..?
Hi

Some comments are inline below.

Dylan Beaudette writes:
> Hi everyone,
>
> Suppose I have a count of the occurrences of positive results, and the
> total number of occurrences:
>
> pos <- 14
> total <- 15
>
> Testing that the proportion of positive occurrences is greater than
> 0.5 gives a p-value and confidence interval:
>
> prop.test(pos, total, p = 0.5, alternative = 'greater')
>
>         1-sample proportions test with continuity correction
>
> data:  14 out of 15, null probability 0.5
> X-squared = 9.6, df = 1, p-value = 0.0009729
> alternative hypothesis: true p is greater than 0.5
> 95 percent confidence interval:
>  0.706632 1.00
> sample estimates:
>     p
> 0.933

First of all, by default there is a continuity correction in prop.test(). If you use

prop.test(pos, total, p = 0.5, alternative = "greater", correct = FALSE)

        1-sample proportions test without continuity correction

data:  pos out of total, null probability 0.5
X-squared = 11.2667, df = 1, p-value = 0.0003946
alternative hypothesis: true p is greater than 0.5
95 percent confidence interval:
 0.7492494 1.000
sample estimates:
    p
0.933

remark that now the X-squared is identical to your result from chisq.test(), but the p-value is 0.0007891/2. The reason is that here you are testing against the alternative "greater". If you use a two-sided test,

prop.test(pos, total, p = 0.5, alternative = "two.sided", correct = FALSE)

then you reproduce the results from chisq.test().

> My question is how the use of chisq.test() differs from the above
> operation. For example:
>
> chisq.test(table(c(rep('pos', 14), rep('neg', 1))))
>
>         Chi-squared test for given probabilities
>
> data:  table(c(rep("pos", 14), rep("neg", 1)))
> X-squared = 11.2667, df = 1, p-value = 0.0007891
>
> ... gives slightly different results. Am I correct in interpreting
> that the chisq.test() function in this case is giving me a p-value
> associated with the test that the probabilities of pos are *different*
> from the probabilities of neg -- and thus a larger p-value than
> prop.test(..., p = 0.5, alternative = 'greater')?

Yes. In your example chisq.test() tests the null hypothesis that all population probabilities are equal. ?chisq.test says: "In this case, the hypothesis tested is whether the population probabilities equal those in 'p', or are all equal if 'p' is not given." That means p1 = p2 = 0.5 in your two-population case, against the alternative p1 != p2. This is the same as the test in prop.test() of p = 0.5 against p != 0.5, and therefore you get identical results if you choose alternative = "two.sided" in prop.test().

> I realize that this is a rather elementary question, and references to
> a text would be just as helpful. Ideally, I would like a measure of
> how much I can 'trust' that a larger proportion is also statistically
> meaningful. Thus far the results from prop.test() match my intuition,
> but affirmation would be great.

Your intuition was correct. Nevertheless, in your original results (with the continuity correction) the p-value of prop.test() (0.0009729) was larger than the p-value of chisq.test() (0.0007891), and therefore against your intuition.

> Cheers,
>
> --
> Dylan Beaudette
> Soils and Biogeochemistry Graduate Group
> University of California at Davis

Hope this helps,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/
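[Editor's addition: with n = 15 an exact test is a natural cross-check to the approximate tests discussed above.]

## Exact one-sided binomial test of H0: p = 0.5 vs. H1: p > 0.5
binom.test(14, 15, p = 0.5, alternative = "greater")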
Re: [R] the value of Delta
Or better:

slot(sum.sam.out, "row.sig.genes")

jing writes:
> Dear all,
>
> I am running R 2.4.1.
>
> > library(siggenes)
> > library(multtest)
> > cl <- rep(c(0,1), c(3,3))
> > sub <- exprs(AffyExpData[, c(1:3, 7:9)])
> > gn <- geneNames(AffyRAwData)
> > sam.out <- sam(sub, cl, rand = 123, gene.names = gn)
> We're doing 20 complete permutations
>
> > sam.out
> SAM Analysis for the Two-Class Unpaired Case Assuming Unequal Variances
>
>    Delta p0    False  Called FDR
> 1  0.1   0.929 292.25 293    0.927
> 2  0.4   0.929  43.60  56    0.724
> 3  0.7   0.929  12.25  20    0.569
> 4  1.0   0.929   7.25  14    0.481
> 5  1.3   0.929   2.60   7    0.345
> 6  1.7   0.929   1.30   5    0.242
> 7  2.0   0.929   1.30   5    0.242
> 8  2.3   0.929   0.45   2    0.209
> 9  2.6   0.929   0.45   2    0.209
> 10 2.9   0.929   0.45   2    0.209
>
> > sum.sam.out <- summary(sam.out, 1, ll = FALSE)
> > sum.sam.out$row.sig.genes
> NULL
> > sum.sam.out <- summary(sam.out, 0.1, ll = FALSE)
> > sum.sam.out$row.sig.genes
> NULL
> > sum.sam.out <- summary(sam.out, 0.001, ll = FALSE)
> > sum.sam.out$row.sig.genes
> NULL
> > sum.sam.out <- summary(sam.out, 1000, ll = FALSE)
> > sum.sam.out$row.sig.genes
> NULL
>
> For any value of Delta I chose (1000, 1, 0.1, 0.001), the outcome is
> NULL.
>
> Any help or suggestions that you can provide will be greatly
> appreciated.
>
> Qing
Re: [R] the value of Delta
I have not installed the package "siggenes", but from the manual on
http://www.ugrad.stat.ubc.ca/R/library/siggenes/
I assume it is programmed as an S4 class, since they use "@" instead of "$".

Hope this helps,
Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/

jing writes:
> Dear all,
>
> I am running R 2.4.1.
>
> > library(siggenes)
> > library(multtest)
> > cl <- rep(c(0,1), c(3,3))
> > sub <- exprs(AffyExpData[, c(1:3, 7:9)])
> > gn <- geneNames(AffyRAwData)
> > sam.out <- sam(sub, cl, rand = 123, gene.names = gn)
> We're doing 20 complete permutations
>
> > sam.out
> SAM Analysis for the Two-Class Unpaired Case Assuming Unequal Variances
>
>    Delta p0    False  Called FDR
> 1  0.1   0.929 292.25 293    0.927
> 2  0.4   0.929  43.60  56    0.724
> 3  0.7   0.929  12.25  20    0.569
> 4  1.0   0.929   7.25  14    0.481
> 5  1.3   0.929   2.60   7    0.345
> 6  1.7   0.929   1.30   5    0.242
> 7  2.0   0.929   1.30   5    0.242
> 8  2.3   0.929   0.45   2    0.209
> 9  2.6   0.929   0.45   2    0.209
> 10 2.9   0.929   0.45   2    0.209
>
> > sum.sam.out <- summary(sam.out, 1, ll = FALSE)
> > sum.sam.out$row.sig.genes
> NULL
> > sum.sam.out <- summary(sam.out, 0.1, ll = FALSE)
> > sum.sam.out$row.sig.genes
> NULL
> > sum.sam.out <- summary(sam.out, 0.001, ll = FALSE)
> > sum.sam.out$row.sig.genes
> NULL
> > sum.sam.out <- summary(sam.out, 1000, ll = FALSE)
> > sum.sam.out$row.sig.genes
> NULL
>
> For any value of Delta I chose (1000, 1, 0.1, 0.001), the outcome is
> NULL.
>
> Any help or suggestions that you can provide will be greatly
> appreciated.
>
> Qing
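[Editor's sketch of the difference between "$" extraction and S4 slot access; the class name "SamSummary" and its slot are hypothetical and do not come from siggenes.]

library(methods)
setClass("SamSummary", representation(row.sig.genes = "numeric"))
x <- new("SamSummary", row.sig.genes = c(5, 17, 23))
## x$row.sig.genes would fail: "$" is not defined for this S4 class
x@row.sig.genes            # direct slot access
slot(x, "row.sig.genes")   # the same, with the slot name as a string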
Re: [R] aov y lme
Dear Tomas

You can reproduce the results in Montgomery with lme. Please note that you should indicate the nesting with the levels of your nested factor. Therefore I recreated your data, but used 1,...,12 for the levels of batch instead of 1,...,4:

purity <- c( 1,-2,-2, 1,-1,-3, 0, 4, 0,-4, 1, 0,
             1, 0,-1, 0,-2, 4, 0, 3,-3, 2,-2, 2,
             2,-2, 1, 3, 4, 0,-1, 2, 0, 2, 2, 1)
suppli <- factor(c(rep(1,12), rep(2,12), rep(3,12)))
batch  <- factor(c(rep(1:4,3), rep(5:8,3), rep(9:12,3)))
material <- data.frame(purity, suppli, batch)

As you remarked, you can use aov:

summary(material.aov <- aov(purity ~ suppli + suppli:batch, data = material))

             Df Sum Sq Mean Sq F value  Pr(>F)
suppli        2 15.056   7.528  2.8526 0.07736 .
suppli:batch  9 69.917   7.769  2.9439 0.01667 *
Residuals    24 63.333   2.639
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Remark that the test of "suppli" is not the same as in Montgomery. Here it is wrongly tested against the residuals; you should instead calculate the test against the batch mean square:

(Fsuppli <- summary(material.aov)[[1]][1, "Mean Sq"] /
            summary(material.aov)[[1]][2, "Mean Sq"])
## p-value: upper tail of the F distribution
pf(Fsuppli, df1 = 2, df2 = 9, lower.tail = FALSE)

To use lme, the correct level numbering is now important to indicate the nesting. You should specify your random component as

random = ~1 | batch

If you use "random = ~1 | suppli/batch" instead, random components for batch AND suppli are included in the model, and you specify a model that incorporates suppli both as random and as fixed. Therefore the correct syntax is:

library(nlme)
material.lme <- lme(purity ~ suppli, random = ~1 | batch, data = material)
## You obtain the F-test for suppli using anova()
anova(material.lme)
summary(material.lme)
## Remark that in the summary output the random effects are reported as
## standard deviations, not variance components; square them to compare
## with Montgomery: 1.307622^2 = 1.71, 1.624466^2 = 2.64
## Or you can use:
VarCorr(material.lme)

I hope this helps you.

Best regards,
Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/

Tomas Goicoa writes:
> Dear R users,
>
> I am trying to reproduce the results in Montgomery D.C. (2001, chap.
> 13, example 13-1).
>
> Briefly, there are three suppliers, four batches nested within
> suppliers, and three determinations of purity (the response variable)
> on each batch. It is a two-stage nested design, where suppliers are
> fixed and batches are random:
>
> y_ijk = mu + tau_i + beta_j(nested in tau_i) + epsilon_ijk
>
> Here are the data,
>
> purity <- c( 1,-2,-2, 1,
>             -1,-3, 0, 4,
>              0,-4, 1, 0,
>              1, 0,-1, 0,
>             -2, 4, 0, 3,
>             -3, 2,-2, 2,
>              2,-2, 1, 3,
>              4, 0,-1, 2,
>              0, 2, 2, 1)
>
> suppli <- factor(c(rep(1,12), rep(2,12), rep(3,12)))
> batch <- factor(rep(c(1,2,3,4), 9))
>
> material <- data.frame(purity, suppli, batch)
>
> If I use the function aov, I get
>
> material.aov <- aov(purity ~ suppli + suppli:batch, data = material)
> summary(material.aov)
>              Df Sum Sq Mean Sq F value  Pr(>F)
> suppli        2 15.056   7.528  2.8526 0.07736 .
> suppli:batch  9 69.917   7.769  2.9439 0.01667 *
> Residuals    24 63.333   2.639
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> and I can estimate the variance component for the batches as
>
> (7.769 - 2.639)/3 = 1.71
>
> which is the way it is done in Montgomery, D.
>
> I want to use the function lme because I would like to make a
> diagnosis of the model, and I think it is more appropriate.
>
> Looking at Pinheiro and Bates, I have tried the following,
>
> library(nlme)
> material.lme <- lme(purity ~ suppli, random = ~1 | suppli/batch,
>                     data = material)
> VarCorr(material.lme)
>
>             Variance     StdDev
> suppli =    pdLogChol(1)
> (Intercept) 1.563785     1.250514
> batch =     pdLogChol(1)
> (Intercept) 1.709877     1.307622
> Residual    2.638889     1.624466
>
> material.lme
>
> Linear mixed-effects model fit by REML
>   Data: material
>   Log-restricted-likelihood: -71.42198
>   Fixed: purity ~ suppli
>   (Interce
Re: [R] mathematical symbols in plots
Dear Sebastian

plot(1:10, 1:10)
text(4, 9, expression(paste("<", k, ">")))

should work here.

Best regards,
Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/

Sebastian Weber writes:
> Hello everyone!
>
> I'm trying to plot some mathematical expression along my axis, but
> demo(plotmath) did not have the symbol I was looking for. In
> particular, I would like to denote the mean of an observable by
> writing <k>, which I tried to enter with
>
> expression(group("<", k, ">"))
>
> However, my naive try doesn't work and the help doesn't want to tell
> me; does someone know?
>
> And here another one: how can I specify which fonts get used when R
> prints those mathematical symbols? Since I finally include my plots in
> latex documents as EPS, I would love to use the same font encoding for
> all postscript stuff. A problem in the past has been that R embedded
> its own fonts within the ps files it generated. These were not
> compatible with the fonts used at the magazine where I published my
> document, which led to quite some confusion as \gamma became g and so
> on. Any solution to this problem? Any hint? As I'm not too much into
> font encoding, I have actually no real clue where to even start
> searching.
>
> Thank you very much for any help.
>
> Greetings,
> Sebastian Weber
Re: [R] how to compute p-value?
Dear Michael

It calls the function pt() to calculate the p-value based on your t statistic and the degrees of freedom. If you are interested in how it is calculated in detail, you can have a look into the source code:

/R-2.4.0/src/nmath/pt.c

There you can find the C code.

Hope this helps,
Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/

Michael writes:
> Hi all,
>
> I just want to understand how R computes the p-value in a simple
> linear regression model. The reason is that in Matlab, the function
> which "evaluates standard errors for multivariate normal regression"
> just provides estimates and standard errors, without giving a p-value.
>
> It computes the t statistic as follows:
>
> abs(beta_hat/std_beta_hat)
>
> How do I go further to get the p-value?
>
> Thanks
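[Editor's sketch with simulated data: the two-sided p-value is twice the upper tail of the t distribution at |t|, with the residual degrees of freedom.]

set.seed(1)
x <- rnorm(30)
y <- 2 * x + rnorm(30)
fit <- summary(lm(y ~ x))
tval <- fit$coefficients["x", "t value"]
df <- fit$df[2]                       # residual degrees of freedom
2 * pt(-abs(tval), df)                # two-sided p-value
fit$coefficients["x", "Pr(>|t|)"]     # matches summary()'s own value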
Re: [R] question on multilevel modeling
Dear Christine

I think the problem in your second model is that you are including "CORUMTO" both as a fixed effect and as a random effect.

Regards,
Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/

Christine A. Calmes writes:
> Hi,
>
> I am trying to run a multilevel model with time nested in people and
> people nested in dyads (3 levels of nesting) by initially running a
> series of models to test whether the slope/intercept should be fixed
> or random. The problem that I am experiencing appears to arise between
> the random intercept, fixed slope equation
>
> (syntax:
> rint <- lme(BDIAFTER ~ BDI + WEEK + CORUMTO,
>             random = ~1 | DYADID/PARTICIP, data = new)
> summary(rint))
>
> and the random slope, random intercept model
>
> (syntax:
> rslint <- lme(BDIAFTER ~ BDI + WEEK + CORUMTO,
>               random = ~CORUMTO | DYADID/PARTICIP, data = new)
> summary(rslint))
>
> at which point I obtain exactly the same results for each model,
> suggesting that one of the models is not properly specifying the slope
> or intercept. Or, I receive the following error message when I try to
> run the random slope/random intercept model:
>
> Error in solve.default(pdMatrix(a, fact = TRUE)) :
>   system is computationally singular: reciprocal condition number
>   = 6.77073e-017
>
> (Whether I receive an error message or the same results depends on the
> specific variables in the model.)
>
> It has been suggested that I may need to change the default starting
> values in the model because I may be approaching a boundary - is this
> a plausible explanation for my difficulties? If so, how do I do this
> in R, and can you refer me to a source that might highlight reasonable
> starting values? If this does not seem like the problem, any idea what
> the problem may be and how I might fix it?
>
> Thank you so much for your assistance,
> Christine Calmes
>
> Christine A. Calmes, MA
> Dept of Psychology
> University at Buffalo: The State University of New York
> Park Hall 216
> Buffalo, NY 14260
Re: [R] comparing 2 dataframes
Hi

Maybe this example can help you find your solution:

dat1 <- data.frame(CUSTOMER_ID = c("1000786BR", "1002047BR", "10127BR",
                                   "1004166834BR", "1004310897BR",
                                   "1006180BR", "10064798BR",
                                   "1007311BR", "1007621BR", "1008195BR",
                                   "10126BR", "95323994BR"),
                   CUSTOMER_RR = c("5+", "4", "5+", "2", "X", "4", "4",
                                   "5+", "4", "4-", "5+", "4"))
dat2 <- data.frame(CUSTOMER_ID = c("1200786BR", "1802047BR", "1027BR",
                                   "10166834BR", "107BR", "100BR",
                                   "164798BR", "1008195BR", "10126BR"),
                   CUSTOMER_RR = c("6+", "4", "1+", "2", "X", "4", "4",
                                   "4", "5+"))

## Merge, but only by "CUSTOMER_ID"
datM <- merge(dat1, dat2, by = "CUSTOMER_ID")
datM

## Keep only the cases whose CUSTOMER_RR agrees in both data frames
## (elementwise comparison; %in% would test set membership instead)
datM1 <- datM[as.character(datM[, "CUSTOMER_RR.x"]) ==
              as.character(datM[, "CUSTOMER_RR.y"]), ]
datM1

Regards,
Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/

Priya Kanhai writes:
> Hi,
>
> I've a question about comparing 2 dataframes, RRC_db1 and RRC_db2, of
> different length.
>
> For example:
>
> RRC_db1:
>
>    CUSTOMER_ID  CUSTOMER_RR
> 1  1000786BR    5+
> 2  1002047BR    4
> 3  10127BR      5+
> 4  1004166834BR 2
> 5  1004310897BR X
> 6  1006180BR    4
> 7  10064798BR   4
> 8  1007311BR    5+
> 9  1007621BR    4
> 10 1008195BR    4-
> 11 10126BR      5+
> 12 95323994BR   4
>
> RRC_db2:
>
>   CUSTOMER_ID CUSTOMER_RR
> 1 1200786BR   6+
> 2 1802047BR   4
> 3 1027BR      1+
> 4 10166834BR  2
> 5 107BR       X
> 6 100BR       4
> 7 164798BR    4
> 8 1008195BR   4-
> 9 10126BR     5+
>
> I want to pick the CUSTOMER_ID of RRC_db1 which also exists in
> RRC_db2:
>
> third <- merge(RRC_db1, RRC_db2)
> or
> third <- subset(RRC_db1, CUSTOMER_ID %in% RRC_db2$CUSTOMER_ID)
>
> But I also want to check if the CUSTOMER_RR is correct. I had tried
> this:
>
> test <- function(RRC_db1, RRC_db2)
> {
>   noteq <- c()
>   for (i in 1:length(RRC_db1$CUSTOMER_ID)) {
>     for (j in 1:length(RRC_db2$CUSTOMER_ID)) {
>       if (RRC_db1$CUSTOMER_ID[i] == RRC_db2$CUSTOMER_ID[j]) {
>         if (RRC_db1$CUSTOMER_RR[i] != RRC_db2$CUSTOMER_RR[j]) {
>           noteq <- c(noteq, RRC_db1$CUSTOMER_ID[i])
>         }
>       }
>     }
>   }
>   noteq
> }
>
> > test(RRC_db1, RRC_db2)
> Error in Ops.factor(RRC_db1$CUSTOMER_ID[i], RRC_db2$CUSTOMER_ID[j]) :
>   level sets of factors are different
>
> But then I got this error.
>
> I don't only want the CUSTOMER_ID to be the same but also the
> CUSTOMER_RR.
>
> Can you please help me?
>
> Thanks in advance.
>
> Regards,
> Priya
Re: [R] Simple question on a function
Dear Serguei

There might be more efficient ways, but this should work:

## Define the function that you want to optimize. In your case I copied
## your code, but included freq as a second argument:
fun <- function(x, freq) {
  (freq[1] - (1-x[1])*(1-x[2]) - x[3]*sqrt(x[1]*(1-x[1])*x[2]*(1-x[2])))^2/freq[1] +
  (freq[2] - (1-x[1])*x[2] + x[3]*sqrt(x[1]*(1-x[1])*x[2]*(1-x[2])))^2/freq[2] +
  (freq[3] - x[1]*(1-x[2]) + x[3]*sqrt(x[1]*(1-x[1])*x[2]*(1-x[2])))^2/freq[3] +
  (freq[4] - x[1]*x[2] - x[3]*sqrt(x[1]*(1-x[1])*x[2]*(1-x[2])))^2/freq[4]
}

## Define mat with values for freq (your code)
mat <- matrix(c(0.4, 0.1, 0.1, 0.4), byrow = TRUE, nrow = 10, ncol = 4)

## Use apply on mat
apply(mat, 1, function(freq, start)
        optim(start, fun, method = "BFGS", freq = freq)$par,
      start = c(0.1, 0.1, 0.1))

You can still use t() to transpose the result if you want the solutions by row instead of by column.

Please remark that in general optim() returns a list with several components, e.g. "convergence", which indicates whether optim has converged. Since you wanted a matrix, I only returned optim(...)$par. This can be dangerous, since the additional information gets lost; maybe it is better to save the output in a list. You can try

apply(mat, 1, function(freq, start)
        optim(start, fun, method = "BFGS", freq = freq),
      start = c(0.1, 0.1, 0.1))

to see the difference.

Hope this helps,
Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/

Serguei Kaniovski writes:
> I would like to apply the following function to the rows of the matrix
> "mat", so that freq[1],...,freq[4] are the four elements of each row.
>
> min_chi2 <- function(freq){
>   obj <- function(x){
>     (freq[1]-(1-x[1])*(1-x[2])-x[3]*sqrt(x[1]*(1-x[1])*x[2]*(1-x[2])))^2/freq[1]+
>     (freq[2]-(1-x[1])*x[2]+x[3]*sqrt(x[1]*(1-x[1])*x[2]*(1-x[2])))^2/freq[2]+
>     (freq[3]-x[1]*(1-x[2])+x[3]*sqrt(x[1]*(1-x[1])*x[2]*(1-x[2])))^2/freq[3]+
>     (freq[4]-x[1]*x[2]-x[3]*sqrt(x[1]*(1-x[1])*x[2]*(1-x[2])))^2/freq[4]
>   }
>   optim(c(0.1,0.1,0.1), obj, NULL, method = "BFGS")$par
> }
>
> mat <- matrix(c(0.4,0.1,0.1,0.4), byrow = TRUE, nrow = 10, ncol = 4)
>
> Questions:
> 1. How to do this using the "apply" function?
> 2. Can "optim" be used directly, i.e. without needing to define the
>    function "min_chi2"?
> 3. How to pass the vector of initial conditions (c(0.1,0.1,0.1)) as an
>    argument to "apply"?
>
> The output should be a 10x3 matrix containing 0.5 0.5 0.6 in each row.
>
> Thanks a lot,
> Serguei
>
> Austrian Institute of Economic Research (WIFO)
> http://www.wifo.ac.at/Serguei.Kaniovski
Re: [R] contrasts in aov
Dear John

?ordered will help you.

Regards,
Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/

John Vokey writes:
> useRs,
>   A no doubt simple question, but I am baffled. Indeed, I think I
> once knew the answer, but can't recover it. The default contrasts
> for aov (and lm, and ...) are contr.treatment and contr.poly for
> unordered and ordered factors, respectively. But how does one invoke
> the latter? That is, in a data.frame, how does one indicate that a
> factor is an *ordered* factor, such that contr.poly is invoked in the
> aov or lm call?
>
> -Dr. John R. Vokey
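[Editor's sketch with toy data: declaring the factor as ordered switches the default contrasts to contr.poly automatically.]

set.seed(1)
d <- data.frame(y = rnorm(12),
                dose = gl(3, 4, labels = c("low", "mid", "high")))
d$dose <- ordered(d$dose)      # or factor(..., ordered = TRUE)
contrasts(d$dose)              # polynomial contrasts: .L and .Q columns
coef(lm(y ~ dose, data = d))   # coefficients named dose.L and dose.Q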
Re: [R] Merge problem
Dear Tova

There is no reason why the merged data.frame should have exactly 8000 or fewer rows. The "all = FALSE" option only says that no new rows are generated for cases that appear in only one of the two data.frames. Have a look at this example:

> dat1 <- data.frame(a = c(1,2,3,4), b = letters[1:4])
> dat2 <- data.frame(a = c(1,2,3,4,5,6,7,8,1), b = LETTERS[1:9])
> merge(dat1, dat2, by = "a", all = FALSE)
  a b.x b.y
1 1   a   A
2 1   a   I
3 2   b   B
4 3   c   C
5 4   d   D

Since "1" appears twice in the larger data.frame, it is repeated, as the help page ?merge says: "If there is more than one match, all possible matches contribute one row each."

For comparison, have a look at what the option "all = TRUE" changes:

> merge(dat1, dat2, by = "a", all = TRUE)

Probably some rows in your large data frame have identical target ids and get repeated. It should be easy to check with unique().

Hope this helps,
Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/

Tova Fuller writes:
> Hello all,
>
> I have read as many merge issues as I possibly could tonight, and
> although I presume this is a small error, I have not found the
> solution to my problem.
>
> I'm trying to merge two data sets: dat0 and TransTable. As you can
> see below, dat0 has 8000 rows, whereas TransTable has 47296 rows. I
> would expect when I merge the two data sets, with all.x = F and
> all.y = F, that the intersection would yield 8000 rows, considering
> dat0 is a subset of TransTable.
>
> However, I get a neat little surprise when I check the dimensions of
> the resultant data frame: dat0merge, the merged data frame, has 8007
> rows! How can this be? Where did these extra 7 rows come from?
> This appears to defy logic!
>
> Thank you in advance for your help. I've put my code below for
> reference.
>
> Tova Fuller
>
> > dim(dat0)
> [1] 8000 60
> > dim(TransTable)
> [1] 47296 9
> > dat0merge <- merge(TransTable, dat0,
> +   by.x = "Target", by.y = "TargetID", all.x = F, all.y = F)
> > dim(dat0merge)
> [1] 8007 68
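[Editor's sketch, using the column names from the original post, for locating the keys responsible for the extra rows:]

## Target ids that occur more than once in TransTable ...
dup <- unique(TransTable$Target[duplicated(TransTable$Target)])
## ... restricted to ids that are also present in dat0:
## each of these contributes the surplus rows in the merge
intersect(dup, dat0$TargetID)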
Re: [R] lmer applied to a wellknown (?) example
Dear Henrik

There is an article in R News, "Fitting linear mixed models in R", in which you can find some examples of the syntax for nested and non-nested designs:

http://cran.r-project.org/doc/Rnews/Rnews_2005-1.pdf

Hope this helps,
Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/

Henrik Parn writes:
> Dear all,
>
> During my pre-R era I tried (yes, tried) to understand mixed models by
> working through the 'rat example' in Sokal and Rohlf's Biometry (2000)
> 3rd ed., p. 288-292. The same example was later used by Crawley (2002)
> in his Statistical Computing, p. 363-373, and I have seen the same
> data used elsewhere in the literature.
>
> Because this example is so thoroughly described, I thought it would be
> very interesting to analyse it also using lmer and to see how the
> different approaches and outputs differ - from the more or less manual
> old-school (?) approach in Sokal, to aov in Crawley, to mixed models
> by lmer.
>
> In the example, there are three treatments (Treatment) with two rats
> (Rat) each (i.e. six unique rats in total). Three liver preparations
> (Liver) are taken from each rat (i.e. 18 unique liver preparations),
> and two glycogen readings (Glycogen) are taken from each liver
> preparation (36 readings).
>
> We want to test if treatment has affected the glycogen levels. The
> readings are nested in preparations and the preparations nested in
> rats.
>
> The data can be found here (or on p. 289 in Sokal):
> http://www.bio.ic.ac.uk/research/mjcraw/statcomp/data/rats.txt
>
> I was hoping to use the rat example as some kind of reference on my
> way to understanding mixed models and using lmer. However, first I
> wish someone could check my suggested models!
>
> My suggestions:
>
> attach(rats)
> rats$Glycogen <- as.numeric(Glycogen)
> rats$Treatment <- as.factor(Treatment)
> rats$Rat <- as.factor(Rat)
> rats$Liver <- as.factor(Liver)
> str(rats)
>
> model1 <- lmer(Glycogen ~ Treatment + (1|Liver) + (1|Rat), data = rats)
> summary(model1)
>
> Was that it?
>
> I also tried to make the 'liver-in-rat' nesting explicit (as suggested
> in 'Examples from ...'):
>
> model2 <- lmer(Glycogen ~ Treatment + (1|Rat:Liver) + (1|Rat),
>                data = rats)
> summary(model2)
>
> but then the random effects differ from model1.
>
> Does the non-unique coding of rats and preparations in the original
> data set mess things up? Do I need to recode the ids to unique
> levels ...
>
> rats$rat2 <- as.factor(rep(1:6, each = 6))
> rats$liver2 <- as.factor(rep(1:18, each = 2))
> str(rats)
>
> ... and then:
>
> model3 <- lmer(Glycogen ~ Treatment + (1|liver2) + (1|rat2), data = rats)
> # or maybe
> model3 <- lmer(Glycogen ~ Treatment + (1|rat2:liver2) + (1|rat2),
>                data = rats)
>
> Can anyone help me to get this right! Thanks in advance!
>
> P.S.
> Thanks to all contributors to the lme/lmer topic on the list (yes, I
> have searched for 'lmer nested' ...) and also the examples provided by
> John Fox's 'Linear mixed models, Appendix to An R and S-PLUS
> Companion ...' and Douglas Bates' 'Examples from Multilevel
> Software ...' and R News 5/1. Very helpful, but as usual I bet I
> missed something ... Sorry.
>
> Regards,
> Henrik
>
> --
> Henrik Pärn
> Department of Biology
> NTNU
> 7491 Trondheim
> Norway
Re: [R] removing intercept from lm() results in oddly high Rsquared
Dear Joseph

Have a look at the questions and answers at the two links below, where this topic has been discussed:

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/68905.html
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/6943.html

Best regards,
Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/

Joseph LeBouton writes:
> Can anyone help me understand why an lm model summary would return an
> r.squared of ~0.18 with an intercept term, and an r.squared of ~0.98
> without the intercept? The fit is NOT that much better, according to
> plot.lm: residuals are similar between the two models, and a plot of
> observed x predicted is almost identical.
>
> Thanks,
>
> -Joseph
>
> Joseph P. LeBouton
> Forest Ecology PhD Candidate
> Department of Forestry
> Michigan State University
> East Lansing, Michigan 48824
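[Editor's addition, summarizing the mechanism behind those links: without an intercept, summary.lm() computes R^2 relative to the uncentered total sum of squares sum(y^2) instead of sum((y - mean(y))^2), so R^2 inflates whenever mean(y) is far from zero. A small simulation:]

set.seed(1)
x <- runif(50, 100, 110)
y <- 50 + 0.1 * x + rnorm(50)      # weak linear signal, large mean
summary(lm(y ~ x))$r.squared       # small: weak relationship
summary(lm(y ~ x - 1))$r.squared   # near 1 for the same data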
Re: [R] Help
Dear Claire

Thank you for providing an example, but it is not totally clear to me how you want to transform your data. For example, in row 1 you have the value 1 in columns g and n, but the transformed results are not the same: "A G" versus "A A". There is a similar problem for columns h and i: they both transform to "G G", although the original values 0 and 2 differ.

Furthermore, it is not clear whether the result of the transformation should be a vector with two elements -> "A" "G", or one character string "A G". In the first case you cannot work with data.frames or matrices, but must use a list: an entry of a data.frame is always a single element.

Regarding changing elements if they fulfil a condition, it is not necessary to work with loops. E.g. for a vector,

v[v == 5] <- 7

changes all values of v which are 5 to 7. Please read "An Introduction to R" on the R home page about indexing vectors, matrices, and so on.

The following code shows an example of how I understood your problem and one possible solution. I hope it helps you.

## Create data.frame
DF <- data.frame(a = c(14,58), b = c(24,42), c = c("rcvf", "grde"),
                 d = c("AG","AC"), e = c(5,2), f = c(2,5), g = c(1,0),
                 h = c(0,5), i = c(2,1), n = c(1,0))
## Work with a matrix to avoid problems with factors
DF <- as.matrix(DF)
## Transformation function
myfun <- function(v) {
  v[v == "5"] <- "00"
  v[v == "1"] <- v["d"]
  v[v == "0"] <- paste(rep(substr(v["d"], 1, 1), 2), collapse = "")
  v[v == "2"] <- paste(rep(substr(v["d"], 2, 2), 2), collapse = "")
  v
}
## Apply the transformation function to all rows
result <- t(apply(DF, 1, myfun))

One final point: providing examples is a good way to obtain fast help from others. Could you please also give some R code in the example, to ease the handling for others? E.g. if you provide an example data.frame, please include the code that produces it, as in the "Create data.frame" snippet above.

Have a nice week and enjoy further working with R.

Regards,
Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/

claire pujoll writes:
> Dear R members,
>
> Sorry for this question. I have a dataframe (please see the example)
> and I should do the following transformation:
>
> DF
> a  b  c    d  e f g h i n
> 14 24 rcvf AG 5 2 1 0 2 1
> 58 42 grde AC 2 5 0 5 1 0
>
> I should transform my DF like:
>
> a  b  c    d    e    f    g    h    i    n
> 14 24 rcvf A G  0 0  G G  A G  G G  G G  A A
> 58 42 grde A C  C C  0 0  A A  0 0  A C  A A
>
> i.e.: when DF[i,j] == 5 => DF[i,j] <- 0 0
> When DF[i,j] == k (!= 0) I should look at the d column to do the
> transformation.
>
> DF is 20 * 1 so I can't use loops.
>
> Thanks for your help,
> Claire
Re: [R] KS Test Warning Message
Dear Justin

Ties means that you have identical values in "Year5.lm$residuals".

Please remark that you can have a large R^2 and still residuals that are not normally distributed. A large R^2 shows a strong linear relationship, but it does not say anything about the error distribution (see the example below). So to answer your question: yes, it can take away validity from your model if the residuals are not normally distributed; in particular, tests and confidence intervals for your parameters are based on the normality assumption.

I'd recommend verifying the model assumptions with graphical tools such as the qqplot, the Tukey-Anscombe plot, ... Try:

plot(Year5.lm)

The power of the KS test is quite small, and graphical tools will give you a hint about your true error distribution instead of only giving you a p-value that "tells you" that the errors are not normal.

set.seed(3)
x <- 1:100
## t-distributed errors
y <- x + rt(100, 2)
## Strong linear relationship
plot(x, y)
## High R^2 due to the strong linear relationship
summary(reg <- lm(y ~ x))
## The residuals are not normally distributed
qqnorm(resid(reg))
## Small power of the KS test: the violation of the model assumption
## is not detected
ks.test(resid(reg), "pnorm")

Best regards,
Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/

justin rapp writes:
> All,
>
> Happy World Cup and Wimbledon. This morning finds me with the first
> of my many daily questions.
>
> I am running a ks.test on residuals obtained from a regression model.
> I use this code:
>
> ks.test(Year5.lm$residuals, pnorm)
>
> and obtain this output:
>
>         One-sample Kolmogorov-Smirnov test
>
> data:  Year5.lm$residuals
> D = 0.7196, p-value < 2.2e-16
> alternative hypothesis: two.sided
>
> Warning message:
> cannot compute correct p-values with ties in:
> ks.test(Year5.lm$residuals, pnorm)
>
> I am wondering if anybody can tell me what this warning message means.
>
> Also, could anybody clarify how I could have a regression model with a
> high R-squared, roughly .67, but with nonnormal residuals? Does this
> take away from the validity of my model?
>
> jdr
Re: [R] nested mixed-effect model: variance components
Dear John

I've put your mail back on the R list since I have no explanation for the "lmer" result. Maybe someone else has an idea. I adapted it to show something else that I do not understand.

## Creation of the data (habitat is nested in lagoon):
set.seed(1)
dat <- data.frame(y = rnorm(100),
                  lagoon = factor(rep(1:4, each = 25)),
                  habitat = factor(rep(1:20, each = 5)))

## I do not understand how the random intercepts for lagoon and
## lagoon:habitat can both be estimated. It seems a little bit
## strange to me that they are identical (0.46565).
library(lme4)
summary(reg3 <- lmer(y ~ habitat + (1|lagoon) + (1|lagoon:habitat),
                     data = dat))

## Furthermore I do not understand why the standard errors for
## the fixed effect of habitat are 1.131 for some habitats
## and 1.487 for the others???

## If one removes (1|lagoon), the variance component
## (1|lagoon:habitat) does not change its value (still 0.46565)???
summary(reg3a <- lmer(y ~ habitat + (1|lagoon:habitat), data = dat))
## Now all standard errors for the fixed factor habitat are 1.131.

## Altogether it seems a little bit strange to me, and given the
## warnings and errors of the lme and aov calls, I'd be careful
## about using the output of lmer in that case. In addition I do
## not understand the interpretation of the random effect lagoon
## on top of the nested FIXED factor habitat.
summary(aov(y ~ habitat + Error(lagoon/habitat), data = dat))

detach(package:Matrix)
detach(package:lme4)
library(nlme)
summary(reg2 <- lme(y ~ habitat, random = ~1|lagoon/habitat, data = dat))
anova(reg2)

Best regards,
Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/

John wrote:
>> Christoph,
>>
>> I am sending this off-list because I tried 'lmer' and it seems to
>> work with your contrived data, but I don't know why. Can you explain
>> it?
>>
>> John
>>
>> > detach(package:nlme)
>> > library(Matrix)
>> > summary(lmer(y ~ habitat + (1|lagoon) + (1|lagoon:habitat),
>> +   data = dat))
>>
>> Linear mixed-effects model fit by REML
>> Formula: y ~ habitat + (1 | lagoon) + (1 | lagoon:habitat)
>>    Data: dat
>>      AIC      BIC    logLik MLdeviance REMLdeviance
>> 292.3627 349.6764 -124.1814   280.0245     248.3627
>> Random effects:
>>  Groups         Name        Variance Std.Dev.
>>  lagoon:habitat (Intercept) 0.46565  0.68239
>>  lagoon         (Intercept) 0.46565  0.68239
>>  Residual                   0.87310  0.93440
>> number of obs: 100, groups: lagoon:habitat, 20; lagoon, 4
>>
>> Fixed effects:
>>               Estimate   Std. Error  t value
>> (Intercept)  0.1292699   1.0516317   0.12292
>> habitat2     0.0058658   1.1316138   0.00518
>> habitat3    -0.0911469   1.1316138  -0.08055
>> habitat4     0.3302971   1.1316138   0.29188
>> habitat5    -0.0480394   1.1316138  -0.04245
>> habitat6    -0.4778469   1.4872319  -0.32130
>> habitat7    -0.0867301   1.4872319  -0.05832
>> habitat8     0.0696507   1.4872319   0.04683
>> habitat9    -0.0998728   1.4872319  -0.06715
>> habitat10    0.1096064   1.4872319   0.07370
>> habitat11   -0.0430979   1.4872319  -0.02898
>> habitat12    0.0714719   1.4872319   0.04806
>> habitat13    0.3380993   1.4872319   0.22733
>> habitat14    0.3057808   1.4872319   0.20560
>> habitat15   -0.4915582   1.4872319  -0.33052
>> habitat16   -0.2624539   1.4872319  -0.17647
>> habitat17   -0.2203461   1.4872319  -0.14816
>> habitat18    0.2165269   1.4872319   0.14559
>> habitat19    0.6932896   1.4872319   0.46616
>> habitat20   -0.7271468   1.4872319  -0.48893
>>
>> > anova(summary(lmer(y ~ habitat + (1|lagoon) + (1|lagoon:habitat),
>> +   data = dat)))
>> Analysis of Variance Table
>>         Df  Sum Sq Mean Sq
>> habitat 19 2.65706 0.13985
>>
>> > VarCorr(summary(lmer(y ~ habitat + (1|lagoon) + (1|lagoon:habitat),
>> +   data = dat)))
>> $`lagoon:habitat`
>> 1 x 1 Matrix of class "dpoMatrix"
>>             (Intercept)
>> (Intercept)   0.4656545
>>
>> $lagoon
>> 1 x 1 Matrix of class "dpoMatrix"
>>             (Intercept)
>> (Intercept)   0.4656545
>>
>> attr(,"sc")
>> [1] 0.9343993
>>
>> Christoph Buser wrote:
>>
>> Dear Eric
>>
>> Do you really have habitats nested within lagoons or are they
>> partially crossed (meaning that you have
Re: [R] nested mixed-effect model: variance components
Dear Eric

Do you really have habitats nested within lagoons, or are they partially crossed (meaning that you have the same habitats in different lagoons)?

If you have them perfectly nested, I think that you cannot calculate both a fixed effect for habitats and a random effect for lagoon (see the example below, lme and aov). You can compare e.g. two lagoons by defining a contrast of the habitats of one lagoon against the habitats of the other (if you think that this is a meaningful test to interpret), but you cannot estimate a random effect for lagoon in the presence of a nested FIXED effect habitat.

aov() will not return the test and will warn you about the singular model. lme() will estimate a variance component for lagoon, but does not provide a test for the fixed factor.

Regards,
Christoph Buser

set.seed(1)
dat <- data.frame(y = rnorm(100),
                  lagoon = factor(rep(1:4, each = 25)),
                  habitat = factor(rep(1:20, each = 5)))
summary(aov(y ~ habitat + Error(lagoon/habitat), data = dat))
library(nlme)
summary(lme(y ~ habitat, random = ~1|lagoon/habitat, data = dat))

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13, ETH Zurich, 8092 Zurich, Switzerland
http://stat.ethz.ch/~buser/

Eric Pante writes:
> Dear listers,
>
> I am trying to assess variance components for a nested, mixed-effects
> model. I think I got an answer that makes sense from R, but I have a
> warning message and I wanted to check that what I am looking at is
> actually what I need:
>
> My data are organized as transects within stations, stations within
> habitats, habitats within lagoons.
> Lagoons: random; habitats: fixed.
> The question is: how much variation is due to lagoons? habitats?
> lagoons*habitat? transects?
>
> Here is my code:
>
> res <- aov(COVER ~ HABITAT + Error(HABITAT + LAGOON + LAGOON/HABITAT),
>            data = cov)
> summary(res)
>
> and I get Sum Sq for each to calculate variance components:
>
> Error: STRATE
>        Df Sum Sq Mean Sq
> STRATE  5 4493.1   898.6
>
> Error: ATOLL
>           Df Sum Sq Mean Sq F value Pr(>F)
> Residuals  5 3340.5   668.1
>
> Error: STRATE:ATOLL
>           Df  Sum Sq Mean Sq F value Pr(>F)
> Residuals 18 2442.71  135.71
>
> Error: Within
>            Df Sum Sq Mean Sq F value Pr(>F)
> Residuals 145 6422.0    44.3
>
> My warning message seems to come from the LAGOON/HABITAT term; the
> Error is computed.
>
> Warning message: Error() model is singular in: aov(COVER ~ HABITAT +
> Error(HABITAT + LAGOON + LAGOON/HABITAT), data = cov)
>
> THANKS !!!
> eric
>
> Eric Pante
> College of Charleston, Grice Marine Laboratory
> 205 Fort Johnson Road, Charleston SC 29412
>
> "One does not force curiosity, one awakens it ..."
> Daniel Pennac
Re: [R] variable row names
Hi

I was confused by your example, too, and still am a little bit confused, so please excuse me if my example does not answer your question:

You have a data.frame "dat"

> dat <- data.frame(colname1 = 1:4, colname2 = 5:8)

You have a vector with the colnames of "dat"

> namen <- colnames(dat)

You want to access the values in dat by using the vector "namen"

> for (i in namen)
>   print(dat[,i])

or

> for (i in 1:length(namen))
>   print(dat[,namen[i]])

Does that answer your question or is it more complicated?

Best regards,

Christoph

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

COMTE Guillaume writes:
> Thanks for your answer,
>
> I wish to access colnames by using vars, like in my example nom = "toto";
> if I type toto in R, it will display the column named toto with the values
> that the column holds. I wish the same by typing nom, but in my case it
> shows the content of the var nom, not the values of the column named toto,
> but "toto".
>
> I can't be more accurate because I'm unable to formulate it another way.
>
> Another example:
>
> I've got a set of colnames:
>
> > nomalarme1
> [1] "AirFlowFailure"      "AirSensorFailure"    "GasLeek"
> [4] "SuddenPressureDrop"  "InletPressureToLow"  "PressureRiseFailure"
> [7] "HoseLeakTestFailure" "o50DCVoltageToLow"
>
> All of these names identify colnames (like AirFlowFailure).
>
> My problem is: if I do
>
> for (i in 1:length(nomalarme1))
> {
>   nomalarme1[i]
> }
>
> it will display the names of the columns, not the data. Is there something
> that can tell R to show the data that has nomalarme1[i] for colname, and
> not just the colname?
>
> Thanks for your answer. I've looked into S Poetry and it seems that it
> isn't explained there, but my "poor" English maybe has missed it...
>
> COMTE Guillaume
>
> -----Original Message-----
> From: Patrick Burns [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, May 16, 2006, 17:12
> To: COMTE Guillaume
>
> I don't understand your question, but perhaps the
> subscripting section of chapter 1 of S Poetry can
> help you.
>
> Patrick Burns
> [EMAIL PROTECTED]
> +44 (0)20 8525 0696
> http://www.burns-stat.com
> (home of S Poetry and "A Guide for the Unwilling S User")
>
> COMTE Guillaume wrote:
>
> >Hi all,
> >
> >I wish to use a variable as rownames for a set of data.
> >
> >For example:
> >
> >>nom<-"toto"
> >>prenom<-"tutu"
> >>res<-c(1,2)
> >>res<-t(res)
> >>res
> >     [,1] [,2]
> >[1,]    1    2
> >>colnames(res)<-c(nom,prenom)
> >>res
> >     toto tutu
> >[1,]    1    2
> >>nom
> >[1] "toto"
> >
> >I wish to call the rowname by the variable name, in this case it would
> >be < nom > or < prenom >, but if I do that it gives me the name of the
> >row, not the values of it.
> >
> >Is this possible or not?
> >
> >Thanks all.
> >
> >COMTE Guillaume
> >
> >[[alternative HTML version deleted]]
> >
> >__
> >R-help@stat.math.ethz.ch mailing list
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide!
> >http://www.R-project.org/posting-guide.html
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] t-test with autocorrelation correction
Dear Denis

I do not know the correction by Dale and Fortin, but assuming that it is closely related to page 15 of Cressie, 1993, it's probably similar. On page 15 of Cressie, the process is assumed to have an AR(1) structure. Therefore you could use the function gls() from the "nlme" package with this correlation structure.

library(nlme)

set.seed(1)
n <- 100
## Creating the data (AR(1) structure)
x <- rnorm(1, 0, 0.3)
for(i in 2:n)
  x[i] <- x[i-1]*0.5 + rnorm(1, 0, 0.3)

## Fitting a model (intercept only)
res1 <- gls(model = x ~ 1, correlation = corAR1(), method = "ML")
summary(res1)
>                  Value  Std.Error  t-value p-value
> (Intercept) 0.06358824 0.04800817 1.324530  0.1884

## The test for the intercept is a one-sample t-test of the
## null hypothesis mu = 0, using the correct standard error.
## Compare it to an ordinary t-test, whose standard error is
## too small because the autocorrelation is disregarded, and
## whose p-value is therefore too small as well.
t.test(x)
> t = 2.3069, df = 99, p-value = 0.02314

There might be easier ways to implement a t-test with autocorrelation. The advantage of this approach is that you can fit more general regression models with autocorrelation once you know the gls syntax.

Best regards,

Christoph

--
Credit and Surety PML study: visit our web page www.cs-pml.org
------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

[EMAIL PROTECTED] writes:
> Has anyone implemented a t-test with the effective sample size
> correction proposed by Dale and Fortin, Ecoscience 9(2):162-167, 2002,
> using a discussion by Cressie, 1993, page 15?
>
> thanks,
> Denis
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
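As a complement to the gls() approach above, the effective-sample-size correction itself is easy to compute by hand. A minimal sketch, assuming the AR(1) formula n' = n*(1-rho)/(1+rho) discussed in Cressie (1993, p. 15), with rho estimated by the lag-1 autocorrelation; it reuses x and n from the example above.

## hedged sketch: one-sample t-test with an assumed effective sample size
rho  <- acf(x, lag.max = 1, plot = FALSE)$acf[2]  # lag-1 autocorrelation
neff <- n * (1 - rho) / (1 + rho)                 # effective sample size
tval <- mean(x) / (sd(x) / sqrt(neff))            # adjusted t statistic
2 * pt(-abs(tval), df = neff - 1)                 # two-sided p-value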
Re: [R] summary.lme: argument "adjustSigma"
Dear Spencer

Thank you for your answer. You are correct. The "adjustSigma" argument is only used for the "ML" method. In the code there is:

stdFixed <- sqrt(diag(as.matrix(object$varFix)))
if (object$method == "ML" && adjustSigma == TRUE) {
    stdFixed <- sqrt(object$dims$N/(object$dims$N -
                 length(stdFixed))) * stdFixed
}

But my question is still open: looking into the code above, "stdFixed" is adjusted for the "ML" method if "adjustSigma" is TRUE. Therefore the standard errors and the test results for the fixed effects are affected. But I would expect the residual standard error to be adjusted, too. However, "object$sigma" of the lme object is unchanged, and therefore in the summary output the estimate of the residual standard error is identical, independent of the value of "adjustSigma".

To my understanding this is a discrepancy with the help page, which says:

adjustSigma: an optional logical value. If 'TRUE' and the estimation
     method used to obtain 'object' was maximum likelihood, the
     residual standard error is multiplied by sqrt(nobs/(nobs -
     npar)), converting it to a REML-like estimate. This argument is
     only used when a single fitted object is passed to the
     function. Default is 'TRUE'.

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Spencer Graves writes:
 > It appears that the "adjustSigma" argument of 'summary.lme' does
 > nothing with the default method, "REML". To check, I tried the
 > following modification of the 'summary.lme' example:
 >
 > fm1 <- lme(distance ~ age, Orthodont, random = ~ age | Subject)
 > fm1sa <- summary(fm1)
 > fm1s <- summary(fm1, adjustSigma=FALSE)
 > fm1saa <- summary(fm1, adjustSigma=TRUE)
 > all.equal(fm1s, fm1sa)
 > TRUE
 > all.equal(fm1s, fm1saa)
 > TRUE
 > #
 >
 > When I changed 'method' to "ML" in this example, the result suggested
 > that the adjustment to sigma also affected the standard errors and t
 > values for the fixed effects. If the p-values had not been 0, they also
 > would have been affected:
 >
 > fm2 <- lme(distance ~ age, Orthodont, random = ~ age | Subject, method="ML")
 > (fm2sa <- summary(fm2))
 > Linear mixed-effects model fit by maximum likelihood
 >  Data: Orthodont
 >       AIC      BIC    logLik
 >  451.2116 467.3044 -219.6058
 >
 > Random effects:
 >  Formula: ~age | Subject
 >  Structure: General positive-definite, Log-Cholesky parametrization
 >             StdDev    Corr
 > (Intercept) 2.1940998 (Intr)
 > age         0.2149245 -0.581
 > Residual    1.3100399
 >
 > Fixed effects: distance ~ age
 >                 Value Std.Error DF   t-value p-value
 > (Intercept) 16.76     0.7678975 80 21.827277       0
 > age          0.660185 0.0705779 80  9.353997       0
 >  Correlation:
 >     (Intr)
 > age -0.848
 >
 > Standardized Within-Group Residuals:
 >          Min           Q1          Med           Q3          Max
 > -3.305969237 -0.487429631  0.007597973  0.482237063  3.922789795
 >
 > Number of Observations: 108
 > Number of Groups: 27
 >
 > (fm2s <- summary(fm2, adjustSigma=FALSE))
 > Linear mixed-effects model fit by maximum likelihood
 >  Data: Orthodont
 >       AIC      BIC    logLik
 >  451.2116 467.3044 -219.6058
 >
 > Random effects:
 >  Formula: ~age | Subject
 >  Structure: General positive-definite, Log-Cholesky parametrization
 >             StdDev    Corr
 > (Intercept) 2.1940998 (Intr)
 > age         0.2149245 -0.581
 > Residual    1.3100399
 >
 > Fixed effects: distance ~ age
 >                 Value Std.Error DF  t-value p-value
 > (Intercept) 16.76     0.7607541 80 22.03223       0
 > age          0.660185 0.0699213 80  9.44183       0
 >  Correlation:
 >     (Intr)
 > age -0.848
 >
 > Standardized Within-Group Residuals:
 >          Min           Q1          Med           Q3          Max
 > -3.305969237 -0.487429631  0.007597973  0.482237063  3.922789795
 >
 > Number of Observations: 108
 > Number of Groups: 27
 >
 > Does this answer the question?
 > spencer graves
 >
 > Christoph Buser wrote:
 >
 > > Dear R-list
 > >
 > > I have a questio
[R] summary.lme: argument "adjustSigma"
Dear R-list

I have a question concerning the argument "adjustSigma" in the function "lme" of the package "nlme". The help page says:

"the residual standard error is multiplied by sqrt(nobs/(nobs - npar)), converting it to a REML-like estimate."

Having a look into the code I found:

stdFixed <- sqrt(diag(as.matrix(object$varFix)))
if (object$method == "ML" && adjustSigma == TRUE) {
    stdFixed <- sqrt(object$dims$N/(object$dims$N -
                 length(stdFixed))) * stdFixed
}
tTable <- data.frame(fixed, stdFixed, object$fixDF[["X"]],
                     fixed/stdFixed, fixed)

To my understanding, only the standard errors of the fixed coefficients are adjusted, and not the residual standard error. Therefore only the tTable of the output is affected by the argument "adjustSigma", but not the estimate of the residual standard error (see the artificial example below).

Could someone explain to me whether there is an error in my understanding of the help page and the R code? Thank you very much.

Best regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH Zurich 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Example ---

set.seed(1)
dat <- data.frame(y = rnorm(16), fac1 = rep(1:4, each = 4),
                  fac2 = rep(1:2, each = 8))
telme <- lme(y ~ fac1, data = dat, random = ~ 1 | fac2, method = "ML")
summary(telme)
summary(telme, adjustSigma = FALSE)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
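To see the adjustment the help page describes, one can also apply the factor by hand to the ML fit. A minimal sketch, assuming the telme object from the example above (fixef() and the dims$N slot are part of nlme):

## hedged sketch: REML-like residual standard error from the ML fit
N <- telme$dims$N                 # number of observations
p <- length(fixef(telme))         # number of fixed-effects parameters
telme$sigma * sqrt(N / (N - p))   # adjusted ("REML-like") sigma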
Re: [R] difftime arguments
Hi Fred

First of all, I have put the email back on the R-list. Please answer by reply to all (including the list). Others might have similar problems, and others might have additional proposals, respectively.

You should not confuse typeof() and class(). Try

> class(g)

and you will see that your object has the class that you have assumed. typeof() has to do with the internal storage mode of an object. See ?class and ?typeof for more details.

Finally, use

> as.numeric(g)

But if you want to remove the units, you should use the argument "units" in difftime(). Otherwise you can get strange results if the unit is chosen automatically and removed afterwards.

Regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Fred J. writes:
> z <- as.POSIXct(strptime(ts, "%m/%d/%Y %I:%M:%S %p"))
> z then has the format
> #"2006-01-12 16:00:00 Pacific Standard Time"
> one would expect
> typeof(z) would be an object of class POSIXct, but notice
> > typeof(z)
> [1] "double"
>
> now when I put
> > g <- difftime(z[11745], z[11746])
> > g
> Time difference of 43.61667 mins
> > typeof(g)
> [1] "double"
>
> how can I just have 43.61667 without the text stuff around it in the object "g"?
>
> Christoph Buser <[EMAIL PROTECTED]> wrote: "format" is not an argument of the function difftime(). It is an
> argument of the function as.difftime().
> Therefore I think that the direct way without converting the
> dates cannot be done, but I am not a specialist in time
> questions in R :-)
>
> Fred J. writes:
> > what about this
> >
> > > difftime(c("1/12/2006 3:59:45 PM", "1/12/2006 3:59:57 PM"), format ="%m/%d/%Y %I:%M:%S %p")
> > Error in difftime(c("1/12/2006 3:59:45 PM", "1/12/2006 3:59:57 PM"), format = "%m/%d/%Y %I:%M:%S %p") :
> > unused argument(s) (format ...)
> >
> > Christoph Buser wrote: Dear Fred
> >
> > You should change your code from
> > > x <- strptime(ts, "%m/%d/%y %I:%M:%S %p")
> > to
> > > x <- strptime(ts, "%m/%d/%Y %I:%M:%S %p")
> >
> > "Y" instead of "y", since your year includes the century
> > (2006 and not 06)
> >
> > Then it should work.
> >
> > Regards,
> >
> > Christoph
> >
> > --
> > Christoph Buser
> > Seminar fuer Statistik, LEO C13
> > ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
> > phone: x-41-44-632-4673 fax: 632-1228
> > http://stat.ethz.ch/~buser/
> > --
> >
> > Fred J. writes:
> > > Hi
> > > I just started using RGui.exe under Windows.
> > > I have a text file containing data arranged in columns and rows; each column has the same format, each row with different formats. 3 of the columns are something like this: 1/12/2006 3:59:45 PM
> > > I need to calculate the difference in seconds between 2 selected periods using their row's index
> > >
> > > My solution:
> > > Read the file in a data frame and collect the columns in concern in a variable.
> > > data1 <- read.table("C:\\path\\data.txt", header=TRUE)
> > > ts <- paste(data1[[1]], data1[[3]], data1[[7]])
> > >
> > > #ts now looks like this
> > > #... "1/12/2006 3:59:45 PM" "1/12/2006 3:59:57 PM" ...
> > >
> > > #now convert between character representations and object of classes "POSIXct"
> > > x <- strptime(ts, "%m/%d/%y %I:%M:%S %p")
> > >
> > > this last code line is putting out NA, what did I do wrong?
> > >
> > > After this then I do
> > > z <- as.POSIXct(x)
> > >
> > > Is my whole approach efficient?
> > Then I can
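Putting the advice together: a minimal sketch of extracting a bare number of seconds with the units argument of difftime(); the two timestamps are made up for illustration.

t1 <- as.POSIXct("2006-01-12 15:59:45")
t2 <- as.POSIXct("2006-01-12 16:00:57")
g  <- difftime(t2, t1, units = "secs")  # fix the unit explicitly
as.numeric(g)                           # 72, without the "Time difference" text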
Re: [R] difftime arguments
Dear Fred

You should change your code from

> x <- strptime(ts, "%m/%d/%y %I:%M:%S %p")

to

> x <- strptime(ts, "%m/%d/%Y %I:%M:%S %p")

"Y" instead of "y", since your year includes the century (2006 and not 06).

Then it should work.

Regards,

Christoph

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Fred J. writes:
> Hi
> I just started using RGui.exe under Windows.
> I have a text file containing data arranged in columns and rows; each column has the same format, each row with different formats. 3 of the columns are something like this: 1/12/2006 3:59:45 PM
> I need to calculate the difference in seconds between 2 selected periods using their row's index
>
> My solution:
> Read the file in a data frame and collect the columns in concern in a variable.
> data1 <- read.table("C:\\path\\data.txt", header=TRUE)
> ts <- paste(data1[[1]], data1[[3]], data1[[7]])
>
> #ts now looks like this
> #... "1/12/2006 3:59:45 PM" "1/12/2006 3:59:57 PM" ...
>
> #now convert between character representations and object of classes "POSIXct"
> x <- strptime(ts, "%m/%d/%y %I:%M:%S %p")
>
> this last code line is putting out NA, what did I do wrong?
>
> After this then I do
> z <- as.POSIXct(x)
>
> Is my whole approach efficient?
> Then I can use difftime, because using it without the above preparations gives this
>
> > difftime(c("1/12/2006 3:59:45 PM", "1/12/2006 3:59:57 PM"), format ="%m/%d/%y %I:%M:%S %p")
> Error in difftime(c("1/12/2006 3:59:45 PM", "1/12/2006 3:59:57 PM"), format = "%m/%d/%y %I:%M:%S %p") :
> unused argument(s) (format ...)
>
> Thanks
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
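For completeness, the corrected conversion end to end, as a short sketch (note that parsing "AM"/"PM" with %p can depend on the locale):

ts <- c("1/12/2006 3:59:45 PM", "1/12/2006 3:59:57 PM")
x  <- strptime(ts, "%m/%d/%Y %I:%M:%S %p")  # %Y because the year has a century
z  <- as.POSIXct(x)
difftime(z[2], z[1], units = "secs")        # Time difference of 12 secs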
Re: [R] lmer
Dear Erlend

For an answer to your first question about the degrees of freedom, have a look at:
https://stat.ethz.ch/pipermail/r-help/2006-January/016384.html

To answer the second question: I think that the "AGQ" method is not yet implemented, as the error message implies. "lmer" is a function that is still under development, and that needs some time.

Regards,

Christoph

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Erlend Myre Bremset writes:
> Hi!
> I've just updated my version of R, from 2.0.1 to 2.2.1.
> When I run my old lmer models in the new version, I get no p-values for the fixed factors; I got them with the older version of R.
> Also, when I try to run a binomial model in lmer with the AGQ method, I get the following message:
> Error in lmer(RECRUIT ~ H + I(H^2) + LNRHDAY + (1 | year:isl:nest) + (1 | :
>   method = "AGQ" not yet implemented for supernodal representation
>
> Erlend Myre Bremset
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] nls number of explanatory variables
Hi Harsh

As indicated in the answer to your first post, it is not so easy to debug code without the code itself.

1) Have you checked the names of your variables, so that you are sure they are correct? For example, unwanted whitespace in automatically created variable names (e.g. built with paste()) results in unexpected variable names:

dat <- data.frame(x1 = rnorm(10), "x 2" = rnorm(10), y = rnorm(10))
nls(y ~ beta*x1 + gamma*x2, data = dat, start = list(beta = 1, gamma = 1))
## fails: the variable name "x 2" has been changed to "x.2", so there is no x2
str(dat)

2) If you believe that nls cannot handle a certain number of variables, you can check that easily yourself: if your code works for the variables x1, x2, ..., x170, replace one variable by x181 and check if it works. If it does not work, the problem is not the number of variables.

In general (also as an answer to your first post, "Non-linear Regression: Error in eval(expr, envir, enclos)"), I would recommend debugging by starting with code that works and going stepwise until an error occurs. If there is a conflict between two or more variables, or a problem with over-parameterization, you have a better chance of finding it.

Regards,

Christoph

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Cal Stats writes:
> Hi..
>
> Is there a limit on the number of explanatory variables in nls?
>
> I have a dataframe with the column names x1, x2, ..., x300.
> When I run nls it gives the error: "x181 not found",
> though it does run when I have x1, x2, ..., x170 variables.
>
> Thanks
>
> Harsh
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
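One quick way to check point 1) for a long formula is to compare the variable names used in the formula against the columns of the data frame. A sketch using the toy model above; beta and gamma are the parameter names from that model and must be excluded:

f <- y ~ beta*x1 + gamma*x2
## names used in the formula that are neither columns nor parameters:
setdiff(all.vars(f), c(names(dat), "beta", "gamma"))   # returns "x2" here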
Re: [R] Help - lm, glm, aov results inconsistent with other statistical package
Dear Prof. Ripley

Please accept my apologies; I was a little too short in my first answer.

My guess is that the line equations have been produced by Ben (since JMP does not give me that line equation as output), using the coefficients from JMP with the wrong assumption that JMP uses the constraint that one coefficient for each factor is set to zero, instead of the true constraint that the sum of the coefficients is equal to zero. The following example shows what JMP does:

## create A with 2 levels - 2 and 4
A <- factor(c(rep(2,times=30), rep(4,times=42)))
set.seed(1)
## generate 72 random x points
x <- rnorm(72)
## create different slopes & intercepts for A=2 and A=4
## add a random error term
y <- c(x[(1:30)]+1, 2*x[(31:72)]-2) + rnorm(72, mean=0, sd=2)
dat <- data.frame(A, x, y)

options(contrasts = c("contr.sum", "contr.poly"))

## y_1 = mu + Beta*x                      (alpha_1 = 0, Beta_int_1 = 0)  [1]
## y_2 = mu + alpha + Beta*x + Beta_int*x                                [2]
## ([1] + [2])/2 = mu + alpha/2 + (Beta + Beta_int/2)*x                  [3]

test.lm1 <- lm(y~A*x, dat = dat)
## check the output
summary(test.lm1)

> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  -0.6397     0.2051  -3.119  0.00266 **
> A1            1.5912     0.2051   7.759 5.99e-11 ***
> x             1.6821     0.2226   7.555 1.41e-10 ***
> A1:x         -0.4372     0.2226  -1.964  0.05367 .
> ---
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In JMP, when I calculate a linear model, I get:

> Term              Estimate   Std error  t Ratio  Prob>|t|
> Intercept        -0.639704   0.205071   -3.12    0.0027
> A[2]              1.5260508  0.203069    7.51    <.001*
> x                 1.6821048  0.22264     7.56    <.001*
> A[2]*(x-0.14909) -0.437177   0.22264    -1.96    0.0537

## y_1 = mu + alpha/2 + Beta*x + Beta_int/2*(x - mean(x))   (1a)
## y_2 = mu - alpha/2 + Beta*x - Beta_int/2*(x - mean(x))   (2a)

mean(dat[,"x"])
> 0.1490923

1.5912 = 1.5260508 - (-0.437177)*0.1490923

Going back to the original post with the line equations:

y = 7072.09-1024.94 x (for A=0) and
y = 7875.58-970.088 x (for A=1)

from JMP. And from R I get

y = 6276.7-1259.8 x (for A=0) and
y = 7867.5-1150.1 x (for A=1).

If one tries to calculate the line equations in the belief that the coefficients are estimates using treatment contrasts, while in reality they are estimates using sum contrasts, I would expect the difference of the slopes of the wrong equations to be half the difference of the true line equations:

1024.94 - 970.088 = 54.852
(1259.8 - 1150.1)/2 = 54.85

Furthermore, I would expect to get the intercept of the first wrong equation by taking the mean of the two correct ones:

(7867.5 + 6276.7)/2 = 7072.1

Therefore I assume that JMP did not generate the wrong equations; rather, Ben misinterpreted the coefficients.

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Prof Brian Ripley writes:
 > Similar results for what?
 >
 > The original posting quoted fitted lines for the two groups, which do not
 > depend on the parametrization. If that is what you want, it is best to
 > parametrize the model directly to give them, which was one of Berwin's
 > alternatives. I don't have JMP, so do not know how to do that in JMP
 > (and I still think the solution here is to seek help on using JMP).
 >
 > On Wed, 1 Mar 2006, Christoph Buser wrote:
 >
 > > Dear Ben
 > >
 > > Berwin is correct in his answer about different
 > > parameterizations.
 > >
 > > After changing the contrasts in R from treatment to sum
 > >
 > > options(contrasts = c("contr.sum", "contr.poly"))
 > > test.lm <- lm(y~A*x)
 > > summary(test.lm)
 > >
 > > I got similar results to JMP. Be careful in doing a correct
 > > interpretation of your coefficients, especially when you have an
 > > interaction in your model.
 > >
 > > Regards,
 > >
 > > Christoph Buser
 > >
 > > --
 > > Christoph Buser <[EMAIL PROTECTED]>
 > > Seminar fuer Statistik, LEO C13
 > > ETH Zurich 8092 Zurich SWITZERLAND
 > > phone: x-41-44-632-4673 fax: 632-1228
 > > http://stat.ethz.ch/~buser/
 > -
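To make the reparametrization explicit: with two levels and sum contrasts, the per-group lines can be recovered directly from the coefficient vector. A short sketch using the test.lm1 fit from the example above (with contr.sum, the first level is coded +1 and the second level -1):

cf <- coef(test.lm1)   # (Intercept), A1, x, A1:x
## level "2" (first level):  intercept cf[1] + cf[2], slope cf[3] + cf[4]
## level "4" (second level): intercept cf[1] - cf[2], slope cf[3] - cf[4]
rbind("A=2" = c(intercept = cf[[1]] + cf[[2]], slope = cf[[3]] + cf[[4]]),
      "A=4" = c(intercept = cf[[1]] - cf[[2]], slope = cf[[3]] - cf[[4]]))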
Re: [R] Help - lm, glm, aov results inconsistent with other statistical package
Dear Ben

Berwin is correct in his answer about different parameterizations.

After changing the contrasts in R from treatment to sum

options(contrasts = c("contr.sum", "contr.poly"))
test.lm <- lm(y~A*x)
summary(test.lm)

I got similar results to JMP. Be careful in doing a correct interpretation of your coefficients, especially when you have an interaction in your model.

Regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Ben Ridenhour writes:
> Okay, I took the data to SAS and it gives me the same answer that R does. I don't know why JMP is giving me an incorrect answer, but it seems to be. (Either that or I have made the same mistake in SAS and R.) Any ideas what JMP might be doing?
>
> Ben
>
> ---
> Benjamin Ridenhour
> School of Biological Sciences
> Washington State University
> P.O. Box 644236
> Pullman, WA 99164-4236
> Phone (509)335-7218
>
> "Nothing in biology makes sense except in the light of evolution."
> -T. Dobzhansky
>
> ----- Original Message -----
> To: "Liaw, Andy" <[EMAIL PROTECTED]>; r-help@stat.math.ethz.ch
> Sent: Tuesday, February 28, 2006 10:50:18 PM
> Subject: Re: [R] Help - lm, glm, aov results inconsistent with other statistical package
>
> Alright, I'll try to give some sample code.
>
> # create A with 2 levels - 2 and 4
> A<-c(rep(2,times=30),rep(4,times=42))
> # make A a factor
> A<-as.factor(A)
> # generate 72 random x points
> x<-rnorm(72)
> # create different slopes & intercepts for A=2 and A=4
> # add a random error term
> y<-c(x[(1:30)]+1,2*x[(31:72)]-2)+rnorm(72,mean=0,sd=2)
> # use model y~A*x for lm (or glm)
> test.lm<-lm(y~A*x)
> # check the output
> summary(test.lm)
>
> This essentially creates something like my data set and uses the same model. In response to (1), I was just using 0/1 because these are codings for the 2 levels, correct? (i.e., when A=2 the dummy variable=0, when A=4 the dummy variable=1?). In response to (2), yes, I do want different slopes for the two categories (that is what I am interested in testing).
>
> If I export the data created above to JMP and run what I believe to be the same model, I get a different answer for my equations :(
>
> ---
> Benjamin Ridenhour
> School of Biological Sciences
> Washington State University
> P.O. Box 644236
> Pullman, WA 99164-4236
> Phone (509)335-7218
>
> "Nothing in biology makes sense except in the light of evolution."
> -T. Dobzhansky
>
> ----- Original Message -----
> From: "Liaw, Andy" <[EMAIL PROTECTED]>
> Sent: Tuesday, February 28, 2006 5:14:57 PM
> Subject: RE: [R] Help - lm, glm, aov results inconsistent with other statistical package
>
> 1. You have levels(A) as "2" and "4", yet you showed equations for A=0 and A=1?
>
> 2. y = A + X + A*X means you're allowing the different groups of A to have different slopes. Probably not what you intended.
>
> 3. It's probably best to provide a small sample of the data (and R code) so we know how you got what you got.
>
> Andy
>
> From: Ben Ridenhour
> >
> > Hello,
> >
> > I'm sure there must be a simple explanation for what I'm doing wrong, but I am stumped. I am a novice R user and this has shaken my confidence in what I'm doing! I am trying to run a simple ANCOVA using the model y~A*x, where y and x are continuous and A has two levels. Everything seems fine until I compare the output with what I get from another statistical package (JMP). JMP has the model y=A+x+A*x (this should be the same as what I specified to R, correct?). In comparison I get the line equations
> >
> > y = 7072.09-1024.94 x (for A=0) and
> > y = 7875.58-970.088 x (for A=1)
> >
> > from JMP. And from R I get
> >
> > y = 6276.7-1259.8
Re: [R] Extracting variance components from lmer
Hi Rick

There may be a better way, but the following should work:

attributes(vc.fit)$sc

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Rick DeShon writes:
> Hi All.
>
> I need a bit of help extracting the residual error variance from the VarCorr structure from lmer.
>
> # Here's a 2-way random effects model
> lmer.1 <- lmer(rating ~ (1|person)+(1|rater), data = dat)
>
> # Get the structure
> vc.fit <- VarCorr(lmer.1)
>
> # results in:
> $person
> 1 x 1 Matrix of class "dpoMatrix"
>             (Intercept)
> (Intercept)   0.7755392
>
> $rater
> 1 x 1 Matrix of class "dpoMatrix"
>             (Intercept)
> (Intercept)   0.2054469
>
> attr(,"sc")
> [1] 0.5051518
>
> # I can pull out the person and rater variance components easily enough. For example...
> vc.person <- [EMAIL PROTECTED]
>
> I'm sure it's simple, but I have not been able to grab the residual variance in the last matrix. I simply wish to assign the residual to a scalar variable. Any suggestions would be appreciated!
>
> Thanks!
>
> Rick DeShon
>
> --
> Rick DeShon
> 306 Psychology Building
> Department of Psychology
> Michigan State University
> East Lansing, MI 48824-1116
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
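Equivalently, attr() extracts the attribute directly. A short sketch of pulling out the residual variance as a scalar, assuming the lmer.1 fit from the question:

resid.sd  <- attr(VarCorr(lmer.1), "sc")  # residual standard deviation
resid.var <- resid.sd^2                   # residual variance as a scalar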
Re: [R] logical condition in vector operation
Dear Federico

From your example it is not clear to me what you would like to obtain. Please have a look at the slightly changed example below (I changed two values to show a potentially undesired side effect of your coding):

test <- data.frame(rbind(c(3,3,10,21,0),
                         c(2,3,11,12,0),
                         c(3,4,12,23,0),
                         c(3,5,13,24,0)))
names(test) <- c("x","y","p","d","su")
test
>>   x y  p  d su
>> 1 3 3 10 21  0
>> 2 2 3 11 12  0
>> 3 3 4 12 23  0
>> 4 3 5 13 24  0

j <- 3
test[test[,1] == j, 5] <- test[test[,1] == j,2] + test[test[,2] == j,1]
>> Warning message:
>> longer object length
>> is not a multiple of shorter object length in:
>> test[test[, 1] == j, 2] + test[test[, 2] == j, 1]

Your code example now produces a warning for the adapted data frame "test", since one tries to add two vectors of lengths 3 and 2, respectively. The result is based on recycling of the shorter vector. In your example there was no warning, since the second selection had only one entry. The result with the adapted data frame is:

test
>>   x y  p  d su
>> 1 3 3 10 21  6
>> 2 2 3 11 12  0
>> 3 3 4 12 23  6
>> 4 3 5 13 24  8

Is that kind of recycling desired in your application? Otherwise you should be careful with the coding example above.

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Federico Calboli writes:
> Hi All,
>
> I have a data frame such as:
>
> > test
>      x y  p  d
> [1,] 1 0 10 21 0
> [2,] 2 3 11 12 0
> [3,] 3 4 12 23 0
> [4,] 3 5 13 24 0
>
> and I want to perform some operations on the first two columns, conditional on the inequality of values in the 3rd and 4th columns.
>
> For instance:
>
> j = 3
> test[test[,1] == j, 5] = test[test[,1] == j,2] + test[test[,2] == j,1]
>
> gives me the result:
>
> test:
>      x y  p  d
> [1,] 1 0 10 21 0
> [2,] 2 3 11 12 0
> [3,] 3 4 12 23 6
> [4,] 3 5 13 24 7
>
> My problem is the following: I want to perform the operation
> test[test[,1] == j,2] + test[test[,2] == j,1] only if the values of
> column p and column d are different at the positions where x or y = j.
> In practice, I don't want to perform the first operation because
> test[2,4] is 12 and test[1,3] is 12 as well.
>
> I tried an if statement with little success:
>
> if(test[test[,1] == j,3] != test[test[,2] == j,4]){
>   test[test[,1] == j, 5] = test[test[,1] == j,2] + test[test[,2] == j,1]
> }
> Warning message:
> the condition has length > 1 and only the first element will be used in:
> if (test[test[, 1] == j, 3] != test[test[, 2] == j, 4]) {
>
> Could anyone lend some advice?
>
> Cheers,
>
> Federico
> --
> Federico C. F. Calboli
> Department of Epidemiology and Public Health
> Imperial College, St Mary's Campus
> Norfolk Place, London W2 1PG
>
> Tel +44 (0)20 7594 1602 Fax (+44) 020 7594 3193
>
> f.calboli [.a.t] imperial.ac.uk
> f.calboli [.a.t] gmail.com
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
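One way to make the recycling hazard explicit is to guard the assignment. A purely illustrative sketch, reusing test and j from the example above; whether recycling is acceptable still depends on the intended computation:

i1 <- test[, 1] == j
i2 <- test[, 2] == j
if (sum(i1) == sum(i2)) {          # same number of matches: no recycling
  test[i1, 5] <- test[i1, 2] + test[i2, 1]
} else {
  warning("unequal numbers of matches; recycling would occur")
}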
Re: [R] Mixed models and missing p-value...
Dear Simon

There was a discussion about degrees of freedom in linear mixed models on the R-list last week. Have a look at
https://stat.ethz.ch/pipermail/r-help/2006-January/085560.html
and following. There have been earlier discussions about that topic, too. Try also:

RSiteSearch("degree freedom mixed")

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Simon Blanchet writes:
> Dear R-users,
>
> I computed a simple mixed model, which was:
>
> mod <- lmer(nb ~ site + (1|patelle), tr)
>
> The output was:
>
> Linear mixed-effects model fit by REML
> Formula: nb ~ site + (1 | patelle)
>    Data: tr
>       AIC      BIC    logLik MLdeviance REMLdeviance
>  1157.437 1168.686 -574.7184   1164.523     1149.437
> Random effects:
>  Groups   Name        Variance Std.Dev.
>  patelle  (Intercept)  34.995   5.9157
>  Residual             744.736  27.2899
> # of obs: 123, groups: patelle, 33
>
> Fixed effects:
>             Estimate Std. Error t value
> (Intercept)  60.3483     4.3929 13.7378
> siteLCN     -20.1969     7.8070 -2.5870
> siteLCS     -18.2154     6.1514 -2.9612
>
> Correlation of Fixed Effects:
>         (Intr) sitLCN
> siteLCN -0.563
> siteLCS -0.714  0.402
>
> I don't understand why D.F. and p-values associated with the fixed-effects coefficients are missing. Could anyone help me?
>
> When I tried another model (mod2 <- lmer(nb ~ site + (1|patelle), tr, family=poisson)), D.F. and p-values were given...
>
> Thank you in advance.
>
> Very sincerely, Simon
>
> BLANCHET Simon
> PhD student
> Université Laval - Québec-Océan / CIRSA
> Pavillon Alexandre-Vachon
> Local 8022
> Québec (Québec), Canada G1K 7P4
> Phone: (418) 656-2131 ext. 8022
> email: [EMAIL PROTECTED]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Randomised Block Design
Hi

Try ?aov. That's a good starting point, with references to other functions in R and some literature.

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Krzysztof Suwada writes:
> Hi, I'm studying math, and I have to do an analysis using a Randomised Block Design. I have two factors. I know how to do this in Statistica, but how do I do this in R? I read some manuals, but the only thing that I have found was 2-factor ANOVA.
>
> Please could someone help me, or give some useful links?
>
> Krzysztof Suwada
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
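As a concrete starting point: a randomized block design is a standard additive aov() model with the block entering as a factor. A minimal sketch with made-up data (treatments crossed within each block):

set.seed(1)
dat <- data.frame(y     = rnorm(24),
                  treat = factor(rep(1:4, times = 6)),   # treatment factor
                  block = factor(rep(1:6, each  = 4)))   # blocking factor
summary(aov(y ~ treat + block, data = dat))
## with two treatment factors A and B: aov(y ~ A * B + block, data = dat)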
Re: [R] Type II SS for fixed-effect in mixed model
Dear Julien

Have a look at the "type" argument in ?anova.lme

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Martin Julien writes:
> Hi
> In a mixed model fitted with lme(), how can I obtain Type II SS or Type III SS for the fixed effects?
> Thanks
> Julien
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
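For illustration: anova.lme's type argument takes "sequential" (the default) or "marginal"; the marginal tests are what other packages usually label Type III. A short sketch with the Orthodont data shipped with nlme:

library(nlme)
fit <- lme(distance ~ age * Sex, data = Orthodont, random = ~ 1 | Subject)
anova(fit)                     # sequential ("Type I") tests
anova(fit, type = "marginal")  # marginal ("Type III"-like) tests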
Re: [R] linear contrasts with anova
Dear Marco

If you are interested in a comparison of a reference level against each other level (in your case level 1 against level 2 and level 1 against level 3), you can use summary.lm(), since this is the default contrast (see ?contr.treatment):

ar <- data.frame(GROUP = factor(rep(1:3, each = 8)),
                 DIP = c(3.0, 3.0, 4.0, 5.0, 6.0, 7.0, 3.0, 2.0,
                         1.0, 6.0, 5.0, 7.0, 2.0, 3.0, 1.5, 1.7,
                         17.0, 12.0, 15.0, 16.0, 12.0, 23.0, 19.0, 21.0))
r.aov10 <- aov(DIP ~ GROUP, data = ar)
anova(r.aov10)
summary.lm(r.aov10)

As a result you will get the comparison of GROUP 2 against GROUP 1, denoted by GROUP2, and the comparison of GROUP 3 against GROUP 1, denoted by GROUP3.

Be careful: in your example you include both GROUP and C1 (or C2, respectively) in your model. This results in an over-parameterized model, and you get a warning that not all coefficients have been estimated, due to singularities.

It is possible to use contrasts other than contr.treatment, contr.sum, contr.helmert, and contr.poly, but then you have to specify them correctly in the model.

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Posta Univ. Cagliari writes:
> I have some doubts about the validity of my procedure to estimate linear contrasts in a factorial design.
> For the sake of simplicity, let's imagine a one-way ANOVA with three levels. I am interested in testing the significance of the difference between the first and third level (called here contrast C1) and between the first and the second level (called here contrast C2). I used the following procedure:
>
> --- reading data from a text file ---
>
> > ar <- read.table("C:/Programmi/R/myworks/contrasti/cont1.txt", header=TRUE)
> > ar
>      CC GROUP
> 1   3.0     0
> 2   3.0     0
> 3   4.0     0
> 4   5.0     0
> 5   6.0     0
> 6   7.0     0
> 7   3.0     0
> 8   2.0     0
> 9   1.0     1
> 10  6.0     1
> 11  5.0     1
> 12  7.0     1
> 13  2.0     1
> 14  3.0     1
> 15  1.5     1
> 16  1.7     1
> 17 17.0     2
> 18 12.0     2
> 19 15.0     2
> 20 16.0     2
> 21 12.0     2
> 22 23.0     2
> 23 19.0     2
> 24 21.0     2
>
> --- creating a new array of data ---
>
> > ar <- data.frame(GROUP=factor(ar$GROUP), DIP=ar$CC)
> > ar
>    GROUP  DIP
> 1      0  3.0
> 2      0  3.0
> 3      0  4.0
> 4      0  5.0
> 5      0  6.0
> 6      0  7.0
> 7      0  3.0
> 8      0  2.0
> 9      1  1.0
> 10     1  6.0
> 11     1  5.0
> 12     1  7.0
> 13     1  2.0
> 14     1  3.0
> 15     1  1.5
> 16     1  1.7
> 17     2 17.0
> 18     2 12.0
> 19     2 15.0
> 20     2 16.0
> 21     2 12.0
> 22     2 23.0
> 23     2 19.0
> 24     2 21.0
>
> --- creating two dummy variables (C1 and C2) for linear contrasts ---
>
> > ar <- data.frame(GROUP=factor(ar$GROUP), C1=factor(ar$GROUP), C2=factor(ar$GROUP), DIP=ar$DIP)
> > ar
>    GROUP C1 C2  DIP
> 1      0  0  0  3.0
> 2      0  0  0  3.0
> 3      0  0  0  4.0
> 4      0  0  0  5.0
> 5      0  0  0  6.0
> 6      0  0  0  7.0
> 7      0  0  0  3.0
> 8      0  0  0  2.0
> 9      1  1  1  1.0
> 10     1  1  1  6.0
> 11     1  1  1  5.0
> 12     1  1  1  7.0
> 13     1  1  1  2.0
> 14     1  1  1  3.0
> 15     1  1  1  1.5
> 16     1  1  1  1.7
> 17     2  2  2 17.0
> 18     2  2  2 12.0
> 19     2  2  2 15.0
> 20     2  2  2 16.0
> 21     2  2  2 12.0
> 22     2  2  2 23.0
> 23     2  2  2 19.0
> 24     2  2  2 21.0
>
> --- selecting the contrast levels ---
>
> > ar$C1 <- C(ar$C1, c(1,0,-1), how.many = 1)
> > ar$C2
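If the goal is simply pairwise tests against a reference level, treatment contrasts already give them (as in the summary.lm output above), and relevel() is the cleanest route to a different reference level. A short sketch, reusing the ar data frame from the reply above:

## compare against level "2" instead of level "1"
ar$GROUP2 <- relevel(ar$GROUP, ref = "2")
summary.lm(aov(DIP ~ GROUP2, data = ar))
## coefficients: level 1 minus level 2, and level 3 minus level 2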
Re: [R] F-test degrees of freedom in lme4?
Dear Wilhelm

There is an article, including a part about fitting linear mixed models, in which the problem with the degrees of freedom is described. You can have a look at the second link, too, which also discusses the problem.

http://cran.r-project.org/doc/Rnews/Rnews_2005-1.pdf
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/67414.html

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Wilhelm B. Kloke writes:
> I have a problem moving from multistratum aov analysis to lmer.
>
> My dataset has observations of ampl at 4 levels of gapf and 2 levels of bl on 6 subjects (levels of VP), with 2 replicates wg each, and is balanced.
>
> Here is the summary of this set with aov:
>
>> summary(aov(ampl~gapf*bl+Error(VP/(bl*gapf)),hframe2))
>
> >Error: VP
> >          Df Sum Sq Mean Sq F value Pr(>F)
> >Residuals  5    531     106
> >
> >Error: VP:bl
> >          Df Sum Sq Mean Sq F value Pr(>F)
> >bl         1   1700    1700    37.8 0.0017 **
> >Residuals  5    225      45
> >---
> >Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> >Error: VP:gapf
> >          Df Sum Sq Mean Sq F value  Pr(>F)
> >gapf       3    933     311    24.2 5.3e-06 ***
> >Residuals 15    193      13
> >---
> >Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> >Error: VP:bl:gapf
> >          Df Sum Sq Mean Sq F value Pr(>F)
> >gapf:bl    3   93.9    31.3    3.68  0.036 *
> >Residuals 15  127.6     8.5
> >---
> >Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> >Error: Within
> >          Df Sum Sq Mean Sq F value Pr(>F)
> >Residuals 48    318       7
>
> This is mostly identical to the analysis by BMDP 4V, except for the Greenhouse-Geisser epsilons, which are not estimated this way.
>
> I have to analyse a similar dataset, which is not balanced. So I need to change the method. Following Pinheiro/Bates p. 90f, I tried
>
> > hf2.lme <- lme(ampl~gapf*bl,hframe2,random=list(VP=pdDiag(~gapf*bl),bl=pdDiag(~gapf)))
>
> and some variations of this to get the same F tests generated. At least, I got the F-test on error stratum VP:bl this way, but not the other two:
>
> >> anova(hf2.lme)
> >            numDF denDF F-value p-value
> >(Intercept)     1    78  764.86  <.0001
> >gapf            3    78   17.68  <.0001
> >bl              1     5   37.81  0.0017
> >gapf:bl         3    78    2.99  0.0362
>
> Then I tried to move to lmer. I tried to find something equivalent to the above lme call, with no success at all.
> > In case, that the problem is in the data, here is the set: > > VP ampl wg bl gapf > 1 WJ 22 w s 144 > 2 CR 23 w s 144 > 3 MZ 25 w s 144 > 4 MP 34 w s 144 > 5 HJ 36 w s 144 > 6 SJ 26 w s 144 > 7 WJ 34 w s 80 > 8 CR 31 w s 80 > 9 MZ 33 w s 80 > 10 MP 36 w s 80 > 11 HJ 37 w s 80 > 12 SJ 32 w s 80 > 13 WJ 34 w s 48 > 14 CR 37 w s 48 > 15 MZ 38 w s 48 > 16 MP 38 w s 48 > 17 HJ 40 w s 48 > 18 SJ 32 w s 48 > 19 WJ 36 w s 16 > 20 CR 40 w s 16 > 21 MZ 39 w s 16 > 22 MP 40 w s 16 > 23 HJ 40 w s 16 > 24 SJ 38 w s 16 > 25 WJ 16 g s 144 > 26 CR 28 g s 144 > 27 MZ 18 g s 144 > 28 MP 33 g s 144 > 29 HJ 37 g s 144 > 30 SJ 28 g s 144 > 31 WJ 28 g s 80 > 32 CR 33 g s 80 > 33 MZ 24 g s 80 > 34 MP 34 g s 80 > 35 HJ 36 g s 80 > 36 SJ 30 g s 80 > 37 WJ 32 g s 48 > 38 CR 38 g s 48 > 39 MZ 34 g s 48 > 40 MP 37 g s 48 > 41 HJ 39 g s 48 > 42 SJ 30 g s 48 > 43 WJ 36 g s 16 > 44 CR 34 g s 16 > 45 MZ 36 g s 16 > 46 MP 40 g s 16 > 47 HJ 40 g s 16 > 48 SJ 36 g s 16 > 49 WJ 22 w b 144 > 50 CR 24 w b 144 > 51 MZ 20 w b 144 > 52 MP 26 w b 144 > 53 HJ 22 w b 144 > 54 SJ 16 w b 144 > 55 WJ 26 w b 80 > 56 CR 24 w b 80 > 57 MZ 26 w b 80 > 58 MP 27 w b 80 > 59 HJ 26 w b 80 > 60 SJ 18 w b 80 > 61 WJ 28 w b 48 > 62 CR 23 w b 48 > 63 MZ 28 w b 48 > 64 MP 29 w b 48 > 65 HJ 27 w b 48 > 66 SJ 24 w b 48 > 67 WJ 32 w b 16 >
Re: [R] Difficulty with 'merge'
Dear Michael

Please note that merge creates all possible combinations if you have repeated elements, as you can see in the example below. See ?merge:

"... If there is more than one match, all possible matches contribute one row each. ..."

Maybe you can apply "aggregate" in a reasonable way on your data.frame first, to summarize your repeated values to unique ones, and then proceed with merge, but that depends on your problem.

Regards,

Christoph

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

example with repeated values

v1 <- c("a", "b", "a", "b", "a")
n1 <- 1:5
v2 <- c("b", "b", "a", "a", "a")
n2 <- 6:10
(f1 <- data.frame(v1, n1))
(f2 <- data.frame(v2, n2))
(m12 <- merge(f1, f2, by.x = "v1", by.y = "v2", sort = F))

Michael Kubovy writes:
> Dear R-helpers,
>
> Happy New Year to all the helpful members of the list.
>
> Here is the behavior I'm looking for:
>
> > v1 <- c("a","b","c")
> > n1 <- c(0, 1, 2)
> > v2 <- c("c", "a", "b")
> > n2 <- c(0, 1, 2)
> > (f1 <- data.frame(v1, n1))
>   v1 n1
> 1  a  0
> 2  b  1
> 3  c  2
> > (f2 <- data.frame(v2, n2))
>   v2 n2
> 1  c  0
> 2  a  1
> 3  b  2
> > (m12 <- merge(f1, f2, by.x = "v1", by.y = "v2", sort = F))
>   v1 n1 n2
> 1  c  2  0
> 2  a  0  1
> 3  b  1  2
>
> Now to my data:
>
> > summary(pL)
>       pairL
>  a fondo   :  41
>  alto      :  41
>  ampio     :  41
>  angoloso  :  41
>  aperto    :  41
>  appoggiato:  41
>  (Other)   :1271
>
> > pL$pairL[c(1,42)]
> [1] appoggiato dentro
> 37 Levels: a fondo alto ampio angoloso aperto appoggiato asimmetrico complicato convesso davanti dentro destra ... verticale
>
> > summary(oppN)
>  pairL           pairR          subject      LL              L                LR              R
>  a fondo   :  41  a galla    : 41  S1     :  37  Min.   :0.3646  Min.   :0.02083  Min.   :0.0010  Min.   :0.
>  alto      :  41  acuto      : 41  S10    :  37  1st Qu.:0.5521  1st Qu.:0.37500  1st Qu.:0.1771  1st Qu.:0.1042
>  ampio     :  41  arrotondato: 41  S11    :  37  Median :0.6354  Median :0.47917  Median :0.2708  Median :0.2292
>  angoloso  :  41  basso      : 41  S12    :  37  Mean   :0.6403  Mean   :0.46452  Mean   :0.2760  Mean   :0.2598
>  aperto    :  41  chiuso     : 41  S13    :  37  3rd Qu.:0.7188  3rd Qu.:0.55208  3rd Qu.:0.3750  3rd Qu.:0.3854
>  appoggiato:  41  compl      : 41  S14    :  37  Max.   :0.9375  Max.   :0.92708  Max.   :0.6042  Max.   :0.7812
>  (Other)   :1271  (Other)    :1271 (Other):1295  NA's   :3.      NA's   :3.
>
>       asym             polar            polar_a1          clust
>  Min.   :-0.      Min.   :-1.2410  Min.   :-2.949e+00  c1:492
>  1st Qu.: 0.2091  1st Qu.: 0.4571  1st Qu.:-1.902e-01  c2:287
>  Median : 0.      Median : 1.1832  Median :-1.110e-16  c3: 82
>  Mean   : 0.6265  Mean   : 1.3428  Mean   :-5.745e-02  c4:246
>  3rd Qu.: 0.9383  3rd Qu.: 2.0712  3rd Qu.: 1.168e-01  c5: 82
>  Max.   : 2.7081  Max.   : 4.6151  Max.   : 4.218e+00  c6:328
>  NA's   : 3.      NA's   : 3.000e+00
>
> > oppN$pairL[c(1,42)]
> [1] spesso fine
> 37 Levels: a fondo alto ampio angoloso aperto appoggiato asimmetrico complicato convesso davanti dentro destra ... verticale
>
> > unique(sort(oppM$pairL)) == unique(sort(pL$pairL))
> [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> [26] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
>
> In other words, I think that pL$pairL and oppN$pairL consist of 37 blocks of 41 repetitions of names, and that these blocks are permutations of each other.
>
> However:
>
> > summary(m1 <- merge(oppM, pairL, by.x = "pairL", by.y = "pairL", sort = F))
>  pairL pairR subject
>  LLLRR
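A sketch of the aggregate-then-merge idea, with the toy vectors f1 and f2 from the example above; mean() is just a placeholder summary, the right choice depends on the data:

## collapse repeated keys to one row each, then merge
f1u <- aggregate(f1["n1"], by = list(v1 = f1$v1), FUN = mean)
f2u <- aggregate(f2["n2"], by = list(v2 = f2$v2), FUN = mean)
merge(f1u, f2u, by.x = "v1", by.y = "v2")   # now one row per key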
Re: [R] Newbie - Summarize function
Dear Andrew

Could you provide a reproducible example, please? Since we do not have your data "test.csv", we cannot reproduce your code, which complicates bug fixing a lot. You can use set.seed to create artificial reproducible examples.

The only error in your code that I recognized:

> > test.2 <- with(test, summarize(test$x, llist(test$Stratum, test$plot), g))

You have llist(...) instead of list(...).

If this is only a copy-paste error, then you should provide a reproducible example. If it already solves the problem, you were lucky :-)

Regards,

Christoph

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

[EMAIL PROTECTED] writes:
> Dear R Users,
>
> I have searched through the archives but I am still struggling to find a way to process the dataset below. I have a dataset that has stratum and plot identifiers. Within each plot there is a variable (top) stating the number of measurements that should be used to calculate the mean of the largest "top" elements within one of the vectors (x). I would like to process this summary statistic by groups. At this stage I have been trying to use the "summarize" function within the Hmisc library, but I am getting the following error: "Error in eval(expr, envir, enclos) : numeric 'envir' arg not of length one. In addition: Warning message: no finite arguments to max; returning -Inf".
>
> Any suggestions on how I can fix this would be greatly appreciated.
>
> Kind regards
>
> Andrew
>
> test <- read.table("test.csv", header=TRUE, sep=",")
> # function to calculate mean of "top" elements within a plot
> > g <- function(y) {
> + top_no <- max(y$top)
> + weight <- with(y, as.numeric(x >= x[order(x, decreasing=TRUE)[top_no]]))
> + wtd.mean(y$x, weight)
> + }
> > g(test)
> [1] 172.6667
> # call to summarize function - use function g and summarize by stratum, plot
> > test.2 <- with(test, summarize(test$x, llist(test$Stratum, test$plot), g))
> Error in eval(expr, envir, enclos) : numeric 'envir' arg not of length one
> In addition: Warning message:
> no finite arguments to max; returning -Inf
>
> > traceback()
> 9: eval(substitute(expr), data, enclos = parent.frame())
> 8: with.default(y, as.numeric(x >= x[order(x, decreasing = TRUE)[top_no]]))
> 7: with(y, as.numeric(x >= x[order(x, decreasing = TRUE)[top_no]]))
> 6: FUN(X, ...)
> 5: summarize(test$x, llist(test$Stratum, test$plot), g)
> 4: eval(expr, envir, enclos)
> 3: eval(substitute(expr), data, enclos = parent.frame())
> 2: with.default(test, summarize(test$x, llist(test$Stratum, test$plot), g))
> 1: with(test, summarize(test$x, llist(test$Stratum, test$plot), g))
>
> The version I'm running on is
>
> $platform
> [1] "i386-pc-mingw32"
> $arch
> [1] "i386"
> $os
> [1] "mingw32"
> $system
> [1] "i386, mingw32"
> $status
> [1] ""
> $major
> [1] "2"
> $minor
> [1] "2.0"
> $year
> [1] "2005"
> $month
> [1] "10"
> $day
> [1] "06"
> $"svn rev"
> [1] "35749"
> $language
> [1] "R"
>
> Stratum plot id top   x
> 1       1    1   2   12
> 1       1    2   2   41
> 1       1    3   2   12
> 1       1    4   2   43
> 1       1    5   2   12
> 1       1    6   2   14
> 1       1    7   2   43
> 1       1    8   2   12
> 1       2    1   4   42
> 1       2    2   4   12
> 1       2    3   4  432
> 1       2    4   4   12
> 1       2    5   4   12
> 1       2    6   4   14
> 1       2    7   4   41
> 1       2    8   4    1
> 2       1    1   2   12
> 2       1    2   2   41
> 2       1    3   2   12
> 2       1    4   2   43
> 2       1    5   2   12
> 2       1    6   2   14
> 2       1    7   2   43
> 2       1    8   2   12
> 2       2    1   3   42
> 2       2
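Independent of the Hmisc call, the per-group statistic itself can be computed in base R, which makes a small reproducible test easy. A sketch, assuming a data frame test with columns Stratum, plot, top, and x as in the listing above:

## mean of the 'top' largest x values within each Stratum/plot cell
res <- by(test, list(test$Stratum, test$plot), function(d) {
  k <- max(d$top)
  mean(sort(d$x, decreasing = TRUE)[seq_len(k)])
})
res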
Re: [R] Find main effect in 2-way ANOVA
Hi

It is not so clear to me what your intention is. Could you provide a reproducible example to show what you want to do?

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Gao Fay writes:
> Hi,
>
> I use anova() to find an interaction effect, and then I need to find data with a main effect from those data with an interaction effect. How can I do that?
>
> I used: anova(lm(data~factor1*factor2)), then selected data with an interaction effect. Then I need to select those data that also have a main effect of factor1 or factor2, from the previously selected data. How can I do that? Many thanks for your time on my question!
>
> Fay
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Compiled name to object
Dear Claus

You can write

earlySessions[, "seconds"]

instead of earlySessions$seconds, and with that syntax you can also use:

col1 <- "seconds"
earlySessions[, col1]

and you will get what you are looking for.

Best regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Claus Atzenbeck writes:
> Hi,
>
> I have a tricky problem with composing a name that should represent an object:
>
> I want to pass a data frame and a string that contains a column name to a function, e.g.:
>
> check(testFrame, "seconds")
>
> The function should now divide the testFrame into two subsets and call a t-test on the numbers that are in the given column:
>
> check <- function(frame, column) {
>     sessionNo <- c("s01", "s02", "s03", "s04", "s05", "s06")
>     earlySessions <- subset(frame, frame$session %in% sessionNo)
>     lateSessions <- subset(frame, !frame$session %in% sessionNo)
>     t.test(earlySessions$column, lateSessions$column)
> }
>
> How can I get R to change "earlySessions$column" to whatever was passed as string, in the above example "earlySessions$seconds"?
>
> Thanks for any hint.
> Claus
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
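Applying that advice to the original function: a sketch of check() using column-name indexing; the session split is the one from the question:

check <- function(frame, column) {
  sessionNo <- c("s01", "s02", "s03", "s04", "s05", "s06")
  earlySessions <- subset(frame, frame$session %in% sessionNo)
  lateSessions  <- subset(frame, !frame$session %in% sessionNo)
  t.test(earlySessions[, column], lateSessions[, column])  # column as a string
}
## usage: check(testFrame, "seconds")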
Re: [R] R questions
Hi

There is a very good introduction to R on http://www.r-project.org/ under Manuals, including an index of useful functions. For example, you will find the basic plots, how to sort vectors, and so on.

Furthermore, have a look at

?RSiteSearch

It is very useful for searching the R archives for similar problems. Lots of answers are already there and are only waiting to be found, so there is a good chance that you will find the information you are looking for in these archives.

Best regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Yuying Shi writes:
> Dear Sir/Madam,
> I am a beginner in R. Here are my questions.
> 1. Can you give me one test for randomness (a name and descriptive paragraph is sufficient)?
> 2. I have learned a uniform random number generator [e.g. not the algorithms: i) Wichmann-Hill, ii) Marsaglia-Multicarry, iii) Super-Duper (Marsaglia), iv) Mersenne-Twister, v) TAOCP-1997 (Knuth), or vi) TAOCP-2002 (Knuth)]. Is there any other method besides that?
> 3. How can I generate 100 random standard normal deviates using the Box-Muller method and sort the sequence, smallest to largest?
> Your kind help is greatly appreciated. Thanks in advance!
> best wishes
> yuying shi
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
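For question 3, the Box-Muller transform is short enough to sketch directly (a minimal version; rnorm() with a chosen normal.kind is the usual route in R):

## Box-Muller: two uniforms give two independent standard normals
set.seed(1)
n  <- 100
u1 <- runif(n/2)
u2 <- runif(n/2)
z  <- c(sqrt(-2 * log(u1)) * cos(2 * pi * u2),
        sqrt(-2 * log(u1)) * sin(2 * pi * u2))
sort(z)  # 100 deviates, smallest to largest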
Re: [R] solve the quadratic equation ax^2+bx+c=0
Hi

Have a look at ?polyroot. This might be helpful.

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Yuying Shi writes:
> If I have vectors as follows:
> > a <- c(1,1,0,0)
> > b <- c(4,4,0,0)
> > c <- c(3,5,5,6)
> how can I use R code to solve the equation ax^2+bx+c=0?
> thanks!
> yuying shi
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
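For these particular vectors the quadratic formula can also be applied elementwise; a sketch (note the degenerate rows: a = 0 makes the equation linear, and a = b = 0 with c != 0 leaves no solution):

a  <- c(1, 1, 0, 0)
b  <- c(4, 4, 0, 0)
cc <- c(3, 5, 5, 6)                    # 'c' itself would mask base::c
disc <- as.complex(b^2 - 4 * a * cc)   # complex-safe discriminant
x1 <- (-b + sqrt(disc)) / (2 * a)      # NaN entries where a == 0
x2 <- (-b - sqrt(disc)) / (2 * a)
cbind(x1, x2)
## rows with a == 0 and b != 0 reduce to the linear solution -cc/b;
## rows with a == b == 0 and cc != 0 have no solution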
Re: [R] Random effect models
Dear Jacques You may be interested in the answer of Doug Bates to another post. The question is different, but in the answer there is a part about testing whether a variance component may be zero: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/51080.html Hope this helps. Best regards, Christoph Buser -- Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -- Jacques VESLOT writes: > Dear R-users, > > Sorry for reposting. I put it in another way : > > I want to test random effects in this random effect model : > Rendement ~ Pollinisateur (random) + Lignee (random) + Pollinisateur:Lignee > (random) > > Of course : > summary(aov(Rendement ~ Pollinisateur * Lignee, data = mca2)) > gives wrong tests for random effects. > > But : > summary(aov1 <- aov(Rendement ~ Error(Pollinisateur * Lignee), data = mca2)) > gives no test at all, and I have to do it with mean squares lying in > summary(aov1). > > With "lme4" package (I did'nt succeed in writing a working formula with lme > from "nlme" package), > I can "see" standard deviations of random effects (I don't know how to find > them) but I can't find F > tests for random effects. > > I only want to know if there is an easy way (a function ?) to do F tests for > random effects in > random effect models. > > Thanks in advance. > > Best regards, > > Jacques VESLOT > > > Data and output are as follows : > > > head(mca2) > Lignee Pollinisateur Rendement > 1 L1 P1 13.4 > 2 L1 P1 13.3 > 3 L2 P1 12.4 > 4 L2 P1 12.6 > 5 L3 P1 12.7 > 6 L3 P1 13.0 > > > > names(mca2) > [1] "Lignee" "Pollinisateur" "Rendement" > > > dim(mca2) > [1] 100 3 > > > replications(Rendement ~ Lignee * Pollinisateur, data = mca2) > Lignee Pollinisateur Lignee:Pollinisateur > 20 10 2 > > > summary(aov1 <- aov(Rendement ~ Error(Pollinisateur * Lignee), data = > mca2)) > > Error: Pollinisateur > Df Sum Sq Mean Sq F value Pr(>F) > Residuals 9 11.9729 1.3303 > > Error: Lignee > Df Sum Sq Mean Sq F value Pr(>F) > Residuals 4 18.0294 4.5074 > > Error: Pollinisateur:Lignee > Df Sum Sq Mean Sq F value Pr(>F) > Residuals 36 5.1726 0.1437 > > Error: Within > Df Sum Sq Mean Sq F value Pr(>F) > Residuals 50 3.7950 0.0759 > > > # F tests : > > > Femp <- c(tab1[1:3, 3]/tab1[c(3,3,4), 3]) > > names(Femp) <- c("Pollinisateur", "Lignee", "Interaction") > > Femp > Pollinisateur Lignee Interaction > 9.258709 31.370027 1.893061 > > > 1 - pf(Femp, tab1[1:3,1], tab1[c(3,3,4),1]) > Pollinisateur Lignee Interaction > 4.230265e-07 2.773448e-11 1.841028e-02 > > # Standard deviation : > > > variances <- c(c(tab1[1:3, 3] - tab1[c(3,3,4), 3]) / c(2*5, 2*10, 2), > tab1[4,3]) > > names(variances) <- c(names(Femp), "Residuelle") > > variances > Pollinisateur Lignee Interaction Residuelle > 0.11866389 0.21818333 0.03389167 0.0759 > > # Using lmer : > > > library(lme4) > > lme1 <- lmer(Rendement ~ (1|Pollinisateur) + (1|Lignee) + > (1|Pollinisateur:Lignee), data = mca2) > > summary(lme1) > Linear mixed-effects model fit by REML > Formula: Rendement ~ (1 | Pollinisateur) + (1 | Lignee) + (1 | > Pollinisateur:Lignee) > Data: mca2 > AIC BIC logLik MLdeviance REMLdeviance > 105.3845 118.4104 -47.69227 94.35162 95.38453 > Random effects: > Groups Name Variance Std.Dev.
> Pollinisateur:Lignee (Intercept) 0.033892 0.18410 > Pollinisateur (Intercept) 0.118664 0.34448 > Lignee (Intercept) 0.218183 0.46710 > Residual 0.075900 0.27550 > # of obs: 100, groups: Pollinisateur:Lignee, 50; Pollinisateur, 10; Lignee, 5 > > Fixed effects: > Estimate Std. Error DF t value Pr(>|t|) > (Intercept) 12.60100 0.23862 99 52.808
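For completeness, a rough sketch of the likelihood-ratio idea discussed in the linked post, applied to the mca2 data above (a sketch only, assuming the lme4 package; note that the null hypothesis puts the variance on the boundary of the parameter space, so the resulting p-value is conservative):
## R example
library(lme4)
m1 <- lmer(Rendement ~ (1|Pollinisateur) + (1|Lignee) +
           (1|Pollinisateur:Lignee), data = mca2)
m0 <- lmer(Rendement ~ (1|Pollinisateur) + (1|Lignee), data = mca2)
anova(m0, m1)   ## likelihood ratio test for the interaction variance component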
Re: [R] Backtransforming regression coefficient for scaled covariate
Dear Gregor The solution of Andres was correct, but in reproducing his example you made a copy-paste error. In the model lm2 you should scale your variable after the polynomial transformation, i.e. I(scale(x^2)) and not I(scale(x)^2). Then the backtransformation in your example should work. Regards, Christoph Buser -- Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -- Gorjanc Gregor writes: > Andres, this seems not to be the case. Look bellow > the coefficients. They are not the same as in unscaled > regression. > > R> (lm1 <- lm(y ~ x + I(x^2))) > > Call: > lm(formula = y ~ x + I(x^2)) > > Coefficients: > (Intercept) x I(x^2) > 4.62069 1.78811 -0.00751 > > R> ## Fit regression with transformed i.e. standardized covariate Wrong line > R> (lm2 <- lm(y ~ scale(x) + I(scale(x)^2))) Correct one > R> (lm2 <- lm(y ~ scale(x) + I(scale(x^2 > > Call: > lm(formula = y ~ scale(x) + I(scale(x)^2)) > > Coefficients: > (Intercept) scale(x) I(scale(x)^2) > 75.12 29.86 -6.21 > > R> coef(lm2)[3]/sd(x^2) > I(scale(x)^2) > -0.0020519 > > R> coef(lm2)[2]/sd(x) > scale(x) > 1.0384 > > -Original Message- > From: Andres Legarra [mailto:[EMAIL PROTECTED] > Sent: Mon 2005-09-12 08:53 > To: Gorjanc Gregor; r-help@stat.math.ethz.ch > Subject: Re: [R] Backtransforming regression coefficient for scaled covariate > > [R] Backtransforming regression coefficient for scaled covariate > > Your > covariate in the second part of the polynomial is x^2 and not x. Therefore > the transformation should be applied to x^2. > Like this: > (lm2 <- lm(y ~ scale(x) + I(scale(x^2)) ) > then you would use > coef(lm2)[3]/sd(x^2) > > Andres > -- > Andres Legarra > NEIKER > Apdo. 46 > Vitoria-Gasteiz 01080 Spain > -- > > > > - Original Message - > From: Gorjanc Gregor > To: r-help@stat.math.ethz.ch > Sent: Sunday, September 11, 2005 10:25 PM > Subject: [R] Backtransforming regression coefficient for scaled covariate > > > Hello! > Scaling i.e. (x - mean(x)) / sd(x) of covariates in the model > can improve the efficiency of estimation. That is nice, but > sometimes one needs to report estimates for original scale. I > was able to backtransform estimates of linear regression quite > easily but I stumped on higher polynomials. Is there a general > rule that I am not aware of or is my algebra so bad? > I appologize for not pure R question but I hope others will also > benefit. I attached the R code for example bellow. > ## --- Generate data for linear regression --- > e <- rnorm(n = 100, sd = 10) > x <- rnorm(n = 100, mean = 100, sd = 10) > b <- 3 > mu <- 2 > y <- mu + b * x + e > plot(y = y, x = x) > ## Fit linear regression > (lm1 <- lm(y ~ x)) > ## Fit linear regression with transformed i.e. standardized covariate > (lm2 <- lm(y ~ scale(x))) > ## Backtransform estimate of regression coefficient > coef(lm2)[2] / sd(x) > ## --- Generate data for quadratic regression --- > e <- rnorm(n = 100, sd = 10) > x <- runif(n = 100, min = 1, max = 100) > b1 <- 2 > b2 <- -0.01 > mu <- 2 > y <- mu + b1 * x + b2 * x^2 + e > plot(y = y, x = x) > ## Fit regression > (lm1 <- lm(y ~ x + I(x^2))) > ## Fit regression with transformed i.e. standardized covariate > (lm2 <- lm(y ~ scale(x) + I(scale(x)^2))) > ## Backtransform estimates of regression coefficients > ## ??
> Lep pozdrav / With regards, > Gregor Gorjanc > -- > University of Ljubljana > Biotechnical Faculty URI: http://www.bfro.uni-lj.si/MR/ggorjan > Zootechnical Department mail: gregor.gorjanc bfro.uni-lj.si > Groblje 3 tel: +386 (0)1 72 17 861 > SI-1230 Domzale fax: +386 (0)1 72 17 888 > Slovenia, Europe > -- > "One must learn by doing the thing; for though you think you know it, > you have no certainty until you try." Sophocles ~ 450 B.C. > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help
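To make the backtransformation explicit, a minimal sketch along the lines of Gregor's quadratic simulation (the algebra: with z1 = scale(x) and z2 = scale(x^2), the fitted model y = a0 + a1*z1 + a2*z2 has original-scale coefficients b1 = a1/sd(x), b2 = a2/sd(x^2), b0 = a0 - b1*mean(x) - b2*mean(x^2)):
## R example
set.seed(1)
x <- runif(100, min = 1, max = 100)
y <- 2 + 2*x - 0.01*x^2 + rnorm(100, sd = 10)
lm2 <- lm(y ~ scale(x) + I(scale(x^2)))
b2 <- coef(lm2)[3]/sd(x^2)                       ## coefficient of x^2
b1 <- coef(lm2)[2]/sd(x)                         ## coefficient of x
b0 <- coef(lm2)[1] - b1*mean(x) - b2*mean(x^2)   ## intercept
c(b0, b1, b2)
coef(lm(y ~ x + I(x^2)))   ## the unscaled fit should give the same values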
Re: [R] test for exponential,lognormal and gammadistribution
Hi Nadja It depends on your purpose. Often people use tests to show that a sample follows a distribution (normal, exponential, lognormal, ...). If a test rejects the null hypothesis that the sample comes from the specified distribution, you are on the safe side, since you are controlling the significance level (e.g. 5%) and therefore know the alpha error. But if a test does not reject the null hypothesis, you have generally NOT shown that the sample has the specified distribution. This is related to the power of your test (to detect differences). If the power of a test is poor, the conclusion "your sample has the distribution ..." based on the nonsignificant test result is misleading or even wrong. As you mentioned below, the Kolmogorov-Smirnov test does not account for the fact that the parameters of the distribution you test against are estimated from the data sample. It assumes that the parameters are known, but in practice that's generally not the case. Since the parameters are estimated from the data while the test assumes they are fixed, known quantities, the test is too conservative and has small power to detect differences. Therefore it is quite dangerous to conclude that a sample has a specified distribution based on the Kolmogorov-Smirnov test. An alternative way might be to use graphical tools, e.g. quantile plots (see ?qqplot and ?qqnorm). Obviously you have the same difficulty in interpreting the plots, since nobody can tell you for sure if a deviation from the straight line is significant or just due to chance. But if you conclude that a sample has a distribution by looking at the plot, you will be aware of this subjectivity, which cannot be avoided. The test result will often give you a wrong impression of objectivity. The best example to see this is a very small sample. In general any test has small power if your sample is small, and it is then most probable that the test is nonsignificant. If we look at the quantile plot (with a small sample) we often cannot judge if it is a straight line or not (since the sample is too small), and in this case the correct conclusion is that we cannot say anything about the distribution. I hope this will be helpful. Regards, Christoph Buser -- Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -- [EMAIL PROTECTED] writes: > > hello! > i don't want to test my sample data for normality, but exponential- > lognormal- > or gammadistribution. > as i've learnt the anderson-darling-test in R is only for normality and i am > not supposed to use the kolmogorov-smirnov test of R for parameter estimates > from sample data, is that true? > can you help me, how to do this anyway! > thank you very much! > nadja > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
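As an illustration, a minimal sketch of such a quantile plot against a fitted exponential distribution (made-up data; for the lognormal or gamma case replace qexp by qlnorm or qgamma with the estimated parameters):
## R example
x <- rexp(50, rate = 2)        ## sample to be checked
rate.hat <- 1/mean(x)          ## ML estimate of the rate
qqplot(qexp(ppoints(length(x)), rate = rate.hat), x,
       xlab = "theoretical quantiles", ylab = "sample quantiles")
abline(0, 1)                   ## points should lie close to this line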
Re: [R] Dummy variables model
Hi If you'd like to fit a fixed effect model without random effects, you can use lm() or aov() (see ?lm and ?aov). If your variable is a factor (?factor) then you can specify your model in lm() without coding all dummy variables. Regards, Christoph Buser -- Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -- Tobias Muhlhofer writes: > Hi, all! > > Anyone know an easy way to specify the following model. > > Panel dataset, with stock through time, by firm. > > I want to run a model of y on a bunch of explanatory variables, and one > dummy for each firm, which is 1 for observations that come from firm i, > and 0 everywhere else. I have over 200 firms (and a factor variable that > contains a firm identifier). > > Any easy way of going about this, without having to define all these > dummies? I checked lme() with random = ~ 1|firm, but the problem is that > these are random effects, i.e. that there are firm-by-firm disturbance > terms and overall disturbance terms, whereas I want just overall > disturbance terms. This is generally called a "fixed effects" model, > although it seems like the term "fixed effects" is being used somewhat > differently in the context of the nlme package. > > Toby > > -- > ** > When Thomas Edison invented the light bulb he tried over 2000 > experiments before he got it to work. A young reporter asked > him how it felt to have failed so many times. He said > "I never failed once. I invented the light bulb. > It just happened to be a 2000-step process." > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
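A minimal sketch with made-up data (the data frame and variable names are hypothetical; lm() builds one dummy per firm level itself):
## R example
set.seed(1)
paneldat <- data.frame(firm = factor(rep(1:5, each = 10)), x = rnorm(50))
paneldat$y <- 2*paneldat$x + rep(rnorm(5), each = 10) + rnorm(50)
fit <- lm(y ~ x + firm, data = paneldat)   ## fixed firm effects, no random terms
summary(fit)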
Re: [R] Doubt about nested aov output
Hi I think the problem is related to specifying "Tratamento" both as a fixed factor and in the error term. I attached a script with a reproducible example that shows a similar output. I do not know the details of the original data and the questions of interest, but maybe a model that includes "Tratamento" only as a fixed factor (and not in the error term) is more what you wanted to implement. Regards, Christoph
## R-Script
library(nlme)
## Generating the data
set.seed(1)
ziel <- rep(c(-6,8,20), each = 40) + rep(rnorm(15, 0, 20), each = 8) +
  rep(rnorm(60, 0, 10), each = 2) + rnorm(120, 0, 3)
dat <- data.frame(y = ziel, fix = factor(rep(1:3, each = 40)),
                  R1 = factor(rep(1:15, each = 8)),
                  R2 = factor(rep(1:60, each = 2)))
## Model including "fix" as fixed and random effect.
res2 <- aov(y ~ fix + Error(fix/R1/R2), data = dat)
summary(res2)
reslme2 <- lme(y ~ fix, data = dat, random = ~ 1|fix/R1/R2)
summary(reslme2)
anova(reslme2)
## Model including "fix" as a fixed factor only.
res1 <- aov(y ~ fix + Error(R1/R2), data = dat)
summary(res1)
reslme <- lme(y ~ fix, data = dat, random = ~ 1|R1/R2)
summary(reslme)
anova(reslme)
------ Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -- Spencer Graves writes: > Others may know the answer to your question, but I don't. However, > since I have not seen a reply, I will offer a few comments: > > 1. What version of R are you using? I just tried superficially > similar things with the examples in ?aov in R 2.1.1 patched and > consistently got F and p values. > > 2. My preference for this kind of thing is to use lme in > library(nlme) or lmer in library(lme4). Also, I highly recommend > Pinheiro and Bates (2000) Mixed-Effects Models in S and S-Plus (Springer). > > 3. If still want to use aov and are getting this problem in R 2.1.1, > could you please provide this list with a small, self contained example > that displays the symptoms that concern you? And PLEASE do read the > posting guide! "http://www.R-project.org/posting-guide.html". It might > increase the speed and utility of replies. > > spencer graves > > Ronaldo Reis-Jr. wrote: > > > Hi, > > > > I have two doubts about the nested aov output. > > > > 1) I have this: > > > >>anova.ratos <- aov(Glicogenio~Tratamento+Error(Tratamento/Rato/Figado)) > >>summary(anova.ratos) > > > > > > Error: Tratamento > > Df Sum Sq Mean Sq > > Tratamento 2 1557.56 778.78 > > > > Error: Tratamento:Rato > > Df Sum Sq Mean Sq F value Pr(>F) > > Residuals 3 797.67 265.89 > > > > Error: Tratamento:Rato:Figado > > Df Sum Sq Mean Sq F value Pr(>F) > > Residuals 12 594.0 49.5 > > > > Error: Within > > Df Sum Sq Mean Sq F value Pr(>F) > > Residuals 18 381.00 21.17 > > > > R dont make the F and P automatically, it is possible? > > > > I Like an output like this: > > > > Error: Tratamento > > Df Sum Sq Mean Sq F value Pr(>F) > > Tratamento 2 1557.56 778.78 2.929 0.197 > > > > Error: Tratamento:Rato > > Df Sum Sq Mean Sq F value Pr(>F) > > Residuals 3 797.67 265.89 5.372 0.0141 > > > > Error: Tratamento:Rato:Figado > > Df Sum Sq Mean Sq F value Pr(>F) > > Residuals 12 594.0 49.5 2.339 0.0503 > > > > Error: Within > > Df Sum Sq Mean Sq F value Pr(>F) > > Residuals 18 381.00 21.17 > > > > Why it not make automatic calculations? It is possible? > > > > > > 2) I can say that Error: Tratamento:Rato means an interaction between > > Tratamento and Rato? Normally the : represents an interaction, but in this > > output I think that it dont mean the interaction.
> > > > Any explanation are welcome. > > > > Thanks > > Ronaldo > > -- > Spencer Graves, PhD > Senior Development Engineer > PDF Solutions, Inc. > 333 West San Carlos Street Suite 700 > San Jose, CA 95110, USA > > [EMAIL PROTECTED] > www.pdf.com <http://www.pdf.com> > Tel: 408-938-4420 > Fax: 408-280-7915 > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] plotting 3 functions on same graph
Hi Maybe matplot (see ?matplot) can help you. Regards, Christoph Buser -- Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -- [EMAIL PROTECTED] writes: > hi all, > > I wish to draw on the same graphic device 3 functions. > > But i don't want them to be on different graph, i want to compare them on > the same > > I don't need mfrow or mfcol, I need something else... > > 1 graph on 1 device inside this graph 3 ploted function. > > I saw something unsing data.frame, but i think it's overkill, and something > less complicated must exist, if not why? > > why not plot(func1,func2,func3) ?? > > thks. > > > > // Webmail Oreka : http://www.oreka.com > > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
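A minimal sketch (made-up functions):
## R example: three functions in one plot
x <- seq(0, 2*pi, length = 100)
matplot(x, cbind(sin(x), cos(x), sin(2*x)), type = "l",
        lty = 1:3, col = 1:3, ylab = "f(x)")
legend("topright", legend = c("sin(x)", "cos(x)", "sin(2x)"),
       lty = 1:3, col = 1:3)
## an alternative: curve(sin, 0, 2*pi); curve(cos, 0, 2*pi, add = TRUE)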
Re: [R] gamma distribution
Hi As Uwe mentioned, be careful about the difference between the significance level alpha and the power of a test. To do power calculations you should specify an alternative hypothesis H_A. E.g. suppose you have two populations you want to compare and we assume that they are normally distributed (equal unknown variance for simplicity). We are interested in whether there is a difference in the mean and want to use the t.test. Our null hypothesis is H_0: there is no difference in the means. To do a power calculation for our test, we first have to specify an alternative, H_A: the mean difference is 1 (unit). Now for a fixed number of observations we can calculate the power of our test, which is in that case the probability that our test is significant if the true unknown difference is 1 (meaning that H_A is true). If I repeat the test many times (always taking samples with a mean difference of 1), the number of significant tests divided by the total number of tests is an estimate of the power. In your case the situation is a little bit more complicated. You need to specify an alternative hypothesis. In one of your first examples you draw samples from two gamma distributions with different shape parameters and the same scale. But by varying the shape parameter the two distributions differ not only in their mean but also in their form. I got an email from Prof. Ripley in which he explained, in detail and very precisely, some examples of tests and what they are testing. It was a follow-up to the first posts about the t test and the Wilcoxon test. I attached the email below and recommend reading it carefully. It might be helpful for you, too. Regards, Christoph Buser -- Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -- From: Prof Brian Ripley <[EMAIL PROTECTED]> To: Christoph Buser <[EMAIL PROTECTED]> cc: "Liaw, Andy" <[EMAIL PROTECTED]> Subject: Re: [R] Alternatives to t-tests (was Code Verification) Date: Thu, 21 Jul 2005 10:33:28 +0100 (BST) I believe there is rather more to this than Christoph's account. The Wilcoxon test is not testing the same null hypothesis as the t-test, and that may very well matter in practice and it does in the example given. The (default in R) Welch t-test tests a difference in means between two samples, not necessarily of the same variance or shape. A difference in means is simple to understand, and is unambiguously defined at least if the distributions have means, even for real-life long-tailed distributions. Inference from the t-test is quite accurate even a long way from normality and from equality of the shapes of the two distributions, except in very small sample sizes. (I point my beginning students at the simulation study in `The Statistical Sleuth' by Ramsey and Schafer, stressing that the unequal-variance t-test ought to be the default choice as it is in R. So I get them to redo the simulations.) The Wilcoxon test tests a shift in location between two samples from distributions of the same shape differing only by location. Having the same shape is part of the null hypothesis, and so is an assumption that needs to be verified if you want to conclude there is a difference in location (e.g. in means). Even if you assume symmetric distributions (so the location is unambiguously defined) the level of the test depends on the shapes, tending to reject equality of location in the presence of difference of shape.
So you really are testing equality of distribution, both location and shape, with power concentrated on location-shift alternatives. Given samples from a gamma(shape=2) and gamma(shape=20) distributions, we know what the t-test is testing (equality of means). What is the Wilcoxon test testing? Something hard to describe and less interesting, I believe. BTW, I don't see the value of the gamma simulation as this simultaneously changes mean and shape between the samples. How about checking holding the mean the same: n <- 1000 z1 <- z2 <- numeric(n) for (i in 1:n) { x <- rgamma(40, 2.5, 0.1) y <- rgamma(40, 10, 0.1*10/2.5) z1[i] <- t.test(x, y)$p.value z2[i] <- wilcox.test(x, y)$p.value } ## Level 1 - sum(z1>0.05)/1000 ## 0.049 1 - sum(z2>0.05)/1000 ## 0.15 ? -- the Wilcoxon test is shown to be a poor test of equality of means. Christoph's simulation shows that it is able to use difference in shape as well as location in the test of these two distributions, whereas the t-test is designed only to use the difference in means. Why compare the power of two tests testing different null hypotheses? I would say a very
Re: [R] gamma distribution
Hi Again, to come back to the question of why you don't get identical p-values for the untransformed and the transformed data: I ran your script below and always get 2 identical tests per loop. In your text you speak of the first 1000 values as the untransformed and the next 1000 values as the transformed ones. But did you consider that in each loop there is one test for the untransformed and one for the transformed data, so that the two kinds of tests are printed alternately? This might be the reason why you did not get equal results. Hope this helps, Christoph -- Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -- [EMAIL PROTECTED] writes: > Hi R Users > > > This is a code I wrote and just want to confirm if the first 1000 values are raw > gamma (z) and the next 1000 values are transformed gamma (k) or not. As I get > 2000 rows once I import into excel, the p - values beyond 1000 dont look that > good, they are very high. > > > -- > sink("a1.txt"); > > for (i in 1:1000) > { > x<-rgamma(10, 2.5, scale = 10) > y<-rgamma(10, 2.5, scale = 10) > z<-wilcox.test(x, y, var.equal = FALSE) > print(z) > x1<-log(x) > y1<-log(y) > k<-wilcox.test(x1, y1, var.equal = FALSE) > print(k) > } > > --- > any suggestions are welcome > > thanks > > -devarshi > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
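A minimal sketch that avoids the confusion altogether by storing the p-values instead of printing them (the vector names pv.raw and pv.log are made up); it also shows directly that the two tests agree, since log is monotone:
## R example
pv.raw <- pv.log <- numeric(1000)
for (i in 1:1000) {
  x <- rgamma(10, 2.5, scale = 10)
  y <- rgamma(10, 2.5, scale = 10)
  pv.raw[i] <- wilcox.test(x, y)$p.value
  pv.log[i] <- wilcox.test(log(x), log(y))$p.value
}
all.equal(pv.raw, pv.log)   ## TRUE: the rank test is unchanged by log()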
Re: [R] how to overlook the zero in the denominator
What about x<-c(0,0,0.28,0.55,1.2,2,1.95,1.85, 1.6,0.86,0.78,0.6,0.21,0.18) y<-c(0,0,0,0.53,1.34,1.79,2.07,1.88, 1.52,0.92,0.71,0.55,0.32,0.19) sum(((x-y)^2/x)[x!=0]) Regards, Christoph Buser -- Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -- Chun-Ying Lee writes: > Dear R users: > > I have two set of data, as follow: > x<-c(0,0,0.28,0.55,1.2,2,1.95,1.85, > 1.6,0.86,0.78,0.6,0.21,0.18) > y<-c(0,0,0,0.53,1.34,1.79,2.07,1.88, > 1.52,0.92,0.71,0.55,0.32,0.19) > i<-1:length(x) > > I want to sum each (x[i]-y[i])^2/x[i] together, > like: > >Sum <-sum((x[i]-y[i])^2/x[i]) > >Sum > [1] NaN > > Because the denominator shoud not be zero. > So I want to overlook those when x[i]=0, > and just to sum those x[i] not equal to 0. > What should I do? > Any suggestion. > Thanks in advance !! > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] gamma distribution
Hi I am a little bit confused. You create two samples (from a gamma distribution) and do a Wilcoxon test with these two samples. Then you use the same monotone transformation (log) for both samples and redo the Wilcoxon test. But since the transformation keeps the order of your samples, the second Wilcoxon test is identical to the first one: x<-rgamma(10, 2.5, scale = 10) y<-rgamma(10, 2.5, scale = 10) wilcox.test(x, y) x1<-log(x) y1<-log(y) wilcox.test(x1, y1) (By the way, var.equal is an argument of t.test(), not of wilcox.test(), so it has no effect in your calls.) Maybe you can give some more details about the hypothesis you'd like to test. Regards, Christoph Buser ------ Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -- [EMAIL PROTECTED] writes: > Hi R Users > > > This is a code I wrote and just want to confirm if the first 1000 values are raw > gamma (z) and the next 1000 values are transformed gamma (k) or not. As I get > 2000 rows once I import into excel, the p - values beyond 1000 dont look that > good, they are very high. > > > -- > sink("a1.txt"); > > for (i in 1:1000) > { > x<-rgamma(10, 2.5, scale = 10) > y<-rgamma(10, 2.5, scale = 10) > z<-wilcox.test(x, y, var.equal = FALSE) > print(z) > x1<-log(x) > y1<-log(y) > k<-wilcox.test(x1, y1, var.equal = FALSE) > print(k) > } > > --- > any suggestions are welcome > > thanks > > -devarshi > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Question about 'text' (add lm summary to a plot)
Dear Dan I can only help you with your third problem, expression and paste. You can use: plot(1:5,1:5, type = "n") text(2,4,expression(paste("Slope : ", 3.45%+-%0.34, sep = "")), pos = 4) text(2,3.8,expression(paste("Intercept : ", -10.43%+-%1.42)), pos = 4) text(2,3.6,expression(paste(R^2,": ", "0.78", sep = "")), pos = 4) I do not have an elegant solution for the alignment. Regards, Christoph Buser ------ Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -- Dan Bolser writes: > > I would like to annotate my plot with a little box containing the slope, > intercept and R^2 of a lm on the data. > > I would like it to look like... > > ++ > | Slope : 3.45 +- 0.34 | > | Intercept : -10.43 +- 1.42 | > | R^2 : 0.78 | > ++ > > However I can't make anything this neat, and I can't find out how to > combine this with symbols for R^2 / +- (plus minus). > > Below is my best attempt (which is franky quite pour). Can anyone > improve on the below? > > Specifically, > > aligned text and numbers, > aligned decimal places, > symbol for R^2 in the text (expression(R^2) seems to fail with > 'paste') and +- > > > > Cheers, > Dan. > > > dat.lm <- lm(dat$AVG_CH_PAIRS ~ dat$CHAINS) > > abline(coef(dat.lm),lty=2,lwd=1.5) > > > dat.lm.sum <- summary(dat.lm) > dat.lm.sum > > attributes(dat.lm.sum) > > my.text.1 <- > paste("Slope : ", round(dat.lm.sum$coefficients[2],2), > "+/-", round(dat.lm.sum$coefficients[4],2)) > > my.text.2 <- > paste("Intercept : ", round(dat.lm.sum$coefficients[1],2), > "+/-", round(dat.lm.sum$coefficients[3],2)) > > my.text.3 <- > paste("R^2 : ", round(dat.lm.sum$r.squared,2)) > > my.text.1 > my.text.2 > my.text.3 > > > ## Add legend > text(x=3, > y=300, > paste(my.text.1, >my.text.2, >my.text.3, >sep="\n"), > adj=c(0,0), > cex=1) > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
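One partial workaround for the alignment might be to let legend() draw the box, since its legend argument also accepts a vector of plotmath expressions (a sketch with Dan's example values):
## R example
plot(1:5, 1:5, type = "n")
legend("topleft",
       legend = c(expression(paste("Slope : ", 3.45 %+-% 0.34)),
                  expression(paste("Intercept : ", -10.43 %+-% 1.42)),
                  expression(paste(R^2, " : 0.78"))))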
Re: [R] Code Verification
Dear Andy Since the t-test loses power when the data deviate from the normal distribution, while the Wilcoxon test loses only little power when the data are normally distributed, the only reasons to use the t.test are its simplicity and the easier interpretation. Generally I'd prefer the Wilcoxon test even if the data are normally distributed. But I agree that for a gamma distribution there is no huge loss of power.
## example simulation:
n <- 1000
z1 <- numeric(n)
z2 <- numeric(n)
## gamma distribution
for (i in 1:n) {
  x<-rgamma(40, 2.5, 0.1)
  y<-rgamma(40, 3.5, 0.1)
  z1[i]<-t.test(x, y)$p.value
  z2[i]<-wilcox.test(x, y)$p.value
}
## Power
1 - sum(z1>0.05)/1000 ## 0.71
1 - sum(z2>0.05)/1000 ## 0.76
##
## t distribution
for (i in 1:n) {
  x<-rt(40, df = 3)
  y<-1 + rt(40, df = 3)
  z1[i]<-t.test(x, y)$p.value
  z2[i]<-wilcox.test(x, y)$p.value
}
## Power
1 - sum(z1>0.05)/1000 ## 0.76
1 - sum(z2>0.05)/1000 ## 0.91
##
## normal distribution
for (i in 1:n) {
  x<-rnorm(40, 0, 3)
  y<-1 + rnorm(40, 1, 3)
  z1[i]<-t.test(x, y)$p.value
  z2[i]<-wilcox.test(x, y)$p.value
}
## Power
1 - sum(z1>0.05)/1000 ## 0.83
1 - sum(z2>0.05)/1000 ## 0.81
Regards, Christoph Buser -- Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ ------ Liaw, Andy writes: > > From: Christoph Buser > > > > Hi > > > > "t.test" assumes that your data within each group has a normal > > distribution. This is not the case in your example. > > Eh? What happen to the CLT? > > Andy > > > I would recommend you a non parametric test like "wilcox.test" if > > you want to compare the mean of two samples that are not normal > > distributed. > > see ?wilcox.test > > > > Be careful. Your example produces two gamma distributed samples > > with rate = 10, not scale = 10. > > rate = 1/scale. > > If you want to use scale, you need to specify this argument > > x<-rgamma(40, 2.5, scale = 10) > > see ?rgamma > > > > I do not see the interpretation of your result. Since you do > > know the distribution and the parameters of your sample, you > > know the true means and that they are different. It is only a > > question of the sample size and the power of your test, if this > > difference is detected. > > Is that something you are investigating? Maybe a power > > calculation or something similar. > > > > Regards, > > > > Christoph Buser > > > > -- > > Christoph Buser <[EMAIL PROTECTED]> > > Seminar fuer Statistik, LEO C13 > > ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND > > phone: x-41-44-632-4673 fax: 632-1228 > > http://stat.ethz.ch/~buser/ > > -- > > > > [EMAIL PROTECTED] writes: > > > Hi R Users > > > I have a code which I am running for my thesis work. Just > > want to make sure that > > > its ok. Its a t test I am conducting between two gamma > > distributions with > > > different shape parameters. > > > > > > the code looks like: > > > > > > sink("a1.txt"); > > > > > > for (i in 1:1000) > > > { > > > x<-rgamma(40, 2.5, 10) # n = 40, shape = 2.5, Scale = 10 > > > y<-rgamma(40, 2.8, 10) # n = 40, shape = 2.8, Scale = 10 > > > z<-t.test(x, y) > > > print(z) > > > } > > > > > > > > > I will appreciate it if someone could tell me if its alrite or not. > > > > > > thanks > > > > > > -dev > > > > > > __ > > > R-help@stat.math.ethz.ch mailing list > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html > > > > > > __ > > R-help@stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide! > > http://www.R-project.org/posting-guid
Re: [R] Code Verification
Hi "t.test" assumes that your data within each group has a normal distribution. This is not the case in your example. I would recommend you a non parametric test like "wilcox.test" if you want to compare the mean of two samples that are not normal distributed. see ?wilcox.test Be careful. Your example produces two gamma distributed samples with rate = 10, not scale = 10. rate = 1/scale. If you want to use scale, you need to specify this argument x<-rgamma(40, 2.5, scale = 10) see ?rgamma I do not see the interpretation of your result. Since you do know the distribution and the parameters of your sample, you know the true means and that they are different. It is only a question of the sample size and the power of your test, if this difference is detected. Is that something you are investigating? Maybe a power calculation or something similar. Regards, Christoph Buser ------ Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -- [EMAIL PROTECTED] writes: > Hi R Users > I have a code which I am running for my thesis work. Just want to make sure > that > its ok. Its a t test I am conducting between two gamma distributions with > different shape parameters. > > the code looks like: > > sink("a1.txt"); > > for (i in 1:1000) > { > x<-rgamma(40, 2.5, 10) # n = 40, shape = 2.5, Scale = 10 > y<-rgamma(40, 2.8, 10) # n = 40, shape = 2.8, Scale = 10 > z<-t.test(x, y) > print(z) > } > > > I will appreciate it if someone could tell me if its alrite or not. > > thanks > > -dev > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] initial points for arms in package HI
Dear Martyn Thank you for your fast and helpful answer. It pointed me to my error in thinking: I confused the previous value of the Markov chain with the initial values used to construct the envelope. In the original C code by W. Gilks there are the arguments "xinit" and "xprev". The first is used to construct the envelope, the second is the previous value of the Markov chain. There are comments from the author on how to set the initial values "xinit". It is important to choose these values independently of the current value of the parameter being updated (in the case of non-log-concave functions). In the R code and the C code behind the function arms the argument "xinit" is missing, since it is set in the function itself and the user cannot change it. Since I confused the two arguments I didn't realize that "y.start" is the value "xprev", the state of the Markov chain, and set this value to 1 or runif(...). For an implementation of a Gibbs sampler this makes no sense. My example was a little bit misleading, since I simulated 1000 points from the same distribution. In reality my distribution changes at each step and I always need only 1 new value in my Gibbs sampler. By implementing it correctly with the previous value of the chain it should work in my context. Best regards, Christoph [EMAIL PROTECTED] writes: > Quoting Christoph Buser <[EMAIL PROTECTED]>: > > > Dear R-users > > > > I have a problem choosing initial points for the function arms() > > in the package HI > > I intend to implement a Gibbs sampler and one of my conditional > > distributions is nonstandard and not logconcave. > > Therefore I'd like to use arms. > > > > But there seem to be a strong influence of the initial point > > y.start. To show the effect I constructed a demonstration > > example. It is reproducible without further information. > > > > Please note that my target density is not logconcave. > > > > Thanks for all comments or ideas. > > > > Christoph Buser > > Dear Christoph, > > There is a Metropolis step at each iteration of the ARMS sampler, in > which it may choose to reject the proposed move to a new point and stick > at the current point (This is what the "M" in "ARMS" stands for) If you > do repeated calls to arms with the same starting point, then the > iterations where the Metropolis step rejects a move will create a spike > in the sample density at your initial value. If you use a uniform random > starting point, then your sample density will be a mixture of the > target distribution (Metropolis accepts move) and a uniform distribution > (Metropolis rejects move). > > You should be doing something like this: > > res1 <- arms(runif(1,0,100), logDichteGam, function(x) (x>0)&(x<100), 1) > for(i in 2:1000) > res1[i] <- arms(res1[i-1], logDichteGam, function(x) (x>0)&(x<100), 1) > > i.e., using each sampled point as the starting value for the next > iteration. The sequence of values in res1 will then be a correlated > sample from the given distribution: > > acf(res1) > > The bottom line is that you can't use ARMS to draw a single sample > from a non-log-concave density. > > If you are still worried about using ARMS, you can verify your results > using the random walk Metropolis sampler (MCMCmetrop1R) in the package > MCMCpack.
> > Martyn > > > ## R Code: > > > > library(HI) > > ## parameter for the distribution > > para <- 0.1 > > > > ## logdensity > > logDichteGam <- function(x, u = para, v = para) { > > -(u*x + v*1/x) - log(x) > > } > > ## density except for the constant > > propDichteGam <- function(x, u = para, v = para) { > > exp(-(u*x + v*1/x) - log(x)) > > } > > ## calculating the constant > > (c <- integrate(propDichteGam, 0, 1000, rel.tol = 10^(-12))$value) > > ## density > > DichteGam <- function(x, u = para, v = para) { > > exp(-(u*x + v*1/x) - log(x))/c > > } > > > > ## calculating 1000 values by repeating a call of arms (this would > > ## be the situation in an Gibbs Sample. Of course in a Gibbs sampler > > ## the distribution would change. This is only for demonstration > > res1 <- NULL > > for(i in 1:1000) > > res1[i] <- arms(runif(1,0,100), logDichteGam, function(x) (x>0)&(x<100), > > 1) > > > > ## Generating a sample of thousand observations with 1 call of arms > > res2 <- arms(runif(1,0,100), logDichteGam, function(x) (x>0)&(x<100), 1000
[R] initial points for arms in package HI
Dear R-users I have a problem choosing initial points for the function arms() in the package HI. I intend to implement a Gibbs sampler and one of my conditional distributions is nonstandard and not logconcave. Therefore I'd like to use arms. But there seems to be a strong influence of the initial point y.start. To show the effect I constructed a demonstration example. It is reproducible without further information. Please note that my target density is not logconcave. Thanks for all comments or ideas. Christoph Buser
## R Code:
library(HI)
## parameter for the distribution
para <- 0.1
## logdensity
logDichteGam <- function(x, u = para, v = para) {
  -(u*x + v*1/x) - log(x)
}
## density except for the constant
propDichteGam <- function(x, u = para, v = para) {
  exp(-(u*x + v*1/x) - log(x))
}
## calculating the constant
(c <- integrate(propDichteGam, 0, 1000, rel.tol = 10^(-12))$value)
## density
DichteGam <- function(x, u = para, v = para) {
  exp(-(u*x + v*1/x) - log(x))/c
}
## calculating 1000 values by repeating a call of arms (this would
## be the situation in a Gibbs sampler. Of course in a Gibbs sampler
## the distribution would change. This is only for demonstration.)
res1 <- NULL
for(i in 1:1000)
  res1[i] <- arms(runif(1,0,100), logDichteGam, function(x) (x>0)&(x<100), 1)
## Generating a sample of thousand observations with 1 call of arms
res2 <- arms(runif(1,0,100), logDichteGam, function(x) (x>0)&(x<100), 1000)
## Plot of the samples
par(mfrow = c(2,2))  ## (mult.fig(4) from package sfsmisc would do, too)
plot(res1, log = "y")
plot(res2, log = "y")
hist(res1, freq = FALSE, xlim = c(0,4), breaks = seq(0,100,by = 0.1), ylim = c(0,1))
curve(DichteGam, 0,4, add = TRUE, col = 2)
hist(res2, freq = FALSE, xlim = c(0,4), breaks = seq(0,100,by = 0.1), ylim = c(0,1))
curve(DichteGam, 0,4, add = TRUE, col = 2)
## If we repeat the procedure, using the fixed initial value 1,
## the situation is even worse
res3 <- NULL
for(i in 1:1000)
  res3[i] <- arms(1, logDichteGam, function(x) (x>0)&(x<100), 1)
## Generating a sample of thousand observations with 1 call of arms
res4 <- arms(1, logDichteGam, function(x) (x>0)&(x<100), 1000)
## Plot of the samples
par(mfrow = c(2,2))
plot(res3, log = "y")
plot(res4, log = "y")
hist(res3, freq = FALSE, xlim = c(0,4), breaks = seq(0,100,by = 0.1), ylim = c(0,1))
curve(DichteGam, 0,4, add = TRUE, col = 2)
hist(res4, freq = FALSE, xlim = c(0,4), breaks = seq(0,100,by = 0.1), ylim = c(0,1))
curve(DichteGam, 0,4, add = TRUE, col = 2)
## If I generate the sample in a for-loop (one by one) I do not
## get the correct density. But this is exactly the situation in
## my Gibbs sampler. Therefore I am concerned about the correct
## application of arms
-- Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Predict
Dear Matthias Can you provide an example to demonstrate what you did? Two remarks on your email; maybe they already answer your question. 1) Using predict() you will get the estimated value for each observation or for new data. You can reproduce this value by using the coefficients from your estimated model (see the example below). For the interval you can get a confidence interval for the expected value under fixed conditions of the explanatory variables, or you can obtain a prediction interval for a single new observation. The latter is of course wider, since you try to catch a single observation and not the expected value. 2) Using confint() you will get the estimated parameters (which are random variables, too) and their confidence intervals. You can use the estimated values to calculate the predicted values. But you can NOT use the upper values from confint to estimate the upper values from predict by just putting them into your regression model. That's not how confidence intervals are constructed. (I am not sure if this was your intention. Maybe if you show a reproducible example you can correct me if you meant something different.)
## R Code
## Creation of a dataframe
set.seed(1)
x1 <- runif(40)
f1 <- rep(c("a", "b", "c","d"), each = 10)
y <- 2*x1 + rep(c(0.5, 0.1, -0.6, 1.5), each = 10) + rnorm(40, 0, 2)
dat <- data.frame(y = y, f1 = f1, x1 = x1)
## regression model
reg <- lm(y ~ x1 + f1, data = dat)
summary(reg)
confint(reg)
predict(reg, type = c("response"), interval = "confidence")
## calculation of predicted values using the estimated coefficients
## estimated coefficients
co <- summary(reg)$coefficients[,"Estimate"]
## Using the regression model with these coefficients for observation 11
co["(Intercept)"] + dat[11,"x1"]*co["x1"] + co["f1b"]
## prediction of observation 11
predict(reg, type = c("response"))[11]
Regards, Christoph -- Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -- Matthias Eggenberger writes: > When I callculate a linear model, then I can compute via confint the > confidencial intervals. the interval level can be chosen. as result, I get > the parameter of the model according to the interval level. > > On the other hand, I can compute the prediction-values for my model as well > with predict(object, type=c("response") etc.). Here I have also the > possibility to chose a level for the confidential intervals. the output are > the calculatet values for the fit, the lower and upper level. > > the problem now is, that when I calculate the values through the linear > model function with the parameter values I get from confint() an I compare > them with the values I get from predict() these values differ extremely. Why > is that so? Does the command predict() calculate the values through an other > routine? That means the command predict() doesn't use the same parameters to > calculate the prediction-values than the ones given by confint()? > > Greetings Matthias > > -- > GMX DSL = Maximale Leistung zum minimalen Preis! > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
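To see the difference between the two interval types directly, a small sketch continuing the example above (reusing reg; the new observation is made up):
## R example
new <- data.frame(x1 = 0.5, f1 = "b")
predict(reg, newdata = new, interval = "confidence")   ## for the expected value
predict(reg, newdata = new, interval = "prediction")   ## for a single new observation (wider)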
Re: [R] Contingency-Coefficient, Variance
Hi You can easily implement it by: Mag. Ferri Leberl writes: > Dear Everybody! > Excuse me for this newbie-questions, but I have not found answers to these > Question: > > - Is there a command calculating the variance if the whole group is known > (thus dividing by n instead of n-1)? var(x)*(n-1)/n > > - Which is the command to calculate the > - contingency-coefficient? Have a look at ?chisq.test. There you can calculate and extract the chi-squared statistic, and then it is easy to program the needed formula. Please note that there is an argument "correct" in chisq.test(). You can set it to FALSE if you do not want to apply Yates' continuity correction. Regards, Christoph Buser ------ Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -- > > Thank you in advance. > Mag. Ferri Leberl > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
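A minimal sketch of both calculations, using the usual formula C = sqrt(X^2/(X^2 + n)) for the contingency coefficient (the table is made up):
## R example
tab <- matrix(c(10, 20, 30, 25), nrow = 2)
X2  <- chisq.test(tab, correct = FALSE)$statistic
sqrt(X2/(X2 + sum(tab)))          ## contingency coefficient
## variance with denominator n instead of n-1:
x <- rnorm(50)
var(x) * (length(x) - 1)/length(x)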
Re: [R] maps drawing
There is also the package maptools if you want or need to read ESRI shapefiles. Regards, Christoph Buser -- Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -- m p writes: > Hello, > is there a package in R that would allow map drawing: > coastlines, country/state boundaries, maybe > topography, > rivers etc? > Thanks for any guidance, > Mark > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
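For the basic map drawing itself, a minimal sketch (assuming the maps package is installed):
## R example
library(maps)
map("world")   ## coastlines and country boundaries
map("state")   ## US state boundaries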
Re: [R] Name for factor's levels with contr.sum
This is exactly what I've been looking for (without success) when I was speaking about a more elegant and general solution. I agree with your argument that labels might be misleading. Nevertheless if a user is aware what contr.sum calculates, it is practical to have an annotation. Thank you very much for the solution. Regards, Christoph Buser -- Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -- Prof Brian Ripley writes: > One way to do this generally is to make a copy of contr.sum, rename it, > and set the dimnames appropriately. > > I think contr.treatment is misleading (it labels contrasts of two levels > by just one of them), and Christoph's labels are informative but > impractically long. But if you want to label each contrast by the level > it contrasts with the rest, > > cont <- array(0, c(lenglev, lenglev - 1), >list(levels, levels[-lenglev])) > > in a modified contr.sum will do it. E.g. > > > z <- factor(letters[1:3]) > > contr.treatment(z) >b c > a 0 0 > b 1 0 > c 0 1 > > contr.sum(z) >[,1] [,2] > a10 > b01 > c -1 -1 > > mycontr.sum(z) > a b > a 1 0 > b 0 1 > c -1 -1 > > > > > On Wed, 13 Jul 2005, Christoph Buser wrote: > > > Dear Ghislain > > > > I do not know a general elegant solution, but for some > > applications the following example may be helpful: > > > > ## Artificial data for demonstration: group is fixed, species is random > > dat <- data.frame(group = c(rep("A",20),rep("B",17),rep("C",24)), > > species = c(rep("sp1", 4), rep("sp2",5), rep("sp3",5), > >rep("sp4",6), rep("sp5",2), rep("sp6",5), > > rep("sp7",3), > >rep("sp8",3), rep("sp9",4), rep("sp10",6), > > rep("sp11",6), > >rep("sp12",6), rep("sp13",6)), > > area = rnorm(61)) > > > > ## You can attach a contrast at your fixed factor of interest "group" > > ## Create the contrast you like to test (in our case contr.sum for 3 > > ## levels) > > mat <- contr.sum(3) > > ## You can add the names you want to see in the output > > ## Be carefull that you give the correct names to the concerned > > ## column. Otherwise there is the big danger of misinterpretation. > > colnames(mat) <- c(": A against rest", ": B against rest") > > ## Attatch the contrast at your factor "group" > > dat[,"group"] <- C(dat[,"group"],mat) > > ## Now calculate the lme > > library(nlme) > > reg.lme <- lme(area ~ group, data = dat, random = ~ 1|species) > > summary(reg.lme) > > > > Maybe someone has a better idea how to do it generally. > > > > Hope this helps > > > > Christoph Buser > > > > -- > > Christoph Buser <[EMAIL PROTECTED]> > > Seminar fuer Statistik, LEO C13 > > ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND > > phone: x-41-44-632-4673fax: 632-1228 > > http://stat.ethz.ch/~buser/ > > -- > > > > > > Ghislain Vieilledent writes: > > > Good morning, > > > > > > I used in R contr.sum for the contrast in a lme model: > > > > > > > options(contrasts=c("contr.sum","contr.poly")) > > > > Septo5.lme<-lme(Septo~Variete+DateSemi,Data4.Iso,random=~1|LieuDit) > > > > intervals(Septo5.lme)$fixed > > > lower est. 
upper > > > (Intercept) 17.0644033 23.106110 29.147816 > > > Variete1 9.5819873 17.335324 25.088661 > > > Variete2 -3.3794907 6.816101 17.011692 > > > Variete3 -0.5636915 8.452890 17.469472 > > > Variete4 -22.8923812 -10.914912 1.062558 > > > Variete5 -10.7152821 -1.865884 6.983515 > > > Variete6 0.2743390 9.492175 18.710012 > > > Variete7 -23.7943250 -15.070737 -6.347148 > > > Variete8 -21.7310554 -12.380475 -3.02989
Re: [R] Name for factor's levels with contr.sum
Dear Ghislain I do not know a general elegant solution, but for some applications the following example may be helpful:
## Artificial data for demonstration: group is fixed, species is random
dat <- data.frame(group = c(rep("A",20), rep("B",17), rep("C",24)),
                  species = c(rep("sp1",4), rep("sp2",5), rep("sp3",5),
                              rep("sp4",6), rep("sp5",2), rep("sp6",5),
                              rep("sp7",3), rep("sp8",3), rep("sp9",4),
                              rep("sp10",6), rep("sp11",6), rep("sp12",6),
                              rep("sp13",6)),
                  area = rnorm(61))
## You can attach a contrast to your fixed factor of interest "group".
## Create the contrast you would like to test (in our case contr.sum for 3 levels)
mat <- contr.sum(3)
## You can add the names you want to see in the output.
## Be careful that you give the correct names to the concerned columns;
## otherwise there is a big danger of misinterpretation.
colnames(mat) <- c(": A against rest", ": B against rest")
## Attach the contrast to your factor "group"
dat[,"group"] <- C(dat[,"group"], mat)
## Now calculate the lme
library(nlme)
reg.lme <- lme(area ~ group, data = dat, random = ~ 1|species)
summary(reg.lme)
Maybe someone has a better idea how to do it generally. Hope this helps Christoph Buser -- Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -- Ghislain Vieilledent writes: > Good morning, > > I used in R contr.sum for the contrast in a lme model: > > > options(contrasts=c("contr.sum","contr.poly")) > > Septo5.lme<-lme(Septo~Variete+DateSemi,Data4.Iso,random=~1|LieuDit) > > intervals(Septo5.lme)$fixed > lower est. upper > (Intercept) 17.0644033 23.106110 29.147816 > Variete1 9.5819873 17.335324 25.088661 > Variete2 -3.3794907 6.816101 17.011692 > Variete3 -0.5636915 8.452890 17.469472 > Variete4 -22.8923812 -10.914912 1.062558 > Variete5 -10.7152821 -1.865884 6.983515 > Variete6 0.2743390 9.492175 18.710012 > Variete7 -23.7943250 -15.070737 -6.347148 > Variete8 -21.7310554 -12.380475 -3.029895 > Variete9 -27.9782575 -17.480555 -6.982852 > DateSemi1 -5.7903419 -1.547875 2.694592 > DateSemi2 3.6571596 8.428417 13.199675 > attr(,"label") > [1] "Fixed effects:" > > How is it possible to obtain a return with the name of my factor's levels as > with contr.treatment ? > > Thanks for you help. > > -- > Ghislain Vieilledent > 30, rue Bernard Ortet 31 500 TOULOUSE > 06 24 62 65 07 > > [[alternative HTML version deleted]] > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] adding a factor column based on levels of another factor
Hi Karen I am not sure if I understand your question correctly. If not, please ignore this answer. Do you want a new factor "group" which contains the same information as "species", just with other names, e.g. 1,2,... or "A","B",... ? If yes, you can do it like this:
## Your data.frame (without ht & diam)
dat <- data.frame(uniqueID = factor(1:8),
                  species = c("sp2", "sp2", "sp3", "sp65", "sp43", "sp2", "sp12", "sp3"),
                  elev = c(3.5, 4.2, 3.2, 2.2, 5.4, 2.5, 1.1, 3.4))
str(dat)
## new factor group (copy of species)
dat[,"group"] <- dat[,"species"]
## rename the levels into "1", "2", ... or whatever you want:
levels(dat[,"group"]) <- list("3" = "sp12", "4" = "sp2", "2" = "sp3",
                              "5" = "sp43", "1" = "sp65")
## control
dat[,"species"]
dat[,"group"]
Please be careful. This only changes the labels into 1,2,... If you use for example as.numeric(dat[,"group"]) you will get the values that lie behind the original alphabetical ordering, meaning "sp12" is 1, "sp2" is 2, etc. You can change this, too, if necessary, using as.character() and as.numeric(). I hope this is helpful for your problem. Regards, Christoph Buser -- Christoph Buser <[EMAIL PROTECTED]> Seminar fuer Statistik, LEO C13 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-44-632-4673 fax: 632-1228 http://stat.ethz.ch/~buser/ -- Karen Kotschy writes: > Hi R users > > Does anyone out there have a better/quicker way of adding a factor column > to a data frame based on levels of another factor? > > I have a (large) data frame consisting of records for individual plants, > each represented by a unique ID number. The species of each plant is > indicated in the column "species", which is a factor column with many > different levels (species). There are multiple records for each species, > and there is no pattern to the order in which the species names appear in > the data frame. > > e.g. > uniqueID species elev ht diam > 1 1 sp2 3.5 1.3 55 > 2 2 sp2 4.2 0.5 15 > 3 3 sp3 3.2 1.0 13 > 4 4 sp65 2.2 2.0 14 > 5 5 sp43 5.4 5.7 20 > 6 6 sp2 2.5 4.1 32 > 7 7 sp12 1.1 0.9 5 > 8 8 sp3 3.4 3.6 2 > > I would like to add a factor column to this data frame, indicating to > which group each individual belongs. All individuals of the same species > will belong to the same group. > > Is there a quick way of saying "for all instances of species1, give the > value 5, for all instances of species2, give the value 4, etc" (where 5 > and 4 are levels of a factor)? > > The only way I can think of doing it is to split the data frame by > species, then add a column to each subset showing the group, then > re-join all the subsets. This seems clumsy and prone to errors. Anyone > know a better way? > > I've looked at expand.grid and gl but they don't seem to do what I want. > > Thanks! > > Karen Kotschy > Centre for Water in the Environment > University of the Witwatersrand > Johannesburg > South Africa > > __ > R-help@stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html __ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
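A slightly more direct alternative is a named lookup vector (a sketch using the same made-up group labels as in the example above):
## R example (dat as above)
lookup <- c(sp12 = "3", sp2 = "4", sp3 = "2", sp43 = "5", sp65 = "1")
dat[,"group"] <- factor(lookup[as.character(dat[,"species"])])
dat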
Re: [R] How to get the minimum ?
Dear Philippe

You can use optimize (see ?optimize), e.g.:

funToMin <- function(x, data, a = 1, b = 1) {
  sum((data[data[,"group"]=="A","y"] - x)^2) +
    sum((data[data[,"group"]=="B","y"] - a*x - b)^2)
}
dat <- data.frame(y = rnorm(100), group = rep(c("A","B"), each = 50))
(m <- optimize(function(x) funToMin(x,dat), interval = c(-10,10)))

Please be careful. This function is for demonstration purposes only and is
badly programmed. It works if x is a single number, but if you call the
function with a vector instead of a single number (and I do not prevent
this by checking the input), you will get warnings or errors. Therefore it
would be better to use your own, hopefully better programmed function in
optimize.

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Philippe Lamy writes:
> Hi,
>
> I have a model with different observations Xi.
> Each observation belongs to a group, either A or B.
> I would like to minimize a function like:
>
> sum_A ( Xi - Z )^2 + sum_B ( Xi - a*Z - b )^2
>
> The first sum contains all observations from group A and the second all
> observations from group B.
> I want to find the Z-value which minimizes this function. a and b are
> predefined parameters.
>
> Thanks for help.
>
> Philippe
>
> __
>

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
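[A hedged variant of the demonstration function above with a simple guard
against vector input; the name funToMin2 and the stopifnot() check are my
additions, not part of the original post.]

funToMin2 <- function(x, data, a = 1, b = 1) {
  ## optimize() passes a single number, but we check anyway
  stopifnot(length(x) == 1)
  sum((data[data[,"group"]=="A","y"] - x)^2) +
    sum((data[data[,"group"]=="B","y"] - a*x - b)^2)
}
## using the dat from the example above:
optimize(function(x) funToMin2(x, dat), interval = c(-10,10))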
Re: [R] ks.test() output interpretation
Hi

I would recommend graphical methods to compare two samples from possibly
different distributions. See ?qqplot

Since the Kolmogorov-Smirnov test in many cases has very small power, you
cannot conclude that two samples come from the same distribution just
because the ks.test is not significant. The following example shows one
problem: In a short simulation we generate 1000 times two samples (with
100 observations per sample). The first sample has a standard normal
distribution, the second a t-distribution with 1 degree of freedom. For
each of these 1000 pairs we calculate the ks.test and save the p.value.

x1 <- matrix(nrow = 100, ncol = 1000)
y1 <- matrix(nrow = 100, ncol = 1000)
test1 <- numeric(1000)
for(i in 1:1000) {
  set.seed(i)
  x1[,i] <- rnorm(100)
  y1[,i] <- rt(100, df = 1)
  test1[i] <- ks.test(x1[,i], y1[,i])$p.value
}
sum(test1 < 0.05)

In only 309 of the 1000 cases does the test show a significant difference
between the two samples. In all other cases we would conclude that the two
samples have the same distribution. This is an example with 100
observations per group. If you have smaller groups the power is even
worse.

If we look at 9 randomly drawn pairs of the 1000 simulations and plot the
qqplot:

par(mfrow = c(3,3))
ind <- sample(1:1000, 9)
tmp <- sapply(ind, function(j)
              qqplot(x1[,j], y1[,j], xlab = paste("x1[,",j,"]"),
                     ylab = paste("y1[,",j,"]")))

In many cases we see that the two distributions are different. Compare it
to the qqplot of two normally distributed random variables:

x2 <- matrix(rnorm(900), nrow = 100, ncol = 9)
y2 <- matrix(rnorm(900), nrow = 100, ncol = 9)
par(mfrow = c(3,3))
tmp <- sapply(1:9, function(j)
              qqplot(x2[,j], y2[,j], xlab = paste("x2[,",j,"]"),
                     ylab = paste("y2[,",j,"]")))

Of course there are situations for which the graphical methods fail, too,
but they give an honest, descriptive comparison of two distributions. The
Kolmogorov-Smirnov test, in contrast, suggests a clear-cut result (that
the two distributions are the same), which is wrong or at least
misleading.

Best regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

kapo coulibaly writes:
> I'm using ks.test() to compare two different
> measurement methods. I don't really know how to
> interpret the output in the absence of a critical value
> table of the D statistic. I guess I could use the
> p-value when available. But I also get the message
> "cannot compute correct p-values with ties ..." does
> it mean I can't use ks.test() for these data or I can
> still use the D statistic computed to make a decision
> whether the two samples come from the same
> distribution.
>
> Thanks!!
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] is.all.equal
Hi

It is corrected in the newer R version. The help page now says:

  'all.equal(x,y)' is a utility to compare R objects 'x' and 'y'
  testing "near equality". If they are different, comparison is
  still made to some extent, and a report of the differences is
  returned. Don't use 'all.equal' directly in 'if' expressions -
  either use 'isTRUE(all.equal())' or 'identical' if appropriate.

My R version is:

> version
platform x86_64-unknown-linux-gnu
arch     x86_64
os       linux-gnu
system   x86_64, linux-gnu
status
major    2
minor    1.1
year     2005
month    06
day      20
language R

Best regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

hadley wickham writes:
> Hi,
>
> The description of all.equal states "is.all.equal should be used for
> programming, typically in if expressions. It is a simple wrapper using
> identical as shown in the documentation there.", but is.all.equal is
> not explicitly defined there (although there is a hint in the comments
> that is.all.equal <- function(x,y) isTRUE(all.equal(x,y))).
>
> Could the documentation be corrected? (or even better, how about
> defining is.all.equal by default)
>
> Thanks,
>
> Hadley
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
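[A minimal sketch of the usage recommended by the help page quoted above;
the toy values are mine.]

x <- 0.1 + 0.2
y <- 0.3
x == y                        ## FALSE, a floating point artefact
## all.equal() itself may return a character report, so it must not be
## used directly as an if() condition; wrap it in isTRUE():
if (isTRUE(all.equal(x, y))) "near equality" else "different"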
Re: [R] abbreviate
Down to 17 if we carry on your idea, using the matching property on the
left hand side, too. :-)

p$h=pmax(p$h,p$s)

Gabor Grothendieck writes:
> Agree that this definitely should be pursued. :) In fact,
> we can shave off several keystrokes by
>
> - replacing p[[1]] with p[1] on the left hand side
> - p$high and p$settle with p$h and p$s on the right hand
>   side (which makes use of the matching property of $)
> - '=' instead of '<-'
> - remove all the spaces
>
> This gets it down to 18 characters:
>
> p[1]=pmax(p$h,p$s)
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] Simplify formula for iterative programming
Dear Stef

Please check my code below carefully, because it is Friday afternoon and
there might be mistakes in it:

## initialize two random vectors (length n = 100)
a <- rnorm(100)
b <- rnorm(100)
## computation of your formula (see ?outer which is very
## useful here)
(c <- sum(sqrt(outer(a,a,"-")^2 + outer(b,b,"-")^2)))
## expand a and b to length n+1
a1 <- c(a,rnorm(1))
b1 <- c(b,rnorm(1))
## computation calculating the whole sum
(c1 <- sum(sqrt(outer(a1,a1,"-")^2 + outer(b1,b1,"-")^2)))
## computation using the result of the smaller vectors
c + 2*sum(sqrt((a1 - a1[length(a1)])^2 + (b1 - b1[length(b1)])^2))

Regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Stefaan Lhermitte writes:
> Dear R-ians,
>
> I am looking for the simplification of a formula to improve the
> calculation speed of my program. Therefore I want to simplify the
> following formula:
>
> H = Si (Sj ( sqrt [ (Ai - Aj)² + (Bi - Bj)² ] ) )
>
> where:
> A, B = two vectors (with numerical data) of length n
> sqrt = square root
> Si = summation over i (= 0 to n)
> Sj = summation over j (= 0 to n)
> Ai = element of A with index i
> Aj = element of A with index j
> Bi = element of B with index i
> Bj = element of B with index j
>
> n is not fixed, but it changes with every run of my program. Therefore
> I am looking for a simplification of H in order to calculate it when
> my A and B get extended by 1 element (n = n + 1).
>
> I know a computationally simplified formula exists for the standard
> deviation (sd) that is much easier in iterative programming. Therefore I
> wondered if anybody knew about analog simplifications to simplify H:
>
> sd = sqrt [ ( Si (Xi - mean(X) )² ) /n ]  ->  simplified computation  ->
> sqrt [ (n * Si( X² ) - ( Si( X ) )² ) / n² ]
>
> This simplified formula is much easier in iterative programming, since I
> don't have to keep every element of X.
> E.g.: I have a vector X[1:10] and I have already calculated Si( X[1:10]² )
> (I will call this A) and Si( X ) (I will call this B).
> When X gets extended by 1 element (e.g. X[11]) it is fairly simple to
> calculate sd(X[1:11]) without having to reuse the elements of X[1:10].
> I just have to calculate:
>
> sd = sqrt [ (n * (A + X[11]²) - (B + X[11])² ) / n² ]
>
> This is fairly easy in an iterative process, since before we continue
> with the next step we set:
> A = (A + X[11]²)
> B = (B + X[11])
>
> Can anybody help me to do something comparable for H? Any other help to
> calculate H easily in an iterative process is also welcome!
>
> Kind regards,
> Stef
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
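[The update step above can be wrapped in a small helper for iterative use;
this is my sketch, not part of the original answer, and the name updateH
is made up. It reuses the objects a, b, c, a1, b1, c1 from the example.]

## H: current double sum for the old vectors a, b of length n
## aNew, bNew: the coordinates of the single new point
updateH <- function(H, a, b, aNew, bNew) {
  ## each old point contributes twice: (i, new) and (new, i)
  H + 2*sum(sqrt((a - aNew)^2 + (b - bNew)^2))
}
all.equal(updateH(c, a, b, a1[101], b1[101]), c1)   ## TRUE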
Re: [R] mplot: how to deal with missing data
If you have to deal with missing values, you might be interested in
na.omit().

?na.omit

Regards,

Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

NATALIA F TCHETCHERINA writes:
> Hello all,
> I have data:
>
>              Genes   time rep vart dye         y trt
> 130911     sa1-d07 030min   1  col   g  9.636244   o
> 145771     sa1-d07 030min   1  col   r  8.107577   c
> 93335      sa1-d07 030min   1  ler   g  7.409566   o
> 94821      sa1-d07 030min   1  ler   r  5.107160   c
> 10119101   sa1-d07 030min   2  col   g  8.336862   o
> 11605101   sa1-d07 030min   2  col   r  7.824530   c
> 725313     sa1-d07 030min   2  ler   g  8.249347   o
> 740171     sa1-d07 030min   2  ler   r  7.565084   c
> 1160522    sa1-d07 030min   3  col   g        NA   c
> 1011922    sa1-d07 030min   3  col   r        NA   o
> 562232     sa1-d07 030min   3  ler   g  9.974227   c
> 547362     sa1-d07 030min   3  ler   r 10.341149   o
> ..
> ..
> ..
>
> I would like to get graphs of means for two-way factor combinations.
> I used the Rlab package:
>
> mplot(data$y[which(data$Genes=="sa1-d07")],
>       data$time[which(data$Genes=="sa1-d07")],
>       data$trt[which(data$Genes=="sa1-d07")])
>
> However, I have the following error message:
>
> plot window will lay out plots in a 3 by 1 matrix
> Error in plot.window(xlim, ylim, log, asp, ...) :
>         need finite 'ylim' values
> In addition: Warning messages:
> 1: no finite arguments to min; returning Inf
> 2: no finite arguments to max; returning -Inf
>
> I think this is because of some y='NA'.
> My question is: how I can deal with this problem?
>
> Sincerely, Natalia.
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
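[A minimal sketch of what na.omit() does to a data frame; the toy data are
mine, not Natalia's.]

dat <- data.frame(y = c(9.6, NA, 8.2), trt = c("o", "c", "o"))
na.omit(dat)          ## drops row 2, which contains the NA
complete.cases(dat)   ## the underlying logical index, see ?complete.cases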
Re: [R] How to name variables in a single plot
Dear Amari

Why not use legend?

y <- c(1:100)
x1 <- seq(0.1, 10, by=0.1)
x2 <- seq(0.5, 50, by=0.5)
mydata <- data.frame(y=y, x1=x1, x2=x2)
matplot(mydata, type = "l", xlab="Time", ylab="MSE",
        col = 1:3, lty = 1:3)
legend(10, 90, c("line1", "line2", "line3"), col = 1:3, lty = 1:3)

You can also work with text() to place any text within the plot.

See ?legend and ?text

Regards,

Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Amir Safari writes:
>
> Dear R Friends,
>
> I want to name my variables (more than 2 variables in a single plot)
> within a plot to distinguish them from each other, but I can't. How is it
> possible? I don't mean the x and y axis using xlab or ylab. Below are
> some lines, only as an example, that you could please try, if it is
> possible. I really thank you for your attention.
>
> Amir
>
> library(graphics)
> y<- c(1:100)
> x1<-seq(0.1,10, by=0.1)
> x2<-seq(0.5,50,by=0.5)
> mydata<- data.frame( y=y, x1=x1, x2=x2)
> matplot(mydata, type = "l" )
> matplot(mydata, type = "l" ,xlab="Time")
> matplot(mydata, type = "l" ,xlab="Time",ylab="MSE ")
>
> __
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
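[A small sketch of labelling the curves directly with text() instead of a
legend, reusing mydata from the message above; the coordinates are my own
choices, picked by eye for this example.]

matplot(mydata, type = "l", xlab = "Time", ylab = "MSE",
        col = 1:3, lty = 1:3)
## one label next to each curve at x = 60 (curve heights there are
## roughly 60, 6 and 30)
text(x = 60, y = c(63, 9, 33), labels = c("y", "x1", "x2"), col = 1:3)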
Re: [R] many chr2factors ?
Dear Christian

If you create your data frame by using data.frame, all characters are
automatically transformed into factors unless you force them to stay
characters. Maybe that can solve your problem easily.

dat <- data.frame(a=1:10, b=letters[1:10])
str(dat)
`data.frame':   10 obs. of  2 variables:
 $ a: int  1 2 3 4 5 6 7 8 9 10
 $ b: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10

Assuming that doesn't solve your problem due to the way your data frames
are created, you can do it afterwards. There are two problems with your
code.

First (and that causes the error): you use in your repeat

if(!is.character(df[,i]))
  next

Imagine that the last column of your data frame is not a character: you
jump to the next cycle and then you are outside of the range of your data
frame. Your break condition is ignored.

Second: You change your data frame inside of a function. Variables that
are created or changed within a function are local. Their life ends with
the end of the function. Therefore all changes you make will have no
effect on the global data frame you want to change. See the example:

dat1 <- structure(list(a = 1:10, b = letters[1:10]),
                  .Names = c("a", "b"),
                  row.names = as.character(1:10),
                  class = "data.frame")
str(data.frame(dat1))
`data.frame':   10 obs. of  2 variables:
 $ a: int  1 2 3 4 5 6 7 8 9 10
 $ b: chr  "a" "b" "c" "d" ...

tofac(dat1)
[1] 2
str(data.frame(dat1))
`data.frame':   10 obs. of  2 variables:
 $ a: int  1 2 3 4 5 6 7 8 9 10
 $ b: chr  "a" "b" "c" "d" ...

You can use the following code instead:

tofac <- function(x){
  for(i in 1:length(x)) {
    if(is.character(x[,i]))
      x[,i] <- factor(x[,i])
  }
  x
}
dat1 <- tofac(dat1)
str(dat1)
`data.frame':   10 obs. of  2 variables:
 $ a: int  1 2 3 4 5 6 7 8 9 10
 $ b: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10

The for loop avoids the problem with the index. Therefore it also works
for examples that have a non-character variable in the last column, and by
returning x at the end you make sure that your changed object persists.
(A loop-free sketch follows below the quoted message.)

Regards,

Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

christian schulz writes:
> Hi,
>
> i would like to transform
> characters from a data.frame to factors automatically.
>
> tofac <- function(df){
> + i=0
> + repeat{
> + i <- i+1
> + if(!is.character(df[,i]))
> + next
> + df[,i] <- as.factor(df[,i])
> + print(i)
> + if(i == length(df))
> + break }
> + }
>
> tofac(abrdat)
> [1] 7
> [1] 8
> [1] 9
> [1] 11
> [1] 13
> [1] 15
> Error in "[.data.frame"(df, , i) : undefined columns selected
>
> These are the correct columns and i get the idea to put into the loop
> an empty matrix with dimensions like df and return it!?
>
> Another check?
> abrdat2 <- apply(abrdat, 2, function(x)
>                  ifelse(is.character(x), as.factor(x), x))
>
> many thanks & regards,
> christian
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
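[For completeness, a loop-free sketch; the name tofac2 is mine. It also
hints at why the apply() attempt in the quoted mail cannot work: apply()
first converts the data frame to a character matrix, so is.character() is
no longer informative per column, whereas lapply() keeps the columns
separate.]

tofac2 <- function(x) {
  ## replace each character column by a factor, leave the rest untouched
  x[] <- lapply(x, function(col) if (is.character(col)) factor(col) else col)
  x
}
dat1 <- tofac2(dat1)
str(dat1)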
Re: [R] POSIX problem
Dear David

I tried to reproduce your example. It is not exactly the same: when I
create a data frame, characters are transformed to factors unless I use
I() to protect them.

> PeopleData.df <- data.frame(StartDate = c("29/10/2001", "7/12/2001",
+                                           "16/11/2001", "28/11/2001",
+                                           "2/11/2001", "26/11/2001"),
+                             StartTime = c("15:26", "10:32", "13:58",
+                                           "14:00", "15:22", "11:15"))
> str(PeopleData.df)
`data.frame':   6 obs. of  2 variables:
 $ StartDate: Factor w/ 6 levels "16/11/2001","2/..",..: 5 6 1 4 2 3
 $ StartTime: Factor w/ 6 levels "10:32","11:15",..: 6 1 3 4 5 2

So there is a small difference to your data frame. Then the following
produces what you want:

> as.POSIXct(strptime(paste(PeopleData.df$StartDate, PeopleData.df$StartTime),
+                     format="%d/%m/%Y %H:%M"))
[1] "2001-10-29 15:26:00 CET" "2001-12-07 10:32:00 CET"
[3] "2001-11-16 13:58:00 CET" "2001-11-28 14:00:00 CET"
[5] "2001-11-02 15:22:00 CET" "2001-11-26 11:15:00 CET"

Regards,

Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

David Scott writes:
>
> I am having trouble with creating a POSIXct object. If I create a variable
> of class Date first out of the date part of my data, I am ok, but if I
> just paste the date and time parts together and try and create the POSIXct
> object, I have problems.
>
> Here is a toy example created from the actual data which caused the
> problem. I am using R 2.0.1 on Windows XP.
>
> > # Data frame with dates and times, as character
> > PeopleData.df
>    StartDate StartTime
> 1 29/10/2001     15:26
> 2  7/12/2001     10:32
> 3 16/11/2001     13:58
> 4 28/11/2001     14:00
> 5  2/11/2001     15:22
> 6 26/11/2001     11:15
> > str(PeopleData.df)
> `data.frame':   6 obs. of  2 variables:
>  $ StartDate: chr  "29/10/2001" "7/12/2001" "16/11/2001" "28/11/2001" ...
>  $ StartTime: chr  "15:26" "10:32" "13:58" "14:00" ...
> > dput(PeopleData.df)
> structure(list(StartDate = c("29/10/2001", "7/12/2001", "16/11/2001",
> "28/11/2001", "2/11/2001", "26/11/2001"), StartTime = c("15:26",
> "10:32", "13:58", "14:00", "15:22", "11:15")), .Names = c("StartDate",
> "StartTime"), row.names = c("1", "2", "3", "4", "5", "6"), class =
> "data.frame")
> > BeginDate <- as.Date(PeopleData.df$StartDate, format="%d/%m/%Y")
> > BeginDate
> [1] "2001-10-29" "2001-12-07" "2001-11-16" "2001-11-28" "2001-11-02"
> [6] "2001-11-26"
> > # Create POSIXct date-time object without difficulty
> > BeginTime <- as.POSIXct(format(paste(BeginDate, PeopleData.df$StartTime),
> +                                format="%Y/%m/%d %H:%M"))
> > BeginTime
> [1] "2001-10-29 15:26:00 New Zealand Standard Time"
> [2] "2001-12-07 10:32:00 New Zealand Standard Time"
> [3] "2001-11-16 13:58:00 New Zealand Standard Time"
> [4] "2001-11-28 14:00:00 New Zealand Standard Time"
> [5] "2001-11-02 15:22:00 New Zealand Standard Time"
> [6] "2001-11-26 11:15:00 New Zealand Standard Time"
> > # But not directly from the dates and times
> > BeginTime <-
> + as.POSIXct(format(paste(PeopleData.df$StartDate, PeopleData.df$StartTime),
> +                   format="%d/%m/%Y %H:%M"))
> > BeginTime
> [1] "0029-10-20 New Zealand Standard Time"
> [2] "0007-12-20 New Zealand Standard Time"
> [3] "0016-11-20 New Zealand Standard Time"
> [4] "0028-11-20 New Zealand Standard Time"
> [5] "0002-11-20 New Zealand Standard Time"
> [6] "0026-11-20 New Zealand Standard Time"
> > # Format looks correct to me
> > paste(PeopleData.df$StartDate, PeopleData.df$StartTime)
> [1] "29/10/2001 15:26" "7/12/2001 10:32" &
Re: [R] adjusted p-values with TukeyHSD?
Dear Christoph

You can use the multcomp package. Please have a look at the following
example:

library(multcomp)

The first two lines were already proposed by Erin Hodgess:

summary(fm1 <- aov(breaks ~ wool + tension, data = warpbreaks))
TukeyHSD(fm1, "tension", ordered = TRUE)

  Tukey multiple comparisons of means
    95% family-wise confidence level
    factor levels have been ordered

Fit: aov(formula = breaks ~ wool + tension, data = warpbreaks)

$tension
     diff        lwr      upr
M-H  4.72 -4.6311985 14.07564
L-H 14.72  5.3688015 24.07564
L-M 10.00  0.6465793 19.35342

By using the functions simtest or simint you can get the p-values, too:

summary(simtest(breaks ~ wool + tension, data = warpbreaks,
                whichf="tension", type = "Tukey"))

Simultaneous tests: Tukey contrasts

Call:
simtest.formula(formula = breaks ~ wool + tension, data = warpbreaks,
    whichf = "tension", type = "Tukey")

Tukey contrasts for factor tension, covariable: wool

Contrast matrix:
                      tensionL tensionM tensionH
tensionM-tensionL 0 0       -1        1        0
tensionH-tensionL 0 0       -1        0        1
tensionH-tensionM 0 0        0       -1        1

Absolute Error Tolerance: 0.001

Coefficients:
                  Estimate t value Std.Err. p raw p Bonf p adj
tensionH-tensionL  -14.722  -3.802    3.872 0.000  0.001 0.001
tensionM-tensionL  -10.000  -2.582    3.872 0.013  0.026 0.024
tensionH-tensionM   -4.722  -1.219    3.872 0.228  0.228 0.228

or if you prefer to get the confidence intervals, too, you can use:

summary(simint(breaks ~ wool + tension, data = warpbreaks,
               whichf="tension", type = "Tukey"))

Simultaneous 95% confidence intervals: Tukey contrasts

Call:
simint.formula(formula = breaks ~ wool + tension, data = warpbreaks,
    whichf = "tension", type = "Tukey")

Tukey contrasts for factor tension, covariable: wool

Contrast matrix:
                      tensionL tensionM tensionH
tensionM-tensionL 0 0       -1        1        0
tensionH-tensionL 0 0       -1        0        1
tensionH-tensionM 0 0        0       -1        1

Absolute Error Tolerance: 0.001

95 % quantile: 2.415

Coefficients:
                  Estimate   2.5 %  97.5 % t value Std.Err. p raw p Bonf p adj
tensionM-tensionL  -10.000 -19.352  -0.648  -2.582    3.872 0.013  0.038 0.034
tensionH-tensionL  -14.722 -24.074  -5.370  -3.802    3.872 0.000  0.001 0.001
tensionH-tensionM   -4.722 -14.074   4.630  -1.219    3.872 0.228  0.685 0.447

------

Please be careful: The resulting confidence intervals in simint are not
associated with the p-values from 'simtest', as described on the help
pages of the two functions.

------

I did not have the time to check the differences between the functions or
to read the references given on the help page. If you are interested in
the functions, you can check those to find out which one you prefer.

Best regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Christoph Strehblow writes:
> hi list,
>
> i have to ask you again, having tried and searched for several days...
>
> i want to do a TukeyHSD after an Anova, and want to get the adjusted
> p-values after the Tukey correction.
> i found the p.adjust function, but it can only correct for "holm",
> "hochberg", "bonferroni", but not "Tukey".
>
> Is it not possible to get adjusted p-values after Tukey correction?
>
> sorry, if this is an often-answered question, but i didn't find it in
> the list archive...
>
> thx a lot, list, Chris
>
> Christoph Strehblow, MD
> Department of Rheumatology, Diabetes and Endocrinology
> Wilhelminenspital, Vienna, Austria
> [EMAIL PROTECTED]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] filename
Hi

The following works for me:

j <- "box"
z <- "plot"
jpeg(file=paste("YOURPATH", j, z, ".jpg", sep = ""))
boxplot(rnorm(100))
dev.off()

Regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C13
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-44-632-4673 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Lars writes:
> Hey,
>
> I'm generating a .jpeg file for a web application passing parameters to
> R via cgi. R gets the parameters by using commandArgs, which are passed
> to a variable named j and z
>
> j<-commandArgs()[3]
> z<-commandArgs()[4]
>
> Later I want to use the character strings of the arguments which the
> variables are holding for my filename, e.g.:
>
> jpeg(file="d:/data/images/jz.jpeg",...)
>
> i tried to use paste and as.character, but neither worked out the way i
> want it to.
>
> thanks for any advice, Lars Claussen
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
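[An alternative sketch that assembles the name with file.path(); the
directory is a placeholder, not a real path.]

j <- "box"
z <- "plot"
## file.path() joins the directory and file name with "/"
fname <- file.path("d:/data/images", paste(j, z, ".jpeg", sep = ""))
jpeg(file = fname)
boxplot(rnorm(100))
dev.off()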
Re: [R] Pb with ks.test pvalue
Dear Anthony

I don't know how SAS calculates the p-value, but in R the p-value is
calculated under the assumption that the parameters of the distribution
(you want to compare with your samples) are known and not estimated from
the data. In your example you estimate them from the data (by mean(w) and
sd(w)), and therefore the p-values are not reliable. Somehow you fit the
theoretical distribution too well to your data (using mean and sd
estimated from the data). Hence you are too conservative and the p-values
are too large. Maybe SAS does a correction for the estimation of the
parameters and therefore gets smaller p-values, but this is pure
speculation since I don't know how SAS does the calculation.

I did a simulation: I created 10000 samples from a normal distribution and
calculated the ks.test for each. I expected around 500 significant results
(at level 0.05) by chance and got 1 or 2.

I recommend using graphical methods (e.g. normal plots) to validate the
normal distribution of your data instead of testing it. See also ?qqnorm
or ?qqplot.

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-5414 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Anthony Landrevie writes:
>
> Hello,
>
> While doing tests of normality under R and SAS, in order to prove the
> efficiency of R to my company, I notice
> that the Anderson-Darling, Cramer-von Mises and Shapiro-Wilk test results
> are quite the same under the two environments,
> but the Kolmogorov-Smirnov p-value really is different.
>
> Here is what I do:
>
> > ks.test(w,pnorm,mean(w),sd(w))
>
> One-sample Kolmogorov-Smirnov test
>
> data: w
> D = 0.2143, p-value = 0.3803
> alternative hypothesis: two.sided
>
> > w
> [1] 3837 3334 2208 1745 2576 3208 3746 3523 3430 3480 3116 3428 2184 2383
> 3500 3866 3542
> [18] 3278
>
> SAS results:
>
> Kolmogorov-Smirnov D 0.214278 Pr > D 0.0271
>
> Why is the p-value so high under R? Much higher than with other tests.
>
> Best regards,
>
> Anthony Landrevie (French Student)
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
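[A minimal sketch of the recommended graphical check, applied to the data
w from the quoted post.]

w <- c(3837, 3334, 2208, 1745, 2576, 3208, 3746, 3523, 3430,
       3480, 3116, 3428, 2184, 2383, 3500, 3866, 3542, 3278)
qqnorm(w)
qqline(w)   ## points close to the line support approximate normality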
[R] setAs between a new S4 class and "POSIXlt"
Dear R gurus

I have a question concerning the transformation between a new S4 class
(that I defined) and the existing class "POSIXlt". I did the following:

## Definition of the new class:
setClass("dtime", representation(data = "POSIXlt"),
         prototype(data = as.POSIXlt("2004/06/01")))

## Transformation between the new class "dtime" and "POSIXlt":
setAs("dtime", "POSIXlt", def = function(from) { from@data })
setAs("POSIXlt", "dtime", def = function(from) { new("dtime", data = from) })

## Create a new "dtime" object:
(x1 <- new("dtime"))
str(x1)
## Transformation to the "POSIXlt" class works well:
(y1 <- as(x1, "POSIXlt"))
str(y1)

## Another plain "POSIXlt" object:
y2 <- as.POSIXlt("2004/06/01")
## Transformation to the "dtime" class fails:
as(y2, "dtime")
> Error in insertMethod(methods, sig, args, def, TRUE) :
  inserting method into non-methods-list object (class "NULL")

## This works properly
new("dtime", data = y2)

## Now I put another setAs for the superclass "POSIXt"
setAs("POSIXt", "dtime", def = function(from) { new("dtime", data = from) })
## and the transformation to the "dtime" class works:
as(y2, "dtime")

I tried to understand what happened, without success. Could someone give
me a hint or a reference (a help page or other document) to improve my
understanding, please? Thank you very much.

I work with:

platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status   Under development (unstable)
major    2
minor    1.0
year     2005
month    03
day      22
language R

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-5414 fax: 632-1228
http://stat.ethz.ch/~buser/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] wilcox.test statistics
Hi,

a value of 0 for the test statistic is possible. The test statistic is not
just the sum of ranks, but this sum - n*(n+1)/2, where n is the number of
observations of the group for which the rank sum is built. This statistic
is equivalent to the rank sum, since it differs only by a constant, which
depends on the number of observations.

Look at the following situation:

> x <- 1:10
> y <- 11:20
> wilcox.test(x,y)

        Wilcoxon rank sum test

data:  x and y
W = 0, p-value = 1.083e-05
alternative hypothesis: true mu is not equal to 0

When every observation of group 1 is smaller than those of group 2, the
rank sum of the group with the smaller values is

sum(1:n1) = sum(1:10) = 10*(10+1)/2 = n1*(n1+1)/2

If you compare this to the test statistic, you'll observe that in this
case the test statistic is 0.

Regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-5414 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Liu Ting-Yuan writes:
>
> Hi,
>
> Could anyone provide the formula of the statistic which wilcox.test
> uses for the two-sample rank-sum test? I got some statistics with 0
> values, but it is impossible to have a 0 "rank-sum". Does the function
> use the Mann-Whitney U test statistic? Thanks.
>
> Ting-Yuan Liu
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
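[The relation between the rank sum and the statistic W can be checked
numerically; a small sketch with made-up data.]

x <- c(1.3, 2.7, 0.8, 3.1)
y <- c(2.2, 4.5, 3.9)
r <- rank(c(x, y))
sum(r[1:4]) - 4*(4+1)/2        ## rank sum of x minus n*(n+1)/2, here 2
wilcox.test(x, y)$statistic    ## the same value, W = 2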
Re: [R] problem with se.contrast()
Dear Prof Ripley, Dear Prof Dalgaard

Thank you both for your help. I tried it with helmert contrasts and got a
result that is consistent with lme. I didn't realize that the
parameterization of the model has an influence on the contrasts that I
tried to test. It seems that I should read a little bit more about this
issue to improve my understanding.

I have one last point to propose: You could include (as an interim
solution) a warning (that there might be an efficiency loss) in
se.contrast() if one uses non-orthogonal contrasts and a multi-stratum
model.

Best regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-5414 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Prof Brian Ripley writes:
> On Mon, 21 Feb 2005, Peter Dalgaard wrote:
>
> > Prof Brian Ripley <[EMAIL PROTECTED]> writes:
> >
> >>>> test.aov <- with(testdata,aov(Measurement ~ Material +
> >>>>                               Error(Lab/Material)))
> >>>> se.contrast(test.aov,
> >>>>             list(Material=="A",Material=="B",Material=="C",Material=="D"),
> >>>>             coef=c(0.5,0.5,-0.5,-0.5),data=testdata)
> >>>> [1] 0.1432572
> >>>
> >>> I got a different result and I have to admit that I didn't understand why
> >>> there is a difference between the lme model and this one. There are some
> >>> comments in the help pages but I'm not sure if this is the answer.
> >>
> >> It is. You used the default `contrasts', which are not actually
> >> contrasts. With
> >>
> >> options(contrasts=c("contr.helmert", "contr.poly"))
> >>
> >> it gives the same answer as the other two. Because you used
> >> non-contrasts there was an efficiency loss (to the Intercept stratum).
> >
> > Brian,
> >
> > I'm not sure how useful that contrasts-that-are-not-contrasts line is.
>
> I agree, it was not precise enough (too late at night for me). Try
> `non-orthogonal contrasts'. The issue was correct though, it is the
> choice of contrasts, and as I would automatically use orthogonal contrasts
> with aov() I had not encountered it and it took me a while to pick up on
> what Christoph had done differently from me (I had run the example and got
> the same result as the randomized-block analysis before my original post).
>
> There's a comment in the code for aov:
>
>     ## helmert contrasts can be helpful: do we want to force them?
>     ## this version does for the Error model.
>
> and perhaps we should make them the default for the per-stratum fits.
>
> > It certainly depends on your definition of "contrasts". Contrast
> > matrices having zero column sums was not part of the definition I was
> > taught. I have contrasts as "representations of the group mean
> > structure that are invariant to changes of the overall level", so
> > treatment contrasts are perfectly good contrasts in my book.
>
> I don't think Yates thought in those terms, and the whole idea of dividing
> into strata (and the generalized Yates algorithm) is based on contrasts
> being orthogonal to the overall mean (and to things in other strata).
>
> It was that, not zero-sum, that I was taught, but in balanced cases such
> as here it is the same thing.
>
> > The zero-sum condition strikes me as a bit arbitrary: after all there
> > are perfectly nice orthogonal designs where some levels of a factor
> > occur more frequently than others.
>
> Balance and orthogonality are not quite the same, though.
> > This in turn makes me a bit wary of what is going on inside
> > se.contrasts, but it's gotten too late for me to actually study the code
> > tonight.
> >
> > Could you elaborate on where precisely the condition on the contrast
> > matrices comes into play?
>
> In finding the projection lengths in eff.aovlist, here
>
>     proj.len <-
>         lapply(aovlist, function(x)
>            {
>                asgn <- x$assign[x$qr$pivot[1:x$rank]]
>                sp <- split(seq(along=asgn), attr(terms(x), "term.labels")[asgn])
>                sapply(sp, function(x, y) sum(y[x]), y=diag(x$qr$qr)^2)
>            })
>
> using only the diagonal requires orthogonality.
Re: [R] bivariate empirical cdf
Hi

I changed your function a little bit, so there is no longer a conflict in
the if condition when z is a vector:

mecdf <- function(u,v,z) {
  u <- sort(u)
  v <- sort(v)
  n <- length(u)
  m <- length(z)
  nb <- numeric(m)
  for(j in seq(1,m)) {
    nb.temp <- 0
    for (i in seq(1,n)) {
      if (u[i] <= z[j] && v[i] <= z[j])
        nb.temp <- nb.temp + 1
    }
    nb[j] <- nb.temp/n
  }
  nb
}

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-5414 fax: 632-1228
http://stat.ethz.ch/~buser/
--

MEYFREDI Jean-Christophe writes:
> Dear R users,
>
> I'm trying to write a small function in order to compute the empirical
> cumulative distribution function. All seems to work, but when I try to
> plot the function, I always get error messages.
>
> This is the function I use
>
> mecdf<-function(u,v,z) {
>   u=sort(u)
>   v=sort(v)
>   n=length(u)
>   nb=0
>
>   for (i in seq(1,n)) {
>     if (u[i] <= z && v[i] <= z)
>       nb<-nb+1
>   }
>   nb=nb/n
>   return(nb)
> }
>
> In fact if I try to obtain mecdf(u,v,0.1) or mecdf(u,v,0.2), all is
> good, but with mecdf(u,v,c(0.1,0.2)), for example, I get errors and wrong
> results for both mecdf(u,v,0.1) and mecdf(u,v,0.2). I think that it
> constitutes the key point of my plot difficulty. Can someone help me?
>
> Best regards
>
> JCM
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] problem with se.contrast()
Dear Jamie

I already thought that your data structure could be more complicated than
in the example. I would be careful anyway. Since there is a difference
between the results of se.contrast() in R-devel and the results from lme
(and the lme-consistent results from the aggregated data) in this nice
balanced design, I am suspicious, especially since your real design is
more complicated. Even if you get no error message for your data, I'd run
the analysis a second time, using for example lme, to cross-check the
results.

I wish you all the best for your analysis.

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-5414 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Jamie Jarabek writes:
> Christoph,
>
> Thank you for your advice. My actual design is indeed more complicated
> than what I have indicated here. I was just using this as a toy example
> to illustrate my particular problem. As suggested by Prof. Ripley I will
> download R-devel and see if the fixes included within alleviate my
> problems.
>
> Jamie Jarabek

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] problem with se.contrast()
Dear Jamie

As Prof. Ripley explained, your analysis is equivalent to the fixed effect
model for the means, so you can calculate it by (if this is your design):

> Lab <- factor(rep(c("1","2","3"), each=12))
> Material <- factor(rep(c("A","B","C","D"), each=3, times=3))
> Measurement <- c(12.20,12.28,12.16,15.51,15.02,15.29,18.14,18.08,18.21,
+                  18.54,18.36,18.45,12.59,12.30,12.67,14.98,15.46,15.22,
+                  18.54,18.31,18.60,19.21,18.77,18.69,12.72,12.78,12.66,
+                  15.33,15.19,15.24,18.00,18.15,17.93,18.88,18.12,18.03)
> testdata <- data.frame(Lab,Material,Measurement)
> rm(list=c("Lab","Material","Measurement"))

You can aggregate your data:

> dat.mean <- aggregate(testdata$Measurement,
+                       by = list(Material=testdata$Material, Lab=testdata$Lab),
+                       FUN = mean)
> names(dat.mean)[3] <- "Measurement"
> test.red.aov1 <- aov(Measurement ~ Lab + Material, data = dat.mean)
> se.contrast(test.red.aov1,
+             list(Material=="A",Material=="B",Material=="C",Material=="D"),
+             coef=c(0.5,0.5,-0.5,-0.5), dat.mean)
[1] 0.1220339

By aggregating the data you bypass the problem in se.contrast() and you do
not need R-devel.

------

The second way to get the same result is to set your contrast for the
factor "Material", calculate your model with this contrast, and use
summary.lm():

> dat.mean$Material <- C(dat.mean$Material, c(-0.5,-0.5,0.5,0.5),
+                        how.many = 3)
> test.red.aov2 <- aov(Measurement ~ Lab + Material, data = dat.mean)
> summary.lm(test.red.aov2)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.02000    0.10568 151.583 5.56e-12 ***
Lab2         0.25833    0.14946   1.728    0.135
Lab3         0.06583    0.14946   0.440    0.675
Material1    4.52278    0.12203  37.062 2.58e-08 ***
Material2    1.21056    0.12203   9.920 6.06e-05 ***
Material3    1.55389    0.12203  12.733 1.44e-05 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Material1 is now the contrast you are interested in, and beside the
Std. Error you get the Estimate, too. Material2 and Material3 are just
orthogonal contrasts to complete the set.

------

The third way (which I prefer) is using lme from the package nlme:

> library(nlme)

Here I use the original data set (not aggregated) and set the desired
contrast, too.

> testdata$Material <- C(testdata$Material, c(-0.5,-0.5,0.5,0.5),
+                        how.many = 3)
> test.lme <- lme(fixed = Measurement ~ Material, data = testdata,
+                 random = ~1|Lab/Material)

With anova you get the F-Test for the fixed factor "Material":

> anova(test.lme)
            numDF denDF  F-value p-value
(Intercept)     1    24 43301.14  <.0001
Material        3     6   544.71  <.0001

and with the summary you have your contrast with its standard error:

> summary(test.lme)

Fixed effects: Measurement ~ Material
                Value  Std.Error DF   t-value p-value
(Intercept) 16.128056 0.07750547 24 208.08925   0e+00
Material1    4.522778 0.12203325  6  37.06185   0e+00
Material2    1.210556 0.12203325  6   9.91988   1e-04
Material3    1.553889 0.12203325  6  12.73332   0e+00

------

Last but not least I tried it with R-devel and the original data frame.
First I reset the contrast to the default value:

testdata$Material <- C(testdata$Material, "contr.treatment", how.many = 3)

I used your syntax and se.contrast():

> test.aov <- with(testdata, aov(Measurement ~ Material + Error(Lab/Material)))
> se.contrast(test.aov,
+             list(Material=="A",Material=="B",Material=="C",Material=="D"),
+             coef=c(0.5,0.5,-0.5,-0.5), data=testdata)
[1] 0.1432572

I got a different result and I have to admit that I don't understand why
there is a difference between the lme model and this one. There are some
comments in the help pages, but I'm not sure if they are the answer.

------

I hope some of the code above can help to analyse your data.
Maybe Prof. Ripley was right and you have another design. Then you can
just ignore this and your life is much easier :-)

Best regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-5414 fax: 632-1228
http://stat.ethz.ch/~buser/
Re: [R] (no subject)
Hi Jim

I'm not sure if I understand your problem correctly. Is this a solution?

(li <- list(a=1, b=200.44))
as.numeric(paste(unlist(li), collapse = ""))

Best regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-5414 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Jim Gustafsson writes:
>
> R-people
>
> I wonder if one could change a list of tables with numbers of the form
> 1,200.44 to 1200.44
>
> Regards
> JG
>
> --
> This e-mail and any attachment may be confidential and may also be
> privileged.
> If you are not the intended recipient, please notify us immediately and
> then delete this e-mail and any attachment without retaining copies or
> disclosing the contents thereof to any other person.
> Thank you.
> --
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
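[If the task is rather to strip thousands separators from character values
such as "1,200.44", a sketch using gsub() may be closer to what is wanted;
the example values are mine.]

x <- c("1,200.44", "23,450.10")
as.numeric(gsub(",", "", x))   ## 1200.44 23450.10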
Re: [R] How to generate labels or names?
Hi Eric

If you want to produce a vector with names, you can use

v <- rnorm(20)
names(v) <- paste("Lab", 1:20, sep="")

Regards,

Christoph

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-5414 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Eric Rodriguez writes:
> Hi,
>
> I'm new to R and I would like to generate labels like data.frame does:
> "V1 V2 V3...".
> I'm trying to generate an N vector with labels such as "Lab1 Lab2 ... LabN".
>
> I guess this is pretty easy when you know R ;)
>
> Thanks for help
>
> Eric
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] sw
Dear Mahdi

Mahdi Osman writes:
> Hi list,
>
> I am just a new user of R.
>
> How can I run stepwise regression in R?

If you are looking for a stepwise procedure for linear regression models
or generalized linear models, you can use step() (see ?step). A small
sketch follows below the quoted message.

Regards,

Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-5414 fax: 632-1228
http://stat.ethz.ch/~buser/
--

> Is there a graphical user interface for any of the spatial packages
> included in R, such as gstat, geoR and some others? I am mainly
> interested in interactive variogram modelling and mapping.
>
> Thanks
> Mahdi
>
> --
> ---
> Mahdi Osman (PhD)
> E-mail: [EMAIL PROTECTED]
> ---
>
> 10 GB Mailbox, 100 FreeSMS http://www.gmx.net/de/go/topmail
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
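[A minimal sketch of step() on simulated data; all variable names are made
up. step() searches by AIC and here typically retains only the truly
relevant predictor.]

set.seed(1)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
dat$y <- 2*dat$x1 + rnorm(50)          ## only x1 matters
fit <- lm(y ~ x1 + x2 + x3, data = dat)
step(fit)                              ## backward search by AIC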
Re: [R] r square values for independent variables in multiple linear regression model -- newbie
Dear Avneet

If you fit a multiple linear regression model, your independent variables
will generally not be orthogonal, and therefore it is difficult to split
the r square among the different variables in a way that has a meaningful
interpretation. In a balanced analysis of variance design the situation is
somewhat different: as a result of the balanced design the explanatory
variables are orthogonal, and you can decompose the sum of squares across
the different variables. But if you have covariables you must be careful.
I wouldn't try to assign an r square value to each variable (a sketch
below the quoted message illustrates why the split is not well defined).

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-5414 fax: 632-1228
http://stat.ethz.ch/~buser/
--

Singh, Avneet writes:
> Hello
>
> Could you please suggest a way to find out the r square values for each
> independent variable while using "lm" for developing a multiple linear
> regression model.
>
> Thank you
> avneet
>
> "I have no data yet. It is a capital mistake to theorize before one has
> data. Insensibly one begins to twist facts to suit theories instead of
> theories to suit facts."
> ~ Sir Arthur Conan Doyle (1859-1930), Sherlock Holmes
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
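[The difficulty can be made concrete: with non-orthogonal predictors the
sequential sums of squares, and hence any r square split, depend on the
order of the terms. A sketch with correlated simulated predictors; the
data are made up.]

set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.5)   ## x2 is correlated with x1
y  <- x1 + x2 + rnorm(100)
anova(lm(y ~ x1 + x2))   ## SS credited to x1 first
anova(lm(y ~ x2 + x1))   ## a different split of the same total SS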
Re: [R] Wilcoxon test for mixed design (between-within subjects)
Hello Marco

Marco Chiarandini writes:
> Hallo,
>
> is there any extension of the pairwise Wilcoxon test to a dependent
> samples layout with replicates (or, in other terms, a one-way layout
> with blocking and replicates)?
>

There is always the possibility to summarize the replicates and then
calculate an ordinary pairwise Wilcoxon test (a sketch follows below the
quoted message).

> The Wilcoxon method with matched pairs works for the case of dependent
> samples with one observation per block, while the Mann-Whitney test
> works for independent samples, thus one single block and replicated
> observations. Is there a method which mixes these two cases, considering
> a dependent sample design in which each block has more than one
> observation? I know there exists a Friedman test for this case, but in
> the Friedman test ranks are constructed considering all subjects
> jointly, while in Wilcoxon only the pair of subjects currently
> considered is ranked, thus resulting in a more powerful test.
>

The Friedman test calculates the ranks inside the blocks, for each block
separately; it does not consider all subjects jointly. Implemented in R is
the case of unreplicated blocked data, so this doesn't help you.

> If no such method exists in R, I am anyway interested in possible
> references.

Look at:

Myles Hollander & Douglas A. Wolfe (1999), _Nonparametric statistical
methods_. New York: John Wiley & Sons.

There you find the generalization to the replicated case, but you have to
implement it yourself.

Regards,

Christoph Buser

------
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-5414 fax: 632-1228
http://stat.ethz.ch/~buser/
--

> Thank you for the consideration,
>
> Regards,
>
> Marco
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
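[A sketch of the summarizing approach mentioned above: average the
replicates within each block-treatment cell, then apply the paired
Wilcoxon test. Data and names are made up.]

set.seed(1)
dat <- data.frame(block = rep(1:8, each = 6),
                  trt   = rep(c("A","B"), each = 3, times = 8),
                  y     = rnorm(48))
## one mean per block and treatment
m <- aggregate(dat$y, by = list(block = dat$block, trt = dat$trt),
               FUN = mean)
## paired test on the block means
wilcox.test(m$x[m$trt == "A"], m$x[m$trt == "B"], paired = TRUE)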
Re: [R] Cholesky Decomposition
Dear Kolluru

For a real symmetric positive-definite square matrix you can use chol()
(see ?chol) in the base package.

help.search("cholesky") gives some more alternatives:

chol.new(assist)            A Modified Cholesky Decomposition
chol.reduce(kernlab)        Incomplete Cholesky decomposition
gchol(kinship)              Generalized Cholesky decompostion
solve.bdsmatrix(kinship)    Solve a matrix equation using the generalized
                            Cholesky decompostion
solve.gchol(kinship)        Solve a matrix equation using the generalized
                            Cholesky decompostion
Cholesky-class(Matrix)      Cholesky decompositions
sscChol-class(Matrix)       Cholesky decompositions of sscMatrix objects
chol(base)                  The Choleski Decomposition
chol2inv(base)              Inverse from Choleski Decomposition

Hope there is something for you.

Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-5414 fax: 632-1228
http://stat.ethz.ch/~buser/

kolluru ramesh writes:
> Can we do Cholesky Decomposition in R for any matrix
>
> [[alternative HTML version deleted]]
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
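[A minimal example for the base function chol(); the input must be
symmetric and positive definite, and the toy matrix is mine.]

m <- matrix(c(4, 2, 2, 3), nrow = 2)   ## symmetric positive definite
R <- chol(m)                           ## upper triangular factor
crossprod(R)                           ## t(R) %*% R recovers m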
Re: [R] Dropping a digit with scan() on a connection
Dear Tim

You can use

cat("TITLE extra line", "235 335 535 735", "115 135 175",
    file="ex.data", sep="\n")
cn.x <- file("ex.data", open="r")
a <- scan(cn.x, skip=1, n=2, sep = " ")
> Read 2 items
a
> [1] 235 335
b <- scan(cn.x, n=2, sep = " ")
> Read 2 items
b
> [1] 535 735
c <- scan(cn.x, n=2, sep = " ")
> Read 2 items
c
> [1] 115 135
d <- scan(cn.x, n=1, sep = " ")
> Read 1 items
d
> [1] 175

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-5414 fax: 632-1228
http://stat.ethz.ch/~buser/

Tim Howard writes:
> R gurus,
>
> My use of scan() seems to be dropping the first digit of sequential
> scans on a connection. It looks like it happens only within a line:
>
> > cat("TITLE extra line", "235 335 535 735", "115 135 175",
> file="ex.data", sep="\n")
> > cn.x <- file("ex.data", open="r")
> > a <- scan(cn.x, skip=1, n=2)
> Read 2 items
> > a
> [1] 235 335
> > b <- scan(cn.x, n=2)
> Read 2 items
> > b
> [1] 35 735
> > c <- scan(cn.x, n=2)
> Read 2 items
> > c
> [1] 115 135
> > d <- scan(cn.x, n=1)
> Read 1 items
> > d
> [1] 75
>
> Note in b, I should get 535, not 35 as the first value. In d, I should
> get 175. Does anyone know how to get these digits?
>
> The reason I'm not scanning the entire file at once is that my real
> dataset is much larger than a Gig and I'll need to pull only portions of
> the file in at once. I got readLines to work, but then I have to figure
> out how to convert each entire line into a data.frame. Scan seems a lot
> cleaner, with the exception of the funny character dropping issue.
>
> Thanks so much!
> Tim Howard
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] merge data frames taking mean/mode of multiple matches
Dear Avneet

The function aggregate (see also ?aggregate) could be useful for your
problem. See the short example I've written below.

dat1 <- data.frame(lot = factor(1:10), y1 = rnorm(10))
str(dat1)
dat2 <- data.frame(nr = 1:100, lot = factor(rep(1:10, each = 10)),
                   y2 = rnorm(100))
str(dat2)
dat2.agr <- aggregate(dat2$y2, by = list(lot = dat2$lot), FUN = mean)
names(dat2.agr)[2] <- "y2"
dat.mer <- merge(dat1, dat2.agr)
str(dat.mer)

Be careful about merging data frames. You should always check that the
right cases are merged together (one possible check is sketched below the
quoted message).

Regards,

Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-5414 fax: 632-1228
http://stat.ethz.ch/~buser/
--

avneet singh writes:
> Hello :)
>
> I have two data frames, one has properties taken on a
> piece by piece basis and the other has performance on
> a lot by lot basis. I wish to combine these two data
> frames but the problem is that each lot has multiple
> pieces and hence i need to take a mean of the
> properties of multiple pieces and match it to the row
> having data about the lot.
>
> I was wondering if there is a simple command, an
> extension of "merge", or an option of merge i do not
> know which could easily do this work.
>
> Thank you :)
>
> =
> I believe in equality for everyone, except reporters and photographers.
> ~Mahatma Gandhi
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
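[One possible sanity check after the merge, as suggested above; this is my
sketch, reusing dat1 and dat.mer from the example. It verifies that no
lots were dropped or duplicated.]

stopifnot(nrow(dat.mer) == nrow(dat1),
          all(sort(as.character(dat.mer$lot)) ==
              sort(as.character(dat1$lot))))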
Re: [R] Kolmogorov-Smirnov test for lognormal distribution with estimated parameters
Hi Kwabena

I once did a simulation, generating normally distributed values (500
values per sample) and calculating a KS test with estimated parameters.
Repeating this test 10000 times, I got about 1 significant test, although
at level alpha = 0.05 I'd expect about 500 significant tests by chance.
So I think if you estimate the parameters from the data, you fit too well,
and the distribution used for the test statistic is not adequate, as
indicated in the help page you cited. There is some literature cited there
(in the help page), but it is not easy reading. Furthermore I know of no
implementation of a KS test which accounts for this estimation of the
parameters.

I recommend a graphical tool instead of a test:

x <- rlnorm(100)
qqnorm(log(x))

See also ?qqnorm and ?qqplot.

If you insist on testing a theoretical distribution, be aware that a
non-significant test does not mean that your data have the tested
distribution (especially if you have few data, there is no power in the
test to detect deviations from the theoretical distribution, and the
conclusion that the data fit well is treacherous).

If there are enough data I'd prefer a chi-square test to the KS test (but
even there I use graphical tools instead). See ?chisq.test. For this test
you have to specify classes, and this is subjective (you can't avoid
this). You can reduce the df of the chi-square distribution expected under
H_0 by the number of parameters estimated from the data and will get
better results:

df = number of classes - 1 - number of estimated parameters

I think this test is more powerful than the KS test, particularly if you
must estimate the parameters from the data. (A sketch of the df correction
follows below the quoted message.)

Regards,

Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND
phone: x-41-1-632-5414 fax: 632-1228
http://stat.ethz.ch/~buser/

Kwabena Adusei-Poku writes:
> Hello all,
>
> Would somebody be kind enough to show me how to do a KS test in R for a
> lognormal distribution with ESTIMATED parameters. The R function
> ks.test() says "the parameters specified must be prespecified and not
> estimated from the data". Is there a way to correct this when one uses
> estimated parameters?
>
> Regards,
>
> Kwabena.
>
> Kwabena Adusei-Poku
> University of Goettingen
> Institute of Statistics and Econometrics
> Platz der Goettingen Sieben 5
> 37073 Goettingen
> Germany
> Tel: +49-(0)551-394794
>
> __
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
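[Since chisq.test() has no option for estimated parameters, the df
correction has to be done by hand. A rough sketch for checking
log-normality; the number of classes and the quantile-based class
boundaries are my arbitrary choices.]

x  <- rlnorm(200)
lx <- log(x)                                      ## check log(x) for normality
br <- quantile(lx, probs = seq(0, 1, by = 0.1))   ## 10 classes
obs <- table(cut(lx, breaks = br, include.lowest = TRUE))
p   <- diff(pnorm(br, mean = mean(lx), sd = sd(lx)))
p   <- p/sum(p)                                   ## renormalize to the range
stat <- sum((obs - 200*p)^2 / (200*p))
## df = classes - 1 - 2 estimated parameters (mean and sd)
pchisq(stat, df = length(obs) - 1 - 2, lower.tail = FALSE)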
Re: [R] Getting empirical percentiles for data
Dear Mike

Mike Cheung writes:
 > Dear List,
 >
 > I have some discrete data and want to calculate the percentiles and the
 > percentile ranks for each of the unique scores. I can calculate the
 > percentiles with quantile().
 >
 > I know that "ecdf" can be used to calculate the empirical cumulative
 > distribution. However, I don't know how to extract the cumulative
 > probabilities for each unique element. The requirement is similar to
 > "FREQUENCIES" in SPSS. Could someone help me in extracting the cumulative
 > probabilities from the ecdf object? Thanks in advance!

You can use the following function:

f.freq <- function(x) {
  tab <- data.frame(table(x))
  tab$Percent <- tab$Freq * 100 / length(x)
  tab$Cum.Percent <- cumsum(tab$Percent)
  tab
}

x <- round(rnorm(50, mean = 50, sd = 10))
f.freq(x)

This should give you a table analogous to FREQUENCIES in SPSS.

 > # Generating artificial data
 > x <- round( rnorm(50, mean=50, sd=10) )
 > probs <- seq(0.1, 0.9, by=0.1)
 > # Calculating percentiles
 > x.percentiles <- quantile(x, probs)
 > # Calculating percentile ranks
 > x.ranks <- ecdf(x)
 >
 > Best,
 > Mike
 >
 > __
 > R-help@stat.math.ethz.ch mailing list
 > https://stat.ethz.ch/mailman/listinfo/r-help
 > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Best,
Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
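Alternatively, the cumulative probabilities can be read off the ecdf object directly, since ecdf() returns a function; a minimal sketch using the artificial data above:

x <- round(rnorm(50, mean = 50, sd = 10))
Fn <- ecdf(x)           # the ecdf object is callable
ux <- sort(unique(x))   # the unique scores
data.frame(score = ux, cum.prob = Fn(ux))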
Re: [R] integer factorization
Hi Robin

There is a function factorize() in the package conf.design:

library(conf.design)
factorize(60)
[1] 2 2 3 5

Hope this helps,
Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
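If installing a package is not an option, trial division is easy to sketch in base R (the name factorize2 is made up for illustration):

## Return the prime factors of an integer n >= 2, smallest first.
factorize2 <- function(n) {
  stopifnot(length(n) == 1, n == round(n), n >= 2)
  out <- integer(0)
  d <- 2
  while (d * d <= n) {
    while (n %% d == 0) {
      out <- c(out, d)
      n <- n %/% d
    }
    d <- d + 1
  }
  if (n > 1) out <- c(out, n)  # whatever remains is prime
  out
}
factorize2(60)  # 2 2 3 5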
[R] different DF in package nlme and lme4
Hi all

I tried to reproduce an example with lme and used the Orthodont dataset.

library(nlme)
fm2a.1 <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1 | Subject)
anova(fm2a.1)
>             numDF denDF  F-value p-value
> (Intercept)     1    80 4123.156  <.0001
> age             1    80  114.838  <.0001
> Sex             1    25    9.292  0.0054

or alternatively (to get the same result):

fm2a.2 <- lme(distance ~ age + Sex, data = Orthodont, random = list(Subject = ~ 1))
anova(fm2a.2)
>             numDF denDF  F-value p-value
> (Intercept)     1    80 4123.156  <.0001
> age             1    80  114.838  <.0001
> Sex             1    25    9.292  0.0054

--- Then I restarted R (!) to use the lme4 package instead of nlme. ---

library(lme4)
fm2b.1 <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1 | Subject)
anova(fm2b.1)
> Analysis of Variance Table
>      Df  Sum Sq Mean Sq   Denom  F value    Pr(>F)
> age   1 235.356 235.356 105.000 114.8383 < 2.2e-16 ***
> Sex   1  19.044  19.044 105.000   9.2921  0.002912 **

or alternatively (to get the same result):

fm2b.2 <- lme(distance ~ age + Sex, data = Orthodont, random = list(Subject = ~ 1))
anova(fm2b.2)
> Analysis of Variance Table
>      Df  Sum Sq Mean Sq   Denom  F value    Pr(>F)
> age   1 235.356 235.356 105.000 114.8383 < 2.2e-16 ***
> Sex   1  19.044  19.044 105.000   9.2921  0.002912 **

I get different degrees of freedom for the denominator. Do I have to call lme differently in the lme4 package?

I use R 2.0.1 under Linux and

Package: nlme
Version: 3.1-53
Date: 2004-11-03

Package: lme4
Version: 0.6-11
Date: 2004-12-16

Thanks for your help.

Regards,
Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
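For reference, nlme's denominator DF can be reproduced by hand from the grouping structure (Pinheiro & Bates, 2000); the arithmetic below for the Orthodont data (108 observations on 27 subjects) is meant as a plausibility check, not a definition:

n <- 108; m <- 27  # observations and subjects in Orthodont
## Sex is constant within each subject, so it is tested at the
## subject level: 27 subjects - 1 (intercept) - 1 (Sex) = 25.
m - 1 - 1
## age varies within subjects, so it is tested at the observation
## level: 108 - 27 subjects - 1 (age) = 80.
n - m - 1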
RE: [R] Random intercept model with time-dependent covariates, results different from SAS
Answering a mail from:

From: Keith Wong
Date: Sun 04 Jul 2004 - 17:21:36 EST
Subject: [R] Random intercept model with time-dependent covariates, results different from SAS

Hi all

I've got a question about the degrees of freedom in a mixed model, calculated with lme from the lme4 package. Since I have no access to the original data of Keith Wong, I generated a data frame with the same structure (but balanced) by

> set.seed(1)
> dat <- data.frame(y = rnorm(50), group = rep(c("a","b"), each = 25),
                    time = factor(rep(1:5, 10)), w = rnorm(50),
                    z = rnorm(50), id = factor(rep(1:10, each = 5)))
> str(dat)

The subject id is nested in group. We have 5 repeated measures for each subject. Surely there is no meaningful interpretation of the results, but for demonstration it will do.

I fit the model without the covariates:

> library(nlme)
> options(contrasts = c("contr.SAS", "contr.poly"))
> g2 <- lme(y ~ time + group + time:group, data = dat, random = ~ 1 | id)
> anova(g2)
Analysis of Variance Table
            numDF denDF   F-value p-value
(Intercept)     1    32 0.6340507  0.4317
time            4    32 0.1103619  0.9780
group           1     8 0.2924309  0.6034
time:group      4    32 0.4614766  0.7634

I quit R, restart it, and now use the lme4 package:

> library(lme4)
> options(contrasts = c("contr.SAS", "contr.poly"))
> g2 <- lme(y ~ time + group + time:group, data = dat, random = ~ 1 | id)
> anova(g2)
Analysis of Variance Table
           Df Sum Sq Mean Sq  Denom F value Pr(>F)
time        4  0.351   0.088 40.000  0.1104 0.9782
group       1  0.233   0.233 40.000  0.2925 0.5916
time:group  4  1.468   0.367 40.000  0.4615 0.7635

I get different denominator degrees of freedom. How can I tell lme that id is nested in the fixed factor group, to get 8 as the denominator DF for group? In my example the difference is small (I generated the data randomly), but in other, more meaningful analyses the different DF can change the results.

I use R 2.0.1 under Linux.

Package: nlme
Version: 3.1-53

Package: lme4
Version: 0.6-11

In the original message the example was also calculated with SAS, and there was the problem of the degrees of freedom for group there, too, but the syntax wasn't correct. One must specify

> PROC MIXED;
>   CLASS id time group;
>   MODEL y = time group time*group / solution;
>   RANDOM int / subject = id(group);
> RUN;

It is important to tell SAS that the subject id is nested in group, and you can do this by using subject = id(group) instead of subject = id. Then you will get the correct degrees of freedom for the test of group:

Type 3 Tests of Fixed Effects
              Num   Den
Effect         DF    DF   F Value   Pr > F
time            4    32      0.11   0.9780
group           1     8      0.29   0.6033
time*group      4    32      0.46   0.7634

Thanks in advance for an answer. I'd like to know whether I must use the lme syntax differently with the lme4 package.

Regards,
Christoph Buser

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
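A quick check (a suggestion, not part of the original posting) that id really is nested in group in such a data frame:

## Each id should occur in exactly one group if id is nested in group.
stopifnot(all(rowSums(with(dat, table(id, group)) > 0) == 1))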
Re: [R] Friedman test for replicated blocked data
Hi Marco

 > I would need to extend the Friedman test to a replicated
 > design. Currently the function: friedman.test(y, ...) only works for
 > unreplicated designs.

You'll find something about this topic in:

Myles Hollander & Douglas A. Wolfe (1999), _Nonparametric Statistical Methods_, 2nd edition, John Wiley & Sons, Chapter 7.

In Chapter 7.9 there is an extension to the balanced replicated design. Notice also Comments 70 and 74 about the motivation of the test and the approximation of the test statistic, and Comment 77 for an extension to the general (unbalanced) case. They give further references for more details, too.

Hope this will help you,
Christoph

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-5414  fax: 632-1228
http://stat.ethz.ch/~buser/

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
[R] drop1 with contr.treatment
Dear R Core Team

I have a proposal to improve drop1(): the function should change the contrasts from the default ("treatment") to "sum". If you fit a model with an interaction (which is not significant) and display the main effects with

drop1( , scope = .~., test = "F")

the main-effect tests are not interpretable under the "treatment" contrasts. If you remove the interaction first, everything is okay. But there is no way to fit a model, leave a non-significant interaction in the model, and get interpretable main effects with the "treatment" contrasts.

One solution would be to change the contrasts automatically to "sum". Another possibility is to produce a warning when drop1() is used with the scope argument to get main effects in the presence of an interaction (even a non-significant one) under "treatment" contrasts.

An example:

library(MASS)
##- Data "genotype"
names(genotype)
##- > [1] "Litter" "Mother" "Wt"

##- to be sure the contrasts are "treatment"
options(contrasts = c("contr.treatment", "contr.poly"))

##- model with interaction
gen.int <- aov(Wt ~ Litter*Mother, data = genotype)
drop1(gen.int, scope = .~., test = "F")
##- Model:
##- Wt ~ Litter * Mother
##-               Df Sum of Sq    RSS   AIC F value   Pr(F)
##- <none>                     2440.8 257.0
##- Litter         3     591.7 3032.5 264.3  3.6362 0.01968 *
##- Mother         3     582.3 3023.1 264.1  3.5782 0.02099 *
##- Litter:Mother  9     824.1 3264.9 256.8  1.6881 0.12005
##- ---
##- Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

##- The interaction is not significant, Litter is significant,
##- but is that really true?

##- main effects model
gen.main <- aov(Wt ~ Litter + Mother, data = genotype)
drop1(gen.main, scope = .~., test = "F")
##- Model:
##- Wt ~ Litter + Mother
##-        Df Sum of Sq    RSS   AIC F value   Pr(F)
##- <none>              3264.9 256.8
##- Litter  3      63.6 3328.5 252.0  0.3508 0.78870
##- Mother  3     775.1 4040.0 263.8  4.2732 0.00886 **
##- ---
##- Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

## Dramatic change of the sum of squares for the factor Litter.

## With contrasts = "sum" the whole thing looks better:
options(contrasts = c("contr.sum", "contr.poly"))

##- model with interaction
gen.int1 <- aov(Wt ~ Litter*Mother, data = genotype)
drop1(gen.int1, scope = .~., test = "F")
##- Model:
##- Wt ~ Litter * Mother
##-               Df Sum of Sq    RSS   AIC F value   Pr(F)
##- <none>                     2440.8 257.0
##- Litter         3      27.7 2468.5 251.7  0.1700 0.91612
##- Mother         3     671.7 3112.6 265.9  4.1282 0.01142 *
##- Litter:Mother  9     824.1 3264.9 256.8  1.6881 0.12005
##- ---
##- Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

##- The interaction is not significant; Litter is NOT significant.

gen.main1 <- aov(Wt ~ Litter + Mother, data = genotype)
drop1(gen.main1, scope = .~., test = "F")
##- Model:
##- Wt ~ Litter + Mother
##-        Df Sum of Sq    RSS   AIC F value   Pr(F)
##- <none>              3264.9 256.8
##- Litter  3      63.6 3328.5 252.0  0.3508 0.78870
##- Mother  3     775.1 4040.0 263.8  4.2732 0.00886 **
##- ---
##- Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

##- Litter stays not significant.

Best regards,
Christoph Buser

--
Christoph Buser <[EMAIL PROTECTED]>
Seminar fuer Statistik, LEO D6
ETH (Federal Inst. Technology)  8092 Zurich  SWITZERLAND
phone: x-41-1-632-3505  fax: 632-1228
http://stat.ethz.ch/~buser/

__
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
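A related route, assuming the car package is available: its Anova() function computes type-II tests, which do not depend on the contrast coding and give main-effect tests in the presence of the interaction.

library(car)   # assumed to be installed
library(MASS)  # for the genotype data
gen.int <- aov(Wt ~ Litter * Mother, data = genotype)
## Type-II tests: each main effect adjusted for the other main effect,
## the interaction adjusted for both; invariant to the contrasts used.
Anova(gen.int, type = "II")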