Re: [R] hypergeometric vs fisher.test
Andrea Franceschini wrote: I ask the question also because I found this line in Wikipedia: The test (see above) based on the hypergeometric distribution (hypergeometric test) is identical to the corresponding one-tailed version of Fisher's exact test. Is this wrong ? May I kindly ask a friendly explanation for not-experts in statistics ? Thx a lot, You never said you were a non-expert. A question like that might very well have come from a student, or an incompetent claiming to have found a bug in fisher.test... The point was that Fisher's test takes the 2x2 table a b c d and interprets the situation under the null hypothesis as taking a sample of size a+c from an urn with a+b white balls and c+d black balls. Once you read the docs for phyper properly (it _is_ tricky to get it right), you arrive at phyper(16,17+181,449+19551,17+449, lower.tail=FALSE) [1] 3.693347e-06 -- Peter Dalgaard Center for Statistics, Copenhagen Business School Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Unable to retrieve residual sum of squares from nls output
Colleagues, I am using nls successfully (2.11.1, OS X) but I am having difficulties retrieving part of the output - residual sum of squares. I have assigned the output to FIT: FIT Nonlinear regression model model: NEWY ~ PMESOR + PAMPLITUDE * cos(2 * pi * (NEWX - POFFSET)/PERIOD) data: parent.frame() PMESOR PAMPLITUDEPOFFSET 1153.02 -1183.09 24.58 residual sum-of-squares: 1815056 Number of iterations to convergence: 8 Achieved convergence tolerance: 1.643e-08 As you can see, the object contains residual sum-of-squares: 1815056. However, I can't figure out how to grab that value. The tried using str: str(FIT) List of 6 $ m :List of 16 ..$ resid :function () ..$ fitted:function () ..$ formula :function () ..$ deviance :function () ..$ lhs :function () ..$ gradient :function () ..$ conv :function () ..$ incr :function () ..$ setVarying:function (vary = rep(TRUE, length(useParams))) ..$ setPars :function (newPars) ..$ getPars :function () ..$ getAllPars:function () ..$ getEnv:function () ..$ trace :function () ..$ Rmat :function () ..$ predict :function (newdata = list(), qr = FALSE) ..- attr(*, class)= chr nlsModel $ convInfo :List of 5 ..$ isConv : logi TRUE ..$ finIter: int 8 ..$ finTol : num 1.64e-08 ..$ stopCode : int 0 ..$ stopMessage: chr converged $ data : language parent.frame() $ call : language nls(formula = NEWY ~ PMESOR + PAMPLITUDE * cos(2 * pi * (NEWX - POFFSET)/PERIOD), start = list(PMESOR = MESOR, PAMPLITUDE = AMPLITUDE, ... $ dataClasses: Named chr numeric ..- attr(*, names)= chr NEWX $ control:List of 2 ..$ maxiter : num 1000 ..$ warnOnly: logi TRUE - attr(*, class)= chr nls However, none of these elements appears to contain the residual sum of squares. Help! Thanks. Dennis Dennis Fisher MD P (The P Less Than Company) Phone: 1-866-PLessThan (1-866-753-7784) Fax: 1-866-PLessThan (1-866-753-7784) www.PLessThan.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unable to retrieve residual sum of squares from nls output
On Fri, Aug 13, 2010 at 5:21 PM, Dennis Fisher fis...@plessthan.com wrote: Colleagues, I am using nls successfully (2.11.1, OS X) but I am having difficulties retrieving part of the output - residual sum of squares. I have assigned the output to FIT: FIT Nonlinear regression model model: NEWY ~ PMESOR + PAMPLITUDE * cos(2 * pi * (NEWX - POFFSET)/PERIOD) data: parent.frame() PMESOR PAMPLITUDE POFFSET 1153.02 -1183.09 24.58 residual sum-of-squares: 1815056 Number of iterations to convergence: 8 Achieved convergence tolerance: 1.643e-08 As you can see, the object contains residual sum-of-squares: 1815056. However, I can't figure out how to grab that value. The tried using str: str(FIT) List of 6 $ m :List of 16 ..$ resid :function () ..$ fitted :function () ..$ formula :function () ..$ deviance :function () ..$ lhs :function () ..$ gradient :function () ..$ conv :function () ..$ incr :function () ..$ setVarying:function (vary = rep(TRUE, length(useParams))) ..$ setPars :function (newPars) ..$ getPars :function () ..$ getAllPars:function () ..$ getEnv :function () ..$ trace :function () ..$ Rmat :function () ..$ predict :function (newdata = list(), qr = FALSE) ..- attr(*, class)= chr nlsModel $ convInfo :List of 5 ..$ isConv : logi TRUE ..$ finIter : int 8 ..$ finTol : num 1.64e-08 ..$ stopCode : int 0 ..$ stopMessage: chr converged $ data : language parent.frame() $ call : language nls(formula = NEWY ~ PMESOR + PAMPLITUDE * cos(2 * pi * (NEWX - POFFSET)/PERIOD), start = list(PMESOR = MESOR, PAMPLITUDE = AMPLITUDE, ... $ dataClasses: Named chr numeric ..- attr(*, names)= chr NEWX $ control :List of 2 ..$ maxiter : num 1000 ..$ warnOnly: logi TRUE - attr(*, class)= chr nls However, none of these elements appears to contain the residual sum of squares. Help! Thanks. Try: sum(resid(FIT)^2) or FIT$m$deviance() __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unable to retrieve residual sum of squares from nls output
Hi, Dennis, Does this give what you want? sum(resid(FIT)^2) Jun Senior Pharmacokineticist Seventh Wave Labs On Fri, Aug 13, 2010 at 4:21 PM, Dennis Fisher fis...@plessthan.com wrote: Colleagues, I am using nls successfully (2.11.1, OS X) but I am having difficulties retrieving part of the output - residual sum of squares. I have assigned the output to FIT: FIT Nonlinear regression model model: NEWY ~ PMESOR + PAMPLITUDE * cos(2 * pi * (NEWX - POFFSET)/PERIOD) data: parent.frame() PMESOR PAMPLITUDE POFFSET 1153.02 -1183.09 24.58 residual sum-of-squares: 1815056 Number of iterations to convergence: 8 Achieved convergence tolerance: 1.643e-08 As you can see, the object contains residual sum-of-squares: 1815056. However, I can't figure out how to grab that value. The tried using str: str(FIT) List of 6 $ m :List of 16 ..$ resid :function () ..$ fitted :function () ..$ formula :function () ..$ deviance :function () ..$ lhs :function () ..$ gradient :function () ..$ conv :function () ..$ incr :function () ..$ setVarying:function (vary = rep(TRUE, length(useParams))) ..$ setPars :function (newPars) ..$ getPars :function () ..$ getAllPars:function () ..$ getEnv :function () ..$ trace :function () ..$ Rmat :function () ..$ predict :function (newdata = list(), qr = FALSE) ..- attr(*, class)= chr nlsModel $ convInfo :List of 5 ..$ isConv : logi TRUE ..$ finIter : int 8 ..$ finTol : num 1.64e-08 ..$ stopCode : int 0 ..$ stopMessage: chr converged $ data : language parent.frame() $ call : language nls(formula = NEWY ~ PMESOR + PAMPLITUDE * cos(2 * pi * (NEWX - POFFSET)/PERIOD), start = list(PMESOR = MESOR, PAMPLITUDE = AMPLITUDE, ... $ dataClasses: Named chr numeric ..- attr(*, names)= chr NEWX $ control :List of 2 ..$ maxiter : num 1000 ..$ warnOnly: logi TRUE - attr(*, class)= chr nls However, none of these elements appears to contain the residual sum of squares. Help! Thanks. Dennis Dennis Fisher MD P (The P Less Than Company) Phone: 1-866-PLessThan (1-866-753-7784) Fax: 1-866-PLessThan (1-866-753-7784) www.PLessThan.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unable to retrieve residual sum of squares from nls output
On Fri, Aug 13, 2010 at 5:32 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote: On Fri, Aug 13, 2010 at 5:21 PM, Dennis Fisher fis...@plessthan.com wrote: Colleagues, I am using nls successfully (2.11.1, OS X) but I am having difficulties retrieving part of the output - residual sum of squares. I have assigned the output to FIT: FIT Nonlinear regression model model: NEWY ~ PMESOR + PAMPLITUDE * cos(2 * pi * (NEWX - POFFSET)/PERIOD) data: parent.frame() PMESOR PAMPLITUDE POFFSET 1153.02 -1183.09 24.58 residual sum-of-squares: 1815056 Number of iterations to convergence: 8 Achieved convergence tolerance: 1.643e-08 As you can see, the object contains residual sum-of-squares: 1815056. However, I can't figure out how to grab that value. The tried using str: str(FIT) List of 6 $ m :List of 16 ..$ resid :function () ..$ fitted :function () ..$ formula :function () ..$ deviance :function () ..$ lhs :function () ..$ gradient :function () ..$ conv :function () ..$ incr :function () ..$ setVarying:function (vary = rep(TRUE, length(useParams))) ..$ setPars :function (newPars) ..$ getPars :function () ..$ getAllPars:function () ..$ getEnv :function () ..$ trace :function () ..$ Rmat :function () ..$ predict :function (newdata = list(), qr = FALSE) ..- attr(*, class)= chr nlsModel $ convInfo :List of 5 ..$ isConv : logi TRUE ..$ finIter : int 8 ..$ finTol : num 1.64e-08 ..$ stopCode : int 0 ..$ stopMessage: chr converged $ data : language parent.frame() $ call : language nls(formula = NEWY ~ PMESOR + PAMPLITUDE * cos(2 * pi * (NEWX - POFFSET)/PERIOD), start = list(PMESOR = MESOR, PAMPLITUDE = AMPLITUDE, ... $ dataClasses: Named chr numeric ..- attr(*, names)= chr NEWX $ control :List of 2 ..$ maxiter : num 1000 ..$ warnOnly: logi TRUE - attr(*, class)= chr nls However, none of these elements appears to contain the residual sum of squares. Help! Thanks. Try: sum(resid(FIT)^2) or FIT$m$deviance() or deviance(FIT) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Bug in t.test?
Hello all, due to unexplained differences between statistical results from collaborators and our lab that arose in the same large proteomics dataset we reevaluated the t.test() function. Here, we found a weird behaviour that is also reproducible in the following small test dataset: Suppose, we have two vectors with numbers and some missing values that refer to the same individuals and that should therefore be evaluated with a paired t-test: testdata.A - c(1.15, -0.2, NA, 1, -2, -0.5, 0.1, 1.2, -1.4, 0.01); testdata.B - c(1.2, 1.1, 3, -0.1, 3, 1.1, 0, 1.3, 4, NA); Then print(t.test(testdata.A, testdata.B, paired=TRUE, alternative=two.sided, na.action=na.pass)) and print(t.test(testdata.A, testdata.B, paired=TRUE, alternative=two.sided, na.action=na.exclude)) deliver the same p value (0.1162, identical to Excel's result). However, after combining the two vectors with testdata - c(testdata.A, testdata.B); and defining a criterion vector with criterion - c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1); (that is the type of data layout we have in our proteomics project) we get a different p-value (0.01453) with print(t.test(testdata ~ criterion, paired=TRUE, alternative=two.sided, na.action=na.exclude)) . The statement print(t.test(testdata ~ criterion, paired=TRUE, alternative=two.sided, na.action=na.pass)) however, delivers a p-value of 0.1162 again. With print(t.test(testdata[criterion==0], testdata[criterion==1], paired=TRUE, alternative=two.sided, na.action=na.exclude)) that imitates the first form, we get again a p value of 0.1162. What is the reason for the different p values? Should not all calls to t.test that exlude missing values be equivalent and therefore deliver the same results? Excel, StatView and KaleidaGraph all display a p-value of 0.1162. J. W. D. -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Dr. Johannes W. Dietrich, M.D. -- Laboratory XU44, Endocrine Research -- Medical Hospital I, Bergmannsheil University Hospitals -- Ruhr University of Bochum -- Buerkle-de-la-Camp-Platz 1, D-44789 Bochum, NRW, Germany -- Phone: +49:234:302-6400, Fax: +49:234:302-6403 -- eMail: j.w.dietr...@medical-cybernetics.de -- WWW: http://medical-cybernetics.de -- WWW: http://www.bergmannsheil.de -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Lattice xyplots plots with multiple lines per cell
Hello, I need to plot the means of some outcome for two groups (control vs intervention) over time (discrete) on the same plot, for various subsets such as gender and grade level. What I have been doing is creating all possible subsets first, using the aggregate function to create the means over time, then plotting the means over time (as a simple line plot with both control intervention on one plot) for one subset. I then use par() and repeat this plot for each gender x grade level subset so they all appear on one page. This appears to me to be very similar to an xyplot, something like mean(outcome) ~ gender + gradelevel. However, I can't figure out how I could get both control and intervention lines in the same plot. Any suggestions? What i'm doing now -works-, but just seems to be the long way around. -Robin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Installing rJava fails on Gentoo (amd64) with Sun JDK - checking JNI data types... error
Installing rJava fails consistently, whether installed via the command line as below, or through install.packages( 'rJava' ), and whether 0.84 or 0.85 is used, with the message :- checking JNI data types... configure: error: One or more JNI types differ from the corresponding native type. You may need to use non-standard compiler flags or a different compiler in order to fix this. I've tried to fix it using suggestions from the r-help list, and the Gentoo java documentation, but had no joy so far. Other Java stuff works fine on this box. # R CMD INSTALL rJava_0.8-5.tar.gz * installing to library ‘/usr/lib64/R/library’ * installing *source* package ‘rJava’ ... checking for gcc... x86_64-pc-linux-gnu-gcc -std=gnu99 checking for C compiler default output file name... a.out checking whether the C compiler works... yes checking whether we are cross compiling... no checking for suffix of executables... checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether x86_64-pc-linux-gnu-gcc -std=gnu99 accepts -g... yes checking for x86_64-pc-linux-gnu-gcc -std=gnu99 option to accept ISO C89... none needed checking how to run the C preprocessor... x86_64-pc-linux-gnu-gcc -std=gnu99 -E checking for grep that handles long lines and -e... /bin/grep checking for egrep... /bin/grep -E checking for ANSI C header files... yes checking for sys/wait.h that is POSIX.1 compatible... yes checking for sys/types.h... yes checking for sys/stat.h... yes checking for stdlib.h... yes checking for string.h... yes checking for memory.h... yes checking for strings.h... yes checking for inttypes.h... yes checking for stdint.h... yes checking for unistd.h... yes checking for string.h... (cached) yes checking sys/time.h usability... yes checking sys/time.h presence... yes checking for sys/time.h... yes checking for unistd.h... (cached) yes checking for an ANSI C-conforming const... yes checking whether time.h and sys/time.h may both be included... yes configure: checking whether x86_64-pc-linux-gnu-gcc -std=gnu99 supports static inline... yes checking whether setjmp.h is POSIX.1 compatible... yes checking whether sigsetjmp is declared... yes checking whether siglongjmp is declared... yes checking Java support in R... present: interpreter : '/usr/bin/java' archiver: '/usr/bin/jar' compiler: '/home/astaines/.gentoo/java-config-2/current-user-vm/bin/javac' header prep.: '/usr/bin/javah' cpp flags : '-I/opt/sun-jdk-1.6.0.20/jre/../include -I/opt/sun-jdk-1.6.0.20/jre/../include/linux' java libs : '-L/opt/sun-jdk-1.6.0.20/jre/lib/amd64/server -L/opt/sun-jdk-1.6.0.20/jre/lib/amd64 -L/opt/sun-jdk-1.6.0.20/jre/../lib/amd64 -L -L/usr/java/packages/lib/amd64 -L/usr/lib64 -L/lib64 -L/lib -L/usr/lib -ljvm' checking whether JNI programs can be compiled... yes checking JNI data types... configure: error: One or more JNI types differ from the corresponding native type. You may need to use non-standard compiler flags or a different compiler in order to fix this. ERROR: configuration failed for package ‘rJava’ * removing ‘/usr/lib64/R/library/rJava’ # R --version R version 2.11.1 (2010-05-31) # java -showversion java version 1.6.0_20 Java(TM) SE Runtime Environment (build 1.6.0_20-b02) Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) The compiler mentioned in the R output :- /home/astaines/.gentoo/java-config-2/current-user-vm is a link pointing to /usr/lib/jvm/sun-jdk-1.6 which in turn is a link pointing to /opt/sun-jdk-1.6.0.20 however, any of the three will work. # R CMD javareconf *** JAVA_HOME is not a valid path, ignoring Java interpreter : /usr/bin/java Java version : 1.6.0_20 Java home path : /opt/sun-jdk-1.6.0.20/jre Java compiler: /home/astaines/.gentoo/java-config-2/current-user-vm/bin/javac Java headers gen.: /usr/bin/javah Java archive tool: /usr/bin/jar Java library path: $(JAVA_HOME)/lib/amd64/server:$(JAVA_HOME)/lib/amd64:$(JAVA_HOME)/../lib/amd64::/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib JNI linker flags : -L$(JAVA_HOME)/lib/amd64/server -L$(JAVA_HOME)/lib/amd64 -L$(JAVA_HOME)/../lib/amd64 -L -L/usr/java/packages/lib/amd64 -L/usr/lib64 -L/lib64 -L/lib -L/usr/lib -ljvm JNI cpp flags: -I$(JAVA_HOME)/../include -I$(JAVA_HOME)/../include/linux Updating Java configuration in /usr/lib64/R Done. JAVA_HOME is unset by default on my system. As suggested, by Godmar Back, on the list I've tried setting it. The Gentoo way is to use JAVA_HOME=$(java-config --jdk-home) and this produces what I think is the desired result #printenv | grep -i JAVA_HOME JAVA_HOME=/opt/sun-jdk-1.6.0.20 BUT # env JAVA_HOME=/opt/sun-jdk-1.6.0.20/ R CMD javareconf *** JAVA_HOME is not a valid path, ignoring Java interpreter : /usr/bin/java Java version : 1.6.0_20 Java home path : /opt/sun-jdk-1.6.0.20/jre Java compiler: /home/astaines/.gentoo/java-config-2/current-user-vm/bin/javac Java headers gen.:
Re: [R] Bug in t.test?
Thanks for the clear example. However, if there is a bug it is only that t.test.formula doesn't throw an error when given the paired=TRUE option. The documentation says The formula interface is only applicable for the 2-sample tests., but there probably should be an explicit check -- I didn't see that in the documentation until I realized that t.test.formula couldn't work for paired tests because you don't tell it which observations are paired. -thomas On Fri, 13 Aug 2010, Johannes W. Dietrich wrote: Hello all, due to unexplained differences between statistical results from collaborators and our lab that arose in the same large proteomics dataset we reevaluated the t.test() function. Here, we found a weird behaviour that is also reproducible in the following small test dataset: Suppose, we have two vectors with numbers and some missing values that refer to the same individuals and that should therefore be evaluated with a paired t-test: testdata.A - c(1.15, -0.2, NA, 1, -2, -0.5, 0.1, 1.2, -1.4, 0.01); testdata.B - c(1.2, 1.1, 3, -0.1, 3, 1.1, 0, 1.3, 4, NA); Then print(t.test(testdata.A, testdata.B, paired=TRUE, alternative=two.sided, na.action=na.pass)) and print(t.test(testdata.A, testdata.B, paired=TRUE, alternative=two.sided, na.action=na.exclude)) deliver the same p value (0.1162, identical to Excel's result). However, after combining the two vectors with testdata - c(testdata.A, testdata.B); and defining a criterion vector with criterion - c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1); (that is the type of data layout we have in our proteomics project) we get a different p-value (0.01453) with print(t.test(testdata ~ criterion, paired=TRUE, alternative=two.sided, na.action=na.exclude)) . The statement print(t.test(testdata ~ criterion, paired=TRUE, alternative=two.sided, na.action=na.pass)) however, delivers a p-value of 0.1162 again. With print(t.test(testdata[criterion==0], testdata[criterion==1], paired=TRUE, alternative=two.sided, na.action=na.exclude)) that imitates the first form, we get again a p value of 0.1162. What is the reason for the different p values? Should not all calls to t.test that exlude missing values be equivalent and therefore deliver the same results? Excel, StatView and KaleidaGraph all display a p-value of 0.1162. J. W. D. -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Dr. Johannes W. Dietrich, M.D. -- Laboratory XU44, Endocrine Research -- Medical Hospital I, Bergmannsheil University Hospitals -- Ruhr University of Bochum -- Buerkle-de-la-Camp-Platz 1, D-44789 Bochum, NRW, Germany -- Phone: +49:234:302-6400, Fax: +49:234:302-6403 -- eMail: j.w.dietr...@medical-cybernetics.de -- WWW: http://medical-cybernetics.de -- WWW: http://www.bergmannsheil.de -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Thomas Lumley Professor of Biostatistics University of Washington, Seattle __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Bug in t.test?
Thank you for the fast reply! Although I have read the help page for t.test over and over again I have obviously overlooked the relevant sentence. The workaround that I have planned seems to be the correct use. Thanks again, J. W. D. At 15:31 Uhr -0700 13.08.2010, Thomas Lumley wrote: Thanks for the clear example. However, if there is a bug it is only that t.test.formula doesn't throw an error when given the paired=TRUE option. The documentation says The formula interface is only applicable for the 2-sample tests., but there probably should be an explicit check -- I didn't see that in the documentation until I realized that t.test.formula couldn't work for paired tests because you don't tell it which observations are paired. -thomas On Fri, 13 Aug 2010, Johannes W. Dietrich wrote: Hello all, due to unexplained differences between statistical results from collaborators and our lab that arose in the same large proteomics dataset we reevaluated the t.test() function. Here, we found a weird behaviour that is also reproducible in the following small test dataset: Suppose, we have two vectors with numbers and some missing values that refer to the same individuals and that should therefore be evaluated with a paired t-test: testdata.A - c(1.15, -0.2, NA, 1, -2, -0.5, 0.1, 1.2, -1.4, 0.01); testdata.B - c(1.2, 1.1, 3, -0.1, 3, 1.1, 0, 1.3, 4, NA); Then print(t.test(testdata.A, testdata.B, paired=TRUE, alternative=two.sided, na.action=na.pass)) and print(t.test(testdata.A, testdata.B, paired=TRUE, alternative=two.sided, na.action=na.exclude)) deliver the same p value (0.1162, identical to Excel's result). However, after combining the two vectors with testdata - c(testdata.A, testdata.B); and defining a criterion vector with criterion - c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1); (that is the type of data layout we have in our proteomics project) we get a different p-value (0.01453) with print(t.test(testdata ~ criterion, paired=TRUE, alternative=two.sided, na.action=na.exclude)) . The statement print(t.test(testdata ~ criterion, paired=TRUE, alternative=two.sided, na.action=na.pass)) however, delivers a p-value of 0.1162 again. With print(t.test(testdata[criterion==0], testdata[criterion==1], paired=TRUE, alternative=two.sided, na.action=na.exclude)) that imitates the first form, we get again a p value of 0.1162. What is the reason for the different p values? Should not all calls to t.test that exlude missing values be equivalent and therefore deliver the same results? Excel, StatView and KaleidaGraph all display a p-value of 0.1162. J. W. D. -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Dr. Johannes W. Dietrich, M.D. -- Laboratory XU44, Endocrine Research -- Medical Hospital I, Bergmannsheil University Hospitals -- Ruhr University of Bochum -- Buerkle-de-la-Camp-Platz 1, D-44789 Bochum, NRW, Germany -- Phone: +49:234:302-6400, Fax: +49:234:302-6403 -- eMail: j.w.dietr...@medical-cybernetics.de -- WWW: http://medical-cybernetics.de -- WWW: http://www.bergmannsheil.de -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Thomas Lumley Professor of Biostatistics University of Washington, Seattle -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Dr. Johannes W. Dietrich, M.D. -- Laboratory XU44, Endocrine Research -- Medical Hospital I, Bergmannsheil University Hospitals -- Ruhr University of Bochum -- Buerkle-de-la-Camp-Platz 1, D-44789 Bochum, NRW, Germany -- Phone: +49:234:302-6400, Fax: +49:234:302-6403 -- eMail: j.w.dietr...@medical-cybernetics.de -- WWW: http://medical-cybernetics.de -- WWW: http://www.bergmannsheil.de -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] merge function in R?
I think it would be helpful if you could clarify youre question - do you want distinct sets - maybe use unique() but why (5,20) when its (5,10) in the row in youre example? What criteria do you want the function to select the sets by and what kind of output do you need? Maybe it's just me who dosn't get the question..sr -- View this message in context: http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324844.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] merge function in R?
I too think I worded it incorrectly... so the second two columns of the matrix are the start and end of an interval however, because some of the intervals overlap, I want to limit the number of intervals I have to deal with. So therefore, (5 10)should merge with(7 18) making(5 18) and then (518) should merge with (1620) giving (520) whereas (1 4) has no overlap with any other interval and is therefore left on its own Ideal output would just be a collapsing of the matrix sample start end # 5 20 # 14 I got this to work using unique(c(5:10,7:18,16:20,1:4)) which gives me a c(1:4,5:20) However, I have to do this on a very large dataset and the numbers are more like c(100542:100782,598322:598821,...) any help would be appreciated thanks -- View this message in context: http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324855.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] decision tree finetune
I figured it out myself, here it is: control=rpart.control(cp=.001)) Thank you! On Fri, Aug 13, 2010 at 12:58 PM, Olga Shaganova olga.sha...@gmail.comwrote: My decision tree grows only with one split and based on what I see in E-Miner it should split on more variables. How can I adjust splitting criteria in R? Also is there way to indicate that some variables are binary, like variable Info_G is binary so in the results would be nice to see 2) Info_G=0 instead of 2) Info_G0.5. Thank you in advance! And thanks for Eric who helped with my previous question about starting rpart. Olga fit - rpart(Retention ~ Info_G+AOPD+Mail+Xref_Umbr+Ins_Age+Discount+Xref_A + Con6 + + Con5 +Con4 + + Con3 + Con2 + + Con1 , data=Home,control=rpart.control(minsplit=5)) fit n= 48407 node), split, n, deviance, yval * denotes terminal node 1) root 48407 4730.642 0.8902225 2) Info_G 0.5 14280 1999.293 0.8316527 * 3) Info_G=0.5 34127 2661.865 0.9147303 * [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] merge function in R?
Neither you nor your responder have continued the eamil chain very well so let me put things back together: on Aug 13, 2010; 03:54pm fishkbob wrote subj = merge function in R? So I have a bunch of c(start,end) points and want to consolidate them into as few c(start,end) as possible. For example: sample startend A 5 10 B 7 18 C 14 D 16 20 I'd want the function to return the two distinct sets (1,4) and (5,20) Is there an R function that already does this? or should I write my own? (how would I go about that?) In an effort to be be helpful but not copying the prior message on Aug 13, 2010; 06:46pm JesperHybel wrote: I think it would be helpful if you could clarify youre question - do you want distinct sets - maybe use unique() but why (5,20) when its (5,10) in the row in youre example? What criteria do you want the function to select the sets by and what kind of output do you need? Maybe it's just me who dosn't get the question..sr On Aug 13, 2010, at 7:01 PM, fishkbob wrote: I too think I worded it incorrectly... so the second two columns of the matrix are the start and end of an interval however, because some of the intervals overlap, I want to limit the number of intervals I have to deal with. So therefore, (5 10)should merge with(7 18) making(5 18) and then (518) should merge with (1620) giving (520) whereas (1 4) has no overlap with any other interval and is therefore left on its own Ideal output would just be a collapsing of the matrix sample start end # 5 20 # 14 I got this to work using unique(c(5:10,7:18,16:20,1:4)) which gives me a c(1:4,5:20) However, I have to do this on a very large dataset and the numbers are more like c(100542:100782,598322:598821,...) any help would be appreciated thanks -- View this message in context: http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324855.html Sent from the R help mailing list archive at Nabble.com. Nabble is where I saw all of this, but Nabble is not r-help: I suggest you sort your rows by the start variable and then examine where the breaks would remain by looking at the prior values of end: dd - rd.txt(sample startend + A 5 10 + B 7 18 + C 14 + D 16 20) dd[order(dd$start), ] sample start end 3 C 1 4 1 A 5 10 2 B 7 18 4 D16 20 ndd - dd[order(dd$start), ] ndd$inprior - c(NA, ndd[1:nrow(ndd)-1,3] = ndd[2:nrow(ndd),2] ) ndd sample start end inprior 3 C 1 4 NA 1 A 5 10 FALSE 2 B 7 18TRUE 4 D16 20TRUE -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Kalman filter
Other terms for Kalman filtering, prediction and smoothing are state space modeling and dynamic linear models. Consider the following extension of Ben Bolker's suggestion to use the 'sos' package: library(sos) k - ???Kalman ss - findFn('state space') dlm - findFn('dynamic linear model') dlms - findFn('dynamic linear models') # NOT necessarily a subset of dlm ksd - (k | ss | dlm | dlms) # union of the above 4 installPackages(ksd) # Install the primary packages to get addition info for writeFindFn2xls writeFindFn2xls(ksd) getwd() # directory containing ksd.xls The first sheet of ksd.xls provides a one-line summary of the packages including date last updated, any vignettes, etc. This file says that ksd found 31 matches in the dse package by Paul Gilbert, who has been a major contributor in this area for many years, using his own code at the Bank of Canada. I have not used the second and third packages, expsmooth and MARSS, so I can't speak about them. However, the next package is dlm, for which a companion book appeared just over a year ago; I've found it very powerful and relatively easy to use. If your response variables are something other than the a linear combination of components of the state vector plus normal noise, then I suggest you consider the sspir package, which is number 6 on this list in terms of the number of help pages matching the search terms. Hope this helps. Spencer Graves On 8/13/2010 2:06 PM, Bruno Mourato wrote: Take a look on this Packages: - KFTRACK - UKFSST - TRACKIT 2010/8/13 FMHkagba2...@yahoo.com Dear All, Could anyone give me a hand to suggest few packages in R to running Kalman prediction and filtration ? Thanks Fir __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Spencer Graves, PE, PhD President and Chief Operating Officer Structure Inspection and Monitoring, Inc. 751 Emerson Ct. San José, CA 95126 ph: 408-655-4567 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Bug in t.test?
Hi Thomas, I'm not too sure about your interpretation. Consider: set.seed(54321) X - rnorm(10) ; Y - rnorm(10) XY - c(X,Y); group-c(rep(0,10),rep(1,10)) t.test(X,Y,paired=TRUE) # Paired t-test # data: X and Y # t = -1.5265, df = 9, p-value = 0.1612 # 95 percent confidence interval: # -2.1521562 0.4178885 # sample estimates: mean of the differences # -0.8671339 t.test(XY~group,paired=TRUE) # Paired t-test # data: XY by group # t = -1.5265, df = 9, p-value = 0.1612 # 95 percent confidence interval: # -2.1521562 0.4178885 # sample estimates: mean of the differences # -0.8671339 t.test(X,Y,paired=FALSE) # Welch Two Sample t-test # data: X and Y # t = -1.4865, df = 17.956, p-value = 0.1545 # 95 percent confidence interval: # -2.0928836 0.3586159 # sample estimates: mean of x mean of y # -0.2646042 0.6025297 t.test(XY~group,paired=FALSE) # Welch Two Sample t-test # data: XY by group # t = -1.4865, df = 17.956, p-value = 0.1545 # 95 percent confidence interval: # -2.0928836 0.3586159 # sample estimates: mean in group 0 mean in group 1 # -0.2646042 0.6025297 So in each case t.test(XY~group,...) has done the same as the corresponding t.test(X,Y,...); thus it would not seem that The formula interface is only applicable for the 2-sample tests excludes specifying the two samples by a formula with a 2-level factor; and it would seem that the pairing has been correctly carried out (i.e. in the formula case t.test assumes that it is by sequential position within the respective group, without having to be told as you suggest). In other words, it acts as though giving a formula with a 2-group factor on the right, and equal numbers in the two groups, is equivalent to giving the two samples separately. Johannes' original query was about differences when there are NAs, corresponding to different settings of na.action. It is perhaps possible that 'na.action=na.pass' and 'na.action=na.exclude' result in different pairings in the case paired=TRUE. However, it seems to me that the differences he observed are, shall we say, obscure! Ted. On 13-Aug-10 22:31:45, Thomas Lumley wrote: Thanks for the clear example. However, if there is a bug it is only that t.test.formula doesn't throw an error when given the paired=TRUE option. The documentation says The formula interface is only applicable for the 2-sample tests., but there probably should be an explicit check -- I didn't see that in the documentation until I realized that t.test.formula couldn't work for paired tests because you don't tell it which observations are paired. -thomas On Fri, 13 Aug 2010, Johannes W. Dietrich wrote: Hello all, due to unexplained differences between statistical results from collaborators and our lab that arose in the same large proteomics dataset we reevaluated the t.test() function. Here, we found a weird behaviour that is also reproducible in the following small test dataset: Suppose, we have two vectors with numbers and some missing values that refer to the same individuals and that should therefore be evaluated with a paired t-test: testdata.A - c(1.15, -0.2, NA, 1, -2, -0.5, 0.1, 1.2, -1.4, 0.01); testdata.B - c(1.2, 1.1, 3, -0.1, 3, 1.1, 0, 1.3, 4, NA); Then print(t.test(testdata.A, testdata.B, paired=TRUE, alternative=two.sided, na.action=na.pass)) and print(t.test(testdata.A, testdata.B, paired=TRUE, alternative=two.sided, na.action=na.exclude)) deliver the same p value (0.1162, identical to Excel's result). However, after combining the two vectors with testdata - c(testdata.A, testdata.B); and defining a criterion vector with criterion - c(0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1); (that is the type of data layout we have in our proteomics project) we get a different p-value (0.01453) with print(t.test(testdata ~ criterion, paired=TRUE, alternative=two.sided, na.action=na.exclude)) . The statement print(t.test(testdata ~ criterion, paired=TRUE, alternative=two.sided, na.action=na.pass)) however, delivers a p-value of 0.1162 again. With print(t.test(testdata[criterion==0], testdata[criterion==1], paired=TRUE, alternative=two.sided, na.action=na.exclude)) that imitates the first form, we get again a p value of 0.1162. What is the reason for the different p values? Should not all calls to t.test that exlude missing values be equivalent and therefore deliver the same results? Excel, StatView and KaleidaGraph all display a p-value of 0.1162. J. W. D. -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- Dr. Johannes W. Dietrich, M.D. -- Laboratory XU44, Endocrine Research -- Medical Hospital I, Bergmannsheil University Hospitals -- Ruhr University of Bochum -- Buerkle-de-la-Camp-Platz 1,
Re: [R] Bug in t.test?
(Ted Harding) wrote: Johannes' original query was about differences when there are NAs, corresponding to different settings of na.action. It is perhaps possible that 'na.action=na.pass' and 'na.action=na.exclude' result in different pairings in the case paired=TRUE. However, it seems to me that the differences he observed are, shall we say, obscure! Did you miss the point that the incorrect behaviour with na.exclude is the default one? (Actually, na.omit is, but the effect is the same). Arguably, t.test could just hardcode na.action=na.pass and begone with it. -- Peter Dalgaard Center for Statistics, Copenhagen Business School Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to add lines to lattice plot produced by rms::bplot
I have a plot produced by function bplot (package = rms) that is really a lattice plot (class=trellis). It is similar to this plot produced by a very minor modification of the first example on the bplot help page: requiere(rms) n - 1000# define sample size set.seed(17) # so can reproduce the results age- rnorm(n, 50, 10) blood.pressure - rnorm(n, 120, 15) cholesterol- rnorm(n, 200, 25) sex- factor(sample(c('female','male'), n,TRUE)) label(age)- 'Age' # label is in Hmisc label(cholesterol)- 'Total Cholesterol' label(blood.pressure) - 'Systolic Blood Pressure' label(sex)- 'Sex' units(cholesterol)- 'mg/dl' # uses units.default in Hmisc units(blood.pressure) - 'mmHg' # Specify population model for log odds that Y=1 L - .4*(sex=='male') + .045*(age-50) + (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male')) # Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)] y - ifelse(runif(n) plogis(L), 1, 0) ddist - datadist(age, blood.pressure, cholesterol, sex) options(datadist='ddist') fit - lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)), x=TRUE, y=TRUE) p - Predict(fit, age, cholesterol, sex='male', np=50) # vary sex last bp.plot - bplot(p, lfun=contourplot) bp.plot I have tried a variety of efforts at using update (which I assume is a lattice function although I can find no help page for it. It does appear in some of the lattice hep pages and my understanding is that it pushes objects onto the list structure of a plot object. I've also tried adding to it with llines() #- Oh, never mind. I recovered a memory that I had seen a solution on rhelp and had saved it. Turns out it was from Peter Ehlers, to whom I offer thanks. I was trying to add a step function: ht and weight from a dataframe, bld: trellis.focus(panel, 1, 1) panel.lines(x=bld$inches, y=bld$BMI28, type='s') trellis.unfocus() Success! Now... how do I control the color levels in levelplot or contourplot??? David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Odp: help
Please,when i export the output from R to excel.I am not getting all the 15,000 observations but only 2000.Thank you -- View this message in context: http://r.789695.n4.nabble.com/help-tp2323542p2324459.html Sent from the Export many data to Excel 2007 mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R execution
So I have a for loop for (i in 1 : 5) { print(i) } and I want it to print 1, 2, 3, 4, 5 sequentially for each loop iteration. Instead, it waits for the whole loop to finish and then prints. It's strange because my R console seems to have two modes of execution. One where it will iterate for each loop and print the message (and where I can easily abort an R program with the escape key), and another mode where it doesn't print for each loop iteration/just keeps executing even if I press escape key. I can't figure out how to enable the first way of execution! Can someone shed some light on this matter? -- View this message in context: http://r.789695.n4.nabble.com/R-execution-tp2324852p2324852.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to eliminate an element of a list
list-seq(2,10,2) list [1] 2 4 6 8 10 list[-which(2==list)] [1] 4 6 8 10 using the which() will let you remove things from a list that have a specified value... I usually use the blah- blah[-which(TRUE==is.na(blah)) ] which will remove all NA values in your list -- View this message in context: http://r.789695.n4.nabble.com/how-to-eliminate-an-element-of-a-list-tp2323329p2324696.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] hypergeometric vs fisher.test
or an incompetent Such harsh words Peter. You're not making this a friendly environment for people to ask questions. Is there a less competent attribute mailing list so that some of us don't offend you with our questions? On Aug 13, 2010, at 2:11 PM, Peter Dalgaard wrote: Andrea Franceschini wrote: I ask the question also because I found this line in Wikipedia: The test (see above) based on the hypergeometric distribution (hypergeometric test) is identical to the corresponding one-tailed version of Fisher's exact test. Is this wrong ? May I kindly ask a friendly explanation for not-experts in statistics ? Thx a lot, You never said you were a non-expert. A question like that might very well have come from a student, or an incompetent claiming to have found a bug in fisher.test... The point was that Fisher's test takes the 2x2 table a b c d and interprets the situation under the null hypothesis as taking a sample of size a+c from an urn with a+b white balls and c+d black balls. Once you read the docs for phyper properly (it _is_ tricky to get it right), you arrive at phyper(16,17+181,449+19551,17+449, lower.tail=FALSE) [1] 3.693347e-06 -- Peter Dalgaard Center for Statistics, Copenhagen Business School Phone: (+45)38153501 Email: pd@cbs.dk Priv: pda...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] merge function in R?
So I have a bunch of c(start,end) points and want to consolidate them into as few c(start,end) as possible. For example: sample startend A 5 10 B 7 18 C 14 D 16 20 I'd want the function to return the two distinct sets (1,4) and (5,20) Is there an R function that already does this? or should I write my own? (how would I go about that?) -- View this message in context: http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324684.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to eliminate an element of a list
Try to do this list - seq(1,5,1) list-list[-3] list [1] 1 2 4 5 list[-x] will remove the xth value in your list. Also you can do something like list[-c(1:4)] [1] 5 To remove values at indexes 1-4 list[-c(1,4)] [1] 2 3 5 To remove values at indexes 1 and 4 -- View this message in context: http://r.789695.n4.nabble.com/how-to-eliminate-an-element-of-a-list-tp2323329p2324691.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Learning ANOVA
- Original Message From: Liaw, Andy andy_l...@merck.com To: Stephen Liu sati...@yahoo.com; r-help@r-project.org Sent: Sat, August 14, 2010 12:48:33 AM Subject: RE: [R] Learning ANOVA Note the names of the variables here. They don't match what you tried to use in your boxplot() call below. Where did you get the idea that there are DO and Stream in the test01 data frame? Hi Andy, I followed following video to proceed; ANOVA in R http://www.youtube.com/watch?v=Dwd3ha0P8uwfeature=related I fixed it with; boxplot(test01$count ~ test01$spray) Continued: InsectSprays.aov -(test01$count ~ test01$spray) summary(InsectSprays.aov) Length ClassMode 3 formulacall Seems having problem here. TukeyHSD(InsectSprays.aov) Error in UseMethod(TukeyHSD) : no applicable method for 'TukeyHSD' applied to an object of class formula I was still stuck here. B.R. Stephen L - Original Message From: Liaw, Andy andy_l...@merck.com To: Stephen Liu sati...@yahoo.com; r-help@r-project.org Sent: Sat, August 14, 2010 12:48:33 AM Subject: RE: [R] Learning ANOVA From: Stephen Liu Hi folks, R on Ubuntu 10.04 64 bit. Performed following steps on R:- ### to access to the object data(InsectSprays) ### create a .csv file write.csv(InsectSprays, InsectSpraysCopy.csv) On another terminal $ sudo updatedb $ locate InsectSpraysCopy.csv /home/userA/InsectSpraysCopy.csv ### Read in some data test01 - read.csv(file.choose(), header=TRUE) Enter file name: /home/userA/InsectSpraysCopy.csv ### Look at the data test01 X count spray 1 110 A [snipped] Note the names of the variables here. They don't match what you tried to use in your boxplot() call below. Where did you get the idea that there are DO and Stream in the test01 data frame? Andy ### Create a side-by-side boxplot of the data boxplot(test01$DO ~ test01$Stream) Error in model.frame.default(formula = test01$DO ~ test01$Stream) : invalid type (NULL) for variable 'test01$DO' I was stucked here. Pls help. TIA B.R. Stephen L - Original Message From: Stephen Liu sati...@yahoo.com To: r-help@r-project.org Sent: Fri, August 13, 2010 11:34:31 AM Subject: [R] Learning ANOVA Hi folks, File to be used is on; data(InsectSprays) I can't figure out where to insert it on following command; test01 - read.csv(fil.choose(), header=TRUE) Please help. TIA B.R. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Notice: This e-mail message, together with any attachments, contains information of Merck Co., Inc. (One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates Direct contact information for affiliates is available at http://www.merck.com/contact/contacts.html) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Polygon Graph in lattice/ lattice extra
OK, I've added a 'horizontal' argument to panel.xyarea(), which is consistent with the meaning in panel.xyplot(). It is available from R-forge via SVN now, or as a built package within a day or 2 probably. Questions like this are best sent to package maintainers directly, I think. Regards # Felix On 13 August 2010 22:59, Sebastian wurster.sebast...@googlemail.com wrote: Hi, I'd like to draw a polygon graph. I used the package lattice extra which includes a function for that (see: http://latticeextra.r-forge.r-project.org/#panel.xyareatheme=default). But i don't want the polygon ending with its filled border at the x-axis (like in my code). Instead it should be rotated about 90 degrees and end with its filled border at the y-axis. I'm pleased about any suggestions or ideas. Thanks, Sebastian Here's the code: library(latticeExtra) data - as.data.frame(cbind(c(1,2,3,4,1,2,3,4),c(1,1,1,1,2,2,2,2),c(2,3.8,3,3.5,2.2,4,3.2,3.5))) x - data$V1 y - data$V3 groups - data$V2 xyplot(y ~ x, data, groups = groups, type='l', par.settings = simpleTheme(col = c(grey, rgb(166,27,30,maxColorValue = 255)), lwd = c(5,2)), superpose = TRUE, panel = panel.superpose, panel.groups = function(..., group.number) { if (group.number == 1) panel.xyarea(...) else panel.xyplot(...) panel.grid(h=FALSE, v=-1, col.line=grey) }, auto.key=list(columns=2, space=bottom, cex=0.8, size=1.3, adj=1, between=0.2, between.colums=0.1, points = FALSE, rectangles = TRUE)) -- View this message in context: http://r.789695.n4.nabble.com/Polygon-Graph-in-lattice-lattice-extra-tp2324156p2324156.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Felix Andrews / 安福立 http://www.neurofractal.org/felix/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] detecting a key press
Hi Folks, I'm relatively new to r. I'd like to have a user respond by pressing a 1 or a 2 and determining their choice and the time of response. Previous postings have indicated that keyboard responses can be processed using scan and readline but both seem to wait for the user to also press return. Is there a way to detect the initial key press without requiring them to hit return? Thank you so much. Martin H. Teicher Department of Psychiatry McLean Hospital / Harvard Medical School Belmont, MA USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] different outcomes of P values in SPSS and R
Leo Vorthoren L.Vorthoren at nioo.knaw.nl writes: I have been using generalized linear models in SPSS 18, in order to build models and to calculate the P values. When I was building models in Excel (using the intercept and Bs from SPSS), I noticed that the graphs differed from my expectations. When I ran the dataset again in R, I got totally different outcomes for both the P values as well as the Bs and the intercepts. The outcomes of R seem much more likely to be the correct ones, but I really cannot explain the differences. I appreciate/assume that you're asking on the off chance that someone else has tried something very similar and gone to the trouble of figuring out the differences between R's and SPSS's default setup, but you're unlikely to get an answer without more detailed information. My best guess is that SPSS and R are using different contrasts and/or different baseline levels. R uses treatment contrasts by default, and assumes that the first (alphabetical) level of a factor is the baseline level. It's conceivable that you have a dataset where the results are numerically unstable and sensitive to small details in the algorithms used. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] what is scaling in plot
Dear List, Pls kindly advise what scaling is in plot. Sometime it could be negative but sometimes it might be positive .(I guess it is the proportion between the plot and the margin) Thanks Elaine [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] discerning species by color in cca biplot
Dear List, I am running constrained correspondence analysis for abundance data of 7 birds. However, I would like to check which bird prefers which environment gradient by showing the species with different colors of the dots in cca plot (package vegan). Please kindly help and thank you Elaine [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] detecting a key press
Martin Teicher martin_teicher at hms.harvard.edu writes: I'm relatively new to r. I'd like to have a user respond by pressing a 1 or a 2 and determining their choice and the time of response. Previous postings have indicated that keyboard responses can be processed using scan and readline but both seem to wait for the user to also press return. Is there a way to detect the initial key press without requiring them to hit return? Probably not -- probably depends a lot on platform and interface too (i.e. in a command-line interface, interaction with the keyboard is likely to be handled by the OS rather than by R itself). If you're on Windows you could look at ?getGraphicsEvent (which will see keyboard responses, not from the console, but from the graphics window). Alternatively, you might be able to put something together with the tcltk package ... __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] what is scaling in plot
elaine kuo elaine.kuo.tw at gmail.com writes: Pls kindly advise what scaling is in plot. Sometime it could be negative but sometimes it might be positive .(I guess it is the proportion between the plot and the margin) Your question is unclear. Please give more context and/or details. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] different outcomes of P values in SPSS and R
R usesType I sequential SS, not the default Type III marginal SS reported by SPSS. There is a good blog post explaining this difference along with some interesting comments -- http://myowelt.blogspot.com/2008/05/obtaining-same-anova-results-in-r-as-in.html Best Wishes, Martin H. Teicher Dept of Psychiatry McLean Hospital / Harvard Medical School Belmont MA 02478 On Aug 13, 2010, at 10:32 PM, Ben Bolker wrote: Leo Vorthoren L.Vorthoren at nioo.knaw.nl writes: I have been using generalized linear models in SPSS 18, in order to build models and to calculate the P values. When I was building models in Excel (using the intercept and Bs from SPSS), I noticed that the graphs differed from my expectations. When I ran the dataset again in R, I got totally different outcomes for both the P values as well as the Bs and the intercepts. The outcomes of R seem much more likely to be the correct ones, but I really cannot explain the differences. I appreciate/assume that you're asking on the off chance that someone else has tried something very similar and gone to the trouble of figuring out the differences between R's and SPSS's default setup, but you're unlikely to get an answer without more detailed information. My best guess is that SPSS and R are using different contrasts and/or different baseline levels. R uses treatment contrasts by default, and assumes that the first (alphabetical) level of a factor is the baseline level. It's conceivable that you have a dataset where the results are numerically unstable and sensitive to small details in the algorithms used. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] different outcomes of P values in SPSS and R
Yes, but ... the original poster said the coefficients differed too. (The blog post you refer to deals with ANOVA (i.e. linear models) rather than GLMs (generalized linear models): it is true that the sequential/marginal distinction still applies, but I don't think that can be the *only* thing going on here.) On Fri, Aug 13, 2010 at 10:50 PM, Martin Teicher martin_teic...@hms.harvard.edu wrote: R usesType I sequential SS, not the default Type III marginal SS reported by SPSS. There is a good blog post explaining this difference along with some interesting comments -- http://myowelt.blogspot.com/2008/05/obtaining-same-anova-results-in-r-as-in.html Best Wishes, Martin H. Teicher Dept of Psychiatry McLean Hospital / Harvard Medical School Belmont MA 02478 On Aug 13, 2010, at 10:32 PM, Ben Bolker wrote: Leo Vorthoren L.Vorthoren at nioo.knaw.nl writes: I have been using generalized linear models in SPSS 18, in order to build models and to calculate the P values. When I was building models in Excel (using the intercept and Bs from SPSS), I noticed that the graphs differed from my expectations. When I ran the dataset again in R, I got totally different outcomes for both the P values as well as the Bs and the intercepts. The outcomes of R seem much more likely to be the correct ones, but I really cannot explain the differences. I appreciate/assume that you're asking on the off chance that someone else has tried something very similar and gone to the trouble of figuring out the differences between R's and SPSS's default setup, but you're unlikely to get an answer without more detailed information. My best guess is that SPSS and R are using different contrasts and/or different baseline levels. R uses treatment contrasts by default, and assumes that the first (alphabetical) level of a factor is the baseline level. It's conceivable that you have a dataset where the results are numerically unstable and sensitive to small details in the algorithms used. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Learning ANOVA
Hi JesperHybel, Thanks for your advice. If you're trying to follow the youtube video you have a typing mistake here: InsectSprays.aov -(test01$count ~ test01$spray) I think this should be: InsectSprays.aov -aov(test01$count ~ test01$spray) Your advice works for me. Sorry I missed aov before(test01$count ~ test01$spray) InsectSprays.aov - aov(test01$count ~ test01$spray) summary(InsectSprays.aov) Df Sum Sq Mean Sq F valuePr(F) test01$spray 5 2668.8 533.77 34.702 2.2e-16 *** Residuals66 1015.2 15.38 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 TukeyHSD(InsectSprays.aov) Tukey multiple comparisons of means 95% family-wise confidence level Fit: aov(formula = test01$count ~ test01$spray) $`test01$spray` difflwr upr p adj B-A 0.833 -3.866075 5.532742 0.9951810 C-A -12.417 -17.116075 -7.717258 0.000 D-A -9.583 -14.282742 -4.883925 0.014 E-A -11.000 -15.699409 -6.300591 0.000 F-A 2.167 -2.532742 6.866075 0.7542147 C-B -13.250 -17.949409 -8.550591 0.000 D-B -10.417 -15.116075 -5.717258 0.002 E-B -11.833 -16.532742 -7.133925 0.000 F-B 1.333 -3.366075 6.032742 0.9603075 D-C 2.833 -1.866075 7.532742 0.4920707 E-C 1.417 -3.282742 6.116075 0.9488669 F-C 14.583 9.883925 19.282742 0.000 E-D -1.417 -6.116075 3.282742 0.9488669 F-D 11.750 7.050591 16.449409 0.000 F-E 13.167 8.467258 17.866075 0.000 I made a comparison of my result with example(InsectSprays). They looks the same. I also compared plot(InsectSprays.aov). A further question how to plot 4 graphs simultaneously? Instead of reading them, individually. I read ?plot but unable to resolve. Also how to save InsectSprays.aov? I think I can only save it as InsectSprays.csv. I can't find write.aov command. Thanks TIA B.R. satimis - Original Message From: JesperHybel jesperhy...@hotmail.com To: r-help@r-project.org Sent: Sat, August 14, 2010 2:09:48 AM Subject: Re: [R] Learning ANOVA If you're trying to follow the youtube video you have a typing mistake here: InsectSprays.aov -(test01$count ~ test01$spray) I think this should be: InsectSprays.aov -aov(test01$count ~ test01$spray) youre missing the functioncall aov on the right hand side of the assignment operator '-'. the results of the application of function aov() is stored in InsectSprays.aov and is accessed through summary(InsectSprays.aov) since you missed the functioncall you cannot apply TukeyHSD() to InsectSprays.aov I think -- View this message in context: http://r.789695.n4.nabble.com/Learning-ANOVA-tp2323660p2324590.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to add lines to lattice plot produced by rms::bplot
Hi David I do not know if you have done something like this. I tried str(bp.plot) which gave the section about the regions (for colours) as: $ panel.args.common:List of 8 ..$ x : num [1:2500] 27 28 29 29.9 30.9 ... ..$ y : num [1:2500] 141 141 141 141 141 ... ..$ z : num [1:2500] -1.43 -1.41 -1.39 -1.36 -1.34 ... ..$ at : num [1:10] -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 ..$ region : logi FALSE ..$ zlab :List of 3 .. ..$ label: chr log odds .. ..$ rot : num 90 .. ..$ cex : num 1 ..$ labels : logi TRUE ..$ contour: logi TRUE I added the col.region and colours from a plot levelplot that I had done to see what would occur to the trellis parameters. No colours were produced when plotted. bp.plot - bplot(p, lfun=contourplot, color.key = TRUE, col.regions = c(#FF,#00,#A9E2FF,#8080FF,#FF,#FFD18F,#FF) ) $ panel.args.common:List of 10 ..$ x : num [1:2500] 27 28 29 29.9 30.9 ... ..$ y : num [1:2500] 141 141 141 141 141 ... ..$ z : num [1:2500] -1.43 -1.41 -1.39 -1.36 -1.34 ... ..$ at : num [1:10] -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 ..$ region : logi FALSE ..$ color.key : logi TRUE ..$ zlab :List of 3 .. ..$ label: chr log odds .. ..$ rot : num 90 .. ..$ cex : num 1 ..$ labels : logi TRUE ..$ contour: logi TRUE ..$ col.regions: chr [1:7] #FF #00 #A9E2FF #8080FF ... So it has been added to the panel.args.common, whether you can access these are another matter. I then tried bp.plot - bplot(p, lfun=contourplot, par.settings = list(axis.text = list(cex = 0.65)), color.key = TRUE, col.regions = c(#FF,#00,#A9E2FF,#8080FF,#FF,#FFD18F,#FF) ) which changed the size of the axis text so it may be the case of having to add the col.regions etc to the appropriate list in par.settings I'll leave you to amend and access the colours. You may have to add values for the wireframe/levelplot arguments like at etc. and col.regions (I think that is the function) to produce an appropriate colour range of your choice It is a while since I have delved into these sorts of plots. Need some sustenance. Regards Duncan Duncan Mackay Department of Agronomy and Soil Science University of New England ARMIDALE NSW 2351 Email home: mac...@northnet.com.au At 10:33 14/08/2010, you wrote: I have a plot produced by function bplot (package = rms) that is really a lattice plot (class=trellis). It is similar to this plot produced by a very minor modification of the first example on the bplot help page: require(rms) n - 1000# define sample size set.seed(17) # so can reproduce the results age- rnorm(n, 50, 10) blood.pressure - rnorm(n, 120, 15) cholesterol- rnorm(n, 200, 25) sex- factor(sample(c('female','male'), n,TRUE)) label(age)- 'Age' # label is in Hmisc label(cholesterol)- 'Total Cholesterol' label(blood.pressure) - 'Systolic Blood Pressure' label(sex)- 'Sex' units(cholesterol)- 'mg/dl' # uses units.default in Hmisc units(blood.pressure) - 'mmHg' # Specify population model for log odds that Y=1 L - .4*(sex=='male') + .045*(age-50) + (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male')) # Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)] y - ifelse(runif(n) plogis(L), 1, 0) ddist - datadist(age, blood.pressure, cholesterol, sex) options(datadist='ddist') fit - lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)), x=TRUE, y=TRUE) p - Predict(fit, age, cholesterol, sex='male', np=50) # vary sex last bp.plot - bplot(p, lfun=contourplot) bp.plot I have tried a variety of efforts at using update (which I assume is a lattice function although I can find no help page for it. It does appear in some of the lattice hep pages and my understanding is that it pushes objects onto the list structure of a plot object. I've also tried adding to it with llines() #- Oh, never mind. I recovered a memory that I had seen a solution on rhelp and had saved it. Turns out it was from Peter Ehlers, to whom I offer thanks. I was trying to add a step function: ht and weight from a dataframe, bld: trellis.focus(panel, 1, 1) panel.lines(x=bld$inches, y=bld$BMI28, type='s') trellis.unfocus() Success! Now... how do I control the color levels in levelplot or contourplot??? David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] different outcomes of P values in SPSS and R
What your saying is true. The sequential/marginal difference can account for the discrepancy in p values but not necessarily the coefficients. One thing I've found that can lead to differences in coefficients and p values between R and SPSS is whether or not you specify that a variable is a factor (e.g., group2 - factor(group) ). Marty Teicher On Aug 13, 2010, at 10:58 PM, Ben Bolker wrote: Yes, but ... the original poster said the coefficients differed too. (The blog post you refer to deals with ANOVA (i.e. linear models) rather than GLMs (generalized linear models): it is true that the sequential/marginal distinction still applies, but I don't think that can be the *only* thing going on here.) On Fri, Aug 13, 2010 at 10:50 PM, Martin Teicher martin_teic...@hms.harvard.edu wrote: R usesType I sequential SS, not the default Type III marginal SS reported by SPSS. There is a good blog post explaining this difference along with some interesting comments -- http://myowelt.blogspot.com/2008/05/obtaining-same-anova-results-in-r-as-in.html Best Wishes, Martin H. Teicher Dept of Psychiatry McLean Hospital / Harvard Medical School Belmont MA 02478 On Aug 13, 2010, at 10:32 PM, Ben Bolker wrote: Leo Vorthoren L.Vorthoren at nioo.knaw.nl writes: I have been using generalized linear models in SPSS 18, in order to build models and to calculate the P values. When I was building models in Excel (using the intercept and Bs from SPSS), I noticed that the graphs differed from my expectations. When I ran the dataset again in R, I got totally different outcomes for both the P values as well as the Bs and the intercepts. The outcomes of R seem much more likely to be the correct ones, but I really cannot explain the differences. I appreciate/assume that you're asking on the off chance that someone else has tried something very similar and gone to the trouble of figuring out the differences between R's and SPSS's default setup, but you're unlikely to get an answer without more detailed information. My best guess is that SPSS and R are using different contrasts and/or different baseline levels. R uses treatment contrasts by default, and assumes that the first (alphabetical) level of a factor is the baseline level. It's conceivable that you have a dataset where the results are numerically unstable and sensitive to small details in the algorithms used. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to eliminate an element of a list
On Aug 13, 2010, at 4:06 PM, fishkbob wrote: list-seq(2,10,2) list [1] 2 4 6 8 10 list[-which(2==list)] [1] 4 6 8 10 using the which() will let you remove things from a list that have a specified value... I usually use the blah- blah[-which(TRUE==is.na(blah)) ] which will remove all NA values in your list Although which() is useful when used with [ because it avoids the NA indexing problem, it is entirely superfluous here. blah- blah[-is.na(blah) ] # will be more efficient -- David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to add lines to lattice plot produced by rms::bplot
On Aug 13, 2010, at 11:25 PM, Duncan Mackay wrote: Hi David I do not know if you have done something like this. I had tried a few efforts like that, starting with an examination of str(bp.plot) as you demonstrate. I tried str(bp.plot) which gave the section about the regions (for colours) as: $ panel.args.common:List of 8 ..$ x : num [1:2500] 27 28 29 29.9 30.9 ... ..$ y : num [1:2500] 141 141 141 141 141 ... ..$ z : num [1:2500] -1.43 -1.41 -1.39 -1.36 -1.34 ... ..$ at : num [1:10] -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 ..$ region : logi FALSE ..$ zlab :List of 3 .. ..$ label: chr log odds .. ..$ rot : num 90 .. ..$ cex : num 1 ..$ labels : logi TRUE ..$ contour: logi TRUE I tried (with a bplot object named bldLT40): bldLT40$legend$right$args$key$at - c(-0.233, seq(.50, 2.25, by=0.25), seq(2.5, 5.0, by=0.5), 6:10) ... and then tried bldLT40$panel.args$at - c(-0.233, seq(.50, 2.25, by=0.25), seq(2.5, 5.0, by=0.5), 6:10) Neither of these efforts changed the boundaries beteen colors in the plot area. The first effort changed the legend scal,e but that just created a misalignment of the colors of plot area and the legend. I would be interested in either a strategy that lets one alter the color level changes of the z variable (which in bplot-created objects is zhat, or lets one specify the values at which contour lines are drawn in contourplot. Thanks for your efforts. -- David. I added the col.region and colours from a plot levelplot that I had done to see what would occur to the trellis parameters. No colours were produced when plotted. bp.plot - bplot(p, lfun=contourplot, color.key = TRUE, col.regions = c (#FF ,#00,#A9E2FF,#8080FF,#FF,#FFD18F,#FF) ) $ panel.args.common:List of 10 ..$ x : num [1:2500] 27 28 29 29.9 30.9 ... ..$ y : num [1:2500] 141 141 141 141 141 ... ..$ z : num [1:2500] -1.43 -1.41 -1.39 -1.36 -1.34 ... ..$ at : num [1:10] -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 ..$ region : logi FALSE ..$ color.key : logi TRUE ..$ zlab :List of 3 .. ..$ label: chr log odds .. ..$ rot : num 90 .. ..$ cex : num 1 ..$ labels : logi TRUE ..$ contour: logi TRUE ..$ col.regions: chr [1:7] #FF #00 #A9E2FF #8080FF ... So it has been added to the panel.args.common, whether you can access these are another matter. I then tried bp.plot - bplot(p, lfun=contourplot, par.settings = list(axis.text = list(cex = 0.65)), color.key = TRUE, col.regions = c (#FF ,#00,#A9E2FF,#8080FF,#FF,#FFD18F,#FF) ) which changed the size of the axis text so it may be the case of having to add the col.regions etc to the appropriate list in par.settings I'll leave you to amend and access the colours. You may have to add values for the wireframe/levelplot arguments like at etc. and col.regions (I think that is the function) to produce an appropriate colour range of your choice It is a while since I have delved into these sorts of plots. Need some sustenance. Regards Duncan Duncan Mackay Department of Agronomy and Soil Science University of New England ARMIDALE NSW 2351 Email home: mac...@northnet.com.au At 10:33 14/08/2010, you wrote: I have a plot produced by function bplot (package = rms) that is really a lattice plot (class=trellis). It is similar to this plot produced by a very minor modification of the first example on the bplot help page: require(rms) n - 1000# define sample size set.seed(17) # so can reproduce the results age- rnorm(n, 50, 10) blood.pressure - rnorm(n, 120, 15) cholesterol- rnorm(n, 200, 25) sex- factor(sample(c('female','male'), n,TRUE)) label(age)- 'Age' # label is in Hmisc label(cholesterol)- 'Total Cholesterol' label(blood.pressure) - 'Systolic Blood Pressure' label(sex)- 'Sex' units(cholesterol)- 'mg/dl' # uses units.default in Hmisc units(blood.pressure) - 'mmHg' # Specify population model for log odds that Y=1 L - .4*(sex=='male') + .045*(age-50) + (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male')) # Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)] y - ifelse(runif(n) plogis(L), 1, 0) ddist - datadist(age, blood.pressure, cholesterol, sex) options(datadist='ddist') fit - lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol,4)), x=TRUE, y=TRUE) p - Predict(fit, age, cholesterol, sex='male', np=50) # vary sex last bp.plot - bplot(p, lfun=contourplot) bp.plot I have tried a variety of efforts at using update (which I assume is a lattice function although I can find no help page for it. It does appear in some of the lattice hep pages and my understanding is that it pushes objects onto the list structure of a plot object. I've also tried adding to it with llines() #- Oh, never mind. I recovered a memory that I had seen a solution on rhelp and had saved it. Turns out
Re: [R] R execution
On Aug 13, 2010, at 6:57 PM, chantalm wrote: So I have a for loop for (i in 1 : 5) { print(i) } and I want it to print 1, 2, 3, 4, 5 sequentially for each loop iteration. Instead, it waits for the whole loop to finish and then prints. It's strange because my R console seems to have two modes of execution. One where it will iterate for each loop and print the message (and where I can easily abort an R program with the escape key), and another mode where it doesn't print for each loop iteration/just keeps executing even if I press escape key. I can't figure out how to enable the first way of execution! Can someone shed some light on this matter? This depends (if I remember correctly) on OS specifics which you have not provided. Aren't they teaching you to read the manuals (aka Posting Guide) at U-dub these days? (perhaps search on : ??flush console # ( again, IIRC) -- David Winsemius, MD (UW '87) West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Learning ANOVA
Hi Chris, Thanks for your advice which works for me here. I'm reading; An Introduction to R http://h1.mg4.mail.yahoo.com/dc/launch?sysreq=ignore NOT finish yet. Also I found many tutorials on Internet. I'll take me time going through all of them. I'm interesting reading tutorial with demo. Otherwise it won't be easy for a beginning remembering all commands and steps on manual. B.R. Stephen L - Original Message From: Christopher W Ryan cr...@binghamton.edu To: R-help@r-project.org Sent: Sat, August 14, 2010 1:51:34 AM Subject: Re: [R] Learning ANOVA Read documentation for TukeyHSD by typing the command: ?TukeyHSD The input to that function should usually be, a fitted model object, usually an aov fit. You have not created a fitted model object. This seems to work: model - aov(InsectSprays$count ~ InsectSprays$spray) TukeyHSD(model) The aov() makes the aov model object. Tacking .aov onto an object name doesn't do that. Which introductory R book(s) have you read? That would be a worthwhile investment of your time. --Chris Ryan On Fri, Aug 13, 2010 at 12:43 PM, Stephen Liu sati...@yahoo.com wrote: Hi Erik, I followed following video as example; ANOVA in R http://www.youtube.com/watch?v=Dwd3ha0P8uwfeature=related Now I got it done; boxplot(test01$count ~ test01$spray) Continued: InsectSprays.aov -(test01$count ~ test01$spray) summary(InsectSprays.aov) Length ClassMode 3 formulacall Seems having problem here. TukeyHSD(InsectSprays.aov) Error in UseMethod(TukeyHSD) : no applicable method for 'TukeyHSD' applied to an object of class formula I'm still stuck here. B.R. Stephen L - Original Message From: Erik Iverson er...@ccbr.umn.edu To: Stephen Liu sati...@yahoo.com Cc: r-help@r-project.org Sent: Sat, August 14, 2010 12:15:31 AM Subject: Re: [R] Learning ANOVA Performed following steps on R:- ### to access to the object data(InsectSprays) ### create a .csv file write.csv(InsectSprays, InsectSpraysCopy.csv) On another terminal $ sudo updatedb $ locate InsectSpraysCopy.csv /home/userA/InsectSpraysCopy.csv ### Read in some data test01 - read.csv(file.choose(), header=TRUE) Enter file name: /home/userA/InsectSpraysCopy.csv I either don't understand what you're doing, or you seem very confused. R comes with many sample data sets for you to use. You can see a list of them using the ?data function. Calling data with an argument loads that dataset. So, when you type: data(InsectSprays) that data object is now available in R, see objects() You can look at it simply by printing it: InsectSprays If for some reason it makes sense to do it this way for your use case, then that's fine, I just want to make sure you understand that you don't have to if accessing built-in datasets is all you want. ### Look at the data test01 X count spray 1 110 A 2 2 7 A 3 320 A snip ### Create a side-by-side boxplot of the data boxplot(test01$DO ~ test01$Stream) Error in model.frame.default(formula = test01$DO ~ test01$Stream) : invalid type (NULL) for variable 'test01$DO' Why do you think test01 has an element called D0 or Stream?? The column names when you print the data tell you otherwise! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.htmlhttp://www.r-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] merge function in R?
On Fri, 13 Aug 2010, fishkbob wrote: So I have a bunch of c(start,end) points and want to consolidate them into as few c(start,end) as possible. For example: sample startend A 5 10 B 7 18 C 14 D 16 20 I'd want the function to return the two distinct sets (1,4) and (5,20) Is there an R function that already does this? Yes. See the reduce() function in the IRanges package on BioConductor See pages 11-12 of http://www.bioconductor.org/packages/2.6/bioc/vignettes/IRanges/inst/doc/IRangesOverview.pdf HTH, Chuck or should I write my own? (how would I go about that?) -- View this message in context: http://r.789695.n4.nabble.com/merge-function-in-R-tp2324684p2324684.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Charles C. Berry(858) 534-2098 Dept of Family/Preventive Medicine E mailto:cbe...@tajo.ucsd.edu UC San Diego http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.