Re: [R] strangely long floating point with write.table()
Thanks for the ideas. It is great to have such skilled assistance with this issue. That said, I don't think we've solved this one, yet. Looking back at where my numbers came from, I found that I had read in integers from a file, divided by 1000, then (critically) subtracted those numbers from 2. It turns out that the important part seems to be the subtraction, not the data source. It isn't necessary to read in the data to get the effect. Here is a simple example: write.table(c(1,2)-c(0.995,1.995), file="data.txt", row.names=F, col.names=F) $ cat data.txt 0.005 0.00489 Here is another simple example that uses seq() and does not require reading in data. As you can see, the output for both commands should be the same, but there is a big difference in how the numbers are represented in the output. What causes the inconsistency within and between these two output files? write.table(1-seq(0.995,0.840,-.005), file="data1.txt", row.names=F, col.names=F) write.table(2-seq(1.995,1.840,-.005), file="data2.txt", row.names=F, col.names=F) $ head -33 data[12].txt ==> data1.txt <== 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 0.05 0.055 0.0601 0.0649 0.0701 0.075 0.08 0.085 0.09 0.095 0.1 0.105 0.11 0.115 0.12 0.125 0.13 0.135 0.14 0.145 0.15 0.155 0.16 ==> data2.txt <== 0.00489 0.00979 0.0149 0.0198 0.0249 0.0298 0.0349 0.0398 0.0449 0.0498 0.0549 0.0598 0.0649 0.0698 0.075 0.0798 0.085 0.0899 0.095 0.0999 0.105 0.11 0.115 0.12 0.125 0.13 0.135 0.14 0.145 0.15 0.155 0.16 Importantly, if I do this... write.table(seq(0.005,0.160,.005), file="data.txt", row.names=F, col.names=F) ...I'm producing all the same values, but no number in the output file exceeds three digits to the right of the decimal. Thanks again for all of the helpful comments and ideas. Best, Mike -- Michael B. Miller, Ph.D. Minnesota Center for Twin and Family Research Department of Psychology University of Minnesota http://scholar.google.com/citations?user=EV_phq4AAAAJ On Sat, 15 Mar 2014, Duncan Murdoch wrote: On 14-03-14 11:03 PM, Mike Miller wrote: On Fri, 14 Mar 2014, Duncan Murdoch wrote: On 14-03-14 8:59 PM, Mike Miller wrote: What I'm using: R version 3.0.1 (2013-05-16) -- "Good Sport" Copyright (C) 2013 The R Foundation for Statistical Computing Platform: x86_64-unknown-linux-gnu (64-bit) That's not current, but it's not very old... According to some docs, options(digits) controls numerical precision in output of write.table(). I'm using the default value for digits: getOption("digits") [1] 7 I have a bunch of numbers in a data frame that are only a few digits to the right of the decimal: That's not enough to reproduce this. Put together a self-contained reproducible example if you're wondering why something behaves as it does. With just a bunch of output, you'll just get uninformed guesses. Thanks for the tip. Here's what I've done: data2 <- data[c(94,120),c(18,20,21)] Thanks, I got the data2.Rdata file. Peter was right, you don't have what you think you have in that dataframe. See below. save(data2, file="data2.Rdata") q("no") $ R load("data2.Rdata") data2 V18 V20 V21 94 0.008 0.008 0.64 120 0.023 0.023 0.000529 I'll create a dataframe that looks like yours: data3 <- data.frame(V18=c(0.008, 0.023), V20=c(0.008, 0.023), V21=c(0.64, 0.000529)) data3 V18 V20 V21 1 0.008 0.008 0.64 2 0.023 0.023 0.000529 But it's not the same: data2-data3 V18 V20 V21 94 6.938894e-18 6.938894e-18 1.219727e-19 120 -9.020562e-17 -9.020562e-17 -4.119968e-18 I can't tell where these errors crept in; they are likely there in your "data" object, which you didn't give us. I'd guess as Peter did that your numbers are the results of computations that introduced rounding error. Duncan Murdoch write.table(data2, file="data2.txt", sep="\t", row.names=F, col.names=F) $ cat data2.txt 0.00801 0.00801 6.41e-05 0.0229 0.0229 0.0005289996 The data2.Rdata file is attached to this message. I guess that is enough to reproduce this exact finding. I don't know how it works in general. I don't have a newer version of R available right now. It did the same thing on an older version (2.15.1). Interestingly, on a different machine with an even older version (2.12.2) I see something a little different: 0.008 0.008 6.41e-05 0
Re: [R] strangely long floating point with write.table()
On Fri, 14 Mar 2014, Duncan Murdoch wrote: On 14-03-14 8:59 PM, Mike Miller wrote: What I'm using: R version 3.0.1 (2013-05-16) -- "Good Sport" Copyright (C) 2013 The R Foundation for Statistical Computing Platform: x86_64-unknown-linux-gnu (64-bit) That's not current, but it's not very old... According to some docs, options(digits) controls numerical precision in output of write.table(). I'm using the default value for digits: getOption("digits") [1] 7 I have a bunch of numbers in a data frame that are only a few digits to the right of the decimal: That's not enough to reproduce this. Put together a self-contained reproducible example if you're wondering why something behaves as it does. With just a bunch of output, you'll just get uninformed guesses. Thanks for the tip. Here's what I've done: data2 <- data[c(94,120),c(18,20,21)] save(data2, file="data2.Rdata") q("no") $ R load("data2.Rdata") data2 V18 V20 V21 94 0.008 0.008 0.64 120 0.023 0.023 0.000529 write.table(data2, file="data2.txt", sep="\t", row.names=F, col.names=F) $ cat data2.txt 0.00801 0.00801 6.41e-05 0.0229 0.0229 0.0005289996 The data2.Rdata file is attached to this message. I guess that is enough to reproduce this exact finding. I don't know how it works in general. I don't have a newer version of R available right now. It did the same thing on an older version (2.15.1). Interestingly, on a different machine with an even older version (2.12.2) I see something a little different: 0.008 0.008 6.41e-05 0.0229 0.0229 0.0005289996 Best, Mike__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] strangely long floating point with write.table()
What I'm using: R version 3.0.1 (2013-05-16) -- "Good Sport" Copyright (C) 2013 The R Foundation for Statistical Computing Platform: x86_64-unknown-linux-gnu (64-bit) According to some docs, options(digits) controls numerical precision in output of write.table(). I'm using the default value for digits: getOption("digits") [1] 7 I have a bunch of numbers in a data frame that are only a few digits to the right of the decimal: data[c(94,120), c(18,20,21)] V18 V20 V21 94 0.008 0.008 0.64 120 0.023 0.023 0.000529 I write the data to a file: write.table(data, file="data.txt", sep="\t", row.names=F, col.names=F) Then I look at those same values and this is what I see: $ gawk -F'\t' 'NR==94 || NR==120 {print $18,$20,$21}' data.txt 0.00801 0.00801 6.41e-05 0.0229 0.0229 0.0005289996 This is the weird thing: Only those two records get the long, annoyingly "precise" 17-digit numbers. Other records look like this: 0.052 1.052 1.106704 0.178 0.178 0.031684 I understand that binary representations won't reflect decimal representations precisely, etc., but why do I get this junk for only two records out of 197 records? Records that contain only integral values can't get it wrong, but the other 30 of 32 records with decimals look fine -- see below. Also, if precision should be to 7 digits, why am I getting 17 digits for exactly two of the records? Why does this happen for all three numbers in those two records? If you think this is a bug that I should report elsewhere, let me know. Thanks. Mike $ gawk -F'\t' '{print $18,$20,$21}' data.txt | grep -F . 0.944 0.944 0.891136 0.885 1.885 3.553225 0.052 1.052 1.106704 0.178 0.178 0.031684 1.996 1.996 3.984016 0.86 1.86 3.4596 0.765 1.765 3.115225 0.986 1.986 3.944196 0.998 0.998 0.996004 0.998 0.998 0.996004 0.956 0.956 0.913936 0.99 1.99 3.9601 0.00801 0.00801 6.41e-05 0.99 0.99 0.9801 0.0229 0.0229 0.0005289996 0.938 0.938 0.879844 0.034 1.034 1.069156 0.86 1.86 3.4596 0.911 1.911 3.651921 0.971 0.971 0.942841 0.994 0.994 0.988036 0.418 0.418 0.174724 0.805 1.805 3.258025 0.996 1.996 3.984016 0.998 1.998 3.992004 0.623 1.623 2.634129 0.998 0.998 0.996004 1.628 1.628 2.650384 0.981 0.981 0.962361 0.998 0.998 0.996004 1.676 1.676 2.808976 0.986 1.986 3.944196 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Another question about bootstrapping cor
Read the help on boot. Specifically in non-parametric bootstrapping the statistic function takes a data (the original data) and an index number that shows which rows are taken in the bootstrap sample. So you need to do the following. x <- 1:15 y <- c(2,4,1,3,5, 7,6, 9,10,8, 14, 13, 11, 15, 12) x[3] <- NA; x[11] <- NA; x[8] <- NA; y[2] <- NA; y[8] <- NA; y[12] <- NA cor(x,y,use="complete.obs",method="kendall") library(boot) tmpdf <- data.frame(x,y) cor1 <- function(data,idx) { df<-data[idx,] cor(df$x,df$y,use="complete.obs",method="kendall")} corboot <- boot(tmpdf,cor1,R=999) Bootstrap Statistics : original biasstd. error t1* 0.867 -0.001852283 0.1099832 in your previous function the index was also getting passed to the cor1 function . To see the individual results use: corboot$t t is a matrix of the bootstrap replicates. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of David Parkhurst Sent: Thursday, March 06, 2014 12:28 PM To: r-help@r-project.org Subject: [R] Another question about bootstrapping cor When I use the fix that Arun K. provided to my earlier example, I wonder how to find out where in the 999 bootstrap repetitions the value for the actual data fits. Here is the fixed code: x <- 1:15 y <- c(2,4,1,3,5, 7,6, 9,10,8, 14, 13, 11, 15, 12) x[3] <- NA; x[11] <- NA; x[8] <- NA y[2] <- NA; y[8] <- NA; y[12] <- NA cor(x,y,use="complete.obs",method="kendall") library(boot) tmpdf <- data.frame(x,y) ## corboot <- boot(tmpdf, cor(x,y,use="complete.obs",method="kendall"),R=999) cor1 <- function(x,y) {cor(x,y,use="complete.obs",method="kendall")} corboot <- boot(tmpdf,cor1,R=999) When I run that, I get > corboot ORDINARY NONPARAMETRIC BOOTSTRAP Call: boot(data = tmpdf, statistic = cor1, R = 999) Bootstrap Statistics : original biasstd. error t1* 1.000 -1.0062026 0.2490860 t2* 0.867 -0.8694414 0.2491383 t2* is the estimate for the actual data. What is t1*? And again, how do I find where in the distribution of the 999 reps does that true value fall? I would expect print(corboot) to give me more information, but it just repeats that same output. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] AIX 7.1 and R build problems
Has anyone managed to build R-3.0.2 from source on AIX 7.1 using gcc 4.2.0. The configure script finishes with: ... checking whether wctrans exists and is declared... no checking whether iswblank exists and is declared... no checking whether wctype exists and is declared... no checking whether iswctype exists and is declared... no configure: error: Support for MBCS locales is required. Scanning through the config.log the configure script seems to be happy with the C99 compliance at hand. I have tried reading NEWS, README, and R-Help archives but I can't get past this. I'm all Google'd out. Is my compiler too old? _ Michael Beddo Senior Scientist Data Ventures, Inc. 1475 Central Ave. Suite 230 | Los Alamos, NM 87544 tel 505.695.2132 http://www.dataventures.com | "Advanced - Effective - Actionable - Proven. Analytics" __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] r noobie, reading my text file into r
When starting out I sometimes find it easier to do the following: Ceosalary<-read.table(file.choose(),sep="\t") This will give you a dialog box to find the file you want and you won't have to worry about getting the full path exactly right. Hth, Mike W. Michael Conklin Executive Vice President | Marketing Science GfK Custom Research, LLC | 8401 Golden Valley Road | Minneapolis, MN, 55427 T +1 763 417 4545 | M +1 612 567 8287 www.gfk.com -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of frankreynolds Sent: Thursday, February 06, 2014 12:01 PM To: r-help@r-project.org Subject: [R] r noobie, reading my text file into r Hi everyone, this is my first time using r and I think I'm overlooking something small and I just need some help seeing it. I have a file in notepad that I need to read into r. > ceosalary<-read.table(file="C:/Users/mz106_010/Desktop/ceosalary.csv", > header > = TRUE,sep="\t") Error in file(file, "rt") : cannot open the connection In addition: Warning message: In file(file, "rt") : cannot open file 'C:/Users/mz106_010/Desktop/ceosalary.csv': No such file or directory > ceosalary<-read.table(file="C:/Users/mz106_010/Desktop/ceosalary.txt", > header > = TRUE,sep="\t") Error in file(file, "rt") : cannot open the connection In addition: Warning message: In file(file, "rt") : cannot open file 'C:/Users/mz106_010/Desktop/ceosalary.txt': No such file or directory am I writing one of those incorrectly? What can I do to fix the problem? Any help would be greatly appreciated. Thanks for your time everyone! -- View this message in context: http://r.789695.n4.nabble.com/r-noobie-reading-my-text-file-into-r-tp4684871.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] help with ggplot legend specification
I am creating a scatterplot with the following code. pl<-ggplot(df,aes(x=Importance,y=Performance,fill=PBF,size=gapsize))+ geom_point(shape=21,colour="black")+scale_size_area(max_size=pointsizefactor) points are plotted where the size of the point is related to a metric variable gapsize and the fill color on the point is related to the variable PBF which is a 4 level factor. This works exactly as I want with the points varying in size based on the metric and being color coded. I get 2 legends on the side of the plot, one related to the size of the dot and the other showing the color coding. The problem is that the dots on the color coding legend are so small that it is impossible to discern what color they are. The dots in the plot are large, so it is clear what colors they are, but the legend is useless. How can I increase the size of the points in the color legend. pointsizefactor<-5 df Importance Performance gapsize labels PBF q50451 0.7079463 -0.7213622 2 a W q50452 0.4489164 -0.5552116 1 b G q50453 0.7714138 -0.6940144 5 c F q50454 0.6284830 -0.6011352 3 d S q50455 0.7131063 -0.6800826 4 e G q50456 0.7038184 -0.6026832 6 f S q50457 0.5201238 -0.3539732 8 g G q50458 0.9195046 -0.8214654 2 h F q50459 0.3797730 -0.4184727 1 i W q504510 0.8065015 -0.6305470 7 j G q504511 0.6062951 -0.4442724 6 k S q504512 0.6253870 -0.4478844 8 l G q504513 0.3813209 -0.4102167 2 m W q504514 0.3813209 -0.3436533 3 n F q504515 0.5185759 -0.4365325 5 o G q504516 0.5872033 -0.4556244 6 p S q504518 0.5397317 -1.000 1 q S q504519 0.5882353 -0.4674923 9 r S q504520 0.4205366 -0.4164087 4 s W q504521 0.7616099 -0.3323013 10 t F q504522 0.7213622 -0.6088751 7 u G q504523 0.6780186 -0.6130031 8 v G q504524 0.6904025 -0.3937049 10 w W q504525 0.4143447 -0.4669763 4 x W q504526 0.5779154 -0.2982456 9 y F q504527 0.6718266 -0.3457172 10 z G Thanks all //Mike W. Michael Conklin Executive Vice President | Marketing Science GfK Custom Research, LLC | 8401 Golden Valley Road | Minneapolis, MN, 55427 T +1 763 417 4545 | M +1 612 567 8287 www.gfk.com<http://www.gfk.com/> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] automatic history file append with every command?
In the bash shell we can use PROMPT_COMMAND="history -a" to tell bash to always append the last command to the history file. It will do this with every command so that if bash crashes or an ssh connection is lost, the command history will still be available in the history file. With R, I see there is a savehistory() command (link below), but I have some questions: (1) does savehistory() append to the current history file or overwrite it? (2) if it appends, when savehistory() is evoked repeatedly, does it only append commands that haven't already been appended? Or does it append the entire history to the file? (3) Is there a way to evoke history saving automatically so that the file is always updated? I have the impression that savehistory() only overwrites and cannot append. If that's true, I might still get what I want, but I'd be doing a lot of unnecessary writing. Luckily, my R history is usually quite short. So if I were to enter 100 commands, appending would write 100 lines, but overwriting would write 5050 lines. savehistory: http://stat.ethz.ch/R-manual/R-devel/library/utils/html/savehistory.html Best, Mike -- Michael B. Miller, Ph.D. Minnesota Center for Twin and Family Research Department of Psychology University of Minnesota __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] netlogo r-extension loadlibrary() failures
Trying to access R from Netlogo5 (using the NetLogo R-Extension), running the configuration validation tests in NetLogo5/extensions/r/Systemcheck.nlogo, I get several loadlibrary() errors ... in rJava Check2, > library(rJava); .path.package('rJava') Error : .onLoad failed in loadNamespace() for 'rJava', details: call: inDL(x, as.logical(local), as.logical(now), ...) error: unable to load shared object 'D:/Programs/R/R-3.0.1/library/rJava/libs/x64/rJava.dll': LoadLibrary failure: The specified module could not be found. [ though that is in fact where the rJava.dll is located ] in the JavaGD Check, > library(JavaGD); .path.package('JavaGD') Error : .onLoad failed in loadNamespace() for 'JavaGD', details: call: inDL(x, as.logical(local), as.logical(now), ...) error: unable to load shared object 'D:/Programs/R/R-3.0.1/library/JavaGD/libs/x64/JavaGD.dll': LoadLibrary failure: The specified module could not be found. [ the exact location of the JavaGD.dll ] In the Check R test, I get... java.lang.UnsatisfiedLinkError: no R in java.library.path at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1878) at java.lang.Runtime.loadLibrary0(Runtime.java:849) at java.lang.System.loadLibrary(System.java:1087) at org.nlogo.extension.r.systemcheck.Entry$RCheck.perform(Entry.java:208) at org.nlogo.prim._extern.perform(_extern.java:54) at org.nlogo.nvm.Context.stepConcurrent(Context.java:91) at org.nlogo.nvm.ConcurrentJob.step(ConcurrentJob.java:82) at org.nlogo.job.JobThread.org$nlogo$job$JobThread$$runPrimaryJobs(JobThread.scala:143) at org.nlogo.job.JobThread$$anonfun$run$1.apply$mcV$sp(JobThread.scala:78) at org.nlogo.job.JobThread$$anonfun$run$1.apply(JobThread.scala:76) at org.nlogo.job.JobThread$$anonfun$run$1.apply(JobThread.scala:76) at scala.util.control.Exception$Catch.apply(Exception.scala:88) at org.nlogo.util.Exceptions$.handling(Exceptions.scala:41) at org.nlogo.job.JobThread.run(JobThread.scala:75) NetLogo 5.0 main: org.nlogo.app.AppFrame thread: AWT-EventQueue-0 Java HotSpot(TM) 64-Bit Server VM 1.7.0_25 (Oracle Corporation; 1.7.0_25-b16) operating system: Windows 7 6.1 (amd64 processor) Scala version 2.9.1.final JOGL: (3D View not initialized) OpenGL Graphics: (3D View not initialized) model: Systemcheck 11:29:31.892 AddJobEvent (org.nlogo.window.ButtonWidget) AWT-EventQueue-0 11:29:31.784 InputBoxLoseFocusEvent (org.nlogo.window.ButtonWidget) AWT-EventQueue-0 11:29:31.757 PeriodicUpdateEvent (org.nlogo.app.App$$anon$1 (org.nlogo.window.GUIWorkspace)) AWT-EventQueue-0 11:29:31.557 PeriodicUpdateEvent (org.nlogo.app.App$$anon$1 (org.nlogo.window.GUIWorkspace)) AWT-EventQueue-0 11:29:31.357 PeriodicUpdateEvent (org.nlogo.app.App$$anon$1 (org.nlogo.window.GUIWorkspace)) AWT-EventQueue-0 11:29:31.157 PeriodicUpdateEvent (org.nlogo.app.App$$anon$1 (org.nlogo.window.GUIWorkspace)) AWT-EventQueue-0 11:29:30.957 PeriodicUpdateEvent (org.nlogo.app.App$$anon$1 (org.nlogo.window.GUIWorkspace)) AWT-EventQueue-0 11:29:30.757 PeriodicUpdateEvent (org.nlogo.app.App$$anon$1 (org.nlogo.window.GUIWorkspace)) AWT-EventQueue-0 11:29:30.557 PeriodicUpdateEvent (org.nlogo.app.App$$anon$1 (org.nlogo.window.GUIWorkspace)) AWT-EventQueue-0 11:29:30.345 PeriodicUpdateEvent (org.nlogo.app.App$$anon$1 (org.nlogo.window.GUIWorkspace)) AWT-EventQueue-0 [ though $PATH includes %R_HOME%\bin\x64\ where there is an r.dll ] All three are loadlibrary() errors, where the target is located exactly where it can't be loaded from. What can I do to fix that? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] descriptive stats by cells in factorial design
MZ Female 11notES 42.3 5.362 30.338.6 41.846.0 56.6 172 0 21 Parent MZ Female 17notES 46.4 5.177 34.942.4 46.049.5 63.2 155 0 22 Parent MZ Male 11 ES 43.4 5.351 31.340.0 43.446.5 64.7 197 0 23 Parent MZ Male 11notES 41.6 4.656 32.138.0 41.444.6 65.3 331 0 24 Parent MZ Male 17notES 46.7 5.242 34.543.1 45.949.0 63.8 131 0 Thanks to all. Mike -- Michael B. Miller, Ph.D. Minnesota Center for Twin and Family Research Department of Psychology University of Minnesota __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] descriptive stats by cells in factorial design
, N=sum(!is.na(x)), NAs=sum(is.na(x)))} That will allow me to feed the na.rm=T argument to the mean, sd and quantile functions. By not naming the quantile function (e.g., not using q=quantile(x, ...)) I allow the builtin column names to be used unaltered (i.e., 0%, 25%, 50%, 75%, 100%). I also did not use length() because it will count NA values and I want to see the sample sizes used for mean, sd and quantile. To deal with that problem I created a function with output named "N" to count those sample sizes and one with output named "NAs" to count the number of NAs. Then I do this: summaryBy(Age ~ Generation + Zygosity + Sex + Cohort + ESstatus, data=x, FUN=descriptivefun, na.rm=T) Generation ZygositySex Cohort ESstatus Age.meanAge.sd Age.0% Age.25% Age.50% Age.75% Age.100% Age.N Age.NAs 1 Offspring DZ Female 11 ES 17.78528 0.3535863 16.93 17.6000 17.775 17.965018.92 106 0 2 Offspring DZ Female 11notES 18.13679 0.968 16.76 17.8525 18.190 18.457519.50 162 0 3 Offspring DZ Female 17notES 17.47529 0.4569588 16.56 17.0700 17.590 17.870018.29 191 0 4 Offspring DZ Male 11 ES 17.76149 0.3467540 17.18 17.5150 17.715 18.18.71 134 0 5 Offspring DZ Male 11notES 17.87667 0.5187333 16.83 17.4600 17.860 18.240019.02 153 0 6 Offspring DZ Male 17notES 17.50418 0.3915823 16.73 17.1900 17.530 17.830018.52 165 0 7 Offspring MZ Female 11 ES 17.87628 0.4506530 16.86 17.6775 17.805 18.100019.12 196 0 8 Offspring MZ Female 11notES 18.05739 0.6103713 16.76 17.6300 18.050 18.420019.70 291 0 9 Offspring MZ Female 17notES 17.41061 0.4956190 16.55 16.9700 17.340 17.820018.45 395 0 10 Offspring MZ Male 11 ES 17.77174 0.3236917 16.84 17.5800 17.790 17.970019.02 195 0 11 Offspring MZ Male 11notES 17.87718 0.6472397 16.56 17.3300 17.855 18.210020.01 284 0 12 Offspring MZ Male 17notES 17.49114 0.3961757 16.65 17.1775 17.500 17.810018.35 332 0 13 Parent DZ Female 11 ES 44.61512 5.1246314 32.17 41.3400 44.680 48.280057.95 121 0 14 Parent DZ Female 11notES 42.54346 4.3670998 34.03 39.3450 42.110 45.550057.06 107 0 15 Parent DZ Female 17notES 46.30559 4.9177705 36.10 42.7275 45.765 48.335062.6968 0 16 Parent DZ Male 11 ES 44.60206 4.5605484 34.31 41.4475 44.890 47.497558.75 126 0 17 Parent DZ Male 11notES 42.71121 4.9600561 32.05 39.2400 42.760 45.270058.20 157 0 18 Parent DZ Male 17notES 46.77458 4.0226198 40.18 44.1250 46.000 48.820061.1259 0 19 Parent MZ Female 11 ES 44.23476 5.0214627 29.55 40.6925 44.125 47.730056.73 206 0 20 Parent MZ Female 11notES 42.31988 5.3622671 30.31 38.6050 41.835 46.017556.58 172 0 21 Parent MZ Female 17notES 46.36490 5.1770435 34.88 42.4200 45.950 49.495063.18 155 0 22 Parent MZ Male 11 ES 43.40787 5.3507439 31.28 39.9700 43.440 46.480064.65 197 0 23 Parent MZ Male 11notES 41.56363 4.6564818 32.10 38.0250 41.390 44.645065.29 331 0 24 Parent MZ Male 17notES 46.69298 5.2421896 34.45 43.1500 45.890 49.005063.80 131 0 I think that output looks very nice. One thing that I don't understand is why my function produces %.5f output for every value but the doBy::summaryBy() function uses different formats in different columns. Compare the above output with this output: descriptivefun(x$Age) mean sd 0%25%50%75% 100% NNAs 28.49302 13.29077 16.55000 17.65000 18.23000 42.25500 65.29000 4434.00.0 It's not a big deal, but it would be cool if I could tell doBy::summaryBy() how to format the numbers using something like format=c(rep("%.2f",7), rep("%d",2)). Mike -- Michael B. Miller, Ph.D. Minnesota Center for Twin and Family Research Department of Psychology University of Minnesota On Mon, 5 Aug 2013, David Carlson wrote: This is a bit simpler. The function quantile() labels the output whereas fivenum() does not: aggregate(Age ~ Generation + Zygosity + Sex + Cohort + ESstatus, data=x, function(x) c(mean=mean(x), sd=sd(x), quantile(x))) On Mon, 5 Aug 2013, Dr. Thomas W. MacFarland wrote: Dear Dr. Miller: Pasted below is syntax that should mostly answer your recent question to the R mailing list: descriptivefun <
Re: [R] descriptive stats by cells in factorial design
t DZ Female 17notES 46.30559 4.9177705 36.10 42.7275 45.765 48.3350 62.69 68 16 Parent DZ Male 11 ES 44.60206 4.5605484 34.31 41.4475 44.890 47.4975 58.75126 17 Parent DZ Male 11notES 42.71121 4.9600561 32.05 39.2400 42.760 45.2700 58.20157 18 Parent DZ Male 17notES 46.77458 4.0226198 40.18 44.1250 46.000 48.8200 61.12 59 19 Parent MZ Female 11 ES 44.23476 5.0214627 29.55 40.6925 44.125 47.7300 56.73206 20 Parent MZ Female 11notES 42.31988 5.3622671 30.31 38.6050 41.835 46.0175 56.58172 21 Parent MZ Female 17notES 46.36490 5.1770435 34.88 42.4200 45.950 49.4950 63.18155 22 Parent MZ Male 11 ES 43.40787 5.3507439 31.28 39.9700 43.440 46.4800 64.65197 23 Parent MZ Male 11notES 41.56363 4.6564818 32.10 38.0250 41.390 44.6450 65.29331 24 Parent MZ Male 17notES 46.69298 5.2421896 34.45 43.1500 45.890 49.0050 63.80131 Thanks very much. Mike -- Michael B. Miller, Ph.D. Minnesota Center for Twin and Family Research Department of Psychology University of Minnesota__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] rms plot.Predict when type="p"
As always, Frank, thanks for the help. Much appreciated. Mike Babyak Department of Psychiatry and Behavioral Sciences Duke University Medical Center R version 3.01 Windows 7 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] descriptive stats by cells in factorial design
I'm looking for recommendations for a good way to do this. There must be a good function in some package... I have a 5-way factorial design, two levels per factor, so 32 cells, and I mostly just want the means and standard deviations for the contents of every cell. I can write a for loop, but maybe there is a nice way to show the model and get the results I want without having to do that. Similarly, it would be nice to also have the range and maybe some percentiles, if there is a function that would just pump them out. Finally, if something would draw a series of box plots for the cells of the factorial design, that could be nice to have. It seems like this sort of thing must already have been worked out. Thanks in advance. Mike -- Michael B. Miller, Ph.D. Minnesota Center for Twin and Family Research Department of Psychology University of Minnesota __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Editing code in a function? related to allowing zero weights in lmer()
To close the thread, I wanted to post my solution to this issue. Searching some other help threads, I found this: https://stat.ethz.ch/pipermail/r-help/2008-August/171321.html So, here's what I did: I got at the script for lmerFrames by doing this: edit(lme4:::lmerFrames) then edited the code to allow zero weights, and assigned it to a new function: my.lmerFrames<-function (...} and then did this: library(R.utils) reassignInPackage("lmerFrames", pkgName="lme4", my.lmerFrames) Voila, things execute, no error. Note also that R.utils requires some additional packages (R.oo, R.methodsS3 before you can use it. On Fri, Aug 2, 2013 at 12:03 AM, Mike Rennie wrote: > Indeed- thanks for the tips to get me going. This just kicks things up a > notch for me, having happily used packages without wanting to tinker with > them till now. > > Cheers. > > > On Thu, Aug 1, 2013 at 11:57 PM, Bert Gunter wrote: > >> 1. ?"::" >> >> 2. ?getAnywhere >> >> 3. Please consult the "R language definition" manual -- or even "An >> Introduction to R" -- to learn about R programming. It **is** a >> programming language, you know! >> >> Cheers, >> Bert >> >> On Thu, Aug 1, 2013 at 8:54 PM, Mike Rennie >> wrote: >> > Hi folks, >> > >> > I've not before had to edit code right in a function, but I think I need >> > to. I am using lmer() and want to use a model that uses zero weights. I >> > found this thread from 2009: >> > >> > https://stat.ethz.ch/pipermail/r-sig-mixed-models/2009q1/001995.html >> > >> > but I'm unsure of how I would actually go about editing the code in the >> > package. >> > >> > Part of the problem is that I can't even find lmerFrames(). I can see it >> > being called in lmer, but if I just type in >> > >> > lmerFrames >> > >> > or >> > >> > fix(lmerFrames) >> > >> > I get nothing. >> > >> > The note from Doug Bates is that the function is "hidden"- is there a >> trick >> > to getting at these hidden functions to edit them? >> > >> > Any thoughts or suggestions welcome. >> > >> > Cheers, >> > >> > -- >> > Michael Rennie, Research Scientist >> > Fisheries and Oceans Canada, Freshwater Institute >> > Winnipeg, Manitoba, CANADA >> > >> > [[alternative HTML version deleted]] >> > >> > __ >> > R-help@r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> >> >> >> -- >> >> Bert Gunter >> Genentech Nonclinical Biostatistics >> >> Internal Contact Info: >> Phone: 467-7374 >> Website: >> >> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm >> > > > > -- > Michael Rennie, Research Scientist > Fisheries and Oceans Canada, Freshwater Institute > Winnipeg, Manitoba, CANADA > -- Michael Rennie, Research Scientist Fisheries and Oceans Canada, Freshwater Institute Winnipeg, Manitoba, CANADA [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] install-packages
Try typing this into google search bar: [R] install packages The majority of the results on the first page will help you out. On Thu, Aug 1, 2013 at 8:43 PM, Said Filahi wrote: > hello, > i am new and I want to know how to install a packare on R > > thank you > > > said filahi > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Michael Rennie, Research Scientist Fisheries and Oceans Canada, Freshwater Institute Winnipeg, Manitoba, CANADA [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Editing code in a function?
Hi folks, I've not before had to edit code right in a function, but I think I need to. I am using lmer() and want to use a model that uses zero weights. I found this thread from 2009: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2009q1/001995.html but I'm unsure of how I would actually go about editing the code in the package. Part of the problem is that I can't even find lmerFrames(). I can see it being called in lmer, but if I just type in lmerFrames or fix(lmerFrames) I get nothing. The note from Doug Bates is that the function is "hidden"- is there a trick to getting at these hidden functions to edit them? Any thoughts or suggestions welcome. Cheers, -- Michael Rennie, Research Scientist Fisheries and Oceans Canada, Freshwater Institute Winnipeg, Manitoba, CANADA [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] rms plot.Predict when type="p"
Hi all, I'm trying to change the pch, color, lty, and lwd in a type="p" plot produced by plot.Predict in the rms package. What I'm shooting for is a separate plotting symbol symbol color for each level of a factor (and also to manage the line width for the CIs). Here's what I hope is a working example #make up some data n = 30 x1 = runif(n) group = factor(rep(1:3, length.out=n)) e = rnorm(n, 0, 1) #population model y = as.numeric(group) + 0.2*x1 + e #make a dataframe population.df <- data.frame(x1,group,y) #clean up rm(n,x1,group,e,y) #load rms package require(rms) #store predictor characteristics d<-datadist(population.df) options(datadist="d") # model. f1 <- ols(y ~ x1+group, population.df) #model result summary(f1) #get predicted values pf1<-Predict(f1,group) #plot myplot<-plot(pf1,~group,nlines=T,type='p', ylab='fitted Y', xlab='Treatment') #print plot with defaults myplot #try to alter settings plot.symbol.settings<-trellis.par.get("plot.symbol") #have a look at settings str(plot.symbol.settings) plot.symbol.settings$pch<-c(1,2,3) plot.symbol.settings$col<-c(2,1,3) plot.symbol.settings$lwd<-3 plot.symbol.settings$cex<-1.5 trellis.par.set("plot.symbol", plot.symbol.settings) #print again with new settings myplot As you can see, only the first pch and color are passed. So, obviously I am missing something important. Also, I notice that the lwd argument has no effect, so I am assuming this is controlled by something else, but I haven't found it yet. I'd be grateful if somebody could point me in the right direction. Thanks, Mike Babyak Department of Psychiatry and Behavioral Sciences Duke University Medical Center R version 3.01 Windows 7 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ggmap, hexbin and length distortion in Lat/long
I am working with spatial data in ggmap, generally with great success. I have a huge data set with the coordinates in NAD 83 UTM Zone 11 (meters). To map the data the coordinates were converted to Lat Long in GIS prior to use in R and ggmap/ggplot. I am using hexagonal binning to aggregate the data : #create bins and calculate stats hb<-hexbin(DF$lon,DF$lat,xbins=80,IDs=TRUE) hb.avg<-hexTapply(hb,DF$Res,mean,na.rm=TRUE) hb.mx<-hexTapply(hb,DF$Res,max,na.rm=TRUE) hb.p80<-hexTapply(hb,DF$Res,quantile,prob=0.80,na.rm=TRUE) #create df for ggplot hx_dat <- data.frame(hcell2xy(hb), count = hb@count, xo = hb@xcm, yo = hb@ycm, Mean=hb.avg,Max=hb.mx, p80=hb.p80) #Base Map #BBox is the bounding box Base<-get_map(BBox,source='google') m_hx<-ggmap(Base,legend = "bottom", base_layer=ggplot(aes(x=x,y=y),data=hx_dat)) #Map of means a<-0.55 hc<-'grey60' m_hx+geom_hex(aes(x = x, y = y, fill = Mean), color = hc, ,alpha=a,stat = "identity") + scale_fill_gradientn("Mean",colours=rev(rainbow(4)),trans='sqrt') ...and so on for other stats I can also run statistical analyses on hx_dat. By creating hexbins based on lat/long it seems there will be distortion due to the differences in length of a degree at different locations on the earth's surface. What is the most efficient way to eliminate this distortion? Should I run hexbin in NAD83 and convert the x/y coordinates to Lat Long? Can I get ggmap to convert the baselayer to NAD84 and just do everything in NAD(my preferred option)? I have tried converting Lat Long to NAD84 and back but the coordinates are coming up in the eastern Pacific and not in California, so I am missing something and I am not sure that is the best way to solve the problem anyway. Thanks in advance, any help is greatly appreciated Mike Michael J. Bock, PhD | Senior Manager mb...@environcorp.com This message contains information that may be confidenti...{{dropped:8}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Behaviors of diag() with character vector in R 3.0.0
Thanks, A.K. I managed to create diagonal matrices for character vectors. Since this new behavior broke a package that I have written, I would like to make sure that this new behavior was not introduced by mistakes. If this new behavior is final, I will modify my code to fit it. Mike On Tue, Apr 9, 2013 at 10:14 PM, arun wrote: > Hi, > You could try this: > v <- c("a", "b") > mat1<-diag(length(v)) > diag(mat1)<- v > mat1 > # [,1] [,2] > #[1,] "a" "0" > #[2,] "0" "b" > > > v1<- letters[1:5] > mat2<- diag(length(v1)) > diag(mat2)<- v1 > mat2 > # [,1] [,2] [,3] [,4] [,5] > #[1,] "a" "0" "0" "0" "0" > #[2,] "0" "b" "0" "0" "0" > #[3,] "0" "0" "c" "0" "0" > #[4,] "0" "0" "0" "d" "0" > #[5,] "0" "0" "0" "0" "e" > A.K. > > > > > - Original Message - > From: Mike Cheung > To: r-help > Cc: > Sent: Tuesday, April 9, 2013 3:15 AM > Subject: [R] Behaviors of diag() with character vector in R 3.0.0 > > Dear all, > > According to CHANGES IN R 3.0.0: > o diag() as used to generate a diagonal matrix has been re-written > in C for speed and less memory usage. It now forces the result > to be numeric in the case diag(x) since it is said to have 'zero > off-diagonal entries'. > > diag(x) does not work for character vector in R 3.0.0 any more. For > example, > v <- c("a", "b") > > ## R 2.15.3 > diag(v) > [,1] [,2] > [1,] "a" "0" > [2,] "0" "b" > > ## R 3.0.0 > diag(v) > [,1] [,2] > [1,] NA0 > [2,]0 NA > Warning message: > In diag(v) : NAs introduced by coercion > > Regarding the character matrix, it still works. For example, > m <- matrix(c("a", "b", "c", "d"), nrow=2) > diag(m) > ## Both R 2.15.3 and 3.0.0 > [1] "a" "d" > > n <- matrix(0, ncol=2, nrow=2) > diag(n) <- v > n > ## Both R 2.15.3 and 3.0.0 > [,1] [,2] > [1,] "a" "0" > [2,] "0" "b" > > I understand that the above behavior follows exactly what the manual says. > It appears to me that the version in 2.15.3 is more general as it works for > both numeric and character vectors and matrices, whereas the version in > 3.0.0 works for character matrices but not character vectors. > > Would it be possible to retain the behaviors of diag() for character > vectors? Thanks. > > Mike > -- > - > Mike W.L. Cheung Phone: (65) 6516-3702 > Department of Psychology Fax: (65) 6773-1843 > National University of Singapore > http://courses.nus.edu.sg/course/psycwlm/internet/ > - > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- - Mike W.L. Cheung Phone: (65) 6516-3702 Department of Psychology Fax: (65) 6773-1843 National University of Singapore http://courses.nus.edu.sg/course/psycwlm/internet/ - [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Behaviors of diag() with character vector in R 3.0.0
Dear all, According to CHANGES IN R 3.0.0: o diag() as used to generate a diagonal matrix has been re-written in C for speed and less memory usage. It now forces the result to be numeric in the case diag(x) since it is said to have 'zero off-diagonal entries'. diag(x) does not work for character vector in R 3.0.0 any more. For example, v <- c("a", "b") ## R 2.15.3 diag(v) [,1] [,2] [1,] "a" "0" [2,] "0" "b" ## R 3.0.0 diag(v) [,1] [,2] [1,] NA0 [2,]0 NA Warning message: In diag(v) : NAs introduced by coercion Regarding the character matrix, it still works. For example, m <- matrix(c("a", "b", "c", "d"), nrow=2) diag(m) ## Both R 2.15.3 and 3.0.0 [1] "a" "d" n <- matrix(0, ncol=2, nrow=2) diag(n) <- v n ## Both R 2.15.3 and 3.0.0 [,1] [,2] [1,] "a" "0" [2,] "0" "b" I understand that the above behavior follows exactly what the manual says. It appears to me that the version in 2.15.3 is more general as it works for both numeric and character vectors and matrices, whereas the version in 3.0.0 works for character matrices but not character vectors. Would it be possible to retain the behaviors of diag() for character vectors? Thanks. Mike -- - Mike W.L. Cheung Phone: (65) 6516-3702 Department of Psychology Fax: (65) 6773-1843 National University of Singapore http://courses.nus.edu.sg/course/psycwlm/internet/ - [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] prcomp() and varimax()
Thanks for the reply. Maybe my problem is that prcomp() and varimax() are calculating "cumulative proportion of variance" differently? When I use the tol parameter with prcomp(), it restricts the number of components to 3 and reports that the cumulative variance explained by the third component is 100%. But when I try to pass that 3-component analysis to varimax(), the cumulative variance of the third component drops to 20%. The cumulative proportion of variance explained by a component should not change following rotation, so it seems like it should be either 50% (as in the original 15 component model pca1) or else 75% (as in the smaller unrotated model pca2). But component 3 in the rotated model (pca3) has a value that is neither of those. I suspect maybe I am not using varimax() correctly? Especially because it doesn't make sense that all components in the rotated model (pca3) would explain an identical amount of variance- this is real data, so the first component should explain more variance than the second, and so on. Thanks for the help, Mike On 4/7/2013 6:38 AM, S Ellison wrote: >> My concern is with the reported proportions of variance for the 3 >> components after varimax rotation. It looks like each of my 3 components >> explains 1/15 of the total variance, summing to a cumulative proportion >> of 20% of variance explained. But those 3 components I retained should >> now be the only components in the analysis, so they should be able to >> account for 100% of the explained variance. > Am I misreading what you just wrote? One percentage (20%) is a proportion of > the total variance in the data; the other is the proportion of the variance > explained by the model. These are different things; they should not usually > be the same. > > *** > This email and any attachments are confidential. Any use, copying or > disclosure other than by the intended recipient is unauthorised. If > you have received this message in error, please notify the sender > immediately via +44(0)20 8943 7000 or notify postmas...@lgcgroup.com > and delete this message and any copies from your computer and network. > LGC Limited. Registered in England 2991879. > Registered office: Queens Road, Teddington, Middlesex, TW11 0LY, UK Hello, I am attempting to do a principal components analysis on 15 survey items. I want to use a varimax rotation on the retained components, but I am dubious of the output I am getting, and so I suspect I am doing something wrong. I proceed in the following steps: 1) use prcomp() to inspect all 15 components, and decide which to retain 2) run prcomp() again, using the "tol" parameter to omit unwanted components 3) pass the output of step 2 to varimax() My concern is with the reported proportions of variance for the 3 components after varimax rotation. It looks like each of my 3 components explains 1/15 of the total variance, summing to a cumulative proportion of 20% of variance explained. But those 3 components I retained should now be the only components in the analysis, so they should be able to account for 100% of the explained variance. I am able to get reliable seeming results using principal() from the "psych" package, in which the total amount of variance explained by my retained components does not differ before or after rotation. But principal() uses varimax(), so I suspect I am either doing something wrong or misinterpreting the output when using the base package functions. Am I doing something wrong when attempting to retain only 3 components? Am I using varimax() incorrectly? Am I misinterpreting the returned values from varimax()? Thanks for any help, Mike Here is a link to the data file I am using: https://www.dropbox.com/s/scypebzy0nnhlwk/pca_sampledata.txt ### step 1 ### > d1 = read.table("pca_sampledata.txt", T) > m1 = with(d1, ~ g.enjoy + g.look + g.cost + g.fit + g.health + g.resale + b.withstand + b.satisfy + b.vegetated + b.everyone + b.harmed + b.eco + b.ingenuity + b.security + b.proud) > pca1 = prcomp(m1) > summary(pca1) #output truncated for this posting Importance of components: PC1PC2PC3 PC4 PC5 ... PC15 Standard deviation 1.5531 1.3064 1.1695 0.93512 0.92167 ... 0.35500 Proportion of Variance 0.2199 0.1556 0.1247 0.07972 0.07744 ... 0.01149 Cumulative Proportion 0.2199 0.3755 0.5002 0.57988 0.65732 ... 1.0 ### step 2 ### > pca2 = prcomp(m1, tol=.75) > summary(pca2) #full output shown Importance of components: PC1PC2PC3 Standard deviation 1.5531 1.3064 1.1695 Proportion of Variance 0.4397 0.3111 0.2493 Cumulative Proportion 0.4397 0.7507 1. ### step 3 ### > pca3 = varimax(pca2$rotation) > pca3 >
[R] prcomp() and varimax()
Hello, I am attempting to do a principal components analysis on 15 survey items. I want to use a varimax rotation on the retained components, but I am dubious of the output I am getting, and so I suspect I am doing something wrong. I proceed in the following steps: 1) use prcomp() to inspect all 15 components, and decide which to retain 2) run prcomp() again, using the "tol" parameter to omit unwanted components 3) pass the output of step 2 to varimax() My concern is with the reported proportions of variance for the 3 components after varimax rotation. It looks like each of my 3 components explains 1/15 of the total variance, summing to a cumulative proportion of 20% of variance explained. But those 3 components I retained should now be the only components in the analysis, so they should be able to account for 100% of the explained variance. I am able to get reliable seeming results using principal() from the "psych" package, in which the total amount of variance explained by my retained components does not differ before or after rotation. But principal() uses varimax(), so I suspect I am either doing something wrong or misinterpreting the output when using the base package functions. Am I doing something wrong when attempting to retain only 3 components? Am I using varimax() incorrectly? Am I misinterpreting the returned values from varimax()? Thanks for any help, Mike Here is a link to the data file I am using: https://www.dropbox.com/s/scypebzy0nnhlwk/pca_sampledata.txt ### step 1 ### > d1 = read.table("pca_sampledata.txt", T) > m1 = with(d1, ~ g.enjoy + g.look + g.cost + g.fit + g.health + g.resale + b.withstand + b.satisfy + b.vegetated + b.everyone + b.harmed + b.eco + b.ingenuity + b.security + b.proud) > pca1 = prcomp(m1) > summary(pca1) #output truncated for this posting Importance of components: PC1PC2PC3 PC4 PC5 ...PC15 Standard deviation 1.5531 1.3064 1.1695 0.93512 0.92167 ... 0.35500 Proportion of Variance 0.2199 0.1556 0.1247 0.07972 0.07744 ... 0.01149 Cumulative Proportion 0.2199 0.3755 0.5002 0.57988 0.65732 ... 1.0 ### step 2 ### > pca2 = prcomp(m1, tol=.75) > summary(pca2) #full output shown Importance of components: PC1PC2PC3 Standard deviation 1.5531 1.3064 1.1695 Proportion of Variance 0.4397 0.3111 0.2493 Cumulative Proportion 0.4397 0.7507 1. ### step 3 ### > pca3 = varimax(pca2$rotation) > pca3 > ... > PC1 PC2 PC3 > SS loadings1.000 1.000 1.000 > Proportion Var 0.067 0.067 0.067 > Cumulative Var 0.067 0.133 0.200 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Question: how to convert raw to numeric
I know that there is a function to convert binary data to string named rawToChar.but I wander is there any similar function for "Integer" and "float".I need to read some binary file in "integer" and "float" data. I can do this job in this way: (as below) first convert 4 byte raw to bits then pack bits back to "integer", but it not work for "float",I worry about the performance. raw4 = raw_buffer[1:4] bits = rawToBits(raw4) packBits(bits, type = "integer") Best wishes Really hope to get your response [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Finding the knots in a smoothing spline using nknots
Thanks, David! That makes sense. I shall re-read the manual page again. Regards, Mike Nielsen On Wed, Feb 27, 2013 at 12:19 PM, David Winsemius wrote: > > On Feb 27, 2013, at 6:39 AM, Mike Nielsen wrote: > > > Hi r-helpers. > > > > Please forgive my ignorance, but I would like to plot a smoothing spline > > (smooth.spline) from package "stats", and show the knots in the plot, > and I > > can't seem to figure out where smooth.spline has located the knots (when > I > > use nknots). Unfortunately, I don't know a lot about splines, but I know > > that they provide me an easy way to estimate the location of local maxima > > and minima on varying time-scales (number of knots) in my original data. > > > > I see there is a fit$knot, but it's not clear to me what those values > are: > > for some reason I had expected that they would be contained in my > original > > y values, but they're not. > > It appears they are in the range of [0-1] and the ss$fit$min and > ss$fit$range provide the scaling data ( for the x-values rather than > the y-values): > > > unique(ss$fit$knot) > [1] 0. 0.04095904 0.08291708 0.12487512 0.16583417 0.20779221 > 0.24975025 0.29070929 > [9] 0.33266733 0.37462537 0.41658342 0.45754246 0.49950050 0.54145854 > 0.58241758 0.62437562 > [17] 0.66633367 0.70829171 0.74925075 0.79120879 0.83316683 0.87412587 > 0.91608392 0.95804196 > [25] 1. > > I would think that in your case with x0 being 0 you could just use > ss$fit$range*unique(ss$fit$knot) as your knot positions. In the more > geneneral case you would need to add ss$fit$min. I tried confirming this > hunch by looking "statiscal Models in S", inMASSe4, and at the R code but > the R code calls a FORTRAN routine, so you would need to pull the source to > confirm. > > -- > David. > > > I tried generating nknots equally spaced points > > in my x, but when I plotted the points that corresponded to my original y > > values at those equally-spaced x values, I found that the spline did not > > pass through them, which, perhaps naively, I thought it might. > > > > Also, the manual says that yin comprises "the y values used at the > unique y > > values" -- should this read "at the unique x values"? > > > > Could someone kindly point to a resource where I can get a slightly > fuller > > explanation? I looked at the code for smooth.spline, but can't readily > > follow it. > > > > Here's a toy example: > > > >> x<-seq(from=0,to=4*pi,length=1002) > >> y<-sin(x) > >> ss<-smooth.spline(x,y=y,all.knots=F,nknots=25) > >> ss > > Call: > > smooth.spline(x = x, y = y, all.knots = F, nknots = 25) > > > > Smoothing Parameter spar= -0.4573636 lambda= 1.006117e-09 (14 > iterations) > > Equivalent Degrees of Freedom (Df): 26.99935 > > Penalized Criterion: 3.027077e-06 > > GCV: 3.190666e-09 > >> str(ss) > > List of 15 > > $ x : num [1:1002] 0 0.0126 0.0251 0.0377 0.0502 ... > > $ y : num [1:1002] 2.88e-05 1.26e-02 2.51e-02 3.77e-02 5.02e-02 ... > > $ w : num [1:1002] 1 1 1 1 1 1 1 1 1 1 ... > > $ yin : num [1:1002] 0 0.0126 0.0251 0.0377 0.0502 ... > > $ data:List of 3 > > ..$ x: num [1:1002] 0 0.0126 0.0251 0.0377 0.0502 ... > > ..$ y: num [1:1002] 0 0.0126 0.0251 0.0377 0.0502 ... > > ..$ w: num [1:1002] 1 1 1 1 1 1 1 1 1 1 ... > > $ lev : num [1:1002] 0.2238 0.177 0.1399 0. 0.0891 ... > > $ cv.crit : num 3.19e-09 > > $ pen.crit: num 3.03e-06 > > $ crit: num 3.19e-09 > > $ df : num 27 > > $ spar: num -0.457 > > $ lambda : num 1.01e-09 > > $ iparms : Named int [1:3] 1 0 14 > > ..- attr(*, "names")= chr [1:3] "icrit" "ispar" "iter" > > $ fit :List of 5 > > ..$ knot : num [1:31] 0 0 0 0 0.041 ... > > ..$ nk : num 27 > > ..$ min : num 0 > > ..$ range: num 12.6 > > ..$ coef : num [1:27] 2.88e-05 1.72e-01 5.19e-01 9.04e-01 1.05 ... > > ..- attr(*, "class")= chr "smooth.spline.fit" > > $ call: language smooth.spline(x = x, y = y, all.knots = F, nknots = > > 25) > > - attr(*, "class")= chr "smooth.spline" > >> > > > > Many thanks! > > > > > > Regards, > > > > Mike Nielsen > > > > [[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > David Winsemius > Alameda, CA, USA > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] solving x in a polynomial function
Doh- I'm a moron. I get it now. The last line is the confirmation that function "realroots" is working. Sorry- late in the day on a friday. Thanks everyone for your help with this- Uber-useful, and much appreciated. Mike On 3/1/13, Mike Rennie wrote: > Hi Peter, > > With the edit you suggested, now I'm just getting back the value of a > that I put in, not the expected value of b... > >> cbind(a,b) >a b > [1,] 1 1.0 > [2,] 2 2.0 > [3,] 3 2.5 > [4,] 4 3.0 > [5,] 5 3.5 > [6,] 6 4.0 > [7,] 7 6.0 > [8,] 8 7.0 > [9,] 9 7.5 > [10,] 10 8.0 > > #a of 5 should return b of 3.5 > >> realroots <- function(model, b){ > + is.zero <- function(x, tol = .Machine$double.eps^0.5) abs(x) < > tol > + if(names(coef(model))[1] == "(Intercept)") > + > + r <- polyroot(c(coef(model)[1] - b, coef(model)[-1])) > + else > + r <- polyroot(c(-b, coef(model))) > + Re(r[is.zero(Im(r))]) > + } >> >> r <- realroots(po.lm, 5) >> predict(po.lm, newdata = data.frame(b = r)) > 1 2 > 5 5 > > > This function just returns what I feed it as written. > > Mike > > On 3/1/13, Peter Ehlers wrote: >> On 2013-03-01 13:06, Mike Rennie wrote: >>> Hi guys, >>> >>> Perhaps on the right track? However, not sure if it's correct. I fixed >>> the bug that A.K. indicated above (== vs =), but the values don't seem >>> right. From the original data, an a of 3 should give b of 2.5. >>> >>>> realroots <- function(model, b){ >>> + is.zero <- function(x, tol = .Machine$double.eps^0.5) abs(x) < >>> tol >>> + if(names(model)[1] == "(Intercept)") >>> + r <- polyroot(c(coef(model)[1] - b, coef(model)[-1])) >>> + else >>> + r <- polyroot(c(-b, coef(model))) >>> + Re(r[is.zero(Im(r))]) >>> + } >>>> >>>> r <- realroots(po.lm, 3) >>>> predict(po.lm, newdata = data.frame(b = r)) # confirm >>> 1 >>> 1.69 >>> >>> So I think there's a calculation error somehwere. >> >> You need to replace the following line >> >>if(names(model)[1] == "(Intercept)") >> >> with >> >>if(names(coef(model))[1] == "(Intercept)") >> >> >> Peter Ehlers >> >> >>> >>> >>> On 3/1/13, arun wrote: >>>> Hi Rui, >>>> >>>> Looks like a bug: >>>> ###if(names(model)[1] = "(Intercept)") >>>> Should this be: >>>> >>>> if(names(model)[1] == "(Intercept)") >>>> >>>> A.K. >>>> >>>> >>>> >>>> - Original Message - >>>> From: Rui Barradas >>>> To: Mike Rennie >>>> Cc: r-help Mailing List >>>> Sent: Friday, March 1, 2013 3:18 PM >>>> Subject: Re: [R] solving x in a polynomial function >>>> >>>> Hello, >>>> >>>> Try the following. >>>> >>>> >>>> a <- 1:10 >>>> b <- c(1, 2, 2.5, 3, 3.5, 4, 6, 7, 7.5, 8) >>>> >>>> dat <- data.frame(a = a, b = b) # for lm(), it's better to use a df >>>> po.lm <- lm(a~b+I(b^2)+I(b^3)+I(b^4), data = dat); summary(po.lm) >>>> >>>> >>>> realroots <- function(model, b){ >>>> is.zero <- function(x, tol = .Machine$double.eps^0.5) abs(x) < tol >>>> if(names(model)[1] = "(Intercept)") >>>> r <- polyroot(c(coef(model)[1] - b, coef(model)[-1])) >>>> else >>>> r <- polyroot(c(-b, coef(model))) >>>> Re(r[is.zero(Im(r))]) >>>> } >>>> >>>> r <- realroots(po.lm, 5.5) >>>> predict(po.lm, newdata = data.frame(b = r)) # confirm >>>> >>>> >>>> >>>> Hope this helps, >>>> >>>> Rui Barradas >>>> >>>> Em 01-03-2013 18:47, Mike Rennie escreveu: >>>>> Hi there, >>>>> >>>>> Does anyone know how I solve for x from a given y in a polynomial >>>>> function? Here's some example code: >>>>> >>>>> ##example file >>>>> >>>>>
Re: [R] solving x in a polynomial function
Hi Peter, With the edit you suggested, now I'm just getting back the value of a that I put in, not the expected value of b... > cbind(a,b) a b [1,] 1 1.0 [2,] 2 2.0 [3,] 3 2.5 [4,] 4 3.0 [5,] 5 3.5 [6,] 6 4.0 [7,] 7 6.0 [8,] 8 7.0 [9,] 9 7.5 [10,] 10 8.0 #a of 5 should return b of 3.5 > realroots <- function(model, b){ + is.zero <- function(x, tol = .Machine$double.eps^0.5) abs(x) < tol + if(names(coef(model))[1] == "(Intercept)") + + r <- polyroot(c(coef(model)[1] - b, coef(model)[-1])) + else + r <- polyroot(c(-b, coef(model))) + Re(r[is.zero(Im(r))]) + } > > r <- realroots(po.lm, 5) > predict(po.lm, newdata = data.frame(b = r)) 1 2 5 5 This function just returns what I feed it as written. Mike On 3/1/13, Peter Ehlers wrote: > On 2013-03-01 13:06, Mike Rennie wrote: >> Hi guys, >> >> Perhaps on the right track? However, not sure if it's correct. I fixed >> the bug that A.K. indicated above (== vs =), but the values don't seem >> right. From the original data, an a of 3 should give b of 2.5. >> >>> realroots <- function(model, b){ >> + is.zero <- function(x, tol = .Machine$double.eps^0.5) abs(x) < >> tol >> + if(names(model)[1] == "(Intercept)") >> + r <- polyroot(c(coef(model)[1] - b, coef(model)[-1])) >> + else >> + r <- polyroot(c(-b, coef(model))) >> + Re(r[is.zero(Im(r))]) >> + } >>> >>> r <- realroots(po.lm, 3) >>> predict(po.lm, newdata = data.frame(b = r)) # confirm >> 1 >> 1.69 >> >> So I think there's a calculation error somehwere. > > You need to replace the following line > >if(names(model)[1] == "(Intercept)") > > with > >if(names(coef(model))[1] == "(Intercept)") > > > Peter Ehlers > > >> >> >> On 3/1/13, arun wrote: >>> Hi Rui, >>> >>> Looks like a bug: >>> ###if(names(model)[1] = "(Intercept)") >>> Should this be: >>> >>> if(names(model)[1] == "(Intercept)") >>> >>> A.K. >>> >>> >>> >>> - Original Message - >>> From: Rui Barradas >>> To: Mike Rennie >>> Cc: r-help Mailing List >>> Sent: Friday, March 1, 2013 3:18 PM >>> Subject: Re: [R] solving x in a polynomial function >>> >>> Hello, >>> >>> Try the following. >>> >>> >>> a <- 1:10 >>> b <- c(1, 2, 2.5, 3, 3.5, 4, 6, 7, 7.5, 8) >>> >>> dat <- data.frame(a = a, b = b) # for lm(), it's better to use a df >>> po.lm <- lm(a~b+I(b^2)+I(b^3)+I(b^4), data = dat); summary(po.lm) >>> >>> >>> realroots <- function(model, b){ >>> is.zero <- function(x, tol = .Machine$double.eps^0.5) abs(x) < tol >>> if(names(model)[1] = "(Intercept)") >>> r <- polyroot(c(coef(model)[1] - b, coef(model)[-1])) >>> else >>> r <- polyroot(c(-b, coef(model))) >>> Re(r[is.zero(Im(r))]) >>> } >>> >>> r <- realroots(po.lm, 5.5) >>> predict(po.lm, newdata = data.frame(b = r)) # confirm >>> >>> >>> >>> Hope this helps, >>> >>> Rui Barradas >>> >>> Em 01-03-2013 18:47, Mike Rennie escreveu: >>>> Hi there, >>>> >>>> Does anyone know how I solve for x from a given y in a polynomial >>>> function? Here's some example code: >>>> >>>> ##example file >>>> >>>> a<-1:10 >>>> >>>> b<-c(1,2,2.5,3,3.5,4,6,7,7.5,8) >>>> >>>> po.lm<-lm(a~b+I(b^2)+I(b^3)+I(b^4)); summary(po.lm) >>>> >>>> (please ignore that the model is severely overfit- that's not the >>>> point). >>>> >>>> Let's say I want to solve for the value b where a = 5.5. >>>> >>>> Any thoughts? I did come across the polynom package, but I don't think >>>> that does it- I suspect the answer is simpler than I am making it out >>>> to be. Any help would be welcome. >>>> >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> > > -- Michael Rennie, Research Scientist Fisheries and Oceans Canada, Freshwater Institute Winnipeg, Manitoba, CANADA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] solving x in a polynomial function
Hi guys, Perhaps on the right track? However, not sure if it's correct. I fixed the bug that A.K. indicated above (== vs =), but the values don't seem right. From the original data, an a of 3 should give b of 2.5. > realroots <- function(model, b){ + is.zero <- function(x, tol = .Machine$double.eps^0.5) abs(x) < tol + if(names(model)[1] == "(Intercept)") + r <- polyroot(c(coef(model)[1] - b, coef(model)[-1])) + else + r <- polyroot(c(-b, coef(model))) + Re(r[is.zero(Im(r))]) + } > > r <- realroots(po.lm, 3) > predict(po.lm, newdata = data.frame(b = r)) # confirm 1 1.69 So I think there's a calculation error somehwere. On 3/1/13, arun wrote: > Hi Rui, > > Looks like a bug: > ###if(names(model)[1] = "(Intercept)") > Should this be: > > if(names(model)[1] == "(Intercept)") > > A.K. > > > > - Original Message - > From: Rui Barradas > To: Mike Rennie > Cc: r-help Mailing List > Sent: Friday, March 1, 2013 3:18 PM > Subject: Re: [R] solving x in a polynomial function > > Hello, > > Try the following. > > > a <- 1:10 > b <- c(1, 2, 2.5, 3, 3.5, 4, 6, 7, 7.5, 8) > > dat <- data.frame(a = a, b = b) # for lm(), it's better to use a df > po.lm <- lm(a~b+I(b^2)+I(b^3)+I(b^4), data = dat); summary(po.lm) > > > realroots <- function(model, b){ > is.zero <- function(x, tol = .Machine$double.eps^0.5) abs(x) < tol > if(names(model)[1] = "(Intercept)") > r <- polyroot(c(coef(model)[1] - b, coef(model)[-1])) > else > r <- polyroot(c(-b, coef(model))) > Re(r[is.zero(Im(r))]) > } > > r <- realroots(po.lm, 5.5) > predict(po.lm, newdata = data.frame(b = r)) # confirm > > > > Hope this helps, > > Rui Barradas > > Em 01-03-2013 18:47, Mike Rennie escreveu: >> Hi there, >> >> Does anyone know how I solve for x from a given y in a polynomial >> function? Here's some example code: >> >> ##example file >> >> a<-1:10 >> >> b<-c(1,2,2.5,3,3.5,4,6,7,7.5,8) >> >> po.lm<-lm(a~b+I(b^2)+I(b^3)+I(b^4)); summary(po.lm) >> >> (please ignore that the model is severely overfit- that's not the point). >> >> Let's say I want to solve for the value b where a = 5.5. >> >> Any thoughts? I did come across the polynom package, but I don't think >> that does it- I suspect the answer is simpler than I am making it out >> to be. Any help would be welcome. >> > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Michael Rennie, Research Scientist Fisheries and Oceans Canada, Freshwater Institute Winnipeg, Manitoba, CANADA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] solving x in a polynomial function
Hi there, Does anyone know how I solve for x from a given y in a polynomial function? Here's some example code: ##example file a<-1:10 b<-c(1,2,2.5,3,3.5,4,6,7,7.5,8) po.lm<-lm(a~b+I(b^2)+I(b^3)+I(b^4)); summary(po.lm) (please ignore that the model is severely overfit- that's not the point). Let's say I want to solve for the value b where a = 5.5. Any thoughts? I did come across the polynom package, but I don't think that does it- I suspect the answer is simpler than I am making it out to be. Any help would be welcome. -- Michael Rennie, Research Scientist Fisheries and Oceans Canada, Freshwater Institute Winnipeg, Manitoba, CANADA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Finding the knots in a smoothing spline using nknots
Hi r-helpers. Please forgive my ignorance, but I would like to plot a smoothing spline (smooth.spline) from package "stats", and show the knots in the plot, and I can't seem to figure out where smooth.spline has located the knots (when I use nknots). Unfortunately, I don't know a lot about splines, but I know that they provide me an easy way to estimate the location of local maxima and minima on varying time-scales (number of knots) in my original data. I see there is a fit$knot, but it's not clear to me what those values are: for some reason I had expected that they would be contained in my original y values, but they're not. I tried generating nknots equally spaced points in my x, but when I plotted the points that corresponded to my original y values at those equally-spaced x values, I found that the spline did not pass through them, which, perhaps naively, I thought it might. Also, the manual says that yin comprises "the y values used at the unique y values" -- should this read "at the unique x values"? Could someone kindly point to a resource where I can get a slightly fuller explanation? I looked at the code for smooth.spline, but can't readily follow it. Here's a toy example: > x<-seq(from=0,to=4*pi,length=1002) > y<-sin(x) > ss<-smooth.spline(x,y=y,all.knots=F,nknots=25) > ss Call: smooth.spline(x = x, y = y, all.knots = F, nknots = 25) Smoothing Parameter spar= -0.4573636 lambda= 1.006117e-09 (14 iterations) Equivalent Degrees of Freedom (Df): 26.99935 Penalized Criterion: 3.027077e-06 GCV: 3.190666e-09 > str(ss) List of 15 $ x : num [1:1002] 0 0.0126 0.0251 0.0377 0.0502 ... $ y : num [1:1002] 2.88e-05 1.26e-02 2.51e-02 3.77e-02 5.02e-02 ... $ w : num [1:1002] 1 1 1 1 1 1 1 1 1 1 ... $ yin : num [1:1002] 0 0.0126 0.0251 0.0377 0.0502 ... $ data:List of 3 ..$ x: num [1:1002] 0 0.0126 0.0251 0.0377 0.0502 ... ..$ y: num [1:1002] 0 0.0126 0.0251 0.0377 0.0502 ... ..$ w: num [1:1002] 1 1 1 1 1 1 1 1 1 1 ... $ lev : num [1:1002] 0.2238 0.177 0.1399 0. 0.0891 ... $ cv.crit : num 3.19e-09 $ pen.crit: num 3.03e-06 $ crit: num 3.19e-09 $ df : num 27 $ spar: num -0.457 $ lambda : num 1.01e-09 $ iparms : Named int [1:3] 1 0 14 ..- attr(*, "names")= chr [1:3] "icrit" "ispar" "iter" $ fit :List of 5 ..$ knot : num [1:31] 0 0 0 0 0.041 ... ..$ nk : num 27 ..$ min : num 0 ..$ range: num 12.6 ..$ coef : num [1:27] 2.88e-05 1.72e-01 5.19e-01 9.04e-01 1.05 ... ..- attr(*, "class")= chr "smooth.spline.fit" $ call: language smooth.spline(x = x, y = y, all.knots = F, nknots = 25) - attr(*, "class")= chr "smooth.spline" > Many thanks! Regards, Mike Nielsen [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simple - Finding vector in a vector
Great, works perfekt now! Thanks for your fast help! Nico 2012/10/8 David L Carlson : > Building on Jessica and Steve's use of embed: > >> X <- c(NA, 1, NA, 1, 1, 1, 1, 1, 1, NA, 1) >> Z <- rowSums(embed(X, 3)) >> Z[is.na(Z)] <- 1 >> Z > [1] 1 1 1 3 3 3 3 1 1 > > -- > David L Carlson > Associate Professor of Anthropology > Texas A&M University > College Station, TX 77843-4352 > >> -Original Message- >> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- >> project.org] On Behalf Of Jessica Streicher >> Sent: Monday, October 08, 2012 9:19 AM >> To: Mike Spam >> Cc: r-help@r-project.org >> Subject: Re: [R] Simple - Finding vector in a vector >> >> > x<-c(NA , 1 ,NA, 1 , 1 , 1 , 1 , 1 ,1 ,NA , 1) >> > embed(x,3) >> [,1] [,2] [,3] >> [1,] NA1 NA >> [2,]1 NA1 >> [3,]11 NA >> [4,]111 >> [5,]111 >> [6,]111 >> [7,]111 >> [8,] NA11 >> [9,]1 NA1 >> >> > which(rowSums(embed(x,3))==3) >> [1] 4 5 6 7 >> >> gives you the first of the 3 >> >> On 08.10.2012, at 15:38, Mike Spam wrote: >> >> > Sorry, i just realized, that it output the sum of all vectors. I can >> > work with this function but it would be much faster and easier if it >> > would be possible to get the positions of evry match. >> > >> > example: >> > >> > NA 1 NA 1 1 1 1 1 1 NA 1 >> > >> > rle returns >> > lengths: int [1:6] 1 1 1 6 1 1 >> > >> > what i need would be something like, >> > 1 1 1 3 3 3 3 1 1 >> > >> > but anyway i can work with rle, if there is no suitable function. >> > >> > thanks, >> > Nico >> > >> > >> > >> > 2012/10/8 Mike Spam : >> >> Hey Rui, >> >> >> >> Perfect! Thanks!! :) >> >> >> >> Nico >> >> >> >> 2012/10/8 Rui Barradas : >> >>> Hello, >> >>> >> >>> See ?rle >> >>> >> >>> Hope this helps, >> >>> >> >>> Rui Barradas >> >>> Em 08-10-2012 13:55, Mike Spam escreveu: >> >>>> >> >>>> Hi, >> >>>> >> >>>> just a simple question. >> >>>> Assumed i have a vector, >> >>>> >> >>>> FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE >> >>>> or >> >>>> NA 1 1 1 NA 1 NA 1 NA >> >>>> >> >>>> what i need is the position where an element is the same - three >> (or >> >>>> in general multiple) times in a row. >> >>>> >> >>>> in this case: i want to get the position where it is TRUE TRUE >> TRUE or 1 1 >> >>>> 1 >> >>>> it doesn't matter if it is the first, middle or last element. So >> the >> >>>> output could be 2 or 3 or 4 >> >>>> >> >>>> My idea would be to lag the vector and calc differences... but i >> would >> >>>> prefer any build in (or time saving) function. :) >> >>>> >> >>>> thanks, >> >>>> >> >>>> Nico >> >>>> >> >>>> __ >> >>>> R-help@r-project.org mailing list >> >>>> https://stat.ethz.ch/mailman/listinfo/r-help >> >>>> PLEASE do read the posting guide >> >>>> http://www.R-project.org/posting-guide.html >> >>>> and provide commented, minimal, self-contained, reproducible code. >> >>> >> >>> >> > >> > __ >> > R-help@r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide http://www.R-project.org/posting- >> guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting- >> guide.html >> and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simple - Finding vector in a vector
Sorry, i just realized, that it output the sum of all vectors. I can work with this function but it would be much faster and easier if it would be possible to get the positions of evry match. example: NA 1 NA 1 1 1 1 1 1 NA 1 rle returns lengths: int [1:6] 1 1 1 6 1 1 what i need would be something like, 1 1 1 3 3 3 3 1 1 but anyway i can work with rle, if there is no suitable function. thanks, Nico 2012/10/8 Mike Spam : > Hey Rui, > > Perfect! Thanks!! :) > > Nico > > 2012/10/8 Rui Barradas : >> Hello, >> >> See ?rle >> >> Hope this helps, >> >> Rui Barradas >> Em 08-10-2012 13:55, Mike Spam escreveu: >>> >>> Hi, >>> >>> just a simple question. >>> Assumed i have a vector, >>> >>> FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE >>> or >>> NA 1 1 1 NA 1 NA 1 NA >>> >>> what i need is the position where an element is the same - three (or >>> in general multiple) times in a row. >>> >>> in this case: i want to get the position where it is TRUE TRUE TRUE or 1 1 >>> 1 >>> it doesn't matter if it is the first, middle or last element. So the >>> output could be 2 or 3 or 4 >>> >>> My idea would be to lag the vector and calc differences... but i would >>> prefer any build in (or time saving) function. :) >>> >>> thanks, >>> >>> Nico >>> >>> __ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simple - Finding vector in a vector
Hey Rui, Perfect! Thanks!! :) Nico 2012/10/8 Rui Barradas : > Hello, > > See ?rle > > Hope this helps, > > Rui Barradas > Em 08-10-2012 13:55, Mike Spam escreveu: >> >> Hi, >> >> just a simple question. >> Assumed i have a vector, >> >> FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE >> or >> NA 1 1 1 NA 1 NA 1 NA >> >> what i need is the position where an element is the same - three (or >> in general multiple) times in a row. >> >> in this case: i want to get the position where it is TRUE TRUE TRUE or 1 1 >> 1 >> it doesn't matter if it is the first, middle or last element. So the >> output could be 2 or 3 or 4 >> >> My idea would be to lag the vector and calc differences... but i would >> prefer any build in (or time saving) function. :) >> >> thanks, >> >> Nico >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Simple - Finding vector in a vector
Hi, just a simple question. Assumed i have a vector, FALSE TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE or NA 1 1 1 NA 1 NA 1 NA what i need is the position where an element is the same - three (or in general multiple) times in a row. in this case: i want to get the position where it is TRUE TRUE TRUE or 1 1 1 it doesn't matter if it is the first, middle or last element. So the output could be 2 or 3 or 4 My idea would be to lag the vector and calc differences... but i would prefer any build in (or time saving) function. :) thanks, Nico __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R process must die - can I save history? [SOLVED]
This is a solution for UNIX/Linux-type OS users and a lot of it is only related to R in the sense that it can allow you to reconnect to an R session. I managed to do it and to save history, thanks to helpful messages from Ista Zahn and Brian Ripley. To explain this, I will call my two Linux boxes Desktop and Server. I logged into Desktop, an Ubuntu box, and ran this command: ps aux | grep ssh That showed me these ssh processes and a few extraneous things: USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND mbmiller 3520 0.0 0.0 46944 4668 pts/5S+ Jul25 0:23 ssh -X Server mbmiller 4602 0.0 0.0 47720 5544 pts/9S+ Aug09 1:07 ssh -X Server mbmiller 25614 0.0 0.0 45584 3344 pts/11 S+ Sep24 0:00 ssh -X Server I launched an xterm from which I would try to recapture those ssh sessions. (Spoiler: This worked for every ssh process.) I was missing the reptyr binary, so I installed it like so: sudo apt-get install reptyr The reptyr man page told me that I had to do this to allow reptyr to work (I could change the 0 back to 1 after finishing): $ sudo su root# echo 0 > /proc/sys/kernel/yama/ptrace_scope root# exit After that I just ran reptyr for each process, for example: reptyr 3520 A few lines of text would appear (the last saying "Set the controlling tty"), I'd hit "enter" and it would give my my command prompt from my old shell, or the R prompt if R was running in that shell. I was then able to save command histories, etc. When I tried to exit from R using q("yes") it accepted the command, but it did not return my bash prompt. To deal with that, I tried Brian Ripley's recommendation: I logged into Server via ssh and ran this command: ps aux | grep R Which returned this along with some irrelevant stuff: USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND mbmiller 5156 0.0 0.1 213188 9244 pts/1S+ Aug06 1:00 /share/apps/R-2.15.1/lib64/R/bin/exec/R -q I tried the kill command... kill -USR1 5156 ...and that returned my bash prompt immediately in the other xterm. R was done, the .Rhistory looked perfect, and .RData also was there. I logged into Server, went to the appropriate directory, ran R and found that all of the objects were there and working correctly. So that was amazing. I could reattach to the R session and also kill it without losing history and data. This is a big deal for me because I get stuck like that about once a year and it's always a huge pain. Mike On Tue, 2 Oct 2012, Ista Zahn wrote (off-list): If you can find the process ID you can try connecting the process to another terminal with reptyr (https://github.com/nelhage/reptyr), and then just use savehistory() as usual. You don't say what flavor of Linux you're dealing with, but it looks like there are packaged versions for at least Ubuntu, Fedora, and Arch. On Tue, 2 Oct 2012, Prof Brian Ripley wrote: Maybe not. On a Unix-alike see ?Signals. If you can find the pid of the R process and it is still running (and not e.g. suspended), kill -USR1 will save the workspace and history. Original query: On Tue, 2 Oct 2012, Mike Miller wrote: I connected from my desktop Linux box to a Linux server using ssh in an xterm, but that xterm was running in Xvnc. I'm running R on the server in that xterm (over ssh). Something went wrong with Xvnc that has caused it to hang, probably this bug: https://bugs.launchpad.net/ubuntu/+source/vnc4/+bug/819473 So I can't get back to that ssh session or to R. I had done a bunch of work in R but the command history hasn't been written out. If I kill R, I assume the command history is gone. I wish I could somehow cause R to dump the command history. Is there any way to tell the running R process to write the history somewhere? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R process must die - can I save history?
I connected from my desktop Linux box to a Linux server using ssh in an xterm, but that xterm was running in Xvnc. I'm running R on the server in that xterm (over ssh). Something went wrong with Xvnc that has caused it to hang, probably this bug: https://bugs.launchpad.net/ubuntu/+source/vnc4/+bug/819473 So I can't get back to that ssh session or to R. I had done a bunch of work in R but the command history hasn't been written out. If I kill R, I assume the command history is gone. I wish I could somehow cause R to dump the command history. Is there any way to tell the running R process to write the history somewhere? Thanks in advance. Mike -- Michael B. Miller, Ph.D. Minnesota Center for Twin and Family Research Department of Psychology University of Minnesota __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] effective way to return only the first argument of "which()"
Thank you very much, especially Milan and Bert! I will do some speedtests and fit the function to my needs. I think the best way would be a modified function in C... But i am not familiar enough with C. Perhaps this would be a simple but useful extension. If someone has a solution, i would appreciate a post in this mailing list. Cheers and thanks to all, Nico 2012/9/19 Bert Gunter : > Well, following up on this observation, which can be put under the > heading of "Sometimes vectorization can be much slower than explicit > loops" , I offer the following: > > firsti <-function(x,k) > { > i <- 1 > while(x[i]<=k){i <- i+1} > i > } > >> system.time(for(i in 1:100)which(x>.99)[1]) >user system elapsed >19.1 2.422.2 > >> system.time(for(i in 1:100)which.max(x>.99)) >user system elapsed > 30.456.75 37.46 > >> system.time(for(i in 1:100)firsti(x,.99)) >user system elapsed >0.030.000.03 > > ## About a 500 - 1000 fold speedup ! > >> firsti(x,.99) > [1] 122 > > It doesn't seem to scale too badly, either (whatever THAT means!): > (Of course, the which() versions are essentially identical in timing, > and so are omitted) > >> system.time(for(i in 1:100)firsti(x,.)) >user system elapsed >2.700.002.72 > >> firsti(x,.) > [1] 18200 > > Of course, at some point, the explicit looping is awful -- with k = > .99, the index was about 36, and the timing test took 54 > seconds. > > So I guess the point is -- as always -- that the optimal approach > depends on the nature of the data. Prudence and robustness clearly > demands the vectorized which() approaches if you have no information. > But if you do know something about the data, then you can often write > much faster tailored solutions. Which is hardly revelatory, of course. > > Cheers to all, > Bert > > On Wed, Sep 19, 2012 at 8:55 AM, Milan Bouchet-Valat > wrote: >> Le mercredi 19 septembre 2012 à 15:23 +, William Dunlap a écrit : >>> The original method is faster than which.max for longish numeric vectors >>> (in R-2.15.1), but you should check time and memory usage on your >>> own machine: >>> >>> > x <- runif(18e6) >>> > system.time(for(i in 1:100)which(x>0.99)[1]) >>>user system elapsed >>> 11.641.05 12.70 >>> > system.time(for(i in 1:100)which.max(x>0.99)) >>>user system elapsed >>> 16.382.94 19.35 >> If you the probability that such an element appears at the beginning of >> the vector, a custom hack might well be more efficient. The problem with >> ">", which() and which.max() is that they will go over all the elements >> of the vector even if it's not needed at all. So you can start with a >> small subset of the vector, and increase its size in a few steps until >> you find the value you're looking for. >> >> Proof of concept (the values of n obviously need to be adapted): >> x <-runif(1e7) >> >> find <- function(x, lim) { >> len <- length(x) >> >> for(n in 2^(14:0)) { >> val <- which(x[seq.int(1L, len/n)] > lim) >> >> if(length(val) > 0) return(val[1]) >> } >> >> return(NULL) >> } >> >>> system.time(for(i in 1:100)which(x>0.999)[1]) >> utilisateur système écoulé >> 9.740 5.795 15.890 >>> system.time(for(i in 1:100)which.max(x>0.999)) >> utilisateur système écoulé >> 14.288 9.510 24.562 >>> system.time(for(i in 1:100)find(x, .999)) >> utilisateur système écoulé >> 0.017 0.002 0.019 >>> find(x, .999) >> [1] 1376 >> >> (Looks almost like cheating... ;-) >> >> >> >> >> >>> Bill Dunlap >>> Spotfire, TIBCO Software >>> wdunlap tibco.com >>> >>> >>> > -Original Message- >>> > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] >>> > On Behalf >>> > Of Jeff Newmiller >>> > Sent: Wednesday, September 19, 2012 8:06 AM >>> > To: Mike Spam; r-help@r-project.org >>> > Subject: Re: [R] effective way to return only the first argument of >>> > "which()" >>> > >>> > ?which.max >>> > --- >>> > Jeff NewmillerThe
Re: [R] effective way to return only the first argument of "which()"
Hi, Thanks Michael, but i think this is even slower. x <-sample(2000) which(x < 5)[1] which.max(x < 5) system.time(for(i in 1:100) which.max(x < 5)) User System verstrichen 60.84 13.70 86.33 system.time(for(i in 1:100) which(x < 5)[1]) User System verstrichen 40.458.25 48.95 Thanks, Nico __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] effective way to return only the first argument of "which()"
Hi, I was looking for a function like "which()" but only returns the first argument. Compare: x <- c(1,2,3,4,5,6) y <- 4 which(x>y) returns: 5,6 which(x>y)[1] returns: 5 which(x>y)[1] is exactly what i need. I did use this but the dataset is too big (~18 mio. Points). That's why i need a more effective way to get the first element of a vector which is bigger/smaller than a specific number. I found "match()" but this function only works for equal numbers. Thanks, Nico __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Compiling R2.15.1 on ubuntu with x86-64 architecture and shared library
I am sure I am providing insufficient information, please ask for more. I installed R 2.14.2 on my Ubuntu laptop with and AMD64 processor and also installed RStudio and everything worked fine. Now, I tried to build R 2.15.1 from source and installed it using defaults. RStudio now complained that R was not built as a shared library. Went back and uninstalled, and configured with -enable-R-shlib Now make fails at: /usr/bin/ld: CConverters.o: relocation R_X86_64_325 against '.rodata' can not be used when making a shared object; recompile with -fPIC Being new to Linux, I have no idea how to recompile with -fPIC Any help would be appreciated. Mike W. Michael Conklin Executive Vice President | GfK Marketing Science | Consumer Experiences North America GfK Custom Research, LLC | 8401 Golden Valley Road | Minneapolis, MN, 55427 T +1 763 417 4545 | M +1 612 567 8287 www.gfk.com<http://www.gfk.com/> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] split plot experiment with a repeated measure
Hello, I would like advice on if and how I might viably analyse this dataset: The experiment has 3 blocks with a three level main plot treatment, split to a two level treatment. In this case most parameters are also measured repeatedly at 1-3 monthly intervals. Ideally I would like to test the full response~treatment_1*treatment_2*sample_time model, but don't know what the structure should be, or even whether this is a viable analysis. I should probably add that neither the main or sub plots are randomised. Thanks Mike Palmer __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Setwd to a directory that requires authentication
I work at a company where we log on to windows using a username and password. There is a global server with files that I need to use R to do some analysis on. That server requires my windows credentials to logon. When I access the server from internet explorer it automatically uses my windows credentials to logon. When I use R it doesn't work. I have tried: setInternet2(TRUE) con = url(description="http://username:passw...@server.net";, open="r") open(con) which gave me a timeout error. I have also tried: setwd("server.net") which gives me the error: Error in setwd("server.net") : cannot change working directory [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Pseudo R squared in gls model
On Thu, 23 Aug 2012, Gary Dong wrote: I'm wondering if the gls function reports pseudo R. I do not see it by summary(). If the package does not report, can I calculate it in this way? Adjusted pseudo R squared = 1 - [(Loglik(beta) - k ) / Loglik(null)] where k is the number of IVs. We've been using this R² instead, but I'm not sure if it is available in any package: Buse, A. (1973), "Goodness of Fit in Generalized Least Squares Estimation," American Statistician, 27, 106-108. http://www.jstor.org/stable/10.2307/2683631 Equation 15. Make sure to compute y-bar correctly (it is the regression coefficient in the intercept-only model). To do GLS, we have been using OLS after transformation (multiplying by the "T" matrix in Buse's notation) where T'T = inv(V). So the equation on the final page works for us. Let me know if you can't access that PDF and I'll send a copy. Mike -- Michael B. Miller, Ph.D. Minnesota Center for Twin and Family Research Department of Psychology University of Minnesota__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Question on meaning of '%+%', '%?%' (? = various single letter) in code
Hello. I’ve seen several uses of '%?%' in R code (with ? = a single letter or other characters used as the middle character in a 3 character string with '%' as the 1st and 3rd characters). I’ve also recently seen ‘%+%’ usage at: http://vita.had.co.nz/papers/ggplot2-wires.pdf on p. 7 of the document the following code appears: (Note the '%+%' string at end of 1st line.) ggplot(threept, aes(x, y)) %+% lineup(permute("result"), n = 8) + geom_point(aes(colour = result, group = 1), alpha = 1/4) + scale_colour_manual(values = c("made" = mnsl("10B 6/6"), "missed" = mnsl("10R 6/6"))) + xlab(NULL) + ylab(NULL) + coord_equal() + facet_wrap(˜ .sample, nrow = 2) I've looked long and hard on internet and on the CRAN and can't find a reference to what '%+%' or '%?%' or ‘%?%’ (? = any letter) does in an R function/script. I've also asked several friends who have significant R experience and all of us are stumped by what '%+%', or more generally '%?%' does in an R function/script. x On p. 12 of: R Language Definitions appears: ‘%x% Special binary operators, x can be replaced by any valid name’ and ‘R deals with entire vectors of data at a time, and most of the elementary operators and basic mathematical functions like log are vectorized (as indicated in the table above). This means that e.g. adding two vectors of the same length will create a vector containing the element-wise sums, implicitly looping over the vector index. This applies also to other operators like -, *, and / as well as to higher dimensional structures. Notice in particular that multiplying two matrices does not produce the usual matrix product (the %*% operator exists for that purpose). Some finer points relating to vectorized operations will be discussed in Section 3.3 [Elementary arithmetic operations], page 16.’ Page 16 doesn’t specifically address ‘%?%’, at least in a way I can understand. xxx I tried an experiment with %+%, and it failed to work! I loaded xts package (must have xts and zoo installed b/c xts has a dependency on zoo.) Entered the following: > library(xts) > data(sample_matrix) > x1 <- sample_matrix > x1 > x3 <- x1 %/% x1 > x3 > x4 <- x1 %+% x1 MH (Mike Hilt comment): After entering ‘x3’, the data was listed with all values = 1, which is what should happen when binary division is done – CHECK! MH: After entering ‘x4 <- x1 %+% x1 R returns: ‘Error: could not find function "%+%"’ MH: So %+% is not a Special binary operator as %x% is savdescribed on p. 12 of the CRAN doc: R Language Definitions. Could someone help me out and let me know what ‘%?%’ (where ? = a single letter in a 3 character string with ‘%’ being the 1st and 3rd characters), and/or ‘%+%’ does in R code/function? Thank you, Mike Hilt mikehil...@gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Multiple rms summary plots in a single device
I would like to incorporate multiple summary plots from the rms package into a single device and to control the titles, and also to open a new device when I reach a specified number of plots. Currently I am only getting a single "plot(summary(" graph in the upper left- hand corner of each successive device. However, in the rms documention I see instances of a loop being used with "par(mfrow(" for multiple plots in a single device(e.g. residuals.lrm), and these examples work on my system. Please advise regarding options that must be specified to "plot(summary(", or in the construction of my loop. Below are sample code and my sessionInfo(). Please note that I am using data.table to facilitate my "real analysis", but I can replicate the issue with tData as a data.frame (using seg <- subset(tData, groups == segment) logic), but I included the data.table logic in case it may be having some influence. Thank you! Mike tData <- data.frame(groups=as.factor(1:8), low=as.factor(1:4) ,high=as.factor(seq(100, 400, 100)), rand=runif(400)) tData <- data.table(tData) setkeyv(tData, 'groups') dd <- datadist(tData) options(datadist = 'dd') doSumPlot <- function(segment){ seg <<- tData[groups == segment,] plot(summary(rand ~ + low + high ,data = seg ), main=paste('Group:', segment)) } for(i in 1:length(levels(tData$groups))){ cat('Group: ', i, '\n') if(i == 1 ){ dev.new() par(mfrow=c(2,2)) } if(i/5 == round(i/5, 0)){ dev.new() par(mfrow=c(2,2)) } # dev.new() doSumPlot(levels(tData$groups)[i]) } > sessionInfo() R version 2.15.0 (2012-03-30) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] rms_3.5-0Hmisc_3.9-3 survival_2.36-14 data.table_1.8.0 loaded via a namespace (and not attached): [1] cluster_1.14.2 grid_2.15.0lattice_0.20-6 tools_2.15.0 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Reading Excel Formulas as values
When I read excel files using the read.xlsx() command any cells that have formulas in them come up as NA. Is there a way to read just the numeric value of the cell without using the "paste value" command in Excel? I need to read in hundreds of Excel spreadsheets and compile them into one large super spreadsheet automatically. Hence the reason I cannot reformat each sheet manually. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Can't read xlsx file into R. Seem, Seem to have XLConnect loaded.
I have spent hours on R in Windows 7. Just installed 2 days ago so the R package should be current. Currently I am using the RGui (64-bit) for Windows. I can not read an Excel file into R from my computer. Have hours on this. Completely crazy!! I have the XLConnect package loaded. I think it is loaded because when I enter: > loadedNamespaces() [1] "base" "datasets" "graphics" "grDevices" [5] "methods" "rJava" "stats" "tools" [9] "utils" "XLConnect" "XLConnectJars" XLConnect is listed. Does that mean it is loaded? > getwd() [1] "F:/RwkCB3/R OG SBCD 1205v01" > list.files() [1] "~$ R OG SBCD 1205 MH1.docx" "~WRL2121.tmp" [3] "01 R OG SBCD 1205 MH1 BU1.docx" "01 R OG SBCD 1205 MH1.docx" [5] "01 R OG SBCD 1205v01 BU2" "Dta1.txt" [7] "Gas Dly YTD1204 v01 BU.xlsm" "Gas Dly YTD1204 v01.xlsm" [9] "Oil Dly YTD1204 v01 BU.xlsm" "Oil Dly YTD1204 v01.xlsm" [11] "Oil Dly YTD1204 v01.xlsx" "R OG SBCD 1205v01 sCt.lnk" [13] "R setwd to C Prog R sCt OtDt.lnk" So apparently "Oil Dly YTD1204 v01.xlsx" exists in my working directory. SO WHY DOES THE FOLLOWING BEHAVE THE WAY IT DOES? > OlPrcFl <- loadWorkbook(Oil Dly YTD1204 v01.xlsx, create = FALSE) Error: unexpected input in "OlPrcFl <- loadWorkbook(" I can read an xlsx file in when I do: > OlPrcFl <- loadWorkbook(file.choose()) That is not a real, long-term solution. Have same problem installing packages -- Can't get R to load a package when I specify a path. Works when I use file.choose() I would really appreciate it if you could tell me what is going on. Thanks, Mike Hilt mikehil...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] question on jitter in plot.Predict in rms
Dear colleagues, I have a question regarding controlling the jitter when plotting predictions in the rms package. Below I've simulated some data that reflect what I'm working with. The model predicts a continuous variable with an ordinal score, a two-level group, and a continuous covariate. Of primary interest is a plot of the group by score interaction, where the score is the ordinal variable, and the group Ns are quite disparate. When I produce the plot for the predicted values with the data=llist argument, as expected I get datadensity hatch marks. However, in the group with the smaller N, I get jittered datadensity points, while in the group with the larger N, the jitter apparently defaults to single vertical lines, which I assume is because the jitter would look like a black blob. Some of my co-authors are a bit worried about how that looks, so here I am. Apart from abandoning data=llist altogether, is there a way to modify the jitter in the smaller group so it behaves like the larger one? Of secondary importance, anything you can tell me about improving my clumsy little simulation code would be welcome--I have little to no idea what I'm doing there. for example, would there be a way to produce the group variable with the disparate Ns more directly? Thanks, Mike Babyak Behavioral Medicine Research Center Duke University Medical Center #question about jitter/llist in rms #R v 2.14.1 under windows 7 #question about jitter/llist in rms require(rms) #simulate some data n = 5000 age = runif(n) score = runif(n) + 0.5*age group<- as.numeric(sample(c(FALSE,TRUE), 5000, replace=T, prob=c(.1, .9))) ordscore = as.numeric(factor(rep(1:5, length.out=n))) table(group,ordscore) e = rnorm(n, 0, 0.1) #true model y = group + 0.6*ordscore + group*ordscore + .2*age + e #convert group to factor group.f<-as.factor(group) #save data characterics dd1<-datadist(age,ordscore,group.f) options(datadist="dd1") #estimate model reg1<-ols(y~group.f+ordscore+group.f*ordscore+age,x=T,y=T) #plot results preg<-Predict(reg1,ordscore,group.f) #produces interaction plot with datadensity hatch marks plot(preg,data=llist(ordscore,group.f)) [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] generate random numbers for lotteries
On Sun, 29 Apr 2012, Daniel Nordlund wrote: I don't know what the OP is really trying to accomplish yet, and I am not motivated (yet) to try to figure it out. However, all this "flooring" and "ceiling) and "rounding" is not necessary for generating uniform random integers. For N integers in the range from A to B it is as simple as sample(A:B, N, replace=TRUE) That is easier to work with. Thanks. Jim Holtman also was pointing to the sample() function. I'm used to making all random numbers from runif() because I use that kind of tactic in a lot of different languages. Sample() is faster, too. Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] generate random numbers for lotteries
On Mon, 30 Apr 2012, Vale Fara wrote: ok, what to do is to generate two sets (x,y) of integer uniform random numbers so that the following condition is satisfied: the sum of the numbers obtained in x,y matched two by two (first number obtained in x with first number obtained in y and so on) is in mean equal to a value z, mean value that I can decide before the randomization. Hope this is more clear than before... It isn't very clear to me. If you generate random X,Y pairs such that (X+Y)/2=z, then you have a only one random number and a nother that is completely dependent on it: X = random Y = 2z - X. I'll just tell you one thing you might be able to use, but I don't have time for this. To make a vector of N uniformly-distributed random integers in the range from integer A to integer B, inclusive, you can do this: floor( runif(N, min=A, max=B+1) ) The floor() function rounds down to the nearest integer. Depending on the exact nature of the algorithm, it might be possible for B+1 to happen, but it would be extremely unlikely, if it really is possible. This should do the same thing: floor((B-A+1)*runif(N)+A) The ceiling function can accomplish the same thing. To make random integers from 1 to K, do this: ceiling( K*runif(N) ) Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] generate random numbers for lotteries
On Fri, 27 Apr 2012, Vale Fara wrote: I am working with lotteries and I need to generate two sets of uniform random numbers. Requirements: 1) each set has 60 random numbers random integers? 2) random numbers in the first set are taken from an interval (0-10), whereas numbers in the second set are taken from a higher interval (15-25) Depends on if you mean integers. R has functions. Here's one: http://www.astrostatistics.psu.edu/su07/R/html/stats/html/Uniform.html 3) numbers generated in the first set should be matched to numbers in the second set (row by row) so that the expected value of each couple of random numbers (i.e. of each lottery) is around to a given value (12.5 +/- 5, where 12.5 is the median value of the interval extremes). Do you mean that the mean for the pair of numbers must be between 7.5 and 17.5, inclusive? That means the sum must be from 15 to 35. Well, you are in luck because if you make the numbers as you suggested above, that will happen -- you don't have to do anything special to make it happen. For the computation of the expected value, the probabilities in each lottery are ½ and ½. For what outcome? You lost me. How do this? Any help given would be greatly appreciated. I hope that helps. Mike__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Completely Off Topic:Link to IOM report on use of "-omics" tests in clinical trials
Thanks, I had totally missed this controversy but from quick read of summary the impact on open source analysis was unclear.Can you explain the punchline? I think many users of R have concluded the biggest problem in most analyses isfirst getting the data and then verfiying any results you derive, both issues that sound related to your post. ( The jumble below is illustrative of what hotmail has been doing with plain text, getting plain data withoutall the formatting junk is a recurring problem LOL). > Date: Mon, 26 Mar 2012 22:38:56 +0100 > From: iaingallagher@btopenworld.com > To: gunter.berton@gene.com; r-help@r-project.org > Subject: Re: [R] Completely Off Topic:Link to IOM report on use of "-omics" tests in clinical trials > > I followed this case while it was ongoing. > > > It was a very interesting example of basic mistakes but also (for me) of journal politicking. > > > Keith Baggerly and Kevin Coombes wrote a great paper - "DERIVING CHEMOSENSITIVITY FROM CELL LINES: FORENSIC BIOINFORMATICS AND REPRODUCIBLE RESEARCH IN HIGH-THROUGHPUT BIOLOGY" in The Annals of Applied Statistics (2009, Vol. 3, No. 4, 1309–1334) which explains some of the background and investigative work they had to do to bring those mistakes to light.! > > > Best > > iain > > > > - Original Message - > From: Bert Gunter> To: r-help@r-project.org > Cc: > Sent: Monday, 26 March 2012, 19:12 > Subject: [R] Completely Off Topic:Link to IOM report on use of "-omics" tests in clinical trials > > Warning: This has little directly to do with R, although R and related > tools (e.g. sweave and other reproducible research tools) have a > natural role to play. > > The IOM report: > > http://www.iom.edu/Reports/2012/Evolution-of-Translational-Omics.aspx > > that arose out of the Duke Univ. genomics testing scandal ha! s been > released. My thanks to Keith Baggerly for forwar ding this. I believe > that many R users in the medical research community will find this > interesting, and I hope I do not venture too far out of line by > passing on the link to readers of this list. It **will** have an > important impact on so-called Personalized Health Care (which I guess > affects all of us), and open source analytical (statistical) > methodology is a central issue. > > For those interested, try the summary first. > > Best to all, > Bert > > > -- > > Bert Gunter > Genentech Nonclinical Biostatistics > > Internal Contact Info: > Phone: 467-7374 > Website: > http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pd! b-biostatistics/pdb-ncb-home.htm > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqrt(-x) vs. -x^0.5
Thanks Sarah, All, I guess I never thought of a negative sign as an "operation", but knowing that it is considered an operation explains everything nicely. Somewhere in it's underbelly, I suppose -4 is represented as "0 - 4"? Either way, I'm glad it is consistent & accurate, so that I didn't find myself in another pickle like "weak" typing and attempts to use time /date classes in 'R' have brought me. Thanks! Mike --- [The theory of gravity] is to me so great an absurdity that I believe no Man who has in philosophical matters a competent faculty of thinking can ever fall into it. -- Isaac Newton On Wed, Mar 21, 2012 at 4:37 PM, Rolf Turner wrote: > On 22/03/12 12:17, Sarah Goslee wrote: > >> It's order of operations, and a good reason to always use >> parentheses: which is evaluated first, the unary minus or >> the raising-to-powers? >> >> (-4)^0.5 >> -(4^0.5) >> >> sqrt(-4) >> -sqrt(4) >> > > If the OP *really* wants the square root of -4 he could do > sqrt(-4+0i) or (-4+0i)^0.5 (and get 0+2i in either case). > >cheers, > >Rolf Turner > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqrt(-x) vs. -x^0.5
Hi Everyone, I did a search through the archives and did not find an answer, although I must admit it is a hard search to do ( ^0.5 is tough to explicitly search for ). I am sure there is some mathematically accurate reason to explain the following, but I guess I either never learned it or have since forgotten it. In 'R', when I type (for instance): sqrt(-4) I get NaN but if I type in: -4 ^ 0.5 I get -2 I presume this is on purpose and correct, but it's the first time I've come across any theoretical difference between ^0.5 and sqrt. What is the theoretical difference / meaning between these two operations? Thanks! Mike --- [The theory of gravity] is to me so great an absurdity that I believe no Man who has in philosophical matters a competent faculty of thinking can ever fall into it. -- Isaac Newton [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] confidence intervals in dotplots in a for loop
# I have some population estimates and confidence intervals for various size classes # of animals captured with two gear types. I'd like to plot the estimates along with # the 90 and 95% CI's by size class for each gear type. # The data can be read in as: estimates <- c(67.42,30.49,32.95,23.53,10.26,6.03,23.53,0.93,50.72,24.2,25.84,18.54, 7.16,3.6,9.35,0.33,87.28,37.25,40.16,28.59,13.77,8.92,40.74,1.68,48.28,2 3.09, 24.49,17.7,6.63,3.28,7.79,0.26,91.63,38.74,41.6,29.74,14.49,9.51,44.16,1 .88) estimates.m<- matrix(estimates, 8,5) colnames(estimates.m) <- list("est","lci90","uci90","lci95","uci95") id <- c(1,1,2,2,3,3,4,4) size <- c("All","All","Large","Large","Medium","Medium","Small","Small") gear <- rep(c("Camera","Dredge"),4) cds.est <- data.frame(id,size,gear,estimates.m) # The following script works fine for plotting one size class at a time using dotplot() and is how I # would like the plots to look library(lattice) dat1 <- cds.est[cds.est$id == 1,] dotplot(gear ~ est , data=dat1, xlim = c(min(dat1$lci95 -10), max(dat1$uci95 +10)), xlab='Density (num / Nmi^2)', main = as.vector(unique(dat1$size)), panel=function(x,y) { panel.xyplot(x,y,pch=16,cex=1) panel.segments(dat1$lci95,as.numeric(y),dat1$uci95,as.numeric(y), lty=1, col=1) panel.segments(dat1$lci90,as.numeric(y),dat1$uci90,as.numeric(y), lty=1, lwd=4, col='grey60') panel.xyplot(x,y,pch=16,cex=1.2,col='white') panel.xyplot(x,y,pch=1,cex=1.1, col='black') }) # Since I have multiple size classes and will producing similar plots for other data sets # I've written the following script using a loop. The script loops over the "id" variable in the # cds.est data frame and stores the plots in a list. Since dotplot() is part of the # lattice package, I used grid.arrange to tile the plots. library(grid) library(gridExtra) id2 <- 1:max(cds.est$id) plots <- vector("list", length(id2)) j <- 0 for (i in id2) { dat <- cds.est[cds.est$id == i,] plots[[ j <- j+1]] <- dotplot(gear ~ est , data=dat, xlim = c(min(dat$lci95 -10), max(dat$uci95 +10)), xlab='Density (num / Nmi^2)', main = as.vector(unique(dat$size)), panel=function(x,y) { panel.xyplot(x,y,pch=16,cex=1) panel.segments(dat$lci95,as.numeric(y),dat$uci95,as.numeric(y), lty=1, col=1) panel.segments(dat$lci90,as.numeric(y),dat$uci90,as.numeric(y), lty=1, lwd=4, col='grey60') panel.xyplot(x,y,pch=16,cex=1.2,col='white') panel.xyplot(x,y,pch=1,cex=1.1, col='black') }) } do.call(grid.arrange, plots) # The script runs and produces all the plots with the correct estimates, but the CI's are not # plotting correctly. Does anyone have suggestions on what is causing this? Thanks, Mike [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] The Future of R | API to Public Databases
LOL, I remember posting about this in the past. The US gov agencies vary but mostare quite good. The big problem appears to be people who push proprietary orcommercial "standards" for which only one effective source exists. Some formats,like Excel and PDF come to mind and there is a disturbing trend towards theiradoption in some places where raw data is needed by many. The best thing to do is contact the informationprovider and let them know you want raw data, not images or stuff that worksin limited commercial software packages. Often data sources are valuable andthe revenue model impacts availability. If you are just arguing over different open formats, it is usually easy for someone towrite some conversion code and publish it- CSV to JSON would not be a problem for example. Data of course are quite variable and there is nothingwrong with giving provider his choice. > Date: Sat, 14 Jan 2012 10:21:23 -0500 > From: ja...@rampaginggeek.com > To: r-help@r-project.org > Subject: Re: [R] The Future of R | API to Public Databases > > Web services are only part of the problem. In essence, there are at > least two facets: > 1. downloading the data using some protocol > 2. mapping the data to a common model > > Having #1 makes the import/download easier, but it really becomes useful > when both are included. I think #2 is the harder problem to address. > Software can usually be written to handle #1 by making a useful > abstraction layer. #2 means that data has consistent names and meanings, > and this requires people to agree on common definitions and a common > naming convention. > > RDF (Resource Description Framework) and its related technologies > (SPARQL, OWL, etc) are one of the many attempts to try to address this. > While this effort would benefit R, I think it's best if it's part of a > larger effort. > > Services such as DBpedia and Freebase are trying to unify many data sets > using RDF. > > The task view and package ideas a great ideas. I'm just adding another > perspective. > > Jason > > On 01/13/2012 05:18 PM, Roy Mendelssohn wrote: > > HI Benjamin: > > > > What would make this easier is if these sites used standardized web > > services, so it would only require writing once. data.gov is the worst > > example, they spun the own, weak service. > > > > There is a lot of environmental data available through OPenDAP, and that is > > supported in the ncdf4 package. My own group has a service called ERDDAP > > that is entirely RESTFul, see: > > > > http://coastwatch.pfel.noaa.gov/erddap > > > > and > > > > http://upwell.pfeg.noaa.gov/erddap > > > > We provide R (and matlab) scripts that automate the extract for certain > > cases, see: > > > > http://coastwatch.pfeg.noaa.gov/xtracto/ > > > > We also have a tool called the Environmental Data Connector (EDC) that > > provides a GUI from with R (and ArcGIS, Matlab and Excel) that allows you > > to subset data that is served by OPeNDAP, ERDDAP, certain Sensor > > Observation Service (SOS) servers, and have it read directly into R. It is > > freely available at: > > > > http://www.pfeg.noaa.gov/products/EDC/ > > > > We can write such tools because the service is either standardized > > (OPeNDAP, SOS) or is easy to implement (ERDDAP). > > > > -Roy > > > > > > On Jan 13, 2012, at 1:14 PM, Benjamin Weber wrote: > > > >> Dear R Users - > >> > >> R is a wonderful software package. CRAN provides a variety of tools to > >> work on your data. But R is not apt to utilize all the public > >> databases in an efficient manner. > >> I observed the most tedious part with R is searching and downloading > >> the data from public databases and putting it into the right format. I > >> could not find a package on CRAN which offers exactly this fundamental > >> capability. > >> Imagine R is the unified interface to access (and analyze) all public > >> data in the easiest way possible. That would create a real impact, > >> would put R a big leap forward and would enable us to see the world > >> with different eyes. > >> > >> There is a lack of a direct connection to the API of these databases, > >> to name a few: > >> > >> - Eurostat > >> - OECD > >> - IMF > >> - Worldbank > >> - UN > >> - FAO > >> - data.gov > >> - ... > >> > >> The ease of access to the data is the key of information processing with R. > >> > >> How can we handle the flow of information noise? R has to give an > >> answer to that with an extensive API to public databases. > >> > >> I would love your comments and ideas as a contribution in a vital > >> discussion. > >> > >> Benjamin > >> > >> __ > >> R-help@r-project.org mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide > >> http://www.R-project.org/posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. > > ** > > "The contents of this message
Re: [R] how to combine grouped data and ungrouped data in a trellis xyplot
Thanks Deepayan. That did the trick. xyplot(cpu~dt|zone,data=filt_zone_df,ylim=c(0,100),groups=pool, auto.key=list(columns=min(4,length(unique(filt_zone_df$pool))),lines=T,points=F), type="l", main="Test Chart", ylab="% Utilization", panel=function(x,y,groups,subscripts,...){ panel.xyplot(x,y,groups=groups,subscripts=subscripts,...) panel.lines(filt_zone_df$dt[subscripts],filt_zone_df$mem[subscripts],col="red") }, as.Table=T, subscripts=T) Regards, -mike On Mon, Jan 9, 2012 at 5:06 AM, Deepayan Sarkar wrote: > On Sun, Jan 8, 2012 at 9:45 AM, Mike Dahman wrote: > > I'm hoping the community knowledge can help me out here. I have found > great > > value in R, especially using it to generate charts, but I am still > scaling > > the learning curve in a number of ways. > > > > I am looking plot one grouped line and one ungrouped line in a lattice > plot. > > > > I can plot one grouped line like this (the line's color in each panel > > becomes dependent on the newpool value): > > > > > > > > > xyp<-xyplot(cpucap~date|zone,data=df,type="l",groups=newpool,auto.key=list(points=F,lines=T), > > main=paste(df$server[1]," CPU Caps\n",df$date[1]," to > > ",df$date[nrow(df)],sep="") > >) > >print(xyp) > > > > > > and I can plot two ungrouped lines using a panel=function with subscripts > > like this (maybe not the best way, but I found an example doing it this > > way): > > > >xyplot(cpu~dt|zone,data=filt_zone_df,ylim=c(0,100), > > main=paste(server," - Zone CPU (Blue) & Memory (Red) > > > Util\n",filt_zone_df$ts[1],"-",filt_zone_df$ts[nrow(filt_zone_df)],sep=""), > > panel=function(x,y,subscripts){ > > panel.lines(x,y) > > > panel.lines(filt_zone_df$dt[subscripts],filt_zone_df$mem[subscripts],col="red") > >}, as.Table=T, subscripts=T) > > > > > > but I'm struggling with plotting one line that is grouped and one that > > isn't. When I try to pass group to the first panel.xyplot() function in > the > > panel=function it either does nothing or bombs out. > > > > > xyplot(cpu~dt|zone,data=servdf,ylim=c(0,100),groups=pool,auto.key=list(points=F,lines=T),type="l", > > main="test", > > panel=function(x,y,groups,subscripts,...){ > > panel.xyplot(x,y,groups,...) # would > > like this to be colored based on the groups=pool > > Try > > panel.xyplot(x, y, groups = groups, subscripts = subscripts, > ...) > > -Deepayan > > > > > panel.lines(servdf$dt[subscripts],servdf$mem[subscripts],col="red") > >}, as.Table=T, subscripts=T) > > > > > > A little nudge in the right direction is appreciated. I'm getting tripped > > up on how to get the groups definition into the panel function and also > > into the panel.xyplot function within it. I've tried using a number of > > variations in the arguments with the panel=function definition and the > call > > to panel.xyplot() within it, but no success. My assumption was that the > use > > of '...' would pass it on down, but that doesn't seem to be the > > case, especially since most of the examples I can find from googling show > > folks listing group as an argument, and sometimes have something like > > groups=groups. I've tried a number of things and thought it is time to > ask > > for help. > > > > Regards, > > > > -mike > > > >[[alternative HTML version deleted]] > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] glmmPQL and predict
Is the labeling/naming of levels in the documentation for the predict.glmmPQL function "backwards"? The documentation states "Level values increase from outermost to innermost grouping, with level zero corresponding to the population predictions". Taking the sample in the documentation: fit <- glmmPQL(y ~ trt + I(week > 2), random = ~1 | ID, family = binomial, data = bacteria) > head(predict(fit, bacteria, level = 0, type="response")) [1] 0.9680779 0.9680779 0.8587270 0.8587270 0.9344832 0.9344832 > head(predict(fit, bacteria, level = 1, type="response")) X01 X01 X01 X01 X02 X02 0.9828449 0.9828449 0.9198935 0.9198935 0.9050782 0.9050782 > head(predict(fit, bacteria, type="response")) ## population prediction X01 X01 X01 X01 X02 X02 0.9828449 0.9828449 0.9198935 0.9198935 0.9050782 0.9050782 The returned values for level=1 and level= match, which is not what I expected based upon the documentation. Exponentiating the intercept coefficients from the fitted regression, the level=0 values match when the random effect intercept is included > 1/(1+exp(-3.412014)) ## only the fixed effect [1] 0.9680779 > 1/(1+exp(-1*(3.412014+0.63614382))) ## fixed and random effect intercepts [1] 0.9828449 Thanks! Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to combine grouped data and ungrouped data in a trellis xyplot
I'm hoping the community knowledge can help me out here. I have found great value in R, especially using it to generate charts, but I am still scaling the learning curve in a number of ways. I am looking plot one grouped line and one ungrouped line in a lattice plot. I can plot one grouped line like this (the line's color in each panel becomes dependent on the newpool value): xyp<-xyplot(cpucap~date|zone,data=df,type="l",groups=newpool,auto.key=list(points=F,lines=T), main=paste(df$server[1]," CPU Caps\n",df$date[1]," to ",df$date[nrow(df)],sep="") ) print(xyp) and I can plot two ungrouped lines using a panel=function with subscripts like this (maybe not the best way, but I found an example doing it this way): xyplot(cpu~dt|zone,data=filt_zone_df,ylim=c(0,100), main=paste(server," - Zone CPU (Blue) & Memory (Red) Util\n",filt_zone_df$ts[1],"-",filt_zone_df$ts[nrow(filt_zone_df)],sep=""), panel=function(x,y,subscripts){ panel.lines(x,y) panel.lines(filt_zone_df$dt[subscripts],filt_zone_df$mem[subscripts],col="red") }, as.Table=T, subscripts=T) but I'm struggling with plotting one line that is grouped and one that isn't. When I try to pass group to the first panel.xyplot() function in the panel=function it either does nothing or bombs out. xyplot(cpu~dt|zone,data=servdf,ylim=c(0,100),groups=pool,auto.key=list(points=F,lines=T),type="l", main="test", panel=function(x,y,groups,subscripts,...){ panel.xyplot(x,y,groups,...) # would like this to be colored based on the groups=pool panel.lines(servdf$dt[subscripts],servdf$mem[subscripts],col="red") }, as.Table=T, subscripts=T) A little nudge in the right direction is appreciated. I'm getting tripped up on how to get the groups definition into the panel function and also into the panel.xyplot function within it. I've tried using a number of variations in the arguments with the panel=function definition and the call to panel.xyplot() within it, but no success. My assumption was that the use of '...' would pass it on down, but that doesn't seem to be the case, especially since most of the examples I can find from googling show folks listing group as an argument, and sometimes have something like groups=groups. I've tried a number of things and thought it is time to ask for help. Regards, -mike [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Dropping columns from data frame
Thank you, David. I was merely using "head" to limit the code/ output. My question remains, because a created data frame has the same columns as was output from "head": > head(orig.df,3) num1.10 num11.20 lc1.10 lc11.20 uc1.10 uc11.20 1 1 11 a k A K 2 2 12 b l B L 3 3 13 c m C M > # Illustration 1: contiguous columns at beginning of data frame > head(orig.df[,-c(1:3)],2) lc11.20 uc1.10 uc11.20 1 k A K 2 l B L > new.df <- orig.df[,-c(1:3)] > head(new.df,2) lc11.20 uc1.10 uc11.20 1 k A K 2 l B L > > # Illustration 2: non-contiguous columns > head(orig.df[,-c(1,3,5)],2) num11.20 lc11.20 uc11.20 1 11 k K 2 12 l L > new.df <- orig.df[,-c(1,3,5)] > head(new.df,2) num11.20 lc11.20 uc11.20 1 11 k K 2 12 l L On Jan 6, 9:49 am, David Winsemius wrote: > On Jan 6, 2012, at 10:00 AM, Mike Harwood wrote: > > > How does R do it, and should I ever be worried? I always remove > > columns by index, and it works exactly as I would naively expect - but > > HOW? The second illustration, which deletes non contiguous columns, > > represents what I do all the time and have some trepidation about > > because I don't know the mechanics (e.g. why doesn't the column > > formerly-known-as-4 become 3 after column 1 is dropped: doesn't vector > > removal from a df/list invoke a loop in C?). > > You are NOT "removing columns". You are returning (to `head` and then > to `print`) an extract from the dataframe, but that does not change > the original dataframe. To effect a change you would need to assign > the value back to the same name as the original daatframe. > > -- > David > > > > > > > > > > > Can I delete a named > > list of columns, which are examples 4 and 5 and which generate the > > "unary error' mesages, without resorting to "orig.df$num1.10 <- NULL"? > > > Thanks! > > > orig.df <- data.frame(cbind( > > 1:10 > > ,11:20 > > ,letters[1:10] > > ,letters[11:20] > > ,LETTERS[1:10] > > ,LETTERS[11:20] > > )) > > names(orig.df) <- c( > > 'num1.10' > > ,'num11.20' > > ,'lc1.10' > > ,'lc11.20' > > ,'uc1.10' > > ,'uc11.20' > > ) > > # Illustration 1: contiguous columns at beginning of data frame > > head(orig.df[,-c(1:3)]) > > > # Illustration 2: non-contiguous columns > > head(orig.df[,-c(1,3,5)]) > > > # Illustration 3: contiguous columns at end of data frame > > head(orig.df[,-c(4:6)]) ## as expected > > > # Illustrations 4-5: unary errors > > head(orig.df[,-c(as.list('num1.10', 'lc1.10', 'uc1.10'))]) > > head(orig.df[,-c('num1.10', 'lc1.10', 'uc1.10')]) > > > Mike > > > __ > > r-h...@r-project.org mailing list > >https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > David Winsemius, MD > West Hartford, CT > > __ > r-h...@r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Dropping columns from data frame
How does R do it, and should I ever be worried? I always remove columns by index, and it works exactly as I would naively expect - but HOW? The second illustration, which deletes non contiguous columns, represents what I do all the time and have some trepidation about because I don't know the mechanics (e.g. why doesn't the column formerly-known-as-4 become 3 after column 1 is dropped: doesn't vector removal from a df/list invoke a loop in C?). Can I delete a named list of columns, which are examples 4 and 5 and which generate the "unary error' mesages, without resorting to "orig.df$num1.10 <- NULL"? Thanks! orig.df <- data.frame(cbind( 1:10 ,11:20 ,letters[1:10] ,letters[11:20] ,LETTERS[1:10] ,LETTERS[11:20] )) names(orig.df) <- c( 'num1.10' ,'num11.20' ,'lc1.10' ,'lc11.20' ,'uc1.10' ,'uc11.20' ) # Illustration 1: contiguous columns at beginning of data frame head(orig.df[,-c(1:3)]) # Illustration 2: non-contiguous columns head(orig.df[,-c(1,3,5)]) # Illustration 3: contiguous columns at end of data frame head(orig.df[,-c(4:6)]) ## as expected # Illustrations 4-5: unary errors head(orig.df[,-c(as.list('num1.10', 'lc1.10', 'uc1.10'))]) head(orig.df[,-c('num1.10', 'lc1.10', 'uc1.10')]) Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Can levelplot colorkeys display a logarithmic scale evenly?
I'm using the {lattice} "levelplot" function to make a (more or less) 2-d histogram, and for the most part it's working fine with my data. However, I can't get the color key to do what I need. I can give it labels and custom cutoffs, but my cutoff lines (and hence my labels) aren't evenly spaced, instead they're more-or-less logarithmic, starting at [0,20,50,100...] and continuing on up to 5 million. Levelplot scales the ticks/labels linearly on the color key, leaving most my labels scrunched down atop each other at the bottom and only the last few (..."500K","1M","2M","5M") really visible on the rest of the key. I want each gap (no matter its numerical range) to occupy one evenly-spaced "block" on the color key, so they're all readable by the user. I want it to display value on a logarithmic scale (0,1,10,100,1000,...). The colorkey only seems to support linear scales (0,2,4,6,8,10,...), at least by default, and I don't see any options in the help to change that default. Does levelplot support this? Or am I destined to draw the colorkey manually (very painfully using polygon calls and user coordinates)? Or are there other levelplot-like functions that would facilitate this? Any help is appreciated, - Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] empty files created with trellis xyplot jpeg device
Thanks Michael. That did the trick. Despite googling most of the day yesterday, I didn't quite have the right search string to find that one. Almost feels like the answer was hiding in plain sight, now that you point me to it. I added some code to save the xyplots to a variable and then print it at the end of the function. before: xyplot(...) after: xyp<-xyplot(...) print(xyp) Works great. Regards, -mike On Sun, Jan 1, 2012 at 1:40 AM, R. Michael Weylandt < michael.weyla...@gmail.com> wrote: > I'm guessing R FAQ 7.22: http://cran.r-project.org/doc/FAQ/R-FAQ.html > > The subtlety is that in an interactive session print is automatically > called at the final evaluation of most everything, but you have to > prompt it in interactive use (and depending on details, in some > function calls) > > Michael Weylandt > > On Sat, Dec 31, 2011 at 6:50 PM, Mike Dahman > wrote: > > New years greetings. > > > > I have been setting up a function to generate multiple jpeg charts. When > > the calls are issued at the interactive console, the jpeg files are > > generated without an issue. When I try to issue the same calls from a > > function, some chart files are empty. It appears to only be related to > > trellis charts. Any help to troubleshoot this is appreciated. > > > > Regards, > > > > -mike > > > > > > R version 2.14.0 (2011-10-31) > > Copyright (C) 2011 The R Foundation for Statistical Computing > > ISBN 3-900051-07-0 > > Platform: x86_64-unknown-linux-gnu (64-bit) > > > > > > # validate devices > >> capabilities() > >jpeg png tifftcltk X11 aqua http/ftp sockets > >TRUEFALSEFALSEFALSE TRUEFALSE TRUE TRUE > > libxml fifo clediticonv NLS profmemcairo > >TRUE TRUE TRUE TRUE TRUEFALSEFALSE > > > > # Example functionality from the interactive console > > > > # I am going to use a zone variable to help duplicate the code in the > > function > >> zone > > [1] "isoranp-z1" > > > > # call a function to pull in data and assign it to a data frame > >> testz<-get_zonedata_url(2011,51,zone) > > > > # validate the data frame > >> str(testz) > > 'data.frame': 2016 obs. of 14 variables: > > $ ts : Factor w/ 2016 levels "12/18/2011 00:00",..: 1 2 3 4 5 6 7 8 > 9 > > 10 ... > > $ server : Factor w/ 1 level "phx1npf4sn2": 1 1 1 1 1 1 1 1 1 1 ... > > $ zone : Factor w/ 1 level "isoranp-z1": 1 1 1 1 1 1 1 1 1 1 ... > > $ pool : Factor w/ 1 level "ORA-S1": 1 1 1 1 1 1 1 1 1 1 ... > > $ cpucap : num 4 4 4 4 4 4 4 4 4 4 ... > > $ memcap : int 26 26 26 26 26 26 26 26 26 26 ... > > $ swapcap: int 26 26 26 26 26 26 26 26 26 26 ... > > $ cpu: num 78.2 206.8 198.4 366.4 112.1 ... > > $ poolsz : int 42 42 42 42 42 42 42 42 42 42 ... > > $ mem: num 75.5 75.3 75.6 74.5 74.3 ... > > $ swp: num 80.2 80.1 79.6 79 78.9 ... > > $ dates : chr "12/18/2011" "12/18/2011" "12/18/2011" "12/18/2011" ... > > $ times : chr "00:00:00" "00:05:00" "00:10:00" "00:15:00" ... > > $ dt :Classes 'chron', 'dates', 'times' atomic [1:2016] 15326 15326 > > 15326 15326 15326 ... > > .. ..- attr(*, "format")= chr [1:2] "m/d/y" "h:m:s" > > .. ..- attr(*, "origin")= Named num [1:3] 1 1 1970 > > .. .. ..- attr(*, "names")= chr [1:3] "month" "day" "year" > > > > # set up a jpeg device > >> trellis.device(jpeg,file=paste("charts/",zone,".zone_util.jpg",sep="")) > > > > # call a function to generate the chart > > # the plot_zone_util function uses xyplot > >> plot_zone_util(testz) > > > > #close the device > >> dev.off() > > > > The jpeg file is generated as expected: > > (User and group has been removed) > > > > ls -l isoranp-z1.zone_util.jpg > > -rw-rw-r-- 1 <> <> 32123 Dec 31 16:13 isoranp-z1.zone_util.jpg > > > > # here is the plot_zone_util function for reference > > plot_zone_util <- function(zone_df){ > > > >xyplot(cpu~dt|server,data=zone_df,ylim=c(0,100), > > main=paste(zone_df$zone[1]," CPU (Blue) & Memory (Red) > > Util\n",zone_df$ts[1],"-",zone_df$ts[nrow(zone_df)],sep=""), > > xlab=&qu
[R] empty files created with trellis xyplot jpeg device
New years greetings. I have been setting up a function to generate multiple jpeg charts. When the calls are issued at the interactive console, the jpeg files are generated without an issue. When I try to issue the same calls from a function, some chart files are empty. It appears to only be related to trellis charts. Any help to troubleshoot this is appreciated. Regards, -mike R version 2.14.0 (2011-10-31) Copyright (C) 2011 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-unknown-linux-gnu (64-bit) # validate devices > capabilities() jpeg png tifftcltk X11 aqua http/ftp sockets TRUEFALSEFALSEFALSE TRUEFALSE TRUE TRUE libxml fifo clediticonv NLS profmemcairo TRUE TRUE TRUE TRUE TRUEFALSEFALSE # Example functionality from the interactive console # I am going to use a zone variable to help duplicate the code in the function > zone [1] "isoranp-z1" # call a function to pull in data and assign it to a data frame > testz<-get_zonedata_url(2011,51,zone) # validate the data frame > str(testz) 'data.frame': 2016 obs. of 14 variables: $ ts : Factor w/ 2016 levels "12/18/2011 00:00",..: 1 2 3 4 5 6 7 8 9 10 ... $ server : Factor w/ 1 level "phx1npf4sn2": 1 1 1 1 1 1 1 1 1 1 ... $ zone : Factor w/ 1 level "isoranp-z1": 1 1 1 1 1 1 1 1 1 1 ... $ pool : Factor w/ 1 level "ORA-S1": 1 1 1 1 1 1 1 1 1 1 ... $ cpucap : num 4 4 4 4 4 4 4 4 4 4 ... $ memcap : int 26 26 26 26 26 26 26 26 26 26 ... $ swapcap: int 26 26 26 26 26 26 26 26 26 26 ... $ cpu: num 78.2 206.8 198.4 366.4 112.1 ... $ poolsz : int 42 42 42 42 42 42 42 42 42 42 ... $ mem: num 75.5 75.3 75.6 74.5 74.3 ... $ swp: num 80.2 80.1 79.6 79 78.9 ... $ dates : chr "12/18/2011" "12/18/2011" "12/18/2011" "12/18/2011" ... $ times : chr "00:00:00" "00:05:00" "00:10:00" "00:15:00" ... $ dt :Classes 'chron', 'dates', 'times' atomic [1:2016] 15326 15326 15326 15326 15326 ... .. ..- attr(*, "format")= chr [1:2] "m/d/y" "h:m:s" .. ..- attr(*, "origin")= Named num [1:3] 1 1 1970 .. .. ..- attr(*, "names")= chr [1:3] "month" "day" "year" # set up a jpeg device > trellis.device(jpeg,file=paste("charts/",zone,".zone_util.jpg",sep="")) # call a function to generate the chart # the plot_zone_util function uses xyplot > plot_zone_util(testz) #close the device > dev.off() The jpeg file is generated as expected: (User and group has been removed) ls -l isoranp-z1.zone_util.jpg -rw-rw-r-- 1 <> <> 32123 Dec 31 16:13 isoranp-z1.zone_util.jpg # here is the plot_zone_util function for reference plot_zone_util <- function(zone_df){ xyplot(cpu~dt|server,data=zone_df,ylim=c(0,100), main=paste(zone_df$zone[1]," CPU (Blue) & Memory (Red) Util\n",zone_df$ts[1],"-",zone_df$ts[nrow(zone_df)],sep=""), xlab="date", ylab="utilization (%)", panel=function(x,y,subscripts){ panel.lines(x,y) panel.lines(zone_df$dt[subscripts],zone_df$mem[subscripts],col="red") }, as.Table=T, subscripts=T) } ## # Try and do the same thing within a function::: > gen_zone_charts(zone,2011,51) # Note the zone_util.jpg file is zero length. It is a lattice chart. The other two charts generate ok. User and group has been removed. -rw-rw-r-- 1 <> <> 22376 Dec 31 16:20 isoranp-z1.zone_cpu.jpg -rw-rw-r-- 1 <> <> 18910 Dec 31 16:20 isoranp-z1.zone_mem.jpg -rw-rw-r-- 1 <> <> 0 Dec 31 16:20 isoranp-z1.zone_util.jpg # here is the gen_zone_charts function: > gen_zone_charts function(zone,year,wk){ data_frame<-get_zonedata_url(year,wk,zone) # this results in a 0 length file # i have tried using jpeg(), and trellis.device() with the same results #jpeg(file=paste("charts/",zone,".zone_util.jpg",sep="")) trellis.device(jpeg,file=paste("charts/",zone,".zone_util.jpg",sep="")) #uses xyplot - works fine being called from the console plot_zone_util(data_frame) dev.off() # this works ok jpeg(file=paste("charts/",zone,".zone_cpu.jpg",sep="")) # uses combination of boxplot and plot with a preceeding par() plot_zone_cpu(data_frame) dev.off() # this works ok jpeg(file=paste("charts/",zone,".zone_mem.jpg",sep="")) # uses combination of plot and plot with a preceeding par() plot_zone_mem(data_frame) dev.off() } [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] HELP!! - PHP calling R to execute a r-code file (*.r)
> Date: Fri, 30 Dec 2011 16:04:08 -0600 > From: xiuquan.w...@gmail.com > To: r-help@r-project.org > Subject: [R] HELP!! - PHP calling R to execute a r-code file (*.r) > > Hi, > > I have met a tough problem when using PHP to call R to generate some plots. > I tested it okay in my personal computer with WinXP. But when I was trying > to update to my server (Win2003 server), I found it did not work. Below is > the details: I've run into lots of problems like this. Generally first check the php error log file, I have noidea where it is on your machine, and see if you can get your script to dump output somewhere,possibly with absolute path so you know where to look for it LOL. Often the change in user creates unexpected problems with file permissions and libraries and paths. You need to checkthe specific direcories for permissions not just top level. I would also point out that there is Rapache available as well as Rserver. Curious if people are using R with any other unique situations server side. We have a java webserver which I use to invoke R via bash scripts and generate rathercomplicated files. These could take very long to generate but if you have flexible caching system,it can be easy to re use output files or even generate them ahead of time. Starting "R" or any otherprocess is not instantaneous and often image generation is quite time consuming. Thereare a lot of issues making it work well in a server setting in real time. Scale up has also been an issue. Apache threading or process model is quite expensive if you careabout performance. We were able to use "netty" front end and so far that has worked very well.PHP AFAIK is not thread safe however. > > 1> r-code file (E:/mycode.r): > -- > jpeg("E:/mytest.jpg") > plot(1:10) > dev.off() > -- > > 2> php code: > --- > exec("R CMD BATCH --vanilla --slave --no-timing E:/mycode.r && exit"); > --- > 3> results: > for WinXP: the image can be generated successfully. > for Server(win2003): can not be generated. > > BTW, I have added a user "everyone" with full control permission to the E > disk. > [[elided Hotmail spam]] > > Thanks. > > > All the best, > Xiuquan Wang > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Quotes inside char string
Duncan, Thank you for referring me to Uwe's answer. -Original Message- From: Duncan Murdoch [mailto:murdoch.dun...@gmail.com] Sent: Tuesday, December 20, 2011 1:14 PM To: Mike Pfeiff Cc: 'R-help@r-project.org' Subject: Re: [R] Quotes inside char string On 20/12/2011 2:10 PM, Mike Pfeiff wrote: > How do I return a character string with quotes inside string? > > For example, what logic do I use if I want to return the following: > > Test Score="A" > > I tried the following > > Score<-paste("Test Score=","A",sep='"') > > But it returned a "\" inside: No it didn't, as Uwe explained to you. Duncan Murdoch > > "Test Score=\"A" > > > > Any assistance would be greatly appreciated. > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Quotes inside char string
Thank you Uwe. I can't get my sqlQuery to work and thought that it must be the "\" embedded in the text string, I guess not if R doesn't pass the "\". Thanks for your response. -Original Message- From: Uwe Ligges [mailto:lig...@statistik.tu-dortmund.de] Sent: Tuesday, December 20, 2011 1:20 PM To: Mike Pfeiff Cc: 'R-help@r-project.org' Subject: Re: [R] Quotes inside char string On 20.12.2011 20:10, Mike Pfeiff wrote: > How do I return a character string with quotes inside string? > > For example, what logic do I use if I want to return the following: > > Test Score="A" > > I tried the following > > Score<-paste("Test Score=","A",sep='"') > > But it returned a "\" inside: > > > "Test Score=\"A" > > > > Any assistance would be greatly appreciated. That is the print()ed R representation of it. The underlying value *within* the quoted string is actually: Test Score="A and that is always true. E.g. in your example with the SQL query this question probably is derived from. cat() shows the actual version (but does not return) it. Uwe Ligges > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Quotes inside char string
How do I return a character string with quotes inside string? For example, what logic do I use if I want to return the following: Test Score="A" I tried the following Score<-paste("Test Score=","A",sep='"') But it returned a "\" inside: "Test Score=\"A" Any assistance would be greatly appreciated. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RODBC Error: 'getCharCE' must be called on a CHARSXP
Thanks. But if I do as you suggested... txt<-'SELECT Date, Region, Price FROM TableXYZ WHERE Type="Domestic"' returns... "SELECT TradeDate, Hub, SettlementPrice FROM PowerSettlements WHERE Contract=\"ERN\"" Which because of the extra "\"' is improper SQL form. Maybe I should be asking what is the proper way to return "" inside of a char string? -Original Message- From: Uwe Ligges [mailto:lig...@statistik.tu-dortmund.de] Sent: Tuesday, December 20, 2011 8:46 AM To: Mike Pfeiff Cc: 'r-help@r-project.org' Subject: Re: [R] RODBC Error: 'getCharCE' must be called on a CHARSXP On 20.12.2011 14:55, Mike Pfeiff wrote: > I am trying to connect to an internal database and use the sqlQuery command > to reduce and retrieve data using the following code: > > channel<-odbcConnect("some_dsn", uid="", pwd="") txt<-'SELECT Date, > Region, Price FROM TableXYZ WHERE Type="Domestic"' > sqlQuery(channel, cat(txt,"\n"),errors=TRUE,) sqlQuery(channel, txt, errors=TRUE) seems more plausible (since cat returns NULL). Uwe Ligges > close(channel) > > However, I get the following error immediately after sqlQuery command: > > Error in odbcQuery(channel, query, rows_at_time) : >'getCharCE' must be called on a CHARSXP > > I believe my connection is good because I used the following commands to > successfully view the columns: > > sqlColumns(channel, TableXYZ) > > There doesn't seem to be much info on "getCharCE" and/or "CHARSXP. > Any guidance the group could provide this vey new user to R, would be > greatly appreciated > > > > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] RODBC Error: 'getCharCE' must be called on a CHARSXP
I am trying to connect to an internal database and use the sqlQuery command to reduce and retrieve data using the following code: channel <-odbcConnect("some_dsn", uid="", pwd="") txt<-'SELECT Date, Region, Price FROM TableXYZ WHERE Type="Domestic"' sqlQuery(channel, cat(txt,"\n"),errors=TRUE,) close(channel) However, I get the following error immediately after sqlQuery command: Error in odbcQuery(channel, query, rows_at_time) : 'getCharCE' must be called on a CHARSXP I believe my connection is good because I used the following commands to successfully view the columns: sqlColumns(channel, TableXYZ) There doesn't seem to be much info on "getCharCE" and/or "CHARSXP. Any guidance the group could provide this vey new user to R, would be greatly appreciated [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Simple R server for SQL Server?
I have a big SQL Server application that needs some help from R, and this R needs to load a large workspace in order to service the database request. I'd like to keep an R process running continually because loading the workspace takes about 1 minute. The data in the workspace doesn't change very often. Rserve doesn't seem to align with my needs - I don't want a new workspace with each new connection, because I just have to load the large workspace I need each time. So I want an R process with its workspace pre-loaded, sitting and waiting for requests from SQL Server. These would be simple requests, such as "make-forecast". The R process crunches and puts the answers back into the database using RODBC. I think I need two pieces? 1) Some R code to set up a small server listening for the "make-forecast" command 2) An intermediate program that SQL Server talks to, which in turn talks to the running R process. I don't know if SQL server can communicate to R through a socket, that's why I think I need an intermediate program. All of this (R, SQL Server) would run on the same Windows server box. Some small code examples for 1) would be helpful, but advice would be better. - Mike Beddo [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Fwd: tcplot documentation in evd package
This issue occurs only when both the evd and ismev packages are loaded. Please retract this posting, if possible. Thank you in advance! Mike -- Forwarded message -- From: Mike Harwood Date: Mon, Dec 12, 2011 at 7:47 PM Subject: tcplot documentation in evd package To: r-help@r-project.org Hello, and please advise regarding any errors/omissions on my part. However, the documentation for R's tcplot function in the evd package appears to contain an error. I am using evd version 2.2-4 in R 2.14.0 with Windows XP. > data(portpirie) > mrlplot(portpirie) ## No Error > tlim <- c(3.6, 4.2) > tcplot(portpirie, tlim) ## Line from documentation Error in fpot(data, u[1], model = model, cmax = cmax, r = r, ulow = ulow, : `x' must be a non-empty numeric vector > tcplot(portpirie, tlim=c(3.6, 4.2)) ## explicitly specifying the threshold limits Error in fpot(data, u[1], model = model, cmax = cmax, r = r, ulow = ulow, : `x' must be a non-empty numeric vector tcplot(portpirie$SeaLevel, tlim) ## Resolves Issue gpd.fitrange(portpirie$SeaLevel, 3.6, 4.2) ## An alternative, still naming the SeaLevel vector Please advise. Thanks! Mike [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] tcplot documentation in evd package
Hello, and please advise regarding any errors/omissions on my part. However, the documentation for R's tcplot function in the evd package appears to contain an error. I am using evd version 2.2-4 in R 2.14.0 with Windows XP. > data(portpirie) > mrlplot(portpirie) ## No Error > tlim <- c(3.6, 4.2) > tcplot(portpirie, tlim) ## Line from documentation Error in fpot(data, u[1], model = model, cmax = cmax, r = r, ulow = ulow, : `x' must be a non-empty numeric vector > tcplot(portpirie, tlim=c(3.6, 4.2)) ## explicitly specifying the threshold > limits Error in fpot(data, u[1], model = model, cmax = cmax, r = r, ulow = ulow, : `x' must be a non-empty numeric vector tcplot(portpirie$SeaLevel, tlim) ## Resolves Issue gpd.fitrange(portpirie$SeaLevel, 3.6, 4.2) ## An alternative, still naming the SeaLevel vector Please advise. Thanks! Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] any updates w.r.t. lapply, sapply, apply retaining classes
Hi Joshua, Thank you for the input! I agree that it is non-trivial to solve the cases you & I have posed. However, I would wholeheartedly support having an error spit back for any function that does not explicitly support a class. In this case, if I attempt to do sapply(x, class), and 'x' is of class "difftime", then I should receive an error "sapply cannot function upon class 'difftime' ". Why do I take this stance? There are at least 2 strong reasons: - Most importantly, an incorrect answer is far more dangerous than no answer. E.g., if I ask "what is 3 + 3?", I would far prefer to receive "I don't know" than "5". The former lets me know I need to choose another path, the latter mistakenly makes me think I have an answer, when I do not, and I continue with analyses on the assumption that answer is correct. In the case of dates, this happens often. E.g., is the numeric that is returned from sapply, for instance, the # of seconds since 1970-01-01, or the number of days since 1970-01-01. This depends upon how 'R' internally attempts to fix any incongruities. - But also very significantly, an error will get me in the habit of avoiding any marginalized class types. I keep thinking, for instance, that I can use the "Dates" class, since 'R' says that it supports them. But if I got into the habit of converting all dates into numerics myself beforehand (maybe counting the number of seconds from 1970-01-01, since that seems a magic date), then I would be guaranteed that a function will either (a) cause an error (e.g., if I try a character function on it), or (b) function properly. However, since I don't overtly receive errors when attempting to use dates (or difftimes, or factors, or whatever), I keep using them, instead of relying solely upon the true & trusted classes. - the trickiest here is really factors. Factors are, by most accounts, considered a core class. In some cases, you can only use factors. E.g., when you want some sort of ordinal categorical variable. Therefore, the fact that factors also barf similarly to other classes like difftime (albeit much more rarely), is especially dangerous. There are, of course, habits that we can create to make ourselves better programmers, and I will recognize that I can improve. However, this issue of functions generating "wrong" answers is such a *huge* problem with 'R', and other languages are catching up to 'R' so quickly, as far as their capability to handle higher level math, that this issue is making 'R' a less desirable language to use, as time progresses. I don't mean to claim that my opinion is the end-all-be-all, but I would like to hear others chime in, whether this is a large concern, or whether there is a very small minority of folks impacted by it. Regards, Mike --- XKCD <http://www.xkcd.com> On Thu, Nov 3, 2011 at 2:51 PM, Joshua Wiley wrote: > Hi Mike, > > This isn't really an answer to your question, but perhaps will serve > to continue discussion. I think that there are some fundamental > issues when working special classes. As a thought example, suppose I > wrote a class, "posreal", which inherits from the numeric class. It > is only valid for positive, real numbers. I use it in a package, but > do not develop methods for it. A user comes along and creates a > vector, x that is a posreal. Then tries: mean(x * -3). Since I never > bothered to write a special method for mean for my class, R falls back > to the inherited numeric, but gives a value that is clearly not valid > for posreal. What should happen? S3 methods do not really have > validation, so in principle, one could write a function like: > > f <- function(x) { > vclass <- class(x) > res <- mean(x) > class(res) <- vclass > return(res) > } > > which "retains" the appropriate class, but in name only. R core > cannot possibly know or imagine all classes that may be written that > inherit from more basic types but with possible special aspects and > requirements. I think the inherited is considered to be more generic > and that is returned. It is usually up to the user to ensure that the > function (whose methods were not specific to that special class but > the inherited) is valid for that class and can manually convert it > back: > > res <- as.posreal(res) > > What about lapply and sapply? Neither are generic or have methods for > difftime, and so do some unexpected/desirable things. Again, without > methods defined for a particular c
[R] any updates w.r.t. lapply, sapply, apply retaining classes
Hi All, I don't have a "I need help" question, so much as a query into any update whether 'R' has made any progress with some of the core functions retaining classes. As an example, because it's one of the cases that most egregiously impacts me & my work and keeps pushing me away from 'R' and into other numerical languages (such as NumPy in python), I will use sapply / lapply to demonstrate, but this behavior is ubiquitous throughout 'R'. Let's say I have a class which is theoretically supported, but not one of the core "numeric" or "character" classes (and, to some degree, "factor" classes). Many of the basic functions will convert my desired class into either numeric or character, so that my returned answer is gibberish. E.g.: test= as.difftime(c(1, 1, 8, 0.25, 8, 1.25), units= "days") ## create a small array of time differences class(test) ## this will return the proper class, "difftime" class(test[1] ) ## this will also return the proper class, "difftime" sapply(test, class) ## this will return *numerics* for all of the classes. Ack!! In the example I give above, the impact might seem small, but the implications are *huge*. This means that I am, in effect, not allowed to use *any* of the vectoring functions in 'R', which avoid performing loops thereby speeding up process time extraordinarily. Many can sympathize that 'R' is ridiculously slow with "for" loops, compared to other languages. But that's theoretically OK, a good statistician or data analyst should be able to work comfortably with matrices and vectors. However, *'R' cannot work comfortably* with matrices or vectors, *unless* they are using the numeric or character classes. Many of the classes suffer the problem I just described, although I only used "difftime" in the example. Factors seem a bit more "comfortable", and can be handled most of the time, but not as well as numerics, and at times functions working on factors can return the numerical representation of the factor instead of the original factor. Is there any progress in guaranteeing that all core functions either (a) ideally return exactly the classes, and hierarchy of classes, that they received (e.g., a list of data frames with difftimes & dates & characters would return a list of data frames with difftimes & dates & characters), or (b) barring that, the function should at least error out with a clear error explaining that sapply, for example, cannot vectorize on the class being used? Returning incorrect answers is far worse than returning an error, from a perspective of stability. This is, by far, the largest Achilles' heel to 'R'. Personally, as my career advances and I work on more technical things, I am finding that I have to leave 'R' by the wayside and use other languages for robust numerical calculations and programming. This saddens me, because there are so many wonderful packages developed by the community. The example above came up because I am using the "forecast" library to great effect in predicting how long our product cycle time will be. However, I spend much of my time fighting all these class & typing bugs in 'R' (and we have to start recognizing that they are bugs, otherwise they may never get resolved), such that many of the improvements in my productivity due to all the wonderful computational packages are entirely offset by the time I spend fighting this issue of poor classes. Thanks & Regards! Mike --- XKCD <http://www.xkcd.com> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R shell line width
Thank you very much for your responses! This is exactly what I needed. On Fri, Sep 16, 2011 at 8:13 PM, Joshua Wiley wrote: > Hi Mike, > > Look at ?options particularly something like: > > options(width = 120) > > 80 is the default, I believe. On 1920 pixels I can comfortably get > around 220 (depending on the overhead of the program, full screen, > etc.). I imagine it would also be possible to run into limitations > from the terminal R is running in, though I do not know that for a > fact. > > Cheers, > > Josh > > On Fri, Sep 16, 2011 at 12:48 PM, Mike P wrote: >> Hi, >> >> I want to apologize in advance if this has already been asked. I >> wasn't able to find any information, either on google or from local >> list search. >> >> I'm running an R shell from a linux command line, in an xterm window. >> Whenever I print a data frame, only the first couple of columns are >> printed side-by-side, the others are being repositioned below them. It >> seems something is limiting the line width of the output, even though >> there is enough horizontal space to fit each row on a single line. >> >> For example, this command: >> >>> data.frame(matrix(1:30,nrow=1)) >> >> prints columns 1-21 on the first line, and the rest 22-30 on the second. >> >> Is there a way I can configure R to increase the width of my output? >> >> Thanks. >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > > -- > Joshua Wiley > Ph.D. Student, Health Psychology > Programmer Analyst II, ATS Statistical Consulting Group > University of California, Los Angeles > https://joshuawiley.com/ > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] any way to convert back to DateTime class when "accidental" conversion to numeric?
Hi Dr. Ripley, All, Thanks for the succinct advice! Perfectly what I needed! Jeff, Absolutely I agree that this is a dangerous path, and I would never consider doing it for something that needs to be robust. But in 'R' type casting is a bit messed up, so I've come to accept that sometimes something I called a string might become a factor, a date became a numeric, etc. Then I stick in enough catches throughout my functions to deal with it, worst case. It is ugly, but I cannot figure out a way to have tight types and still use 'R'. Yet, 'R' has so many cool functions (and more added every day); I'd be silly to throw the baby out with the bathwater. So my typical best case or robust solution is to write a parent script in python (I simply know python best) that handles all data typing, etc., then call 'R' once I know that everything is clean. In this particular case above where I was asking, this is really for exploratory work. Once I get a solution, I will likely handle typing outside of 'R'. Thanks for the advice! Regards, Mike --- XKCD <http://www.xkcd.com> On Wed, Oct 5, 2011 at 11:41 PM, Prof Brian Ripley wrote: > A more portable way (that function only works in some versions of R) is > > as.POSIXct(1317857320, origin="1970-01-01") > > possibly with a 'tz' argument if you need to restore the timezone. > > > On Wed, 5 Oct 2011, jim holtman wrote: > > Here is what I use: >> >> unix2POSIXct(1317857320) >> [1] "2011-10-05 19:28:40 EDT" >> >> >> unix2POSIXct <- function (time) structure(time, class = c("POSIXt", >> "POSIXct")) >> >> >> On Wed, Oct 5, 2011 at 7:38 PM, Mike Williamson >> wrote: >> >>> Hi, >>> >>>In short, I would like to know if there is any way to convert a >>> numeric >>> into a date, similar to how strptime() can convert a string to a date >>> time >>> class? >>> >>>There are some functions, etc. which don't work well with dates, and >>> tend to force them into numerics. I understand that the number it spits >>> back is the number of seconds since the beginning of 1970 (see the first >>> few >>> sentences of the "Details" portion of ?DateTimeClasses). >>>However, it's a bit of a hassle to convert that by hand. I can create >>> a >>> function to do this, and it isn't so hard, but I found it hard to believe >>> such a function didn't already exist, so I wanted to ask the community. >>> >>>As an example, today (Oct 5th 2011 at approximately 4:30pm, Pacific >>> time) is approximately 1317857320 as a numeric, but I would like to know >>> how >>> to go from that number back to the "2011-10-05 16:28:39 PDT" date time >>> class >>> which originally generated it. >>> >>>Thanks! >>> Mike >>> >>> --- >>> XKCD <http://www.xkcd.com> >>> >>>[[alternative HTML version deleted]] >>> >>> __** >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/**listinfo/r-help<https://stat.ethz.ch/mailman/listinfo/r-help> >>> PLEASE do read the posting guide http://www.R-project.org/** >>> posting-guide.html <http://www.R-project.org/posting-guide.html> >>> and provide commented, minimal, self-contained, reproducible code. >>> >>> >> >> >> -- >> Jim Holtman >> Data Munger Guru >> >> What is the problem that you are trying to solve? >> >> __** >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/**listinfo/r-help<https://stat.ethz.ch/mailman/listinfo/r-help> >> PLEASE do read the posting guide http://www.R-project.org/** >> posting-guide.html <http://www.R-project.org/posting-guide.html> >> and provide commented, minimal, self-contained, reproducible code. >> >> > -- > Brian D. Ripley, rip...@stats.ox.ac.uk > Professor of Applied Statistics, > http://www.stats.ox.ac.uk/~**ripley/<http://www.stats.ox.ac.uk/~ripley/> > University of Oxford, Tel: +44 1865 272861 (self) > 1 South Parks Road, +44 1865 272866 (PA) > Oxford OX1 3TG, UKFax: +44 1865 272595 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] any way to convert back to DateTime class when "accidental" conversion to numeric?
Hi, In short, I would like to know if there is any way to convert a numeric into a date, similar to how strptime() can convert a string to a date time class? There are some functions, etc. which don't work well with dates, and tend to force them into numerics. I understand that the number it spits back is the number of seconds since the beginning of 1970 (see the first few sentences of the "Details" portion of ?DateTimeClasses). However, it's a bit of a hassle to convert that by hand. I can create a function to do this, and it isn't so hard, but I found it hard to believe such a function didn't already exist, so I wanted to ask the community. As an example, today (Oct 5th 2011 at approximately 4:30pm, Pacific time) is approximately 1317857320 as a numeric, but I would like to know how to go from that number back to the "2011-10-05 16:28:39 PDT" date time class which originally generated it. Thanks! Mike --- XKCD <http://www.xkcd.com> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read .csv from web from password protected site
Yes, I meant to reply to all (sorry still new at asking for help) 1. No, I am not able to open the file when I insert "userID:password@" between "http://"; and "www" http://userid:passw...@www.frontierweather.com/degreedays/L15N15PowerRegionAverages_10weeks.txt (replacing userid and password with my actual information that is known to work) 2. Yes, the webpage where the data is stored does require me to my userid and password and gives me the option to remember password for future use which I have selected. Subsequent visits to the website do not require me to reenter info. -Original Message- From: Sarah Goslee [mailto:sarah.gos...@gmail.com] Sent: Monday, October 03, 2011 3:07 PM To: Mike Pfeiff; r-help Subject: Re: [R] read .csv from web from password protected site Hi, I've assumed that you meant to send this to the R-help list, and not just me. On Mon, Oct 3, 2011 at 3:54 PM, Mike Pfeiff wrote: > Sarah, Thanks for the suggestion. Although, > read.table("http://userid:passw...@my.url/file.csv";) did not work as it > returned the following: > > Error in file(file, "rt") : cannot open the connection > In addition: Warning message: > In file(file, "rt") : unable to resolve 'userid' > > (where 'usesid is my actual userid) > > I've tried the following RCurl commands... > > myURL > ="http://www.frontierweather.com/degreedays/L15N15PowerRegionAverages_10weeks.txt"; > h=getURL(myURL, userpw = "userid:passwod", followlocation=TRUE) > test=read.table(h,header=TRUE,sep=",") > > ..and I can't get the data to read and get the following errors: > > Error in file(file, "rt") : cannot open the connection > In addition: Warning message: > In file(file, "rt") : unable to resolve 'userid' > > I'm at a total loss. Any assistance anyone could provide would be greatly > appreciated. You can get to the file using your web browser and that userid/password combo, right? Do you have to go through a dialog box? Press a button to login? Any of those have the potential to complicate the task. If so, you'll need to work through the Forms section at http://www.omegahat.org/RCurl/philosophy.html Sarah > > > -Original Message- > From: Sarah Goslee [mailto:sarah.gos...@gmail.com] > Sent: Monday, October 03, 2011 2:26 PM > To: Mike Pfeiff > Cc: r-help@r-project.org > Subject: Re: [R] read .csv from web from password protected site > > Hi Mike, > > On Mon, Oct 3, 2011 at 12:31 PM, Mike Pfeiff wrote: >> I am very new to R and have been struggling trying to read a basic ".csv" >> file from a password protected site with the following code: >> >> myURL >> ="http://www.frontierweather.com/degreedays/L15N15PowerRegionAverages_10weeks.txt"; >> test2=read.table(url(myURL),header=TRUE,sep=",") > > >> A 'data.frame' is returned into the workspace, however it is not the data >> contained in the ".csv" file. I think this occurs because the website >> where I am trying to retrieve the data is password protected. >> >> Is there a way to specify the username and password? > > > I'd try first > read.table("http://userid:passw...@my.url/file.csv";), which is the standard > way to do it (hint: try that form in your web browser and see whether you can > access the data), and if that doesn't work look into the RCurl package. The > list archives have a fair bit of information on this topic. > > Sarah > > >> Any guidance would be greatly appreciated. >> >> Sincerely, >> >> Mike >> > -- Sarah Goslee http://www.functionaldiversity.org __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] read .csv from web from password protected site
I am very new to R and have been struggling trying to read a basic ".csv" file from a password protected site with the following code: myURL ="http://www.frontierweather.com/degreedays/L15N15PowerRegionAverages_10weeks.txt"; test2=read.table(url(myURL),header=TRUE,sep=",") A 'data.frame' is returned into the workspace, however it is not the data contained in the ".csv" file. I think this occurs because the website where I am trying to retrieve the data is password protected. Is there a way to specify the username and password? Any guidance would be greatly appreciated. Sincerely, Mike [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] need help with contourplot figure
I can't figure out how to add tick marks on both my X and Y axis. For example, my X axis ranges from 0 to 1 and there are both a tick mark and a number label at the X-axis values of 0.2,0.4,0.6. and 0.8. I want to add tick marks to the figure at every 0.1 value. This will help a viewer determine the values on the x axis. I included all of my code. I apologize but it is very long. I am created a contourplot figure that will be a jpg. I also included my notes after the # sign so you can see what I am doing. Any help would be greatly appreciated. jpeg(file="C:/Documents and Settings/Michael/My Documents/Mike/amberjack/Reefs_Model/YPRlevel.jpg", width=8,height=8, unit="in", res=300) #location of file and size x<-contourplot(YPR~F*Length, data=yprplot2, at=c(2.0,3.0,4.0,5.0,5.5,6.0,6.25,6.5,6.75,7.0,7.12,7.25,7.35,7.45,7.5), ylim=c(25,40), xlim=c(0,1),xlab = "Fishing Mortality Rate", ylab = "Minimum Size (inches)", panel=function(...){ #This step adds the point for the current YPR value, ... means to leave the function open to bring in other functions panel.levelplot(...) grid.points(0.609,30,pch=8) #pch is the point character where 19 is a closed circle grid.points(0.333,30,pch=8) #pch is the point character where 19 is a closed circle grid.points(eumetric$F, eumetric$Length,pch=18,gp=gpar(col="black", cex=.9))}) #add the eumetric line and make them points #now add the text for the current ypr location print(x) #this brings up the figure I already made grid.text('Fcurrent',0.62,0.35,gp=gpar(col="black", cex=1)) #0.42 and 0.40 is the location of the text on the figure grid.text('Fmsy',0.38,0.35,gp=gpar(col="black", cex=1)) #0.42 and 0.40 is the location of the text on the figure dev.off()#it won't send the pdf until this is added. It turns off the pdf function [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] survexp with large dataframes
Hello, and thank you in advance. I would like to capture the expected survival from a coxph model for subjects in an observational study with recurrent events, but the survexp statement is failing due to memory. I am using R version 2.13.1 (2011-07-08) on Windows XP. My objective is to plot the fitted survival with the Kaplan-Meier plot. Below is the code with output and [unfortunately] errors. Is there something wrong in my use of cluster in generating the proportional hazards model, or is there some syntax to pass it into survexp? Mike > dim(dev) [1] 899876 25 > mod1 <- coxph(Surv(begin.cp, end.cp, event) + ~ age.sex + + plan_type + + uw_load + + cluster(mbr_key) + ,data=dev + ) > > summary(mod1) Call: coxph(formula = Surv(begin.cp, end.cp, event) ~ age.sex + plan_type + uw_load + cluster(mbr_key), data = dev) n= 899876, number of events= 753324 coef exp(coef) se(coef) robust se z Pr(>|z|) age.sex19-34_MALE -0.821944 0.439576 0.005529 0.023298 -35.280 < 2e-16 *** age.sex35-49_FEMALE 0.058776 1.060537 0.004201 0.018477 3.181 0.00147 ** age.sex35-49_MALE -0.515590 0.597148 0.004634 0.019986 -25.798 < 2e-16 *** age.sex50-64_FEMALE 0.190940 1.210386 0.004350 0.020415 9.353 < 2e-16 *** age.sex50-64_MALE -0.127514 0.880281 0.004487 0.021431 -5.950 2.68e-09 *** age.sexCHILD_CHILD -0.327522 0.720707 0.004238 0.017066 -19.192 < 2e-16 *** plan_typeLOW-0.165735 0.847270 0.002443 0.011080 -14.958 < 2e-16 *** uw_load1-50 0.215122 1.240014 0.006437 0.029189 7.370 1.71e-13 *** uw_load101-250 0.551042 1.735060 0.003993 0.018779 29.344 < 2e-16 *** uw_load251+ 0.981660 2.668884 0.003172 0.017490 56.126 < 2e-16 *** uw_load51-1000.413464 1.512046 0.006216 0.027877 14.832 < 2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 exp(coef) exp(-coef) lower .95 upper .95 age.sex19-34_MALE 0.4396 2.27490.42000.4601 age.sex35-49_FEMALE1.0605 0.94291.02281.0996 age.sex35-49_MALE 0.5971 1.67460.57420.6210 age.sex50-64_FEMALE1.2104 0.82621.16291.2598 age.sex50-64_MALE 0.8803 1.13600.84410.9180 age.sexCHILD_CHILD 0.7207 1.38750.69700.7452 plan_typeLOW 0.8473 1.18030.82910.8659 uw_load1-501.2400 0.80641.17111.3130 uw_load101-250 1.7351 0.57631.67241.8001 uw_load251+2.6689 0.37472.57892.7620 uw_load51-100 1.5120 0.66141.43161.5970 Concordance= 0.643 (se = 0 ) Rsquare= 0.205 (max possible= 1 ) Likelihood ratio test= 206724 on 11 df, p=0 Wald test= 9207 on 11 df, p=0 Score (logrank) test = 246358 on 11 df, p=0, Robust = 4574 p=0 (Note: the likelihood ratio and score tests assume independence of observations within a cluster, the Wald and robust score tests do not). > dev.fit <- survexp( ~ 1, ratetable=mod1, data=dev) Error in survexp.cfit(cbind(as.numeric(X), R), Y, conditional, FALSE, : cannot allocate memory block of size 15.2 Gb __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] R shell line width
Hi, I want to apologize in advance if this has already been asked. I wasn't able to find any information, either on google or from local list search. I'm running an R shell from a linux command line, in an xterm window. Whenever I print a data frame, only the first couple of columns are printed side-by-side, the others are being repositioned below them. It seems something is limiting the line width of the output, even though there is enough horizontal space to fit each row on a single line. For example, this command: > data.frame(matrix(1:30,nrow=1)) prints columns 1-21 on the first line, and the rest 22-30 on the second. Is there a way I can configure R to increase the width of my output? Thanks. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Problem looping a function to create/add to new dataframe
Dear List Members, I have created a function to run a simulation based on a given set of values within a vector, such that I run the function like this: new.data<-sapply(vector, function) In which I run 'function' on every value within a vector that I created. The result is a matrix of 1000 rows, and as many columns as the length of the vector. I modified the function to utilize two given values, such that it is: function(x,y), and I want to run this function for not only a range of values of 'x', but a range of values of 'y' as well, such that for each value of 'y', I'd create another 1000 rows in the matrix. I was trying to loop this, but in doing so, it looks like I just keep creating datasets and replacing them with datasets for subsequent 'y' values, rather than adding new rows. The structure of my current loop is below. Any suggestions on what to change to accomplish what I want? Would it be good to create a dataframe first and then somehow add these new rows to the dataframe? Also, is it appropriate to have the 'i' in the sapply statement? for (i in c(seq(10,100,10))){ new.data<-sapply(vector, function, i) } Please let me know if more detail on my code would be helpful- I was just trying to keep it simple and focus on what I saw as the problem at hand for now. Thank you for your help. Sincerely, Mike Treglia -- Michael Treglia Applied Biodiversity Sciences NSF-IGERT Program Wildlife and Fisheries Sciences Texas A&M University Lab: (979)862-7245 ml...@tamu.edu http://people.tamu.edu/~mlt35 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] storage and single-precision
On Thu, 8 Sep 2011, William Dunlap wrote: Use gzcon() to make a compressed connection and any function that write to a connection will write compressed data. E.g., > con <- gzcon(file("tempfile.junk", "wb")) > x <- as.integer(rep(c(-127, 1, 127), c(3,2,1))) > writeBin(x, con, size=1) > close(con) > q("no") bill:158% zcat tempfile.junk | od --format d1 000 -127 -127 -12711 127 006 (In this tiny example the gzip'ed file is bigger than the equivalent one, but it is gzip'ed.) That's a great function. Thanks for the tip. Apparently it works in both directions. That is, I would also use gzcon for readBin: Description: ‘gzcon’ provides a modified connection that wraps an existing connection, and decompresses reads or compresses writes through that connection. Standard ‘gzip’ headers are assumed. Perfect. Thanks again. Mike__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] storage and single-precision
On Thu, 8 Sep 2011, Duncan Murdoch wrote: On 11-09-07 6:25 PM, Mike Miller wrote: I'm getting the impression from on-line docs that R cannot work with single-precision floating-point numbers, but that it has a pseudo-mode for single precision for communication with external programs. I don't mind that R is using doubles internally, but what about storage? If all I need to store is single-precision (32-bit), can I do that? When it is read back into R it can be converted from single to double (back to 64-bit). Furthermore, the data are numbers from 0.000 to 2.000 with no missing values that could be stored just as accurately as unsigned 16-bit integers from 0 to 2000. That would be the best plan for me. writeBin is quite flexible in converting between formats if you just want to store them on disk. To use nonstandard formats in memory will require external support; it's not easy. Thanks. I can see now that writeBin will store unsigned 16-bit integers, which is what I want. There is one other issue -- with save() I'm allowed to use compression (e.g., gzip), but there doesn't seem to be a compression option in writeBin. Is there a way to get the best of both worlds? The data are highly nonrandom and at most 11 bits will be used per integer, so the compression ratio should be pretty good, if I can have one. Mike __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] storage and single-precision
I'm getting the impression from on-line docs that R cannot work with single-precision floating-point numbers, but that it has a pseudo-mode for single precision for communication with external programs. I don't mind that R is using doubles internally, but what about storage? If all I need to store is single-precision (32-bit), can I do that? When it is read back into R it can be converted from single to double (back to 64-bit). Furthermore, the data are numbers from 0.000 to 2.000 with no missing values that could be stored just as accurately as unsigned 16-bit integers from 0 to 2000. That would be the best plan for me. It looks like the ff package allows for additional formats, so I might try to use ff, but I still would like to get a better understanding of R's native capabilities in regard to representations of numbers both in RAM and in stored data files. Thanks in advance. Best, Mike -- Michael B. Miller, Ph.D. Minnesota Center for Twin and Family Research Department of Psychology University of Minnesota __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Convert CSV file to FASTA
> Date: Wed, 31 Aug 2011 01:36:51 -0700 > From: oliviacree...@gmail.com > To: r-help@r-project.org > Subject: [R] Convert CSV file to FASTA > > Hi there, > > I have large excel files which I can save as CSV files. > > Each excel file contains two columns. One contains the chromosome number and > the second contains a DNA sequence. > > I need to convert this into a fasta file that looks like this > >chromosomenumber > CGTCGAGCGTCGAGCGGAGCG > > Can anyone show me an R script to do this? > If you can post a few lines of your "csv" someone can probably give you a bach script to do it. It may be possible in R but sed/awk probbly work better. IIRC, fasta is just a name line followed by sequence. If your csv looks like "name, XX" it may be possible to change comma to space and use awk with something like print ">"$1"\n"$2 etc. > Many thanks > > x > > -- > View this message in context: > http://r.789695.n4.nabble.com/Convert-CSV-file-to-FASTA-tp3780498p3780498.html > Sent from the R help mailing list archive at Nabble.com. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] questions about "metafor" package
Hi, Emilie. For your second question. You may check Gleser and Olkin (2009). They gave several formulas to estimate the sampling covariance for dependent effect sizes. One of them can be applied in your case. Gleser, L. J., & Olkin, I. (2009). Stochastically dependent effect sizes. In H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of research synthesis and meta-analysis. (2nd ed., pp. 357-376). New York: Russell Sage Foundation. Regards, Mike -- ----- Mike W.L. Cheung Phone: (65) 6516-3702 Department of Psychology Fax: (65) 6773-1843 National University of Singapore http://courses.nus.edu.sg/course/psycwlm/internet/ - On Sat, Aug 20, 2011 at 10:19 PM, Michael Dewey wrote: > At 16:21 17/08/2011, Emilie MAILLARD wrote: > >> Hello, >> à >> I would like to do a meta-analysis with the package ëà metaforà û. >> Ideally I would like to use a mixed model because Iââ¬â¢m interested to >> see >> the effect of some moderators. But the data set I managed to collect from >> literature presents two limits. >> à >> -à à à à à à à à Firstly, for each observation, I have means for a >> treatment and for a control, but I donââ¬â¢t always have corresponding >> standard deviations (52 of a total of 93 observations donââ¬â¢t have >> standard >> deviations). Nevertheless I have the sample sizes for all observations so I >> wonder if it was possible to weight observations by sample size in the >> package ëà metaforà û. >> -à à à à à à à à Secondly, some observations are probably not >> independent >> as I have sometimes several relevant observations for a same design. More >> precisely, for these cases, the control mean is identical but treatment >> means varied. Ideally, I would not like to do a weighted average for these >> non-independent observations because these observations represent levels of >> a moderator. I know that the package ëà metaforà û is not designed >> for the >> analysis of correlated outcomes. What are the dangers of using the package >> even if observations are not really independent ? à >> > > Emilie, > I am not sure whether this is the answer to your problem of observations > which are not independent but you might also look at the metaSEM package > http://courses.nus.edu.sg/**course/psycwlm/internet/**metaSEM/<http://courses.nus.edu.sg/course/psycwlm/internet/metaSEM/> > I am still trying to understand his paper on this (see link for reference) > but he is trying to embed meta-analysis within the structural equation > framework and it may be possible to cope with lack of independence in that > way. But as I say I am still trying to come to grips with the paper. > > > à >> >> Thank you for your help, >> à >> Ãâ°milie. >>[[alternative HTML version deleted]] >> > > Michael Dewey > i...@aghmed.fsnet.co.uk > http://www.aghmed.fsnet.co.uk/**home.html<http://www.aghmed.fsnet.co.uk/home.html> > > > __** > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/**listinfo/r-help<https://stat.ethz.ch/mailman/listinfo/r-help> > PLEASE do read the posting guide http://www.R-project.org/** > posting-guide.html <http://www.R-project.org/posting-guide.html> > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] write.table extra column
Thanks! On Mon, Aug 15, 2011 at 12:55 AM, Jeff Newmiller wrote: > Yup. > > Read ?write.csv and note the row.names argument. > --- > Jeff Newmiller The . . Go Live... > DCN: Basics: ##.#. ##.#. Live Go... > Live: OO#.. Dead: OO#.. Playing > Research Engineer (Solar/Batteries O.O#. #.O#. with > /Software/Embedded Controllers) .OO#. .OO#. rocks...1k > --- > Sent from my phone. Please excuse my brevity. > > Mike Hunter wrote: >> >> In the following data.frame there are 6 columns, but 7 are written to >> the CSV file. >> >> install.packages("pmlr") >> library(pmlr) >> data(enzymes) >> write.table(enzymes, sep=",", eol="\n",file="albert.csv") >> >> >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] write.table extra column
In the following data.frame there are 6 columns, but 7 are written to the CSV file. install.packages("pmlr") library(pmlr) data(enzymes) write.table(enzymes, sep=",", eol="\n",file="albert.csv") __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] S4 classes, some help with the basics
Thanks Duncan, Martin, You both provided exactly what I needed! Regards, Mike --- XKCD <http://www.xkcd.com> On Mon, Aug 8, 2011 at 5:21 PM, Duncan Murdoch wrote: > On 08/08/2011 8:04 PM, Mike Williamson wrote: > >> Hi All, >> >> I have tried to find an answer within documentation, but I cannot: >> >> o How can call a class "slot" without knowing the name a priori? >> > > See ?slot. > > > >> E.g., let's say I use the "pcaMethods" library to create a "pcaRes" >> object. How can I call parts of that object without using the specific >> names of the object in the call? >> >> example code: >> >> library(pcaMethods) >> myMatrix<- matrix(runif(1e4), ncol=100) ## silly, but sufficient for >> example >> myPCA<- pca(myMatrix) ## creates a "pcaRes" S4 class >> >> for (i in slotNames(myPCA) ) { >> summary(myPCA@i) ### I know this doesn't work, but this is the >> question... what grammar could I use? >> > > summary(slot(myPCA, i)) > > Duncan Murdoch > > } >> >> >> >> I would like to be able to print out the summary for each of the >> components >> of the class "pcaRes" without knowing a priori the names of those >> components. I could, for example in the above case, type in >> summary(myPCA@completeObs) to get a summary of the input matrix. But I >> HAVE >> TO TYPE "@completeObs". In the non-S4 world, I could type myPCA[[i]] for >> lists, where "i" could be looping through either the list names, or the >> list >> indices. Similarly, I could type myPCA[i] for arrays, where "i" again can >> be either a numeric index or the name. >> >> Without this ability to identify portions within an array / loop context, >> it >> becomes exceedingly difficult to work in "S4 land". How is this sort of >> thing done? >> >> Thank you! >> Mike >> >> >> >> --- >> XKCD<http://www.xkcd.com> >> >>[[alternative HTML version deleted]] >> >> __** >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/**listinfo/r-help<https://stat.ethz.ch/mailman/listinfo/r-help> >> PLEASE do read the posting guide http://www.R-project.org/** >> posting-guide.html <http://www.R-project.org/posting-guide.html> >> and provide commented, minimal, self-contained, reproducible code. >> > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] S4 classes, some help with the basics
Hi All, I have tried to find an answer within documentation, but I cannot: o How can call a class "slot" without knowing the name a priori? E.g., let's say I use the "pcaMethods" library to create a "pcaRes" object. How can I call parts of that object without using the specific names of the object in the call? example code: library(pcaMethods) myMatrix <- matrix(runif(1e4), ncol=100) ## silly, but sufficient for example myPCA <- pca(myMatrix) ## creates a "pcaRes" S4 class for (i in slotNames(myPCA) ) { summary(myPCA@i) ### I know this doesn't work, but this is the question... what grammar could I use? } I would like to be able to print out the summary for each of the components of the class "pcaRes" without knowing a priori the names of those components. I could, for example in the above case, type in summary(myPCA@completeObs) to get a summary of the input matrix. But I HAVE TO TYPE "@completeObs". In the non-S4 world, I could type myPCA[[i]] for lists, where "i" could be looping through either the list names, or the list indices. Similarly, I could type myPCA[i] for arrays, where "i" again can be either a numeric index or the name. Without this ability to identify portions within an array / loop context, it becomes exceedingly difficult to work in "S4 land". How is this sort of thing done? Thank you! Mike --- XKCD <http://www.xkcd.com> [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] running slope
Is there a function in R that will calculate a running linear slope similar to the way the function filter() will calculate a moving average? Mike Byerly Fisheries Biologist Commercial Fisheries Division Alaska Dept. of Fish and Game 3298 Douglas Place Homer, AK 99603 mike.bye...@alaska.gov <mailto:mike.bye...@alaska.gov> PH: 907-235-1745 FX: 907-235-2448 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Is R the right choice for simulating first passage times of random walks?
Top posting cuz hotmail decided not to highlight... Personally I would tend to use java or c++ for the inner loops but you could of course later make an R package out of that. This is especially true if your code will be used elsewhere in a performance critical system. For example, I wrote some c++ code for dealing with graphs nothing fancy but it let me play with some data structure ideas and I could then build it into standalone programs or perhaps an R package. Many of these things slow down due to memory incoherence or IO long before you use up the processor. With c++ in principle anyway you have a lot of control over these things. Once you have your results and want to anlyze them, that's when I would use R. Dumping simulation samples to a text file is easy to also lets you use other things like sed/grep or vi to explore as needed. From: paulepan...@users.sourceforge.net To: r-help@r-project.org Date: Thu, 28 Jul 2011 02:00:13 +0200 Subject: Re: [R] Is R the right choice for simulating first passage times of random walks? Dear R folks, Am Donnerstag, den 28.07.2011, 01:36 +0200 schrieb Paul Menzel: > I need to simulate first passage times for iterated partial sums. The > related papers are for example [1][2]. > > As a start I want to simulate how long a simple random walk stays > negative, which should result that it behaves like n^(-½). My code looks > like this. > > 8< code >8 > n = 10 # number of simulations > > length = 10 # length of iterated sum > z = rep(0, times=length + 1) > > for (i in 1:n) { > x = c(0, sign(rnorm(length))) > s = cumsum(x) > for (i in 1:length) { > if (s[i] < 0 && s[i + 1] >= 0) { > z[i] = z[i] + 1 > } > } > } > plot(1:length(z), z/n) > curve(x**(-0.5), add = TRUE) > 8< code >8 Of course the program above is not complete, because it only checks for the first passage from negativ to positive. `if (s[2] < 0) {}` should be added before the for loop. > This code already runs for over half an hour on my system¹. > > Reading about the for loop [3] it says to try to avoid loops and I > probably should use a matrix where every row is a sample. > > Now my first problem is that there is no matrix equivalent for > `cumsum()`. Can I use matrices to avoid the for loop? I mean the inner for loop. Additionally I wonder if `cumsum` is really faster or if I should sum the elements by myself and check after every step if the walk gets non-negative/0. With a length of 100 this should save some cycles. On the other hand adding numbers should be really fast and adding checks in between could potentially be slower. > My second question is, is R the right choice for such simulations? It > would be great when R can also give me a confidence interval(?) and also > try to fit a curve through the result and give me the rule of > correspondence(?) [4]. Do you have pointers for those? > > I glanced at simFrame [5] and read `?simulate` but could not understand > it right away and think that this might be overkill. > > Do you have any suggestions? Thanks, Paul > ¹ AMD Athlon(tm) X2 Dual Core Processor BE-2350, 2,1 GHz > > > [1] http://www-stat.stanford.edu/~amir/preprints/irw.ps > [2] http://arxiv.org/abs/0911.5456 > [3] http://cran.r-project.org/doc/manuals/R-intro.html#Repetitive-execution > [4] https://secure.wikimedia.org/wikipedia/en/wiki/Function_(mathematics) > [5] http://finzi.psych.upenn.edu/R/library/simFrame/html/runSimulation.html __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Life Cycle Assessment with R.
Date: Mon, 25 Jul 2011 19:03:08 +0100 From: jbustosm...@yahoo.es To: r-help@r-project.org Subject: [R] Life Cycle Assessment with R. Hello everyone, There's something really important about climate change and how many institutions around the globe are looking for softwares solutions in order to achieve they (and everyone) needs to improve life conditions in all the planet. Currently, they're many comercial softwares working with this important topic named as: "Life Cycle Assesment", monitoring carbon emition, but as many of you may know, commercial softwares controlling or managing our planet could be another big mistake. To sum up briefly, it could be a good idea creating a R package doing Life Cycle Assessment (if has not being created) in order to gain a better understanding and making these important decisions about global warming and how can we as humanity control how Carbon Footprint is measured by commercial or not commercial propouses. Who knows if there's people working in Life Cycle Assesment (carbon emition) with R? or If there's someone interested in doing a package about it, please let me know! I explain this here, because of the R philosophy.Best wishes!José BustosChilean Biostatistician www.aespro.cl [ my text below] Well, generally R packages are more general purpose tools than specific applications such as this- although there may be an iphone save the world app LOL. I have no idea but usually the issue here is getting the required data in an open form. Many govt agencies think excel is "open" and that you would not want to do an analysis that wasn't supported in such popular software. This comes up with financial data all the time, I even asked for account information in csv and I got a reply, "thank you for asking about exporting to excel" and a detailed explanation that was completely irrelevant. Modelling of complicated systems is often well complicated and predictions about the future involve assumptions that are often made to suit the needs of the immediate analyst. On topics which involve money or emotion, getting unbiased analysis is impossible and getting data can be very difficult. This list is not designed for advocacy or even discussion of analysis results but ability to get data in form usable by R may be or more general interest to those seeking help on R. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Is there an R program that produces optimal solution/mix of multiple samples' varying volumes and values
> Date: Mon, 25 Jul 2011 11:39:22 -0700 > From: lukescore...@gmail.com > To: r-help@r-project.org > Subject: [R] Is there an R program that produces optimal solution/mix of > multiple samples' varying volumes and values > > Sorry about the lengthy subject line. > Anyone know of an R' program that can look at several sources' varying > available volumes/amounts and their individual set of values, compared > to a "target" range/curve for these values, to find the optimal > mixture(s) of these possible sources for the desired curve, and for a > specified amount? I hope that makes sense as a reader. Well, whatever you are talking about you need to write some error as a function of your fit parameters and decide if you can minimize it analytically or not. If you can do it analytically, then there are probably matrix functions you want. If not, there are several optimizers that should come up on google. If you are trying to express some data in terms of nice basis set that may be different than unknown collection of things. For example, if you have sin/cos curves fft may work, polynomial something else etc. Often when people ask questions like this, they try to fit to a collection of things they think may work and then the optimizer gets stuck since it can't optimize a and b for a*x + b*x etc so make sure your error function has a specific "best" fit etc. Generally coding mistakes can be addressed on the list if you get that far. > > Thanks for your time. > Luke __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] xml2-config issues
Date: Fri, 22 Jul 2011 20:06:34 -0600 From: abmathe...@gmail.com To: r-help@r-project.org Subject: [R] xml2-config issues I'm trying to install the XML package on Ubuntu 10.10, and I keep getting a warning message the XML could not be found and had non-zero exit status. How can I fix this problem? > install.packages() Loading Tcl/Tk interface ... done --- Please select a CRAN mirror for use in this session --- Installing package(s) into ‘/home/amathew/R/i686-pc-linux-gnu-library/2.13’ (as ‘lib’ is unspecified) trying URL ' http://streaming.stat.iastate.edu/CRAN/src/contrib/XML_3.4-0.tar.gz' Content type 'application/x-gzip' length 896195 bytes (875 Kb) opened URL == downloaded 875 Kb * installing *source* package ‘XML’ ... checking for gcc... gcc checking for C compiler default output file name... a.out checking whether the C compiler works... yes checking whether we are cross compiling... no checking for suffix of executables... checking for suffix of object files... o checking whether we are using the GNU C compiler... yes checking whether gcc accepts -g... yes checking for gcc option to accept ISO C89... none needed checking how to run the C preprocessor... gcc -E No ability to remove finalizers on externalptr objects in this verison of R checking for sed... /bin/sed checking for pkg-config... /usr/bin/pkg-config checking for xml2-config... no Cannot find xml2-config ERROR: configuration failed for package ‘XML’ * removing ‘/home/amathew/R/i686-pc-linux-gnu-library/2.13/XML’ [ my text, hotmail won't highlight original ] you probably need to get libxml2 and then you should be able to run xml2-config from command line to verfiy it is there. This is a specific pkg-config ( man pkg-config for details ). For some non-R things yum or apt-get has not gotten the most recent versions but generally your package manager should be just fine. The downloaded packages are in ‘/tmp/Rtmp2V4huR/downloaded_packages’ Warning message: In install.packages() : installation of package 'XML' had non-zero exit status Thank You, Abraham [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Cox model approximaions (was "comparing SAS and R survival....)
> From: thern...@mayo.edu > To: abouesl...@gmail.com > Date: Fri, 22 Jul 2011 07:04:15 -0500 > CC: r-help@r-project.org > Subject: Re: [R] Cox model approximaions (was "comparing SAS and R > survival) > > For time scale that are truly discrete Cox proposed the "exact partial > likelihood". I call that the "exact" method and SAS calls it the > "discrete" method. What we compute is precisely the same, however they > use a clever algorithm which is faster. To make things even more > confusing, Prentice introduced an "exact marginal likelihood" which is > not implemented in R, but which SAS calls the "exact" method. > > Data is usually not truly discrete, however. More often ties are the > result of imprecise measurement or grouping. The Efron approximation > assumes that the data are actually continuous but we see ties because of > this; it also introduces an approximation at one point in the > calculation which greatly speeds up the computation; numerically the > approximation is very good. > In spite of the irrational love that our profession has for anything > branded with the word "exact", I currently see no reason to ever use > that particular computation in a Cox model. I'm not quite ready to > remove the option from coxph, but certainly am not going to devote any > effort toward improving that part of the code. > > The Breslow approximation is less accurate, but is the easiest to > program and therefore was the only method in early Cox model programs; > it persists as the default in many software packages because of history. > Truth be told, unless the number of tied deaths is quite large the > difference in results between it and the Efron approx will be trivial. > > The worst approximation, and the one that can sometimes give seriously > strange results, is to artificially remove ties from the data set by > adding a random value to each subject's time. Care to elaborate on this at all? First of course I would agree that doing anything to the data, or making up data, and then handing it to an analysis tool that doesn't know you maniplated it can be a problem ( often called interpolation or something with a legitimate name LOL). However, it is not unreasonable to do a sensitivity analysis by adding noise and checking the results. Presumaably adding noise to remove things the algorighm doesn't happen to like would work but you would need to take many samples and examine stats of how your broke the ties. Now if the model is bad to begin with or the data is so coarsely binned that you can't get much out of it then ok. I guess in this case, having not thought about it too much, ties would be most common either with lots of data or if hazards spiked over time scales simlar to your measurement precision or if the measurement resolution is not comparable to hazard rate. In the latter 2 cases of course the approach is probably quite limited . Consider turning exponential curves into step functions for example. > > Terry T > > > --- begin quote -- > I didn't know precisely the specifities of each approximation method. > I thus came back to section 3.3 of Therneau and Grambsch, Extending the > Cox > Model. I think I now see things more clearly. If I have understood > correctly, both "discrete" option and "exact" functions assume "true" > discrete event times in a model approximating the Cox model. Cox partial > likelihood cannot be exactly maximized, or even written, when there are > some > ties, am I right ? > > In my sample, many of the ties (those whithin a single observation of > the > process) are due to the fact that continuous event times are grouped > into > intervals. > > So I think the logistic approximation may not be the best for my problem > despite the estimate on my real data set (shown on my previous post) do > give [[elided Hotmail spam]] > I was thinking about distributing the events uniformly in each interval. > What do you think about this option ? Can I expect a better > approximation > than directly applying Breslow or Efron method directly with the grouped > event data ? Finally, it becomes a model problem more than a > computationnal > or algorithmic one I guess. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Different result of multiple regression in R and SPSS
> From: dwinsem...@comcast.net > To: seoulseoulse...@gmail.com > Date: Tue, 19 Jul 2011 18:45:47 -0400 > CC: r-help@r-project.org > Subject: Re: [R] Different result of multiple regression in R and SPSS > > > On Jul 19, 2011, at 6:29 PM, J. wrote: > > > Thanks for the answer. > > > > However, I am still curious about which result I should use? The > > result from > > R or the one from SPSS? > > It is becoming apparent that you do not know how to use the results > from either system. The progress of science would be safer if you get > some advice from a person that knows what they are doing. > > > Why the results from two programs are different? > > Different parametrizations. If I had to guess I would bet that the > gender coefficient is R is exactly twice that of the one from SPSS. > They are probably both correct in the context of their respective > codings. I guess I would also suggest, again, run some samples with known data sets and see what you get(RSSWKDSASWYG). You would want to do this anyway if you want to insure your real data is being used reasonably. You still need to have some way to check your opinion from the expert mentioned above and known data will help there too. A factor of 2 often shows up from just looking at pictures once you have some intuition. I've often been wrong on intuition, but chasing it down and proving it helps you learn a lot :) > > -- > David Winsemius, MD > West Hartford, CT > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.