Re: [R] Problem in installing and starting Rattle
On 15 November 2010 15:30, kgorahava <kgorah...@gmail.com> wrote:

> Hello, I am trying to install and run Rattle on my Dell laptop running
> Windows 7. The following three commands executed successfully:
>
>   install.packages("RGtk2")
>   install.packages("rattle")
>   library(rattle)
>   Rattle: Graphical interface for data mining using R.
>   Version 2.5.47 Copyright (c) 2006-2010 Togaware Pty Ltd.
>   Type 'rattle()' to shake, rattle, and roll your data.
>
> When I type the command below, I get an error message:
>
>   rattle()
>   Error in inDL(x, as.logical(local), as.logical(now), ...) :
>     unable to load shared object
>     'C:/Users/Kaushik/Documents/R/win-library/2.12/RGtk2/libs/i386/RGtk2.dll':
>     LoadLibrary failure: The specified module could not be found.
>   Failed to load RGtk2 dynamic library, attempting to install it.
>   trying URL 'http://downloads.sourceforge.net/gtk-win/gtk2-runtime-2.22.0-2010-10-21-ash.exe?download'
>   Content type 'application/octet-stream' length 7820679 bytes (7.5 Mb)
>   opened URL
>   downloaded 7.5 Mb
>   Learn more about GTK+ at http://www.gtk.org
>   If the package still does not load, please ensure that GTK+ is installed
>   and that it is on your PATH environment variable
>   IN ANY CASE, RESTART R BEFORE TRYING TO LOAD THE PACKAGE AGAIN
>   Error in as.GType(type) : Cannot convert RGtkBuilder to GType
>
>   rattle()
>   Error in as.GType(type) : Cannot convert RGtkBuilder to GType
>
> Why do I get this error message? Please advise.

It looks like you did not install the GTK+ runtime (which is independent of R) first. But in the process, RGtk2 noticed this, and it looks like it installed it okay. You might like to follow the instructions that are shouted at you in all capitals:

  IN ANY CASE, RESTART R BEFORE TRYING TO LOAD THE PACKAGE AGAIN

Hope that works.
Installation instructions for Rattle can be found at:
http://datamining.togaware.com/survivor/Install_MS_Windows.html

Regards,
Graham

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
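Graham's advice hinges on GTK+ being reachable via the PATH once R restarts. As a quick sanity check (a sketch, not Rattle's own diagnostic; the "gtk" directory-name pattern is an assumption and may not match your installer), you can inspect the PATH that R actually sees:

```r
# Split the PATH the R process sees into its entries.
path_entries <- strsplit(Sys.getenv("PATH"), .Platform$path.sep)[[1]]

# Look for anything GTK-related; the pattern "gtk" is an assumption --
# the runtime installer may have used a different directory name.
has_gtk <- any(grepl("gtk", path_entries, ignore.case = TRUE))
print(has_gtk)
```

If this is FALSE after installing the GTK+ runtime, the environment change has not reached the R session, which is exactly what restarting R fixes.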
Re: [R] Problems loading xlsx
Thanks Jim, that is my second-best option, but I would hope to get xlsx working as it seems a more convenient fit to xlsx files.

Paolo

On 15 November 2010 00:12, jim holtman <jholt...@gmail.com> wrote:

> If you are running on Windows, I would suggest that you use the RODBC
> package and the odbcConnectExcel2007 function. I have had reasonable
> success with this.
>
> On Sun, Nov 14, 2010 at 3:23 PM, Paolo Rossi
> <statmailingli...@googlemail.com> wrote:
>
>> Hi all, I am trying to use the package xlsx to read Excel 2007 files
>> and I am getting the error below.
>>
>>   library(xlsx)
>>   Loading required package: xlsxjars
>>   Loading required package: rJava
>>   Error : .onLoad failed in loadNamespace() for 'xlsxjars', details:
>>     call: .jinit()
>>     error: cannot obtain Class.getSimpleName method ID
>>   Error: package 'xlsxjars' could not be loaded
>>
>> By looking this up in the mailing list I have seen that it is an error
>> related to the configuration of the PATH. I was also made aware that
>> the PATH read into R gets truncated if it is too long. To avoid any
>> issue I have added the JRE at the very beginning of the PATH - see below.
>>
>>   p <- Sys.getenv("PATH")
>>   strsplit(p, ";")
>>   $PATH
>>   [1] "c:\\Program Files\\Java\\j2re1.4.2_06\\bin\\client\\"
>>   [2] "c:\\Program Files\\Java\\jre1.5.0_06\\bin\\client\\"
>>   [3] "c:\\oracle\\ora92\\bin\\"
>>   [4] "c:\\oracle\\ora92\\jre\\1.4.2\\bin\\"
>>   [5] "c:\\oracle\\ora92\\jre\\1.4.2\\bin\\client\\"
>>   [6] "c:\\program files\\oracle\\jre\\1.3.1\\bin\\"
>>   [7] "C:\\WINDOWS\\system32"
>>   [8] "C:\\WINDOWS"
>>
>> In the PATH variable the items have been pasted like this:
>>
>>   c:\Program Files\Java\j2re1.4.2_06\bin\client\;c:\Program Files\Java\jre1.5.0_06\bin\client\;
>>
>> The issue still persists. Can you please help?
>>
>> Thanks
>> Paolo
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
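Since rJava fails when the Java DLLs are not found via the PATH that R sees, one session-level workaround (a sketch; the JRE directory below is copied from the poster's listing and is an assumption, not a known-good location) is to prepend the Java client directory before loading the package:

```r
# A Windows-style Java client directory; adjust to where your JRE's
# jvm.dll actually lives (this particular path is the poster's).
java_bin <- "c:\\Program Files\\Java\\jre1.5.0_06\\bin\\client"

# Prepend it to the PATH for this R session only (Windows separates
# PATH entries with ";").
old_path <- Sys.getenv("PATH")
Sys.setenv(PATH = paste(java_bin, old_path, sep = ";"))

# Verify that the new entry is now first.
first_entry <- strsplit(Sys.getenv("PATH"), ";", fixed = TRUE)[[1]][1]
print(first_entry)
```

This only affects the running R session, so it sidesteps any truncation of the system-wide PATH.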
Re: [R] R package 'np' problems
Chris Carleton wrote:

> Hi List,
>
> I'm trying to get a density estimate for a point of interest from an
> npudens object created for a sample of points. I'm working with 4
> variables in total (3 continuous and 1 unordered discrete - the discrete
> variable is the character column in training.csv). When I try to
> evaluate the density for a point that was not used in the training
> dataset, and when I extract the fitted values from the npudens object
> itself, I'm getting values that are much greater than 1 in some cases,
> which, if I understand correctly, shouldn't be possible considering a
> pdf estimate can only be between 0 and 1. I think I must be doing
> something wrong, but I can't see it. Attached I've included the training
> data (training.csv) and the point of interest (origin.csv); below I've
> included the code I'm using and the results I'm getting.
>
> I also don't understand why, when trying to evaluate the npudens object
> at one point, I'm receiving the same set of fitted values from the
> npudens object with the predict() function. It should be noted that I'm
> indexing the dataframe of training data in order to get samples of the
> df for density estimation (the samples are from different geographic
> locations measured on the same set of variables; hence my use of
> sub-setting by [i] and removing columns from the df before running the
> density estimation). Moreover, in the example I'm providing here, the
> point of interest does happen to come from the training dataset, but I'm
> receiving the same results when I compare the point of interest to
> samples of which it is not a part (density estimates that are either
> extremely small, which is acceptable, or much greater than one, which
> doesn't seem right to me).
>
> Any thoughts would be greatly appreciated,
>
> Chris

I haven't looked at this in any detail, but why do you say that pdf values cannot exceed 1? That's certainly not true in general.
-Peter Ehlers

>   fitted(npudens(tdat=training_df[training_cols_select][training_df$cat == i,]))
>    [1] 7.762187e+18 9.385532e+18 6.514318e+18 7.583486e+18 6.283017e+18
>    [6] 6.167344e+18 9.820551e+18 7.952821e+18 7.882741e+18 1.744266e+19
>   [11] 6.653258e+18 8.704722e+18 8.631365e+18 1.876052e+19 1.995445e+19
>   [16] 2.323802e+19 1.203780e+19 8.493055e+18 8.485279e+18 1.722033e+19
>   [21] 2.227207e+19 2.177740e+19 2.168679e+19 9.329572e+18 9.380505e+18
>   [26] 1.023311e+19 2.109676e+19 7.903112e+18 7.935457e+18 8.91e+18
>   [31] 8.899827e+18 6.265440e+18 6.204720e+18 6.276559e+18 6.218002e+18
>
>   npu_dens <- npudens(tdat=training_df[training_cols_select][training_df$cat == i,])
>   summary(npu_dens)
>
>   Density Data: 35 training points, in 4 variable(s)
>                 aster_srtm_aspect aster_srtm_dem_filled aster_srtm_slope
>   Bandwidth(s):          29.22422          2.500559e-24         3.111467
>                 class_unsup_pc_iso
>   Bandwidth(s):          0.2304616
>
>   Bandwidth Type: Fixed
>   Log Likelihood: 1531.598
>
>   Continuous Kernel Type: Second-Order Gaussian
>   No. Continuous Vars.: 3
>   Unordered Categorical Kernel Type: Aitchison and Aitken
>   No. Unordered Categorical Vars.: 1
>
>   predict(npu_dens, newdata=origin[training_cols_select])
>    [1] 7.762187e+18 9.385532e+18 6.514318e+18 7.583486e+18 6.283017e+18
>   [... same 35 values as the fitted() output above ...]
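Peter's point is easy to demonstrate: a density must integrate to 1, but its value at any single point is unbounded. Note also the summary above, where one bandwidth is about 2.5e-24; such an extreme concentration is exactly what produces enormous density values. A minimal illustration in base R:

```r
# A pdf's *values* are not bounded by 1; only its integral is.
# A normal density with a small standard deviation peaks well above 1:
peak <- dnorm(0, mean = 0, sd = 0.05)
print(peak)    # about 7.98

# ...yet the density still integrates to 1 over its support:
area <- integrate(dnorm, -1, 1, mean = 0, sd = 0.05)$value
print(area)
```

As the bandwidth shrinks toward 0, the peak grows without bound, which is what the near-zero bandwidth in the summary has done here.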
[R] Version 2.12.0 exe file
I have R version 2.9.1 on my computer and the analysis is not working because I need to update to R version 2.12.0, the latest release. The person in charge of IT tried to download R version 2.12.0, but the .exe file referenced in the install instructions isn't there. What might we be doing wrong? We have downloaded the tar.gz, uncompressed it, and looked for the .exe file, but it is not present.

Many thanks,
Morris

Morris A. Anglin
Doctoral candidate
Institute of Health and Society
Newcastle University
Baddiley-Clark Building
Richardson Road
NE2 4AX UK
Direct line at Newcastle University: +44 (0)191 222 8899
Direct line at Freeman Hospital: Tel: +44 (0) 191 233 6161 ext 26727
Mobile: +44 (0)07976206103
Email: morris.ang...@ncl.ac.uk
Email: morris.ang...@nuth.nhs.uk
http://www.ncl.ac.uk/ihs/postgrad/research/studentprofile.htm

"To me, diversity is more complicated than rituals, food, clothing, etc. ...or race, language, and class, etc. To me, diversity needs to include divergent viewpoints, divergent interpretations, and divergent perspectives. Thus, diversity needs to include giving power and sanctioned space for divergent voices, even if this means disagreeing."
Re: [R] cannot see the y-labels (getting cut-off)
jim holtman wrote:

> increase the margins on the plot:
>
>   par(mar = c(4, 7, 2, 1))
>   plot(1:5, y, ylab = '', yaxt = 'n')
>   axis(2, at = y, labels = formatC(y, big.mark = ",", format = "fg"), las = 2, cex = 0.1)

That's what I would do, but if you want to see how cex works, use cex.axis=0.5. Check out ?par.

-Peter Ehlers

> On Sun, Nov 14, 2010 at 6:03 PM, sachinthaka.abeyward...@allianz.com.au wrote:
>
>> Hi All,
>>
>> When I run the following code, I cannot see the entire number: as
>> opposed to seeing 1,000,000,000, I only see 000,000 because the rest is
>> cut off. The cex option doesn't seem to be doing anything at all.
>>
>>   y <- seq(1e09, 5e09, 1e09)
>>   plot(1:5, y, ylab = '', yaxt = 'n')
>>   axis(2, at = y, labels = formatC(y, big.mark = ",", format = "fg"), las = 2, cex = 0.1)
>>
>> Any thoughts?
>>
>> Thanks,
>> Sachin
>>
>> p.s. sorry about corporate notice.
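Both suggestions can be combined; below is a self-contained version (with the quoting that was stripped from the original post restored). The key points: par(mar=) widens the left margin so the full label fits, and cex.axis, not cex, controls the axis label size:

```r
y <- seq(1e9, 5e9, 1e9)

# formatC with big.mark produces the comma-grouped labels:
labs <- formatC(y, big.mark = ",", format = "fg")
print(labs[1])   # "1,000,000,000"

# Widen the left margin to 7 lines so the long labels are not cut off,
# and shrink the axis labels with cex.axis (plain cex is ignored here):
par(mar = c(4, 7, 2, 1))
plot(1:5, y, ylab = "", yaxt = "n")
axis(2, at = y, labels = labs, las = 2, cex.axis = 0.5)
```

The cut-off labels in the original were a margin problem, not a formatting one: the labels were already fully formed, but the default left margin was too narrow to show them.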
Re: [R] Version 2.12.0 exe file
Morris Anglin wrote:

> I have R version 2.9.1 on my computer and the analysis is not working
> because I need to update to R version 2.12.0, the latest release. The
> person in charge of IT tried to download R version 2.12.0, but the .exe
> file referenced in the install instructions isn't there. What might we
> be doing wrong? We have downloaded the tar.gz, uncompressed it, and
> looked for the .exe file, but it is not present.

(It's a good idea to state your OS.)

Anyway, assuming that you got your download from your favourite CRAN repository, you may have missed these two sentences:

  "The sources have to be compiled before you can use them. If you do not
  know what this means, you probably do not want to do it!"

Pre-compiled versions are available at CRAN.

-Peter Ehlers
[R] Path to nodes in ctree package party
Hello list,

I'm wondering if there is a way to extract the path to terminal nodes in a BinaryTree object, e.g. from ctree, like the function path.rpart in package rpart.

Thanks,
Sven
Re: [R] Problem in installing and starting Rattle
kgorahava wrote:

> I am trying to install and run Rattle on my Dell laptop running Windows 7.
>
>   install.packages("RGtk2")
>   install.packages("rattle")
>   library(rattle)
>   Rattle: Graphical interface for data mining using R.
>   Version 2.5.47 Copyright (c) 2006-2010 Togaware Pty Ltd.
>   Type 'rattle()' to shake, rattle, and roll your data.
>
> When I type the below command, I get an error message.

This had bugged me before, and reinstalling RGtk2 over and over did not help. I got it to work after a total cleanup of all GTK-related files (use a global search!) and reinstalling GTK+. So old-version files had gotten in the way.

Dieter
Re: [R] Problem in installing and starting Rattle
On Mon, 15 Nov 2010, Dieter Menne wrote:

> kgorahava wrote:
>
>> I am trying to install and run Rattle on my Dell laptop running
>> Windows 7.
>>
>>   install.packages("RGtk2")
>>   install.packages("rattle")
>>   library(rattle)
>>   [...]
>>
>> When I type the below command, I get an error message.
>
> This had bugged me before, and reinstalling RGtk2 over and over did not
> help. I got it to work after a total cleanup of all GTK-related files
> (use a global search!) and reinstalling GTK+. So old-version files had
> gotten in the way.

If they are in your PATH, they will (and we do also warn that 32-bit DLLs in the path for 64-bit R, or vice versa, will lead to confusion). So just check your PATH: you can have multiple Gtk+ installations, and you need to if you use both 32- and 64-bit R (but you need to set the PATH appropriately).

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road,        +44 1865 272866 (PA)
Oxford OX1 3TG, UK         Fax: +44 1865 272595
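To tell which flavour of R a session is running (and hence which Gtk+ build must come first on the PATH), a quick check from within R:

```r
# 8-byte pointers mean 64-bit R; 4-byte pointers mean 32-bit R.
bits <- 8 * .Machine$sizeof.pointer
cat("This R session is", bits, "bit\n")

# The reported architecture (e.g. "x86_64" or "i386") tells the same story:
print(R.version$arch)
```

Whichever this reports, the matching Gtk+ build must appear on the PATH ahead of any other Gtk+ installation.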
[R] Problem with lme4 and lapack library
I have R under Linux Kubuntu Hardy Heron 8.04 LTS. I am trying to install the package 'lme4' and get the following error:

  * installing *source* package 'lme4' ...
  ** libs
  gcc -std=gnu99 -I/usr/share/R/include -I/usr/share/R/include -I/usr/lib/R/library/Matrix/include -I/usr/lib/R/library/stats/include -fpic -g -O2 -c init.c -o init.o
  gcc -std=gnu99 -I/usr/share/R/include -I/usr/share/R/include -I/usr/lib/R/library/Matrix/include -I/usr/lib/R/library/stats/include -fpic -g -O2 -c lmer.c -o lmer.o
  gcc -std=gnu99 -I/usr/share/R/include -I/usr/share/R/include -I/usr/lib/R/library/Matrix/include -I/usr/lib/R/library/stats/include -fpic -g -O2 -c local_stubs.c -o local_stubs.o
  gcc -std=gnu99 -shared -o lme4.so init.o lmer.o local_stubs.o -L/usr/lib64/R/lib -lRlapack -lblas -lgfortran -lm -lgcc_s -L/usr/lib64/R/lib -lR
  /usr/bin/ld: cannot find -lRlapack
  collect2: ld returned 1 exit status
  make: *** [lme4.so] Error 1
  ERROR: compilation failed for package 'lme4'
  * removing '/usr/local/lib/R/site-library/lme4'

  The downloaded packages are in
  '/tmp/Rtmp7uCMsh/downloaded_packages'
  Warning messages:
  In install.packages("lme4") :
    installation of package 'lme4' had non-zero exit status

Searching the internet and the R help system, it seems the problem is related to the LAPACK, BLAS and ATLAS libraries.
R tries, unsuccessfully, to find -lRlapack. I have checked which libraries R is pointing to:

  bu...@temisto:~$ ldd /usr/lib/R/bin/exec/R
        linux-vdso.so.1 => (0x7fff1e10)
        libR.so => /usr/lib/R/lib/libR.so (0x7f9157c54000)
        libc.so.6 => /lib/libc.so.6 (0x7f91578f2000)
        libblas.so.3gf => /usr/lib/atlas/libblas.so.3gf (0x7f9156f42000)
        libgfortran.so.2 => /usr/lib/libgfortran.so.2 (0x7f9156c83000)
        libm.so.6 => /lib/libm.so.6 (0x7f9156a02000)
        libreadline.so.5 => /lib/libreadline.so.5 (0x7f91567c2000)
        libpcre.so.3 => /usr/lib/libpcre.so.3 (0x7f915659c000)
        libz.so.1 => /usr/lib/libz.so.1 (0x7f9156385000)
        libdl.so.2 => /lib/libdl.so.2 (0x7f9156181000)
        /lib64/ld-linux-x86-64.so.2 (0x7f91581c)
        libncurses.so.5 => /lib/libncurses.so.5 (0x7f9155f46000)

I have also checked that the LAPACK and ATLAS libraries are installed:

  bu...@temisto:~$ dpkg -l | grep lapack
  ii lapack3        3.0.2531a-6.1ubuntu1
  ii liblapack-dev  3.1.1-0.3ubuntu2
  ii liblapack3gf   3.1.1-0.3ubuntu2
  bu...@temisto:~$ dpkg -l | grep atlas
  ii atlas3-base        3.6.0-20.6
  ii libatlas-base-dev  3.6.0-21.1ubuntu3
  ii libatlas-headers   3.6.0-21.1ubuntu3
  ii libatlas3gf-base   3.6.0-21.1ubuntu3

And what R itself reports:

  R CMD config BLAS_LIBS
  -lblas
  R CMD config LAPACK_LIBS
  -L/usr/lib64/R/lib -lRlapack

R seems to be looking for LAPACK in /usr/lib64/R/lib (-lRlapack), probably the wrong place. Can somebody point me to a possible solution?

--
Dr Javier Bustamante
Estación Biológica de Doñana, CSIC
Dept. Wetland Ecology
Américo Vespucio s/n
41092 - Sevilla, Spain
voice: +34 954 466700 ext. 1217
fax: +34 954 621125
e-mail: jbustama...@ebd.csic.es
http://www.ebd.csic.es/bustamante/index.html
other links:
http://last-ebd.blogspot.com/
http://euroconbio.blogspot.com/
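Whether a given R installation ships its own Rlapack can be checked from within R; the sketch below only inspects the library directory (the exact shared-library file name is platform-specific and is an assumption):

```r
# Directory where this R installation keeps its private shared libraries:
libdir <- R.home("lib")
print(libdir)

# On builds that bundle R's internal LAPACK, libRlapack lives here; on
# builds linked against an external LAPACK (as some distribution
# packages are), it does not -- and then '-lRlapack' fails at link time.
file.exists(file.path(libdir, "libRlapack.so"))
```

Comparing this directory against the `-L/usr/lib64/R/lib` that `R CMD config LAPACK_LIBS` reports would show whether the configured link path matches the actual installation.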
Re: [R] Sweave question
Duncan Murdoch-2 wrote:

> See SweavePDF() in the patchDVI package on R-forge.

In case googling patchDVI only shows a few Japanese pages, and a search for patchDVI on R-Forge gives nothing: try

  https://r-forge.r-project.org/projects/sweavesearch/

(or did I miss something obvious, Duncan?)

Dieter
Re: [R] Sweave question
On 15/11/2010 6:22 AM, Dieter Menne wrote:

> Duncan Murdoch-2 wrote:
>
>> See SweavePDF() in the patchDVI package on R-forge.
>
> In case googling patchDVI only shows a few Japanese pages, and a search
> for patchDVI on R-Forge gives nothing: try
> https://r-forge.r-project.org/projects/sweavesearch/
> (or did I miss something obvious, Duncan?)

No, I just didn't realize that it was hard to find. But you can always select R-forge as a repository, and then install.packages() will find it.

Duncan Murdoch
[R] mgarch-BEKK
Dear all,

Can anybody help me with mgarchBEKK? After estimating a BEKK model, I want to check whether the residuals meet the required assumptions. Can I perform a portmanteau test and the ARCH-LM test, and plot the ACF and PACF of the residuals? Can you give an example script in R?

Thank you so much.
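mgarchBEKK aside, the diagnostics themselves are standard; here is a sketch on a simulated series, where `res` is a stand-in for one series of standardized residuals extracted from your fitted BEKK object (how you extract them depends on the package):

```r
set.seed(42)
# Stand-in for standardized residuals from a fitted model (assumption:
# replace with the residual series from your own BEKK fit).
res <- rnorm(500)

# Portmanteau (Ljung-Box) test for remaining autocorrelation:
lb <- Box.test(res, lag = 10, type = "Ljung-Box")
print(lb)

# A simple check for remaining ARCH effects: Ljung-Box on the squared
# residuals (the formal ARCH-LM test lives in FinTS::ArchTest, not base R):
lb_sq <- Box.test(res^2, lag = 10, type = "Ljung-Box")
print(lb_sq)

# ACF and PACF plots of the residuals:
acf(res)
pacf(res)
```

Large p-values in both tests are consistent with the residuals being free of autocorrelation and ARCH effects; for a multivariate model, you would run these on each residual series (and ideally a multivariate portmanteau test as well).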
[R] comparing levels of aggregation with negative binomial models
Dear R community,

I would like to compare the degree of aggregation (or dispersion) of bacteria isolated from plant material. My data are discrete counts from leaf washes. While I do have xy coordinates for each plant, it is aggregation in the sense of the concentration of bacteria in high-density patches that I am interested in.

My attempt to analyze this was to fit negative binomial GLMs to each of my leaf treatments (using MASS) and to compare estimates of theta, using the standard errors to calculate confidence limits. My values of theta (se) were 0.387 (0.058) and 0.1035 (0.015), which were in the right direction for my hypothesis. However, some of the stats literature suggests that the confidence intervals of theta (or k) are not very robust and that it would be better to calculate confidence intervals for 1/k.

Is there a way I can estimate confidence intervals for 1/k in R, or indeed a more elegant way of looking at aggregation?

Many thanks for your time.

yours,

Dr Ben Raymond
NERC Advanced Research Fellow, Lecturer in Population Genetics
School of Biological Sciences, Royal Holloway University of London
Egham, Surrey, TW20 0EX
tel 0044 1784443547
ben.raym...@rhul.ac.uk
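One route to a confidence interval for 1/k is the delta method: since the derivative of 1/theta is -1/theta^2, se(1/theta) is approximately se(theta)/theta^2. A sketch using the first posted estimate (this is a normal-approximation interval; a profile-likelihood or bootstrap interval would be more defensible if the sampling distribution of theta is strongly skewed):

```r
# Delta-method CI for 1/theta, with the posted estimates as inputs.
theta    <- 0.387
se_theta <- 0.058

inv_theta <- 1 / theta
# Delta method: Var(1/theta) ~ Var(theta) / theta^4
se_inv <- se_theta / theta^2

ci <- inv_theta + c(-1, 1) * 1.96 * se_inv
round(c(estimate = inv_theta, lower = ci[1], upper = ci[2]), 3)
# estimate    lower    upper
#    2.584    1.825    3.343
```

Running the same calculation on the second estimate (0.1035, se 0.015) and checking whether the two 1/k intervals overlap gives a crude comparison of aggregation between the treatments.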
[R] How to move an internal function to external keeping same environment?
Hi

I have, within a quite big function foo1, an internal function foo2. Now, in order to have cleaner code, I wish to have the internal foo2 as external. This foo2 was using arguments within the foo1 environment that were not declared as inputs of foo2, which works as long as foo2 is within foo1, but not anymore once foo2 is external, as is the case now.

Now, I could add all those arguments as inputs to foo2, but I feel that if foo2 is called often, I would be copying those objects more than required. Am I wrong? I then used this to avoid declaring each argument to foo2 explicitly:

  foo1 <- function(x) {
    b <- x[1] + 2
    environment(foo2) <- new.env(parent = as.environment(-1))
    c <- foo2(x)
    return(c)
  }

  foo2 <- function(x) x * b

  # try:
  foo1(1:100)

This works. But I wanted to be sure:

- am I right that if I instead declare each element to be passed to foo2, this would be more copying than required? (imagine b in my case is a heavy dataset and foo2 a long computation)
- is the line environment(foo2) <- new.env(parent = as.environment(-1)) the good way to do it, or can it have unwanted implications?

Thanks a lot!!

Matthieu
Re: [R] How to move an internal function to external keeping same environment?
On Mon, Nov 15, 2010 at 7:48 AM, Matthieu Stigler <matthieu.stig...@gmail.com> wrote:

> I have, within a quite big function foo1, an internal function foo2.
> [...] This foo2 was using arguments within the foo1 environment that
> were not declared as inputs of foo2, which works as long as foo2 is
> within foo1, but not anymore once foo2 is external. [...]
>
>   foo1 <- function(x) {
>     b <- x[1] + 2
>     environment(foo2) <- new.env(parent = as.environment(-1))
>     c <- foo2(x)
>     return(c)
>   }
>
>   foo2 <- function(x) x * b
>
>   # try:
>   foo1(1:100)
>
> [...] is the line environment(foo2) <- new.env(parent = as.environment(-1))
> the good way to do it, or can it have unwanted implications?

This would be good enough (replacing your environment(foo2) <- ... line):

  environment(foo2) <- environment()

If you add parameters to foo2, it won't actually copy them unless they are modified in foo2.

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
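Gabor's one-liner in a self-contained form, using the poster's toy functions (assignment arrows restored from the garbled post). Note that environment(foo2) <- environment() inside foo1 modifies a *local* copy of foo2 in foo1's frame, so the global foo2 is left untouched:

```r
foo2 <- function(x) x * b   # 'b' is free; it is looked up in foo2's environment

foo1 <- function(x) {
  b <- x[1] + 2
  # Point (a local copy of) foo2 at foo1's evaluation frame, so the
  # free variable 'b' resolves to the local b above:
  environment(foo2) <- environment()
  foo2(x)
}

foo1(1:100)[1:3]   # b = 3, so the first values are 3 6 9
```

Because the replacement assignment creates the modified foo2 locally inside foo1, repeated calls do not interfere with each other or with any other caller of the global foo2.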
Re: [R] How to move an internal function to external keeping same environment?
On 15/11/2010 7:48 AM, Matthieu Stigler wrote:

> I have, within a quite big function foo1, an internal function foo2.
> [...] I then used this to avoid declaring each argument to foo2
> explicitly:
>
>   foo1 <- function(x) {
>     b <- x[1] + 2
>     environment(foo2) <- new.env(parent = as.environment(-1))
>     c <- foo2(x)
>     return(c)
>   }
>
>   foo2 <- function(x) x * b
>
> This works. But I wanted to be sure: [...] can it have unwanted
> implications?

I don't think modifying the environment of a closure could be seen as a way to get cleaner code. I'd just leave foo2 within foo1. There are several unwanted implications of the way you do it:

- Setting the environment of foo2 to the foo1 evaluation frame means those local variables won't be garbage collected. If foo1 creates a large temporary matrix, it will take up space in memory until you modify foo2 again.

- If we say that foo2 has global scope (it might be limited to your package namespace, but let's call that global), then your foo1 has global side effects, and those are often a bad idea. For example, suppose next week you define foo3 that also uses and modifies foo2. It's very easy for foo1 and foo3 to clash with conflicting uses of the global foo2. Don't use globals unless you really have to.
Duncan Murdoch
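Duncan's preferred arrangement, keeping foo2 inside foo1, needs no environment manipulation at all, because a function defined inside foo1 closes over foo1's frame and sees b directly; a sketch:

```r
foo1 <- function(x) {
  b <- x[1] + 2
  # Defined here, foo2's enclosing environment *is* foo1's frame,
  # so the free variable 'b' is found without any extra plumbing:
  foo2 <- function(x) x * b
  foo2(x)
}

foo1(1:100)[1:3]   # 3 6 9, same result as the environment() version
```

Nothing escapes foo1's frame this way, so both of the drawbacks above (retained memory and global side effects) disappear.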
Re: [R] R package 'np' problems
Hi Peter and List, I realized the err of my ways here. Thanks for the response; I appreciate the help. The struggles of self-taught statistics and maths continue! Chris On 15 November 2010 04:34, P Ehlers ehl...@ucalgary.ca wrote: Chris Carleton wrote: Hi List, I'm trying to get a density estimate for a point of interest from an npudens object created for a sample of points. I'm working with 4 variables in total (3 continuous and 1 unordered discrete - the discrete variable is the character column in training.csv). When I try to evaluate the density for a point that was not used in the training dataset, and when I extract the fitted values from the npudens object itself, I'm getting values that are much greater than 1 in some cases, which, if I understand correctly, shouldn't be possible considering a pdf estimate can only be between 0 and 1. I think I must be doing something wrong, but I can't see it. Attached I've included the training data (training.csv) and the point of interest (origin.csv); below I've included the code I'm using and the results I'm getting. I also don't understand why, when trying to evaluate the npudens object at one point, I'm receiving the same set of fitted values from the npudens object with the predict() function. It should be noted that I'm indexing the dataframe of training data in order to get samples of the df for density estimation (the samples are from different geographic locations measured on the same set of variables; hence my use of sub-setting by [i] and removing columns from the df before running the density estimation). Moreover, in the example I'm providing here, the point of interest does happen to come from the training dataset, but I'm receiving the same results when I compare the point of interest to samples of which it is not a part (density estimates that are either extremely small, which is acceptable, or much greater than one, which doesn't seem right to me). 
Any thoughts would be greatly appreciated, Chris I haven't looked at this in any detail, but why do say that pdf values cannot exceed 1? That's certainly not true in general. -Peter Ehlers fitted(npudens(tdat=training_df[training_cols_select][training_df$cat == i,])) [1] 7.762187e+18 9.385532e+18 6.514318e+18 7.583486e+18 6.283017e+18 [6] 6.167344e+18 9.820551e+18 7.952821e+18 7.882741e+18 1.744266e+19 [11] 6.653258e+18 8.704722e+18 8.631365e+18 1.876052e+19 1.995445e+19 [16] 2.323802e+19 1.203780e+19 8.493055e+18 8.485279e+18 1.722033e+19 [21] 2.227207e+19 2.177740e+19 2.168679e+19 9.329572e+18 9.380505e+18 [26] 1.023311e+19 2.109676e+19 7.903112e+18 7.935457e+18 8.91e+18 [31] 8.899827e+18 6.265440e+18 6.204720e+18 6.276559e+18 6.218002e+18 npu_dens - npudens(tdat=training_df[training_cols_select][training_df$cat == i,]) summary(npu_dens) Density Data: 35 training points, in 4 variable(s) aster_srtm_aspect aster_srtm_dem_filled aster_srtm_slope Bandwidth(s): 29.22422 2.500559e-24 3.111467 class_unsup_pc_iso Bandwidth(s): 0.2304616 Bandwidth Type: Fixed Log Likelihood: 1531.598 Continuous Kernel Type: Second-Order Gaussian No. Continuous Vars.: 3 Unordered Categorical Kernel Type: Aitchison and Aitken No. 
Unordered Categorical Vars.: 1
predict(npu_dens, newdata=origin[training_cols_select])
[1] 7.762187e+18 9.385532e+18 6.514318e+18 7.583486e+18 6.283017e+18
[6] 6.167344e+18 9.820551e+18 7.952821e+18 7.882741e+18 1.744266e+19
[11] 6.653258e+18 8.704722e+18 8.631365e+18 1.876052e+19 1.995445e+19
[16] 2.323802e+19 1.203780e+19 8.493055e+18 8.485279e+18 1.722033e+19
[21] 2.227207e+19 2.177740e+19 2.168679e+19 9.329572e+18 9.380505e+18
[26] 1.023311e+19 2.109676e+19 7.903112e+18 7.935457e+18 8.91e+18
[31] 8.899827e+18 6.265440e+18 6.204720e+18 6.276559e+18 6.218002e+18
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
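Peter's point can be checked directly: a continuous density may exceed 1 pointwise, since only its *integral* must equal 1. (That said, values on the order of 1e19 in the output above likely trace to the near-zero bandwidth 2.500559e-24 that np selected for one variable.) A base-R sketch:

```r
# A continuous density can exceed 1 at a point; only its integral is 1.
d0 <- dnorm(0, mean = 0, sd = 0.01)   # density of a narrow normal at its mode
d0 > 1                                 # TRUE: the value is roughly 39.9
area <- integrate(dnorm, -Inf, Inf, sd = 0.01)$value
area                                   # ~1, as required of a pdf
```

So a large fitted density is not by itself evidence of an error, although a bandwidth of 2.5e-24 usually is.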
Re: [R] [klaR package] [NaiveBayes] warning message numerical 0 probability
On 03.11.2010 16:26, Fabon Dzogang wrote: Hi, I run R 2.10.1 under Ubuntu 10.04 LTS (Lucid Lynx) and klaR version 0.6-4. I compute a model over a two-class dataset (composed of 700 examples). To that aim, I use the function NaiveBayes provided in the package klaR. When I then use the prediction function predict(my_model, new_data), I get the following warning: In FUN(1:747[[747L]], ...) : Numerical 0 probability with observation 458 As I did not find any documentation or any discussion concerning this warning message, I looked in the klaR source code and found the following line in predict.NaiveBayes.R: warning("Numerical 0 probability with observation ", i) Within naive Bayes, in order to calculate the posterior probabilities of the classes it is necessary to calculate the probabilities of the observations given the classes. The function NaiveBayes prints a warning if all these probabilities are numerically 0, i.e. the observation has a numerical probability of 0 for *all* classes. Usually this is only the case when the obs. is an extreme outlier. I will change the warning to say "all classes" in further releases of klaR. Best wishes, Uwe Ligges Unfortunately, it is hard to get a clear picture of the whole process reading the code. I wonder if someone could help me with the meaning of this warning message. Sorry I did not provide an example, but I could not simulate the same message over a small toy example. Thank you, Fabon Dzogang.
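The situation Uwe describes can be reproduced with base R alone (a sketch with hypothetical class parameters, not klaR's internals): for an extreme outlier, the Gaussian class-conditional density underflows to exactly 0 for every class, leaving nothing to normalize into posteriors.

```r
# Sketch: why an extreme outlier can have numerical probability 0 for
# *all* classes under a Gaussian naive Bayes (class parameters invented).
means <- c(classA = 0, classB = 5)
sds   <- c(classA = 1, classB = 1)
x <- 1000                              # extreme outlier
dens <- sapply(names(means),
               function(cl) dnorm(x, means[[cl]], sds[[cl]]))
dens                                   # both densities underflow to 0
all(dens == 0)                         # TRUE: the case klaR warns about
```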
[R] Access DDE Data
Hello Group, Is it possible to access DDE data from R? Thanks in advance for any pointers. S
[R] Zero truncated Poisson distribution R2WinBUGS
I am using a binomial mixture model to estimate abundance (N) and detection probability (p) using simulated count data: - Each site has a simulated abundance that follows a Poisson distribution with lambda = 5 - There are 200 simulated sampled sites - 3 repeated counts at each site - only 50 percent of the animals are counted during each count (i.e., detection probability p = 0.5, see code) We removed sites in which animals were never counted (see matrix y in the script). I would like to use a zero-truncated version of the Poisson distribution (I am aware of zero-inflated binomial mixture models, but still want to solve the problem described above). The code below: (1) generates a matrix of counts (y); rows correspond to sites and columns to repeated visits at each site. The script also removes sites where no animals were counted during the 3 consecutive visits. (2) The second part of the script calls WinBUGS and runs the binomial mixture model on the count data. In this case the count matrix y was converted to a vector C1 before being passed over to BUGS. Any idea how to create a zero-truncated Poisson for parameter lam1 (i.e., parameter lambda of the Poisson distribution)? Thank you for your help.
# R script
# Simulated abundance data
n.site <- 200                 # 200 sites visited
lam <- 5                      # mean number of animals per site
R <- n.site                   # number of sites
T <- 3                        # number of replicate counts at each site
N <- rpois(n = n.site, lambda = lam)   # true abundance
# Simulate count data; only a fraction of N is counted, which results in y
y <- array(dim = c(R, T))
for(i in 1:T){ y[, i] <- rbinom(n = n.site, size = N, prob = 0.5) }
# Truncate y matrix; y is an R-by-T matrix of counts
sumy <- apply(y, 1, sum)
cbindysumy <- cbind(y, sumy)
subsetcbindysumy <- subset(cbindysumy, sumy != 0)
y <- subsetcbindysumy[, 1:3]  # sites where no animals were ever counted are removed
C1 <- c(y)                    # vectorized matrix y
R <- dim(y)[1]
site <- 1:R
site.p <- rep(site, T)

# WinBUGS code
library(R2WinBUGS)            # load R2WinBUGS package
sink("Model.txt")
cat("
model {
  # Priors: new uniform priors
  p0 ~ dunif(0, 1)
  lam1 ~ dgamma(.01, .01)
  # Likelihood
  # Biological model for true abundance
  for (i in 1:R) {            # loops over R sites
    N1[i] ~ dpois(lambda1[i])
    lambda1[i] <- lam1
  }
  # Observation model for replicated counts
  for (i in 1:n) {            # loops over all n observations
    C1[i] ~ dbin(p1[i], N1[site.p[i]])
    p1[i] <- p0
  }
  # Derived quantities
  totalN1 <- sum(N1[])        # estimate total population size across all sites
}
", fill = TRUE)
sink()

# Package data for WinBUGS
R <- dim(y)[1]                # number of sites
n <- dim(y)[1] * dim(y)[2]    # number of observations (sites * surveys)
win.data <- list(R = R, n = n, C1 = C1, site.p = site.p)
# Inits
Nst <- apply(y, 1, max) + 1
inits <- function(){ list(N1 = Nst, lam1 = runif(1, 1, 8), p0 = runif(1)) }
parameters <- c("totalN1", "p0", "lam1")
# MCMC settings
nc <- 3
nb <- 400     # need to push to 400 for convergence
ni <- 1400    # need to push to 14000 for convergence
nt <- 1
out <- bugs(win.data, inits, parameters, "Model.txt", n.chains = nc,
            n.iter = ni, n.burn = nb, n.thin = nt, debug = TRUE)
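One route to the zero truncation, hedged as a sketch rather than a tested model change: in BUGS the Poisson prior can be truncated with N1[i] ~ dpois(lambda1[i])I(1,) in WinBUGS (JAGS uses T(1,) for genuine truncation; WinBUGS's I() formally denotes censoring, so check that it matches your intent). On the R side, matching zero-truncated counts can be simulated by inverse-CDF sampling, which is handy for checking the model against known truth:

```r
# Sketch: draw from a zero-truncated Poisson by inverse-CDF.
# The name rztpois is ad hoc, not a base-R function.
rztpois <- function(n, lambda) {
  # exclude P(N = 0): sample the CDF uniformly on (P(0), 1)
  u <- runif(n, min = dpois(0, lambda), max = 1)
  qpois(u, lambda)
}
set.seed(1)
N <- rztpois(200, lambda = 5)   # simulated true abundances, all >= 1
min(N)
```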
Re: [R] Surprising behavior using seq()
Patrick and Bert, Thank you both for your replies to my question. I see how my naïve expectations fall afoul of floating-point arithmetic. However, I still believe there is an underlying problem. It seems to me that when asked, c(7.7, 7.8, 7.9) %in% seq(4, 8, by=0.1) [1] TRUE FALSE TRUE R should return TRUE in all instances. %in% is testing set membership... in that way, shouldn't it be using all.equal() (instead of the implicit '=='), as Patrick suggests in the R Inferno? Is there a convenient way to test set membership using all.equal()? In particular, can you do it (conveniently) when the lengths of the numeric vectors are different? Thanks again for your reply! Vadim On Nov 13, 2010, at 5:46 AM, Patrick Burns wrote: See Circle 1 of 'The R Inferno'.
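One convenient sketch of a tolerance-based membership test (near_in and tol are ad hoc names, not base-R functions): compare each candidate against the whole table with an explicit tolerance, which also works when the two vectors have different lengths.

```r
# Membership up to a numeric tolerance; works for vectors of any lengths.
near_in <- function(x, table, tol = 1e-8) {
  sapply(x, function(xi) any(abs(table - xi) < tol))
}
near_in(c(7.7, 7.8, 7.9), seq(4, 8, by = 0.1))
# [1] TRUE TRUE TRUE
```

The tolerance plays the role of all.equal()'s default, but the vectorized comparison avoids calling all.equal() pairwise.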
Re: [R] likelyhood maximization problem with polr
Hello, Thanks for your answer. I will try this function to see if it gives results equivalent to those obtained with polr() + dropterm() (in a case where polr() works). Many thanks
Re: [R] Replicate Excel's LOGEST worksheet function in R
On Sat, Nov 13, 2010 at 10:37 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote: Anyway, I recommend you learn from David before criticizing his assistance. On Fri, Nov 12, 2010 at 5:28 PM, David Winsemius dwinsem...@comcast.net wrote: Then WHY did you ask for a function that would duplicate a particular Excel function? ... especially an Excel function which DOES a linear fit on the log of the Y argument? It was not a critical response. It was a clarification with the solution that fit my issue. In fact, David was directionally helpful in solving the issue. I am neither a statistician nor well versed in the computational nuance of Excel functions. In the future, I will refrain from such clarifying posts to avoid inflammatory ones.
Re: [R] interpretation of coefficients in survreg AND obtaining the hazard function
1. The Weibull is the only distribution that can be written in both a proportional hazards form and an accelerated failure time form. Survreg uses the latter. In an AFT model, we model the time to failure. Positive coefficients are good (longer time to death). In a PH model, we model the death rate. Positive coefficients are bad (higher death rate). You are not the first to be confused by the change in sign between the two models. 2. There are about 5 different ways to parameterize a Weibull distribution; 1-4 appear in various texts and the AFT form is #5. This is a second common issue with survreg that strikes only the more sophisticated users: to understand the output they look up the Weibull in a textbook, and become even more confused! Kalbfleisch and Prentice is a good reference for the AFT form. The manual page for psurvreg has some information on this, as does the very end of ?survreg. The psurvreg page also has an example of how to extract the hazard function for a Weibull fit. Begin included message Dear R help list, I am modeling some survival data with coxph and survreg (dist='weibull') using package survival. I have 2 problems: 1) I do not understand how to interpret the regression coefficients in the survreg output, and it is not clear to me, from ?survreg.objects, how to.
Here is an example of the code that points out my problem: - data is stc1 - the factor is dichotomous with 'low' and 'high' categories
slr <- Surv(stc1$ti_lr, stc1$ev_lr==1)
mca <- coxph(slr ~ as.factor(grade2=='high'), data=stc1)
mcb <- coxph(slr ~ as.factor(grade2), data=stc1)
mwa <- survreg(slr ~ as.factor(grade2=='high'), data=stc1, dist='weibull', scale=0)
mwb <- survreg(slr ~ as.factor(grade2), data=stc1, dist='weibull', scale=0)
summary(mca)$coef
coef exp(coef) se(coef) z Pr(>|z|) as.factor(grade2 == 'high')TRUE 0.2416562 1.273356 0.2456232 0.9838494 0.3251896
summary(mcb)$coef
coef exp(coef) se(coef) z Pr(>|z|) as.factor(grade2)low -0.2416562 0.7853261 0.2456232 -0.9838494 0.3251896
summary(mwa)$coef
(Intercept) as.factor(grade2 == 'high')TRUE 7.9068380 -0.4035245
summary(mwb)$coef
(Intercept) as.factor(grade2)low 7.5033135 0.4035245
No problem with the interpretation of the coefs in the Cox model. However, I do not understand why a) the coefficients in the survreg model are the opposite (negative when the other is positive) of what I have in the Cox model? Are these not the log(HR) given the categories of this variable? b) How come the intercept coefficient changes (the scale parameter does not change)? 2) My second question relates to the first. a) Given a model from survreg, say mwa above, how do I extract the baseline hazard and the hazard of each patient given a set of predictors? With the hazard function for the ith individual in the study given by h_i(t) = exp(\beta'x_i)*\lambda*\gamma*t^{\gamma-1}, it doesn't look to me like predict(mwa, type='linear') is \beta'x_i. b) Since I need the intercept coefficient from the model to obtain the scale parameter, and hence the baseline hazard function as defined in Collett (h_0(t) = \lambda*\gamma*t^{\gamma-1}), I am concerned that this intercept changes depending on the reference level of the factor entered in the model. The change is very important when I have more than one predictor in the model.
Any help would be greatly appreciated, David Biau.
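The sign flip Terry describes can be checked numerically: for a Weibull fit, the PH coefficient is approximately -coef(AFT)/scale. A sketch on the lung data shipped with the survival package (the two estimates agree only approximately, since the models are fit by different methods; this is not the poster's stc1 data):

```r
# Relate survreg (AFT) and coxph (PH) coefficients for a Weibull model:
# beta_PH ~ -beta_AFT / scale.
library(survival)
aft <- survreg(Surv(time, status) ~ sex, data = lung, dist = "weibull")
cox <- coxph(Surv(time, status) ~ sex, data = lung)
beta_ph_from_aft <- -coef(aft)["sex"] / aft$scale
c(cox = unname(coef(cox)["sex"]), from_aft = unname(beta_ph_from_aft))
# close, but not identical: same sign, similar magnitude
```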
Re: [R] ... predict.coxph
If you are looking at radioactive decay, maybe, but how often do you actually see exponential KM curves in real life? Exponential curves are rare. But proportional hazards does not imply exponential. A trial design could in fact try to get all the control sample to event at the same time if enough was known about prognostic factors and natural trajectory You are a dreamer. We know very little about even the diseases we know best. The life insurers are in no danger yet from accurate predictions by the medical community. Terry T
Re: [R] Surprising behavior using seq()
On 15/11/2010 9:24 AM, Vadim Patsalo wrote: Patrick and Bert, Thank you both for your replies to my question. I see how my naïve expectations fall afoul of floating-point arithmetic. However, I still believe there is an underlying problem. It seems to me that when asked, c(7.7, 7.8, 7.9) %in% seq(4, 8, by=0.1) [1] TRUE FALSE TRUE R should return TRUE in all instances. %in% is testing set membership... in that way, shouldn't it be using all.equal() (instead of the implicit '=='), as Patrick suggests in the R Inferno? No, because 7.8 is not in the set. Some number quite close to it is there, but no 7.8. This is true for both meanings of 7.8: - the number 78/10 - R's representation of that number Neither one is in the set. What you have there is R's representation of 4 plus 38 times R's representation of 0.1. (R can represent 4 and 38 exactly, but not 0.1, 7.8, pi, or most other numbers.) Duncan Murdoch Is there a convenient way to test set membership using all.equal()? In particular, can you do it (conveniently) when the lengths of the numeric vectors are different? Thanks again for your reply! Vadim On Nov 13, 2010, at 5:46 AM, Patrick Burns wrote: See Circle 1 of 'The R Inferno'.
Re: [R] Surprising behavior using seq()
Vadim Patsalo wrote: Patrick and Bert, Thank you both for your replies to my question. I see how my naïve expectations fall afoul of floating-point arithmetic. However, I still believe there is an underlying problem. It seems to me that when asked, c(7.7, 7.8, 7.9) %in% seq(4, 8, by=0.1) [1] TRUE FALSE TRUE R should return TRUE in all instances. %in% is testing set membership... in that way, shouldn't it be using all.equal() (instead of the implicit '=='), as Patrick suggests in the R Inferno? Is there a convenient way to test set membership using all.equal()? In particular, can you do it (conveniently) when the lengths of the numeric vectors are different? This looks like a job for zapsmall; check ?zapsmall. -Peter Ehlers Thanks again for your reply! Vadim On Nov 13, 2010, at 5:46 AM, Patrick Burns wrote: See Circle 1 of 'The R Inferno'.
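Peter's suggestion in a sketch: zapsmall() rounds away the tiny representation error in the sequence before the exact comparison that %in% performs.

```r
# zapsmall rounds to a sensible number of digits, so the rounded
# sequence element compares equal to the literal 7.8.
s <- seq(4, 8, by = 0.1)
c(7.7, 7.8, 7.9) %in% s             # the 7.8 element fails exact comparison
c(7.7, 7.8, 7.9) %in% zapsmall(s)   # TRUE for all three after rounding
```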
[R] Help with Hist
Dear list, I am trying to re-scale a histogram using hist() but cannot seem to create a reduced scale where the upper values are not plotted. What I have is about 100 values, of which 80 or so are between 0 and 40, one or two in the hundreds, and an outlier around 2000. What I would like to do is create an x-scale that shows 5 bins between 0-100 and then 3-4 bins between 100 and 2000, but I don't need any resolution on the values above 100. If I use breaks = c(0, 20, 40, 60, 80, 100, 200, 500), R gives me an error saying that there are values not included, which of course I know, but I wish to ignore them. It seems that I am missing something quite simple. Any help would be appreciated. Regards Steve
[R] Null values in R
Hi R-helpers, can you please let me know the methods in which NULL values can be handled in R? Are there any generic commands/functions that can be given in a workspace, so that the NULL values occurring in that workspace (for any datasets that are loaded, any output that is calculated) are treated in the same way? Thanks in advance.
Re: [R] RCurl and cookies in POST requests
Hello Duncan. Thanks for having a look at this. As soon as I get home I'll try your suggestion. BTW, the link to the omega-help mailing list seems to be broken: http://www.omegahat.org/mailman/listinfo/ Thank you. chr Duncan Temple Lang (Monday 15 November 2010, 01:02): Hi Christian Thanks for finding this. The problem seems to be that the finalizer on the curl handle seems to disappear and so is not being called when the handle is garbage collected. So there is a bug somewhere and I'll try to hunt it down quickly. In the meantime, you can achieve the same effect by calling the C routine curl_easy_cleanup. You can't do this directly with a .Call() or .C() as there is no explicit interface in the RCurl package to this routine. However, you can use the Rffi package (on the Omegahat repository): library(Rffi); cif = CIF(voidType, list(pointerType)); callCIF(cif, "curl_easy_cleanup", c...@ref) I'll keep looking for why the finalizer is getting discarded. Thanks again, D. On 11/14/10 6:30 AM, Christian M. wrote: Hello. I know that it's usually possible to write cookies to a cookie file by removing the curl handle and doing a gc() call. I can do this with getURL(), but I just can't obtain the same results with postForm(). If I use: curlHandle <- getCurlHandle(cookiefile="FILE", cookiejar="FILE") and then do: getURL("http://example.com/script.cgi", curl=curlHandle); rm(curlHandle); gc() it's OK, the cookie is there. But, if I do (same handle; the parameter is a dummy): postForm(site, .params=list(par="cookie"), curl=curlHandle, style="POST"); rm(curlHandle); gc() no cookie is written. Probably I'm doing something wrong, but don't know what. Is it possible to store cookies read from the output of a postForm() call? How? Thanks. Christian PS.: I'm attaching a script that can be sourced (and its .txt version). It contains an example. The expected result is a file (cookies.txt) with two cookies. The script currently uses getURL() and two cookies are stored.
If postForm() is used (currently commented), only 1 cookie is written. -- SDF Public Access UNIX System - http://sdf.lonestar.org
[R] How to Read a Large CSV into a Database with R
Hi, I'm working in R 2.11.1 x64 on Windows x86_64-pc-mingw32. I'm trying to insert a very large CSV file into a SQLite database. I'm pretty new to working with databases in R, so I apologize if I'm overlooking something obvious here. I'm trying to work with the American Community Survey data, which is two 1.3GB csv files. I have enough RAM to read one of them into memory, but not both at the same time. So, in order to analyze them, I'm trying to get them into a SQLite database so I can use the R survey package's database-backed survey objects capabilities (http://faculty.washington.edu/tlumley/survey/svy-dbi.html). I need to combine both of these CSV files into one table (within a database), so I think that I'd need a SQL manipulation technique that reads everything line by line, instead of pulling it all into memory. I've tried using read.csv.sql, but it finishes without an error and then only shows me the table structure when I run the final select statement. When I run these exact same commands on a smaller CSV file, they work fine. I imagine this is not working because the csv is so large, but I'm not sure how to confirm that or what to change if it is. I do want to get all columns from the CSV into the data table, so I don't want to filter anything.
library(sqldf)
setwd("R:\\American Community Survey\\Data\\2009")
sqldf("attach 'sqlite' as new")
read.csv.sql("ss09pusa.csv", sql="create table ss09pusa as select * from file", dbname="sqlite")
sqldf("select * from ss09pusa limit 3", dbname="sqlite")
I've also tried using the SQL IMPORT command, which I couldn't get working properly, even on a tiny two-field, five-row CSV file.
library(RSQLite)
setwd("R:\\American Community Survey\\Data\\2009")
in_csv <- file("test.csv")
out_db <- dbConnect(SQLite(), dbname="sqlite.db")
dbGetQuery(out_db, "create table test (hello integer, world text)")
dbGetQuery(out_db, "import in_csv test")
Any advice would be sincerely appreciated. Thanks!
Anthony Damico Kaiser Family Foundation
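A chunked alternative that never holds the full file in memory, sketched with the RSQLite package (the function name and chunk size are illustrative, and read.csv's end-of-connection error is used as the loop terminator):

```r
library(RSQLite)

# Stream a CSV into an SQLite table in fixed-size chunks.
csv_to_sqlite <- function(csv, dbname, table, chunk = 10000) {
  con <- dbConnect(SQLite(), dbname = dbname)
  inp <- file(csv, open = "r")
  on.exit({ close(inp); dbDisconnect(con) })
  first <- read.csv(inp, nrows = chunk)        # header consumed once here
  dbWriteTable(con, table, first, overwrite = TRUE)
  repeat {
    piece <- tryCatch(
      read.csv(inp, header = FALSE, nrows = chunk, col.names = names(first)),
      error = function(e) NULL)                # read.csv errors at EOF
    if (is.null(piece) || nrow(piece) == 0) break
    dbWriteTable(con, table, piece, append = TRUE)
  }
  invisible(table)
}

# e.g. csv_to_sqlite("ss09pusa.csv", "sqlite.db", "ss09pusa")
```

Running it once per ACS file with the same table name (switching the first call to append as well) would stack both CSVs into one table.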
Re: [R] jackknife-after-bootstrap
Can someone help me about detection of outliers using the jackknife-after-bootstrap algorithm? A simple procedure is to calculate the mean of the bootstrap statistics for all bootstrap samples that omit the first of the original observations. Repeat for the second, third, ... original observation. You now have $n$ means, and can look at these for outliers. A similar approach is to calculate means of bootstrap statistics for samples that include (rather than omit) each of the original observations. Both of those approaches can suffer from considerable random variability. Provided the number of bootstrap samples is large, a better approach is to use linear regression, where y = the vector of bootstrap statistics, length R; X = R by n matrix, with X[i, j] = the number of times original observation j is included in bootstrap sample i; and without an intercept. The $n$ regression coefficients give estimates of the influence of the original observations, and you can look for outliers in these influence estimates. For comparison, the first simple procedure above corresponds to taking averages of y for rows with X[, j] = 0, and the similar approach to averaging y for rows with X[, j] > 0. For further discussion see Hesterberg, Tim C. (1995), Tail-Specific Linear Approximations for Efficient Bootstrap Simulations, Journal of Computational and Graphical Statistics, 4(2), 113-133. Hesterberg, Tim C. and Stephen J. Ellis (1999), Linear Approximations for Functional Statistics in Large-Sample Applications, Technical Report No. 86, Research Department, MathSoft, Inc., 1700 Westlake Ave. N., Suite 500, Seattle, WA 98109. http://home.comcast.net/~timhesterberg/articles/tech86-linear.pdf
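The regression approach above can be sketched in a few lines of base R (a toy example with one planted outlier; for the mean, the bootstrap statistic is exactly linear in the inclusion counts, so the regression recovers x/n):

```r
# Jackknife-after-bootstrap influence via regression: regress bootstrap
# statistics y on the matrix of inclusion counts X, with no intercept.
set.seed(42)
x <- c(rnorm(30), 10)                     # one planted outlier (obs. 31)
n <- length(x); Rboot <- 2000
idx  <- matrix(sample(n, n * Rboot, replace = TRUE), nrow = Rboot)
y    <- apply(idx, 1, function(i) mean(x[i]))   # bootstrap statistics
X    <- t(apply(idx, 1, tabulate, nbins = n))   # inclusion counts, R by n
infl <- coef(lm(y ~ X - 1))                     # influence estimates
which.max(infl)                                 # flags the planted outlier
```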
Re: [R] Help with Hist
Hi, I think you should also give the upper extreme:
x <- c(rnorm(80)+10, 101:110, 2001:2010)
hist(x, breaks=c(0, 20, 40, 60, 80, 100, 200, 500))
Error in hist.default(x, breaks = c(0, 20, 40, 60, 80, 100, 200, 500)) : some 'x' not counted; maybe 'breaks' do not span range of 'x'
hist(x, breaks=c(0, 20, 40, 60, 80, 100, 200, 500, 2100)) ## which looks horrible, but works; up to you how to cut it
HTH, Ivan On 11/15/2010 15:53, Steve Sidney wrote: Dear list I am trying to re-scale a histogram using hist() but cannot seem to create a reduced scale where the upper values are not plotted. What I have is about 100 values, of which 80 or so are between 0 and 40, one or two in the hundreds, and an outlier around 2000. What I would like to do is create an x-scale that shows 5 bins between 0-100 and then 3-4 bins between 100 and 2000, but I don't need any resolution on the values above 100. If I use breaks = c(0, 20, 40, 60, 80, 100, 200, 500), R gives me an error saying that there are values not included, which of course I know, but I wish to ignore them. It seems that I am missing something quite simple. Any help would be appreciated. Regards Steve -- Ivan CALANDRA PhD Student University of Hamburg Biozentrum Grindel und Zoologisches Museum Abt. Säugetiere Martin-Luther-King-Platz 3 D-20146 Hamburg, GERMANY +49(0)40 42838 6231 ivan.calan...@uni-hamburg.de ** http://www.for771.uni-bonn.de http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
Re: [R] Null values in R
Are you talking about NULL or NA that may occur in some data? Can you give an example of what your concern is and what set of operations you want to do? If they are NAs, there are some standard ways that they can be handled. On Mon, Nov 15, 2010 at 9:55 AM, Raji raji.sanka...@gmail.com wrote: Hi R-helpers, can you please let me know the methods in which NULL values can be handled in R? Are there any generic commands/functions that can be given in a workspace, so that the NULL values occurring in that workspace (for any datasets that are loaded, any output that is calculated) are treated in the same way? Thanks in advance. -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
Re: [R] ... predict.coxph
From: thern...@mayo.edu To: james.whan...@gmail.com Date: Mon, 15 Nov 2010 08:43:04 -0600 CC: r-help@r-project.org Subject: Re: [R] ... predict.coxph If you are looking at radioactive decay maybe but how often do you actually see exponential KM curves in real life? Exponential curves are rare. But proportional hazards does not imply exponential. Well, I'm being a bit extreme, but of course the point would be that as you begin deviating from that, the fit could arguably become less connected to anything of causal relevance. You've probably seen exchanges I've had with people wondering why their linear regression doesn't work over arbitrary data blobs; how much more so here. A time constant that you can relate to tunnelling probability has some relevance. A trial design could in fact try to get all the control sample to event at the same time if enough was known about prognostic factors and natural trajectory You are a dreamer. We know very little about even the diseases we know best. The life insurers are in no danger yet from accurate predictions by the medical community. Obviously the question is a bit hypothetical, but thinking ahead I wonder what kind of curves you are likely to see and how they relate to each other. I just thought that if you defended the model, and I admit to only skimming the prior conversation, you may be willing to volunteer additional summary comments. I guess you could take something like a sigmoidal survival curve and various limits of that. Certainly this won't happen tomorrow, but presumably the goal here is to reduce random noise/variance and in any case use statistics to estimate parameters that can relate to something physically measurable (blood parameters or something). In any case, what kinds of things could a putative treatment do to a curve like this that would not be reasonably described by coxph? Is this not something that you ever expect to care about?
I guess my point is that the control and treatment curves need to have some relationship for PH parameters to fit anything relating to something causal. I'd also note that insurers typically need a prognosis when there is no detectable evidence of a disease, but trial criteria may have some range of disease-stage parameters that will allow enrollment. Picking out small deviations from normal to first identify the disease that will get you and then the time to death is of course much more difficult than plotting a trajectory when someone has clinical manifestations of a specific disease. Thanks. Terry T
Re: [R] Help with Hist
SteveSB wrote: What I have is about 100 of which 80 or so are between a value of 0 and 40, one or two in the hundreds and an outlier around 2000. Have a look at gap.barplot in package plotrix. Personally, I prefer to use xlim = c(0, 100) in this case and note the one outlier at 2000 in the legend. Dieter
Re: [R] Null values in R
Would ?is.null be what you are looking for? -- Jonathan P. Daily Technician - USGS Leetown Science Center 11649 Leetown Road Kearneysville WV, 25430 (304) 724-4480 Is the room still a room when its empty? Does the room, the thing itself have purpose? Or do we, what's the word... imbue it. - Jubal Early, Firefly From: Raji raji.sanka...@gmail.com To: r-help@r-project.org Date: 11/15/2010 09:58 AM Subject: [R] Null values in R Sent by: r-help-boun...@r-project.org Hi R-helpers, can you please let me know the methods in which NULL values can be handled in R? Are there any generic commands/functions that can be given in a workspace, so that the NULL values occurring in that workspace (for any datasets that are loaded, any output that is calculated) are treated in the same way? Thanks in advance.
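A short sketch of the distinction the repliers are driving at: NA marks a missing *value* inside data, while NULL is the absence of an object altogether, so the two are handled by different functions.

```r
# NA: element-wise missingness inside data
v <- c(1, NA, 3)
is.na(v)                 # FALSE TRUE FALSE
mean(v, na.rm = TRUE)    # 2 -- most summaries take an na.rm argument

# NULL: the absence of an object
is.null(NULL)            # TRUE
length(NULL)             # 0 -- NULL has no elements; NA is an element
lst <- list(a = 1, b = 2)
lst$b <- NULL            # assigning NULL *removes* a list component
names(lst)               # "a"
```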
[R] how to store package options over sessions?
I want to define some options for my package that the user may change. It would be convenient if the changes could be saved when terminating an R session and recovered automatically on the next package load. What is the standard way to implement this? TIA Mark
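There is no built-in mechanism for this, but one common approach is to write the options to a file in the user's home directory when the package unloads and read them back in .onLoad. The sketch below invents the package name, file location, and option names; treat it as an illustration under those assumptions, not a standard:

```r
# Hypothetical persistence of package options (names and paths invented).
.mypkg.optfile <- file.path(path.expand("~"), ".mypkg_options.rds")

.onLoad <- function(libname, pkgname) {
  defaults <- list(mypkg.verbose = FALSE, mypkg.digits = 4)
  saved <- if (file.exists(.mypkg.optfile)) readRDS(.mypkg.optfile) else list()
  # saved values override the defaults
  options(modifyList(defaults, saved))
}

.onUnload <- function(libpath) {
  current <- options()
  # keep only this package's options
  saveRDS(current[grep("^mypkg\\.", names(current))], .mypkg.optfile)
}
```

The same save/restore round trip works with any file path; using a package-specific prefix on the option names keeps the grep simple.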
Re: [R] How to Read a Large CSV into a Database with R
Anthony-107 wrote: Hi, I'm working in R 2.11.1 x64 on Windows x86_64-pc-mingw32. I'm trying to insert a very large CSV file into a SQLite database. Better to use an external utility if this is a one-time import for this job: http://sqlitebrowser.sourceforge.net/ Dieter -- View this message in context: http://r.789695.n4.nabble.com/How-to-Read-a-Large-CSV-into-a-Database-with-R-tp3043209p3043226.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Help with Hist
Steve Sidney sbsidney at mweb.co.za writes: I am trying to re-scale a histogram using hist() but cannot seem to create a reduced scale where the upper values are not plotted. What I have is about 100 values, of which 80 or so are between 0 and 40, one or two in the hundreds, and an outlier around 2000. If I use breaks = c(0, 20, 40, 60, 80, 100, 200, 500) R gives me an error saying that there are values not included, which of course I know, but I wish to ignore them. Subset your data first to remove the outlier? x.trimmed <- x[x < 2000] or x.trimmed <- subset(x, x < 2000)
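A runnable sketch of the subsetting approach; the data here are simulated to match the description in the question (about 100 values, most below 40, one outlier near 2000):

```r
set.seed(1)
x <- c(runif(80, 0, 40), runif(17, 40, 100), 150, 400, 2000)

x.trimmed <- x[x < 2000]   # drop the outlier before plotting
h <- hist(x.trimmed, breaks = c(0, 20, 40, 60, 80, 100, 200, 500),
          main = "Histogram (outlier near 2000 omitted)", xlab = "value")

sum(h$counts)              # every remaining value is counted
```

With unequal bin widths hist() plots densities by default, which is usually what you want here; the breaks no longer need to span the outlier because it has been removed.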
Re: [R] Help with Hist
Thanks. What you have suggested of course works, but I am trying to reduce the 'ugliness'. Anybody got any other ideas? Regards Steve Sent from my BlackBerry® wireless device -Original Message- From: Ivan Calandra ivan.calan...@uni-hamburg.de Sender: r-help-boun...@r-project.org Date: Mon, 15 Nov 2010 16:08:47 To: r-help@r-project.org Reply-To: ivan.calan...@uni-hamburg.de Subject: Re: [R] Help with Hist Hi, I think you should also give the upper extreme:

x <- c(rnorm(80)+10, 101:110, 2001:2010)
hist(x, breaks=c(0, 20, 40, 60, 80, 100, 200, 500))
Error in hist.default(x, breaks = c(0, 20, 40, 60, 80, 100, 200, 500)) :
  some 'x' not counted; maybe 'breaks' do not span range of 'x'
hist(x, breaks=c(0, 20, 40, 60, 80, 100, 200, 500, 2100)) ## which looks horrible, but works; up to you how to cut it

HTH, Ivan On 11/15/2010 15:53, Steve Sidney wrote: Dear list I am trying to re-scale a histogram using hist() but cannot seem to create a reduced scale where the upper values are not plotted. What I have is about 100 values, of which 80 or so are between 0 and 40, one or two in the hundreds, and an outlier around 2000. What I would like to do is create an x-scale that shows 5 bins between 0-100 and then 3/4 bins between 100 and 2000, but I don't need any resolution on the values above 100. If I use breaks = c(0, 20, 40, 60, 80, 100, 200, 500) R gives me an error saying that there are values not included, which of course I know, but I wish to ignore them. It seems that I am missing something quite simple. Any help would be appreciated. Regards Steve -- Ivan CALANDRA PhD Student University of Hamburg Biozentrum Grindel und Zoologisches Museum Abt.
Säugetiere Martin-Luther-King-Platz 3 D-20146 Hamburg, GERMANY +49(0)40 42838 6231 ivan.calan...@uni-hamburg.de ** http://www.for771.uni-bonn.de http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
Re: [R] About upgrade R
Tal My main use of R now is on Windows 7. As explained, I always retain at least one previous version on Windows 7 PCs. My upgrade is done as follows - 1) Download the binary installer for R and install it. 2) Rename the library directory (default - C:\Program Files\R\R-2.12.0patched\library) to library2. (Windows 7 will ask for confirmation.) 3) Copy the library directory from the previous version of R to the new R directory (default - C:\Program Files\R\R-2.12.0patched\). (Windows 7 will ask for confirmation.) 4) Copy the contents of the library2 directory to the library directory in the new R directory. 5) Right click on the R program and select "Run as administrator" to start R as administrator. 6) In R run some variant of update.packages(checkBuilt=TRUE). On occasion one will find that packages are reporting errors in the CRAN compile and binary versions are not yet available for the new version of R. I delete these from the library directory and look in CRAN for possible explanations. Anyway, I can revert to the old version if I need these packages. Generally one may find that the missing packages are available shortly afterwards. This procedure is fairly close to that recommended in the R FAQ and meets my needs. I think that it is necessary to keep libraries for different versions separate. Other users may have different requirements, and other update methods may be more appropriate for them. There may be no method that is best for all users. I would imagine that one could write a DOS or PowerShell or Python (or other) script that would automate this process. Best Regards John On 14 November 2010 20:24, Tal Galili tal.gal...@gmail.com wrote: Hi John, thank you for that input.
It could be that the code I wrote here: http://www.r-statistics.com/2010/04/changing-your-r-upgrading-strategy-and-the-r-code-to-do-it-on-windows/ should be updated so that every time you install a new R version, you run the code for it to: 1) copy all packages from the old R version's library to the new R version's library 2) update all the packages. But I have no clue how to do step 1. How do you find out the latest R version that was installed previous to the current one? And then, how would you find where its package library is? If you could do this in R, then installing a new version of R could be made simpler for you. Cheers, Tal Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- On Sun, Nov 14, 2010 at 10:05 PM, John C Frain fra...@gmail.com wrote: The current method allows one to easily retain several versions working in parallel. This is particularly important if some package is not available in the new version. A few years ago there were problems such as these during a major overhaul of the rmetrics group of packages. My current practice is to retain older versions until I am sure that all I need is available in the new version. Thus I am in favour of retaining the current system. John On Sunday, November 14, 2010, Uwe Ligges lig...@statistik.tu-dortmund.de wrote: On 14.11.2010 17:59, Ajay Ohri wrote: wont it make more common sense to make updating packages also a part of every base version install BY default.. just saying At least I do not like the idea: If I just want to try a beta version, I do not want everything to be updated such that I can't switch back to my last stable version. Uwe Ligges Websites- http://decisionstats.com http://dudeofdata.com Linkedin- www.linkedin.com/in/ajayohri 2010/11/14 Uwe Ligges lig...@statistik.tu-dortmund.de: Upgrading is mentioned in the FAQs / R for Windows FAQs.
If you have your additionally installed packages in a separate library (not the R base library) you can simply run update.packages(checkBuilt=TRUE) If not ... Uwe Ligges On 14.11.2010 15:51, Stephen Liu wrote: Hi all, Win 7 64-bit R version 2.11.1 (2010-05-31) I want to upgrade R to version 2.12.0 R-2.12.0 for Windows (32/64 bit) http://cran.r-project.org/bin/windows/base/ I found steps on the following site: How to upgrade R on windows – another strategy (and the R code to do it) http://www.r-statistics.com/2010/04/changing-your-r-upgrading-strategy-and-the-r-code-to-do-it-on-windows/ I wonder is there a straightforward way to upgrade the packages directly from the repo? TIA B.R. Stephen L
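Tal's "step 1" can be sketched in R itself. The path below assumes the default Windows layout and an invented old version number; adjust both to your installation:

```r
# Hypothetical sketch: reinstall into the new R the packages that were
# present in the old R's library (old.lib is an assumed path).
old.lib <- "C:/Program Files/R/R-2.11.1/library"

old.pkgs <- dir(old.lib)                       # package names = directory names
new.pkgs <- rownames(installed.packages())     # what the new R already has
missing  <- setdiff(old.pkgs, new.pkgs)

if (length(missing) > 0) install.packages(missing)
# followed by: update.packages(checkBuilt = TRUE, ask = FALSE)
```

Reinstalling (rather than copying the directories) avoids mixing binaries built for different R versions, which is the problem checkBuilt=TRUE is there to catch.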
[R] repository of earlier Windows versions of R packages
Dear Helpers, I was trying to find a repository of earlier Windows versions of R packages. However, while I can find the Archives for Linux versions (in the Old Sources section of each package's Downloads), I cannot find one for Windows versions. Does such a repository exist? If so, where can I find it? Thanks, Jonathan Williams
Re: [R] Kalman Filter
Hello, thanks for answering my question. I prefer to use KalmanLike(y, mod, nit = 0, fast=TRUE). For parameter estimation I have a given time series. In it there are several components: season and noise; furthermore there is a mean reversion process. The season is modelled as a Fourier polynomial. From the given time series I have to estimate the - season parameters - the mean reversion factor - the variance of the noise. I think in the function KalmanLike y is the vector of the time series; what does mod mean? How can I write the syntax for the state space? Does anybody have a simple example for better understanding KalmanLike? Or is it better to use other packages for parameter estimation? I have no experience working with Kalman filters and I'm a new R user. Thanks for helping. Best, Thomas
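Regarding what mod means: KalmanLike expects a state-space model in the list form that stats::makeARIMA produces. A minimal sketch with a simulated series; the AR(1) coefficient here is invented for illustration, not part of the original question:

```r
set.seed(1)
y <- arima.sim(model = list(ar = 0.7), n = 200)  # simulated AR(1) series

# Build the state-space representation of an ARMA model; 'mod' is the
# list that KalmanLike expects.
mod <- makeARIMA(phi = 0.7, theta = numeric(0), Delta = numeric(0))
res <- KalmanLike(y, mod)
res$Lik   # log-likelihood (up to a constant), as used internally by arima()
```

For a custom model with seasonal Fourier terms and mean reversion, building this list by hand is awkward, which is why dedicated state-space packages are usually easier (see the reply below recommending dlm).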
Re: [R] Problem in installing and starting Rattle
I also have the problem trying to start rattle. Windows 7 32-bit, R 2.12.0. When I try library(rattle) I get an error message: The procedure entry point deflateSetHeader could not be located in the dynamic link library zlib1.dll I hit OK and it prompts me to install GTK+ again. I tried to uninstall GTK+ first and delete all related files as Dieter suggested, but it still won't work :( -- View this message in context: http://r.789695.n4.nabble.com/Problem-in-installing-and-starting-Rattle-tp3042502p3043262.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] How to move an internal function to external keeping same environment?
On 15. 11. 10 14:14, Duncan Murdoch wrote: On 15/11/2010 7:48 AM, Matthieu Stigler wrote: Hi I have, within a quite big function foo1, an internal function foo2. Now, in order to have cleaner code, I wish to have the internal foo2 as external. This foo2 was using arguments within the foo1 environment that were not declared as inputs of foo2, which works as long as foo2 is within foo1, but not anymore if foo2 is external, as is the case now. Now, I could add all those arguments as inputs to foo2, but I feel if foo2 is called often, I would be copying those objects more than required. Am I wrong? I then used this to avoid declaring each argument to foo2 explicitly:

foo1 <- function(x){
  b <- x[1] + 2
  environment(foo2) <- new.env(parent = as.environment(-1))
  c <- foo2(x)
  return(c)
}
foo2 <- function(x) x*b
# try:
foo1(1:100)

This works. But I wanted to be sure: - am I right that if I instead declare each element to be passed to foo2, this would be more copying than required? (imagine b in my case a heavy dataset, foo2 a long computation) - is this line environment(foo2) <- new.env(parent = as.environment(-1)) a good way to do it, or can it have unwanted implications? I don't think modifying the environment of a closure could be seen as a way to get cleaner code. I'd just leave foo2 within foo1. There are several unwanted implications of the way you do it: - Setting the environment of foo2 to the foo1 evaluation frame means those local variables won't be garbage collected. If foo1 creates a large temporary matrix, it will take up space in memory until you modify foo2 again. - If we say that foo2 has global scope (it might be limited to your package namespace, but let's call that global), then your foo1 has global side effects, and those are often a bad idea. For example, suppose next week you define foo3 that also uses and modifies foo2. It's very easy for foo1 and foo3 to clash with conflicting use of the global foo2. Don't use globals unless you really have to.
Duncan Murdoch Dear Gabor, dear Duncan Thanks a lot, both of you, for your fast and interesting answers! I have limited understanding of Duncan's points but will follow your advice not to do it like this. If I am nevertheless quite keen to use foo2 externally, is the use of either assign() in foo1, or mget() in foo2, more appropriate? Or are the same kinds of remarks raised against environment() also relevant for assign() and mget()? Thanks a lot!! Matthieu
Re: [R] How to Read a Large CSV into a Database with R
On Mon, Nov 15, 2010 at 10:07 AM, Anthony Damico ajdam...@gmail.com wrote: Hi, I'm working in R 2.11.1 x64 on Windows x86_64-pc-mingw32. I'm trying to insert a very large CSV file into a SQLite database. I'm pretty new to working with databases in R, so I apologize if I'm overlooking something obvious here. I'm trying to work with the American Community Survey data, which is two 1.3GB csv files. I have enough RAM to read one of them into memory, but not both at the same time. So, in order to analyze them, I'm trying to get them into a SQLite database so I can use the R survey package's database-backed survey objects capabilities ( http://faculty.washington.edu/tlumley/survey/svy-dbi.html). I need to combine both of these CSV files into one table (within a database), so I think that I'd need a SQL manipulation technique that reads everything line by line, instead of pulling it all into memory. I've tried using read.csv.sql, but it finishes without an error and then only shows me the table structure when I run the final select statement. When I run these exact same commands on a smaller CSV file, they work fine. I imagine this is not working because the csv is so large, but I'm not sure how to confirm that or what to change if it is. I do want to get all columns from the CSV into the data table, so I don't want to filter anything. 
library(sqldf)
setwd("R:\\American Community Survey\\Data\\2009")
sqldf("attach 'sqlite' as new")
read.csv.sql("ss09pusa.csv", sql = "create table ss09pusa as select * from file", dbname = "sqlite")
sqldf("select * from ss09pusa limit 3", dbname = "sqlite")

What the above code does, which is unlikely to be what you intended, is to create an sqlite database called 'sqlite', read the indicated file into sqlite, read it into R from sqlite (clearly this step will fail if the data is too big for R, but if it's not then you are OK), and then delete the table from the database. So your sqldf statement should give an error since there is no such table; or else, if you have a data frame in your R workspace called ss09pusa, the sqldf statement will load that into a database table, retrieve its first three rows, and then delete the table. This sort of task is probably more suitable for RSQLite than sqldf, but if you wish to do it with sqldf you need to follow example 9 or example 10 on the sqldf home page. In example 9, http://code.google.com/p/sqldf/#Example_9.__Working_with_Databases it is very important to note that sqldf automatically deletes any table that it created once the sqldf or read.csv.sql statement is done, so the way to keep the table is to issue an SQL statement that creates the table yourself ("create table mytab as select ...") rather than letting sqldf create it. In example 10, http://code.google.com/p/sqldf/#Example_10._Persistent_Connections persistent connections are illustrated, which represent an alternate way to do this in sqldf. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
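As a sketch of the RSQLite route Gabor mentions: append the CSV to a database table in chunks, so neither file has to fit in memory. The CSV below is a small stand-in generated on the fly; the table name echoes the one in the question, and the chunk size is arbitrary:

```r
library(RSQLite)

# Small stand-in for the 1.3 GB file (illustration only).
csvfile <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:250, x = rnorm(250)), csvfile, row.names = FALSE)

dbfile <- tempfile(fileext = ".sqlite")
con <- dbConnect(SQLite(), dbname = dbfile)

infile <- file(csvfile, open = "r")
hdr <- gsub('"', "", strsplit(readLines(infile, n = 1), ",")[[1]])  # column names
repeat {
  chunk <- tryCatch(
    read.csv(infile, header = FALSE, col.names = hdr, nrows = 100),
    error = function(e) NULL)                # EOF raises an error; stop there
  if (is.null(chunk) || nrow(chunk) == 0) break
  dbWriteTable(con, "ss09pusa", chunk, append = TRUE)
}
close(infile)

dbGetQuery(con, "select count(*) from ss09pusa")  # all 250 rows landed
dbDisconnect(con)
```

Reading from an open connection is what makes read.csv continue where the previous chunk stopped; to combine both ACS files into one table, run the loop once per file against the same table with append = TRUE.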
[R] merge two dataset and replace missing by 0
Hi R users, I have two data sets (X1, X2). For example,

time1 <- c(0, 8, 15, 22, 43, 64, 85, 106, 127, 148, 169, 190, 211)
outpue1 <- c(171, 164, 150, 141, 109, 73, 47, 26, 15, 12, 6, 2, 1)
X1 <- cbind(time1, outpue1)
time2 <- c(0, 8, 15, 22, 43, 64, 85, 106, 148)
output2 <- c(5, 5, 4, 5, 5, 4, 1, 2, 1)
X2 <- cbind(time2, output2)

I want to merge X1 and X2 into a big dataset X by time1 and time2 so that the missing items in output2 will be replaced by 0. For example, there is no output2 when time2=127, so the corresponding output will be 0. Does anyone know how to use the merge command to deal with this? Thanks, Kate
Re: [R] Fetching data
IMO it is not possible. The code behind the aspx page queries data from a database server and displays it on the webpage. Maithula Chandrashekhar wrote: Dear all R users, I am wondering whether any procedure exists in R to fetch data directly from http://www.ncdex.com/Market_Data/Spot_price.aspx and save it in some time series object, without filling any field on that website. Can somebody point out whether it is possible? Thanks and regards, -- View this message in context: http://r.789695.n4.nabble.com/Fetching-data-tp3041662p3043275.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] Kalman Filter
Try the most excellent package dlm written by Giovanni Petris for all your Kalman filter needs. Also buy the accompanying book - it really integrates the dlm package with the theory behind it. Best, John On Mon, Nov 15, 2010 at 8:39 AM, Garten Stuhl gartenstu...@googlemail.com wrote: Hello, thanks for answering my question. I prefer to use KalmanLike(y, mod, nit = 0, fast=TRUE). For parameter estimation I have a given time series. In it there are several components: season and noise; furthermore there is a mean reversion process. The season is modelled as a Fourier polynomial. From the given time series I have to estimate the - season parameters - the mean reversion factor - the variance of the noise. I think in the function KalmanLike y is the vector of the time series; what does mod mean? How can I write the syntax for the state space? Does anybody have a simple example for better understanding KalmanLike? Or is it better to use other packages for parameter estimation? I have no experience working with Kalman filters and I'm a new R user. Thanks for helping. Best, Thomas
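A minimal sketch with the dlm package (assumed installed): fit a local-level model to a simulated series by maximum likelihood, then run the Kalman filter. The series and the log-variance parametrisation are illustrative choices, not from the original question:

```r
library(dlm)

set.seed(42)
y <- cumsum(rnorm(100)) + rnorm(100, sd = 0.5)   # simulated random walk + noise

# Local-level model; par holds log-variances so the optimizer is unconstrained.
build <- function(par) dlmModPoly(order = 1, dV = exp(par[1]), dW = exp(par[2]))

fit  <- dlmMLE(y, parm = c(0, 0), build = build)  # estimate dV and dW
mod  <- build(fit$par)
filt <- dlmFilter(y, mod)

head(dropFirst(filt$m))   # filtered state estimates (dropFirst removes the prior)
```

Seasonal Fourier terms can be added to the model with dlmModTrig and combined with the level component via "+", which maps more directly onto Thomas's season-plus-mean-reversion setup than hand-building the list that KalmanLike expects.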
Re: [R] Problem in installing and starting Rattle
OK, to follow up my post, I finally got rattle and RGtk2 to work. The trick: when R prompts me to install GTK+ I still hit yes, but after the download, once the installation process starts, I close the R GUI window. After the GTK+ installation is complete I start R again, and it works. -- View this message in context: http://r.789695.n4.nabble.com/Problem-in-installing-and-starting-Rattle-tp3042502p3043318.html Sent from the R help mailing list archive at Nabble.com.
Re: [R] repository of earlier Windows versions of R packages
On 15.11.2010 15:52, Jonathan Williams wrote: Dear Helpers, I was trying to find a repository of earlier Windows versions of R packages. However, while I can find the Archives for Linux versions (in the Old Sources section of each package's Downloads), I cannot find one for Windows versions. Does such a repository exist? If so, where can I find it? Thanks, Jonathan Williams There is no such archive. If you need a specific version of a package for a specific version of R, you need to recompile it yourself from sources. The only thing we have is the last version of a package that was built for/with a specific version of R; see your-CRAN-mirror/bin/windows/contrib/ Best, Uwe Ligges
Re: [R] merge two dataset and replace missing by 0
See ?merge with argument all=TRUE and replace by 0 afterwards. Uwe Ligges On 15.11.2010 16:42, Kate Hsu wrote: Hi R users, I have two data sets (X1, X2). For example, time1 <- c(0, 8, 15, 22, 43, 64, 85, 106, 127, 148, 169, 190, 211) outpue1 <- c(171, 164, 150, 141, 109, 73, 47, 26, 15, 12, 6, 2, 1) X1 <- cbind(time1, outpue1) time2 <- c(0, 8, 15, 22, 43, 64, 85, 106, 148) output2 <- c(5, 5, 4, 5, 5, 4, 1, 2, 1) X2 <- cbind(time2, output2) I want to merge X1 and X2 into a big dataset X by time1 and time2 so that the missing items in output2 will be replaced by 0. For example, there is no output2 when time2=127, so the corresponding output will be 0. Does anyone know how to use the merge command to deal with this? Thanks, Kate
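Putting Uwe's suggestion together (merge with all = TRUE, then replace the NAs by 0), using the data from the question; data frames are used so merge can match on the column names:

```r
time1 <- c(0, 8, 15, 22, 43, 64, 85, 106, 127, 148, 169, 190, 211)
outpue1 <- c(171, 164, 150, 141, 109, 73, 47, 26, 15, 12, 6, 2, 1)
X1 <- data.frame(time1, outpue1)
time2 <- c(0, 8, 15, 22, 43, 64, 85, 106, 148)
output2 <- c(5, 5, 4, 5, 5, 4, 1, 2, 1)
X2 <- data.frame(time2, output2)

X <- merge(X1, X2, by.x = "time1", by.y = "time2", all = TRUE)
X$output2[is.na(X$output2)] <- 0   # e.g. time1 == 127 now gets output2 == 0
X
```

all = TRUE keeps every time point from both sets; without the second line the unmatched rows carry NA rather than 0.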
Re: [R] merge two dataset and replace missing by 0
On Nov 15, 2010, at 10:42 AM, Kate Hsu wrote: Hi R users, I have two data sets (X1, X2). For example, time1 <- c(0, 8, 15, 22, 43, 64, 85, 106, 127, 148, 169, 190, 211) outpue1 <- c(171, 164, 150, 141, 109, 73, 47, 26, 15, 12, 6, 2, 1) X1 <- cbind(time1, outpue1) time2 <- c(0, 8, 15, 22, 43, 64, 85, 106, 148) output2 <- c(5, 5, 4, 5, 5, 4, 1, 2, 1) X2 <- cbind(time2, output2) I want to merge X1 and X2 into a big dataset X by time1 and time2 so that the missing item in output2 will be replace by 0. For example, there is no output2 when time2=127, then the corresponding output will be 0. Anyone know how to use merge command to deal with this?

merge(X1, X2, by.x="time1", by.y="time2", all=TRUE)
   time1 outpue1 output2
1      0     171       5
2      8     164       5
3     15     150       4
4     22     141       5
5     43     109       5
6     64      73       4
7     85      47       1
8    106      26       2
9    127      15      NA
10   148      12       1
11   169       6      NA
12   190       2      NA
13   211       1      NA

Thanks, Kate David Winsemius, MD West Hartford, CT
Re: [R] How to move an internal function to external keeping same environment?
On Mon, Nov 15, 2010 at 10:49 AM, Matthieu Stigler matthieu.stig...@gmail.com wrote: I have limited understanding of Duncan's points but will follow your advice not to do it like this. If I am nevertheless quite keen to use foo2 externally, is the use of either assign() in foo1, or mget() in foo2, more appropriate? Or are the same kinds of remarks raised against environment() also relevant for assign() and mget()? Another way to approach this, which is safer, is to put it into an OO framework, creating a proto object (package proto) that contains the shared data, and make foo and foo2 its methods.

library(proto)  # see http://r-proto.googlecode.com

# create proto object
p <- proto()

# add foo method which stores mydata in receiver object and runs foo2
p$foo <- function(.) { .$mydata <- 1:3; .$foo2() }

# add foo2 method which grabs mydata from receiver and calculates result
p$foo2 <- function(.) sum(.$mydata)

# run foo method - result is 6
p$foo()

-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
[R] R crash
We distribute several R applications using the tcltk package on different servers or PCs (Windows XP). On some machines, and in a non-reproducible way, all the R windows disappear when using functions like tkgetSaveFile or tkchooseDirectory. The R application remains open (the Rgui.exe process is still running)! Despite several tests, we cannot identify the origin of this problem, which seems independent of the R version. Have you any ideas? Are there any compatibility issues between the tcltk package and other applications? Jérémy
Re: [R] merge two dataset and replace missing by 0
On Mon, Nov 15, 2010 at 10:42 AM, Kate Hsu yhsu.rh...@gmail.com wrote: Hi R users, I have two data sets (X1, X2). For example, time1 <- c(0, 8, 15, 22, 43, 64, 85, 106, 127, 148, 169, 190, 211) outpue1 <- c(171, 164, 150, 141, 109, 73, 47, 26, 15, 12, 6, 2, 1) X1 <- cbind(time1, outpue1) time2 <- c(0, 8, 15, 22, 43, 64, 85, 106, 148) output2 <- c(5, 5, 4, 5, 5, 4, 1, 2, 1) X2 <- cbind(time2, output2) I want to merge X1 and X2 into a big dataset X by time1 and time2 so that the missing item in output2 will be replace by 0. For example, there is no output2 when time2=127, then the corresponding output will be 0. Anyone know how to use merge command to deal with this? Since these are time series you might want to use a time series package to do this:

library(zoo)
time1 <- c(0, 8, 15, 22, 43, 64, 85, 106, 127, 148, 169, 190, 211)
output1 <- c(171, 164, 150, 141, 109, 73, 47, 26, 15, 12, 6, 2, 1)
time2 <- c(0, 8, 15, 22, 43, 64, 85, 106, 148)
output2 <- c(5, 5, 4, 5, 5, 4, 1, 2, 1)
z1 <- zoo(output1, time1)
z2 <- zoo(output2, time2)
merge(z1, z2, fill = 0)

-- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Re: [R] merge two dataset and replace missing by 0
Thanks for all of your help. It works for me. Kate On Mon, Nov 15, 2010 at 10:06 AM, David Winsemius dwinsem...@comcast.net wrote: On Nov 15, 2010, at 10:42 AM, Kate Hsu wrote: Hi R users, I have two data sets (X1, X2). For example, time1 <- c(0, 8, 15, 22, 43, 64, 85, 106, 127, 148, 169, 190, 211) outpue1 <- c(171, 164, 150, 141, 109, 73, 47, 26, 15, 12, 6, 2, 1) X1 <- cbind(time1, outpue1) time2 <- c(0, 8, 15, 22, 43, 64, 85, 106, 148) output2 <- c(5, 5, 4, 5, 5, 4, 1, 2, 1) X2 <- cbind(time2, output2) I want to merge X1 and X2 into a big dataset X by time1 and time2 so that the missing item in output2 will be replace by 0. For example, there is no output2 when time2=127, then the corresponding output will be 0. Anyone know how to use merge command to deal with this?

merge(X1, X2, by.x="time1", by.y="time2", all=TRUE)
   time1 outpue1 output2
1      0     171       5
2      8     164       5
3     15     150       4
4     22     141       5
5     43     109       5
6     64      73       4
7     85      47       1
8    106      26       2
9    127      15      NA
10   148      12       1
11   169       6      NA
12   190       2      NA
13   211       1      NA

Thanks, Kate David Winsemius, MD West Hartford, CT
Re: [R] Fetching data
On Mon, Nov 15, 2010 at 3:46 PM, Feng Mai maif...@gmail.com wrote: IMO it is not possible. The code behind the aspx page queries data from a database server and displays it on the webpage. That doesn't make it impossible. Your web browser is sending a request to the web server, and whatever happens behind the scenes at that end, what comes back is a table in a web page which can be scraped. The tricky thing is constructing exactly the right request to get the data you want. It can sometimes be as simple as constructing a URL, something like 'http://example.com/price/gold/2010/11/12', or it could need parameters: 'http://example.com/price?commodity=gold&year=2010&month=11'. Or it can be like this - some active server pages that pass a massive VIEWSTATE object back and forth on each request. It's not impossible to script it, but you just have a lot more mess to deal with. I have done this kind of thing for some stupid web sites in the past. These days people enjoy building sensible APIs to their web data. Oh, and it might be violating the site's terms and conditions, of course. Barry
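Barry's "construct the right request" step can be sketched in base R. The endpoint and parameter names below are made up for illustration; the real NCDEX page needs an ASP.NET POST with VIEWSTATE fields, which is the messier case he describes:

```r
# Build a parameterised request URL (host and parameter names invented).
params <- list(commodity = "gold", year = 2010, month = 11)
qs <- paste(names(params), unlist(params), sep = "=", collapse = "&")
u <- paste0("http://example.com/price/Spot_price.aspx?", qs)
u

# The page could then be fetched with readLines(url(u)) and the HTML
# table parsed out of the result (not run here: the URL is fictional).
```

Real parameter values would also need URL-encoding (see ?URLencode) before being pasted in.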
Re: [R] Help with Hist
Well, another possibility would be to edit the plot so that you cut out the empty part (between 300 and 2000). There might be some function that can do it, maybe the plotrix::gap.barplot() that Dieter already told you about.

On 11/15/2010 16:22, sbsid...@mweb.co.za wrote: Thanks. What you have suggested of course works, but I am trying to reduce the 'ugliness'. Anybody got any other ideas? Regards Steve Sent from my BlackBerry® wireless device

-----Original Message-----
From: Ivan Calandra ivan.calan...@uni-hamburg.de
Sender: r-help-boun...@r-project.org
Date: Mon, 15 Nov 2010 16:08:47
To: r-help@r-project.org
Reply-To: ivan.calan...@uni-hamburg.de
Subject: Re: [R] Help with Hist

Hi, I think you should also give the upper extreme:

x <- c(rnorm(80) + 10, 101:110, 2001:2010)
hist(x, breaks = c(0, 20, 40, 60, 80, 100, 200, 500))
Error in hist.default(x, breaks = c(0, 20, 40, 60, 80, 100, 200, 500)) :
  some 'x' not counted; maybe 'breaks' do not span range of 'x'
hist(x, breaks = c(0, 20, 40, 60, 80, 100, 200, 500, 2100))  ## which looks horrible, but works; up to you how to cut it

HTH, Ivan

On 11/15/2010 15:53, Steve Sidney wrote: Dear list, I am trying to re-scale a histogram using hist(), but cannot seem to create a reduced scale where the upper values are not plotted. What I have is about 100 values, of which 80 or so are between 0 and 40, one or two in the hundreds, and an outlier around 2000. What I would like to do is create an x-scale that shows 5 bins between 0-100 and then 3-4 bins between 100 and 2000, but I don't need any resolution on the values above 100. If I use breaks = c(0, 20, 40, 60, 80, 100, 200, 500), R gives me an error saying that there are values not included, which of course I know, but I wish to ignore them. It seems that I am missing something quite simple. Any help would be appreciated.
Regards Steve

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
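One more option for Steve's compact x-axis, not mentioned in the thread: cap everything above 100 into a single overflow bin before calling hist(). A minimal sketch (the cap value 150 is an arbitrary choice, picked only so the capped points land in one final catch-all bin):

```r
set.seed(1)
x <- c(rnorm(80) + 10, 101:110, 2001:2010)   # Ivan's example data

# Collapse all values above 100 into a single ">100" bin by capping them
x_capped <- pmin(x, 150)
h <- hist(x_capped, breaks = c(0, 20, 40, 60, 80, 100, 200),
          plot = FALSE)   # plot = TRUE would draw it; FALSE just returns counts
h$counts                  # the last bin holds everything that was > 100
```

The axis then stops at 200 instead of stretching to 2100; the trade-off is that the last bin's label no longer reflects the true magnitudes, so it should be relabelled (e.g. ">100") with axis().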
Re: [R] How to Read a Large CSV into a Database with R
Hi Gabor,

Thank you for the prompt reply. I definitely looked over all of the examples on the code.google.com sqldf page before sending, which is why I wrote the code

read.csv.sql("ss09pusa.csv", sql = "create table ss09pusa as select * from file", dbname = "sqlite")

directly pulled from their code --

read.csv.sql("~/tmp.csv", sql = "create table mytab as select * from file", dbname = "mydb")

..but I don't understand why this helps me around the memory problem, since I think it still all gets read into memory. Is there a way to do this line by line? I would prefer to use RSQLite directly rather than sqldf, but I could not get the IMPORT command (or .IMPORT) functioning at all. I tried these with both dbGetQuery and dbSendQuery.

library(RSQLite)
setwd("R:\\American Community Survey\\Data\\2009")
out_db <- dbConnect(SQLite(), dbname = "sqlite.db")
dbGetQuery(out_db, "create table test (hello integer, world text)")
dbGetQuery(out_db, "mode csv")
dbGetQuery(out_db, "import test.csv test")

When I hit the mode and import commands, it gives me an error that makes me think it's handling these files in a completely different way.

dbGetQuery(out_db, "mode csv")
Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: near "mode": syntax error)

I suppose I could just run sqlite3 commands from the system() function, but I was hoping there might be a way to accomplish this task entirely within R? Thanks again!

On Mon, Nov 15, 2010 at 10:41 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote:

On Mon, Nov 15, 2010 at 10:07 AM, Anthony Damico ajdam...@gmail.com wrote:

Hi, I'm working in R 2.11.1 x64 on Windows x86_64-pc-mingw32. I'm trying to insert a very large CSV file into a SQLite database. I'm pretty new to working with databases in R, so I apologize if I'm overlooking something obvious here. I'm trying to work with the American Community Survey data, which comes as two 1.3GB csv files. I have enough RAM to read one of them into memory, but not both at the same time.
So, in order to analyze them, I'm trying to get them into a SQLite database so I can use the R survey package's database-backed survey objects capabilities (http://faculty.washington.edu/tlumley/survey/svy-dbi.html). I need to combine both of these CSV files into one table (within a database), so I think that I'd need a SQL manipulation technique that reads everything line by line, instead of pulling it all into memory. I've tried using read.csv.sql, but it finishes without an error and then only shows me the table structure when I run the final select statement. When I run these exact same commands on a smaller CSV file, they work fine. I imagine this is not working because the csv is so large, but I'm not sure how to confirm that or what to change if it is. I do want to get all columns from the CSV into the data table, so I don't want to filter anything.

library(sqldf)
setwd("R:\\American Community Survey\\Data\\2009")
sqldf("attach 'sqlite' as new")
read.csv.sql("ss09pusa.csv", sql = "create table ss09pusa as select * from file", dbname = "sqlite")
sqldf("select * from ss09pusa limit 3", dbname = "sqlite")

What the above code does, which is unlikely to be what you intended, is to create an sqlite database called 'sqlite', read the indicated file into that database, read it back into R from sqlite (clearly this step will fail if the data is too big for R, but if it is not then you are OK), and then delete the table from the database -- so your sqldf statement should give an error, since there is no such table. Or else, if you have a data frame in your R workspace called ss09pusa, the sqldf statement will load that into a database table, retrieve its first three rows, and then delete the table.
This sort of task is probably more suitable for RSQLite than sqldf, but if you wish to do it with sqldf you need to follow example 9 or example 10 on the sqldf home page. In example 9, http://code.google.com/p/sqldf/#Example_9.__Working_with_Databases, it's very important to note that sqldf automatically deletes any table it created once the sqldf or read.csv.sql statement is done, so the way to keep the table from being dropped is to make sure you issue an SQL statement that creates the table (create table mytab as select ...) rather than letting sqldf create it. In example 10, http://code.google.com/p/sqldf/#Example_10._Persistent_Connections, persistent connections are illustrated, which represent an alternate way to do this in sqldf.

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] jackknife-after-bootstrap
I keep trying, but I could not do it. Is there any example of this?

-- View this message in context: http://r.789695.n4.nabble.com/Re-jackknife-after-bootstrap-tp3043213p3043398.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Integrate to 1? (gauss.quad)
Thank you, Doug. I am still missing something here. Should this simply be sum(f(x_i) * w_i), where x_i is node i and w_i is the weight at node i? So my function

f(x) = (1/(s*sqrt(2*pi))) * exp(-((qq$nodes-mu)^2/(2*s^2)))

would then be multiplied only by the weights, qq$weights. Here is some reproducible code:

library(statmod)
Q <- 5
mu <- 0; s <- 1
qq <- gauss.quad(Q, kind = 'hermite')
sum((1/(s*sqrt(2*pi))) * exp(-((qq$nodes-mu)^2/(2*s^2))) * qq$weights)
[1] 0.5775682
sum(qq$weights)
[1] 1.772454
(1/(s*sqrt(2*pi))) * exp(-((qq$nodes-mu)^2/(2*s^2)))
[1] 0.05184442 0.25198918 0.39894228 0.25198918 0.05184442
dnorm(qq$nodes)
[1] 0.05184442 0.25198918 0.39894228 0.25198918 0.05184442

-----Original Message-----
From: dmba...@gmail.com [mailto:dmba...@gmail.com] On Behalf Of Douglas Bates
Sent: Sunday, November 14, 2010 2:28 PM
To: Doran, Harold
Cc: r-help@r-project.org
Subject: Re: [R] Integrate to 1? (gauss.quad)

I don't know about the statmod package and the gauss.quad function, but generally the definition of Gauss-Hermite quadrature is with respect to the function that is multiplied by exp(-x^2) in the integrand. So your example would reduce to summing the weights.

On Sun, Nov 14, 2010 at 11:18 AM, Doran, Harold hdo...@air.org wrote: Does anyone see why my code does not integrate to 1?

library(statmod)
mu <- 0
s <- 1
Q <- 5
qq <- gauss.quad(Q, kind = 'hermite')
sum((1/(s*sqrt(2*pi))) * exp(-((qq$nodes-mu)^2/(2*s^2))) * qq$weights)

### This does what it is supposed to
myNorm <- function(theta) (1/(s*sqrt(2*pi))) * exp(-((theta-mu)^2/(2*s^2)))
integrate(myNorm, -Inf, Inf)
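Bates's point can be made explicit with a change of variables; the following derivation (an addition, not from the original thread) shows why the correct Gauss-Hermite answer is $\frac{1}{\sqrt{\pi}}\sum_i w_i = 1$. Substituting $t = (x-\mu)/(\sqrt{2}\,s)$, $dx = \sqrt{2}\,s\,dt$:

```latex
\int_{-\infty}^{\infty} \frac{1}{s\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2s^2)}\,dx
  \;=\;
\frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-t^2}\,dt
  \;\approx\;
\frac{1}{\sqrt{\pi}} \sum_{i=1}^{Q} w_i \cdot 1
  \;=\;
\frac{\sqrt{\pi}}{\sqrt{\pi}} \;=\; 1,
```

since Gauss-Hermite quadrature approximates $\int g(t)\,e^{-t^2}\,dt \approx \sum_i w_i\,g(t_i)$ and here $g \equiv 1$. This matches the transcript: sum(qq$weights) prints 1.772454, which is $\sqrt{\pi}$. Harold's sum instead multiplied the weights by the density at the raw nodes, i.e. it approximated $\int \phi(t)\,e^{-t^2}\,dt$ rather than $\int \phi(t)\,dt$, hence 0.5775682 instead of 1.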
Re: [R] RGoogleDocs stopped working
Thanks, Duncan. Finally getting a chance to follow up on this... I tried again, changing and resetting my password, and trying to specify my login and password manually in the getGoogleDocsConnection argument list. I also tried removing either or both of the service and error options. No luck in any case. I also tried a different Google account, also with no luck. I've also tried tweaking the URL being generated by the code, and in all cases I get a 403 Forbidden error with content Error=BadAuthentication. I don't really know enough about how authentication is supposed to work to get much farther. Can you help? Should I try the Google API forum instead? -Harlan

From: Duncan Temple Lang dun...@wald.ucdavis.edu
To: r-help@r-project.org
Date: Wed, 10 Nov 2010 10:33:47 -0800
Subject: Re: [R] RGoogleDocs stopped working

Hi Harlan

I just tried to connect to Google Docs and I had ostensibly the same problem. However, the password was actually different from what I had specified. After resetting it with Google Docs, the getGoogleDocsConnection() worked fine. So I don't doubt that the login and password are correct, but you might just try it again to ensure there are no typos. The other thing to look at is the values for Email and Passwd sent in the URL, i.e. the string in url in your debugging below. (Thanks for that, by the way.) If either has special characters, e.g. '$', it is imperative that they are escaped correctly, i.e. converted to %24. This should happen, and nothing should have changed, but it is worth verifying. So things still seem to work for me. It is a data point, but not one that gives you much of a clue as to what is wrong on your machine. D.

On Wed, Nov 10, 2010 at 10:36 AM, Harlan Harris har...@harris.name wrote: Hello, Some code using RGoogleDocs, which had been working smoothly since the summer, just stopped working. I know that it worked on November 3rd, but it doesn't work today.
I've confirmed that the login and password still work when I log in manually. I've confirmed that the URL gives the same error when I paste it into Firefox. I don't know enough about this web service to figure out the problem myself, alas... Here's the error and other info (login/password omitted):

ss.con <- getGoogleDocsConnection(login = gd.login, password = gd.password, service = 'wise', error = FALSE)
Error: Forbidden
Enter a frame number, or 0 to exit
1: getGoogleDocsConnection(login = gd.login, password = gd.password, service = "wise", error = FALSE)
2: getGoogleAuth(..., error = error)
3: getForm("https://www.google.com/accounts/ClientLogin", accountType = "HOSTED_OR_GOOGLE", Email = login, Passw
4: getURLContent(uri, .opts = .opts, .encoding = .encoding, binary = binary, curl = curl)
5: stop.if.HTTP.error(http.header)
Selection: 4
Called from: eval(expr, envir, enclos)
Browse[1] http.header
Content-Type: text/plain
Cache-control: no-cache, no-store
Pragma: no-cache
Expires: Mon, 01-Jan-1990 00:00:00 GMT
Date: Wed, 10 Nov 2010 15:24:39 GMT
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Content-Length: 24
Server: GSE
status: 403
statusMessage: Forbidden\r\n
Browse[1] url
[1] https://www.google.com/accounts/ClientLogin?accountType=HOSTED%5FOR%5FGOOGLE&Email=***&Passwd=***&service=wise&source=R%2DGoogleDocs%2D0%2E1
Browse[1] .opts
$ssl.verifypeer
[1] FALSE

R.Version()
$platform: i386-apple-darwin9.8.0; $arch: i386; $os: darwin9.8.0; $system: i386, darwin9.8.0; $major: 2; $minor: 10.1; $year: 2009; $month: 12; $day: 14; $`svn rev`: 50720; $language: R; $version.string: R version 2.10.1 (2009-12-14)

installed.packages()[c('RCurl', 'RGoogleDocs'), ]
            Package       LibPath                                             Version  Depends                          License  Built
RCurl       RCurl         /Users/hharris/Library/R/2.10/library               1.4-3    R (>= 2.7.0), methods, bitops    BSD      2.10.1
RGoogleDocs RGoogleDocs   /Library/Frameworks/R.framework/Resources/library   0.4-1    RCurl, XML, methods              BSD      2.10.1
(RCurl also Suggests Rcompression; remaining columns NA)

Any ideas? Thank you! -Harlan
Re: [R] How to Read a Large CSV into a Database with R
On Mon, Nov 15, 2010 at 11:46 AM, Anthony Damico ajdam...@gmail.com wrote: Hi Gabor, Thank you for the prompt reply. I definitely looked over all of the examples on the code.google.com sqldf page before sending, which is why I wrote the code read.csv.sql("ss09pusa.csv", sql = "create table ss09pusa as select * from file", dbname = "sqlite") directly pulled from their code -- read.csv.sql("~/tmp.csv", sql = "create table mytab as select * from file", dbname = "mydb") ..but I don't understand why this helps me around the memory problem, since I think it still all gets read into memory. Is there a way to do this line by line?

OK. Maybe it's something else. The reading of the file into the database should not be a resource problem, provided you have enough disk space and appropriate permissions. sqldf / RSQLite are used to get sqlite to do it, so the data never goes through R at that stage, and R limitations can't affect the reading into the sqlite database. When you read it back from the sqlite database, R limitations do come into effect, so you just have to be sure not to read too much in at a time. The use of create table ... as select ... is to prevent sqldf from deleting the table: sqldf is normally used in a fashion where you don't want to know about the back-end databases, so it tries to create them and delete them behind the scenes, but here you want to use them explicitly, so you have to work around that.

Try this example. It should be reproducible, so you just have to copy it and paste it into your R session. Uncomment the indicated line if you want to remove any pre-existing mydb file in the current directory. Try it in a fresh R session just to be sure that nothing mucks it up.

library(sqldf)

# uncomment next line to make sure we are starting clean
# if (file.exists("mydb")) file.remove("mydb")

# create new database
sqldf("attach 'mydb' as new")

# create a new file. BOD is built into R and has 6 rows.
write.table(BOD, file = "tmp.csv", quote = FALSE, sep = ",")

# read new file into database
read.csv.sql("tmp.csv", sql = "create table mytab as select * from file", dbname = "mydb")

# how many records are in table?
N <- sqldf("select count(*) from mytab", dbname = "mydb")[[1]]

# read in chunks and display what we have read
k <- 4 # no of records to read at once
for(i in seq(0, N-1, k)) {
  s <- sprintf("select * from mytab limit %d, %d", i, k)
  print(sqldf(s, dbname = "mydb"))
}

On my machine I get this output:

  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
  Time demand
1    5   15.6
2    7   19.8

showing that it read the 6-line BOD data frame in chunks of 4, as required.

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
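Anthony's "line by line" question can also be answered without sqldf, by streaming the CSV through a file connection in fixed-size chunks; each chunk could then be appended to a database table (e.g. with RSQLite's dbWriteTable(..., append = TRUE)). A base-R sketch of just the chunking part, not from the thread (file names and the chunk size of 4 are made up for illustration; a real run would use a far larger chunk size):

```r
# Write a small example CSV that stands in for the 1.3 GB file
tmp <- tempfile(fileext = ".csv")
write.table(data.frame(id = 1:10, val = letters[1:10]), tmp,
            sep = ",", row.names = FALSE, quote = FALSE)

con <- file(tmp, open = "r")
header <- strsplit(readLines(con, n = 1), ",")[[1]]   # grab column names once
total <- 0
repeat {
  lines <- readLines(con, n = 4)          # read at most 4 data rows per chunk
  if (length(lines) == 0) break
  chunk <- read.csv(textConnection(lines), header = FALSE,
                    col.names = header)
  # here one would do: dbWriteTable(db, "mytab", chunk, append = TRUE)
  total <- total + nrow(chunk)
}
close(con)
total   # all 10 rows seen, 4 at a time, never more than one chunk in memory
```

Because the connection stays open between readLines() calls, each iteration picks up where the last one stopped; memory use is bounded by the chunk size rather than the file size.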
[R] plot.dendrogram() plot margins
Hello, Is it possible to remove those extra margins on the sample axis from plot.dendrogram?

par(oma = c(0,0,0,0), mar = c(0,0,0,0))
ddr <- as.dendrogram(hclust(dist(matrix(sample(1:1000, 200), nrow = 100))))
stats:::plot.dendrogram(ddr, horiz = FALSE, axes = FALSE, yaxs = "i", leaflab = "none")

vs.

stats:::plot.dendrogram(ddr, horiz = TRUE, axes = FALSE, yaxs = "i", leaflab = "none")

What variable / line of code corresponds to this additional margin space? I would like to modify the code to remove the extra space and have that margin equal to the one when horiz = TRUE, for plotting multiple dendrograms for one image on the same device. Thanks in advance, best.
Re: [R] Sweave question
Thank you for all your comments. As a result of my own research, I found this method, which seems to do what I want in addition to your suggestions:

tools::texi2dvi("myfile.tex", pdf = TRUE)

Thanks again, Ralf

On Mon, Nov 15, 2010 at 6:42 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 15/11/2010 6:22 AM, Dieter Menne wrote: Duncan Murdoch-2 wrote: See SweavePDF() in the patchDVI package on R-forge. In case googling patchDVI only shows a few Japanese pages, and searching for patchDVI in R-Forge gives nothing: try https://r-forge.r-project.org/projects/sweavesearch/ (or did I miss something obvious, Duncan?) No, I just didn't realize that it was hard to find. But you can always select R-forge as a repository, and then install.packages() will find it. Duncan Murdoch
[R] rotate column names in large matrix
Dear List, I have a large (1600*1600) matrix generated with symnum, which I am using to eyeball the structure of a dataset. I have abbreviated the column names with the abbr.colnames option. One way to get an even more compact view of the matrix would be to display the column names rotated by 90 degrees. Any pointers on how to do this would be most useful. Any other tips for displaying the matrix in compact form are of course also welcome. Many thanks, Lara
[R] Sweave: Conditional code chunks?
I have a code chunk that produces a figure. In my special case, however, the data does not always exist. In cases where the data exists, the code chunk is of course trivial (case #1); but what do I do for case #2, where the data does not exist? I can obviously prevent the code from being executed by checking the existence of the object, but on the Sweave level I still have a static figure chunk. Here is an example that should be reproducible:

# case 1
x <- c(1,2,3)
# case 2 - no definition of the variable
#x <- c(1,2,3)

<<echo=FALSE, results=hide, fig=TRUE>>=
if (exists(as.character(substitute(meta.summary)))) {
  plot(x)
}
@

In a way I would need a conditional chunk, or a chunk that draws a figure only if it was generated and ignores it otherwise. Any ideas? Ralf
[R] Non-positive definite cross-covariance matrices
I am creating covariance matrices from sets of points, and I frequently run into problems where the matrices I create are non-positive definite. I've started using the corpcor package, which was specifically designed to address these types of problems. It has solved many of my problems, but I still have one left. One of the matrices I need to calculate is a cross-covariance matrix. In other words, I need to calculate cov(A, B), where A and B are each a matrix defining a set of points. The corpcor package does not seem to be able to perform this operation. Can anyone suggest a way to create cross-covariance matrices that are guaranteed (or at least likely) to be positive definite, either using corpcor or another package? I'm using R 2.8.1 and corpcor 1.5.2 on Mac OS X 10.5.8. - Jeff
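One observation worth adding here (an editorial note, not from the thread): a cross-covariance block cov(A, B) is in general rectangular and has no definiteness of its own; what must be positive semi-definite is the joint covariance of the stacked variables. Estimating cov(cbind(A, B)) from the raw points and extracting the off-diagonal block therefore yields a cross-covariance that is consistent with a PSD joint matrix by construction. A base-R sketch with made-up dimensions:

```r
set.seed(42)
A <- matrix(rnorm(50 * 3), ncol = 3)   # 50 points in 3 dimensions
B <- matrix(rnorm(50 * 2), ncol = 2)   # the same 50 points in 2 other dimensions

# Joint covariance of the stacked columns: PSD by construction
S <- cov(cbind(A, B))
p <- ncol(A)
S_AB <- S[1:p, (p + 1):ncol(S)]        # the 3 x 2 cross-covariance block

# Identical to calling cov(A, B) directly ...
all.equal(S_AB, cov(A, B), check.attributes = FALSE)

# ... and the joint matrix has no negative eigenvalues (up to rounding)
min(eigen(S, symmetric = TRUE, only.values = TRUE)$values) >= -1e-10
```

Shrinkage estimators such as corpcor's cov.shrink could be applied to the stacked matrix cbind(A, B) in the same way, keeping the joint estimate well-conditioned before the cross block is extracted.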
[R] RE : Sweave: Conditional code chunks?
Hi Ralf,

I first create (or not) the figure as a separate file and then use conditional LaTeX to display the existing file(s) (exactly where I want it/them to appear). This also works with png etc., but you'll have to specify the extensions. Just be careful if you change paths ... This will give something like:

<<label=condFig1, fig=TRUE, include=FALSE>>=
wantToPlot <- TRUE
if (wantToPlot) {
  pdf(file = "fig1.pdf")
  plot(1:10)   # whatever you want to plot ...
  dev.off()
}
@

bla .. bla .. bla .. bla ..

\ifnum \Sexpr{as.numeric(file.exists("fig1.pdf"))}>0
{
\begin{figure}[!h]
\includegraphics{fig1} %% fig1
\label{..}
\end{figure}
}
\fi

Wolfgang

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Wolfgang Raffelsberger, PhD
Laboratoire de BioInformatique et Génomique Intégratives
IGBMC, 1 rue Laurent Fries, 67404 Illkirch Strasbourg, France
Tel (+33) 388 65 3300   Fax (+33) 388 65 3276
wolfgang.raffelsberger (at) igbmc.fr

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of Ralf B [ralf.bie...@gmail.com]
Sent: Monday, 15 November 2010 18:49
To: r-help Mailing List
Subject: [R] Sweave: Conditional code chunks?

I have a code chunk that produces a figure. In my special case, however, the data does not always exist. In cases where the data exists, the code chunk is of course trivial (case #1); but what do I do for case #2, where the data does not exist? I can obviously prevent the code from being executed by checking the existence of the object, but on the Sweave level I still have a static figure chunk. Here is an example that should be reproducible:

# case 1
x <- c(1,2,3)
# case 2 - no definition of the variable
#x <- c(1,2,3)

<<echo=FALSE, results=hide, fig=TRUE>>=
if (exists(as.character(substitute(meta.summary)))) {
  plot(x)
}
@

In a way I would need a conditional chunk, or a chunk that draws a figure only if it was generated and ignores it otherwise. Any ideas?
Ralf
Re: [R] How to Read a Large CSV into a Database with R
Hi Gabor,

Thank you for your willingness to help me through this. The code you sent works on my machine exactly the same way as it does on yours. Unfortunately, when I run the same code on the 1.3GB file, it creates the table structure but doesn't read in a single line [confirmed with sqldf("select * from mytab", dbname = "mydb")]. Though I don't expect anyone to download it, the file I'm using is ss09pusa.csv from http://www2.census.gov/acs2009_1yr/pums/csv_pus.zip. I tested both sets of code on my work desktop and personal laptop, so it's not machine-specific (although it might be Windows- or 64-bit-specific). Do you have any other ideas as to how I might diagnose what's going on here? Or, alternatively, is there some workaround that would get this giant CSV into a database? If you think there's a reasonable way to use the IMPORT command with RSQLite, that seems like it would import the fastest, but I don't know that it's compatible with DBI on Windows. Thanks again! Anthony

read.csv.sql("R:\\American Community Survey\\Data\\2009\\ss09pusa.csv", sql = "create table mytab as select * from file", dbname = "mydb")
NULL
Warning message:
closing unused connection 3 (R:\American Community Survey\Data\2009\ss09pusa.csv)

# how many records are in table?
N <- sqldf("select count(*) from mytab", dbname = "mydb")[[1]]

# read in chunks and display what we have read
k <- 4 # no of records to read at once
for(i in seq(0, N-1, k)) {
+   s <- sprintf("select * from mytab limit %d, %d", i, k)
+   print(sqldf(s, dbname = "mydb"))
+ }
Error in seq.default(0, N - 1, k) : wrong sign in 'by' argument
N
[1] 0

On Mon, Nov 15, 2010 at 12:24 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote: On Mon, Nov 15, 2010 at 11:46 AM, Anthony Damico ajdam...@gmail.com wrote: Hi Gabor, Thank you for the prompt reply.
I definitely looked over all of the examples on the code.google.com sqldf page before sending, which is why I wrote the code read.csv.sql("ss09pusa.csv", sql = "create table ss09pusa as select * from file", dbname = "sqlite") directly pulled from their code -- read.csv.sql("~/tmp.csv", sql = "create table mytab as select * from file", dbname = "mydb") ..but I don't understand why this helps me around the memory problem, since I think it still all gets read into memory. Is there a way to do this line by line?

OK. Maybe it's something else. The reading of the file into the database should not be a resource problem, provided you have enough disk space and appropriate permissions. sqldf / RSQLite are used to get sqlite to do it, so the data never goes through R at that stage, and R limitations can't affect the reading into the sqlite database. When you read it back from the sqlite database, R limitations do come into effect, so you just have to be sure not to read too much in at a time. The use of create table ... as select ... is to prevent sqldf from deleting the table: sqldf is normally used in a fashion where you don't want to know about the back-end databases, so it tries to create them and delete them behind the scenes, but here you want to use them explicitly, so you have to work around that.

Try this example. It should be reproducible, so you just have to copy it and paste it into your R session. Uncomment the indicated line if you want to remove any pre-existing mydb file in the current directory. Try it in a fresh R session just to be sure that nothing mucks it up.

library(sqldf)

# uncomment next line to make sure we are starting clean
# if (file.exists("mydb")) file.remove("mydb")

# create new database
sqldf("attach 'mydb' as new")

# create a new file. BOD is built into R and has 6 rows.
write.table(BOD, file = "tmp.csv", quote = FALSE, sep = ",")

# read new file into database
read.csv.sql("tmp.csv", sql = "create table mytab as select * from file", dbname = "mydb")

# how many records are in table?
N <- sqldf("select count(*) from mytab", dbname = "mydb")[[1]]

# read in chunks and display what we have read
k <- 4 # no of records to read at once
for(i in seq(0, N-1, k)) {
  s <- sprintf("select * from mytab limit %d, %d", i, k)
  print(sqldf(s, dbname = "mydb"))
}

On my machine I get this output:

  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
  Time demand
1    5   15.6
2    7   19.8

showing that it read the 6-line BOD data frame in chunks of 4, as required.

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] Sweave question
On 15/11/2010 12:38 PM, Ralf B wrote: Thank you for all your comments. As a result of my own research, I found this method, which seems to do what I want in addition to your suggestions: tools::texi2dvi("myfile.tex", pdf = TRUE)

Sure, but this doesn't quite answer your original question: you can't pass a Sweave document to texi2dvi, you need to pass it through Sweave first. So the two steps would be

Sweave("myfile.Rnw")
tools::texi2dvi("myfile.tex")

That's essentially what SweavePDF does, but it adds some extra bells and whistles, e.g. it can process multiple related files, it can go directly to a previewer, and it will set up forward and reverse searching from the previewer.

Duncan Murdoch

Thanks again, Ralf

On Mon, Nov 15, 2010 at 6:42 AM, Duncan Murdoch murdoch.dun...@gmail.com wrote: On 15/11/2010 6:22 AM, Dieter Menne wrote: Duncan Murdoch-2 wrote: See SweavePDF() in the patchDVI package on R-forge. In case googling patchDVI only shows a few Japanese pages, and searching for patchDVI in R-Forge gives nothing: try https://r-forge.r-project.org/projects/sweavesearch/ (or did I miss something obvious, Duncan?) No, I just didn't realize that it was hard to find. But you can always select R-forge as a repository, and then install.packages() will find it. Duncan Murdoch
Re: [R] rotate column names in large matrix
Hi Lara,

Hmm, I've never seen column names rotated in R (certainly you could in graphics, etc., and this should do it in that case: lapply(strsplit(colnames(x), ""), paste, collapse = "\n")). You could transpose the matrix so the columns become the rows and then just have numbers (1:1600) as the columns? That's the best solution I've found when dealing with large correlation matrices. I also usually set options(digits = 2) or thereabouts (or use round()). I'd be interested in seeing any other ideas people have, as this has been troublesome for me in the past too. As much as I hate to say it, I find it easier to view some of these things in Excel (you can just write the matrix to the clipboard and paste it into Excel, or probably OpenOffice, though I have not tried) because it has easy scroll bars and zooming.

Cheers, Josh

On Mon, Nov 15, 2010 at 9:47 AM, Lara Poplarski larapoplar...@gmail.com wrote: Dear List, I have a large (1600*1600) matrix generated with symnum, that I am using to eyeball the structure of a dataset. I have abbreviated the column names with the abbr.colnames option. One way to get an even more compact view of the matrix would be to display the column names rotated by 90 degrees. Any pointers on how to do this would be most useful. Any other tips for displaying the matrix in compact form are of course also welcome. Many thanks, Lara

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
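Josh's strsplit trick can be wrapped into a tiny helper; a sketch (the helper name vertical_names is made up for illustration) that turns each column name into a newline-separated label, which stacks the letters vertically when drawn with text() or mtext() in base graphics:

```r
# Stack each name's characters vertically by joining them with newlines
vertical_names <- function(nms) {
  vapply(strsplit(nms, ""), paste, character(1), collapse = "\n")
}

m <- matrix(0, nrow = 2, ncol = 3,
            dimnames = list(NULL, c("alpha", "beta", "gamma")))
v <- vertical_names(colnames(m))
cat(v[1])   # prints a, l, p, h, a on five separate lines
```

For true 90-degree rotation (rather than stacked letters), base graphics offers text(..., srt = 90), but that only applies to plots, not to console output of the matrix itself.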
Re: [R] Merge postscript files into ps/pdf
On Fri, 12-Nov-2010 at 03:29PM -0500, Ralf B wrote: | I know such programs, however, for my specific problem I have an R | script that creates a report (which I have to create many times) | and I would like to append about 100 single-page PostScript files at | the end as an appendix. File names are controlled so it would be easy | to detect them; I just miss a useful function/package that allows | me to perhaps print them to a postscript graphics device. R is a somewhat clumsy tool to do what would be more simply done outside R, but if you insist on doing it in R, it makes sense to edit the R code that did the plots in the first place. You just need a good text editor. | | Ralf | | On Fri, Nov 12, 2010 at 11:47 AM, Greg Snow greg.s...@imail.org wrote: | The best approach if creating all the files using R is to change how you create the graphs so that they all go to one file to begin with (as mentioned by Joshua), but if some of the files are created differently (rgl, external programs), then this is not an option. | | One external program that is fairly easy to use is pdftk, which will concatenate multiple pdf files into 1 (among other things). If you want more control of layout then you can use LaTeX, which will read and include ps/pdf. | | If you need to use R, then you can read ps files using the grImport package and then replot them to a postscript/pdf device with onefile set to TRUE. | | -- | Gregory (Greg) L. Snow Ph.D. | Statistical Data Center | Intermountain Healthcare | greg.s...@imail.org | 801.408.8111 | | | -----Original Message----- | From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ralf B | Sent: Friday, November 12, 2010 12:07 AM | To: r-help Mailing List | Subject: [R] Merge postscript files into ps/pdf | | I created multiple postscript files using ?postscript. How can I merge | them into a single postscript file using R? How can I merge them into | a single pdf file?
| | Thanks a lot, | Ralf

--
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
   ___     Patrick Connolly
 {~._.~}   Great minds discuss ideas
 _( Y )_   Average minds discuss events
(:_~*~_:)  Small minds discuss people
 (_)-(_)   . Eleanor Roosevelt
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
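The two routes Greg Snow mentions can be sketched as follows; file names are hypothetical, and the grImport route assumes the inputs are plain single-page PostScript files:

```r
## Outside R: pdftk concatenates PDFs in one shell call, e.g.
##   pdftk report.pdf appendix*.pdf cat output combined.pdf

## Inside R: trace each .ps file with grImport and replot everything
## to one multi-page PostScript device (onefile = TRUE).
library(grImport)
library(grid)

files <- sort(list.files(pattern = "\\.ps$"))
postscript("combined.ps", onefile = TRUE)
for (f in files) {
  xml <- paste(f, ".xml", sep = "")
  PostScriptTrace(f, xml)         # convert to grImport's XML format
  grid.newpage()                  # one page per original file
  grid.picture(readPicture(xml))  # redraw the traced picture
}
dev.off()
```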
Re: [R] How to Read a Large CSV into a Database with R
From: ajdam...@gmail.com Date: Mon, 15 Nov 2010 13:28:40 -0500 To: ggrothendi...@gmail.com; r-help@r-project.org Subject: Re: [R] How to Read a Large CSV into a Database with R

Hi Gabor, Thank you for your willingness to help me through this. The code you sent works on my machine exactly the same way as it does on yours. Unfortunately, when I run the same code on the 1.3GB file, it creates the table structure but doesn't read in a single line [confirmed with sqldf("select * from mytab", dbname = "mydb")]. Though I don't expect anyone to download it, the file I'm using is ss09pusa.csv from http://www2.census.gov/acs2009_1yr/pums/csv_pus.zip. I tested both sets of code on my work desktop and personal laptop, so it's not machine-specific. Do you have any other ideas as to how I might diagnose what's going on here? Or, alternatively, is there some workaround that would get this giant CSV into a database? If you think there's a reasonable way to use the IMPORT command with RSQLite, that seems like it would import the fastest, I think.

Someone else suggested external approaches, and indeed I have loaded census TIGER files into a db for making maps and mobile apps etc. I would mention again that this may eliminate a memory limit and let you limp along, but presumably you want a streaming source or something if your analysis has predictable access patterns and this data will not be used as part of a hotel reservation system. Structured input data (I think TIGER was line oriented) should be easy to load into a db with a bash script or Java app.

Thanks again! Anthony

read.csv.sql("R:\\American Community Survey\\Data\\2009\\ss09pusa.csv", sql = "create table mytab as select * from file", dbname = "mydb")
NULL
Warning message: closing unused connection 3 (R:\American Community Survey\Data\2009\ss09pusa.csv)
# how many records are in table?
N <- sqldf("select count(*) from mytab", dbname = "mydb")[[1]]
# read in chunks and display what we have read
k <- 4 # no of records to read at once
for(i in seq(0, N-1, k)) {
+   s <- sprintf("select * from mytab limit %d, %d", i, k)
+   print(sqldf(s, dbname = "mydb"))
+ }
Error in seq.default(0, N - 1, k) : wrong sign in 'by' argument
N
[1] 0

On Mon, Nov 15, 2010 at 12:24 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote: On Mon, Nov 15, 2010 at 11:46 AM, Anthony Damico wrote: Hi Gabor, Thank you for the prompt reply. I definitely looked over all of the examples on the code.google.com sqldf page before sending, which is why I wrote the code read.csv.sql("ss09pusa.csv", sql = "create table ss09pusa as select * from file", dbname = "sqlite") directly pulled from their code -- read.csv.sql("~/tmp.csv", sql = "create table mytab as select * from file", dbname = "mydb") ... but I don't understand why this helps me around the memory problem, since I think it still all gets read into memory. Is there a way to do this line by line?

OK. Maybe it's something else. The reading of the file into the database should not be a resource problem provided you have enough disk space and appropriate permissions. sqldf / RSQLite are used to get sqlite to do it, so that the data never goes through R at that stage and R limitations can't affect the reading into the sqlite database. When you read it from the sqlite database, R limitations do come into effect, so you just have to be sure not to read too much in at a time. The use of "create table ... as select ..." is to prevent sqldf from deleting the table: sqldf is normally used in a fashion where you don't want to know about the back-end databases, so it tries to create them and delete them behind the scenes, but here you want to explicitly use them, so you have to work around that. Try this example. It should be reproducible, so you just have to copy it and paste it into your R session.
Uncomment the indicated line if you want to be able to remove any pre-existing mydb file in the current directory. Try it in a fresh R session just to be sure that nothing mucks it up.

library(sqldf)
# uncomment next line to make sure we are starting clean
# if (file.exists("mydb")) file.remove("mydb")
# create new database
sqldf("attach 'mydb' as new")
# create a new file. BOD is built into R and has 6 rows.
write.table(BOD, file = "tmp.csv", quote = FALSE, sep = ",")
# read new file into database
read.csv.sql("tmp.csv", sql = "create table mytab as select * from file", dbname = "mydb")
# how many records are in table?
N <- sqldf("select count(*) from mytab", dbname = "mydb")[[1]]
# read in chunks and display what we have read
k <- 4 # no of records to read at once
for(i in seq(0, N-1, k)) {
  s <- sprintf("select * from mytab limit %d, %d", i, k)
  print(sqldf(s, dbname = "mydb"))
}

On my machine I get this output:

  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
[R] Need help with pointLabels()
Hello R-list, I am plotting a weighted linear regression in R. The points on my chart are also scaled to sample size, so some points are large, some are small. I have figured out everything I need using the plot() function: how to plot the points, scale them by sample size, weight the linear regression by sample size, plot that line, and plot the labels for the points. However, although the pointLabel() function assures that the labels do not overlap with each other, they still overlap with the larger points, and sometimes run off the side of the chart too. I have tried saving the plot as EPS, PDF, SVG and opening them in GIMP to try to move the text around, but GIMP does not recognize the text as text, so I cannot select the labels and move them off the points. I have tried various offsets too but that does not work either. So I need to either (a) print the labels on the plot so that they do not overlap the points or (b) be able to move the text around in the resulting image file. Any advice? Here is the code I am using. BTW, I have also tried ggplot2 but it has no function (that I am aware of) like pointLabel() to avoid label overlap. Please feel free to email me directly.

postscript(file="fig_a.eps")
plot(x, y, xlab="X-axis", ylab="Y-axis", cex=0.02*sample_size, pch=21)
abline(lm(y~x, weights=sample_size))
pointLabel(x, y, labels=Category, cex=1, doPlot=TRUE, offset=2.5)
dev.off()

Thank you, -craig.star...@gmail.com
[R] first-order integer valued autoregressive process, inar(1)
Hello, in my doctoral thesis I am trying to model time-series crash count data with an INAR(1) process, but I have a few problems writing the R code. Is there someone who works with INAR processes? I would be very grateful if someone could give me some ideas for writing the code. Nazli
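For what it's worth, here is a minimal sketch (my own, not from the thread) of simulating an INAR(1) process via binomial thinning, X_t = alpha o X_{t-1} + e_t, with Poisson(lambda) innovations; alpha and lambda are illustrative values:

```r
## Simulate an INAR(1) count process: each of the X_{t-1} counts
## "survives" to time t with probability alpha (binomial thinning),
## and a Poisson(lambda) innovation is added.
sim_inar1 <- function(n, alpha = 0.5, lambda = 2) {
  x <- numeric(n)
  x[1] <- rpois(1, lambda / (1 - alpha))  # start near the stationary mean
  for (t in 2:n) {
    survivors <- rbinom(1, size = x[t - 1], prob = alpha)  # alpha o X_{t-1}
    x[t] <- survivors + rpois(1, lambda)                   # + innovation e_t
  }
  x
}

set.seed(1)
x <- sim_inar1(500)
mean(x)                      # should be near lambda / (1 - alpha) = 4
acf(x, plot = FALSE)$acf[2]  # lag-1 autocorrelation should be near alpha
```

Fitting (rather than simulating) an INAR(1) would typically go through conditional least squares or conditional maximum likelihood on the transition probabilities implied by the thinning operator.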
Re: [R] R error using Survr function with gcmrec
I've solved the condition problem, but have come across another one with the gcmrec function and was wondering if anyone could point me in the right direction again? After running the code below, I get this error message:

Error in gcmrec(Survr(id, time, event) ~ var, data = dataOK, s = 1096) : NA/NaN/Inf in foreign function call (arg 14)

I've looked into the code of the function and believe it has something to do with the line call <- match.call(), but I'm not sure how it works. Thanks for any input!

# dataset
id=c(rep(1,4),rep(2,4),rep(3,4),rep(4,5))
start=c("1996-01-01","1997-01-01","1998-01-01","1998-03-15","1996-01-01","1996-04-15","1997-01-01","1998-01-01","1996-01-01","1997-01-01","1998-01-01","1998-09-30","1996-01-01","1997-01-01","1997-12-15","1998-01-01","1998-06-14")
stop=c("1997-01-01","1998-01-01","1998-03-15","1999-01-01","1996-04-15","1997-01-01","1998-01-01","1999-01-01","1997-01-01","1998-01-01","1998-09-30","1999-01-01","1997-01-01","1997-12-15","1998-01-01","1998-06-14","1999-01-01")
time=c(366,365,73,292,105,261,365,365,366,365,272,93,366,348,17,164,201)
event=c(0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,1,0)
enum=c(rep(seq(1,4,1),4),5)
var=c(21312,21869,22441,22441,3105,3105,3086,3075,130610,133147,135692,135692,11686,11976,11976,12251,12251)
data=data.frame(id,start,stop,time,event,enum,var)

# modify Survr function
Survr = function (id, time, event) {
  if (length(unique(id)) > length(event[event == 0])) {
    stop("Data doesn't match. Every subject must have a censored time")
  }
  if (length(unique(event)) > 2 | max(event) != 1 | min(event) != 0) {
    stop("event must be 0-1")
  }
  ans <- cbind(id, time, event)
  oldClass(ans) <- "Survr"
  invisible(ans)
}

# model
dataOK=addCenTime(data)
m <- gcmrec(Survr(id,time,event)~var, data=dataOK, s=1096)

On Thu, Nov 11, 2010 at 3:50 PM, Emily C lia.bede...@gmail.com wrote: Sorry for the lack of context - I found a forum (http://r.789695.n4.nabble.com/R-help-f789696.html) that I thought was easier to navigate and was using it for the first time.
Hit the reply button, but forgot that the mailing list recipients would not see the previous message in the thread. I've pasted it at the bottom for reference. I actually need my data as single-year entries, as I am assessing the effect of a time-varying covariate (var) that is measured (and changes) on an annual basis. However, my events are assessed on a daily basis. But thank you for pointing out the condition in Survr(). To correct the problem, I believe it would still be valid to change the condition to:

if (length(unique(id)) > length(event[event == 0])) { stop("Data doesn't match. Every subject must have a censored time") }

Thank you for the reply!

On Thu, Nov 11, 2010 at 2:54 PM, David Winsemius dwinsem...@comcast.net wrote: On Nov 11, 2010, at 2:50 PM, David Winsemius wrote: On Nov 11, 2010, at 2:09 PM, Emily wrote: I'm having the same problem (???: from a three-year-old posting for which you didn't copy any context.) and was wondering whether you ever found a solution? It gives me the error "Error in Survr(id, time, event) : Data doesn't match. Every subject must have a censored time" even though all my subjects are right-censored, and to be sure, I've even used the addCenTime function. Any input appreciated!

Your data has a lot of 0 events at the end of calendar years. That does not seem to be the expected format for the Survr records. It appears to define an invalid record as one where the only censoring event is at the time of the ^valid^ last observation. Here's the first line in Survr that is throwing the error:

if (length(unique(id)) != length(event[event == 0])) { stop("Data doesn't match. Every subject must have a censored time") }

I suspect you need to collapse your single-year entries with 0 events into multiple-year entries with an event. -- David.
Here's my sample data:

id=c(rep(1,4),rep(2,4),rep(3,4),rep(4,5))
start=c("1996-01-01","1997-01-01","1998-01-01","1998-03-15","1996-01-01","1996-04-15","1997-01-01","1998-01-01","1996-01-01","1997-01-01","1998-01-01","1998-09-30","1996-01-01","1997-01-01","1997-12-15","1998-01-01","1998-06-14")
stop=c("1997-01-01","1998-01-01","1998-03-15","1999-01-01","1996-04-15","1997-01-01","1998-01-01","1999-01-01","1997-01-01","1998-01-01","1998-09-30","1999-01-01","1997-01-01","1997-12-15","1998-01-01","1998-06-14","1999-01-01")
time=c(366,365,73,292,105,261,365,365,366,365,272,93,366,348,17,164,201)
event=c(0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,1,0)
enum=c(rep(seq(1,4,1),4),5)
var=c(21312,21869,22441,22441,3105,3105,3086,3075,130610,133147,135692,135692,11686,11976,11976,12251,12251)
data=data.frame(id,start,stop,time,event,enum,var)
dataOK=addCenTime(data)
m <- gcmrec(Survr(id,time,event)~var, data=dataOK)
[R] Re : interpretation of coefficients in survreg AND obtaining the hazard function
Dear Prof Therneau, thank you for this information: this is going to be most useful for what I want to do. I will look into the AFT model. Yours, David Biau.

From: Terry Therneau thern...@mayo.edu Cc: r-help@r-project.org Sent: Mon 15 November 2010, 15:33:23 Subject: Re: interpretation of coefficients in survreg AND obtaining the hazard function

1. The Weibull is the only distribution that can be written in both a proportional hazards form and an accelerated failure time (AFT) form. Survreg uses the latter. In an AFT model, we model the time to failure. Positive coefficients are good (longer time to death). In a PH model, we model the death rate. Positive coefficients are bad (higher death rate). You are not the first to be confused by the change in sign between the two models.

2. There are about 5 different ways to parameterize a Weibull distribution; 1-4 appear in various texts, and the AFT form is #5. This is a second common issue with survreg that strikes only the more sophisticated users: to understand the output they look up the Weibull in a textbook, and become even more confused! Kalbfleisch and Prentice is a good reference for the AFT form. The manual page for psurvreg has some information on this, as does the very end of ?survreg. The psurvreg page also has an example of how to extract the hazard function for a Weibull fit.

----- Begin included message -----

Dear R help list, I am modeling some survival data with coxph and survreg (dist='weibull') using package survival. I have 2 problems: 1) I do not understand how to interpret the regression coefficients in the survreg output, and it is not clear to me from ?survreg.objects how to.
Here is an example of the code that points out my problem: - data is stc1 - the factor is dichotomous with 'low' and 'high' categories

slr <- Surv(stc1$ti_lr, stc1$ev_lr==1)
mca <- coxph(slr~as.factor(grade2=='high'), data=stc1)
mcb <- coxph(slr~as.factor(grade2), data=stc1)
mwa <- survreg(slr~as.factor(grade2=='high'), data=stc1, dist='weibull', scale=0)
mwb <- survreg(slr~as.factor(grade2), data=stc1, dist='weibull', scale=0)

summary(mca)$coef
                                     coef exp(coef)  se(coef)         z  Pr(>|z|)
as.factor(grade2 == "high")TRUE 0.2416562  1.273356 0.2456232 0.9838494 0.3251896

summary(mcb)$coef
                           coef exp(coef)  se(coef)          z  Pr(>|z|)
as.factor(grade2)low -0.2416562 0.7853261 0.2456232 -0.9838494 0.3251896

summary(mwa)$coef
                    (Intercept) as.factor(grade2 == "high")TRUE
                      7.9068380                      -0.4035245

summary(mwb)$coef
         (Intercept) as.factor(grade2)low
           7.5033135            0.4035245

No problem with the interpretation of the coefs in the Cox model. However, I do not understand why a) the coefficients in the survreg model are the opposite (negative when the other is positive) of what I have in the Cox model? Are these not the log(HR) given the categories of these variables? b) How come the intercept coefficient changes (the scale parameter does not change)?

2) My second question relates to the first. a) Given a model from survreg, say mwa above, how should I extract the base hazard and the hazard of each patient given a set of predictors? With the hazard function for the ith individual in the study given by h_i(t) = exp(\beta'x_i)*\lambda*\gamma*t^{\gamma-1}, it doesn't look to me like predict(mwa, type='linear') is \beta'x_i. b) Since I need the intercept coefficient from the model to obtain the scale parameter, and thus the base hazard function as defined in Collett (h_0(t)=\lambda*\gamma*t^{\gamma-1}), I am concerned that this intercept coefficient changes depending on the reference level of the factor entered in the model. The change is very important when I have more than one predictor in the model.
Any help would be greatly appreciated, David Biau.
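As a concrete illustration of Prof Therneau's point 1, here is a small sketch (using the survival package's built-in lung data rather than the poster's stc1, so the numbers are illustrative): for the Weibull, the AFT coefficient divided by the scale and negated approximately recovers the Cox PH coefficient, including the sign flip.

```r
library(survival)

## AFT fit: positive coefficients mean longer survival
fit_aft <- survreg(Surv(time, status) ~ sex, data = lung, dist = "weibull")

## PH fit: positive coefficients mean higher hazard
fit_ph <- coxph(Surv(time, status) ~ sex, data = lung)

## For the Weibull, log(HR) = -beta_AFT / scale: note the sign flip
-coef(fit_aft)["sex"] / fit_aft$scale
coef(fit_ph)   # similar magnitude and sign to the rescaled AFT estimate
```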
Re: [R] Force evaluation of variable when calling partialPlot
RE: the following original question:

I'm using the randomForest package and would like to generate partial dependence plots, one after another, for a variety of variables:

m <- randomForest( s, ... )
varnames <- c( "var1", "var2", "var3", "var4" ) # var1..4 are all in data frame s
for( v in varnames ) {
  partialPlot( x=m, pred.data=s, x.var=v )
}

... but this doesn't work, with partialPlot complaining that it can't find the variable v.

I'm having a very similar problem using the partialPlot function. A simplified version of my code looks like this:

data.in <- paste(basedir, "/sw_climate_dataframe_", root.name, ".csv", sep="")
data <- read.csv(data.in)
vars <- c("sm1","precip.spring","tmax.fall","precip.fall") # selected variables in data frame
xdata <- as.data.frame(data[,vars])
ydata <- data[,5]
ntree <- 2000
rf.pdplots <- function() {
  sel.rf <- randomForest(xdata, ydata, ntree=ntree, keep.forest=TRUE)
  par(family="sans", mfrow=c(2,2), mar=c(4,3,1,2), oma=c(0,3,0,0), mgp=c(2,1,0))
  for (i in 1:length(vars)) {
    print(vars[i])
    partialPlot(sel.rf, xdata, vars[i], which.class=1, xlab=vars[i], main="")
    mtext("(Logit of probability of high severity)/2", side=2, line=1, outer=T)
  }
}
rf.pdplots()

When I run this code, with partialPlot embedded in a function, I get the following error:

Error in eval(expr, envir, enclos) : object 'i' not found

If I just take the code inside the function and run it (not embedded in the function), it runs just fine. Is there some reason why partialPlot doesn't like to be called from inside a function? Other things I've tried/noticed: 1. If I comment out the line with the partialPlot call (and the next line with mtext), the function runs as expected and prints the variable names one at a time. 2. If the variable i is defined as a number (e.g., 4) in the global environment, then the function will run, the names print out one at a time, and four plots are created.
HOWEVER, the plots are all for the last (4th) variable, BUT the x labels actually are different on each plot (i.e., the xlab is actually looping through the four values in vars). Can anyone help me make sense of this? Thanks. -Greg
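One possible workaround (my guess, based on partialPlot() deparsing its x.var argument with substitute(); this is not from the thread): construct the call with do.call(), so that x.var is evaluated to a plain character string before partialPlot() captures it. Object names here follow Greg's code above.

```r
## Inside rf.pdplots(), replace the partialPlot line with a do.call():
## the list elements are evaluated first, so x.var arrives as the
## character value (e.g. "sm1") rather than the unevaluated symbol.
for (i in seq_along(vars)) {
  do.call(partialPlot,
          list(sel.rf, pred.data = xdata, x.var = vars[i],
               which.class = 1, xlab = vars[i], main = ""))
}
```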
Re: [R] Problem in installing and starting Rattle
On 16 November 2010 02:40, Feng Mai maif...@gmail.com wrote: I also have a problem trying to start rattle. Windows 7 32-bit, R 2.12.0. When I try library(rattle) I get an error message: "The procedure entry point deflateSetHeader could not be located in the dynamic link library zlib1.dll". I hit OK and it prompts me to install GTK+ again. I tried to uninstall GTK+ first and delete all related files as Dieter suggested, but it still won't work :(

Yes, this is another issue that arose with R 2.12.0 and was discussed on the rattle-users mailing list (http://groups.google.com/group/rattle-users). Be sure to use the newest version of Rattle (currently 2.5.47), where this has been fixed. I've not worked out why, but if the XML package is loaded before the RGtk2 package we see this error, but not if RGtk2 is loaded first. Rattle now loads RGtk2 first. Regards, Graham
[R] Defining functions inside loops
Hello, I was trying to define a set of functions inside a loop, with the loop index working as a parameter for each function. Below I post a simpler example, to illustrate what I was intending:

f <- list()
for (i in 1:10){
  f[[i]] <- function(t){
    f[[i]] <- t^2 + i
  }
}
rm(i)

With that, I was expecting that f[[1]] would be a function defined by t^2+1, f[[2]] by t^2+2 and so on. However, the index i somehow doesn't get into the function definition on each loop; that is, the functions f[[1]] through f[[10]] are all defined by t^2+i. Thus, if I remove the object i from the workspace, I get an error when evaluating these functions. Otherwise, if I don't remove the object i, it ends the loop with value equal to 10 and then f[[1]](t)=f[[2]](t)=...=f[[10]](t)=t^2+10. I am aware that I could simply put

f <- function(u, i){
  f <- t^2 + i
}

but that's really not what I want. Any help would be appreciated. Thanks in advance, Eduardo Horta
Re: [R] Need help with pointLabels()
What package is pointLabel (or is it pointLabels) in? Giving a reproducible example includes stating packages other than the standard ones. What you are trying to do is not simple for general cases; some tools work better on some datasets, but others work better on other datasets. Some other tools that may help (but you will probably need to do some hand tweaking after using them, or base your solution on these) could be:

thigmophobe.labels in the plotrix package
spread.labels in the plotrix package
TkIdentify in the TeachingDemos package
spread.labs in the TeachingDemos package

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Craig Starger Sent: Monday, November 15, 2010 12:38 PM To: r-help@r-project.org Subject: [R] Need help with pointLabels()

Hello R-list, I am plotting a weighted linear regression in R. The points on my chart are also scaled to sample size, so some points are large, some are small. I have figured out everything I need using the plot() function: how to plot the points, scale them by sample size, weight the linear regression by sample size, plot that line, and plot the labels for the points. However, although the pointLabel() function assures that the labels do not overlap with each other, they still overlap with the larger points, and sometimes run off the side of the chart too. I have tried saving the plot as EPS, PDF, SVG and opening them in GIMP to try to move the text around, but GIMP does not recognize the text as text, so I cannot select the labels and move them off the points. I have tried various offsets too but that does not work either. So I need to either (a) print the labels on the plot so that they do not overlap the points or (b) be able to move the text around in the resulting image file. Any advice? Here is the code I am using.
BTW, I have also tried ggplot2 but it has no function (that I am aware of) like pointLabel() to avoid label overlap. Please feel free to email me directly.

postscript(file="fig_a.eps")
plot(x, y, xlab="X-axis", ylab="Y-axis", cex=0.02*sample_size, pch=21)
abline(lm(y~x, weights=sample_size))
pointLabel(x, y, labels=Category, cex=1, doPlot=TRUE, offset=2.5)
dev.off()

Thank you, -craig.star...@gmail.com
Re: [R] Non-positive definite cross-covariance matrices
What made you think that a cross-covariance matrix should be positive definite? It does not even need to be a square matrix, or symmetric. Giovanni Petris

On Mon, 2010-11-15 at 12:58 -0500, Jeff Bassett wrote: I am creating covariance matrices from sets of points, and I am having frequent problems where I create matrices that are non-positive definite. I've started using the corpcor package, which was specifically designed to address these types of problems. It has solved many of my problems, but I still have one left. One of the matrices I need to calculate is a cross-covariance matrix. In other words, I need to calculate cov(A, B), where A and B are each a matrix defining a set of points. The corpcor package does not seem to be able to perform this operation. Can anyone suggest a way to create cross-covariance matrices that are guaranteed (or at least likely) to be positive definite, either using corpcor or another package? I'm using R 2.8.1 and corpcor 1.5.2 on Mac OS X 10.5.8. - Jeff
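A short illustration of Giovanni's point (my own example): cov(A, B) is p x q when A has p columns and B has q, so symmetry, let alone positive definiteness, need not even be defined.

```r
## Cross-covariance of 3 variables against 2 variables (same number of
## observations in each) is a 3 x 2 rectangular matrix.
set.seed(1)
A <- matrix(rnorm(300), ncol = 3)  # 100 observations of 3 variables
B <- matrix(rnorm(200), ncol = 2)  # 100 observations of 2 variables
dim(cov(A, B))                     # 3 2: rectangular, so not symmetric
```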
Re: [R] Defining functions inside loops
This is a side effect of the lazy evaluation done in functions. Look at the help page for the force function for more details on how to force evaluation and solve your problem.

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Eduardo de Oliveira Horta Sent: Monday, November 15, 2010 1:50 PM To: r-help@r-project.org Subject: [R] Defining functions inside loops

Hello, I was trying to define a set of functions inside a loop, with the loop index working as a parameter for each function. Below I post a simpler example, to illustrate what I was intending:

f <- list()
for (i in 1:10){
  f[[i]] <- function(t){
    f[[i]] <- t^2 + i
  }
}
rm(i)

With that, I was expecting that f[[1]] would be a function defined by t^2+1, f[[2]] by t^2+2 and so on. However, the index i somehow doesn't get into the function definition on each loop; that is, the functions f[[1]] through f[[10]] are all defined by t^2+i. Thus, if I remove the object i from the workspace, I get an error when evaluating these functions. Otherwise, if I don't remove the object i, it ends the loop with value equal to 10 and then f[[1]](t)=f[[2]](t)=...=f[[10]](t)=t^2+10. I am aware that I could simply put f <- function(u,i){ f <- t^2+i } but that's really not what I want. Any help would be appreciated. Thanks in advance, Eduardo Horta
Re: [R] Need help with pointLabels()
Sorry, pointLabel() is in the package maptools: http://rgm2.lab.nig.ac.jp/RGM2/R_man-2.9.0/library/maptools/man/pointLabel.html

Thank you for the tips on the other packages, I will give them a try. -C

On Mon, Nov 15, 2010 at 3:54 PM, Greg Snow greg.s...@imail.org wrote: What package is pointLabel (or is it pointLabels) in? Giving a reproducible example includes stating packages other than the standard ones. What you are trying to do is not simple for general cases; some tools work better on some datasets, but others work better on other datasets. Some other tools that may help (but you will probably need to do some hand tweaking after using them, or base your solution on these) could be:

thigmophobe.labels in the plotrix package
spread.labels in the plotrix package
TkIdentify in the TeachingDemos package
spread.labs in the TeachingDemos package

-- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.s...@imail.org 801.408.8111

-----Original Message----- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Craig Starger Sent: Monday, November 15, 2010 12:38 PM To: r-help@r-project.org Subject: [R] Need help with pointLabels()

Hello R-list, I am plotting a weighted linear regression in R. The points on my chart are also scaled to sample size, so some points are large, some are small. I have figured out everything I need using the plot() function: how to plot the points, scale them by sample size, weight the linear regression by sample size, plot that line, and plot the labels for the points. However, although the pointLabel() function assures that the labels do not overlap with each other, they still overlap with the larger points, and sometimes run off the side of the chart too. I have tried saving the plot as EPS, PDF, SVG and opening them in GIMP to try to move the text around, but GIMP does not recognize the text as text, so I cannot select the labels and move them off the points.
I have tried various offsets too but that does not work either. So I need to either (a) print the labels on the plot so that they do not overlap the points or (b) be able to move the text around in the resulting image file. Any advice? Here is the code I am using. BTW, I have also tried ggplot2 but it has no function (that I am aware of) like pointLabels() to avoid label overlap. Please feel free to email me directly. postscript(file=fig_a.eps); plot(x, y, xlab=X-axis, ylab=Y-axis, cex=0.02*sample_size, pch=21); abline(lm(y~x, weight=sample_size)) pointLabel(x, y, labels=Category, cex=1, doPlot=TRUE, offset=2.5); dev.off() Thank you, -craig.star...@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
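One of the tools Greg mentions can be dropped into Craig's code almost unchanged. A sketch only, assuming x, y, sample_size, and Category exist as in the original post and that plotrix is installed: thigmophobe.labels() places each label on the side of its point that faces away from the nearest neighbouring point, which often avoids both label-label and label-point overlap.

```r
## Sketch: variable names follow the original post (assumed to exist)
library(plotrix)

postscript(file = "fig_a.eps")
plot(x, y, xlab = "X-axis", ylab = "Y-axis",
     cex = 0.02 * sample_size, pch = 21)
abline(lm(y ~ x, weights = sample_size))
## Each label goes on the side of its point away from the nearest neighbour
thigmophobe.labels(x, y, labels = Category, cex = 1)
dev.off()
```

Hand tweaking may still be needed for very large points, as Greg warns.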
Re: [R] Defining functions inside loops
You could make f[[i]] be function(t) t^2 + i for i in 1:10 with

  f <- lapply(1:10, function(i) local({ force(i); function(x) x^2 + i }))

After that we get the correct results:

  f[[7]](100:103)
  [1] 10007 10208 10411 10616

but looking at the function doesn't immediately tell you what 'i' is in the function:

  f[[7]]
  function (x)
  x^2 + i
  <environment: 0x19d7458>

You can find it in f[[7]]'s environment:

  get("i", envir = environment(f[[7]]))
  [1] 7

The call to force() in the call to local() is not necessary in this case, although it can help in other situations.

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

-----Original Message----- From: Eduardo de Oliveira Horta Sent: Monday, November 15, 2010 12:50 PM To: r-help@r-project.org Subject: [R] Defining functions inside loops

Hello, I was trying to define a set of functions inside a loop, with the loop index working as a parameter for each function. Below I post a simpler example to illustrate what I was intending:

  f <- list()
  for (i in 1:10){
    f[[i]] <- function(t){
      t^2 + i
    }
  }
  rm(i)

With that, I was expecting that f[[1]] would be a function defined by t^2+1, f[[2]] by t^2+2 and so on. However, the index i somehow doesn't get into each function's definition; that is, the functions f[[1]] through f[[10]] are all defined by t^2+i. Thus, if I remove the object i from the workspace, I get an error when evaluating these functions. Otherwise, if I don't remove the object i, the loop ends with i equal to 10, and then f[[1]](t)=f[[2]](t)=...=f[[10]](t)=t^2+10. I am aware that I could simply put

  f <- function(t, i) t^2 + i

but that's really not what I want. Any help would be appreciated.

Thanks in advance, Eduardo Horta
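The lapply()/local() idiom from Bill's reply can be condensed into a minimal self-contained sketch. Each call to local() creates a fresh environment holding its own copy of i, so every function captures a different value instead of the shared loop index:

```r
## One function per i, each closing over its own copy of i
f <- lapply(1:10, function(i) local({ force(i); function(x) x^2 + i }))

f[[7]](100:103)  # 10007 10208 10411 10616
f[[1]](2)        # 5  (2^2 + 1)
```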
Re: [R] Help with Hist
Hi Ivan / Dieter, Thanks for your assistance; I will try the suggestions. The plotrix solution seems the more elegant one, except that unless I'm missing something I will end up with a value-based graph rather than a density-based one, which is what I really wanted. It seems the only solution will be to subset the data, as has been suggested. Regards Steve

On 2010/11/15 06:45 PM, Ivan Calandra wrote: Well, another possibility would be to edit the plot so that you cut out the empty part (between 300 and 2000). There might be some function that can do it, maybe the plotrix::gap.barplot() that Dieter already told you about.

On 11/15/2010 16:22, sbsid...@mweb.co.za wrote: Thanks. What you have suggested of course works, but I am trying to reduce the 'ugliness'. Anybody got any other ideas? Regards Steve Sent from my BlackBerry® wireless device

-----Original Message----- From: Ivan Calandra ivan.calan...@uni-hamburg.de Date: Mon, 15 Nov 2010 16:08:47 Subject: Re: [R] Help with Hist

Hi, I think you should also give the upper extreme:

  x <- c(rnorm(80) + 10, 101:110, 2001:2010)
  hist(x, breaks = c(0, 20, 40, 60, 80, 100, 200, 500))
  Error in hist.default(x, breaks = c(0, 20, 40, 60, 80, 100, 200, 500)) :
    some 'x' not counted; maybe 'breaks' do not span range of 'x'
  hist(x, breaks = c(0, 20, 40, 60, 80, 100, 200, 500, 2100))
  ## which looks horrible, but works; up to you how to cut it

HTH, Ivan

On 11/15/2010 15:53, Steve Sidney wrote: Dear list, I am trying to re-scale a histogram using hist() but cannot seem to create a reduced scale where the upper values are not plotted. What I have is about 100 values, of which 80 or so are between 0 and 40, one or two in the hundreds, and an outlier around 2000. What I would like to do is create an x-scale that shows 5 bins between 0-100 and then 3-4 bins between 100 and 2000, but I don't need any resolution on the values above 100. If I use breaks = c(0, 20, 40, 60, 80, 100, 200, 500), R gives me an error saying that there are values not included, which of course I know, but I wish to ignore them. It seems that I am missing something quite simple. Any help would be appreciated. Regards Steve
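The subsetting route Steve settles on can be sketched like this, reusing Ivan's simulated data; the title records how many values were dropped so the plot is not silently misleading:

```r
## Simulated data in the spirit of Ivan's example
x <- c(rnorm(80) + 10, 101:110, 2001:2010)

## Plot only the values at or below 100; say how many were left out
hist(x[x <= 100], breaks = c(0, 20, 40, 60, 80, 100),
     main = sprintf("%d of %d values shown (rest > 100)",
                    sum(x <= 100), length(x)))
```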
Re: [R] Need help with pointLabels()
Hi Craig, As Greg pointed out, choosing optimal locations for labels is tricky (side note for anyone reading: pointLabel() comes from the maptools package). Inferring from your code, the labels you are plotting are the Category each point belongs to, which suggests there may not be a huge number of unique values. Assuming you have a reasonably small number of unique labels (say 10), what about colouring the points and adding a legend? This would be straightforward to code and completely sidesteps the label-placement issue. Of course, this can become quite clunky with numerous levels. You could do something similar with point shape, and you could even mix shape and colour to convey more information in the same space. Cheers, Josh

On Mon, Nov 15, 2010 at 11:37 AM, Craig Starger craig.star...@gmail.com wrote: [original question quoted in full above]

-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
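Josh's colour-plus-legend idea can be sketched as follows, again assuming x, y, sample_size, and Category exist as in the original post:

```r
## Colour points by Category instead of labelling each one
cats <- factor(Category)
cols <- seq_along(levels(cats))      # one palette index per level

plot(x, y, xlab = "X-axis", ylab = "Y-axis",
     cex = 0.02 * sample_size, pch = 21, bg = cols[cats])
abline(lm(y ~ x, weights = sample_size))
legend("topleft", legend = levels(cats), pch = 21, pt.bg = cols)
```

With more than a handful of levels, adding pch as a second channel (or dropping back to labels for a few key points) keeps the legend readable.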
[R] indexing lists
Hi List, I'm trying to work out how to use which(), or another function, to find the top-level index of a list item based on a condition. An example will clarify my question.

  a <- list(c(1,2), c(3,4))
  a
  [[1]]
  [1] 1 2
  [[2]]
  [1] 3 4

I want to find the top-level index of c(1,2), which should return 1, since

  a[[1]]
  [1] 1 2

I can't seem to work out the syntax. I've tried

  which(a == c(1,2))

and an error about coercing to double is returned. I can find the index of elements within a particular item with which(a[[1]] == c(1,2)) or which(a[[1]] == 1) etc., which return [1] 1 2 and [1] 1 respectively, as they should. Any thoughts? C
Re: [R] Defining functions inside loops
Thanks a lot for your readiness! Problem (apparently) solved! Best regards, Eduardo Horta

On Mon, Nov 15, 2010 at 7:10 PM, William Dunlap wdun...@tibco.com wrote: [Bill's lapply()/local() solution, quoted in full above]
Re: [R] indexing lists
Chris, Well, the 'answer' could be:

  which(sapply(a, function(x) all(x == c(1, 2))))

But I wonder how these elements of 'a' in your actual application come to be. If you're constructing them, you can give the elements of the list names, and then it doesn't matter what numerical index they have; you can just reference them by name:

  a <- list(name1 = 1:2, name2 = 3:4)
  a
  a <- c(anothername = list(9:10), a)
  a
  a$name1

Chris Carleton wrote: [original question quoted above]
Re: [R] indexing lists
Hi Chris, Does this do what you're after? It just compares each element of a (i.e., a[[1]] and a[[2]]) to c(1, 2) and determines whether they are identical:

  which(sapply(a, identical, y = c(1, 2)))

There were too many 1s floating around for me to figure out whether you wanted to find elements of a that matched the entire vector or subelements of a that matched elements of the vector (if that makes any sense). HTH, Josh

On Mon, Nov 15, 2010 at 1:24 PM, Chris Carleton w_chris_carle...@hotmail.com wrote: [original question quoted above]

-- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
Re: [R] Defining functions inside loops
On 15/11/2010 4:10 PM, William Dunlap wrote: [Bill's lapply()/local() solution, quoted in full above]

I thought it would be clearer to use substitute. My first guess was

  f <- list()
  for (i in 1:10)
    f[[i]] <- substitute(function(t) t^2 + increment, list(increment = i))

but this doesn't work; f[[i]] ends up as an expression evaluating to the function we want. So it looks okay:

  f[[3]]
  function(t) t^2 + 3L

but it doesn't work:

  f[[3]](1)
  Error: attempt to apply non-function

So I tried to eval it:

  eval(f[[3]])
  function(t) t^2 + increment

Now that's a puzzle! Why did increment come back? But it works:

  eval(f[[3]])(1)
  [1] 4

(I do actually know the answer to the puzzle, but it took me a while to figure out. Spoiler below.) So here's what I'd recommend as an answer to the original question:

  save <- options(keep.source = FALSE)
  f <- list()
  for (i in 1:10)
    f[[i]] <- eval(substitute(function(t) t^2 + increment, list(increment = i)))
  options(save)

Then things look just right:

  f[[3]]
  function (t)
  t^2 + 3L
  f[[3]](4)
  [1] 19
  f[[3]](1)
  [1] 4

Of course, I didn't quite meet my objective of coming up with code that looks simpler than your lapply() suggestion. But at least we end up with functions with simple environments that print properly.

Duncan Murdoch

-----Original Message----- From: Eduardo de Oliveira Horta Sent: Monday, November 15, 2010 12:50 PM Subject: [R] Defining functions inside loops [original question quoted above]
[R] How to plot effect of x1 while controlling for x2
Hello R-helpers, Please see a self-contained example below, in which I attempt to plot the effect of x1 on y while controlling for x2. Is there a function that does the same thing without having to specify that x2 should be held at its mean value? It works fine for this simple example, but it might be cumbersome if the model were more complex (e.g., lots of x variables, and/or interactions). Many thanks, Mark

  # make some random data
  x1 <- rnorm(100)
  x2 <- rnorm(100, 2, 1)
  y <- 0.75*x1 + 0.35*x2

  # fit a model
  model1 <- lm(y ~ x1 + x2)

  # predict the effect of x1 on y, while controlling for x2
  xv1 <- seq(min(x1), max(x1), 0.1)
  yhat_x1 <- predict(model1, list(x1 = xv1, x2 = rep(mean(x2), length(xv1))),
                     type = "response")

  # plot the predicted values
  plot(y ~ x1, xlim = c(min(x1), max(x1)), ylim = c(min(y), max(y)))
  lines(xv1, yhat_x1)
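Two possibilities worth trying, sketched under the assumption that model1 has been fitted as above: base R's termplot(), which plots each term's contribution while the other terms are held fixed automatically, and (if installed) the effects package, which computes adjusted predictions for a focal predictor:

```r
## termplot() ships with base R (package stats); one partial plot per term
termplot(model1, terms = "x1", partial.resid = TRUE)

## The effects package does the averaging/holding-constant for you
## (assumes the package is installed)
# library(effects)
# plot(effect("x1", model1))
```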
[R] Executing Command on Multiple R Objects
Hello Everyone - I want to print a number of results from lme function objects out to a txt file. How could I do this more efficiently than what you see here?

  out2 <- capture.output(summary(mod2a))
  out3 <- capture.output(summary(mod3))
  out4 <- capture.output(summary(mod5))
  out5 <- capture.output(summary(mod6))
  out6 <- capture.output(summary(mod7))
  cat(out2, file = "out.txt", sep = "\n", append = TRUE)
  cat(out3, file = "out.txt", sep = "\n", append = TRUE)
  cat(out4, file = "out.txt", sep = "\n", append = TRUE)
  cat(out5, file = "out.txt", sep = "\n", append = TRUE)
  cat(out6, file = "out.txt", sep = "\n", append = TRUE)
  cat(third, file = "out.txt", sep = "\n", append = TRUE)

Here's an example of what I tried, but it didn't work:

  for (i in ls(pat = "mod")) {
    out <- capture.output(summary[[i]])
    cat(out, file = "results_paired.txt", sep = "\n", append = TRUE)
  }

-- View this message in context: http://r.789695.n4.nabble.com/Executing-Command-on-Multiple-R-Objects-tp3043871p3043871.html Sent from the R help mailing list archive at Nabble.com.
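The attempted loop has two fixable problems: summary[[i]] subsets the summary function instead of calling it, and i is a character name, so the object must first be fetched with get(). A sketch, assuming the models live in the global environment with names starting with "mod":

```r
## Append each model's summary to the file, labelled with the object's name
for (nm in ls(pattern = "^mod")) {
  out <- capture.output(summary(get(nm)))
  cat("====", nm, "====", file = "out.txt", sep = "\n", append = TRUE)
  cat(out, file = "out.txt", sep = "\n", append = TRUE)
}
```

Keeping the models in a named list instead of as loose workspace objects would make this simpler still (lapply over the list, no ls()/get() needed).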
Re: [R] indexing lists
On Nov 15, 2010, at 4:24 PM, Chris Carleton wrote: [original question quoted above, ending:] I've tried which(a == c(1,2)) ...

It's a bit more involved than that (since which() expects a vector of logicals and mapply() expects a set of list arguments). I needed to send mapply() a MoreArgs list that would remain constant from test to test when using identical:

  which(mapply(identical, a, MoreArgs = list(c(1, 2))))
  [1] 1

(Admittedly very similar to Iverson's solution, but it is more readable to my eyes. On the other hand, my method may stumble on more complex objects with attributes. You should read the identical help page in any case.)

... and an error about coercing to double is returned. I can find the index of elements of a particular item by which(a[[1]]==c(1,2)) or which(a[[1]]==1) ...

But neither of those tests the overall equality of a[[1]]'s contents with c(1,2), which appears to be your goal. Iverson's all() and my identical() accomplish that goal.

-- David Winsemius, MD West Hartford, CT
[R] hclust, does order of data matter?
Hello, I am using the hclust function to cluster some data. I have two separate files with the same data; the only difference is the order of the data in the file. For some reason, when I run the two files through the hclust function, I get two completely different results. Does anyone know why this is happening? Does the order of the data matter? Thanks, RC

-- View this message in context: http://r.789695.n4.nabble.com/hclust-does-order-of-data-matter-tp3043896p3043896.html Sent from the R help mailing list archive at Nabble.com.
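When distances are tied, hclust() can merge in a different order for differently ordered input, so dendrograms may look different even when the underlying clusterings agree. A quick way to check whether two runs really disagree, beyond tie-induced relabelling, is to compare cutree() memberships after undoing the permutation (a sketch with simulated data standing in for the two files):

```r
set.seed(1)
x <- matrix(rnorm(20), nrow = 10)   # stand-in for the real data
perm <- sample(nrow(x))             # second "file": same rows, shuffled

h1 <- cutree(hclust(dist(x)),         k = 3)
h2 <- cutree(hclust(dist(x[perm, ])), k = 3)

## If the clusterings agree up to label names, this cross-table has
## exactly one non-zero cell per row and per column
table(h1, h2[order(perm)])
```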