Re: [R] SQL Server "float" is not handled by RODBC -- Is there a workaround?
ific examples for some of the DBI methods, if you need it. I suspect that what I have written above will cover most "query SQL", though not the "Insert/Update/Delete" stuff.

Regards,
Mark Dalphin

On 16/10/15 10:01, jim holtman wrote:
> Mark,
>
> Thanks for the suggestion. I will have to look into that option. I assume that if I am running on a 64-bit system, I also have to use the 64-bit version of Java. We have had some problems in the past because the company standard is a 32-bit version of Java, and we had to also load in the 64-bit version to work with the XLConnect package.
>
> I was reading the RJDBC package documentation and they seem to list a lot of methods (e.g., 'dbReadTable'), but don't say what the parameters are or what they return. Where do you find this information? Also, I notice that I probably have to get the JDBC driver for SQL Server from Microsoft and install that; is that correct?
>
> I hate to start mixing approaches, since we have a large number of scripts that currently use the RODBC package, but I will try to see if the approach you proposed does help overcome this problem.
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
Re: [R] SQL Server "float" is not handled by RODBC -- Is there a workaround?
Hi Jim,

No answers over the course of 24 hours, so I'll give it a shot.

First, I always work under Linux, so my answers may well be worthless for your Windows scenario.

Second, I don't know if my workaround works, as I don't actually have a SQL Server DB using float.

Now the workaround:

I have had many problems in the past using ODBC to connect to databases. Nothing I could nail down to a fault in that system, but just no end of problems. Some of that, of course, is due to me generally working under Linux.

My general workaround that has been clean is to use JDBC instead. There have been hassles at times setting up rJava, but recent versions of it have installed very easily. Once rJava is in place (and under Windows, you'll have fun setting up Java cleanly), the remaining steps are installing a JDBC driver jar (I use jTDS from SourceForge for SQL Server) and finally RJDBC. The generic nature of the JDBC interface is a joy to work with: it interacts with most database types very well and in a uniform manner.

So, there is a lot of work in getting JDBC up and going just to see if an alternative path into your DB gets you your data in a better format. Now you see why I waited 24 hours to say anything at all ...
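The JDBC route described above can be sketched roughly as follows. This is a minimal, untested sketch: it assumes the rJava and RJDBC packages are installed and that the jTDS jar has been downloaded (the jar path, host, database and credentials are placeholders), and it uses jTDS's `jdbc:jtds:sqlserver://host:port/database` URL form with SQL Server's default port 1433. The helper names `jtds_url()` and `connect_sqlserver()` are hypothetical.

```r
# Hypothetical helper: build a jTDS JDBC URL for SQL Server
# (form: jdbc:jtds:sqlserver://<host>:<port>/<database>; 1433 is SQL Server's default port)
jtds_url <- function(host, database, port = 1433L) {
  sprintf("jdbc:jtds:sqlserver://%s:%d/%s", host, as.integer(port), database)
}

# Hypothetical helper: open a DBI connection through RJDBC.
# Requires rJava, RJDBC and a Java runtime whose bitness (32/64-bit) matches R's.
connect_sqlserver <- function(host, database, user, password, jar_path) {
  drv <- RJDBC::JDBC(driverClass = "net.sourceforge.jtds.jdbc.Driver",
                     classPath = jar_path)
  DBI::dbConnect(drv, jtds_url(host, database), user, password)
}

# Example use (placeholders, not run here):
# conn <- connect_sqlserver("dbhost", "mydb", "jim", "secret", "/path/to/jtds.jar")
# df <- DBI::dbGetQuery(conn, "select * from mytable")
# str(df)   # check whether the "float" columns now arrive as 'num'
# DBI::dbDisconnect(conn)
```

For INSERT/UPDATE/DELETE statements, RJDBC provides `dbSendUpdate()`. Whether SQL Server "float" columns actually come back as numeric through this route is exactly what would need to be checked against the real database.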
Also, it might be worthwhile posting on the DB-specific mailing list: https://stat.ethz.ch/mailman/listinfo/r-sig-db

Hope this helps,
Mark Dalphin

On 15/10/15 07:23, jim holtman wrote:
> Here is the system I am using:
>
>> sessionInfo()
> R version 3.2.2 (2015-08-14)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] lubridate_1.3.3 RODBC_1.3-12
>
> loaded via a namespace (and not attached):
> [1] magrittr_1.5  plyr_1.8.3    tools_3.2.2   memoise_0.2.1 Rcpp_0.12.1   stringi_0.5-5 digest_0.6.8
> [8] stringr_1.0.0
>
> I have data on a SQL Server that I am connecting to where some of the fields are defined as "float", so the data is stored in the database as IEEE 754 values. When I read this in using RODBC, the data comes across the interface in the floating-point format; I used Wireshark to examine the packets that were being sent. Some of the data is also defined as "int" and comes across in binary.
>
> When the data is read in with
>
>   df <- sqlQuery(db, "select * from mydb", as.is = TRUE)
>
> the resulting dataframe has the floating-point values as 'chr' and the integer fields as 'int'; I would have expected the floating-point fields to be 'num'. In the "ODBC Connectivity" vignette by Ripley there was the comment that "double" data values come back as type 8, but on some systems they may be type 6; well, on SQL Server, "float" is type 6.
>
> So what appears to happen is that this data is not recognized as a floating-point value and is therefore converted to character. When the data is made available to the R script, I then have to convert it back to floating point. If I use "stringsAsFactors = FALSE" on the query, this conversion back to floating point will be done within the RODBC package. The problem, when I have dataframes with several million rows and multiple numeric columns, is that the conversion to and from character is adding time to the processing.
>
> So I was wondering: is there a workaround to this problem? Is it possible to add the capability to RODBC, when processing SQL Server data, to avoid this conversion? Or is there some other way around this problem?
>
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
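While an alternative driver is being tried, the character columns that `sqlQuery(..., as.is = TRUE)` returns can at least be converted back in one vectorised pass per column. This is a sketch on a small made-up data frame; the column names and the choice of which columns are "float" are assumptions. It does not remove the character round trip Jim describes, it only keeps the conversion step short.

```r
# Made-up stand-in for sqlQuery(db, "select * from mydb", as.is = TRUE):
# the SQL Server "float" columns have arrived as character, the "int" column as integer
df <- data.frame(id       = 1:3,
                 temp     = c("1.5", "2.25", "-0.125"),
                 pressure = c("101.3", "99.8", "100.1"),
                 stringsAsFactors = FALSE)

# Hypothetical list of the columns declared as "float" in the database
float_cols <- c("temp", "pressure")

# Convert every float column back to numeric; each as.numeric() call is vectorised
df[float_cols] <- lapply(df[float_cols], as.numeric)

str(df)
```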
Re: [R] [SOLVED] Problem building R-2.15.3 from source
I have found a solution to the repeated segfaults below. If I set the environment variables

  setenv CFLAGS -O2
  setenv FFLAGS -O2

rather than using the default -O3, then R builds and "checks" successfully.

A few more details about the Debian system on which I have been building:

  gcc (Debian 4.7.2-5) 4.7.2

My "configure" command is:

  ./configure \
      --prefix=$my_R_path \
      --with-readline \
      --without-x \
      --enable-R-shlib \
      --enable-BLAS-shlib \
      --with-system-zlib \
      --with-system-bzlib \
      --with-system-pcre

So, I'm good for the time being, and I hope this helps others who have trouble building from source.

Cheers,
Mark

Mark Dalphin wrote:
Hi,

I have for many years built R from source for Linux. I have just run into my first problem with this in ... I don't know how long.

  uname -a
  Linux douglas 3.2.0-4-686-pae #1 SMP Debian 3.2.46-1 i686 GNU/Linux
  cat /etc/issue
  Debian GNU/Linux 7 \n \l

The version of R is 2.15.3. I know it is old, but we are in a regulated environment and changes to R versions are painful. I have built R 2.15.3 elsewhere and have it running on multiple Linux boxes around here, both 32-bit and 64-bit; those are Ubuntu distributions, however, not Debian. This build is on a virtual machine under OpenBox. The host is a 64-bit Debian; the guest is a 32-bit Debian installation.

The symptoms are strange (to me). I get segfaults during the byte-compiling phase of libraries. If I re-run 'make', the make proceeds as if it finished the previous segfaulted step, and then segfaults on the next byte-compile.

The "invalid permissions" cause in the error makes me wonder about file permissions, but the whole 'make' is under my HOME. Furthermore, I have scanned the unpacked tar.gz package for something I don't "own" and it isn't there. I also think segfaults are usually in memory, though I don't know what "permission" I have there (don't I own the RAM I request?).
I have attached a section of the 'make' output below, followed by the next "make" output:

--
make[4]: Leaving directory `/home/mdalphin/src/R-2.15.3/src/library/splines'
make[4]: Entering directory `/home/mdalphin/src/R-2.15.3/src/library/splines'
byte-compiling package 'splines'

 *** caught segfault ***
address 0x403ac3dc, cause 'invalid permissions'

Traceback:
 1: fun(libname, pkgname)
 2: doTryCatch(return(expr), name, parentenv, handler)
 3: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 4: tryCatchList(expr, classes, parentenv, handlers)
 5: tryCatch(fun(libname, pkgname), error = identity)
 6: runHook(".onLoad", env, package.lib, package)
 7: loadNamespace(name)
 8: doTryCatch(return(expr), name, parentenv, handler)
 9: tryCatchOne(expr, names, parentenv, handlers[[1L]])
10: tryCatchList(expr, classes, parentenv, handlers)
11: tryCatch(loadNamespace(name), error = function(e) stop(e))
12: getNamespace(ns)
13: asNamespace(pkg)
14: get(name, envir = asNamespace(pkg), inherits = FALSE)
15: compiler:::tryCmpfun
16: .Call("R_lazyLoadDBinsertValue", x[[1L]], file, ascii, compress, hook, PACKAGE = "base")
17: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress, envhook)
18: makeLazyLoadDB(ns, dbbase, compress = compress)
19: code2LazyLoadDB(package, lib.loc = lib.loc, keep.source = keep.source, compress = compress)
20: tools:::makeLazyLoading("splines")
aborting ...
/bin/bash: line 8: 18709 Done                    echo "tools:::makeLazyLoading(\"splines\")"
     18710 Segmentation fault      | R_COMPILE_PKGS=1 R_COMPILER_SUPPRESS_ALL=1 R_DEFAULT_PACKAGES=NULL LC_ALL=C ../../../bin/R --vanilla --slave > /dev/null
make[4]: *** [../../../library/splines/R/splines.rdb] Error 139
make[4]: Leaving directory `/home/mdalphin/src/R-2.15.3/src/library/splines'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/home/mdalphin/src/R-2.15.3/src/library/splines'
make[2]: *** [R] Error 1
make[2]: Leaving directory `/home/mdalphin/src/R-2.15.3/src/library'
make[1]: *** [R] Error 1
make[1]: Leaving directory `/home/mdalphin/src/R-2.15.3/src'
make: *** [R] Error 1
---
make[4]: Entering directory `/home/mdalphin/src/R-2.15.3/src/library/splines'
make[4]: Nothing to be done for `mklazycomp'.
make[4]: Leaving directory `/home/mdalphin/src/R-2.15.3/src/library/splines'
make[3]: Leaving directory `/home/mdalphin/src/R-2.15.3/src/library/splines'
make[3]: Entering directory `/home/mdalphin/src/R-2.15.3/src/library/stats4'
building package 'stats4'
mkdir -p -- ../../../library/stats4
make[4]: Entering directory `/hom
[R] Problem building R-2.15.3 from source
ce(name)
 8: doTryCatch(return(expr), name, parentenv, handler)
 9: tryCatchOne(expr, names, parentenv, handlers[[1L]])
10: tryCatchList(expr, classes, parentenv, handlers)
11: tryCatch(loadNamespace(name), error = function(e) stop(e))
12: getNamespace(ns)
13: asNamespace(pkg)
14: get(name, envir = asNamespace(pkg), inherits = FALSE)
15: compiler:::tryCmpfun
16: .Call("R_lazyLoadDBinsertValue", x[[1L]], file, ascii, compress, hook, PACKAGE = "base")
17: lazyLoadDBinsertVariable(vars[i], from, datafile, ascii, compress, envhook)
18: makeLazyLoadDB(ns, dbbase, compress = compress)
19: code2LazyLoadDB(package, lib.loc = lib.loc, keep.source = keep.source, compress = compress)
20: tools:::makeLazyLoading("stats4")
aborting ...
/bin/bash: line 8: 19554 Done                    echo "tools:::makeLazyLoading(\"stats4\")"
     19555 Segmentation fault      | R_COMPILE_PKGS=1 R_COMPILER_SUPPRESS_ALL=1 R_DEFAULT_PACKAGES="methods,graphics,stats" LC_ALL=C ../../../bin/R --vanilla --slave > /dev/null
make[4]: *** [../../../library/stats4/R/stats4.rdb] Error 139
make[4]: Leaving directory `/home/mdalphin/src/R-2.15.3/src/library/stats4'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/home/mdalphin/src/R-2.15.3/src/library/stats4'
make[2]: *** [R] Error 1
make[2]: Leaving directory `/home/mdalphin/src/R-2.15.3/src/library'
make[1]: *** [R] Error 1
make[1]: Leaving directory `/home/mdalphin/src/R-2.15.3/src'
make: *** [R] Error 1

--
Mark Dalphin Ph.D.
Director of Bioinformatics
mark.dalp...@pacificedge.co.nz
Ph: +64-3-479-5805
Cell: +64-21-156-7625
Skype: mark.dalphin.pel
87 St David St, PO Box 56, Dunedin, New Zealand 9016
www.pacificedge.co.nz
Re: [R] daylight
I would start with the data. My source for this is the US Navy Sunrise/Sunset tables:

http://aa.usno.navy.mil/data/docs/RS_OneYear.php

The page it produces is pure text; I've previously extracted the values with a simple Perl script, but would do it today using R (in general, most of the parsing I used to perform in Perl can readily be performed in R). Once I had the data available as a data frame, I'd convert the columns to POSIXct format and then use difftime().

Cheers,
Mark

bambus wrote:
hi there,

does anyone know how to calculate the amount of daylight on every day of the year in R? I mean the time between sunrise and sunset.

thanks
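The POSIXct/difftime() step can be sketched as follows, assuming the table has already been parsed into a data frame with one row per day and sunrise/sunset given as "hhmm" strings. The two rows below are invented for illustration, not taken from the USNO tables. UTC is used only so that no daylight-saving shift can distort the difference.

```r
# Invented rows standing in for a parsed sunrise/sunset table
sun <- data.frame(date = c("2012-01-01", "2012-06-21"),
                  rise = c("0718", "0425"),
                  set  = c("1652", "1930"),
                  stringsAsFactors = FALSE)

# Combine date and "hhmm" time into POSIXct values
sun$rise_t <- as.POSIXct(paste(sun$date, sun$rise), format = "%Y-%m-%d %H%M", tz = "UTC")
sun$set_t  <- as.POSIXct(paste(sun$date, sun$set),  format = "%Y-%m-%d %H%M", tz = "UTC")

# Amount of daylight per day, in hours
sun$daylight <- difftime(sun$set_t, sun$rise_t, units = "hours")
sun[, c("date", "daylight")]
```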
Re: [R] What makes R different from other programming languages?
I've read several replies to this question already, and they seem to have missed the one point that most irritated the Java programmers to whom I tried to teach R. They HATED the "object-oriented" material, both S4 and especially S3, as it did not match the style of OO programming that had been pounded into them. Some of them hated S3 and S4 methods so much that they refused to learn them, on the grounds that they "weren't OO". Now, it could easily have been my approach, as I was not well equipped at the time to "compare and contrast"; nevertheless, I would approach this aspect carefully, as the two approaches are so different.

The other aspect I take the most time to describe to any programmer coming from more traditional languages is working with vectors. To use R effectively, you must move data in large chunks; the standard paradigm of looping over the data is the fastest way to write a slow program. I find that programmers grasp the concept quickly and like it, but it takes a good long while (more than a month of use) for them to make the switch to working with vectors in practice.

Cheers,
Mark

johannes rara wrote:
> My intention is to give a presentation about R programming language
> for software developers. I would like to ask, what are the things that
> make R different from other programming languages? What are the
> specific cases where Java/C#/Python developer might say "Wow, that was
> neat!"? What are the things that are easy in R, but very difficult in
> other programming languages (like Java)?
>
> Thanks,
> -J
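The contrast with Java-style OO can be made concrete with a small S3 example: methods are ordinary functions attached to a generic function and chosen by the class attribute of the first argument, rather than being defined inside a class. The names `describe` and `dog` are invented for illustration.

```r
# A generic function: UseMethod() dispatches on the class of the first argument
describe <- function(x, ...) UseMethod("describe")

# Methods belong to the generic, not to a class definition
describe.default <- function(x, ...) "some object"
describe.dog <- function(x, ...) paste(x$name, "says woof")

# A "class" is just an attribute attached to an ordinary list
rex <- structure(list(name = "Rex"), class = "dog")

describe(rex)  # dispatches to describe.dog
describe(42)   # no describe.numeric, so describe.default is used
```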
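The vector point can be shown the same way: the loop below is the style most Java/C programmers reach for first, and the one-line vectorised versions give the same result. The data are arbitrary.

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)

# Loop style: grow the result one element at a time (slow on large data)
sq_loop <- numeric(0)
for (v in x) sq_loop <- c(sq_loop, v^2)

# Vectorised style: one operation on the whole vector
sq_vec <- x^2

# Conditional logic is vectorised too
labels <- ifelse(x > 3, "big", "small")
```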
Re: [R] Not able to write to PostgreSQL database using "dbWriteTable"
I still don't have any idea about your schema (e.g. CREATE TABLE (blah, blah, ...);), but I guess you don't have the right database type for "id" when you are storing a date. In PostgreSQL:

  CREATE TABLE myTable (
      id INTEGER PRIMARY KEY,
      aDate TIMESTAMP
  );

In R:

  dbGetQuery(conn, paste("INSERT INTO myTable (id, aDate)\n",
                         "VALUES (1, '2012-06-14 11:18:36');\n", sep=''))

All untested. If you want/need to use the sprintf() form, then just wrap the time variable in single quotes:

  sql <- sprintf("INSERT INTO myTable (id, aDate)\nVALUES (%d, '%s');\n", 1, '2011-06-14 11:18:36')
  dbGetQuery(conn, sql)

Mark

Prakash Thomas wrote:
Dear R Users,

Thank you, Mark. The following code suggested by you worked for me.

  dbGetQuery(connAE1, sprintf("INSERT INTO test1 (id) VALUES ( %d );", i))

But I have an issue in passing date-and-time data as a variable. If I hard-code the value like below, it works:

  dbGetQuery(connAE1, sprintf("INSERT INTO test1 (id) VALUES ( %s );", '\'2012-06-10 16:36:00+05:30\''))

Can somebody please help me with the code where I need to read from a variable (i) which holds a date and time (2012-06-10 16:36:00+05:30)? R is throwing an error at the space, as shown in the output below.

Console code & output:
> if(dbExistsTable(connAE1, "test1")){
+ dbGetQuery(connAE1, sprintf("INSERT INTO test1 (id) VALUES ( %s );", i))
+ }
Error in postgresqlExecStatement(conn, statement, ...) :
  RS-DBI driver: (could not Retrieve the result : ERROR:  syntax error at or near "16"
LINE 1: INSERT INTO test1 (id) VALUES ( 2012-06-10 16:36:00+05:30 );
                                                   ^
)
NULL

Thanks & Regards,
Thomas
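Wrapping the value in single quotes works as long as the value itself never contains a quote. A slightly safer sketch doubles any embedded single quotes (standard SQL string-literal escaping) and formats a POSIXct value explicitly. The helper name `sql_string()` and the table `myTable` are hypothetical, and the statement is only built here, not executed.

```r
# Hypothetical helper: turn an R character value into a SQL string literal
# by doubling embedded single quotes and wrapping the result in single quotes
sql_string <- function(x) {
  paste0("'", gsub("'", "''", x, fixed = TRUE), "'")
}

# A timestamp held in a variable, formatted the way PostgreSQL accepts it
when <- as.POSIXct("2012-06-14 11:18:36", tz = "UTC")

sql <- sprintf("INSERT INTO myTable (id, aDate) VALUES (%d, %s);",
               1L, sql_string(format(when, "%Y-%m-%d %H:%M:%S")))
sql
# With an open connection this would then be run as: dbGetQuery(conn, sql)
```

For a character variable such as Thomas's `i` holding "2012-06-10 16:36:00+05:30", the same helper gives `sprintf("INSERT INTO test1 (id) VALUES ( %s );", sql_string(i))`; the target column still needs a timestamp type, as Mark points out. More recent versions of DBI also offer `dbQuoteString()` for this job.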
Re: [R] Not able to write to PostgreSQL database using "dbWriteTable"
I just tested your code and I _think_ you have a misconception about dbWriteTable(). Your code has some oddities, so I am only guessing; for example, what is "zz" and why is it in this snippet? In the absence of information on the database TABLE, it is even harder to guess what you are doing, but I guess you are trying to use dbWriteTable to add a small amount of data to an existing table, since you previously select from a similarly named table, "test1".

The dbWriteTable function is writing to a table called "test1.id", not to "test1, column id". If you check your PostgreSQL schema, you will see that you have created a new table called "test1.id" (to remove it, you will have to quote the name, as the DOT is an operator: DROP TABLE "test1.id";).

I think you are trying to add a new row to the existing database table. Try using (untested):

  dbGetQuery(connAE1, sprintf("INSERT INTO test1 (id) VALUES ( %d );", i))

and you will find things go better, assuming I grasped the problem you are having correctly.

Regards,
Mark Dalphin

Prakash Thomas wrote:
Dear R Users,

Please help me to debug this issue. I am trying to write some data (i = 6) to a PostgreSQL database, but it is not writing. Is there any issue in the way I use "dbWriteTable"?

++ Source Code
library("DBI")
library("RPostgreSQL")
drv1 <- dbDriver("PostgreSQL")
i=6
connAE1 <- dbConnect(drv1,host = "xx.xxx.xxx.xxx", port = "6443", dbname="DB",user = "x",password = "xxx")
as.data.frame(zz[1])
dbGetQuery(connAE1,'SELECT id FROM \"test1\"')
if(dbExistsTable(connAE1, "test1")){
dbWriteTable(con=connAE1,name='test1.id',value=as.data.frame(i),row.names=T ,overwrite=F ,append=T)
}
dbDisconnect(connAE1)
dbUnloadDriver(drv1)
++

Following is the paste of the console log for your reference:

++ console log
> dbGetQuery(connAE1,'SELECT id FROM \"test1\"')
  id
1  1
2  2
> if(dbExistsTable(connAE1, "test1")){
+ dbWriteTable(con=connAE1,name='test1.id',value=as.data.frame(i),row.names=T ,overwrite=F ,append=T)
+ [TRUNCATED]
  id
1  1
2  2
+++

Thanks & Regards,
Thomas
Re: [R] Way OT: Anyone know where to get data on relationship between education and salary
Hi Paul,

I don't have a good answer to your request, but you might find something by looking around the US Department of Labor's site: "Overview of BLS Wage Data by Area and Occupation" (BLS = Bureau of Labor Statistics).

http://www.bls.gov/bls/blswage.htm

A quick glance doesn't show me "education" associated with these data, but I didn't look hard. Most of the rest of the information you want appears to be present.

Regards,
Mark Dalphin

Paul wrote:
I'm sorry for the way OT post, but here goes.

I'm an informatics specialist and R user. My wife is a secondary school maths teacher. My wife recently tried to explain to her class the link between education and potential salary, and I would love to be able to show this graphically; however, I cannot find any freely available data for this. Does anyone know of a suitable dataset, or where I might find one? Ideally, salary, maximum education level, age, sex, industry and some form of geographic location would be amazing.

Alternatively, are you aware of any public organisations that would have this information and might divulge it under the Freedom of Information Act?

Thanks in advance,
Paul.
Re: [R] Problems drawing a colored 'rug' in the Lattice 'densityplot'
Thanks Phil,

That is exactly what I was looking for.

Regards,
Mark

Phil Spector wrote:
Mark -
   If I understand what you want, it can be done with a custom panel function:

mypanel = function(x, subscripts, groups, ...){
    panel.densityplot(x, plot.points=FALSE, groups=groups, subscripts=subscripts, ...)
    panel.rug(x, col=trellis.par.get('superpose.line')$col[groups[subscripts]])
}

Then I think you'll get the result you want if you use

densityplot(~Value|Type, group=Category, data=d, panel=mypanel)

                                        - Phil Spector
                                          Statistical Computing Facility
                                          Department of Statistics
                                          UC Berkeley
                                          spec...@stat.berkeley.edu
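The key step in Phil's panel function is `col[groups[subscripts]]`: `subscripts` says which rows of the full data fall in the current panel, and indexing a vector by a factor uses the factor's integer codes, so each rug tick picks up its group's colour. The sketch below reproduces that step in base R with made-up stand-ins for what lattice passes in; the `rep_len()` guard for having more groups than colours is an addition, not part of Phil's code.

```r
# Made-up stand-ins for what lattice passes to a panel function
groups     <- factor(c("V", "W", "X", "V", "W", "X"))  # grouping for the whole data set
subscripts <- c(2L, 4L, 6L)                             # rows of the whole data in this panel
pal        <- c("blue", "red", "green")                 # stand-in for superpose.line$col

# Recycle the palette so every group level has a colour
pal <- rep_len(pal, nlevels(groups))

# Factor indexing uses the integer codes (V = 1, W = 2, X = 3), not the labels
rug_cols <- pal[groups[subscripts]]
rug_cols
```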
[R] Problems drawing a colored 'rug' in the Lattice 'densityplot'
Hi All,

I'm trying to add a 'rug' representation of my data to a plot created with densityplot(). While I can do this in the simple case, I can't do it properly when I include the "groups" argument. I have an example below.

I am running a reasonably new version of R.

print(sessionInfo())
R version 2.12.0 Patched (2010-11-07 r53537)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_NZ.UTF-8       LC_NUMERIC=C               LC_TIME=en_NZ.UTF-8        LC_COLLATE=en_NZ.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_NZ.UTF-8    LC_PAPER=en_NZ.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_NZ.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] lattice_0.19-13

R code to show the problem:

## Setup - load package, set random seed & create fake dataset
library(lattice)
set.seed(1234)
d <- data.frame(Type=rep(LETTERS[1:4], times=250),
                Category=rep(LETTERS[22:26], times=200),
                Value=c(rnorm(500), rnorm(300, 0.5), rnorm(200, 1)))

## Basic "densityplot" using 'points' to show the data
densityplot(~Value|Type, data=d)

## And I can plot a 'rug' for the simple density plot
densityplot(~Value|Type, data=d, plot.points='rug')

## Now add a "groups" selector to show sub-grouping of data by 'Category'
## Note: the data points are in color
densityplot(~Value|Type, group=Category, data=d)

## Finally, with the groups, and with a rug.
## Note: no color for the rug
densityplot(~Value|Type, group=Category, data=d, plot.points='rug')

So, I can draw a rug (which is an improvement over version 2.9.1 of R, when I got no rug); however, the color associated with the 'group' doesn't seem to propagate through to the rug. Is there something I am doing wrong here, or is this a bug? Anyone have suggestions to work around this?

Regards,
Mark
Re: [R] Source awareness?
Well, Bill, at the risk of embarrassing myself: in my case, I have gone this sort of route out of plain wrong thinking. It took me far too many years to learn about R packages. The hurdle seemed too high (it wasn't really), so I used the kinds of tricks I had needed in other languages. Sourcing sets of scripts was one of those tricks. Today, in most cases, I find that when I travel this path of thinking, I need to rethink and set up a package to do the work.

That said, I have on occasion dropped projects because I couldn't figure out a clean way to do them within R. In those cases, it was simply easier to work in Perl, Python or even csh. Still, because R is so powerful, I find myself doing more and more of the basic parsing and processing within R, and this leads to models of processing closer to those used in those other languages. For example, making a "filter" (in the Unix sense) out of R turns out to be hard to write concisely ('littler' solves this, but I have had some stability issues with it). I've now standardised my filters on a generic Makefile, a generic csh script that is processed by sed, and the actual R code. It isn't pretty, but it now runs smoothly. However, it didn't fit the standard R model of data and code in one directory, so it became a bit uglier.

The specific question below could be used to set up built-in test functions, in a manner similar to what is done in Python. That is probably not the way to build unit tests in R, but it is similar to the way I did it for years in other languages.

Another source of trouble is when my data are stored in one location (I work at a company where the laboratory deposits large data sets in a predefined set of locations, and I don't have write access there) and my analysis scripts are in another location, which is part of a CVS source tree.
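Below is a sketch of one way to get the Python-style self-test behaviour (`if __name__ == "__main__":`) mentioned above, assuming the script is run with Rscript. In that case the top-level code runs with no enclosing call frames, whereas source() evaluates it inside its own frames, so sys.nframe() can tell the two cases apart. The sketch also assumes that Rscript records the script as a "--file=" entry in commandArgs(), which lets the script know where it lives. The helper names script_path and run_self_tests are hypothetical. Note that code typed at an interactive prompt also runs at frame 0.

```r
## Hypothetical helper: full path of the running script, or NA when it
## cannot be found (e.g. interactive use or Rscript -e).
## Assumes Rscript passes the script to R as a "--file=<path>" argument.
script_path <- function() {
  args <- commandArgs(trailingOnly = FALSE)
  file_arg <- grep("^--file=", args, value = TRUE)
  if (length(file_arg) == 0L) return(NA_character_)
  normalizePath(sub("^--file=", "", file_arg[1L]))
}

## Hypothetical built-in tests for the functions defined in this script
run_self_tests <- function() {
  stopifnot(isTRUE(all.equal(mean(c(1, 2, 3)), 2)))
  message("self-tests passed")
}

## R analogue of Python's `if __name__ == "__main__":`
## Top-level code run by Rscript sees no enclosing call frames, while
## code evaluated by source() runs inside source()'s frames.
if (sys.nframe() == 0L) {
  message("Running ", script_path())
  run_self_tests()
}
```

Saved as scriptA.R, `Rscript scriptA.R` would run the tests, while `source("scriptA.R")` from another script would only define the functions.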
If someone else checks out my scripts from CVS, the scripts need to run properly on the data in the fixed location, regardless of the directory they are run from, yet write their results to the script directory, not the data directory. We can all think of ways to make that work. One possible approach involves the script "knowing" where it is.

I'm not saying that any of this is "right". I'm merely speculating on why people think about "source awareness", since I have gone that route myself. In some cases, as I said, it was wrong. In other, less well defined cases (I can't recall the specifics now), it was the only way I could find to make some code work. Where I couldn't find a solution, that code doesn't exist. This was usually code meant to be deployed as a command-line tool for a user to run on a data set.

One particular use I would like for the path to the R script is "reproducible research". I currently log many aspects of any particular processing run with "sessionInfo()" and other tools. However, the only way I have to record the actual script name is via CVS and a string within R, something like "$Id$" or "$RCSfile$". I usually end up processing that through a 'gsub' to strip out the '$' characters. That way the log file, which is also stored in CVS, doesn't get updated further. It is easy to have multiple versions of a script for processing some data, and knowing the script name and directory path helps in logging what was done with the data.

Regards,
Mark

William Dunlap wrote:

Over the years I've seen lots of requests concerning how to conveniently call scripts from other scripts. The S (R & S+) language is oriented towards functions, not scripts (or macros), and many of the requests are for things easy to do in functions (or packages of functions) but not in scripts. Some would be easier if one used a package of scripts (built with the usual R package building tools).
I'd like to know from people who do this sort of thing what pushes them toward using sets of scripts instead of functions. I can think of several possible reasons but would like to hear from people who actually do this sort of thing. E.g., is the clarity and concreteness of a script the important thing? Is it difficult to make a package of functions? Are people used to another language where scripts or macros are the preferred way to go? Or are there other reasons?

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Ralf B
Sent: Wednesday, October 06, 2010 8:50 AM
To: r-help Mailing List
Subject: [R] Source awareness?

Here is the general (perhaps silly) question first: is it possible for a script to find out whether it was sourced by another script or run directly? Here is a small example with two scripts:

# script A
print("This is script A")

# script B
source("C:/scriptA.R")
print("This is script B")

I would like to modify script A in a way so that it only outputs 'This is script A' if it was called di
[R] Problem using panel.average in the lattice package
Hi,

I'm having a problem getting the panel.average function to work as I expect in a lattice plot. I want to draw lines between the averages of groups of y-values at specific x-values. I have created a dataset below that is similar to my real data. I also show an example that uses panel.loess in place of panel.average. It behaves much as I want panel.average to behave, except that it draws a loess curve rather than straight lines connecting the group means. Please see my code examples below.

Regards,
Mark Dalphin

==========

My system information:

library(lattice)
print(sessionInfo())
R version 2.9.1 (2009-06-26)
i686-pc-linux-gnu

locale:
LC_CTYPE=en_NZ.UTF-8;LC_NUMERIC=C;LC_TIME=en_NZ.UTF-8;LC_COLLATE=en_NZ.UTF-8;
LC_MONETARY=C;LC_MESSAGES=en_NZ.UTF-8;LC_PAPER=en_NZ.UTF-8;LC_NAME=C;
LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_NZ.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] lattice_0.17-25

loaded via a namespace (and not attached):
[1] grid_2.9.1  tools_2.9.1

##--
## This dataset is too complicated, but it does show the type of plot I want.
##
## Create a fake qPCR dataset: eight 96-well plates over 4 days (2 per day),
## 2 genes per plate (multiplexed), and 4 "Hi" positive controls and
## 4 "Lo" positive controls per plate.
## Create the experimental data; by rights it is all identical, except for
## experimental errors within days and between days.
## For this simulation, each gene will be given a base value.
## In qPCR the higher the "Ct" value, the lower the concentration.
library(lattice)  # Added for ease of cut-n-paste of this code

date  <- c('2009-09-07', '2009-09-08', '2009-09-10', '2009-09-14')
probe <- c('Gene.A1', 'Gene.A2', 'Gene.B1', 'Gene.B2')
## k = 1 is 'Hi' (Ct near base.hi); k = 2 is 'Lo' (base.hi + 8 = base.lo),
## so the lower concentration gets the higher Ct
conc  <- c('Hi', 'Lo')
base.lo <- c(Gene.A1=29, Gene.A2=25, Gene.B1=28, Gene.B2=31)
base.hi <- base.lo - 8
day.err <- c(Day.1=0, Day.2=1, Day.3=1.5, Day.4=1.0)

d <- data.frame()
for (i in seq_along(date)) {
  for (j in seq_along(probe)) {
    for (k in seq_along(conc)) {
      d <- rbind(d, data.frame(Date=rep(date[i], length.out=4),
                               Probe=rep(probe[j], length.out=4),
                               Conc=rep(conc[k], length.out=4),
                               Ct=rnorm(4, sd=0.5) + (k-1)*8 + base.hi[j] + day.err[i]))
    }
  }
}
d$Date <- as.POSIXct(d$Date)

##--
## Example 1
## Print with LOESS line showing the 'means' for the groups.
## This is close, but I don't want a loess line; I want straight lines
## between mean values.
print(xyplot(Ct ~ Date|Probe, group=Conc, data=d,
             panel="panel.superpose",
             panel.groups=function(x, y, ...) {
               panel.loess(x, y, ...)
               panel.xyplot(x, y, ...)
             },
             auto.key=TRUE))

##--
## Example 2
## Parallel construction to the loess example, above.
## Note the loss of the lines. The 'horizontal' default
## is different between 'panel.loess' and 'panel.average'.
print(xyplot(Ct ~ Date|Probe, group=Conc, data=d,
             panel="panel.superpose",
             panel.groups=function(x, y, ...) {
               panel.average(x, y, horizontal=FALSE, ...)
               panel.xyplot(x, y, ...)
             },
             auto.key=TRUE))

##--
## Example 3
## Don't pass along the '...' to panel.average. Now I
## get lines, but their colours don't match the points.
print(xyplot(Ct ~ Date|Probe, group=Conc, data=d,
             panel="panel.superpose",
             panel.groups=function(x, y, ...) {
               panel.average(x, y, horizontal=FALSE)
               panel.xyplot(x, y, ...)
             },
             auto.key=TRUE))

##** Main question: I want to create a plot that looks like Example 3, but with the coloured lines of Example 1. Any suggestions? I've searched with RSiteSearch() for both "panel.average" and "panel.linejoin" but found nothing addressing this.
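Below is a sketch of one possible fix, based on a reading of the lattice sources rather than a confirmed diagnosis. panel.superpose appears to forward xyplot's default type = "p" to panel.groups through '...'. Example 2 then passes that on to panel.average, which draws the group means as points instead of joining them. Example 3 avoids this, but it also drops the per-group col.line, so the lines lose their colours. The sketch names type as a formal argument of the panel.groups function, which pulls it out of '...' while the colours still come through. The small data frame is a hypothetical stand-in for d from the post, kept short so that the block runs on its own.

```r
library(lattice)

## Hypothetical stand-in for the qPCR data 'd' from the post:
## 2 probes x 2 concentrations x 4 dates, 4 replicate wells each.
set.seed(1)
d <- expand.grid(Rep = 1:4, Conc = c("Hi", "Lo"),
                 Probe = c("Gene.A1", "Gene.A2"),
                 Date = c("2009-09-07", "2009-09-08",
                          "2009-09-10", "2009-09-14"),
                 stringsAsFactors = FALSE)
d$Date <- as.POSIXct(d$Date)
## Lower concentration gives a higher Ct
d$Ct <- ifelse(d$Conc == "Hi", 21, 29) + rnorm(nrow(d), sd = 0.5)

p <- xyplot(Ct ~ Date | Probe, groups = Conc, data = d,
            panel = panel.superpose,
            ## Naming 'type' here keeps panel.superpose's type = "p"
            ## out of '...', so it is not handed on to panel.average.
            panel.groups = function(x, y, ..., type) {
              ## print(names(list(...))) here would list what else is
              ## forwarded; col.line, lty and lwd for this group remain,
              ## so the mean line takes the group's colour.
              panel.average(x, y, horizontal = FALSE, type = "l", ...)
              panel.xyplot(x, y, ...)
            },
            auto.key = TRUE)
print(p)
```

Calling names(list(...)) inside a panel function is a general way to see which arguments are arriving through '...'.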
Side question: I also read the source code of panel.average, panel.loess and panel.superpose, which leads to a second question: how do I determine which parameters are being passed within '...'? I tried recreating my panel.groups function above as an