Re: [R] sqldf returns Error: database or disk is full
The simple interpretation is that you have run out of space and it takes sqldf 30 minutes to work through the error process and then return the error. A reason for sql is the more compact representation of a sparse matrix. Changing it into a data frame may result in a data structure with many empty cells that consumes a great deal more memory. -Original Message- From: R-help On Behalf Of Ian Worthington via R-help Sent: Monday, June 27, 2022 5:06 PM Cc: R-help Subject: Re: [R] sqldf returns Error: database or disk is full [External Email] Well, iiuc the df is copied to the db early on in the process. This is constant with the temporarily file growing rapidly in size at the start of the process and then not challenging for 30 minutes until the error message is produced. I have no idea what sqldf is doing under the hood. Sent from Yahoo Mail on Android On Mon, 27 Jun 2022 at 23:00, Ebert,Timothy Aaron wrote: I would try answering these questions: 1) What is the expected size? 2) Is the current size smaller or larger? 3) Every program running on your computer takes up memory. Given all the other programs currently running, do you have space? Maybe someone can help by suggesting a way to get R to confess how much memory it has available given current system resources. Tim -Original Message- From: R-help On Behalf Of Bert Gunter Sent: Monday, June 27, 2022 1:44 PM To: Ian Worthington Cc: R-help Subject: Re: [R] sqldf returns Error: database or disk is full [External Email] You should probably heed the warning messages you received before doing anything further. Bert On Mon, Jun 27, 2022, 2:55 PM Ian Worthington via R-help < r-help@r-project.org> wrote: > I'm using sqldf() to execute a query using a named dbname. It appears > to create an external db but after 30 minutes or so returns: > > Error: database or disk is full > The disk is not full, so have I hit some internal db size limit? The > external db created is about 610MB and doesn't grow, so maybe it's > running out of room for the result set, where ever that is stored? > > Code being executed is: > sqlCmds <- c(#"create index ixsip on ipTrace100(source)", > "create index ixdip on ipTrace100(destination)", > #"create index ixspo on ipTrace100(SrcPo)", > "create index ixdpo on ipTrace100(DstPo)", > #"create index ixno on ipTrace100(`No.`)", > # "explain query plan > "select tr1.`No.` InNo, tr1.Time InTime, tr1.source > InSource, /*tr1.destination InDest,*/ tr1.SrcPo InSrcPo, tr1.DstPo InDstPo, > min(tr2.`No.`) OutNo, tr2.Time OutTime /*, >tr2.source OutSource,*/ /*tr2.destination OutDest,*/ /*tr2.SrcPo >OutSrcPo*/ /*, tr2.DstPo OutDstPo */ > from main.ipTrace100 tr1 > join main.ipTrace100 tr2 > on tr1.destination = '10.27.187.1' and > tr1.source = tr2.destination and > tr1.SrcPo = tr2.DstPo and > tr1.`No.` < tr2.`No.` > group by tr1.`No.`" > ) > (tempfile <- tempfile()) > system.time( sqldf(sqlCmds, dbname=tempfile, verbose=TRUE) ) and >console log: > > | > | | > > > system.time( sqldf(sqlCmds, dbname=tempfile, verbose=TRUE) ) > sqldf: library(RSQLite) > sqldf: m <- dbDriver("SQLite") > sqldf: connection <- dbConnect(m, dbname = > "C:\Users\PIWN01~1\AppData\Local\Temp\RtmpSycE1E\file78e464e72860") > sqldf: initExtension(connection) > sqldf: dbWriteTable(connection, 'ipTrace100', ipTrace100, row.names = > FALSE) > sqldf: dbGetQuery(connection, 'create index ixdip on > ipTrace100(destination)') > Warning in result_fetch(res@ptr, n = n) : > SQL statements must be issued with dbExecute() or dbSendStatement() >instead of dbGetQuery() or dbSendQuery(). > sqldf: dbGetQuery(connection, 'create index ixdpo on > ipTrace100(DstPo)') Warning in result_fetch(res@ptr, n = n) : > SQL statements must be issued with dbExecute() or dbSendStatement() >instead of dbGetQuery() or dbSendQuery(). > sqldf: dbGetQuery(connection, 'select tr1.`No.` InNo, tr1.Time > InTime, tr1.source InSource, /*tr1.destination InDest,*/ tr1.SrcPo >InSrcPo, tr1.DstPo InDstPo, > min(tr2.`No.`) OutNo, tr2.Time OutTime /*, >tr2.source OutSource,*/ /*tr2.destination OutDest,*/ /*tr2.SrcPo >OutSrcPo*/ /*, tr2.DstPo OutDstPo */ > from main.ipTrace100 tr1 > join main.ipTrace100 tr2 > on tr1.destination = '10.27.187.1' and > tr1.source = tr2.destination and > tr1.SrcPo = tr2.Ds
Re: [R] sqldf and number of records affected
You're correct. It does work. I was looking at some other result printed. My bad. Looks like we can also get the same result using 'SELECT changes()'. Approach 1: con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)) sqldf() suppressWarnings(sqldf(c(" update con set V1 = 0 where V1 > 5 ", "select changes()"))) sqldf("select * from main.con") sqldf() Approach 2: con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)) sqldf() suppressWarnings(sqldf(c("pragma count_changes = 1", "update con set V1 = 0 where V1 > 5 "))) sqldf("select * from main.con") sqldf() -Original Message- From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] Sent: Thursday, June 11, 2020 10:30 AM To: Ravi Jeyaraman Cc: r-help@r-project.org Subject: Re: [R] sqldf and number of records affected There is no real difference between your example and the example I provided. Both use a data.frame in R and both work for me under R 3.5 with RSQLite 2.2.0 See log below. Note that there is a bug in R 4.0 related to tcltk that could possibly affect sqldf as it uses tcltk. A fix has been announced for R 4.0.2. > library(sqldf) Loading required package: gsubfn Loading required package: proto Loading required package: RSQLite > con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)) > sqldf() Path: :memory: Extensions: TRUE > sqldf(c("pragma count_changes = 1", "update con set V1 = 0 where V1 > > 5 ")) rows updated 15 Warning message: In result_fetch(res@ptr, n = n) : SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery(). > ans <- sqldf("select * from main.con") > sqldf() NULL > ans V1 1 1 2 2 3 3 4 4 5 5 6 0 7 0 8 0 9 0 10 0 > R.version.string [1] "R version 3.5.3 (2019-03-11)" > packageVersion("sqldf") [1] ‘0.4.11’ > packageVersion("RSQLite") [1] ‘2.2.0’ > packageVersion("DBI") [1] ‘1.1.0’ On Thu, Jun 11, 2020 at 10:06 AM Ravi Jeyaraman wrote: > > Thanks for the response Gabor. Looks like the below example will work when > using SQLite, but in my case I'm just creating a dataframe in R and trying to > update it using sqldf as below and it doesn't seem to work ... > > con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)) > sqldf() > sqldf(c("pragma count_changes = 1", "update con set V1 = 0 where V1 > > 5 ")) ans <- sqldf("select * from main.con") > sqldf() > > -Original Message- > From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] > Sent: Thursday, June 11, 2020 9:12 AM > To: Ravi Jeyaraman > Cc: r-help@r-project.org > Subject: Re: [R] sqldf and number of records affected > > Here is an example. Ignore the warning or use the workaround > discussed here > https://github.com/ggrothendieck/sqldf/issues/40 > to avoid the warning. > > library(sqldf) > sqldf() # use same connection until next sqldf() > sqldf(c("pragma count_changes = 1", "update BOD set demand = 99 where Time > > 4")) > sqldf("select * from main.BOD") > sqldf() > > > On Thu, Jun 11, 2020 at 9:01 AM Ravi Jeyaraman wrote: > > > > Hello all, When I execute a SQL using SQLDF, how do I get the number > > of records affected? I mean, if I run an UPDATE on a data frame, it > > doesn't tell me if and how many records got updated. I've read > > through the documentation and there don't seem to be a way to get > > this info unless it's done on a database. Any ideas? > > > > Thanks > > Ravi > > > > > > -- > > This email has been checked for viruses by AVG. > > https://www.avg.com > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > -- > Statistics & Software Consulting > GKX Group, GKX Associates Inc. > tel: 1-877-GKX-GROUP > email: ggrothendieck at gmail.com > > > -- > This email has been checked for viruses by AVG. > https://www.avg.com > -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf and number of records affected
There is no real difference between your example and the example I provided. Both use a data.frame in R and both work for me under R 3.5 with RSQLite 2.2.0 See log below. Note that there is a bug in R 4.0 related to tcltk that could possibly affect sqldf as it uses tcltk. A fix has been announced for R 4.0.2. > library(sqldf) Loading required package: gsubfn Loading required package: proto Loading required package: RSQLite > con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)) > sqldf() Path: :memory: Extensions: TRUE > sqldf(c("pragma count_changes = 1", "update con set V1 = 0 where V1 > 5 ")) rows updated 15 Warning message: In result_fetch(res@ptr, n = n) : SQL statements must be issued with dbExecute() or dbSendStatement() instead of dbGetQuery() or dbSendQuery(). > ans <- sqldf("select * from main.con") > sqldf() NULL > ans V1 1 1 2 2 3 3 4 4 5 5 6 0 7 0 8 0 9 0 10 0 > R.version.string [1] "R version 3.5.3 (2019-03-11)" > packageVersion("sqldf") [1] ‘0.4.11’ > packageVersion("RSQLite") [1] ‘2.2.0’ > packageVersion("DBI") [1] ‘1.1.0’ On Thu, Jun 11, 2020 at 10:06 AM Ravi Jeyaraman wrote: > > Thanks for the response Gabor. Looks like the below example will work when > using SQLite, but in my case I'm just creating a dataframe in R and trying to > update it using sqldf as below and it doesn't seem to work ... > > con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)) > sqldf() > sqldf(c("pragma count_changes = 1", "update con set V1 = 0 where V1 > 5 ")) > ans <- sqldf("select * from main.con") > sqldf() > > -Original Message- > From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] > Sent: Thursday, June 11, 2020 9:12 AM > To: Ravi Jeyaraman > Cc: r-help@r-project.org > Subject: Re: [R] sqldf and number of records affected > > Here is an example. Ignore the warning or use the workaround discussed here > https://github.com/ggrothendieck/sqldf/issues/40 > to avoid the warning. > > library(sqldf) > sqldf() # use same connection until next sqldf() > sqldf(c("pragma count_changes = 1", "update BOD set demand = 99 where Time > > 4")) > sqldf("select * from main.BOD") > sqldf() > > > On Thu, Jun 11, 2020 at 9:01 AM Ravi Jeyaraman wrote: > > > > Hello all, When I execute a SQL using SQLDF, how do I get the number > > of records affected? I mean, if I run an UPDATE on a data frame, it > > doesn't tell me if and how many records got updated. I've read > > through the documentation and there don't seem to be a way to get this > > info unless it's done on a database. Any ideas? > > > > Thanks > > Ravi > > > > > > -- > > This email has been checked for viruses by AVG. > > https://www.avg.com > > > > __ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > -- > Statistics & Software Consulting > GKX Group, GKX Associates Inc. > tel: 1-877-GKX-GROUP > email: ggrothendieck at gmail.com > > > -- > This email has been checked for viruses by AVG. > https://www.avg.com > -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf and number of records affected
Thanks for the response Gabor. Looks like the below example will work when using SQLite, but in my case I'm just creating a dataframe in R and trying to update it using sqldf as below and it doesn't seem to work ... con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)) sqldf() sqldf(c("pragma count_changes = 1", "update con set V1 = 0 where V1 > 5 ")) ans <- sqldf("select * from main.con") sqldf() -Original Message- From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] Sent: Thursday, June 11, 2020 9:12 AM To: Ravi Jeyaraman Cc: r-help@r-project.org Subject: Re: [R] sqldf and number of records affected Here is an example. Ignore the warning or use the workaround discussed here https://github.com/ggrothendieck/sqldf/issues/40 to avoid the warning. library(sqldf) sqldf() # use same connection until next sqldf() sqldf(c("pragma count_changes = 1", "update BOD set demand = 99 where Time > 4")) sqldf("select * from main.BOD") sqldf() On Thu, Jun 11, 2020 at 9:01 AM Ravi Jeyaraman wrote: > > Hello all, When I execute a SQL using SQLDF, how do I get the number > of records affected? I mean, if I run an UPDATE on a data frame, it > doesn't tell me if and how many records got updated. I've read > through the documentation and there don't seem to be a way to get this > info unless it's done on a database. Any ideas? > > Thanks > Ravi > > > -- > This email has been checked for viruses by AVG. > https://www.avg.com > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com -- This email has been checked for viruses by AVG. https://www.avg.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf and number of records affected
Here is an example. Ignore the warning or use the workaround discussed here https://github.com/ggrothendieck/sqldf/issues/40 to avoid the warning. library(sqldf) sqldf() # use same connection until next sqldf() sqldf(c("pragma count_changes = 1", "update BOD set demand = 99 where Time > 4")) sqldf("select * from main.BOD") sqldf() On Thu, Jun 11, 2020 at 9:01 AM Ravi Jeyaraman wrote: > > Hello all, When I execute a SQL using SQLDF, how do I get the number of > records affected? I mean, if I run an UPDATE on a data frame, it doesn't > tell me if and how many records got updated. I've read through the > documentation and there don't seem to be a way to get this info unless it's > done on a database. Any ideas? > > Thanks > Ravi > > > -- > This email has been checked for viruses by AVG. > https://www.avg.com > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf and number of records affected
Hello all, When I execute a SQL using SQLDF, how do I get the number of records affected? I mean, if I run an UPDATE on a data frame, it doesn't tell me if and how many records got updated. I've read through the documentation and there don't seem to be a way to get this info unless it's done on a database. Any ideas? Thanks Ravi -- This email has been checked for viruses by AVG. https://www.avg.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf --Warning message:
sqldf does not use Tk so you can ignore this. On Fri, Feb 19, 2016 at 12:32 PM, Divakar Reddywrote: > Dear R users, > > I'm getting Waring message while trying to load "sqldf" package in R3.2.3 > and assuming that we can ignore this as it's WARNING Message and not an > error message. > Can you guide me if my assumption is wrong? > > >> library(sqldf); > Loading required package: gsubfn > Loading required package: proto > Loading required package: RSQLite > Loading required package: DBI > Warning message: > no DISPLAY variable so Tk is not available > >> version _ > platform x86_64-redhat-linux-gnu > version.string R version 3.2.3 (2015-12-10) >> > > Thanks, > Divakar > Phoenix,USA > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf --Warning message:
Dear R users, I'm getting Waring message while trying to load "sqldf" package in R3.2.3 and assuming that we can ignore this as it's WARNING Message and not an error message. Can you guide me if my assumption is wrong? > library(sqldf); Loading required package: gsubfn Loading required package: proto Loading required package: RSQLite Loading required package: DBI Warning message: no DISPLAY variable so Tk is not available > version _ platform x86_64-redhat-linux-gnu version.string R version 3.2.3 (2015-12-10) > Thanks, Divakar Phoenix,USA [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf() difference between R 3.1.2 and 3.0.1
That seems to have worked, both in the new and old version of R. I'll do more unit testing on other files. Thank you, Gabor. -Original Message- From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] Sent: Wednesday, February 11, 2015 10:22 AM To: Doran, Harold Cc: r-help@r-project.org Subject: Re: [R] sqldf() difference between R 3.1.2 and 3.0.1 On Wed, Feb 11, 2015 at 9:45 AM, Doran, Harold hdo...@air.org wrote: I have a function written and tested using R 3.0.1 and sqldf_0.4-7.1 that works perfectly. However, using this same code with R 3.1.2 and sqldf_0.4-10 yields the error below that I am having a difficult time deciphering. Hence, same code behaves differently on different versions of R and sqldf(). Error in sqliteSendQuery(con, statement, bind.data) : error in statement: no such column: V1 Reproducible example below as well as complete sessionInfo all provided below. My function and code using the function are below. dorReader - function(dorFile, layout, sepChar = '\n'){ sepChar - as.character(sepChar) dorFile - as.character(dorFile) layout$type2 - ifelse(layout$type == 'C', 'character', ifelse(layout$type == 'N', 'numeric', 'Date')) dor - file(dorFile) attr(dor, file.format) - list(sep = sepChar) getVars - paste(select, paste(substr(V1, , layout$Start, , , layout$Length, ) ', layout$Variable.Name, ', collapse = , ), from dor) dat - sqldf(getVars) classConverter - function(obj, types){ out - lapply(1:length(obj),FUN = function(i){FUN1 - switch(types[i],character = as.character,numeric = as.numeric,factor = as.factor, Date = as.character); FUN1(obj[,i])}) names(out) - colnames(obj) as.data.frame(out) } dat - classConverter(dat, layout$type2) names(dat) - layout$Variable.Name dat } ### contents of fwf file 'sample.txt' 1234567 1234567 1234567 1234567 1234567 1234567 1234567 1234567 layout - data.frame(Variable.Name =c('test1', 'test2'), Length = c(3,4), Start =c(1,4), End = c(3,7), type = c('N', 'N')) tmp - dorReader('sample.txt', layout) sqldf is documented to use the sqliteImportFile defaults for file.format components. It may be that RSQLite 1.0 has changed the default for header in sqliteImportFile. Try replacing your statement that sets file.format with this: attr(dor, file.format) - list(sep = sepChar, header = FALSE) __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf() difference between R 3.1.2 and 3.0.1
On Wed, Feb 11, 2015 at 9:45 AM, Doran, Harold hdo...@air.org wrote: I have a function written and tested using R 3.0.1 and sqldf_0.4-7.1 that works perfectly. However, using this same code with R 3.1.2 and sqldf_0.4-10 yields the error below that I am having a difficult time deciphering. Hence, same code behaves differently on different versions of R and sqldf(). Error in sqliteSendQuery(con, statement, bind.data) : error in statement: no such column: V1 Reproducible example below as well as complete sessionInfo all provided below. My function and code using the function are below. dorReader - function(dorFile, layout, sepChar = '\n'){ sepChar - as.character(sepChar) dorFile - as.character(dorFile) layout$type2 - ifelse(layout$type == 'C', 'character', ifelse(layout$type == 'N', 'numeric', 'Date')) dor - file(dorFile) attr(dor, file.format) - list(sep = sepChar) getVars - paste(select, paste(substr(V1, , layout$Start, , , layout$Length, ) ', layout$Variable.Name, ', collapse = , ), from dor) dat - sqldf(getVars) classConverter - function(obj, types){ out - lapply(1:length(obj),FUN = function(i){FUN1 - switch(types[i],character = as.character,numeric = as.numeric,factor = as.factor, Date = as.character); FUN1(obj[,i])}) names(out) - colnames(obj) as.data.frame(out) } dat - classConverter(dat, layout$type2) names(dat) - layout$Variable.Name dat } ### contents of fwf file 'sample.txt' 1234567 1234567 1234567 1234567 1234567 1234567 1234567 1234567 layout - data.frame(Variable.Name =c('test1', 'test2'), Length = c(3,4), Start =c(1,4), End = c(3,7), type = c('N', 'N')) tmp - dorReader('sample.txt', layout) sqldf is documented to use the sqliteImportFile defaults for file.format components. It may be that RSQLite 1.0 has changed the default for header in sqliteImportFile. Try replacing your statement that sets file.format with this: attr(dor, file.format) - list(sep = sepChar, header = FALSE) __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf() difference between R 3.1.2 and 3.0.1
I have a function written and tested using R 3.0.1 and sqldf_0.4-7.1 that works perfectly. However, using this same code with R 3.1.2 and sqldf_0.4-10 yields the error below that I am having a difficult time deciphering. Hence, same code behaves differently on different versions of R and sqldf(). Error in sqliteSendQuery(con, statement, bind.data) : error in statement: no such column: V1 Reproducible example below as well as complete sessionInfo all provided below. My function and code using the function are below. dorReader - function(dorFile, layout, sepChar = '\n'){ sepChar - as.character(sepChar) dorFile - as.character(dorFile) layout$type2 - ifelse(layout$type == 'C', 'character', ifelse(layout$type == 'N', 'numeric', 'Date')) dor - file(dorFile) attr(dor, file.format) - list(sep = sepChar) getVars - paste(select, paste(substr(V1, , layout$Start, , , layout$Length, ) ', layout$Variable.Name, ', collapse = , ), from dor) dat - sqldf(getVars) classConverter - function(obj, types){ out - lapply(1:length(obj),FUN = function(i){FUN1 - switch(types[i],character = as.character,numeric = as.numeric,factor = as.factor, Date = as.character); FUN1(obj[,i])}) names(out) - colnames(obj) as.data.frame(out) } dat - classConverter(dat, layout$type2) names(dat) - layout$Variable.Name dat } ### contents of fwf file 'sample.txt' 1234567 1234567 1234567 1234567 1234567 1234567 1234567 1234567 layout - data.frame(Variable.Name =c('test1', 'test2'), Length = c(3,4), Start =c(1,4), End = c(3,7), type = c('N', 'N')) tmp - dorReader('sample.txt', layout) ### SessionInfo where functions behaves as expected sessionInfo() R version 3.0.1 (2013-05-16) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] tcltk stats graphics grDevices utils datasets methods base other attached packages: [1] sqldf_0.4-7.1 RSQLite.extfuns_0.0.1 RSQLite_0.11.4DBI_0.2-7 gsubfn_0.6-5 [6] proto_0.3-10 MiscPsycho_1.6statmod_1.4.18 loaded via a namespace (and not attached): [1] chron_2.3-45 tools_3.0.1 ### SessionInfo for version not working sessionInfo() R version 3.1.2 (2014-10-31) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] tcltk stats graphics grDevices utils datasets methods base other attached packages: [1] sqldf_0.4-10 RSQLite_1.0.0 DBI_0.3.1 gsubfn_0.6-6 proto_0.3-10 loaded via a namespace (and not attached): [1] chron_2.3-45 tools_3.1.2 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf problems
Hello: I'm trying to run this code data2 - sqldf ( SELECT plot, age, avg(N) as N FROM data1 GROUP BY plot, t) The problem is that when calling sqldf with require(sqldf) appears this message Warning message: In sqliteCloseConnection(conn, ...) : RS-DBI driver warning: (closing pending result sets before closing this connection) If try again require(sqldf) the message does not appear, but when trying the code afterwards, I get a new message Error in sqliteFetch(rs, n = -1, ...) : RSQLite driver: (RS_SQLite_fetch: failed: Domain error) I have a new Windows 8.1 Do I need any other installations o packages or something else? The code was working in older Windows XP and Linux Thanks -- ** Juan Daniel GarcÃa Villabrille, Ingeniero de Montes, colegiado nº 5.319 627 188 682 e-mail: juandaniel...@gmail.com ** Cuando imprima este correo, no olvide reciclarlo. Porque el papel es el soporte natural, renovable y reciclable de lo más humano: la palabra. ##-- AVISO --## La información a incluir en este e-mail es CONFIDENCIAL, siendo de uso exclusivo del destinatario arriba mencionado. Si usted lee este mensaje y no es el destinatario indicado, le informamos que está totalmente prohibida cualquier utilización, divulgación, distribución y/o reproducción de esta comunicación sin autorización expresa en virtud de la legislación vigente. Si ha recibido este mensaje por error, le rogamos nos lo notifique inmediatamente por esta misma vÃa y proceda a su eliminación. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf problems
2014-07-14 7:52 GMT-04:00 Juan Daniel García juandaniel...@gmail.com: Hello: I'm trying to run this code data2 - sqldf ( SELECT plot, age, avg(N) as N FROM data1 GROUP BY plot, t) The problem is that when calling sqldf with require(sqldf) appears this message Warning message: In sqliteCloseConnection(conn, ...) : RS-DBI driver warning: (closing pending result sets before closing this connection) If try again require(sqldf) the message does not appear, but when trying the code afterwards, I get a new message Error in sqliteFetch(rs, n = -1, ...) : RSQLite driver: (RS_SQLite_fetch: failed: Domain error) I have a new Windows 8.1 Do I need any other installations o packages or something else? The code was working in older Windows XP and Linux See the Troubleshooting section of the sqldf home page. http://sqldf.googlecode.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf problems
One doesn't call a package... so your description is unclear. One doesn't load (using the require or library functions) the relevant package AFTER calling functions in the package. Since it appears you have left out some steps or described them out of order, please supply a reproducible example [1] (including a test data set) as the Posting Guide requests. NB I don't have Windows 8 or 8.1, so if that turns out to be the problem then I won't be able to tell, but I have a feeling there is something more basic going on here. [1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example --- Jeff NewmillerThe . . Go Live... DCN:jdnew...@dcn.davis.ca.usBasics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. On July 14, 2014 4:52:18 AM PDT, Juan Daniel García juandaniel...@gmail.com wrote: Hello: I'm trying to run this code data2 - sqldf ( SELECT plot, age, avg(N) as N FROM data1 GROUP BY plot, t) The problem is that when calling sqldf with require(sqldf) appears this message Warning message: In sqliteCloseConnection(conn, ...) : RS-DBI driver warning: (closing pending result sets before closing this connection) If try again require(sqldf) the message does not appear, but when trying the code afterwards, I get a new message Error in sqliteFetch(rs, n = -1, ...) : RSQLite driver: (RS_SQLite_fetch: failed: Domain error) I have a new Windows 8.1 Do I need any other installations o packages or something else? The code was working in older Windows XP and Linux Thanks __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] SQLDF column errors
I am trying to exclude integer values from a small data frame 1, d1 that have matching hits in data frame 2, d2 (Very big) which involves matching those hits first. I am trying to use sqldf on the df's in the following fashion: df1: V1 12675 14753 16222 18765 df2: head(df2) V1 V2 13647 rd1500 14753 rd1580 15987 rd1590 16222 rd2020. df1_new-sqldf(select df1.V1, df2.V2 where rs10.V1 = d10.pos) - Ideally I would like to try to use delete or not equal to != though I can only find that delete works with sqldf. but it returns this error: Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: no such column: df1.V1) I am also trying this: df1_new-sqldf(select V1 from df1, V2 from df2 where df1.V1 = df2.V1) which returns this error: Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: near from: syntax error) If anyone with sqldf knowledge could lend me a hand that would be great. Thanks! Matt [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SQLDF column errors
i think this is what you want.. :) http://stackoverflow.com/questions/4765936/using-joined-tables-to-exclude-certain-records library(sqldf) # use the mtcars example table mtcars # keep the first eight records in a second, separate data set x - mtcars[ 1:8 , ] # keep all the records from mtcars where the mpg value doesn't match a record in the `x` data table y - sqldf( select * from mtcars where not exists ( select * from x where x.mpg == mtcars.mpg ) ) # the above example assumes your tables won't fit in RAM. # if they do, this method is obviously much easier y - mtcars[ !( mtcars$mpg %in% x$mpg ) , ] On Tue, Jan 15, 2013 at 4:02 PM, Matthew Liebers mr...@cornell.edu wrote: I am trying to exclude integer values from a small data frame 1, d1 that have matching hits in data frame 2, d2 (Very big) which involves matching those hits first. I am trying to use sqldf on the df's in the following fashion: df1: V1 12675 14753 16222 18765 df2: head(df2) V1 V2 13647 rd1500 14753 rd1580 15987 rd1590 16222 rd2020. df1_new-sqldf(select df1.V1, df2.V2 where rs10.V1 = d10.pos) - Ideally I would like to try to use delete or not equal to != though I can only find that delete works with sqldf. but it returns this error: Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: no such column: df1.V1) I am also trying this: df1_new-sqldf(select V1 from df1, V2 from df2 where df1.V1 = df2.V1) which returns this error: Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: near from: syntax error) If anyone with sqldf knowledge could lend me a hand that would be great. Thanks! Matt [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf package: using variables in where condition
Sent from my iPhone On Jan 13, 2013, at 9:47 PM, ravsre ravishre...@hotmail.com wrote: I am trying to use the sqldf package to create independent data frames from a master dataframe. I want to use sqldf package and perform a simple select statement. However, what I want to do is to create a loop and repeatedly send a variable to the where clause condition such as a: select * from mydfrm where Symbol = list[i] The 'list' is another data frame with a single column with 15 values. All I what to do is to create loop for i in 1:15 and pass on into the select statement as list[1], list[2 }..tile list[15] Can someone help since, I am unable to reference the element location as variable in the sqldf function? also, is there a better way? would appreciate your help Rgds, Ravi -- View this message in context: http://r.789695.n4.nabble.com/sqldf-package-using-variables-in-where-condition-tp4655445.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf package: using variables in where condition
On Mon, Jan 14, 2013 at 8:36 AM, Ravi Sreedhar ravishre...@hotmail.com wrote: Sent from my iPhone On Jan 13, 2013, at 9:47 PM, ravsre ravishre...@hotmail.com wrote: I am trying to use the sqldf package to create independent data frames from a master dataframe. I want to use sqldf package and perform a simple select statement. However, what I want to do is to create a loop and repeatedly send a variable to the where clause condition such as a: select * from mydfrm where Symbol = list[i] The 'list' is another data frame with a single column with 15 values. All I what to do is to create loop for i in 1:15 and pass on into the select statement as list[1], list[2 }..tile list[15] Can someone help since, I am unable to reference the element location as variable in the sqldf function? See Example 5 on the sqldf home page. http://code.google.com/p/sqldf/#Example_5._Insert_Variables -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf merging with subset in specific range
Hi all: I have two data sets. Set A includes a long list of hits in a single column, say: m$V1 10 15 36 37 38 44 45 57 61 62 69 ...and so on Set B includes just a few key ranges set up by way of a minimum in column X and a maximum in column Y. Say, n$X n$Y 30 38 # range from 30 to 38 52 62 # range from 52 to 62 I would like the output to be the rows containing the following columns: m$V1 36 37 38 57 61 62 I am interested in isolating the hits in data set A that correspond to any of the hotspot ranges in data set B. I have downloaded sqldf and tried a couple things but I cannot do a traditional merge since set B is based on a range. I can always do a manual subset but I am trying to figure out if there is anything more expedient since these df's will be quite large. Thanks! Matt [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf merging with subset in specific range
Is this what you want: m - read.table(text = 10 + 15 + 36 + 37 + 38 + 44 + 45 + 57 + 61 + 62 + 69 ) n - read.table(text = 30 38 + 52 62 ) require(sqldf) sqldf(select m.V1 + from m, n + where m.V1 between n.V1 and n.V2 + ) V1 1 36 2 37 3 38 4 57 5 61 6 62 On Wed, Dec 26, 2012 at 7:00 PM, Matthew Liebers mr...@cornell.edu wrote: Hi all: I have two data sets. Set A includes a long list of hits in a single column, say: m$V1 10 15 36 37 38 44 45 57 61 62 69 ...and so on Set B includes just a few key ranges set up by way of a minimum in column X and a maximum in column Y. Say, n$X n$Y 30 38 # range from 30 to 38 52 62 # range from 52 to 62 I would like the output to be the rows containing the following columns: m$V1 36 37 38 57 61 62 I am interested in isolating the hits in data set A that correspond to any of the hotspot ranges in data set B. I have downloaded sqldf and tried a couple things but I cannot do a traditional merge since set B is based on a range. I can always do a manual subset but I am trying to figure out if there is anything more expedient since these df's will be quite large. Thanks! Matt [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf Date problem
Zitat von jim holtman jholt...@gmail.com: Most likely your Date is either a character or a factor (you need to provide an 'str' of the dataframe). You are therefore most likely doing a character compare and that is the reason for your problem. You need to convert to a character string of the format -MM-DD to do the correct character comparison. ## x - data.frame(Date = paste0('1/', 1:31, '/2011')) str(x) 'data.frame': 31 obs. of 1 variable: $ Date: Factor w/ 31 levels 1/1/2011,1/10/2011,..: 1 12 23 26 27 28 29 30 31 2 ... x Date 1 1/1/2011 2 1/2/2011 3 1/3/2011 4 1/4/2011 5 1/5/2011 6 1/6/2011 7 1/7/2011 8 1/8/2011 9 1/9/2011 10 1/10/2011 11 1/11/2011 12 1/12/2011 13 1/13/2011 14 1/14/2011 15 1/15/2011 16 1/16/2011 17 1/17/2011 18 1/18/2011 19 1/19/2011 20 1/20/2011 21 1/21/2011 22 1/22/2011 23 1/23/2011 24 1/24/2011 25 1/25/2011 26 1/26/2011 27 1/27/2011 28 1/28/2011 29 1/29/2011 30 1/30/2011 31 1/31/2011 require(sqldf) # not correct because of character compares sqldf('select * from x where Date 1/13/2011 and Date 1/25/2011') Date 1 1/2/2011 2 1/14/2011 3 1/15/2011 4 1/16/2011 5 1/17/2011 6 1/18/2011 7 1/19/2011 8 1/20/2011 9 1/21/2011 10 1/22/2011 11 1/23/2011 12 1/24/2011 # convert the date to /MM/DD for character compares x$newDate - as.character(as.Date(as.character(x$Date), format = %m/%d/%Y)) # now do the select sqldf('select * from x where newDate between 2011-01-13 and 2011-01-25') DatenewDate 1 1/13/2011 2011-01-13 2 1/14/2011 2011-01-14 3 1/15/2011 2011-01-15 4 1/16/2011 2011-01-16 5 1/17/2011 2011-01-17 6 1/18/2011 2011-01-18 7 1/19/2011 2011-01-19 8 1/20/2011 2011-01-20 9 1/21/2011 2011-01-21 10 1/22/2011 2011-01-22 11 1/23/2011 2011-01-23 12 1/24/2011 2011-01-24 13 1/25/2011 2011-01-25 On Sat, Nov 3, 2012 at 4:22 PM, Andreas Recktenwald a.recktenw...@mx.uni-saarland.de wrote: Dear R-help readers, i've created a database for quotes data (for 4 years; 2007 -- 2010) with the sqldf package. This database contains a column Date in the format mm/dd/. The table in the database is called main.data and the database itself Honda. I tried to get the Data just for certain period, say from 01/01/2007 until 01/10/2007 with the following code: sqldf(select * from main.data where Date='01/10/2007' and Date='01/01/2007'), dbname=Honda) I get the data for this period for every year(2007,2008,2009,2010) not only for 2007. It seems that the year is overlooked and just looked for the fitting days and months. Because I haven't really much experience with sql I decide to send my problem to the list. Many thanks in advance. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. Thanks for your quick response Jim, you are right the entries in my Date column are characters (my fault not to mention this in my first post). Now i know the reasons for my problem and can solve it. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf Date problem
Dear R-help readers, i've created a database for quotes data (for 4 years; 2007 -- 2010) with the sqldf package. This database contains a column Date in the format mm/dd/. The table in the database is called main.data and the database itself Honda. I tried to get the Data just for certain period, say from 01/01/2007 until 01/10/2007 with the following code: sqldf(select * from main.data where Date='01/10/2007' and Date='01/01/2007'), dbname=Honda) I get the data for this period for every year(2007,2008,2009,2010) not only for 2007. It seems that the year is overlooked and just looked for the fitting days and months. Because I haven't really much experience with sql I decide to send my problem to the list. Many thanks in advance. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf Date problem
Dear R-help readers, i've created a database for quotes data (for 4 years; 2007 -- 2010) with the sqldf package. This database contains a column Date in the format mm/dd/. The table in the database is called main.data and the database itself Honda. I tried to get the Data just for certain period, say from 01/01/2007 until 01/10/2007 with the following code: sqldf(select * from main.data where Date='01/10/2007' and Date='01/01/2007'), dbname=Honda) I get the data for this period for every year(2007,2008,2009,2010) not only for 2007. It seems that the year is overlooked and just looked for the fitting days and months. Because I haven't really much experience with sql I decide to send my problem to the list. Many thanks in advance. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf Date problem
Most likely your Date is either a character or a factor (you need to provide an 'str' of the dataframe). You are therefore most likely doing a character compare and that is the reason for your problem. You need to convert to a character string of the format -MM-DD to do the correct character comparison. ## x - data.frame(Date = paste0('1/', 1:31, '/2011')) str(x) 'data.frame': 31 obs. of 1 variable: $ Date: Factor w/ 31 levels 1/1/2011,1/10/2011,..: 1 12 23 26 27 28 29 30 31 2 ... x Date 1 1/1/2011 2 1/2/2011 3 1/3/2011 4 1/4/2011 5 1/5/2011 6 1/6/2011 7 1/7/2011 8 1/8/2011 9 1/9/2011 10 1/10/2011 11 1/11/2011 12 1/12/2011 13 1/13/2011 14 1/14/2011 15 1/15/2011 16 1/16/2011 17 1/17/2011 18 1/18/2011 19 1/19/2011 20 1/20/2011 21 1/21/2011 22 1/22/2011 23 1/23/2011 24 1/24/2011 25 1/25/2011 26 1/26/2011 27 1/27/2011 28 1/28/2011 29 1/29/2011 30 1/30/2011 31 1/31/2011 require(sqldf) # not correct because of character compares sqldf('select * from x where Date 1/13/2011 and Date 1/25/2011') Date 1 1/2/2011 2 1/14/2011 3 1/15/2011 4 1/16/2011 5 1/17/2011 6 1/18/2011 7 1/19/2011 8 1/20/2011 9 1/21/2011 10 1/22/2011 11 1/23/2011 12 1/24/2011 # convert the date to /MM/DD for character compares x$newDate - as.character(as.Date(as.character(x$Date), format = %m/%d/%Y)) # now do the select sqldf('select * from x where newDate between 2011-01-13 and 2011-01-25') DatenewDate 1 1/13/2011 2011-01-13 2 1/14/2011 2011-01-14 3 1/15/2011 2011-01-15 4 1/16/2011 2011-01-16 5 1/17/2011 2011-01-17 6 1/18/2011 2011-01-18 7 1/19/2011 2011-01-19 8 1/20/2011 2011-01-20 9 1/21/2011 2011-01-21 10 1/22/2011 2011-01-22 11 1/23/2011 2011-01-23 12 1/24/2011 2011-01-24 13 1/25/2011 2011-01-25 On Sat, Nov 3, 2012 at 4:22 PM, Andreas Recktenwald a.recktenw...@mx.uni-saarland.de wrote: Dear R-help readers, i've created a database for quotes data (for 4 years; 2007 -- 2010) with the sqldf package. This database contains a column Date in the format mm/dd/. The table in the database is called main.data and the database itself Honda. I tried to get the Data just for certain period, say from 01/01/2007 until 01/10/2007 with the following code: sqldf(select * from main.data where Date='01/10/2007' and Date='01/01/2007'), dbname=Honda) I get the data for this period for every year(2007,2008,2009,2010) not only for 2007. It seems that the year is overlooked and just looked for the fitting days and months. Because I haven't really much experience with sql I decide to send my problem to the list. Many thanks in advance. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf for Very Large Tab Delimited Files
Hi, On Fri, Feb 3, 2012 at 1:12 PM, HC hca...@yahoo.co.in wrote: Thank you. The readLines command is working fine and I am able to read 10^6 lines in one go and write them using the write.table command. Does this readLines command using a block concept to optimize or goes line by line? Steve has mentioned about *nix and split commands. Would there be any speed benefit as compared to readLines? It's easy to test, right? Just give it a shot and see ... -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf for Very Large Tab Delimited Files
Thank you Jim for your reply. I could figure out that readLines works fine until 35,841,335 lines (records). When the next line is read to be read, a window with R for Windows GUI front-end has stopped working message comes, with an option to close program or checking online for a solution. The tab-separated .txt file I am working on has 9 columns and about 3.25 billion rows. I suspect that the 35,841,336th line is becoming so very big that RAM out runs its capacity. Perhaps all tabs that are expected after each values are missing, making the line so very big, I am not sure. Is there any way to skip this line with readLines or any other function? I am only hoping that the data after this bad line is good and I can read them if I can skip the bad one. Thank you. HC -- View this message in context: http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4357730.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf for Very Large Tab Delimited Files
Thank you for indicating that SQLite may not handle a file as big as 160 GB. Would you know of any utility for *physically splitting *the 160 GB text file into pieces. And if one can control the splitting at the end of a record. Thank you again. HC -- View this message in context: http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4354285.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf for Very Large Tab Delimited Files
On Fri, Feb 3, 2012 at 6:03 AM, HC hca...@yahoo.co.in wrote: Thank you for indicating that SQLite may not handle a file as big as 160 GB. Would you know of any utility for *physically splitting *the 160 GB text file into pieces. And if one can control the splitting at the end of a record. If they are csv files or similar data files then you could use R or any scripting language to do that. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf for Very Large Tab Delimited Files
On Fri, Feb 3, 2012 at 7:37 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote: On Fri, Feb 3, 2012 at 6:03 AM, HC hca...@yahoo.co.in wrote: Thank you for indicating that SQLite may not handle a file as big as 160 GB. Would you know of any utility for *physically splitting *the 160 GB text file into pieces. And if one can control the splitting at the end of a record. If they are csv files or similar data files then you could use R or any scripting language to do that. Or even the *nix `split` command ... -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf for Very Large Tab Delimited Files
This is a 160 GB tab-separated .txt file. It has 9 columns and 3.25x10^9 rows. Can R handle it? Thank you. HC -- View this message in context: http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4354556.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf for Very Large Tab Delimited Files
On Fri, Feb 3, 2012 at 8:08 AM, HC hca...@yahoo.co.in wrote: This is a 160 GB tab-separated .txt file. It has 9 columns and 3.25x10^9 rows. Can R handle it? You can process a file N lines at time like this: con - file(myfile.dat, r) while(length(Lines - readLines(con, n = N)) 0) { ... whatever... } -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf for Very Large Tab Delimited Files
Thank you. The readLines command is working fine and I am able to read 10^6 lines in one go and write them using the write.table command. Does this readLines command using a block concept to optimize or goes line by line? Steve has mentioned about *nix and split commands. Would there be any speed benefit as compared to readLines? Thank you. HC -- View this message in context: http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4355362.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf for Very Large Tab Delimited Files
Bad news! The readLines command works fine upto a certain limit. Once a few files have been written the R program crashes. I used the following code: * iFile-Test.txt con - file(iFile, r) N-125; iLoop-1 while(length(Lines - readLines(con, n = N)) 0 iLoop41) { oFile-paste(Split_,iLoop,.txt,sep=) write.table(Lines, oFile, sep = \t, quote = FALSE, col.names= FALSE, row.names = FALSE) iLoop-iLoop+1 } close(con) With above N=1.25 million, it wrote 28 files of about 57 mb each. That is a total of about 1.6 GB and then crashed. I tried with other values on N and it crashes at about the same place in terms of total size output, i.e., about 1.6 GB. Is this due to any limitation of Windows 7, in terms of not having the pointer after this size? Your insight would be very helpful. Thank you. HC -- View this message in context: http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4355679.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf for Very Large Tab Delimited Files
Exactly what does crashed mean? What was the error message? How you tried to put: rm(Lines) gc() at the end of the loop to free up and compact memory? If you watch the performance, does the R process seem to be growing in terms of the amount of memory that is being used? You can add: memory.size() before the above statements to see how much memory is being used. This is just some more elementary debugging that you will have to learn when using any system. On Fri, Feb 3, 2012 at 3:22 PM, HC hca...@yahoo.co.in wrote: Bad news! The readLines command works fine upto a certain limit. Once a few files have been written the R program crashes. I used the following code: * iFile-Test.txt con - file(iFile, r) N-125; iLoop-1 while(length(Lines - readLines(con, n = N)) 0 iLoop41) { oFile-paste(Split_,iLoop,.txt,sep=) write.table(Lines, oFile, sep = \t, quote = FALSE, col.names= FALSE, row.names = FALSE) iLoop-iLoop+1 } close(con) With above N=1.25 million, it wrote 28 files of about 57 mb each. That is a total of about 1.6 GB and then crashed. I tried with other values on N and it crashes at about the same place in terms of total size output, i.e., about 1.6 GB. Is this due to any limitation of Windows 7, in terms of not having the pointer after this size? Your insight would be very helpful. Thank you. HC -- View this message in context: http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4355679.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf for Very Large Tab Delimited Files
On Wed, Feb 1, 2012 at 11:57 PM, HC hca...@yahoo.co.in wrote: Hi All, I have a very (very) large tab-delimited text file without headers. There are only 8 columns and millions of rows. I want to make numerous pieces of this file by sub-setting it for individual stations. Station is given as in the first column. I am trying to learn and use sqldf package for this but am stuck in a couple of places. To simulate my requirement, I have taken iris dataset as an example and have done the following: (1) create a tab-delimited file without headers. (2) read it using read.csv.sql command (3) write the result of a query, getting first 10 records Here is the reproducible code that I am trying: # Text data file write.table(iris, irisNoH.txt, sep = \t, quote = FALSE, col.names=FALSE,row.names = FALSE) # create an empty database (can skip this step if database already exists) sqldf(attach myTestdbT as new) f1-file(irisNoH.txt) attr(f1, file.format) - list(header=FALSE,sep=\t) # read into table called irisTab in the mytestdb sqlite database read.csv.sql(irisNoH.txt, sql = create table main.irisTab1 as select * from file, dbname = mytestdb) res1-sqldf(select * from main.irisTab1 limit 10, dbname = mytestdb) write.table(res1, iris10.txt, sep = \t, quote = FALSE, col.names=FALSE,row.names = FALSE) # For querying records of a particular species - unresolved problems #a1-virginica #attr(f1, names) - c(A1,A2,A3,A4,A5) #res2-fn$sqldf(select * from main.irisTab1 where A5 = '$a1') In the above, I am not able to: (1) assign the names to various columns (2) query for particular value of a column; in this case for particular species, say virginica (3) I guess fn$sqldf can do the job but it requires assigning column names Any help would be most appreciated. Ignoring your iris file for a moment, to query the 5th column (getting its name via sql rather than via R) we can do this: library(sqldf) species - virginica nms - names(dbGetQuery(con, select * from iris limit 0)) fn$dbGetQuery(con, select * from iris where `nms[5]` = '$species' limit 3) Now, sqldf is best used when you are getting the data from R but if you want to store it in a database and just leave it there then you might be better off using RSQLite directly like this (the eol = \r\n in the dbWriteTable statement was needed on my Windows system but you may not need that depending on your platform): write.table(iris, irisNoH.txt, sep = \t, quote = FALSE, col.names = FALSE, row.names = FALSE) library(sqldf) library(RSQLite) con - dbConnect(SQLite(), dbname = mytestdb) dbWriteTable(con, iris, irisNoH.txt, sep = \t, eol = \r\n) species - virginica nms - names(dbGetQuery(con, select * from iris limit 0)) fn$dbGetQuery(con, select * from iris where `nms[5]` = '$species' limit 3) dbDisconnect(con) -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf for Very Large Tab Delimited Files
On Thu, Feb 2, 2012 at 3:11 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote: On Wed, Feb 1, 2012 at 11:57 PM, HC hca...@yahoo.co.in wrote: Hi All, I have a very (very) large tab-delimited text file without headers. There are only 8 columns and millions of rows. I want to make numerous pieces of this file by sub-setting it for individual stations. Station is given as in the first column. I am trying to learn and use sqldf package for this but am stuck in a couple of places. To simulate my requirement, I have taken iris dataset as an example and have done the following: (1) create a tab-delimited file without headers. (2) read it using read.csv.sql command (3) write the result of a query, getting first 10 records Here is the reproducible code that I am trying: # Text data file write.table(iris, irisNoH.txt, sep = \t, quote = FALSE, col.names=FALSE,row.names = FALSE) # create an empty database (can skip this step if database already exists) sqldf(attach myTestdbT as new) f1-file(irisNoH.txt) attr(f1, file.format) - list(header=FALSE,sep=\t) # read into table called irisTab in the mytestdb sqlite database read.csv.sql(irisNoH.txt, sql = create table main.irisTab1 as select * from file, dbname = mytestdb) res1-sqldf(select * from main.irisTab1 limit 10, dbname = mytestdb) write.table(res1, iris10.txt, sep = \t, quote = FALSE, col.names=FALSE,row.names = FALSE) # For querying records of a particular species - unresolved problems #a1-virginica #attr(f1, names) - c(A1,A2,A3,A4,A5) #res2-fn$sqldf(select * from main.irisTab1 where A5 = '$a1') In the above, I am not able to: (1) assign the names to various columns (2) query for particular value of a column; in this case for particular species, say virginica (3) I guess fn$sqldf can do the job but it requires assigning column names Any help would be most appreciated. Ignoring your iris file for a moment, to query the 5th column (getting its name via sql rather than via R) we can do this: library(sqldf) species - virginica nms - names(dbGetQuery(con, select * from iris limit 0)) fn$dbGetQuery(con, select * from iris where `nms[5]` = '$species' limit 3) Now, sqldf is best used when you are getting the data from R but if you want to store it in a database and just leave it there then you might be better off using RSQLite directly like this (the eol = \r\n in the dbWriteTable statement was needed on my Windows system but you may not need that depending on your platform): write.table(iris, irisNoH.txt, sep = \t, quote = FALSE, col.names = FALSE, row.names = FALSE) library(sqldf) library(RSQLite) con - dbConnect(SQLite(), dbname = mytestdb) dbWriteTable(con, iris, irisNoH.txt, sep = \t, eol = \r\n) species - virginica nms - names(dbGetQuery(con, select * from iris limit 0)) fn$dbGetQuery(con, select * from iris where `nms[5]` = '$species' limit 3) dbDisconnect(con) There seems to have been a pasting error here. The first part was intended to show how to do this using sqldf and the second using RSQLite.Thus the first part was intended to be: library(sqldf) species - virginica # obviously we could just do nms - names(iris) but to get # names from database instead nms - names(dbGetQuery(con, select * from iris limit 0)) # use 5th column fn$sqldf(select * from iris where `nms[5]` = '$species' limit 3) and the second part that illustrates RSQLite was ok. Note that fn$ comes from the gsubfn package which sqldf loads. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf for Very Large Tab Delimited Files
Hi Gabor, Thank you very much for your guidance and help. I could run the following code successfully on a 500 mb test data file. A snapshot of the data file is attached herewith. code start*** library(sqldf) library(RSQLite) iFile-Test100.txt con - dbConnect(SQLite(),dbname = myTest100) dbWriteTable(con, TestDB100, iFile, sep = \t) #, eol = \r\n) nms - names(dbGetQuery(con, select * from TestDB100 limit 0)) nRec-fn$dbGetQuery(con, select count(*)from TestDB100) aL1-1; while (aL1=nRec){ res1-fn$dbGetQuery(con, select * from (select * from TestDB100 limit '$aL1',1)) istn-res1[1,1] res1-fn$dbGetQuery(con, select * from TestDB100 where `nms[1]` = '$istn') icount-dim(res1)[1] oFile-paste(istn,_Test.txt,sep=) write.table(res1, oFile, sep = \t, quote = FALSE, col.names= FALSE, row.names = FALSE) aL1-aL1+icount } dbDisconnect(con) code end*** However, the actual data file that I want to handle is about *160 GB*. And when I use the same above code on that file, it gives following error for dbWriteTable(con, ...) statement error start** dbWriteTable(con, TestDB, iFile, sep = \t) #, eol = \r\n) Error in try({ : RS-DBI driver: (RS_sqlite_getline could not realloc) [1] FALSE error end** I am not sure about the reason of this error. Is this due to the big file size? I understood from sqldf webpage that SQLite can work for even a larger file than this and is only restricted by the disc space and not RAM. I have about 400GB free space on the PC I am using, with Windows 7 as the operating system. I am assuming that the about dbWriteTable command is using the disc memory only and is not the issue. In fact this file has been created using MySQLdump and I do not have access to the original MYSQL database file. I want to know the following: (1) Am I missing something in the above code that is preventing handling of this big 160 GB file? (2) Should this be handled outside of R, if R is becoming a limitation in this? And if yes then what is a possible way forward? Thank you again for your quick response and all the help. HC http://r.789695.n4.nabble.com/file/n4353362/Test100.txt Test100.txt -- View this message in context: http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4353362.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf for Very Large Tab Delimited Files
On Thu, Feb 2, 2012 at 8:07 PM, HC hca...@yahoo.co.in wrote: Hi Gabor, Thank you very much for your guidance and help. I could run the following code successfully on a 500 mb test data file. A snapshot of the data file is attached herewith. code start*** library(sqldf) library(RSQLite) iFile-Test100.txt con - dbConnect(SQLite(),dbname = myTest100) dbWriteTable(con, TestDB100, iFile, sep = \t) #, eol = \r\n) nms - names(dbGetQuery(con, select * from TestDB100 limit 0)) nRec-fn$dbGetQuery(con, select count(*)from TestDB100) aL1-1; while (aL1=nRec){ res1-fn$dbGetQuery(con, select * from (select * from TestDB100 limit '$aL1',1)) istn-res1[1,1] res1-fn$dbGetQuery(con, select * from TestDB100 where `nms[1]` = '$istn') icount-dim(res1)[1] oFile-paste(istn,_Test.txt,sep=) write.table(res1, oFile, sep = \t, quote = FALSE, col.names= FALSE, row.names = FALSE) aL1-aL1+icount } dbDisconnect(con) code end*** However, the actual data file that I want to handle is about *160 GB*. And when I use the same above code on that file, it gives following error for dbWriteTable(con, ...) statement error start** dbWriteTable(con, TestDB, iFile, sep = \t) #, eol = \r\n) Error in try({ : RS-DBI driver: (RS_sqlite_getline could not realloc) [1] FALSE error end** I am not sure about the reason of this error. Is this due to the big file size? I understood from sqldf webpage that SQLite can work for even a larger file than this and is only restricted by the disc space and not RAM. I have about 400GB free space on the PC I am using, with Windows 7 as the operating system. I am assuming that the about dbWriteTable command is using the disc memory only and is not the issue. In fact this file has been created using MySQLdump and I do not have access to the original MYSQL database file. I want to know the following: (1) Am I missing something in the above code that is preventing handling of this big 160 GB file? (2) Should this be handled outside of R, if R is becoming a limitation in this? And if yes then what is a possible way forward? Thank you again for your quick response and all the help. HC http://r.789695.n4.nabble.com/file/n4353362/Test100.txt Test100.txt I think its unlikely SQLite could handle a database that large unless you can divide it into multiple separate databases. At one time the SQLite site said it did not handle databases over 1 GB and although I think that is outdated by more recent versions of SQLite its still likely true that your size is too large for it. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf for Very Large Tab Delimited Files
Hi All, I have a very (very) large tab-delimited text file without headers. There are only 8 columns and millions of rows. I want to make numerous pieces of this file by sub-setting it for individual stations. Station is given as in the first column. I am trying to learn and use sqldf package for this but am stuck in a couple of places. To simulate my requirement, I have taken iris dataset as an example and have done the following: (1) create a tab-delimited file without headers. (2) read it using read.csv.sql command (3) write the result of a query, getting first 10 records Here is the reproducible code that I am trying: # Text data file write.table(iris, irisNoH.txt, sep = \t, quote = FALSE, col.names=FALSE,row.names = FALSE) # create an empty database (can skip this step if database already exists) sqldf(attach myTestdbT as new) f1-file(irisNoH.txt) attr(f1, file.format) - list(header=FALSE,sep=\t) # read into table called irisTab in the mytestdb sqlite database read.csv.sql(irisNoH.txt, sql = create table main.irisTab1 as select * from file, dbname = mytestdb) res1-sqldf(select * from main.irisTab1 limit 10, dbname = mytestdb) write.table(res1, iris10.txt, sep = \t, quote = FALSE, col.names=FALSE,row.names = FALSE) # For querying records of a particular species - unresolved problems #a1-virginica #attr(f1, names) - c(A1,A2,A3,A4,A5) #res2-fn$sqldf(select * from main.irisTab1 where A5 = '$a1') In the above, I am not able to: (1) assign the names to various columns (2) query for particular value of a column; in this case for particular species, say virginica (3) I guess fn$sqldf can do the job but it requires assigning column names Any help would be most appreciated. Thanks HC -- View this message in context: http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4350555.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf + Date class. Ordering and summary statistics appear to be incorrect.
I've been using sqldf heavily lately but have encountered problems with ordering of observations or calculating statistics such as max() and min() when the variable used is of class Date. For example, if I run the following code: === begin code = library(sqldf) A-data.frame(Dates=as.Date(c(1994-02-14,1977-02-23,2001-09-18,2009-08-01)),Ret=rnorm(4)) OrderedA-sqldf('select * from A order by Dates') MaxA-sqldf('select max(Dates) as Dates from A')[1,1] MinA-sqldf('select min(Dates) as Dates from A')[1,1] === end code = Then the result is this: A DatesRet 1 1994-02-14 1.2414706 2 1977-02-23 -0.7728146 3 2001-09-18 1.2551331 4 2009-08-01 -0.2538359 OrderedA DatesRet 1 2001-09-18 1.2551331 2 2009-08-01 -0.2538359 3 1977-02-23 -0.7728146 4 1994-02-14 1.2414706 MaxA [1] 1994-02-14 MinA [1] 2001-09-18 Completely wrong order, no warnings issued, and the summary stats are wrong as well (but consistent with the ordering). According to the sqldf manual found at the following URL http://code.google.com/p/sqldf/#4._How_does_sqldf_work_with_%22Date%22_class_variables? this type of query should work correctly. Any clue why it is not doing so? User error or bug? === debug info = sessionInfo() R version 2.13.1 (2011-07-08) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] tcltk stats graphics grDevices utils datasets methods [8] base other attached packages: [1] sqldf_0.4-2 chron_2.3-42 gsubfn_0.5-7 [4] proto_0.3-9.2 RSQLite.extfuns_0.0.1 RSQLite_0.10.0 [7] DBI_0.2-5 end debug info = __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf + Date class. Ordering and summary statistics appear to be incorrect.
On Sun, Jan 22, 2012 at 10:59 PM, Grant Farnsworth gvfa...@gmail.com wrote: I've been using sqldf heavily lately but have encountered problems with ordering of observations or calculating statistics such as max() and min() when the variable used is of class Date. For example, if I run the following code: === begin code = library(sqldf) A-data.frame(Dates=as.Date(c(1994-02-14,1977-02-23,2001-09-18,2009-08-01)),Ret=rnorm(4)) OrderedA-sqldf('select * from A order by Dates') MaxA-sqldf('select max(Dates) as Dates from A')[1,1] MinA-sqldf('select min(Dates) as Dates from A')[1,1] === end code = Then the result is this: A Dates Ret 1 1994-02-14 1.2414706 2 1977-02-23 -0.7728146 3 2001-09-18 1.2551331 4 2009-08-01 -0.2538359 OrderedA Dates Ret 1 2001-09-18 1.2551331 2 2009-08-01 -0.2538359 3 1977-02-23 -0.7728146 4 1994-02-14 1.2414706 MaxA [1] 1994-02-14 MinA [1] 2001-09-18 Completely wrong order, no warnings issued, and the summary stats are wrong as well (but consistent with the ordering). According to the sqldf manual found at the following URL http://code.google.com/p/sqldf/#4._How_does_sqldf_work_with_%22Date%22_class_variables? this type of query should work correctly. Any clue why it is not doing so? User error or bug? You are using an old version. Update to latest version of R and sqldf. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf + Date class. Ordering and summary statistics appear to be incorrect.
On Mon, Jan 23, 2012 at 12:46 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote: On Sun, Jan 22, 2012 at 10:59 PM, Grant Farnsworth gvfa...@gmail.com wrote: I've been using sqldf heavily lately but have encountered problems with ordering of observations or calculating statistics such as max() and min() when the variable used is of class Date. For example, if I run the following code: === begin code = library(sqldf) A-data.frame(Dates=as.Date(c(1994-02-14,1977-02-23,2001-09-18,2009-08-01)),Ret=rnorm(4)) OrderedA-sqldf('select * from A order by Dates') MaxA-sqldf('select max(Dates) as Dates from A')[1,1] MinA-sqldf('select min(Dates) as Dates from A')[1,1] === end code = Then the result is this: A Dates Ret 1 1994-02-14 1.2414706 2 1977-02-23 -0.7728146 3 2001-09-18 1.2551331 4 2009-08-01 -0.2538359 OrderedA Dates Ret 1 2001-09-18 1.2551331 2 2009-08-01 -0.2538359 3 1977-02-23 -0.7728146 4 1994-02-14 1.2414706 MaxA [1] 1994-02-14 MinA [1] 2001-09-18 Completely wrong order, no warnings issued, and the summary stats are wrong as well (but consistent with the ordering). According to the sqldf manual found at the following URL http://code.google.com/p/sqldf/#4._How_does_sqldf_work_with_%22Date%22_class_variables? this type of query should work correctly. Any clue why it is not doing so? User error or bug? You are using an old version. Update to latest version of R and sqldf. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com Thanks, that worked. Known bug, I guess. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf and not converting integers to floating point in SQLite
Hi, I have following 2 tables: Table 1: POSTAL | VALUE 1000|49 1010|100 1020|50 Table 2: INSEE | POSTAL A|1000 B|1000 C|1010 D|1020 I would like to convert this to the following: INSEE | VALUE_SPREAD A|24.5 B|24.5 C|100 D|50 I can achieve this with a nested SQL query (through counting the number of POSTAL that belong to any given INSEE, and diving the value of the postal in that INSEE by that number). library(sqldf) table1 - read.csv(c:/R/table1.csv, sep=;) table2 - read.csv(c:/R/table2.csv, sep=;) table3 - sqldf(select table2.INSEE, table1.VALUE / counts.nPostals as value_spread from table1, table2,(select POSTAL, count(INSEE) as nPostals from table2 group by POSTAL) counts where table1.POSTAL = counts.POSTAL and table1.POSTAL=table2.POSTAL) Unfortunately, the value I'm working with is an integer. In SQLite, this results in the computed value also not being a float - so it gets rounded up or down. In this case, I'm getting 24 for A B instead of 24.5. Is there a way to take care of this using other R concepts, avoiding that problem (for instance using melt cast)? Thanks, Frederik __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf and not converting integers to floating point in SQLite
try this: library(sqldf) table1 - read.csv(text = POSTAL | VALUE + 1000|49 + 1010|100 + 1020|50, sep=|) table2 - read.csv(text = INSEE | POSTAL + A|1000 + B|1000 + C|1010 + D|1020, sep=|) table3 - sqldf( + select table2.INSEE + , 1.0 * table1.VALUE / counts.nPostals as value_spread + from table1 + , table2 + ,(select POSTAL + , count(INSEE) as nPostals + from table2 + group by POSTAL) counts + where table1.POSTAL = counts.POSTAL + and table1.POSTAL=table2.POSTAL + ) table3 INSEE value_spread 1 A 24.5 2 B 24.5 3 C100.0 4 D 50.0 On Tue, Jan 3, 2012 at 3:13 PM, Frederik Vanrenterghem frede...@vanrenterghem.biz wrote: Hi, I have following 2 tables: Table 1: POSTAL | VALUE 1000|49 1010|100 1020|50 Table 2: INSEE | POSTAL A|1000 B|1000 C|1010 D|1020 I would like to convert this to the following: INSEE | VALUE_SPREAD A|24.5 B|24.5 C|100 D|50 I can achieve this with a nested SQL query (through counting the number of POSTAL that belong to any given INSEE, and diving the value of the postal in that INSEE by that number). library(sqldf) table1 - read.csv(c:/R/table1.csv, sep=;) table2 - read.csv(c:/R/table2.csv, sep=;) table3 - sqldf(select table2.INSEE, table1.VALUE / counts.nPostals as value_spread from table1, table2,(select POSTAL, count(INSEE) as nPostals from table2 group by POSTAL) counts where table1.POSTAL = counts.POSTAL and table1.POSTAL=table2.POSTAL) Unfortunately, the value I'm working with is an integer. In SQLite, this results in the computed value also not being a float - so it gets rounded up or down. In this case, I'm getting 24 for A B instead of 24.5. Is there a way to take care of this using other R concepts, avoiding that problem (for instance using melt cast)? Thanks, Frederik __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf if iif
Dear all, I have problems with iif function using sqldf library. I counted abundance (Num) of different SPECIES in two moments (esf) saving the information in two Tables (esf50, esf100): esf50 SAMPLE SPECIES Num esf 1289diso1 44 50 1289diso2 5 50 1289diso3 1 50 diso1 44 50 diso2 5 50 diso3 1 50 esf100 SAMPLE SPECIES Num esf 1289diso1 82 100 1289diso2 13 100 1289diso3 2 100 1289diso4 3 100 diso1 82 100 diso2 13 100 diso3 2 100 diso4 3 100 I would like subtract column Num between the two moments considering only the changes, therefore I use the conditional if: var100-sqldf(select esf100.SAMPLE, esf100.SPECIES, esf100.Num, esf100.esf, iif esf100.Num - esf50.Num =0, esf100.Num-esf50.Num, esf100.Num as PIPAS from esf100 left join esf50 on esf100.SAMPLE = esf50.SAMPLE and esf100.SPECIES = esf50.SPECIES) I think the structure is right because the SQL query run ok in Access. Is the if syntax the problems? Thank in advanced. Best wishes, Carlos Rivera [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf if iif
sqldf uses the SQLite database by default for backend processing. The iif function is specific to the Jet database engine syntax (which underlies MS Access). You could read up on SQLite syntax, or you could avoid using nonstandard SQL syntax, retrieve the data into a data frame, and use R code to do your logical merging into one column. --- Jeff NewmillerThe . . Go Live... DCN:jdnew...@dcn.davis.ca.usBasics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/BatteriesO.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --- Sent from my phone. Please excuse my brevity. Carlos Rivera limnoriv...@gmail.com wrote: Dear all, I have problems with iif function using sqldf library. I counted abundance (Num) of different SPECIES in two moments (esf) saving the information in two Tables (esf50, esf100): esf50 SAMPLE SPECIES Num esf 1289diso1 44 50 1289diso2 5 50 1289diso3 1 50 diso1 44 50 diso2 5 50 diso3 1 50 esf100 SAMPLE SPECIES Num esf 1289diso1 82 100 1289diso2 13 100 1289diso3 2 100 1289diso4 3 100 diso1 82 100 diso2 13 100 diso3 2 100 diso4 3 100 I would like subtract column Num between the two moments considering only the changes, therefore I use the conditional if: var100-sqldf(select esf100.SAMPLE, esf100.SPECIES, esf100.Num, esf100.esf, iif esf100.Num - esf50.Num =0, esf100.Num-esf50.Num, esf100.Num as PIPAS from esf100 left join esf50 on esf100.SAMPLE = esf50.SAMPLE and esf100.SPECIES = esf50.SPECIES) I think the structure is right because the SQL query run ok in Access. Is the if syntax the problems? Thank in advanced. Best wishes, Carlos Rivera [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf syntax, selecting rows, and skipping
I am using the example in this post: https://stat.ethz.ch/pipermail/r-help/2010-October/257204.html # create a file write.table(iris,iris.csv,row.names=FALSE,sep=,,quote=FALSE) # this does not work # has the syntax changed or is there a mistake in my usage? # the line from the post above is: # read.csv.sql(myfile.csv, sql = select * from file 2000, 1000) library(sqldf) read.csv.sql(iris.csv, sql = select * from file 5, 5) # this works # but i would like to keep the header read.csv.sql(iris.csv, sql = select * from file limit 5,skip=5,header=FALSE) # thanks sessionInfo() R version 2.13.1 (2011-07-08) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] tcltk stats graphics grDevices utils datasets methods base other attached packages: [1] sqldf_0.4-2 chron_2.3-42 gsubfn_0.5-7 proto_0.3-9.2 RSQLite.extfuns_0.0.1 RSQLite_0.9-4 DBI_0.2-5 myfunctions_1.0 loaded via a namespace (and not attached): [1] tools_2.13.1 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf syntax, selecting rows, and skipping
On Sep 29, 2011, at 10:06 AM, Juliet Hannah wrote: I am using the example in this post: https://stat.ethz.ch/pipermail/r-help/2010-October/257204.html # create a file write.table(iris,iris.csv,row.names=FALSE,sep=,,quote=FALSE) # this does not work # has the syntax changed or is there a mistake in my usage? # the line from the post above is: # read.csv.sql(myfile.csv, sql = select * from file 2000, 1000) You didn't read to the end of that thread. The two errors above were corrected. library(sqldf) read.csv.sql(iris.csv, sql = select * from file 5, 5) # this works # but i would like to keep the header read.csv.sql(iris.csv, sql = select * from file limit 5,skip=5,header=FALSE) # thanks sessionInfo() R version 2.13.1 (2011-07-08) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] tcltk stats graphics grDevices utils datasets methods base other attached packages: [1] sqldf_0.4-2 chron_2.3-42 gsubfn_0.5-7 proto_0.3-9.2 RSQLite.extfuns_0.0.1 RSQLite_0.9-4 DBI_0.2-5 myfunctions_1.0 loaded via a namespace (and not attached): [1] tools_2.13.1 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] SQldf with sqlite and H2
SQldf with sqlite and H2 I have a large csv file (about 2GB) and wanted to import the file into R and do some filtering and analysis. Came across sqldf ( a great idea and product) and was trying to play around to see what would be the best method of doing this. csv file is comma delimited with some columns having comma inside the quoation like this John, Doe. I tried this first ### library(sqldf) sqldf(attach testdb as new) In.File - C:/JP/Temp/2008.csv read.csv.sql(In.File, sql = create table table1 as select * from file, dbname = testdb) It errored out with message NULL Warning message: closing unused connection 3 (C:/JP/Temp/2008.csv) When this failed, I converted this file from comma delimited to tab delimited and used this command # read.csv.sql(In.File, sql = create table table1 as select * from file, dbname = testdb, sep = \t) and this worked, it created testdb sqlite file with the size of 3GB now my question is in 3 parts. 1. Is it possible to create a dataframe with appropriate column classes and use that column classes when I use the read.csv.sql command to create the table. Something like may be create the table from that DF and then update with read.csv.sql.? Any example code will be really helpful. 2. If we use the H2 database instead of default sqlite and use the readcsv option, will that be faster and is there a way we can specify the above thought of applying a DF class to table column properties and update with CSVREAD library(RH2) something like SELECT * FROM CSVREAD('C:/JP/Temp/2008.csv') Any example code will be really helpful. 3. How do we specify where the H2 file is saved. Saw something like this, when I ran this example from RH2 package, couldn't find the file in the working directory. con - dbConnect(H2(), jdbc:h2:~/test, sa, ) Sorry for the long mail. Appreciate all for building a great community and for the wonderful software in R. Thanks for Gabor Grothendieck for bring sqldf to this great community. Any help or direction you can provide in this is highly appreciated. Thanks all. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SQldf with sqlite and H2
On Thu, Jul 14, 2011 at 10:33 AM, Mandans mandan...@yahoo.com wrote: SQldf with sqlite and H2 I have a large csv file (about 2GB) and wanted to import the file into R and do some filtering and analysis. Came across sqldf ( a great idea and product) and was trying to play around to see what would be the best method of doing this. csv file is comma delimited with some columns having comma inside the quoation like this John, Doe. I tried this first ### library(sqldf) sqldf(attach testdb as new) In.File - C:/JP/Temp/2008.csv read.csv.sql(In.File, sql = create table table1 as select * from file, dbname = testdb) It errored out with message NULL Warning message: closing unused connection 3 (C:/JP/Temp/2008.csv) When this failed, I converted this file from comma delimited to tab delimited and used this command # read.csv.sql(In.File, sql = create table table1 as select * from file, dbname = testdb, sep = \t) and this worked, it created testdb sqlite file with the size of 3GB now my question is in 3 parts. 1. Is it possible to create a dataframe with appropriate column classes and use that column classes when I use the read.csv.sql command to create the table. Something like may be create the table from that DF and then update with read.csv.sql.? Any example code will be really helpful. Here is an example of using method = name__class. Note there are two underscores in a row. It appears I neglected to document that Date2 means convert from character representation whereas Date means convert from numeric representation. It would also be possible to use method = raw and then coerce the columns yourself afterwards. # create test file Lines - 'A__Date2|B 2000-01-01|x,y 2000-01-02|c,d ' tf - tempfile() cat(Lines, file = tf) library(sqldf) DF - read.csv.sql(tf, sep = |, method = name__class) str(DF) 2. If we use the H2 database instead of default sqlite and use the readcsv option, will that be faster and is there a way we can specify the above thought of applying a DF class to table column properties and update with CSVREAD library(RH2) something like SELECT * FROM CSVREAD('C:/JP/Temp/2008.csv') Any example code will be really helpful. Sorry, I haven't tested the speed of this. postgresql and mysql, both supported by sqldf, also have builtin methods to read files. If I had to guess I would guess that mysql would be fastest but this would have to be tested. 3. How do we specify where the H2 file is saved. Saw something like this, when I ran this example from RH2 package, couldn't find the file in the working directory. con - dbConnect(H2(), jdbc:h2:~/test, sa, ) ~ means your home directory so ~/test means test is in the home directory. Try normalizePath(~) normalizePath(~/test) etc. to see what they refer to. Regards. Sorry for the long mail. Appreciate all for building a great community and for the wonderful software in R. Thanks for Gabor Grothendieck for bring sqldf to this great community. Any help or direction you can provide in this is highly appreciated. Thanks all. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SQldf with sqlite and H2
Thanks a lot Gabor. It helped a lot. Appreciate your time and effort. Thanks --- On Thu, 7/14/11, Gabor Grothendieck ggrothendi...@gmail.com wrote: From: Gabor Grothendieck ggrothendi...@gmail.com Subject: Re: [R] SQldf with sqlite and H2 To: Mandans mandan...@yahoo.com Cc: r-help@r-project.org Date: Thursday, July 14, 2011, 2:22 PM On Thu, Jul 14, 2011 at 10:33 AM, Mandans mandan...@yahoo.com wrote: SQldf with sqlite and H2 I have a large csv file (about 2GB) and wanted to import the file into R and do some filtering and analysis. Came across sqldf ( a great idea and product) and was trying to play around to see what would be the best method of doing this. csv file is comma delimited with some columns having comma inside the quoation like this John, Doe. I tried this first ### library(sqldf) sqldf(attach testdb as new) In.File - C:/JP/Temp/2008.csv read.csv.sql(In.File, sql = create table table1 as select * from file, dbname = testdb) It errored out with message NULL Warning message: closing unused connection 3 (C:/JP/Temp/2008.csv) When this failed, I converted this file from comma delimited to tab delimited and used this command # read.csv.sql(In.File, sql = create table table1 as select * from file, dbname = testdb, sep = \t) and this worked, it created testdb sqlite file with the size of 3GB now my question is in 3 parts. 1. Is it possible to create a dataframe with appropriate column classes and use that column classes when I use the read.csv.sql command to create the table. Something like may be create the table from that DF and then update with read.csv.sql.? Any example code will be really helpful. Here is an example of using method = name__class. Note there are two underscores in a row. It appears I neglected to document that Date2 means convert from character representation whereas Date means convert from numeric representation. It would also be possible to use method = raw and then coerce the columns yourself afterwards. # create test file Lines - 'A__Date2|B 2000-01-01|x,y 2000-01-02|c,d ' tf - tempfile() cat(Lines, file = tf) library(sqldf) DF - read.csv.sql(tf, sep = |, method = name__class) str(DF) 2. If we use the H2 database instead of default sqlite and use the readcsv option, will that be faster and is there a way we can specify the above thought of applying a DF class to table column properties and update with CSVREAD library(RH2) something like SELECT * FROM CSVREAD('C:/JP/Temp/2008.csv') Any example code will be really helpful. Sorry, I haven't tested the speed of this. postgresql and mysql, both supported by sqldf, also have builtin methods to read files. If I had to guess I would guess that mysql would be fastest but this would have to be tested. 3. How do we specify where the H2 file is saved. Saw something like this, when I ran this example from RH2 package, couldn't find the file in the working directory. con - dbConnect(H2(), jdbc:h2:~/test, sa, ) ~ means your home directory so ~/test means test is in the home directory. Try normalizePath(~) normalizePath(~/test) etc. to see what they refer to. Regards. Sorry for the long mail. Appreciate all for building a great community and for the wonderful software in R. Thanks for Gabor Grothendieck for bring sqldf to this great community. Any help or direction you can provide in this is highly appreciated. Thanks all. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Sqldf INSERT INTO
Hi, I am new to R and trying to migrate from SAS. I am trying to copy data from one table to another table which have same columns using sqldf. but not working and showing NULL I wrote statement as sqldf(INSERT INTO new select * from data) but showing NULL Please help me in this regard. Thank you -- View this message in context: http://r.789695.n4.nabble.com/Sqldf-INSERT-INTO-tp3463533p3463533.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Sqldf INSERT INTO
On Wed, Apr 20, 2011 at 12:39 PM, new2R bv_agr...@yahoo.co.in wrote: Hi, I am new to R and trying to migrate from SAS. I am trying to copy data from one table to another table which have same columns using sqldf. but not working and showing NULL I wrote statement as sqldf(INSERT INTO new select * from data) but showing NULL Please help me in this regard. In your example new is a table in the sqlite database, not in R's workspace, so you have to return it: library(sqldf) BOD Time demand 118.3 22 10.3 33 19.0 44 16.0 55 15.6 67 19.8 New - BOD[1, ] BOD1 - BOD[2:3,] sqldf(c(insert into New select * from BOD1, select * from New)) Time demand 118.3 22 10.3 33 19.0 -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SQLDF syntax
On Mon, Apr 18, 2011 at 6:34 PM, new2R bv_agr...@yahoo.co.in wrote: Hi, I am new to R and trying to migrate from SAS. I am trying to use sqldf to create a new table from existed table and change some of the columns. I have table called DataOld with columns commodity, rate and total and I am trying to create new table called DataNew with columns commodity, ratenew and totalNew. sqldf(create table datanew as select commodity, ratenew as rate * 10, totalnew as total *10 from DataOld) I got error message Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: near *: syntax error) Its expression as name, not name as expression. Try this: library(sqldf) BODnew - sqldf(select demand, Time, demand + 1 as demandPlusOne from BOD) BODnew demand Time demandPlusOne 18.31 9.3 2 10.32 11.3 3 19.03 20.0 4 16.04 17.0 5 15.65 16.6 6 19.87 20.8 For more, the sqldf home page at http://sqldf.googlecode.com has links to sqlite site where you can find sql syntax diagrams. See the links along the left side of the page there. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SQLDF syntax
Thank you very much. Its working. -- View this message in context: http://r.789695.n4.nabble.com/SQLDF-syntax-tp3458919p3460448.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] SQLDF syntax
Hi, I am new to R and trying to migrate from SAS. I am trying to use sqldf to create a new table from existed table and change some of the columns. I have table called DataOld with columns commodity, rate and total and I am trying to create new table called DataNew with columns commodity, ratenew and totalNew. sqldf(create table datanew as select commodity, ratenew as rate * 10, totalnew as total *10 from DataOld) I got error message Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: near *: syntax error) Please help me in this regard. Thank you -- View this message in context: http://r.789695.n4.nabble.com/SQLDF-syntax-tp3458919p3458919.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] SQLDF - Submitting Queries with R Objects as Columns
Fellow R programmers, I'd like to submit SQLDF statements with R objects as column names. For example, I want to assign X to var1 (var1-X) and then refer to var1 in the SQLDF statement. SQLDF needs to understand that when I reference var1, it should look for X in the dataframe. This is necessary because my SQLDF is part of a larger function that I call that repeatedly with different column names. Code below... thank you in advance! Mike library(sqldf) testdf-data.frame(c(1,2,3,4,5,6,7,8,9,10),c(1,1,1,2,2,2,3,3,3,3)) names(testdf)-c(X,Y) # Works as intended sqldf(select sum(X) as XSUM, Y as Y from testdf group by Y) # Now... can I reference var1 in the code? var1-X # Unsuccessful Atteps sqldf(select sum(var1) as XSUM, Y as Y from testdf group by Y) sqldf(select sum(get(var1)) as XSUM, Y as Y from testdf group by Y) sqldf(select sum(return(var1)) as XSUM, Y as Y from testdf group by Y) -- Michael Schumacher mike.schumac...@gmail.com Manager Data Analytics, ValueClick 818-851-8638 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SQLDF - Submitting Queries with R Objects as Columns
On Wed, Mar 9, 2011 at 11:41 AM, Mike Schumacher mike.schumac...@gmail.com wrote: Fellow R programmers, I'd like to submit SQLDF statements with R objects as column names. For example, I want to assign X to var1 (var1-X) and then refer to var1 in the SQLDF statement. SQLDF needs to understand that when I reference var1, it should look for X in the dataframe. This is necessary because my SQLDF is part of a larger function that I call that repeatedly with different column names. Code below... thank you in advance! Mike library(sqldf) testdf-data.frame(c(1,2,3,4,5,6,7,8,9,10),c(1,1,1,2,2,2,3,3,3,3)) names(testdf)-c(X,Y) # Works as intended sqldf(select sum(X) as XSUM, Y as Y from testdf group by Y) # Now... can I reference var1 in the code? var1-X Here are two ways: sqldf(sprintf(select sum(%s) XSUM, Y from testdf group by Y, var1)) fn$sqldf(select sum($var1) XSUM, Y from testdf group by Y) See ?sprintf fn comes from the gsubfn package (which is automatically pulled in by sqldf) and adds quasi perl style string interpolation to the arguments passed to the function call it prefaces. See http://gsubfn.googlecode.com and ?fn -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SQLDF - Submitting Queries with R Objects as Columns
You're submitting queries for SQLDF to execute as strings. So, if you want to use a variable column name, sprintf() or paste() your statement together, like: sqldf(sprintf('select sum(%s) as XSUM, Y as Y from testdf group by Y', var1)) -- Robert Tirrell | r...@stanford.edu | (607) 437-6532 Program in Biomedical Informatics | Butte Lab | Stanford University sqldf(select sum(return(var1)) as XSUM, Y as Y from testdf group by Y) -- Michael Schumacher mike.schumac...@gmail.com Manager Data Analytics, ValueClick 818-851-8638 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf hanging on macintosh - works on windows
Marc: Installing Simon's package worked perfectly. Thanks so much! -- View this message in context: http://r.789695.n4.nabble.com/sqldf-hanging-on-macintosh-works-on-windows-tp3022193p3023736.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf error only on Unix not Windows
Hello Group, I am having trouble with the sqldf package on unix. The same code works fine on windows. Silly Example script: # Load the package library(sqldf) # Use the titanic data set data(women) colnames(women) head(women) sqldf('select height, count(*) from women where height is not null group by weight') Unix Output and error: bash-3.00$ R --vanilla testR.R R version 2.10.1 (2009-12-14) Copyright (C) 2009 The R Foundation for Statistical Computing ISBN 3-900051-07-0 R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. # Load the package library(sqldf) Loading required package: DBI Loading required package: RSQLite Loading required package: RSQLite.extfuns Loading required package: gsubfn Loading required package: proto Loading required package: tcltk Loading Tcl/Tk interface ... done Loading required package: chron # Use the titanic data set data(women) colnames(women) [1] height weight head(women) height weight 1 58115 2 59117 3 60120 4 61123 5 62126 6 63129 sqldf('select height, count(*) from women where height is not null group by weight') Error in sqliteFetch(rs, n = -1, ...) : RSQLite driver: (RS_SQLite_fetch: failed first step: SQL logic error or missing database) Calls: sqldf ... dbGetQuery - sqliteQuickSQL - sqliteFetch - .Call Error in !dbPreExists : invalid argument type Calls: sqldf ... dbGetQuery - sqliteQuickSQL - sqliteFetch - .Call Execution halted Warning message: RS-DBI driver warning: (closing pending result sets before closing this connection) Box: Solaris: SunOS icsun7 5.10 Generic_118822-25 sun4u sparc SUNW,Sun-Fire-280R I am unable to find any hits on the generated errors which makes me think either something is really wrong (resource issues or something) or this is not a common use of sqldf?? Is there a better package I should be using? Thank you very much for your help, Alex Bryant Software Development Integrated Clinical Systems, Inc. 908-996-7208 Confidentiality Note: This e-mail, and any attachment to...{{dropped:13}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf error only on Unix not Windows
On Mon, Nov 1, 2010 at 9:28 AM, Alex Bryant abry...@i-review.com wrote: Hello Group, I am having trouble with the sqldf package on unix. The same code works fine on windows. Silly Example script: # Load the package library(sqldf) # Use the titanic data set data(women) colnames(women) head(women) sqldf('select height, count(*) from women where height is not null group by weight') Some things to try: - try adding dbname = tempfile() argument to your sqldf statement and see if that makes any difference - try it with the H2 database rather than sqlite (or with PostgreSQL) To use it with H2 make sure you have Java and the CRAN package, RH2, installed. RH2 includes the H2 database itself so you don't need to install that. Then issue this line in R any time before your first sqldf call library(RH2) sqldf will notice it and automatically use the H2 database instead of sqlite - try it with R 2.11 -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf hanging on macintosh - works on windows
Have a long script that runs fine on windows (32 bit). When I try to run in on two different macs (64 bit), however, it hangs with identical behavior. I start with: library(sqldf) This results in messages: Loading required package: DBI Loading required package: RSQLite Loading required package: RSQLite.extfuns Loading required package: gsubfn Loading required package: proto Loading required package: chron I then read some data, etc. I execute the following: #merge raw data and all possible combinations df.final - sqldf('select Date, Hour, x as RoomsInUse from df.possible.combos left join df.aggregate using (Hour, Date)') I receive the messages: Loading required package: tcltk Loading Tcl/Tk interface ... + Then I get into some kind of loop. Message at bottom ribbon says: executing: try(gsub('\\s+','',paste(capture.output(print(arg(summary))),collapse=)),silent=TRUE) On the pc implementation it runs flawlessly, and quickly. Truly appreciate any ideas. -- View this message in context: http://r.789695.n4.nabble.com/sqldf-hanging-on-macintosh-works-on-windows-tp3022193p3022193.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf hanging on macintosh - works on windows
On Mon, Nov 1, 2010 at 9:59 AM, GL pfl...@shands.ufl.edu wrote: Have a long script that runs fine on windows (32 bit). When I try to run in on two different macs (64 bit), however, it hangs with identical behavior. I start with: library(sqldf) This results in messages: Loading required package: DBI Loading required package: RSQLite Loading required package: RSQLite.extfuns Loading required package: gsubfn Loading required package: proto Loading required package: chron I then read some data, etc. I execute the following: #merge raw data and all possible combinations df.final - sqldf('select Date, Hour, x as RoomsInUse from df.possible.combos left join df.aggregate using (Hour, Date)') I receive the messages: Loading required package: tcltk Loading Tcl/Tk interface ... + Then I get into some kind of loop. Message at bottom ribbon says: executing: try(gsub('\\s+','',paste(capture.output(print(arg(summary))),collapse=)),silent=TRUE) That is not a line that appears in the sqldf source code. Try these suggestions anyways: http://permalink.gmane.org/gmane.comp.lang.r.general/209443 -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf hanging on macintosh - works on windows
added library(RH2) Still get message: Loading required package: tcltk Loading Tcl/Tk interface + directly after sqldf statement df.final - sqldf('select Date, Hour, x as RoomsInUse from df.possible.combos + left join df.aggregate using (Hour, Date)') There is no progress spinner. If I hit enter I get a At that point I start to enter any command (just summary, for instance), I get the progress spinner, the try(gsub('\\s+','',paste(capture.output(print(arg(summary))),collapse=)),silent=TRUE) message in the bottom ribbon, and the system apparently hangs. -- View this message in context: http://r.789695.n4.nabble.com/sqldf-hanging-on-macintosh-works-on-windows-tp3022193p3022233.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf hanging on macintosh - works on windows
On Mon, Nov 1, 2010 at 10:32 AM, GL pfl...@shands.ufl.edu wrote: added library(RH2) Still get message: Loading required package: tcltk Loading Tcl/Tk interface + directly after sqldf statement df.final - sqldf('select Date, Hour, x as RoomsInUse from df.possible.combos + left join df.aggregate using (Hour, Date)') There is no progress spinner. If I hit enter I get a At that point I start to enter any command (just summary, for instance), I get the progress spinner, the try(gsub('\\s+','',paste(capture.output(print(arg(summary))),collapse=)),silent=TRUE) message in the bottom ribbon, and the system apparently hangs. I don't have a Mac but if you wish to pursue it try this: library(sqldf) debug(sqldf) sqldf(...whatever...) # now step through it by repeatedly pressing Enter and send me the console output of the session -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf hanging on macintosh - works on windows
library(sqldf) Loading required package: DBI Loading required package: RSQLite Loading required package: RSQLite.extfuns Loading required package: gsubfn Loading required package: proto Loading required package: chron debug(sqldf) df.final - sqldf('select Date, Hour, x as RoomsInUse from df.possible.combos + left join df.aggregate using (Hour, Date)') debugging in: sqldf(select Date, Hour, x as RoomsInUse from \df.possible.combos\\nleft join \df.aggregate\ using (Hour, Date)) debug: { as.POSIXct.character - function(x) structure(as.numeric(x), class = c(POSIXt, POSIXct)) as.Date.character - function(x) structure(as.numeric(x), class = Date) as.Date.numeric - function(x, origin = 1970-01-01, ...) base::as.Date.numeric(x, origin = origin, ...) as.dates.character - function(x) structure(as.numeric(x), class = c(dates, times)) as.times.character - function(x) structure(as.numeric(x), class = times) overwrite - FALSE request.open - missing(x) is.null(connection) request.close - missing(x) !is.null(connection) request.con - !missing(x) !is.null(connection) request.nocon - !missing(x) is.null(connection) dfnames - fileobjs - character(0) if (request.close || request.nocon) { on.exit({ dbPreExists - attr(connection, dbPreExists) dbname - attr(connection, dbname) if (!missing(dbname) !is.null(dbname) dbname == :memory:) { dbDisconnect(connection) } else if (!dbPreExists drv == sqlite) { dbDisconnect(connection) file.remove(dbname) } else { for (nam in dfnames) dbRemoveTable(connection, nam) for (fo in fileobjs) dbRemoveTable(connection, fo) dbDisconnect(connection) } }) if (request.close) { if (identical(connection, getOption(sqldf.connection))) options(sqldf.connection = NULL) return() } } if (request.open || request.nocon) { if (is.null(drv)) { drv - if (package:RpgSQL %in% search()) { pgSQL } else if (package:RMySQL %in% search()) { MySQL } else if (package:RH2 %in% search()) { H2 } else SQLite } drv - tolower(drv) if (drv == mysql) { m - dbDriver(MySQL) connection - if (missing(dbname) || dbname == :memory:) { dbConnect(m) } else dbConnect(m, dbname = dbname) dbPreExists - TRUE } else if (drv == pgsql) { m - dbDriver(pgSQL) if (missing(dbname) || is.null(dbname)) { dbname - getOption(RpgSQL.dbname) if (is.null(dbname)) dbname - test } connection - dbConnect(m, dbname = dbname) dbPreExists - TRUE } else if (drv == h2) { m - H2() if (missing(dbname) || is.null(dbname)) dbname - :memory: dbPreExists - dbname != :memory: file.exists(dbname) connection - if (missing(dbname) || dbname == :memory:) { dbConnect(m, jdbc:h2:mem:, sa, ) } else { jdbc.string - paste(jdbc:h2, dbname, sep = :) dbConnect(m, jdbc.string) } } else { m - dbDriver(SQLite) if (missing(dbname)) dbname - :memory: dbPreExists - dbname != :memory: file.exists(dbname) if (is.null(getOption(sqldf.dll))) { dll - Sys.which(libspatialite-1.dll) if (dll != ) options(sqldf.dll = dll) else options(sqldf.dll = FALSE) } dll - getOption(sqldf.dll) if (length(dll) != 1 || identical(dll, FALSE) || nchar(dll) == 0) { dll - FALSE } else { if (dll == basename(dll)) dll - Sys.which(dll) } options(sqldf.dll = dll) if (!identical(dll, FALSE)) { connection - dbConnect(m, dbname = dbname, loadable.extensions = TRUE) s - sprintf(select load_extension('%s'), dll) dbGetQuery(connection, s) } else connection - dbConnect(m, dbname = dbname) init_extensions(connection) } attr(connection, dbPreExists) - dbPreExists if (missing(dbname) drv == sqlite) dbname - :memory: attr(connection, dbname) - dbname if (request.open) { options(sqldf.connection = connection)
Re: [R] sqldf hanging on macintosh - works on windows
On Mon, Nov 1, 2010 at 10:55 AM, GL pfl...@shands.ufl.edu wrote: library(sqldf) Loading required package: DBI Loading required package: RSQLite Loading required package: RSQLite.extfuns Loading required package: gsubfn Loading required package: proto Loading required package: chron debug(sqldf) df.final - sqldf('select Date, Hour, x as RoomsInUse from df.possible.combos + left join df.aggregate using (Hour, Date)') debugging in: sqldf(select Date, Hour, x as RoomsInUse from ... debug: words. - words - strapply(x, [[:alnum:]._]+) Browse[2] Loading required package: tcltk Loading Tcl/Tk interface ... + There is something wrong with tcltk on your system. You can tell it not to use tcltk by setting the appropriate option as discussed in sqldf FAQ #5: http://code.google.com/p/sqldf/#5._I_get_a_message_about_tcl_being_missing. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf hanging on macintosh - works on windows
On Nov 1, 2010, at 10:55 AM, Gabor Grothendieck wrote: On Mon, Nov 1, 2010 at 10:55 AM, GL pfl...@shands.ufl.edu wrote: library(sqldf) Loading required package: DBI Loading required package: RSQLite Loading required package: RSQLite.extfuns Loading required package: gsubfn Loading required package: proto Loading required package: chron debug(sqldf) df.final - sqldf('select Date, Hour, x as RoomsInUse from df.possible.combos + left join df.aggregate using (Hour, Date)') debugging in: sqldf(select Date, Hour, x as RoomsInUse from ... debug: words. - words - strapply(x, [[:alnum:]._]+) Browse[2] Loading required package: tcltk Loading Tcl/Tk interface ... + There is something wrong with tcltk on your system. You can tell it not to use tcltk by setting the appropriate option as discussed in sqldf FAQ #5: http://code.google.com/p/sqldf/#5._I_get_a_message_about_tcl_being_missing. GL, If you installed R using the OSX binary from CRAN, it does not include tcl/tk. You need to install the separate tcltk package that Simon has put together and is available from: http://cran.us.r-project.org/bin/macosx/tools/ You also need to have X11 installed, which is available from the OSX DVD in the Optional Installs section. HTH, Marc Schwartz __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf hanging on macintosh - works on windows
On Mon, Nov 1, 2010 at 12:10 PM, Marc Schwartz marc_schwa...@me.com wrote: On Nov 1, 2010, at 10:55 AM, Gabor Grothendieck wrote: On Mon, Nov 1, 2010 at 10:55 AM, GL pfl...@shands.ufl.edu wrote: library(sqldf) Loading required package: DBI Loading required package: RSQLite Loading required package: RSQLite.extfuns Loading required package: gsubfn Loading required package: proto Loading required package: chron debug(sqldf) df.final - sqldf('select Date, Hour, x as RoomsInUse from df.possible.combos + left join df.aggregate using (Hour, Date)') debugging in: sqldf(select Date, Hour, x as RoomsInUse from ... debug: words. - words - strapply(x, [[:alnum:]._]+) Browse[2] Loading required package: tcltk Loading Tcl/Tk interface ... + There is something wrong with tcltk on your system. You can tell it not to use tcltk by setting the appropriate option as discussed in sqldf FAQ #5: http://code.google.com/p/sqldf/#5._I_get_a_message_about_tcl_being_missing. GL, If you installed R using the OSX binary from CRAN, it does not include tcl/tk. You need to install the separate tcltk package that Simon has put together and is available from: http://cran.us.r-project.org/bin/macosx/tools/ You also need to have X11 installed, which is available from the OSX DVD in the Optional Installs section. HTH, Marc Schwartz Note that sqldf can work without tcltk, as well. The gsubfn package does check for tcltk and and sets the engine to R rather than tcltk if capabilities()[[tcltk]] is FALSE. There may be a bug in R or a problem with the installation. -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf syntax
I had checked those references before posting, actually. SQLite has a very limited implementation of the standard. To do a single table update I would not go to sql. It's easy enough to do in R. The problem is when I need to do an update from a left outer join, which I had to do with sqlSave (to a mySQL table), then sqlQuery, then sqlFetch. sqlSave is amazingly slow, takes half an hour. (Would never do that at home :-) just too lazy to write a formal table def and use load data infile from a csv dump. Also not happy with Dates becoming years in the transition. Will check the other suggestion about data.table and report. Cheers everybody. Stephen B -Original Message- From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] Sent: Thursday, August 26, 2010 4:26 PM To: Bond, Stephen Cc: r-help@r-project.org Subject: Re: [R] sqldf syntax On Thu, Aug 26, 2010 at 2:31 PM, Bond, Stephen stephen.b...@cibc.com wrote: Please correct the following sqldf(update esc left join forwagg on esc.ym=forwagg.Date set esc.ri2=forwagg.N1 where esc.age=12,select * from main.esc) Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: near left: syntax error) 1. sqldf takes one sql argument whereas the above has two sql arguments; however, the one argument may be a vector of sql commands. See ?sqldf and the examples on the sqldf home page http://sqldf.googlecode.com 2. there is an error in the syntax of your update statement. For correct syntax see the sqlite site: http://sqlite.org/lang_update.html -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf syntax
Please correct the following sqldf(update esc left join forwagg on esc.ym=forwagg.Date set esc.ri2=forwagg.N1 where esc.age=12,select * from main.esc) Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: near left: syntax error) Thanks. Stephen [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf syntax
On Thu, Aug 26, 2010 at 2:31 PM, Bond, Stephen stephen.b...@cibc.com wrote: Please correct the following sqldf(update esc left join forwagg on esc.ym=forwagg.Date set esc.ri2=forwagg.N1 where esc.age=12,select * from main.esc) Error in sqliteExecStatement(con, statement, bind.data) : RS-DBI driver: (error in statement: near left: syntax error) 1. sqldf takes one sql argument whereas the above has two sql arguments; however, the one argument may be a vector of sql commands. See ?sqldf and the examples on the sqldf home page http://sqldf.googlecode.com 2. there is an error in the syntax of your update statement. For correct syntax see the sqlite site: http://sqlite.org/lang_update.html -- Statistics Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SQLDF from Variable Matrix
On Wed, Aug 4, 2010 at 12:29 AM, Suphajak Ngamlak supha...@phatrasecurities.com wrote: Dear all, I would like to do sample statistics, e.g. mean, median from very large dataset. This is part of commands I use routinely with several dataset so I would like to make it into function. The simplified examples are Test-data.frame(A=c('a','b','c','a','b','c'),B=c(1,2,3,4,5,6)) #Create function (This one work) GetAvg-function(Input,Bygroup){ AVG-fn$sqldf(select A, avg(B) as Average, median(B) as Median from Test group by $Bygroup) return(AVG) } Result-GetAvg(Test,'A') #Create function (This one does not work) GetAvg-function(Input,Bygroup){ AVG-fn$sqldf(select A, avg(B) as Average, median(B) as Median from $Input group by $Bygroup) return(AVG) } Result-GetAvg(Test,'A') That should be GetAvg('Test', 'A') with quotes around Test or if you want to be able to specify the data unquoted then insert this line at the top of your function: Input - deparse(substitute(Input)) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] SQLDF from Variable Matrix
Dear all, I would like to do sample statistics, e.g. mean, median from very large dataset. This is part of commands I use routinely with several dataset so I would like to make it into function. The simplified examples are Test-data.frame(A=c('a','b','c','a','b','c'),B=c(1,2,3,4,5,6)) #Create function (This one work) GetAvg-function(Input,Bygroup){ AVG-fn$sqldf(select A, avg(B) as Average, median(B) as Median from Test group by $Bygroup) return(AVG) } Result-GetAvg(Test,'A') #Create function (This one does not work) GetAvg-function(Input,Bygroup){ AVG-fn$sqldf(select A, avg(B) as Average, median(B) as Median from $Input group by $Bygroup) return(AVG) } Result-GetAvg(Test,'A') The only difference is with $Input. How can I refer to this variable in sqldf. Thank you Best Regards, Suphajak Ngamlak Equity and Derivatives Trading Phatra Securities Public Company Limited Tel: +662-305-9179 Email: supha...@phatrasecurities.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf 0.3-5 package or tcltk problem
On Wed, Jul 28, 2010 at 1:21 AM, erickso...@aol.com wrote: This is my first post. I am running Mac OS X version 10.6.3. I am running R 2.11.0 GUI 1.33 64 bit. This may or may not be related to sqldf, but I experienced this problem while attempting to use an sqldf query. The same code runs with no problem on my Windows machine. Here is what happens: r=sqldf(select ... ) Loading required package: tcltk Loading Tcl/Tk interface ... Then it never loads. I have X11 open. I have all the latest versions of all the necessary packages for sqldf 0.3-5: DBI 0.2-5 RSQLite 0.9-1 RSQLite.extfuns 0.0.1 gsubfn 0.5-3 proto 0.3-8 chron 2.3-35 Although it gives warning messages for these: package 'sqldf' was built under R version 2.11.1 package 'RSQLite' was built under R version 2.11.1 package 'RSQLite.extfuns' was built under R version 2.11.1 package 'gsubfun' was built under R version 2.11.1 What can I do to load the Tcl/Tk interface? Some things to try: - upgrade to R 2.11.1 - try this alone: library(tcltk) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf 0.3-5 package or tcltk problem
This is my first post. I am running Mac OS X version 10.6.3. I am running R 2.11.0 GUI 1.33 64 bit. This may or may not be related to sqldf, but I experienced this problem while attempting to use an sqldf query. The same code runs with no problem on my Windows machine. Here is what happens: r=sqldf(select ... ) Loading required package: tcltk Loading Tcl/Tk interface ... Then it never loads. I have X11 open. I have all the latest versions of all the necessary packages for sqldf 0.3-5: DBI 0.2-5 RSQLite 0.9-1 RSQLite.extfuns 0.0.1 gsubfn 0.5-3 proto 0.3-8 chron 2.3-35 Although it gives warning messages for these: package 'sqldf' was built under R version 2.11.1 package 'RSQLite' was built under R version 2.11.1 package 'RSQLite.extfuns' was built under R version 2.11.1 package 'gsubfun' was built under R version 2.11.1 What can I do to load the Tcl/Tk interface? [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf modify table
Hi - I am something of a newbie and am a little perplexed. When (trying to) modify a table I issue the following commands with subsequent errors sqldf(alter table Korea drop column code, dbname = mydb) error in statement: near drop: syntax error or sqldf(alter table Korea rename column hyr to hyrI, dbname = mydb) error in statement: near column: syntax error These are simple commands - am I missing something obvious? I can retrieve data from them, and retrieve their table_info Thanks Peter -- View this message in context: http://r.789695.n4.nabble.com/sqldf-modify-table-tp2291804p2291804.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf modify table
On Fri, Jul 16, 2010 at 2:46 PM, PeterTucker pthet...@gmail.com wrote: Hi - I am something of a newbie and am a little perplexed. When (trying to) modify a table I issue the following commands with subsequent errors sqldf(alter table Korea drop column code, dbname = mydb) error in statement: near drop: syntax error or sqldf(alter table Korea rename column hyr to hyrI, dbname = mydb) error in statement: near column: syntax error These are simple commands - am I missing something obvious? I can retrieve data from them, and retrieve their table_info SQLite does not support dropping columns. See: http://www.sqlite.org/lang_altertable.html however, sqldf does support the H2 and PostgreSQL databases in addition to sqlite so you can try one of those if this feature is important to you. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf: issues with natural joins
Hello, I'm having trouble discovering what's going wrong with my use of natural joins via sqldf. Following the instructions under 4i at http://code.google.com/p/sqldf/, which discusses creating indices to speed joins, I have been only unreliably able to get natural joins to work. For example, Tid - c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES 01-01-05 10:58:00', 'AES 01-01-11 12:30:00') A - data.frame(Tid, dfName = 'a') B - data.frame(Tid = Tid[2:4], dfName = 'b') C - data.frame(Tid = Tid[1:3], dfName = 'c') # then use the sqldf library library(sqldf) sqldf() # to create indices on the Tid variable shared across data.frames sqldf('create index indA on A(Tid)') sqldf('create index indB on B(Tid)') sqldf('create index indC on C(Tid)') # check to make sure everything is there sqldf('select * from sqlite_master') # doing a natural join (implicitly on Tid) # does not give the expected joins sqldf('select * from main.A natural join main.B') [1] TiddfName 0 rows (or 0-length row.names) sqldf('select * from main.A natural join main.C') [1] TiddfName 0 rows (or 0-length row.names) sqldf('select * from main.B natural join main.C') [1] TiddfName 0 rows (or 0-length row.names) # even using a where clause (which doesn't have the efficiency qualities I need the indexed natural joins for) is problematic, setting values of the dfName variable incorrectly for the data from C sqldf('select * from main.B b, main.C c where b.Tid = c.Tid') Tid dfName Tid dfName 1 AES 01-01-02 11:53:00 b AES 01-01-02 11:53:00 b 2 AES 01-01-05 10:58:00 b AES 01-01-05 10:58:00 b I'm grateful for your guidance on what I'm doing wrong with the natural join in sqldf. many thanks, Nick [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf: issues with natural joins
There are two problems: 1. A natural join will join all columns with the same names in the two tables and that includes not only Tid but also dfName and since there are no rows that have the same Tid and dfName the result has zero rows. 2. the heuristic it uses fails when you retrieve the same column name from multiple tables so use method = raw to turn off the heuristic. The heuristic will be improved to cover this case in the future. Read FAQ #1 on the home page: http://code.google.com/p/sqldf/#1._How_does_sqldf_handle_classes_and_factors? This should work: sqldf('select * from main.A join main.B using(Tid)', method = raw) Tid dfName dfName 1 AES 01-01-02 11:53:00 a b 2 AES 01-01-05\n10:58:00 a b 3 AES 01-01-11 12:30:00 a b This works too as the double dfName no longer exists to confuse the heuristic: names(B)[2] - dfNameB sqldf('select * from main.A join main.B using(Tid)') On Thu, May 20, 2010 at 12:04 PM, Nick Switanek nswita...@gmail.com wrote: Hello, I'm having trouble discovering what's going wrong with my use of natural joins via sqldf. Following the instructions under 4i at http://code.google.com/p/sqldf/, which discusses creating indices to speed joins, I have been only unreliably able to get natural joins to work. For example, Tid - c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES 01-01-05 10:58:00', 'AES 01-01-11 12:30:00') A - data.frame(Tid, dfName = 'a') B - data.frame(Tid = Tid[2:4], dfName = 'b') C - data.frame(Tid = Tid[1:3], dfName = 'c') # then use the sqldf library library(sqldf) sqldf() # to create indices on the Tid variable shared across data.frames sqldf('create index indA on A(Tid)') sqldf('create index indB on B(Tid)') sqldf('create index indC on C(Tid)') # check to make sure everything is there sqldf('select * from sqlite_master') # doing a natural join (implicitly on Tid) # does not give the expected joins sqldf('select * from main.A natural join main.B') [1] Tid dfName 0 rows (or 0-length row.names) sqldf('select * from main.A natural join main.C') [1] Tid dfName 0 rows (or 0-length row.names) sqldf('select * from main.B natural join main.C') [1] Tid dfName 0 rows (or 0-length row.names) # even using a where clause (which doesn't have the efficiency qualities I need the indexed natural joins for) is problematic, setting values of the dfName variable incorrectly for the data from C sqldf('select * from main.B b, main.C c where b.Tid = c.Tid') Tid dfName Tid dfName 1 AES 01-01-02 11:53:00 b AES 01-01-02 11:53:00 b 2 AES 01-01-05 10:58:00 b AES 01-01-05 10:58:00 b I'm grateful for your guidance on what I'm doing wrong with the natural join in sqldf. many thanks, Nick __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf: issues with natural joins
Although that works I had meant to write: names(B)[2] - dfNameB # ... other commands sqldf('select * from main.A natural join main.B') so that now only Tid is in common so the natural join just picks it up and also the heuristic works again since we no longer retrieve duplicate column names. On Thu, May 20, 2010 at 12:32 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote: There are two problems: 1. A natural join will join all columns with the same names in the two tables and that includes not only Tid but also dfName and since there are no rows that have the same Tid and dfName the result has zero rows. 2. the heuristic it uses fails when you retrieve the same column name from multiple tables so use method = raw to turn off the heuristic. The heuristic will be improved to cover this case in the future. Read FAQ #1 on the home page: http://code.google.com/p/sqldf/#1._How_does_sqldf_handle_classes_and_factors? This should work: sqldf('select * from main.A join main.B using(Tid)', method = raw) Tid dfName dfName 1 AES 01-01-02 11:53:00 a b 2 AES 01-01-05\n10:58:00 a b 3 AES 01-01-11 12:30:00 a b This works too as the double dfName no longer exists to confuse the heuristic: names(B)[2] - dfNameB sqldf('select * from main.A join main.B using(Tid)') On Thu, May 20, 2010 at 12:04 PM, Nick Switanek nswita...@gmail.com wrote: Hello, I'm having trouble discovering what's going wrong with my use of natural joins via sqldf. Following the instructions under 4i at http://code.google.com/p/sqldf/, which discusses creating indices to speed joins, I have been only unreliably able to get natural joins to work. For example, Tid - c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES 01-01-05 10:58:00', 'AES 01-01-11 12:30:00') A - data.frame(Tid, dfName = 'a') B - data.frame(Tid = Tid[2:4], dfName = 'b') C - data.frame(Tid = Tid[1:3], dfName = 'c') # then use the sqldf library library(sqldf) sqldf() # to create indices on the Tid variable shared across data.frames sqldf('create index indA on A(Tid)') sqldf('create index indB on B(Tid)') sqldf('create index indC on C(Tid)') # check to make sure everything is there sqldf('select * from sqlite_master') # doing a natural join (implicitly on Tid) # does not give the expected joins sqldf('select * from main.A natural join main.B') [1] Tid dfName 0 rows (or 0-length row.names) sqldf('select * from main.A natural join main.C') [1] Tid dfName 0 rows (or 0-length row.names) sqldf('select * from main.B natural join main.C') [1] Tid dfName 0 rows (or 0-length row.names) # even using a where clause (which doesn't have the efficiency qualities I need the indexed natural joins for) is problematic, setting values of the dfName variable incorrectly for the data from C sqldf('select * from main.B b, main.C c where b.Tid = c.Tid') Tid dfName Tid dfName 1 AES 01-01-02 11:53:00 b AES 01-01-02 11:53:00 b 2 AES 01-01-05 10:58:00 b AES 01-01-05 10:58:00 b I'm grateful for your guidance on what I'm doing wrong with the natural join in sqldf. many thanks, Nick __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf: issues with natural joins
Thank you very much for these clarifying responses, Gabor. I had mistakenly assumed that creating the index on Tid restricted the natural join to joining on Tid. Can you describe when and how indices speed up joins, or can you point me to resources that address this? Is it only for natural joins or any joins (including, say, a select statement with where clause)? thanks, nick On Thu, May 20, 2010 at 11:42 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote: Although that works I had meant to write: names(B)[2] - dfNameB # ... other commands sqldf('select * from main.A natural join main.B') so that now only Tid is in common so the natural join just picks it up and also the heuristic works again since we no longer retrieve duplicate column names. On Thu, May 20, 2010 at 12:32 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote: There are two problems: 1. A natural join will join all columns with the same names in the two tables and that includes not only Tid but also dfName and since there are no rows that have the same Tid and dfName the result has zero rows. 2. the heuristic it uses fails when you retrieve the same column name from multiple tables so use method = raw to turn off the heuristic. The heuristic will be improved to cover this case in the future. Read FAQ #1 on the home page: http://code.google.com/p/sqldf/#1._How_does_sqldf_handle_classes_and_factors ? This should work: sqldf('select * from main.A join main.B using(Tid)', method = raw) Tid dfName dfName 1 AES 01-01-02 11:53:00 a b 2 AES 01-01-05\n10:58:00 a b 3 AES 01-01-11 12:30:00 a b This works too as the double dfName no longer exists to confuse the heuristic: names(B)[2] - dfNameB sqldf('select * from main.A join main.B using(Tid)') On Thu, May 20, 2010 at 12:04 PM, Nick Switanek nswita...@gmail.com wrote: Hello, I'm having trouble discovering what's going wrong with my use of natural joins via sqldf. Following the instructions under 4i at http://code.google.com/p/sqldf/, which discusses creating indices to speed joins, I have been only unreliably able to get natural joins to work. For example, Tid - c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES 01-01-05 10:58:00', 'AES 01-01-11 12:30:00') A - data.frame(Tid, dfName = 'a') B - data.frame(Tid = Tid[2:4], dfName = 'b') C - data.frame(Tid = Tid[1:3], dfName = 'c') # then use the sqldf library library(sqldf) sqldf() # to create indices on the Tid variable shared across data.frames sqldf('create index indA on A(Tid)') sqldf('create index indB on B(Tid)') sqldf('create index indC on C(Tid)') # check to make sure everything is there sqldf('select * from sqlite_master') # doing a natural join (implicitly on Tid) # does not give the expected joins sqldf('select * from main.A natural join main.B') [1] TiddfName 0 rows (or 0-length row.names) sqldf('select * from main.A natural join main.C') [1] TiddfName 0 rows (or 0-length row.names) sqldf('select * from main.B natural join main.C') [1] TiddfName 0 rows (or 0-length row.names) # even using a where clause (which doesn't have the efficiency qualities I need the indexed natural joins for) is problematic, setting values of the dfName variable incorrectly for the data from C sqldf('select * from main.B b, main.C c where b.Tid = c.Tid') Tid dfName Tid dfName 1 AES 01-01-02 11:53:00 b AES 01-01-02 11:53:00 b 2 AES 01-01-05 10:58:00 b AES 01-01-05 10:58:00 b I'm grateful for your guidance on what I'm doing wrong with the natural join in sqldf. many thanks, Nick [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf: issues with natural joins
They work on any join that is able to make use of them. If you preface the select statement with explain query plan then it will give you some info, e.g. sqldf('explain query plan select * from main.A natural join main.B') order from detail 1 00 TABLE A 2 11 TABLE B WITH INDEX indB Note that its using the index on B but not the index on A so there was actually no point in adding that one if this were the only query. This is potentially a large topic so see the sqlite.org site and mailing list for more info. On Thu, May 20, 2010 at 1:28 PM, Nick Switanek nswita...@gmail.com wrote: Thank you very much for these clarifying responses, Gabor. I had mistakenly assumed that creating the index on Tid restricted the natural join to joining on Tid. Can you describe when and how indices speed up joins, or can you point me to resources that address this? Is it only for natural joins or any joins (including, say, a select statement with where clause)? thanks, nick On Thu, May 20, 2010 at 11:42 AM, Gabor Grothendieck ggrothendi...@gmail.com wrote: Although that works I had meant to write: names(B)[2] - dfNameB # ... other commands sqldf('select * from main.A natural join main.B') so that now only Tid is in common so the natural join just picks it up and also the heuristic works again since we no longer retrieve duplicate column names. On Thu, May 20, 2010 at 12:32 PM, Gabor Grothendieck ggrothendi...@gmail.com wrote: There are two problems: 1. A natural join will join all columns with the same names in the two tables and that includes not only Tid but also dfName and since there are no rows that have the same Tid and dfName the result has zero rows. 2. the heuristic it uses fails when you retrieve the same column name from multiple tables so use method = raw to turn off the heuristic. The heuristic will be improved to cover this case in the future. Read FAQ #1 on the home page: http://code.google.com/p/sqldf/#1._How_does_sqldf_handle_classes_and_factors? This should work: sqldf('select * from main.A join main.B using(Tid)', method = raw) Tid dfName dfName 1 AES 01-01-02 11:53:00 a b 2 AES 01-01-05\n10:58:00 a b 3 AES 01-01-11 12:30:00 a b This works too as the double dfName no longer exists to confuse the heuristic: names(B)[2] - dfNameB sqldf('select * from main.A join main.B using(Tid)') On Thu, May 20, 2010 at 12:04 PM, Nick Switanek nswita...@gmail.com wrote: Hello, I'm having trouble discovering what's going wrong with my use of natural joins via sqldf. Following the instructions under 4i at http://code.google.com/p/sqldf/, which discusses creating indices to speed joins, I have been only unreliably able to get natural joins to work. For example, Tid - c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES 01-01-05 10:58:00', 'AES 01-01-11 12:30:00') A - data.frame(Tid, dfName = 'a') B - data.frame(Tid = Tid[2:4], dfName = 'b') C - data.frame(Tid = Tid[1:3], dfName = 'c') # then use the sqldf library library(sqldf) sqldf() # to create indices on the Tid variable shared across data.frames sqldf('create index indA on A(Tid)') sqldf('create index indB on B(Tid)') sqldf('create index indC on C(Tid)') # check to make sure everything is there sqldf('select * from sqlite_master') # doing a natural join (implicitly on Tid) # does not give the expected joins sqldf('select * from main.A natural join main.B') [1] Tid dfName 0 rows (or 0-length row.names) sqldf('select * from main.A natural join main.C') [1] Tid dfName 0 rows (or 0-length row.names) sqldf('select * from main.B natural join main.C') [1] Tid dfName 0 rows (or 0-length row.names) # even using a where clause (which doesn't have the efficiency qualities I need the indexed natural joins for) is problematic, setting values of the dfName variable incorrectly for the data from C sqldf('select * from main.B b, main.C c where b.Tid = c.Tid') Tid dfName Tid dfName 1 AES 01-01-02 11:53:00 b AES 01-01-02 11:53:00 b 2 AES 01-01-05 10:58:00 b AES 01-01-05 10:58:00 b I'm grateful for your guidance on what I'm doing wrong with the natural join in sqldf. many thanks, Nick __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf not joining all the fields
Dear R users, I have two data frames that were read from text files as follows: x_data - read.table(x.txt, header = TRUE, sep = |, quote = \', dec = .,as.is = TRUE,na.strings = NA,colClasses = NA, nrows = 3864284, skip = 0, check.names = TRUE,fill=TRUE, strip.white = TRUE, blank.lines.skip = TRUE, comment.char = #, allowEscapes = FALSE, flush = FALSE, fileEncoding = , encoding = unknown) x_data prochi prescribed_date dataMonth item_code res_seqno quantity directions CAO713 22/06/2001 NULL842752 NULL 601/D CAO713 28/04/2000 NULL 7800 NULL 100G A/TD CAO713 10/04/2000 NULL842652 NULL 601/D CAO713 03/07/2000 NULL842652 NULL 601/D CAO713 09/01/2001 NULL842752 NULL 601/D CAO713 16/10/2001 NULL842752 NULL 601/D CAO713 16/08/2001 NULL842752 NULL 601/D CAO713 17/09/1993 NULL 39620 NULL 5MLNIL CAO713 01/05/2001 NULL842752 NULL 601/D CAO713 05/03/2001 NULL842752 NULL 601/D y_data item_codename formulation_code strength bnf_code 100 NEONACLEX KTABS NULL2.2.8 110NEONACLEX TABS 5MG2.2.1 50 MESORB DRESS 10CMX10CM 20.3.1 160 ABSORBENT CELLULOSE MESODRESS 10CMX10CM 20.3.1 161 ABSORBENT CELLULOSE MESODRESS 10CMX15CM 20.3.1 164 ABSORBENT CELLULOSE MESODRESS 20CMX25CM 20.3.1 200 SEPTRINTABS 480MG5.1.8 210 SEPTRIN PAED SFSUSP 240MG/5ML5.1.8 212SEPTRIN ADULT SUSP 480MG/5ML5.1.8 220SEPTRIN FORTE TABS 960MG5.1.8 etc contains all the information for the item codes y was read in in the same way. I then used the following code: z - sqldf(select * from x left join y using (code)) when I use this on my real data I get an output: prochi prescribed_date dataMonth item_code res_seqno quantity directions 1 CAO713 22/06/2001 NULL842752 NULL 60 1/D 2 CAO713 28/04/2000 NULL 7800 NULL 100G A/TD 3 CAO713 10/04/2000 NULL842652 NULL 60 1/D 4 CAO713 03/07/2000 NULL842652 NULL 60 1/D 5 CAO713 09/01/2001 NULL842752 NULL 60 1/D 6 CAO713 16/10/2001 NULL842752 NULL 60 1/D 7 CAO713 16/08/2001 NULL842752 NULL 60 1/D 8 CAO713 17/09/1993 NULL 39620 NULL 5ML NIL 9 CAO713 01/05/2001 NULL842752 NULL 60 1/D 10 CAO713 05/03/2001 NULL842752 NULL 60 1/D no_of_packs datasource scan_ref_no name formulation_code strength 1 NULLTSFNULL NA NA NA 2 NULLTSFNULL BETNOVATE RD OINT 0.025% 3 NULLTSFNULL NA NA NA 4 NULLTSFNULL NA NA NA 5 NULLTSFNULL NA NA NA 6 NULLTSFNULL NA NA NA 7 NULLTSFNULL NA NA NA 8 NULLTSFNULL GAMMABULIN INJ320MG 9 NULLTSFNULL NA NA NA 10NULLTSFNULL NA NA NA bnf_code 1 NA 2 13.4.1.2 3 NA 4 NA 5 NA 6 NA 7 NA 8 14.5 9 NA 10 NA There is absolutely no reason for there to be NA anywhere as the information for both the tables is complete. Not sure what the problem is? Thanks, Natalie -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590786.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
Can you show the output of dput(x_data) and dput(y_data). On Fri, Mar 12, 2010 at 11:56 AM, Newbie19_02 nvanzuy...@gmail.com wrote: Dear R users, I have two data frames that were read from text files as follows: x_data - read.table(x.txt, header = TRUE, sep = |, quote = \', dec = .,as.is = TRUE,na.strings = NA,colClasses = NA, nrows = 3864284, skip = 0, check.names = TRUE,fill=TRUE, strip.white = TRUE, blank.lines.skip = TRUE, comment.char = #, allowEscapes = FALSE, flush = FALSE, fileEncoding = , encoding = unknown) x_data prochi prescribed_date dataMonth item_code res_seqno quantity directions CAO713 22/06/2001 NULL 842752 NULL 60 1/D CAO713 28/04/2000 NULL 7800 NULL 100G A/TD CAO713 10/04/2000 NULL 842652 NULL 60 1/D CAO713 03/07/2000 NULL 842652 NULL 60 1/D CAO713 09/01/2001 NULL 842752 NULL 60 1/D CAO713 16/10/2001 NULL 842752 NULL 60 1/D CAO713 16/08/2001 NULL 842752 NULL 60 1/D CAO713 17/09/1993 NULL 39620 NULL 5ML NIL CAO713 01/05/2001 NULL 842752 NULL 60 1/D CAO713 05/03/2001 NULL 842752 NULL 60 1/D y_data item_code name formulation_code strength bnf_code 100 NEONACLEX K TABS NULL 2.2.8 110 NEONACLEX TABS 5MG 2.2.1 50 MESORB DRESS 10CMX10CM 20.3.1 160 ABSORBENT CELLULOSE MESO DRESS 10CMX10CM 20.3.1 161 ABSORBENT CELLULOSE MESO DRESS 10CMX15CM 20.3.1 164 ABSORBENT CELLULOSE MESO DRESS 20CMX25CM 20.3.1 200 SEPTRIN TABS 480MG 5.1.8 210 SEPTRIN PAED SF SUSP 240MG/5ML 5.1.8 212 SEPTRIN ADULT SUSP 480MG/5ML 5.1.8 220 SEPTRIN FORTE TABS 960MG 5.1.8 etc contains all the information for the item codes y was read in in the same way. I then used the following code: z - sqldf(select * from x left join y using (code)) when I use this on my real data I get an output: prochi prescribed_date dataMonth item_code res_seqno quantity directions 1 CAO713 22/06/2001 NULL 842752 NULL 60 1/D 2 CAO713 28/04/2000 NULL 7800 NULL 100G A/TD 3 CAO713 10/04/2000 NULL 842652 NULL 60 1/D 4 CAO713 03/07/2000 NULL 842652 NULL 60 1/D 5 CAO713 09/01/2001 NULL 842752 NULL 60 1/D 6 CAO713 16/10/2001 NULL 842752 NULL 60 1/D 7 CAO713 16/08/2001 NULL 842752 NULL 60 1/D 8 CAO713 17/09/1993 NULL 39620 NULL 5ML NIL 9 CAO713 01/05/2001 NULL 842752 NULL 60 1/D 10 CAO713 05/03/2001 NULL 842752 NULL 60 1/D no_of_packs datasource scan_ref_no name formulation_code strength 1 NULL TSF NULL NA NA NA 2 NULL TSF NULL BETNOVATE RD OINT 0.025% 3 NULL TSF NULL NA NA NA 4 NULL TSF NULL NA NA NA 5 NULL TSF NULL NA NA NA 6 NULL TSF NULL NA NA NA 7 NULL TSF NULL NA NA NA 8 NULL TSF NULL GAMMABULIN INJ 320MG 9 NULL TSF NULL NA NA NA 10 NULL TSF NULL NA NA NA bnf_code 1 NA 2 13.4.1.2 3 NA 4 NA 5 NA 6 NA 7 NA 8 14.5 9 NA 10 NA There is absolutely no reason for there to be NA anywhere as the information for both the tables is complete. Not sure what the problem is? Thanks, Natalie -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590786.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list
Re: [R] sqldf not joining all the fields
http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt feb09_267_presc_items_tsf.txt is the total file for y so if I use the command line with the total data for y then I get the output specified in z Thanks, Natalie -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590804.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
dput(x_data) structure(list(prochi = c(CAO713, CAO713, CAO713, CAO713, CAO713, CAO713, CAO713, CAO713, CAO713, CAO713), prescribed_date = c(22/06/2001, 28/04/2000, 10/04/2000, 03/07/2000, 09/01/2001, 16/10/2001, 16/08/2001, 17/09/1993, 01/05/2001, 05/03/2001), dataMonth = c(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL), item_code = c(842752, 7800, 842652, 842652, 842752, 842752, 842752, 39620, 842752, 842752), res_seqno = c(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL), quantity = c(60, 100G, 60, 60, 60, 60, 60, 5ML, 60, 60), directions = c(1/D, A/TD, 1/D, 1/D, 1/D, 1/D, 1/D, NIL, 1/D, 1/D), no_of_packs = c(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL), datasource = c(TSF, TSF, TSF, TSF, TSF, TSF, TSF, TSF, TSF, TSF), scan_ref_no = c(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL )), .Names = c(prochi, prescribed_date, dataMonth, item_code, res_seqno, quantity, directions, no_of_packs, datasource, scan_ref_no), row.names = c(NA, 10L), class = data.frame) -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590821.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
y_data - read.table(feb_267_presc_items_tsf.txt, header = TRUE, sep = |, quote = \', dec = .,as.is = TRUE,na.strings = NA,colClasses = NA, nrows = 3864284, skip = 0, check.names = TRUE,fill=TRUE, strip.white = TRUE, blank.lines.skip = TRUE, comment.char = #, allowEscapes = FALSE, flush = FALSE, fileEncoding = , encoding = unknown) Will read the file in same that I have and I have posted the dput. Sorry for not giving you what you originally wanted... Natalie -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590826.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
Please provide code that I can just copy from your post and paste into my session. Either provide dput output as requested or provide the files on the internet together with code that reads them off the internet. On Fri, Mar 12, 2010 at 12:06 PM, Newbie19_02 nvanzuy...@gmail.com wrote: http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt feb09_267_presc_items_tsf.txt is the total file for y so if I use the command line with the total data for y then I get the output specified in z Thanks, Natalie -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590804.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
What about y_data? On Fri, Mar 12, 2010 at 12:14 PM, Newbie19_02 nvanzuy...@gmail.com wrote: dput(x_data) structure(list(prochi = c(CAO713, CAO713, CAO713, CAO713, CAO713, CAO713, CAO713, CAO713, CAO713, CAO713), prescribed_date = c(22/06/2001, 28/04/2000, 10/04/2000, 03/07/2000, 09/01/2001, 16/10/2001, 16/08/2001, 17/09/1993, 01/05/2001, 05/03/2001), dataMonth = c(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL), item_code = c(842752, 7800, 842652, 842652, 842752, 842752, 842752, 39620, 842752, 842752), res_seqno = c(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL), quantity = c(60, 100G, 60, 60, 60, 60, 60, 5ML, 60, 60), directions = c(1/D, A/TD, 1/D, 1/D, 1/D, 1/D, 1/D, NIL, 1/D, 1/D), no_of_packs = c(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL), datasource = c(TSF, TSF, TSF, TSF, TSF, TSF, TSF, TSF, TSF, TSF), scan_ref_no = c(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL )), .Names = c(prochi, prescribed_date, dataMonth, item_code, res_seqno, quantity, directions, no_of_packs, datasource, scan_ref_no), row.names = c(NA, 10L), class = data.frame) -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590821.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
The y_data file has over 9000 rows in it so I thought it would be more practical to give you the file to download -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
You have now given two different assignments to x_data and none to y_data: The str( from the file access offering: str(x_data) 'data.frame': 2848 obs. of 5 variables: $ item_code : int 100 110 150 160 161 164 200 210 212 220 ... $ name: chr NEONACLEX K NEONACLEX MESORB ABSORBENT CELLULOSE MESO ... $ formulation_code: chr TABS TABS DRESS DRESS ... $ strength: chr NULL 5MG 10CMX10CM 10CMX10CM ... $ bnf_code: chr 2.2.8 2.2.1 20.3.1 20.3.1 ... The str from assignment from the dput offering str(x_data) 'data.frame': 10 obs. of 10 variables: $ prochi : chr CAO713 CAO713 CAO713 CAO713 ... $ prescribed_date: chr 22/06/2001 28/04/2000 10/04/2000 03/07/2000 ... $ dataMonth : chr NULL NULL NULL NULL ... $ item_code : chr 842752 7800 842652 842652 ... $ res_seqno : chr NULL NULL NULL NULL ... $ quantity : chr 60 100G 60 60 ... $ directions : chr 1/D A/TD 1/D 1/D ... $ no_of_packs: chr NULL NULL NULL NULL ... $ datasource : chr TSF TSF TSF TSF ... $ scan_ref_no: chr NULL NULL NULL NULL ... This code worked, but it is not clear that the x-y assignments were correct: x_data - read.table(file=http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt , header = TRUE, sep = |, quote = \', dec = .,as.is = TRUE,na.strings = NA,colClasses = NA, nrows = 3864284, skip = 0, check.names = TRUE,fill=TRUE, strip.white = TRUE, blank.lines.skip = TRUE, comment.char = #, allowEscapes = FALSE, flush = FALSE, fileEncoding = , encoding = unknown) -- David. On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote: The y_data file has over 9000 rows in it so I thought it would be more practical to give you the file to download -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
Sorry! It is the end of the day for me. So dput(x) structure(list(prochi = c(CAO713, CAO713, CAO713, CAO713, CAO713, CAO713, CAO713, CAO713, CAO713, CAO713), prescribed_date = c(22/06/2001, 28/04/2000, 10/04/2000, 03/07/2000, 09/01/2001, 16/10/2001, 16/08/2001, 17/09/1993, 01/05/2001, 05/03/2001), dataMonth = c(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL), item_code = c(842752, 7800, 842652, 842652, 842752, 842752, 842752, 39620, 842752, 842752), res_seqno = c(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL), quantity = c(60, 100G, 60, 60, 60, 60, 60, 5ML, 60, 60), directions = c(1/D, A/TD, 1/D, 1/D, 1/D, 1/D, 1/D, NIL, 1/D, 1/D), no_of_packs = c(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL), datasource = c(TSF, TSF, TSF, TSF, TSF, TSF, TSF, TSF, TSF, TSF), scan_ref_no = c(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL )), .Names = c(prochi, prescribed_date, dataMonth, item_code, res_seqno, quantity, directions, no_of_packs, datasource, scan_ref_no), row.names = c(NA, 10L), class = data.frame) y_data - read.table(file=http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt;, header = TRUE, sep = |, quote = \', dec = .,as.is = TRUE,na.strings = NA,colClasses = NA, nrows = 3864284, skip = 0, check.names = TRUE,fill=TRUE, strip.white = TRUE, blank.lines.skip = TRUE, comment.char = #, allowEscapes = FALSE, flush = FALSE, fileEncoding = , encoding = unknown) So the y_data essentially contains the lookup table for the item codes in x. Thanks and sorry for the mix up -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590849.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
Yes, its not clear what data was used. Please provide a single email that contains code that can be copied from R and pasted into another session that will reproduce the problem. x_data - ...dput output goes here... y_data - dput output goes here ... library(sqldf) ... rest of code ... or else: x_data - read.table(http://...;, ...) y_data - read.table(http://...;. ...) library(sqldf) ... rest of code ... On Fri, Mar 12, 2010 at 12:29 PM, David Winsemius dwinsem...@comcast.net wrote: You have now given two different assignments to x_data and none to y_data: The str( from the file access offering: str(x_data) 'data.frame': 2848 obs. of 5 variables: $ item_code : int 100 110 150 160 161 164 200 210 212 220 ... $ name : chr NEONACLEX K NEONACLEX MESORB ABSORBENT CELLULOSE MESO ... $ formulation_code: chr TABS TABS DRESS DRESS ... $ strength : chr NULL 5MG 10CMX10CM 10CMX10CM ... $ bnf_code : chr 2.2.8 2.2.1 20.3.1 20.3.1 ... The str from assignment from the dput offering str(x_data) 'data.frame': 10 obs. of 10 variables: $ prochi : chr CAO713 CAO713 CAO713 CAO713 ... $ prescribed_date: chr 22/06/2001 28/04/2000 10/04/2000 03/07/2000 ... $ dataMonth : chr NULL NULL NULL NULL ... $ item_code : chr 842752 7800 842652 842652 ... $ res_seqno : chr NULL NULL NULL NULL ... $ quantity : chr 60 100G 60 60 ... $ directions : chr 1/D A/TD 1/D 1/D ... $ no_of_packs : chr NULL NULL NULL NULL ... $ datasource : chr TSF TSF TSF TSF ... $ scan_ref_no : chr NULL NULL NULL NULL ... This code worked, but it is not clear that the x-y assignments were correct: x_data - read.table(file=http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt;, header = TRUE, sep = |, quote = \', dec = .,as.is = TRUE,na.strings = NA,colClasses = NA, nrows = 3864284, skip = 0, check.names = TRUE,fill=TRUE, strip.white = TRUE, blank.lines.skip = TRUE, comment.char = #, allowEscapes = FALSE, flush = FALSE, fileEncoding = , encoding = unknown) -- David. On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote: The y_data file has over 9000 rows in it so I thought it would be more practical to give you the file to download -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
Sorry! It is the end of the day for me. So dput(x) structure(list(prochi = c(CAO713, CAO713, CAO713, CAO713, CAO713, CAO713, CAO713, CAO713, CAO713, CAO713), prescribed_date = c(22/06/2001, 28/04/2000, 10/04/2000, 03/07/2000, 09/01/2001, 16/10/2001, 16/08/2001, 17/09/1993, 01/05/2001, 05/03/2001), dataMonth = c(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL), item_code = c(842752, 7800, 842652, 842652, 842752, 842752, 842752, 39620, 842752, 842752), res_seqno = c(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL), quantity = c(60, 100G, 60, 60, 60, 60, 60, 5ML, 60, 60), directions = c(1/D, A/TD, 1/D, 1/D, 1/D, 1/D, 1/D, NIL, 1/D, 1/D), no_of_packs = c(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL), datasource = c(TSF, TSF, TSF, TSF, TSF, TSF, TSF, TSF, TSF, TSF), scan_ref_no = c(NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL )), .Names = c(prochi, prescribed_date, dataMonth, item_code, res_seqno, quantity, directions, no_of_packs, datasource, scan_ref_no), row.names = c(NA, 10L), class = data.frame) y_data - read.table(file= http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt;, header = TRUE, sep = |, quote = \', dec = .,as.is = TRUE,na.strings = NA,colClasses = NA, nrows = 3864284, skip = 0, check.names = TRUE,fill=TRUE, strip.white = TRUE, blank.lines.skip = TRUE, comment.char = #, allowEscapes = FALSE, flush = FALSE, fileEncoding = , encoding = unknown) So the y_data essentially contains the lookup table for the item codes in x. require(sqldf) z - sqldf(select * from x left join y using (item_code)) z [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf not joining all the fields
If I assign the file input to y_data and change you sqldf to z - sqldf(select * from x_data left join y_data using (item_code)); z I can replicate your result. Even after changing the types of the two item_code fields to match I still get the same result and when I see to what degree they share values I get: sum(x_data$item_code %in% y_data$item_code) [1] 2 sum(y_data$item_code %in% x_data$item_code) [1] 2 So why are you so sure they are complete as you claimed in your first email. -- David. On Mar 12, 2010, at 12:29 PM, David Winsemius wrote: You have now given two different assignments to x_data and none to y_data: The str( from the file access offering: str(x_data) 'data.frame': 2848 obs. of 5 variables: $ item_code : int 100 110 150 160 161 164 200 210 212 220 ... $ name: chr NEONACLEX K NEONACLEX MESORB ABSORBENT CELLULOSE MESO ... $ formulation_code: chr TABS TABS DRESS DRESS ... $ strength: chr NULL 5MG 10CMX10CM 10CMX10CM ... $ bnf_code: chr 2.2.8 2.2.1 20.3.1 20.3.1 ... The str from assignment from the dput offering str(x_data) 'data.frame': 10 obs. of 10 variables: $ prochi : chr CAO713 CAO713 CAO713 CAO713 ... $ prescribed_date: chr 22/06/2001 28/04/2000 10/04/2000 03/07/2000 ... $ dataMonth : chr NULL NULL NULL NULL ... $ item_code : chr 842752 7800 842652 842652 ... $ res_seqno : chr NULL NULL NULL NULL ... $ quantity : chr 60 100G 60 60 ... $ directions : chr 1/D A/TD 1/D 1/D ... $ no_of_packs: chr NULL NULL NULL NULL ... $ datasource : chr TSF TSF TSF TSF ... $ scan_ref_no: chr NULL NULL NULL NULL ... This code worked, but it is not clear that the x-y assignments were correct: x_data - read.table(file=http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt , header = TRUE, sep = |, quote = \', dec = .,as.is = TRUE,na.strings = NA,colClasses = NA, nrows = 3864284, skip = 0, check.names = TRUE,fill=TRUE, strip.white = TRUE, blank.lines.skip = TRUE, comment.char = #, allowEscapes = FALSE, flush = FALSE, fileEncoding = , encoding = unknown) -- David. On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote: The y_data file has over 9000 rows in it so I thought it would be more practical to give you the file to download -- View this message in context: http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf
Hello, I would like some help with sqldf syntax. Suppose I have table 1 and table 2. What do I have to do to generate a table with columns 2,5,6 from table 1 (for example), and columns 3,4,5,9 from table 2, but only when values in column 2 from table 1 are equal to values in column 5 from table 2. tnks -- View this message in context: http://old.nabble.com/sqldf-tp26421449p26421449.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] sqldf
Google for sql join and see the examples in Example 4 on the sqldf home page: http://code.google.com/p/sqldf/#Example_4._Join On Thu, Nov 19, 2009 at 2:30 PM, JoK LoQ jok...@gmail.com wrote: Hello, I would like some help with sqldf syntax. Suppose I have table 1 and table 2. What do I have to do to generate a table with columns 2,5,6 from table 1 (for example), and columns 3,4,5,9 from table 2, but only when values in column 2 from table 1 are equal to values in column 5 from table 2. tnks -- View this message in context: http://old.nabble.com/sqldf-tp26421449p26421449.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] sqldf with date (class date) variables ?
Hello, I am having trouble with retrieving some data in queries involve with variables with data type date. See the enclosed example: ll-c(21DEC2006,15DEC2006) ss-data.frame(ll) ss-transform(ss,ll=as.date(as.character(ll))) (ss) ll 1 17156 2 17150 tt-sqldf(select ll from ss) (tt) ll 1 NA 2 NA str(ss) 'data.frame': 2 obs. of 1 variable: $ ll:Class 'date' int [1:2] 17156 17150 str(tt) 'data.frame': 2 obs. of 1 variable: $ ll:Class 'date' int [1:2] NA NA So all the elements of ll (of tt) has been converted into NA. Is there anyway that I can query the variables with date data type using sqldf? Many Thanks, Si. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.