Re: [R] sqldf returns Error: database or disk is full

2022-06-27 Thread Ebert,Timothy Aaron
The simple interpretation is that you have run out of space and it takes sqldf 
30 minutes to work through the error process and then return the error. A 
reason for sql is the more compact representation of a sparse matrix. Changing 
it into a data frame may result in a data structure with many empty cells that 
consumes a great deal more memory.

-Original Message-
From: R-help  On Behalf Of Ian Worthington via 
R-help
Sent: Monday, June 27, 2022 5:06 PM
Cc: R-help 
Subject: Re: [R] sqldf returns Error: database or disk is full

[External Email]

Well, iiuc the df is copied to the db early on in the process. This is constant 
with the temporarily file growing rapidly in size at the start of the process 
and then not challenging for 30 minutes until the error message is produced.
I have no idea what sqldf is doing under the hood.

Sent from Yahoo Mail on Android

  On Mon, 27 Jun 2022 at 23:00, Ebert,Timothy Aaron wrote:   I 
would try answering these questions:
1) What is the expected size?
2) Is the current size smaller or larger?
3) Every program running on your computer takes up memory. Given all the other 
programs currently running, do you have space?

Maybe someone can help by suggesting a way to get R to confess how much memory 
it has available given current system resources.


Tim
-Original Message-
From: R-help  On Behalf Of Bert Gunter
Sent: Monday, June 27, 2022 1:44 PM
To: Ian Worthington 
Cc: R-help 
Subject: Re: [R] sqldf returns Error: database or disk is full

[External Email]

You should probably heed the warning messages you received before doing 
anything further.

Bert

On Mon, Jun 27, 2022, 2:55 PM Ian Worthington via R-help < 
r-help@r-project.org> wrote:

> I'm using sqldf() to execute a query using a named dbname.  It appears 
> to create an external db but after 30 minutes or so returns:
>
> Error: database or disk is full
> The disk is not full, so have I hit some internal db size limit?  The 
> external db created is about 610MB and doesn't grow, so maybe it's 
> running out  of room for the result set, where ever that is stored?
>
> Code being executed is:
>  sqlCmds <- c(#"create index ixsip on ipTrace100(source)",
>    "create index ixdip on ipTrace100(destination)",
>    #"create index ixspo on ipTrace100(SrcPo)",
>    "create index ixdpo on ipTrace100(DstPo)",
>    #"create index ixno  on ipTrace100(`No.`)",
>    # "explain query plan
>    "select tr1.`No.` InNo,  tr1.Time InTime, tr1.source
> InSource, /*tr1.destination InDest,*/ tr1.SrcPo InSrcPo, tr1.DstPo InDstPo,
>    min(tr2.`No.`) OutNo, tr2.Time OutTime /*,  
>tr2.source OutSource,*/ /*tr2.destination OutDest,*/ /*tr2.SrcPo  
>OutSrcPo*/ /*, tr2.DstPo OutDstPo */
>    from main.ipTrace100 tr1
>    join main.ipTrace100 tr2
>    on tr1.destination = '10.27.187.1' and
>  tr1.source = tr2.destination and
>  tr1.SrcPo = tr2.DstPo and
>  tr1.`No.` < tr2.`No.`
>    group by tr1.`No.`"
>    )
>  (tempfile <- tempfile())
>  system.time( sqldf(sqlCmds, dbname=tempfile, verbose=TRUE) ) and  
>console log:
>
> |
> |  |
>
> >  system.time( sqldf(sqlCmds, dbname=tempfile, verbose=TRUE) )
> sqldf: library(RSQLite)
> sqldf: m <- dbDriver("SQLite")
> sqldf: connection <- dbConnect(m, dbname =
> "C:\Users\PIWN01~1\AppData\Local\Temp\RtmpSycE1E\file78e464e72860")
> sqldf: initExtension(connection)
> sqldf: dbWriteTable(connection, 'ipTrace100', ipTrace100, row.names =
> FALSE)
> sqldf: dbGetQuery(connection, 'create index ixdip on
> ipTrace100(destination)')
> Warning in result_fetch(res@ptr, n = n) :
>  SQL statements must be issued with dbExecute() or dbSendStatement()  
>instead of dbGetQuery() or dbSendQuery().
> sqldf: dbGetQuery(connection, 'create index ixdpo on
> ipTrace100(DstPo)') Warning in result_fetch(res@ptr, n = n) :
>  SQL statements must be issued with dbExecute() or dbSendStatement()  
>instead of dbGetQuery() or dbSendQuery().
> sqldf: dbGetQuery(connection, 'select tr1.`No.` InNo,  tr1.Time
> InTime, tr1.source InSource, /*tr1.destination InDest,*/ tr1.SrcPo  
>InSrcPo, tr1.DstPo InDstPo,
>    min(tr2.`No.`) OutNo, tr2.Time OutTime /*,  
>tr2.source OutSource,*/ /*tr2.destination OutDest,*/ /*tr2.SrcPo  
>OutSrcPo*/ /*, tr2.DstPo OutDstPo */
>    from main.ipTrace100 tr1
>    join main.ipTrace100 tr2
>    on tr1.destination = '10.27.187.1' and
>  tr1.source = tr2.destination and
>  tr1.SrcPo = tr2.Ds

Re: [R] sqldf and number of records affected

2020-06-11 Thread Ravi Jeyaraman
You're correct.  It does work.  I was looking at some other result printed.  My 
bad.

Looks like we can also get the same result using 'SELECT changes()'. 

Approach 1:

con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
sqldf()
suppressWarnings(sqldf(c(" update con set V1 = 0 where V1 > 5 ", "select 
changes()")))
sqldf("select * from main.con")
sqldf()

Approach 2:

con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
sqldf()
suppressWarnings(sqldf(c("pragma count_changes = 1", "update con set V1 = 0 
where V1 > 5 ")))
sqldf("select * from main.con")
sqldf()



-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Thursday, June 11, 2020 10:30 AM
To: Ravi Jeyaraman 
Cc: r-help@r-project.org
Subject: Re: [R] sqldf and number of records affected

There is no real difference between your example and the example I provided.
Both use a data.frame in R and both work for me under R 3.5 with RSQLite 2.2.0 
See log below.

Note that there is a bug in R 4.0 related to tcltk that could possibly affect 
sqldf as it uses tcltk.  A fix has been announced for R 4.0.2.

> library(sqldf)
Loading required package: gsubfn
Loading required package: proto
Loading required package: RSQLite
> con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
> sqldf()

  Path: :memory:
  Extensions: TRUE
> sqldf(c("pragma count_changes = 1", "update con set V1 = 0 where V1 > 
> 5 "))
  rows updated
15
Warning message:
In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead 
of dbGetQuery() or dbSendQuery().
> ans <- sqldf("select * from main.con")
> sqldf()
NULL
> ans
   V1
1   1
2   2
3   3
4   4
5   5
6   0
7   0
8   0
9   0
10  0
> R.version.string
[1] "R version 3.5.3 (2019-03-11)"
> packageVersion("sqldf")
[1] ‘0.4.11’
> packageVersion("RSQLite")
[1] ‘2.2.0’
> packageVersion("DBI")
[1] ‘1.1.0’



On Thu, Jun 11, 2020 at 10:06 AM Ravi Jeyaraman  wrote:
>
> Thanks for the response Gabor.  Looks like the below example will work when 
> using SQLite, but in my case I'm just creating a dataframe in R and trying to 
> update it using sqldf as below and it doesn't seem to work ...
>
> con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
> sqldf()
> sqldf(c("pragma count_changes = 1", "update con set V1 = 0 where V1 > 
> 5 ")) ans <- sqldf("select * from main.con")
> sqldf()
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Thursday, June 11, 2020 9:12 AM
> To: Ravi Jeyaraman 
> Cc: r-help@r-project.org
> Subject: Re: [R] sqldf and number of records affected
>
> Here is an example.  Ignore the warning or use the workaround 
> discussed here
> https://github.com/ggrothendieck/sqldf/issues/40
> to avoid the warning.
>
>   library(sqldf)
>   sqldf()  # use same connection until next sqldf()
>   sqldf(c("pragma count_changes = 1", "update BOD set demand = 99 where Time 
> > 4"))
>   sqldf("select * from main.BOD")
>   sqldf()
>
>
> On Thu, Jun 11, 2020 at 9:01 AM Ravi Jeyaraman  wrote:
> >
> > Hello all, When I execute a SQL using SQLDF, how do I get the number 
> > of records affected?  I mean, if I run an UPDATE on a data frame, it 
> > doesn't tell me if and how many records got updated.  I've read 
> > through the documentation and there don't seem to be a way to get 
> > this info unless it's done on a database.  Any ideas?
> >
> > Thanks
> > Ravi
> >
> >
> > --
> > This email has been checked for viruses by AVG.
> > https://www.avg.com
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
>
> --
> This email has been checked for viruses by AVG.
> https://www.avg.com
>


--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf and number of records affected

2020-06-11 Thread Gabor Grothendieck
There is no real difference between your example and the example I provided.
Both use a data.frame in R and both work for me under R 3.5 with RSQLite 2.2.0
See log below.

Note that there is a bug in R 4.0 related to tcltk that could possibly affect
sqldf as it uses tcltk.  A fix has been announced for R 4.0.2.

> library(sqldf)
Loading required package: gsubfn
Loading required package: proto
Loading required package: RSQLite
> con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
> sqldf()

  Path: :memory:
  Extensions: TRUE
> sqldf(c("pragma count_changes = 1", "update con set V1 = 0 where V1 > 5 "))
  rows updated
15
Warning message:
In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement()
instead of dbGetQuery() or dbSendQuery().
> ans <- sqldf("select * from main.con")
> sqldf()
NULL
> ans
   V1
1   1
2   2
3   3
4   4
5   5
6   0
7   0
8   0
9   0
10  0
> R.version.string
[1] "R version 3.5.3 (2019-03-11)"
> packageVersion("sqldf")
[1] ‘0.4.11’
> packageVersion("RSQLite")
[1] ‘2.2.0’
> packageVersion("DBI")
[1] ‘1.1.0’



On Thu, Jun 11, 2020 at 10:06 AM Ravi Jeyaraman  wrote:
>
> Thanks for the response Gabor.  Looks like the below example will work when 
> using SQLite, but in my case I'm just creating a dataframe in R and trying to 
> update it using sqldf as below and it doesn't seem to work ...
>
> con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
> sqldf()
> sqldf(c("pragma count_changes = 1", "update con set V1 = 0 where V1 > 5 "))
> ans <- sqldf("select * from main.con")
> sqldf()
>
> -----Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Thursday, June 11, 2020 9:12 AM
> To: Ravi Jeyaraman 
> Cc: r-help@r-project.org
> Subject: Re: [R] sqldf and number of records affected
>
> Here is an example.  Ignore the warning or use the workaround discussed here
> https://github.com/ggrothendieck/sqldf/issues/40
> to avoid the warning.
>
>   library(sqldf)
>   sqldf()  # use same connection until next sqldf()
>   sqldf(c("pragma count_changes = 1", "update BOD set demand = 99 where Time 
> > 4"))
>   sqldf("select * from main.BOD")
>   sqldf()
>
>
> On Thu, Jun 11, 2020 at 9:01 AM Ravi Jeyaraman  wrote:
> >
> > Hello all, When I execute a SQL using SQLDF, how do I get the number
> > of records affected?  I mean, if I run an UPDATE on a data frame, it
> > doesn't tell me if and how many records got updated.  I've read
> > through the documentation and there don't seem to be a way to get this
> > info unless it's done on a database.  Any ideas?
> >
> > Thanks
> > Ravi
> >
> >
> > --
> > This email has been checked for viruses by AVG.
> > https://www.avg.com
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
>
> --
> This email has been checked for viruses by AVG.
> https://www.avg.com
>


--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf and number of records affected

2020-06-11 Thread Ravi Jeyaraman
Thanks for the response Gabor.  Looks like the below example will work when 
using SQLite, but in my case I'm just creating a dataframe in R and trying to 
update it using sqldf as below and it doesn't seem to work ...

con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
sqldf()
sqldf(c("pragma count_changes = 1", "update con set V1 = 0 where V1 > 5 "))
ans <- sqldf("select * from main.con")
sqldf()

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Thursday, June 11, 2020 9:12 AM
To: Ravi Jeyaraman 
Cc: r-help@r-project.org
Subject: Re: [R] sqldf and number of records affected

Here is an example.  Ignore the warning or use the workaround discussed here
https://github.com/ggrothendieck/sqldf/issues/40
to avoid the warning.

  library(sqldf)
  sqldf()  # use same connection until next sqldf()
  sqldf(c("pragma count_changes = 1", "update BOD set demand = 99 where Time > 
4"))
  sqldf("select * from main.BOD")
  sqldf()


On Thu, Jun 11, 2020 at 9:01 AM Ravi Jeyaraman  wrote:
>
> Hello all, When I execute a SQL using SQLDF, how do I get the number 
> of records affected?  I mean, if I run an UPDATE on a data frame, it 
> doesn't tell me if and how many records got updated.  I've read 
> through the documentation and there don't seem to be a way to get this 
> info unless it's done on a database.  Any ideas?
>
> Thanks
> Ravi
>
>
> --
> This email has been checked for viruses by AVG.
> https://www.avg.com
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


-- 
This email has been checked for viruses by AVG.
https://www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf and number of records affected

2020-06-11 Thread Gabor Grothendieck
Here is an example.  Ignore the warning or use the workaround discussed here
https://github.com/ggrothendieck/sqldf/issues/40
to avoid the warning.

  library(sqldf)
  sqldf()  # use same connection until next sqldf()
  sqldf(c("pragma count_changes = 1", "update BOD set demand = 99
where Time > 4"))
  sqldf("select * from main.BOD")
  sqldf()


On Thu, Jun 11, 2020 at 9:01 AM Ravi Jeyaraman  wrote:
>
> Hello all, When I execute a SQL using SQLDF, how do I get the number of
> records affected?  I mean, if I run an UPDATE on a data frame, it doesn't
> tell me if and how many records got updated.  I've read through the
> documentation and there don't seem to be a way to get this info unless it's
> done on a database.  Any ideas?
>
> Thanks
> Ravi
>
>
> --
> This email has been checked for viruses by AVG.
> https://www.avg.com
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf --Warning message:

2016-02-19 Thread Gabor Grothendieck
sqldf does not use Tk so you can ignore this.

On Fri, Feb 19, 2016 at 12:32 PM, Divakar Reddy
 wrote:
> Dear R users,
>
> I'm getting Waring message while trying to load "sqldf" package in R3.2.3
> and assuming that we can ignore this as it's WARNING Message and not an
> error message.
> Can you guide me if my assumption is wrong?
>
>
>> library(sqldf);
> Loading required package: gsubfn
> Loading required package: proto
> Loading required package: RSQLite
> Loading required package: DBI
> Warning message:
> no DISPLAY variable so Tk is not available
>
>> version   _
> platform   x86_64-redhat-linux-gnu
> version.string R version 3.2.3 (2015-12-10)
>>
>
> Thanks,
> Divakar
> Phoenix,USA
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf() difference between R 3.1.2 and 3.0.1

2015-02-11 Thread Doran, Harold
That seems to have worked, both in the new and old version of R. I'll do more 
unit testing on other files.

Thank you, Gabor.

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Wednesday, February 11, 2015 10:22 AM
To: Doran, Harold
Cc: r-help@r-project.org
Subject: Re: [R] sqldf() difference between R 3.1.2 and 3.0.1

On Wed, Feb 11, 2015 at 9:45 AM, Doran, Harold hdo...@air.org wrote:
 I have a function written and tested using R 3.0.1 and sqldf_0.4-7.1 that 
 works perfectly. However, using this same code with R 3.1.2 and sqldf_0.4-10 
 yields the error below that I am having a difficult time deciphering. Hence, 
 same code behaves differently on different versions of R and sqldf().

 Error in sqliteSendQuery(con, statement, bind.data) :
   error in statement: no such column: V1


 Reproducible example below as well as complete sessionInfo all provided below.


 My function and code using the function are below.

 dorReader - function(dorFile, layout, sepChar = '\n'){
 sepChar - as.character(sepChar)
 dorFile - as.character(dorFile)
 layout$type2 - ifelse(layout$type == 'C', 'character',
 
 ifelse(layout$type == 'N', 'numeric', 'Date'))
 dor - file(dorFile)
 attr(dor, file.format) - list(sep = sepChar)
 getVars - paste(select,
paste(substr(V1, , layout$Start, , ,
  layout$Length, ) ', layout$Variable.Name, ', 
 collapse = , ), from dor)
 dat - sqldf(getVars)

 classConverter - function(obj, types){
 out - lapply(1:length(obj),FUN = 
 function(i){FUN1 - switch(types[i],character = as.character,numeric = 
 as.numeric,factor = as.factor, Date = as.character); FUN1(obj[,i])})
 names(out) - colnames(obj)
 as.data.frame(out)
 }
 dat - classConverter(dat, layout$type2)
 names(dat) - layout$Variable.Name
 dat
 }

 ### contents of fwf file 'sample.txt'
 1234567
 1234567
 1234567
 1234567
 1234567
 1234567
 1234567
 1234567

 layout - data.frame(Variable.Name =c('test1', 'test2'), Length = 
 c(3,4), Start =c(1,4), End = c(3,7), type = c('N', 'N'))

 tmp - dorReader('sample.txt', layout)

sqldf is documented to use the sqliteImportFile defaults for file.format 
components.  It may be that RSQLite 1.0 has changed the default for header in 
sqliteImportFile.

Try replacing your statement that sets file.format with this:

attr(dor, file.format) - list(sep = sepChar, header = FALSE)
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf() difference between R 3.1.2 and 3.0.1

2015-02-11 Thread Gabor Grothendieck
On Wed, Feb 11, 2015 at 9:45 AM, Doran, Harold hdo...@air.org wrote:
 I have a function written and tested using R 3.0.1 and sqldf_0.4-7.1 that 
 works perfectly. However, using this same code with R 3.1.2 and sqldf_0.4-10 
 yields the error below that I am having a difficult time deciphering. Hence, 
 same code behaves differently on different versions of R and sqldf().

 Error in sqliteSendQuery(con, statement, bind.data) :
   error in statement: no such column: V1


 Reproducible example below as well as complete sessionInfo all provided below.


 My function and code using the function are below.

 dorReader - function(dorFile, layout, sepChar = '\n'){
 sepChar - as.character(sepChar)
 dorFile - as.character(dorFile)
 layout$type2 - ifelse(layout$type == 'C', 'character',
 
 ifelse(layout$type == 'N', 'numeric', 'Date'))
 dor - file(dorFile)
 attr(dor, file.format) - list(sep = sepChar)
 getVars - paste(select,
paste(substr(V1, , layout$Start, , ,
  layout$Length, ) ', layout$Variable.Name, ', 
 collapse = , ), from dor)
 dat - sqldf(getVars)

 classConverter - function(obj, types){
 out - lapply(1:length(obj),FUN = 
 function(i){FUN1 - switch(types[i],character = as.character,numeric = 
 as.numeric,factor = as.factor, Date = as.character); FUN1(obj[,i])})
 names(out) - colnames(obj)
 as.data.frame(out)
 }
 dat - classConverter(dat, layout$type2)
 names(dat) - layout$Variable.Name
 dat
 }

 ### contents of fwf file 'sample.txt'
 1234567
 1234567
 1234567
 1234567
 1234567
 1234567
 1234567
 1234567

 layout - data.frame(Variable.Name =c('test1', 'test2'), Length = c(3,4), 
 Start =c(1,4), End = c(3,7), type = c('N', 'N'))

 tmp - dorReader('sample.txt', layout)

sqldf is documented to use the sqliteImportFile defaults for
file.format components.  It may be that RSQLite 1.0 has changed the
default for header in sqliteImportFile.

Try replacing your statement that sets file.format with this:

attr(dor, file.format) - list(sep = sepChar, header = FALSE)

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf problems

2014-07-14 Thread Gabor Grothendieck
2014-07-14 7:52 GMT-04:00 Juan Daniel García juandaniel...@gmail.com:
 Hello:
 I'm trying to run this code

 data2 - sqldf ( SELECT plot, age,  avg(N) as N FROM data1 GROUP BY plot,
 t)

 The problem is that when calling sqldf with
require(sqldf)

 appears this message

 Warning message:
 In sqliteCloseConnection(conn, ...) :
   RS-DBI driver warning: (closing pending result sets before closing this
 connection)

 If try again

require(sqldf)

 the message does not appear, but when trying the code afterwards, I get a
 new message

 Error in sqliteFetch(rs, n = -1, ...) :
   RSQLite driver: (RS_SQLite_fetch: failed: Domain error)

 I have a new Windows 8.1

 Do I need any other installations o packages or something else?
 The code was working in older Windows XP and Linux


See the Troubleshooting section of the sqldf home page.
http://sqldf.googlecode.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf problems

2014-07-14 Thread Jeff Newmiller
One doesn't call a package... so your description is unclear. One doesn't 
load (using the require or library functions) the relevant package AFTER 
calling functions in the package. Since it appears you have left out some steps 
or described them out of order, please supply a reproducible example [1] 
(including a test data set) as the Posting Guide requests.

NB I don't have Windows 8 or 8.1, so if that turns out to be the problem then I 
won't be able to tell, but I have a feeling there is something more basic going 
on here.

[1] 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

On July 14, 2014 4:52:18 AM PDT, Juan Daniel García juandaniel...@gmail.com 
wrote:
Hello:
I'm trying to run this code

data2 - sqldf ( SELECT plot, age,  avg(N) as N FROM data1 GROUP BY
plot,
t)

The problem is that when calling sqldf with
require(sqldf)

appears this message

Warning message:
In sqliteCloseConnection(conn, ...) :
RS-DBI driver warning: (closing pending result sets before closing this
connection)

If try again

require(sqldf)

the message does not appear, but when trying the code afterwards, I get
a
new message

Error in sqliteFetch(rs, n = -1, ...) :
  RSQLite driver: (RS_SQLite_fetch: failed: Domain error)

I have a new Windows 8.1

Do I need any other installations o packages or something else?
The code was working in older Windows XP and Linux

Thanks

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SQLDF column errors

2013-01-15 Thread Anthony Damico
i think this is what you want..  :)

http://stackoverflow.com/questions/4765936/using-joined-tables-to-exclude-certain-records

library(sqldf)

# use the mtcars example table
mtcars

# keep the first eight records in a second, separate data set
x - mtcars[ 1:8 , ]

# keep all the records from mtcars where the mpg value doesn't match a
record in the `x` data table
y - sqldf( select * from mtcars where not exists ( select * from x where
x.mpg == mtcars.mpg ) )


# the above example assumes your tables won't fit in RAM.
# if they do, this method is obviously much easier
y - mtcars[ !( mtcars$mpg %in% x$mpg ) , ]




On Tue, Jan 15, 2013 at 4:02 PM, Matthew Liebers mr...@cornell.edu wrote:

 I am trying to exclude integer values from a small data frame 1, d1 that
 have matching hits in data frame 2, d2 (Very big) which involves matching
 those hits first.  I am trying to use sqldf on the df's in the following
 fashion:

 df1:
 V1
 12675
 14753
 16222
 18765

 df2: head(df2)
 V1  V2
 13647 rd1500
 14753 rd1580
 15987 rd1590
 16222 rd2020.

 df1_new-sqldf(select df1.V1, df2.V2 where rs10.V1 = d10.pos”) - Ideally I
 would like to try to use delete or not equal to != though I can only
 find that delete works with sqldf.
 but it returns this error:
  Error in sqliteExecStatement(con, statement, bind.data) :
   RS-DBI driver: (error in statement: no such column: df1.V1)
 I am also trying this:
 df1_new-sqldf(select V1 from df1, V2 from df2 where df1.V1 = df2.V1)
 which returns this error:
 Error in sqliteExecStatement(con, statement, bind.data) :
   RS-DBI driver: (error in statement: near from: syntax error)

 If anyone with sqldf knowledge could lend me a hand that would be great.
 Thanks!

 Matt

 [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf package: using variables in where condition

2013-01-14 Thread Ravi Sreedhar


Sent from my iPhone

On Jan 13, 2013, at 9:47 PM, ravsre ravishre...@hotmail.com wrote:

 I am trying to use the sqldf package to create independent data frames from a
 master dataframe. 
 I want to use sqldf package and perform a simple select statement. However,
 what  I want to do is to create a loop and repeatedly send a variable to the
 where clause condition such as a: select * from mydfrm where Symbol =
 list[i]
 The 'list' is another data frame with a single column with 15 values.  All I
 what to do is to create loop for i in 1:15 and pass on into the select
 statement as list[1], list[2 }..tile list[15]
 
 Can someone help since, I am unable to reference the element location  as 
 variable in the sqldf function?
 
 also, is there a better way?
 would appreciate your help
 
 Rgds, Ravi
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/sqldf-package-using-variables-in-where-condition-tp4655445.html
 Sent from the R help mailing list archive at Nabble.com.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf package: using variables in where condition

2013-01-14 Thread Gabor Grothendieck
On Mon, Jan 14, 2013 at 8:36 AM, Ravi Sreedhar ravishre...@hotmail.com wrote:


 Sent from my iPhone

 On Jan 13, 2013, at 9:47 PM, ravsre ravishre...@hotmail.com wrote:

 I am trying to use the sqldf package to create independent data frames from a
 master dataframe.
 I want to use sqldf package and perform a simple select statement. However,
 what  I want to do is to create a loop and repeatedly send a variable to the
 where clause condition such as a: select * from mydfrm where Symbol =
 list[i]
 The 'list' is another data frame with a single column with 15 values.  All I
 what to do is to create loop for i in 1:15 and pass on into the select
 statement as list[1], list[2 }..tile list[15]

 Can someone help since, I am unable to reference the element location  as
 variable in the sqldf function?


See Example 5 on the sqldf home page.
http://code.google.com/p/sqldf/#Example_5._Insert_Variables

--
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf merging with subset in specific range

2012-12-26 Thread jim holtman
Is this what you want:


  m - read.table(text = 10
+  15
+  36
+  37
+  38
+  44
+  45
+  57
+  61
+  62
+  69 )
 n - read.table(text = 30   38
+  52   62   )

  require(sqldf)
  sqldf(select m.V1
+ from m, n
+ where m.V1 between n.V1 and n.V2
+  )
  V1
1 36
2 37
3 38
4 57
5 61
6 62



On Wed, Dec 26, 2012 at 7:00 PM, Matthew Liebers mr...@cornell.edu wrote:
 Hi all:

 I have two data sets.  Set A includes a long list of hits in a single
 column, say:
 m$V1
 10
 15
 36
 37
 38
 44
 45
 57
 61
 62
 69 ...and so on

 Set B includes just a few key ranges set up by way of a minimum in column X
 and a maximum in column Y.  Say,
 n$X n$Y
 30   38   # range from 30 to 38
 52   62   # range from 52 to 62

 I would like the output to be the rows containing the following columns:
 m$V1
 36
 37
 38
 57
 61
 62

 I am interested in isolating the hits in data set A that correspond to any
 of the hotspot ranges in data set B.  I have downloaded sqldf and tried a
 couple things but I cannot do a traditional merge since set B is based on a
 range.  I can always do a manual subset but I am trying to figure out if
 there is anything more expedient since these df's will be quite large.

 Thanks!

 Matt

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf Date problem

2012-11-04 Thread Andreas Recktenwald


Zitat von jim holtman jholt...@gmail.com:


Most likely your Date is either a character or a factor (you need to
provide an 'str' of the dataframe).  You are therefore most likely
doing a character compare and that is the reason for your problem.
You need to convert to a character string of the format -MM-DD to
do the correct character comparison.

##

x - data.frame(Date = paste0('1/', 1:31, '/2011'))
str(x)

'data.frame':   31 obs. of  1 variable:
 $ Date: Factor w/ 31 levels 1/1/2011,1/10/2011,..: 1 12 23 26 27
28 29 30 31 2 ...

x

Date
1   1/1/2011
2   1/2/2011
3   1/3/2011
4   1/4/2011
5   1/5/2011
6   1/6/2011
7   1/7/2011
8   1/8/2011
9   1/9/2011
10 1/10/2011
11 1/11/2011
12 1/12/2011
13 1/13/2011
14 1/14/2011
15 1/15/2011
16 1/16/2011
17 1/17/2011
18 1/18/2011
19 1/19/2011
20 1/20/2011
21 1/21/2011
22 1/22/2011
23 1/23/2011
24 1/24/2011
25 1/25/2011
26 1/26/2011
27 1/27/2011
28 1/28/2011
29 1/29/2011
30 1/30/2011
31 1/31/2011


require(sqldf)
# not correct because of character compares
sqldf('select * from x where Date  1/13/2011 and Date  1/25/2011')

Date
1   1/2/2011
2  1/14/2011
3  1/15/2011
4  1/16/2011
5  1/17/2011
6  1/18/2011
7  1/19/2011
8  1/20/2011
9  1/21/2011
10 1/22/2011
11 1/23/2011
12 1/24/2011

# convert the date to /MM/DD for character compares
x$newDate - as.character(as.Date(as.character(x$Date), format =  
%m/%d/%Y))

# now do the select
sqldf('select * from x where newDate between 2011-01-13 and 2011-01-25')

DatenewDate
1  1/13/2011 2011-01-13
2  1/14/2011 2011-01-14
3  1/15/2011 2011-01-15
4  1/16/2011 2011-01-16
5  1/17/2011 2011-01-17
6  1/18/2011 2011-01-18
7  1/19/2011 2011-01-19
8  1/20/2011 2011-01-20
9  1/21/2011 2011-01-21
10 1/22/2011 2011-01-22
11 1/23/2011 2011-01-23
12 1/24/2011 2011-01-24
13 1/25/2011 2011-01-25


On Sat, Nov 3, 2012 at 4:22 PM, Andreas Recktenwald
a.recktenw...@mx.uni-saarland.de wrote:

Dear R-help readers,

i've created a database for quotes data (for 4 years; 2007 -- 2010) with the
sqldf package. This database contains a column Date in the format
mm/dd/.

The table in the database is called main.data and the database itself
Honda. I tried to get the Data just for certain period, say from
01/01/2007 until 01/10/2007 with the following code:

sqldf(select * from main.data where Date='01/10/2007' and
Date='01/01/2007'),
   dbname=Honda)


I get the data for this period for every year(2007,2008,2009,2010) not only
for 2007. It seems that the year is overlooked and just looked for the
fitting days and months.

Because I haven't really much experience with sql I decide to send my
problem to the list.

Many thanks in advance.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Thanks for your quick response Jim,

you are right the entries in my Date column are characters (my fault  
not to mention this in my first post).



Now i know the reasons for my problem and can solve it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf Date problem

2012-11-03 Thread jim holtman
Most likely your Date is either a character or a factor (you need to
provide an 'str' of the dataframe).  You are therefore most likely
doing a character compare and that is the reason for your problem.
You need to convert to a character string of the format -MM-DD to
do the correct character comparison.

##
 x - data.frame(Date = paste0('1/', 1:31, '/2011'))
 str(x)
'data.frame':   31 obs. of  1 variable:
 $ Date: Factor w/ 31 levels 1/1/2011,1/10/2011,..: 1 12 23 26 27
28 29 30 31 2 ...
 x
Date
1   1/1/2011
2   1/2/2011
3   1/3/2011
4   1/4/2011
5   1/5/2011
6   1/6/2011
7   1/7/2011
8   1/8/2011
9   1/9/2011
10 1/10/2011
11 1/11/2011
12 1/12/2011
13 1/13/2011
14 1/14/2011
15 1/15/2011
16 1/16/2011
17 1/17/2011
18 1/18/2011
19 1/19/2011
20 1/20/2011
21 1/21/2011
22 1/22/2011
23 1/23/2011
24 1/24/2011
25 1/25/2011
26 1/26/2011
27 1/27/2011
28 1/28/2011
29 1/29/2011
30 1/30/2011
31 1/31/2011

 require(sqldf)
 # not correct because of character compares
 sqldf('select * from x where Date  1/13/2011 and Date  1/25/2011')
Date
1   1/2/2011
2  1/14/2011
3  1/15/2011
4  1/16/2011
5  1/17/2011
6  1/18/2011
7  1/19/2011
8  1/20/2011
9  1/21/2011
10 1/22/2011
11 1/23/2011
12 1/24/2011
 # convert the date to /MM/DD for character compares
 x$newDate - as.character(as.Date(as.character(x$Date), format = %m/%d/%Y))
 # now do the select
 sqldf('select * from x where newDate between 2011-01-13 and 2011-01-25')
DatenewDate
1  1/13/2011 2011-01-13
2  1/14/2011 2011-01-14
3  1/15/2011 2011-01-15
4  1/16/2011 2011-01-16
5  1/17/2011 2011-01-17
6  1/18/2011 2011-01-18
7  1/19/2011 2011-01-19
8  1/20/2011 2011-01-20
9  1/21/2011 2011-01-21
10 1/22/2011 2011-01-22
11 1/23/2011 2011-01-23
12 1/24/2011 2011-01-24
13 1/25/2011 2011-01-25


On Sat, Nov 3, 2012 at 4:22 PM, Andreas Recktenwald
a.recktenw...@mx.uni-saarland.de wrote:
 Dear R-help readers,

 i've created a database for quotes data (for 4 years; 2007 -- 2010) with the
 sqldf package. This database contains a column Date in the format
 mm/dd/.

 The table in the database is called main.data and the database itself
 Honda. I tried to get the Data just for certain period, say from
 01/01/2007 until 01/10/2007 with the following code:

 sqldf(select * from main.data where Date='01/10/2007' and
 Date='01/01/2007'),
dbname=Honda)


 I get the data for this period for every year(2007,2008,2009,2010) not only
 for 2007. It seems that the year is overlooked and just looked for the
 fitting days and months.

 Because I haven't really much experience with sql I decide to send my
 problem to the list.

 Many thanks in advance.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-04 Thread Steve Lianoglou
Hi,

On Fri, Feb 3, 2012 at 1:12 PM, HC hca...@yahoo.co.in wrote:
 Thank you.

 The readLines command is working fine and I am able to read 10^6 lines in
 one go and write them using the write.table command.

 Does this readLines command using a block concept to optimize or goes line
 by line?

 Steve has mentioned about *nix and split commands. Would there be any speed
 benefit as compared to readLines?

It's easy to test, right? Just give it a shot and see ...

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-04 Thread HC
Thank you Jim for your reply.

I could figure out that readLines works fine until 35,841,335 lines
(records). 

When the next line is read to be read, a window with R for Windows GUI
front-end has stopped working message comes, with an option to close
program or checking online for a solution.

The tab-separated .txt file I am working on has 9 columns and about 3.25
billion rows. I suspect that the 35,841,336th line is becoming so very big
that RAM out runs its capacity. Perhaps all tabs that are expected after
each values are missing, making the line so very big, I am not sure.

Is there any way to skip this line with readLines or any other function? I
am only hoping that the data after this bad line is good and I can read them
if I can skip the bad one. 

Thank you.
HC

--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4357730.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread HC
Thank you for indicating that SQLite may not handle a file as big as 160 GB.

Would you know of any utility for *physically splitting *the 160 GB text
file into pieces. And if one can control the splitting at the  end of a
record.

Thank you again.
HC

--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4354285.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread Gabor Grothendieck
On Fri, Feb 3, 2012 at 6:03 AM, HC hca...@yahoo.co.in wrote:
 Thank you for indicating that SQLite may not handle a file as big as 160 GB.

 Would you know of any utility for *physically splitting *the 160 GB text
 file into pieces. And if one can control the splitting at the  end of a
 record.


If they are csv files or similar data files then you could use R or
any scripting language to do that.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread Steve Lianoglou
On Fri, Feb 3, 2012 at 7:37 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 On Fri, Feb 3, 2012 at 6:03 AM, HC hca...@yahoo.co.in wrote:
 Thank you for indicating that SQLite may not handle a file as big as 160 GB.

 Would you know of any utility for *physically splitting *the 160 GB text
 file into pieces. And if one can control the splitting at the  end of a
 record.


 If they are csv files or similar data files then you could use R or
 any scripting language to do that.

Or even the *nix `split` command ...

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread HC
This is a 160 GB tab-separated .txt file. It has 9 columns and 3.25x10^9
rows.

Can R handle it?  

Thank you.
HC



--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4354556.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread Gabor Grothendieck
On Fri, Feb 3, 2012 at 8:08 AM, HC hca...@yahoo.co.in wrote:
 This is a 160 GB tab-separated .txt file. It has 9 columns and 3.25x10^9
 rows.

 Can R handle it?


You can process a file N lines at time like this:

con - file(myfile.dat, r)
while(length(Lines - readLines(con, n = N))  0) {
  ... whatever...
}

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread HC
Thank you.

The readLines command is working fine and I am able to read 10^6 lines in
one go and write them using the write.table command.

Does this readLines command using a block concept to optimize or goes line
by line?

Steve has mentioned about *nix and split commands. Would there be any speed
benefit as compared to readLines?

Thank you.
HC

--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4355362.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread HC
Bad news!

The readLines command works fine upto a certain limit. Once a few files have
been written the R program crashes.

I used the following code:
*
iFile-Test.txt
con - file(iFile, r)

N-125; 
iLoop-1
 
while(length(Lines - readLines(con, n = N))  0  iLoop41) { 
oFile-paste(Split_,iLoop,.txt,sep=)
  write.table(Lines, oFile, sep = \t, quote = FALSE, col.names= FALSE,
row.names = FALSE)
  iLoop-iLoop+1
} 
close(con)


With above N=1.25 million, it wrote 28 files of about 57 mb each. That is a
total of about 1.6 GB and then crashed.
I tried with other values on N and it crashes at about the same place in
terms of total size output, i.e., about 1.6 GB.

Is this due to any limitation of Windows 7, in terms of not having the
pointer after this size?

Your insight would be very helpful.

Thank you.
HC






--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4355679.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread jim holtman
Exactly what does crashed mean?  What was the error message?  How
you tried to put:

rm(Lines)
gc()

at the end of the loop to free up and compact memory?  If you watch
the performance, does the R process seem to be growing in terms of the
amount of memory that is being used?  You can add:

memory.size()

before the above statements to see how much memory is being used.
This is just some more elementary debugging that you will have to
learn when using any system.

On Fri, Feb 3, 2012 at 3:22 PM, HC hca...@yahoo.co.in wrote:
 Bad news!

 The readLines command works fine upto a certain limit. Once a few files have
 been written the R program crashes.

 I used the following code:
 *
 iFile-Test.txt
 con - file(iFile, r)

 N-125;
 iLoop-1

 while(length(Lines - readLines(con, n = N))  0  iLoop41) {
 oFile-paste(Split_,iLoop,.txt,sep=)
  write.table(Lines, oFile, sep = \t, quote = FALSE, col.names= FALSE,
 row.names = FALSE)
  iLoop-iLoop+1
 }
 close(con)
 

 With above N=1.25 million, it wrote 28 files of about 57 mb each. That is a
 total of about 1.6 GB and then crashed.
 I tried with other values on N and it crashes at about the same place in
 terms of total size output, i.e., about 1.6 GB.

 Is this due to any limitation of Windows 7, in terms of not having the
 pointer after this size?

 Your insight would be very helpful.

 Thank you.
 HC






 --
 View this message in context: 
 http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4355679.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-02 Thread Gabor Grothendieck
On Wed, Feb 1, 2012 at 11:57 PM, HC hca...@yahoo.co.in wrote:
 Hi All,

 I have a very (very) large tab-delimited text file without headers. There
 are only 8 columns and millions of rows. I want to make numerous pieces of
 this file by sub-setting it for individual stations. Station is given as in
 the first column. I am trying to learn and use sqldf package for this but am
 stuck in a couple of places.

 To simulate my requirement, I have taken iris dataset as an example and have
 done the following:
 (1) create a tab-delimited file without headers.
 (2) read it using read.csv.sql command
 (3) write the result of a query, getting first 10 records

 Here is the reproducible code that I am trying:
 # Text data file
 write.table(iris, irisNoH.txt, sep = \t, quote = FALSE,
 col.names=FALSE,row.names = FALSE)
 # create an empty database (can skip this step if database already exists)
 sqldf(attach myTestdbT as new)
 f1-file(irisNoH.txt)
 attr(f1, file.format) - list(header=FALSE,sep=\t)
 # read into table called irisTab in the mytestdb sqlite database
 read.csv.sql(irisNoH.txt, sql = create table main.irisTab1 as select *
 from file, dbname = mytestdb)
 res1-sqldf(select * from main.irisTab1 limit 10, dbname = mytestdb)
 write.table(res1, iris10.txt, sep = \t, quote = FALSE,
 col.names=FALSE,row.names = FALSE)

 # For querying records of a particular species - unresolved problems
 #a1-virginica
 #attr(f1, names) - c(A1,A2,A3,A4,A5)
 #res2-fn$sqldf(select * from main.irisTab1 where A5 = '$a1')

 In the above, I am not able to:
 (1) assign the names to various columns
 (2) query for particular value of a column; in this case for particular
 species, say virginica
 (3) I guess fn$sqldf can do the job but it requires assigning column names

 Any help would be most appreciated.


Ignoring your iris file for a moment, to query the 5th column (getting
its name via sql rather than via R) we can do this:

library(sqldf)
species - virginica
nms - names(dbGetQuery(con, select * from iris limit 0))
fn$dbGetQuery(con, select * from iris where `nms[5]` = '$species' limit 3)

Now, sqldf is best used when you are getting the data from R but if
you want to store it in a database and just leave it there then you
might be better off using RSQLite directly like this (the eol = \r\n
in the dbWriteTable statement was needed on my Windows system but you
may not need that depending on your platform):


write.table(iris, irisNoH.txt, sep = \t, quote = FALSE, col.names
= FALSE, row.names = FALSE)

library(sqldf)
library(RSQLite)

con - dbConnect(SQLite(), dbname = mytestdb)

dbWriteTable(con, iris, irisNoH.txt, sep = \t, eol = \r\n)

species - virginica
nms - names(dbGetQuery(con, select * from iris limit 0))
fn$dbGetQuery(con, select * from iris where `nms[5]` = '$species' limit 3)

dbDisconnect(con)


-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-02 Thread Gabor Grothendieck
On Thu, Feb 2, 2012 at 3:11 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 On Wed, Feb 1, 2012 at 11:57 PM, HC hca...@yahoo.co.in wrote:
 Hi All,

 I have a very (very) large tab-delimited text file without headers. There
 are only 8 columns and millions of rows. I want to make numerous pieces of
 this file by sub-setting it for individual stations. Station is given as in
 the first column. I am trying to learn and use sqldf package for this but am
 stuck in a couple of places.

 To simulate my requirement, I have taken iris dataset as an example and have
 done the following:
 (1) create a tab-delimited file without headers.
 (2) read it using read.csv.sql command
 (3) write the result of a query, getting first 10 records

 Here is the reproducible code that I am trying:
 # Text data file
 write.table(iris, irisNoH.txt, sep = \t, quote = FALSE,
 col.names=FALSE,row.names = FALSE)
 # create an empty database (can skip this step if database already exists)
 sqldf(attach myTestdbT as new)
 f1-file(irisNoH.txt)
 attr(f1, file.format) - list(header=FALSE,sep=\t)
 # read into table called irisTab in the mytestdb sqlite database
 read.csv.sql(irisNoH.txt, sql = create table main.irisTab1 as select *
 from file, dbname = mytestdb)
 res1-sqldf(select * from main.irisTab1 limit 10, dbname = mytestdb)
 write.table(res1, iris10.txt, sep = \t, quote = FALSE,
 col.names=FALSE,row.names = FALSE)

 # For querying records of a particular species - unresolved problems
 #a1-virginica
 #attr(f1, names) - c(A1,A2,A3,A4,A5)
 #res2-fn$sqldf(select * from main.irisTab1 where A5 = '$a1')

 In the above, I am not able to:
 (1) assign the names to various columns
 (2) query for particular value of a column; in this case for particular
 species, say virginica
 (3) I guess fn$sqldf can do the job but it requires assigning column names

 Any help would be most appreciated.


 Ignoring your iris file for a moment, to query the 5th column (getting
 its name via sql rather than via R) we can do this:

 library(sqldf)
 species - virginica
 nms - names(dbGetQuery(con, select * from iris limit 0))
 fn$dbGetQuery(con, select * from iris where `nms[5]` = '$species' limit 3)

 Now, sqldf is best used when you are getting the data from R but if
 you want to store it in a database and just leave it there then you
 might be better off using RSQLite directly like this (the eol = \r\n
 in the dbWriteTable statement was needed on my Windows system but you
 may not need that depending on your platform):


 write.table(iris, irisNoH.txt, sep = \t, quote = FALSE, col.names
 = FALSE, row.names = FALSE)

 library(sqldf)
 library(RSQLite)

 con - dbConnect(SQLite(), dbname = mytestdb)

 dbWriteTable(con, iris, irisNoH.txt, sep = \t, eol = \r\n)

 species - virginica
 nms - names(dbGetQuery(con, select * from iris limit 0))
 fn$dbGetQuery(con, select * from iris where `nms[5]` = '$species' limit 3)

 dbDisconnect(con)

There seems to have been a pasting error here.  The first part was
intended to show how to do this using sqldf and the second using
RSQLite.Thus the first part was intended to be:

library(sqldf)
species - virginica

# obviously we could just do nms - names(iris) but to get
# names from database instead
nms - names(dbGetQuery(con, select * from iris limit 0))

# use 5th column
fn$sqldf(select * from iris where `nms[5]` = '$species' limit 3)


and the second part that illustrates RSQLite was ok.  Note that fn$
comes from the gsubfn package which sqldf loads.





-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-02 Thread HC
Hi Gabor,

Thank you very much for your guidance and help.

I could run the following code successfully on a 500 mb test data file. A
snapshot of the data file is attached herewith.

code start***
library(sqldf)
library(RSQLite)

iFile-Test100.txt
con - dbConnect(SQLite(),dbname = myTest100)
dbWriteTable(con, TestDB100, iFile, sep = \t) #, eol = \r\n)
nms - names(dbGetQuery(con, select * from TestDB100 limit 0))

nRec-fn$dbGetQuery(con, select count(*)from TestDB100)
aL1-1;

while (aL1=nRec){
res1-fn$dbGetQuery(con, select * from (select * from TestDB100 limit
'$aL1',1))
istn-res1[1,1]
res1-fn$dbGetQuery(con, select * from TestDB100 where `nms[1]` = '$istn')
icount-dim(res1)[1]
oFile-paste(istn,_Test.txt,sep=)
write.table(res1, oFile, sep = \t, quote = FALSE, col.names= FALSE,
row.names = FALSE)
aL1-aL1+icount
}
dbDisconnect(con)
code end***

However, the actual data file that I want to handle is about *160 GB*. And
when I use the same above code on that file, it gives following error for
dbWriteTable(con, ...) statement
error start**
dbWriteTable(con, TestDB, iFile, sep = \t) #, eol = \r\n)
Error in try({ : RS-DBI driver: (RS_sqlite_getline could not realloc)
[1] FALSE
error end**

I am not sure about the reason of this error. Is this due to the big file
size? I understood from sqldf webpage that SQLite can work for even a larger
file than this and is only restricted by the disc space and not RAM. I have
about 400GB free space on the PC I am using, with Windows 7 as the operating
system. I am assuming that the about dbWriteTable command is using the disc
memory only and is not the issue.

In fact this file has been created using MySQLdump and I do not have access
to the original MYSQL database file. 
I want to know the following:
(1)  Am I missing something in the above code that is preventing handling of
this big 160  GB file?
(2)  Should this be handled outside of R, if R is becoming a limitation in
this? And if yes then what is a possible way forward?

Thank you again for your quick response and all the help.
HC
http://r.789695.n4.nabble.com/file/n4353362/Test100.txt Test100.txt 
 





--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4353362.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-02 Thread Gabor Grothendieck
On Thu, Feb 2, 2012 at 8:07 PM, HC hca...@yahoo.co.in wrote:
 Hi Gabor,

 Thank you very much for your guidance and help.

 I could run the following code successfully on a 500 mb test data file. A
 snapshot of the data file is attached herewith.

 code start***
 library(sqldf)
 library(RSQLite)

 iFile-Test100.txt
 con - dbConnect(SQLite(),dbname = myTest100)
 dbWriteTable(con, TestDB100, iFile, sep = \t) #, eol = \r\n)
 nms - names(dbGetQuery(con, select * from TestDB100 limit 0))

 nRec-fn$dbGetQuery(con, select count(*)from TestDB100)
 aL1-1;

 while (aL1=nRec){
 res1-fn$dbGetQuery(con, select * from (select * from TestDB100 limit
 '$aL1',1))
 istn-res1[1,1]
 res1-fn$dbGetQuery(con, select * from TestDB100 where `nms[1]` = '$istn')
 icount-dim(res1)[1]
 oFile-paste(istn,_Test.txt,sep=)
 write.table(res1, oFile, sep = \t, quote = FALSE, col.names= FALSE,
 row.names = FALSE)
 aL1-aL1+icount
 }
 dbDisconnect(con)
 code end***

 However, the actual data file that I want to handle is about *160 GB*. And
 when I use the same above code on that file, it gives following error for
 dbWriteTable(con, ...) statement
 error start**
 dbWriteTable(con, TestDB, iFile, sep = \t) #, eol = \r\n)
 Error in try({ : RS-DBI driver: (RS_sqlite_getline could not realloc)
 [1] FALSE
 error end**

 I am not sure about the reason of this error. Is this due to the big file
 size? I understood from sqldf webpage that SQLite can work for even a larger
 file than this and is only restricted by the disc space and not RAM. I have
 about 400GB free space on the PC I am using, with Windows 7 as the operating
 system. I am assuming that the about dbWriteTable command is using the disc
 memory only and is not the issue.

 In fact this file has been created using MySQLdump and I do not have access
 to the original MYSQL database file.
 I want to know the following:
 (1)  Am I missing something in the above code that is preventing handling of
 this big 160  GB file?
 (2)  Should this be handled outside of R, if R is becoming a limitation in
 this? And if yes then what is a possible way forward?

 Thank you again for your quick response and all the help.
 HC
 http://r.789695.n4.nabble.com/file/n4353362/Test100.txt Test100.txt


I think its unlikely SQLite could handle a database that large unless
you can divide it into multiple separate databases.  At one time the
SQLite site said it did not handle databases over 1 GB and although I
think that is outdated by more recent versions of SQLite its still
likely true that your size is too large for it.


-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf + Date class. Ordering and summary statistics appear to be incorrect.

2012-01-22 Thread Gabor Grothendieck
On Sun, Jan 22, 2012 at 10:59 PM, Grant Farnsworth gvfa...@gmail.com wrote:
 I've been using sqldf heavily lately but have encountered problems
 with ordering of observations or calculating statistics such as max()
 and min() when the variable used is of class Date.

 For example, if I run the following code:

 === begin code =
 library(sqldf)
 A-data.frame(Dates=as.Date(c(1994-02-14,1977-02-23,2001-09-18,2009-08-01)),Ret=rnorm(4))
 OrderedA-sqldf('select * from A order by Dates')
 MaxA-sqldf('select max(Dates) as Dates from A')[1,1]
 MinA-sqldf('select min(Dates) as Dates from A')[1,1]
 === end code =

 Then the result is this:

 A
       Dates        Ret
 1 1994-02-14  1.2414706
 2 1977-02-23 -0.7728146
 3 2001-09-18  1.2551331
 4 2009-08-01 -0.2538359

 OrderedA
       Dates        Ret
 1 2001-09-18  1.2551331
 2 2009-08-01 -0.2538359
 3 1977-02-23 -0.7728146
 4 1994-02-14  1.2414706

 MaxA
 [1] 1994-02-14

 MinA
 [1] 2001-09-18

 Completely wrong order, no warnings issued, and the summary stats are
 wrong as well (but consistent with the ordering).

 According to the sqldf manual found at the following URL

 http://code.google.com/p/sqldf/#4._How_does_sqldf_work_with_%22Date%22_class_variables?

 this type of query should work correctly.  Any clue why it is not
 doing so?  User error or bug?


You are using an old version.  Update to latest version of R and sqldf.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf + Date class. Ordering and summary statistics appear to be incorrect.

2012-01-22 Thread Grant Farnsworth
On Mon, Jan 23, 2012 at 12:46 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 On Sun, Jan 22, 2012 at 10:59 PM, Grant Farnsworth gvfa...@gmail.com wrote:
 I've been using sqldf heavily lately but have encountered problems
 with ordering of observations or calculating statistics such as max()
 and min() when the variable used is of class Date.

 For example, if I run the following code:

 === begin code =
 library(sqldf)
 A-data.frame(Dates=as.Date(c(1994-02-14,1977-02-23,2001-09-18,2009-08-01)),Ret=rnorm(4))
 OrderedA-sqldf('select * from A order by Dates')
 MaxA-sqldf('select max(Dates) as Dates from A')[1,1]
 MinA-sqldf('select min(Dates) as Dates from A')[1,1]
 === end code =

 Then the result is this:

 A
       Dates        Ret
 1 1994-02-14  1.2414706
 2 1977-02-23 -0.7728146
 3 2001-09-18  1.2551331
 4 2009-08-01 -0.2538359

 OrderedA
       Dates        Ret
 1 2001-09-18  1.2551331
 2 2009-08-01 -0.2538359
 3 1977-02-23 -0.7728146
 4 1994-02-14  1.2414706

 MaxA
 [1] 1994-02-14

 MinA
 [1] 2001-09-18

 Completely wrong order, no warnings issued, and the summary stats are
 wrong as well (but consistent with the ordering).

 According to the sqldf manual found at the following URL

 http://code.google.com/p/sqldf/#4._How_does_sqldf_work_with_%22Date%22_class_variables?

 this type of query should work correctly.  Any clue why it is not
 doing so?  User error or bug?


 You are using an old version.  Update to latest version of R and sqldf.

 --
 Statistics  Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com


Thanks, that worked.  Known bug, I guess.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf and not converting integers to floating point in SQLite

2012-01-03 Thread jim holtman
try this:

 library(sqldf)
 table1 - read.csv(text = POSTAL | VALUE
+ 1000|49
+ 1010|100
+ 1020|50, sep=|)
 table2 - read.csv(text = INSEE | POSTAL
+ A|1000
+ B|1000
+ C|1010
+ D|1020, sep=|)
 table3 - sqldf(
+ select table2.INSEE
+ , 1.0 * table1.VALUE / counts.nPostals as value_spread
+ from table1
+ , table2
+ ,(select POSTAL
+ , count(INSEE) as nPostals
+ from table2
+ group by POSTAL) counts
+ where table1.POSTAL = counts.POSTAL
+ and table1.POSTAL=table2.POSTAL
+ )
 table3
  INSEE value_spread
1 A 24.5
2 B 24.5
3 C100.0
4 D 50.0



On Tue, Jan 3, 2012 at 3:13 PM, Frederik Vanrenterghem
frede...@vanrenterghem.biz wrote:
 Hi,

 I have following 2 tables:

 Table 1:
 POSTAL | VALUE
 1000|49
 1010|100
 1020|50

 Table 2:
 INSEE | POSTAL
 A|1000
 B|1000
 C|1010
 D|1020

 I would like to convert this to the following:

 INSEE | VALUE_SPREAD
 A|24.5
 B|24.5
 C|100
 D|50

 I can achieve this with a nested SQL query (through counting the
 number of POSTAL that belong to any given INSEE, and diving the value
 of the postal in that INSEE by that number).

 library(sqldf)
 table1 - read.csv(c:/R/table1.csv, sep=;)
 table2 - read.csv(c:/R/table2.csv, sep=;)
 table3 - sqldf(select table2.INSEE, table1.VALUE / counts.nPostals
 as value_spread from table1, table2,(select POSTAL, count(INSEE) as
 nPostals from table2 group by POSTAL) counts where table1.POSTAL =
 counts.POSTAL and table1.POSTAL=table2.POSTAL)

 Unfortunately, the value I'm working with is an integer. In SQLite,
 this results in the computed value also not being a float - so it gets
 rounded up or down. In this case, I'm getting 24 for A  B instead of
 24.5.

 Is there a way to take care of this using other R concepts, avoiding
 that problem (for instance using melt  cast)?

 Thanks,
 Frederik

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf if iif

2011-11-26 Thread Jeff Newmiller
sqldf uses the SQLite database by default for backend processing. The iif 
function is specific to the Jet database engine syntax (which underlies MS 
Access). You could read up on SQLite syntax, or you could avoid using 
nonstandard SQL syntax, retrieve the data into a data frame, and use R code to 
do your logical merging into one column.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Carlos Rivera limnoriv...@gmail.com wrote:

Dear all,

 

I have problems with iif function using sqldf library.

I counted abundance (Num) of different SPECIES in two moments (esf)
saving
the information in two Tables (esf50, esf100):

esf50

SAMPLE  SPECIES  Num esf

1289diso1   44  50

1289diso2   5 50

1289diso3   1 50

diso1   44  50

diso2   5 50

diso3   1 50

   

esf100

SAMPLE  SPECIES  Num esf

1289diso1   82  100

1289diso2   13  100

1289diso3   2 100

1289diso4   3 100

diso1   82  100

diso2   13  100

diso3   2 100

diso4   3 100

 

I would like subtract column Num between the two moments considering
only
the changes, therefore I use the conditional if:

 

var100-sqldf(select esf100.SAMPLE, esf100.SPECIES, esf100.Num,
esf100.esf,

  iif esf100.Num - esf50.Num =0, esf100.Num-esf50.Num,
esf100.Num as PIPAS 

   from esf100 left join esf50 on esf100.SAMPLE =
esf50.SAMPLE

   and esf100.SPECIES = esf50.SPECIES)

 

I think the structure is right because the SQL query run ok in Access.
Is
the if syntax the problems?

 

Thank in advanced.

 

Best wishes,

 

Carlos Rivera

 

 

 

 


   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf syntax, selecting rows, and skipping

2011-09-29 Thread David Winsemius


On Sep 29, 2011, at 10:06 AM, Juliet Hannah wrote:


I am using the example in this post:

https://stat.ethz.ch/pipermail/r-help/2010-October/257204.html

# create a file
write.table(iris,iris.csv,row.names=FALSE,sep=,,quote=FALSE)


# this does not work
# has the syntax changed or  is there a mistake in my usage?
# the line from the post above is:
#  read.csv.sql(myfile.csv, sql = select * from file 2000, 1000)


You didn't read to the end of that thread. The two errors above were  
corrected.




library(sqldf)
read.csv.sql(iris.csv, sql = select * from file 5, 5)

# this works
# but i would like to keep the header

read.csv.sql(iris.csv, sql = select * from file limit
5,skip=5,header=FALSE)

# thanks



sessionInfo()

R version 2.13.1 (2011-07-08)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] tcltk stats graphics  grDevices utils datasets
methods   base

other attached packages:
[1] sqldf_0.4-2   chron_2.3-42  gsubfn_0.5-7
proto_0.3-9.2 RSQLite.extfuns_0.0.1 RSQLite_0.9-4
DBI_0.2-5 myfunctions_1.0

loaded via a namespace (and not attached):
[1] tools_2.13.1

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SQldf with sqlite and H2

2011-07-14 Thread Gabor Grothendieck
On Thu, Jul 14, 2011 at 10:33 AM, Mandans mandan...@yahoo.com wrote:
 SQldf with sqlite and H2

 I have a large csv file (about 2GB) and wanted to import the file into R and 
 do some filtering and analysis. Came across sqldf ( a great idea and product) 
 and was trying to play around to see what would be the best method of doing 
 this. csv file is comma delimited with some columns having comma inside the 
 quoation like this John, Doe.

 I tried this first

 ###
 library(sqldf)
 sqldf(attach testdb as new)
 In.File - C:/JP/Temp/2008.csv
 read.csv.sql(In.File, sql = create table table1 as select * from file,
  dbname = testdb)

 It errored out with message

 NULL
 Warning message:
 closing unused connection 3 (C:/JP/Temp/2008.csv)

 When this failed, I converted this file from comma delimited to tab delimited 
 and used this command

 #
 read.csv.sql(In.File, sql = create table table1 as select * from file,
  dbname = testdb, sep = \t)

 and this worked, it created testdb sqlite file with the size of 3GB

 now my question is in 3 parts.

 1. Is it possible to create a dataframe with appropriate column classes and 
 use that column classes when I use the read.csv.sql command to create the 
 table. Something like may be create the table from that DF and then update 
 with read.csv.sql.?

 Any example code will be really helpful.

Here is an example of using method = name__class.  Note there are
two underscores in a row.  It appears I neglected to document that
Date2 means convert from character representation whereas Date means
convert from numeric representation.  It would also be possible to use
method = raw and then coerce the columns yourself afterwards.

# create test file
Lines - 'A__Date2|B
2000-01-01|x,y
2000-01-02|c,d
'
tf - tempfile()
cat(Lines, file = tf)


library(sqldf)
DF - read.csv.sql(tf, sep = |, method = name__class)
str(DF)


 2. If we use the H2 database instead of default sqlite and use the readcsv 
 option, will that be faster and is there a way we can specify the above 
 thought of applying a DF class to table column properties and update with 
 CSVREAD

 library(RH2)
 something like SELECT * FROM CSVREAD('C:/JP/Temp/2008.csv')

 Any example code will be really helpful.

Sorry, I haven't tested the speed of this.  postgresql and mysql, both
supported by sqldf, also have builtin methods to read files. If I had
to guess I would guess that mysql would be fastest but this would have
to be tested.


 3. How do we specify where the H2 file is saved. Saw something like this, 
 when I ran this example from RH2 package, couldn't find the file in the 
 working directory.

 con - dbConnect(H2(), jdbc:h2:~/test, sa, )

~ means your home directory so ~/test means test is in the home directory.

Try

normalizePath(~)
normalizePath(~/test)
etc.

to see what they refer to.

Regards.


 Sorry for the long mail. Appreciate all for building a great community and 
 for the wonderful software in R.
 Thanks for Gabor Grothendieck for bring sqldf to this great community.

 Any help or direction you can provide in this is highly appreciated.

 Thanks all.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SQldf with sqlite and H2

2011-07-14 Thread Mandans
Thanks a lot Gabor. It helped a lot. Appreciate your time and effort.

Thanks

--- On Thu, 7/14/11, Gabor Grothendieck ggrothendi...@gmail.com wrote:

 From: Gabor Grothendieck ggrothendi...@gmail.com
 Subject: Re: [R] SQldf with sqlite and H2
 To: Mandans mandan...@yahoo.com
 Cc: r-help@r-project.org
 Date: Thursday, July 14, 2011, 2:22 PM
 On Thu, Jul 14, 2011 at 10:33 AM,
 Mandans mandan...@yahoo.com
 wrote:
  SQldf with sqlite and H2
 
  I have a large csv file (about 2GB) and wanted to
 import the file into R and do some filtering and analysis.
 Came across sqldf ( a great idea and product) and was trying
 to play around to see what would be the best method of doing
 this. csv file is comma delimited with some columns having
 comma inside the quoation like this John, Doe.
 
  I tried this first
 
  ###
  library(sqldf)
  sqldf(attach testdb as new)
  In.File - C:/JP/Temp/2008.csv
  read.csv.sql(In.File, sql = create table table1 as
 select * from file,
   dbname = testdb)
 
  It errored out with message
 
  NULL
  Warning message:
  closing unused connection 3 (C:/JP/Temp/2008.csv)
 
  When this failed, I converted this file from comma
 delimited to tab delimited and used this command
 
  #
  read.csv.sql(In.File, sql = create table table1 as
 select * from file,
   dbname = testdb, sep = \t)
 
  and this worked, it created testdb sqlite file with
 the size of 3GB
 
  now my question is in 3 parts.
 
  1. Is it possible to create a dataframe with
 appropriate column classes and use that column classes when
 I use the read.csv.sql command to create the table.
 Something like may be create the table from that DF and then
 update with read.csv.sql.?
 
  Any example code will be really helpful.
 
 Here is an example of using method = name__class. 
 Note there are
 two underscores in a row.  It appears I neglected to
 document that
 Date2 means convert from character representation whereas
 Date means
 convert from numeric representation.  It would also be
 possible to use
 method = raw and then coerce the columns yourself
 afterwards.
 
 # create test file
 Lines - 'A__Date2|B
 2000-01-01|x,y
 2000-01-02|c,d
 '
 tf - tempfile()
 cat(Lines, file = tf)
 
 
 library(sqldf)
 DF - read.csv.sql(tf, sep = |, method =
 name__class)
 str(DF)
 
 
  2. If we use the H2 database instead of default sqlite
 and use the readcsv option, will that be faster and is there
 a way we can specify the above thought of applying a DF
 class to table column properties and update with CSVREAD
 
  library(RH2)
  something like SELECT * FROM
 CSVREAD('C:/JP/Temp/2008.csv')
 
  Any example code will be really helpful.
 
 Sorry, I haven't tested the speed of this.  postgresql
 and mysql, both
 supported by sqldf, also have builtin methods to read
 files. If I had
 to guess I would guess that mysql would be fastest but this
 would have
 to be tested.
 
 
  3. How do we specify where the H2 file is saved. Saw
 something like this, when I ran this example from RH2
 package, couldn't find the file in the working directory.
 
  con - dbConnect(H2(), jdbc:h2:~/test, sa, )
 
 ~ means your home directory so ~/test means test is in the
 home directory.
 
 Try
 
 normalizePath(~)
 normalizePath(~/test)
 etc.
 
 to see what they refer to.
 
 Regards.
 
 
  Sorry for the long mail. Appreciate all for building a
 great community and for the wonderful software in R.
  Thanks for Gabor Grothendieck for bring sqldf to this
 great community.
 
  Any help or direction you can provide in this is
 highly appreciated.
 
  Thanks all.
 
  __
  R-help@r-project.org
 mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained,
 reproducible code.
 
 
 
 
 -- 
 Statistics  Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sqldf INSERT INTO

2011-04-20 Thread Gabor Grothendieck
On Wed, Apr 20, 2011 at 12:39 PM, new2R bv_agr...@yahoo.co.in wrote:
  Hi,

 I am new to R and trying to migrate from SAS. I am trying to copy data from
 one table to another table which have same columns using sqldf. but not
 working and showing NULL

 I wrote statement as sqldf(INSERT INTO new select * from data) but showing
 NULL

 Please help me in this regard.


In your example new is a table in the sqlite database, not in R's
workspace, so you have to return it:

 library(sqldf)
 BOD
  Time demand
118.3
22   10.3
33   19.0
44   16.0
55   15.6
67   19.8
 New - BOD[1, ]
 BOD1 - BOD[2:3,]
 sqldf(c(insert into New select * from BOD1, select * from New))
  Time demand
118.3
22   10.3
33   19.0


-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SQLDF syntax

2011-04-19 Thread Gabor Grothendieck
On Mon, Apr 18, 2011 at 6:34 PM, new2R bv_agr...@yahoo.co.in wrote:
 Hi,

 I am new to R and trying to migrate from SAS. I am trying to use sqldf to
 create a new table from existed table and change some of the columns. I have
 table called DataOld with columns commodity, rate and total and I am trying
 to create new table called DataNew with columns commodity, ratenew and
 totalNew.

 sqldf(create table datanew as select commodity, ratenew as rate * 10,
 totalnew as total *10 from DataOld)

 I got error message  Error in sqliteExecStatement(con, statement,
 bind.data) :
  RS-DBI driver: (error in statement: near *: syntax error)


Its expression as name, not name as expression.  Try this:

 library(sqldf)
 BODnew - sqldf(select demand, Time, demand + 1 as demandPlusOne from BOD)
 BODnew
  demand Time demandPlusOne
18.31   9.3
2   10.32  11.3
3   19.03  20.0
4   16.04  17.0
5   15.65  16.6
6   19.87  20.8

For more, the sqldf home page at http://sqldf.googlecode.com has links
to sqlite site where you can find sql syntax diagrams. See the links
along the left side of the page there.




-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SQLDF syntax

2011-04-19 Thread new2R
Thank you very much. Its working.

--
View this message in context: 
http://r.789695.n4.nabble.com/SQLDF-syntax-tp3458919p3460448.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SQLDF - Submitting Queries with R Objects as Columns

2011-03-09 Thread Gabor Grothendieck
On Wed, Mar 9, 2011 at 11:41 AM, Mike Schumacher
mike.schumac...@gmail.com wrote:
 Fellow R programmers,

 I'd like to submit SQLDF statements with R objects as column names.

 For example, I want to assign X to var1  (var1-X) and then refer to
 var1 in the SQLDF statement.  SQLDF needs to understand that when I
 reference var1, it should look for X in the dataframe.

 This is necessary because my SQLDF is part of a larger function that I call
 that repeatedly with different column names.

 Code below... thank you in advance!

 Mike


 library(sqldf)

 testdf-data.frame(c(1,2,3,4,5,6,7,8,9,10),c(1,1,1,2,2,2,3,3,3,3))
 names(testdf)-c(X,Y)

 # Works as intended
 sqldf(select sum(X) as XSUM,
       Y             as Y
       from testdf
       group by Y)

 # Now... can I reference var1 in the code?
 var1-X


Here are two ways:

sqldf(sprintf(select sum(%s) XSUM, Y from testdf group by Y, var1))

fn$sqldf(select sum($var1) XSUM, Y from testdf group by Y)

See ?sprintf

fn comes from the gsubfn package (which is automatically pulled in by
sqldf) and adds quasi perl style string interpolation to the arguments
passed to the function call it prefaces.  See
http://gsubfn.googlecode.com and ?fn


-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SQLDF - Submitting Queries with R Objects as Columns

2011-03-09 Thread Rob Tirrell
You're submitting queries for SQLDF to execute as strings. So, if you want
to use a variable column name, sprintf() or paste() your statement together,
like:

sqldf(sprintf('select sum(%s) as XSUM, Y as Y from testdf group by Y',
var1))

--
Robert Tirrell | r...@stanford.edu | (607) 437-6532
Program in Biomedical Informatics | Butte Lab | Stanford University


 sqldf(select sum(return(var1)) as XSUM,
   Y as Y
   from testdf
   group by Y)



 --
 Michael Schumacher
 mike.schumac...@gmail.com
 Manager Data  Analytics, ValueClick
 818-851-8638

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf hanging on macintosh - works on windows

2010-11-02 Thread GL

Marc: Installing Simon's package worked perfectly. Thanks so much! 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-hanging-on-macintosh-works-on-windows-tp3022193p3023736.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf error only on Unix not Windows

2010-11-01 Thread Gabor Grothendieck
On Mon, Nov 1, 2010 at 9:28 AM, Alex Bryant abry...@i-review.com wrote:
 Hello Group,

                I am having trouble with the sqldf package on unix.  The same 
 code works fine on windows.

 Silly Example script:

 # Load the package
 library(sqldf)

 # Use the titanic data set

 data(women)
 colnames(women)
 head(women)

 sqldf('select height, count(*) from women where height is not null group by 
 weight')


Some things to try:

-  try adding dbname = tempfile() argument to your sqldf statement and
see if that makes any difference

- try it with the H2 database rather than sqlite (or with PostgreSQL)
  To use it with H2 make sure you have Java and the CRAN package, RH2,
installed.
  RH2 includes the H2 database itself so you don't need to install that.
  Then issue this line in R any time before your first sqldf call
 library(RH2)
  sqldf will notice it and automatically use the H2 database instead of sqlite

- try it with R 2.11

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf hanging on macintosh - works on windows

2010-11-01 Thread Gabor Grothendieck
On Mon, Nov 1, 2010 at 9:59 AM, GL pfl...@shands.ufl.edu wrote:

 Have a long script that runs fine on windows (32 bit). When I try to run in
 on two different macs (64 bit), however, it hangs with identical behavior.

 I start with:
 library(sqldf)

 This results in messages:
 Loading required package: DBI
 Loading required package: RSQLite
 Loading required package: RSQLite.extfuns
 Loading required package: gsubfn
 Loading required package: proto
 Loading required package: chron

 I then read some data, etc.

 I execute the following:

 #merge raw data and all possible combinations
  df.final - sqldf('select Date, Hour, x as RoomsInUse from
 df.possible.combos
    left join df.aggregate using (Hour, Date)')

 I receive the messages:
 Loading required package: tcltk
 Loading Tcl/Tk interface ...
 +

 Then I get into some kind of loop. Message at bottom ribbon says:

 executing:
 try(gsub('\\s+','',paste(capture.output(print(arg(summary))),collapse=)),silent=TRUE)


That is not a line that appears in the sqldf source code.  Try these
suggestions anyways:

http://permalink.gmane.org/gmane.comp.lang.r.general/209443


-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf hanging on macintosh - works on windows

2010-11-01 Thread GL

added library(RH2)

Still get message:

Loading required package: tcltk
Loading Tcl/Tk interface
+

directly after sqldf statement 

   df.final - sqldf('select Date, Hour, x as RoomsInUse from
 df.possible.combos
+ left join df.aggregate using (Hour, Date)')

There is no progress spinner. If I hit enter I get a 

At that point I start to enter any command (just summary, for instance), I
get the progress spinner, the
try(gsub('\\s+','',paste(capture.output(print(arg(summary))),collapse=)),silent=TRUE)
 message in the bottom ribbon, and the system apparently hangs. 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-hanging-on-macintosh-works-on-windows-tp3022193p3022233.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf hanging on macintosh - works on windows

2010-11-01 Thread Gabor Grothendieck
On Mon, Nov 1, 2010 at 10:32 AM, GL pfl...@shands.ufl.edu wrote:

 added library(RH2)

 Still get message:

 Loading required package: tcltk
 Loading Tcl/Tk interface
 +

 directly after sqldf statement

   df.final - sqldf('select Date, Hour, x as RoomsInUse from
 df.possible.combos
 +     left join df.aggregate using (Hour, Date)')

 There is no progress spinner. If I hit enter I get a 

 At that point I start to enter any command (just summary, for instance), I
 get the progress spinner, the
 try(gsub('\\s+','',paste(capture.output(print(arg(summary))),collapse=)),silent=TRUE)
  message in the bottom ribbon, and the system apparently hangs.

I don't have a Mac but if you wish to pursue it try this:

library(sqldf)
debug(sqldf)
sqldf(...whatever...)
# now step through it by repeatedly pressing Enter and send me the
console output of the session

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf hanging on macintosh - works on windows

2010-11-01 Thread GL


 
 library(sqldf)
Loading required package: DBI
Loading required package: RSQLite
Loading required package: RSQLite.extfuns
Loading required package: gsubfn
Loading required package: proto
Loading required package: chron
 debug(sqldf)  
   df.final - sqldf('select Date, Hour, x as RoomsInUse from
 df.possible.combos
+ left join df.aggregate using (Hour, Date)')
debugging in: sqldf(select Date, Hour, x as RoomsInUse from
\df.possible.combos\\nleft join \df.aggregate\ using (Hour, Date))
debug: {
as.POSIXct.character - function(x) structure(as.numeric(x), 
class = c(POSIXt, POSIXct))
as.Date.character - function(x) structure(as.numeric(x), 
class = Date)
as.Date.numeric - function(x, origin = 1970-01-01, ...)
base::as.Date.numeric(x, 
origin = origin, ...)
as.dates.character - function(x) structure(as.numeric(x), 
class = c(dates, times))
as.times.character - function(x) structure(as.numeric(x), 
class = times)
overwrite - FALSE
request.open - missing(x)  is.null(connection)
request.close - missing(x)  !is.null(connection)
request.con - !missing(x)  !is.null(connection)
request.nocon - !missing(x)  is.null(connection)
dfnames - fileobjs - character(0)
if (request.close || request.nocon) {
on.exit({
dbPreExists - attr(connection, dbPreExists)
dbname - attr(connection, dbname)
if (!missing(dbname)  !is.null(dbname)  dbname == 
:memory:) {
dbDisconnect(connection)
} else if (!dbPreExists  drv == sqlite) {
dbDisconnect(connection)
file.remove(dbname)
} else {
for (nam in dfnames) dbRemoveTable(connection, 
  nam)
for (fo in fileobjs) dbRemoveTable(connection, 
  fo)
dbDisconnect(connection)
}
})
if (request.close) {
if (identical(connection, getOption(sqldf.connection))) 
options(sqldf.connection = NULL)
return()
}
}
if (request.open || request.nocon) {
if (is.null(drv)) {
drv - if (package:RpgSQL %in% search()) {
pgSQL
}
else if (package:RMySQL %in% search()) {
MySQL
}
else if (package:RH2 %in% search()) {
H2
}
else SQLite
}
drv - tolower(drv)
if (drv == mysql) {
m - dbDriver(MySQL)
connection - if (missing(dbname) || dbname == :memory:) {
dbConnect(m)
}
else dbConnect(m, dbname = dbname)
dbPreExists - TRUE
}
else if (drv == pgsql) {
m - dbDriver(pgSQL)
if (missing(dbname) || is.null(dbname)) {
dbname - getOption(RpgSQL.dbname)
if (is.null(dbname)) 
  dbname - test
}
connection - dbConnect(m, dbname = dbname)
dbPreExists - TRUE
}
else if (drv == h2) {
m - H2()
if (missing(dbname) || is.null(dbname)) 
dbname - :memory:
dbPreExists - dbname != :memory:  file.exists(dbname)
connection - if (missing(dbname) || dbname == :memory:) {
dbConnect(m, jdbc:h2:mem:, sa, )
}
else {
jdbc.string - paste(jdbc:h2, dbname, sep = :)
dbConnect(m, jdbc.string)
}
}
else {
m - dbDriver(SQLite)
if (missing(dbname)) 
dbname - :memory:
dbPreExists - dbname != :memory:  file.exists(dbname)
if (is.null(getOption(sqldf.dll))) {
dll - Sys.which(libspatialite-1.dll)
if (dll != ) 
  options(sqldf.dll = dll)
else options(sqldf.dll = FALSE)
}
dll - getOption(sqldf.dll)
if (length(dll) != 1 || identical(dll, FALSE) || 
nchar(dll) == 0) {
dll - FALSE
}
else {
if (dll == basename(dll)) 
  dll - Sys.which(dll)
}
options(sqldf.dll = dll)
if (!identical(dll, FALSE)) {
connection - dbConnect(m, dbname = dbname,
loadable.extensions = TRUE)
s - sprintf(select load_extension('%s'), dll)
dbGetQuery(connection, s)
}
else connection - dbConnect(m, dbname = dbname)
init_extensions(connection)
}
attr(connection, dbPreExists) - dbPreExists
if (missing(dbname)  drv == sqlite) 
dbname - :memory:
attr(connection, dbname) - dbname
if (request.open) {
options(sqldf.connection = connection)

Re: [R] sqldf hanging on macintosh - works on windows

2010-11-01 Thread Gabor Grothendieck
On Mon, Nov 1, 2010 at 10:55 AM, GL pfl...@shands.ufl.edu wrote:



 library(sqldf)
 Loading required package: DBI
 Loading required package: RSQLite
 Loading required package: RSQLite.extfuns
 Loading required package: gsubfn
 Loading required package: proto
 Loading required package: chron
 debug(sqldf)
   df.final - sqldf('select Date, Hour, x as RoomsInUse from
 df.possible.combos
 +     left join df.aggregate using (Hour, Date)')
 debugging in: sqldf(select Date, Hour, x as RoomsInUse from
...
 debug: words. - words - strapply(x, [[:alnum:]._]+)
 Browse[2]
 Loading required package: tcltk
 Loading Tcl/Tk interface ...
 +

There is something wrong with tcltk on your system.  You can tell it
not to use tcltk by setting the appropriate option as discussed in
sqldf FAQ #5:

http://code.google.com/p/sqldf/#5._I_get_a_message_about_tcl_being_missing.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf hanging on macintosh - works on windows

2010-11-01 Thread Marc Schwartz
On Nov 1, 2010, at 10:55 AM, Gabor Grothendieck wrote:

 On Mon, Nov 1, 2010 at 10:55 AM, GL pfl...@shands.ufl.edu wrote:
 
 
 
 library(sqldf)
 Loading required package: DBI
 Loading required package: RSQLite
 Loading required package: RSQLite.extfuns
 Loading required package: gsubfn
 Loading required package: proto
 Loading required package: chron
 debug(sqldf)
   df.final - sqldf('select Date, Hour, x as RoomsInUse from
 df.possible.combos
 + left join df.aggregate using (Hour, Date)')
 debugging in: sqldf(select Date, Hour, x as RoomsInUse from
 ...
 debug: words. - words - strapply(x, [[:alnum:]._]+)
 Browse[2]
 Loading required package: tcltk
 Loading Tcl/Tk interface ...
 +
 
 There is something wrong with tcltk on your system.  You can tell it
 not to use tcltk by setting the appropriate option as discussed in
 sqldf FAQ #5:
 
 http://code.google.com/p/sqldf/#5._I_get_a_message_about_tcl_being_missing.


GL,

If you installed R using the OSX binary from CRAN, it does not include tcl/tk. 
You need to install the separate tcltk package that Simon has put together and 
is available from:

  http://cran.us.r-project.org/bin/macosx/tools/

You also need to have X11 installed, which is available from the OSX DVD in the 
Optional Installs section.

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf hanging on macintosh - works on windows

2010-11-01 Thread Gabor Grothendieck
On Mon, Nov 1, 2010 at 12:10 PM, Marc Schwartz marc_schwa...@me.com wrote:
 On Nov 1, 2010, at 10:55 AM, Gabor Grothendieck wrote:

 On Mon, Nov 1, 2010 at 10:55 AM, GL pfl...@shands.ufl.edu wrote:



 library(sqldf)
 Loading required package: DBI
 Loading required package: RSQLite
 Loading required package: RSQLite.extfuns
 Loading required package: gsubfn
 Loading required package: proto
 Loading required package: chron
 debug(sqldf)
   df.final - sqldf('select Date, Hour, x as RoomsInUse from
 df.possible.combos
 +     left join df.aggregate using (Hour, Date)')
 debugging in: sqldf(select Date, Hour, x as RoomsInUse from
 ...
 debug: words. - words - strapply(x, [[:alnum:]._]+)
 Browse[2]
 Loading required package: tcltk
 Loading Tcl/Tk interface ...
 +

 There is something wrong with tcltk on your system.  You can tell it
 not to use tcltk by setting the appropriate option as discussed in
 sqldf FAQ #5:

 http://code.google.com/p/sqldf/#5._I_get_a_message_about_tcl_being_missing.


 GL,

 If you installed R using the OSX binary from CRAN, it does not include 
 tcl/tk. You need to install the separate tcltk package that Simon has put 
 together and is available from:

  http://cran.us.r-project.org/bin/macosx/tools/

 You also need to have X11 installed, which is available from the OSX DVD in 
 the Optional Installs section.

 HTH,

 Marc Schwartz




Note that sqldf can work without tcltk, as well.  The gsubfn package
does check for tcltk and and sets the engine to R rather than
tcltk if

capabilities()[[tcltk]]

is FALSE.  There may be a bug in R or a problem with the installation.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf syntax

2010-08-27 Thread Bond, Stephen
I had checked those references before posting, actually. SQLite has a very 
limited implementation of the standard. To do a single table update I would not 
go to sql. It's easy enough to do in R.

The problem is when I need to do an update from a left outer join, which I had 
to do with sqlSave (to a mySQL table), then sqlQuery, then sqlFetch.
sqlSave is amazingly slow, takes half an hour. (Would never do that at home :-) 
just too lazy to write a formal table def and use load data infile from a csv 
dump.

Also not happy with Dates becoming years in the transition. 
Will check the other suggestion about data.table and report.

Cheers everybody.

Stephen B

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Thursday, August 26, 2010 4:26 PM
To: Bond, Stephen
Cc: r-help@r-project.org
Subject: Re: [R] sqldf syntax

On Thu, Aug 26, 2010 at 2:31 PM, Bond, Stephen stephen.b...@cibc.com wrote:
 Please correct the following

 sqldf(update esc left join forwagg  on esc.ym=forwagg.Date set 
 esc.ri2=forwagg.N1 where esc.age=12,select * from main.esc)
 Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: near left: syntax error)


1. sqldf takes one sql argument whereas the above has two sql
arguments; however, the one argument may be a vector of sql commands.
 See ?sqldf and the examples on the sqldf home page
http://sqldf.googlecode.com

2. there is an error in the syntax of your update statement.  For
correct syntax see the sqlite site:

http://sqlite.org/lang_update.html

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf syntax

2010-08-26 Thread Gabor Grothendieck
On Thu, Aug 26, 2010 at 2:31 PM, Bond, Stephen stephen.b...@cibc.com wrote:
 Please correct the following

 sqldf(update esc left join forwagg  on esc.ym=forwagg.Date set 
 esc.ri2=forwagg.N1 where esc.age=12,select * from main.esc)
 Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: near left: syntax error)


1. sqldf takes one sql argument whereas the above has two sql
arguments; however, the one argument may be a vector of sql commands.
 See ?sqldf and the examples on the sqldf home page
http://sqldf.googlecode.com

2. there is an error in the syntax of your update statement.  For
correct syntax see the sqlite site:

http://sqlite.org/lang_update.html

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SQLDF from Variable Matrix

2010-08-04 Thread Gabor Grothendieck
On Wed, Aug 4, 2010 at 12:29 AM, Suphajak Ngamlak
supha...@phatrasecurities.com wrote:
 Dear all,

 I would like to do sample statistics, e.g. mean, median from very large
 dataset. This is part of commands I use routinely with several dataset
 so I would like to make it into function. The simplified examples are

 Test-data.frame(A=c('a','b','c','a','b','c'),B=c(1,2,3,4,5,6))

 #Create function (This one work)

 GetAvg-function(Input,Bygroup){
 AVG-fn$sqldf(select A, avg(B) as Average, median(B) as Median
                from Test
    group by $Bygroup)
 return(AVG)
 }

 Result-GetAvg(Test,'A')

 #Create function (This one does not work)

 GetAvg-function(Input,Bygroup){
 AVG-fn$sqldf(select A, avg(B) as Average, median(B) as Median
                from $Input
    group by $Bygroup)
 return(AVG)
 }

 Result-GetAvg(Test,'A')

That should be GetAvg('Test', 'A') with quotes around Test or if you
want to be able to specify the data unquoted then insert this line at
the top of your function:

Input - deparse(substitute(Input))

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf 0.3-5 package or tcltk problem

2010-07-28 Thread Gabor Grothendieck
On Wed, Jul 28, 2010 at 1:21 AM,  erickso...@aol.com wrote:



  This is my first post. I am running Mac OS X version 10.6.3. I am running R 
 2.11.0 GUI 1.33 64 bit.

 This may or may not be related to sqldf, but I experienced this problem while 
 attempting to use an sqldf query. The same code runs with no problem on my 
 Windows machine. Here is what happens:

 r=sqldf(select ... )
 Loading required package: tcltk
 Loading Tcl/Tk interface ...

 Then it never loads.

 I have X11 open.

 I have all the latest versions of all the necessary packages for sqldf 0.3-5:

 DBI 0.2-5
 RSQLite 0.9-1
 RSQLite.extfuns 0.0.1
 gsubfn 0.5-3
 proto 0.3-8
 chron 2.3-35

 Although it gives warning messages for these:

 package 'sqldf' was built under R version 2.11.1
 package 'RSQLite' was built under R version 2.11.1
 package 'RSQLite.extfuns' was built under R version 2.11.1
 package 'gsubfun' was built under R version 2.11.1

 What can I do to load the Tcl/Tk interface?

Some things to try:

- upgrade to R 2.11.1
- try this alone: library(tcltk)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf modify table

2010-07-16 Thread Gabor Grothendieck
On Fri, Jul 16, 2010 at 2:46 PM, PeterTucker pthet...@gmail.com wrote:

 Hi - I am something of a newbie and am a little perplexed.  When (trying to)
 modify a table I issue the following commands with subsequent errors

 sqldf(alter table Korea drop column code, dbname = mydb)
 error in statement: near drop: syntax error

 or

 sqldf(alter table Korea rename column hyr to hyrI, dbname = mydb)
 error in statement: near column: syntax error

 These are simple commands - am I missing something obvious?  I can retrieve
 data from them, and retrieve their table_info


SQLite does not support dropping columns. See:

   http://www.sqlite.org/lang_altertable.html

however, sqldf does support the H2 and PostgreSQL databases in
addition to sqlite so you can try one of those if this feature is
important to you.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf: issues with natural joins

2010-05-20 Thread Gabor Grothendieck
There are two problems:

1. A natural join will join all columns with the same names in the two
tables and that includes not only Tid but also dfName and since there
are no rows that have the same Tid and dfName the result has zero
rows.

2. the heuristic it uses fails when you retrieve the same column name
from multiple tables so use method = raw to turn off the heuristic.
The heuristic will be improved to cover this case in the future.
Read FAQ #1 on the home page:
http://code.google.com/p/sqldf/#1._How_does_sqldf_handle_classes_and_factors?

This should work:

 sqldf('select * from main.A join main.B using(Tid)', method = raw)
 Tid dfName dfName
1  AES 01-01-02 11:53:00  a  b
2 AES 01-01-05\n10:58:00  a  b
3  AES 01-01-11 12:30:00  a  b

This works too as the double dfName no longer exists to confuse the heuristic:

names(B)[2] - dfNameB
sqldf('select * from main.A join main.B using(Tid)')



On Thu, May 20, 2010 at 12:04 PM, Nick Switanek nswita...@gmail.com wrote:
 Hello,

 I'm having trouble discovering what's going wrong with my use of natural
 joins via sqldf.

 Following the instructions under 4i at http://code.google.com/p/sqldf/,
 which discusses creating indices to speed joins, I have been only unreliably
 able to get natural joins to work.

 For example,

 Tid - c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES 01-01-05
 10:58:00', 'AES 01-01-11 12:30:00')
 A - data.frame(Tid, dfName = 'a')
 B - data.frame(Tid = Tid[2:4], dfName = 'b')
 C - data.frame(Tid = Tid[1:3], dfName = 'c')

 # then use the sqldf library
 library(sqldf)
 sqldf()

 # to create indices on the Tid variable shared across data.frames
 sqldf('create index indA on A(Tid)')
 sqldf('create index indB on B(Tid)')
 sqldf('create index indC on C(Tid)')

 # check to make sure everything is there
 sqldf('select * from sqlite_master')

 # doing a natural join (implicitly on Tid)
 # does not give the expected joins
 sqldf('select * from main.A natural join main.B')
 [1] Tid    dfName
 0 rows (or 0-length row.names)
 sqldf('select * from main.A natural join main.C')
 [1] Tid    dfName
 0 rows (or 0-length row.names)
 sqldf('select * from main.B natural join main.C')
 [1] Tid    dfName
 0 rows (or 0-length row.names)

 # even using a where clause (which doesn't have the efficiency qualities I
 need the indexed natural joins for) is problematic, setting values of the
 dfName variable incorrectly for the data from C
 sqldf('select * from main.B b, main.C c where b.Tid = c.Tid')
                    Tid dfName                   Tid dfName
 1 AES 01-01-02 11:53:00      b AES 01-01-02 11:53:00      b
 2 AES 01-01-05 10:58:00      b AES 01-01-05 10:58:00      b

 I'm grateful for your guidance on what I'm doing wrong with the natural join
 in sqldf.

 many thanks,
 Nick

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf: issues with natural joins

2010-05-20 Thread Gabor Grothendieck
Although that works I had meant to write:

 names(B)[2] - dfNameB
 # ... other commands
 sqldf('select * from main.A natural join main.B')

so that now only Tid is in common so the natural join just picks it up
and also the heuristic works again since we no longer retrieve
duplicate column names.

On Thu, May 20, 2010 at 12:32 PM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 There are two problems:

 1. A natural join will join all columns with the same names in the two
 tables and that includes not only Tid but also dfName and since there
 are no rows that have the same Tid and dfName the result has zero
 rows.

 2. the heuristic it uses fails when you retrieve the same column name
 from multiple tables so use method = raw to turn off the heuristic.
 The heuristic will be improved to cover this case in the future.
 Read FAQ #1 on the home page:
 http://code.google.com/p/sqldf/#1._How_does_sqldf_handle_classes_and_factors?

 This should work:

 sqldf('select * from main.A join main.B using(Tid)', method = raw)
                     Tid dfName dfName
 1  AES 01-01-02 11:53:00      a      b
 2 AES 01-01-05\n10:58:00      a      b
 3  AES 01-01-11 12:30:00      a      b

 This works too as the double dfName no longer exists to confuse the heuristic:

 names(B)[2] - dfNameB
 sqldf('select * from main.A join main.B using(Tid)')



 On Thu, May 20, 2010 at 12:04 PM, Nick Switanek nswita...@gmail.com wrote:
 Hello,

 I'm having trouble discovering what's going wrong with my use of natural
 joins via sqldf.

 Following the instructions under 4i at http://code.google.com/p/sqldf/,
 which discusses creating indices to speed joins, I have been only unreliably
 able to get natural joins to work.

 For example,

 Tid - c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES 01-01-05
 10:58:00', 'AES 01-01-11 12:30:00')
 A - data.frame(Tid, dfName = 'a')
 B - data.frame(Tid = Tid[2:4], dfName = 'b')
 C - data.frame(Tid = Tid[1:3], dfName = 'c')

 # then use the sqldf library
 library(sqldf)
 sqldf()

 # to create indices on the Tid variable shared across data.frames
 sqldf('create index indA on A(Tid)')
 sqldf('create index indB on B(Tid)')
 sqldf('create index indC on C(Tid)')

 # check to make sure everything is there
 sqldf('select * from sqlite_master')

 # doing a natural join (implicitly on Tid)
 # does not give the expected joins
 sqldf('select * from main.A natural join main.B')
 [1] Tid    dfName
 0 rows (or 0-length row.names)
 sqldf('select * from main.A natural join main.C')
 [1] Tid    dfName
 0 rows (or 0-length row.names)
 sqldf('select * from main.B natural join main.C')
 [1] Tid    dfName
 0 rows (or 0-length row.names)

 # even using a where clause (which doesn't have the efficiency qualities I
 need the indexed natural joins for) is problematic, setting values of the
 dfName variable incorrectly for the data from C
 sqldf('select * from main.B b, main.C c where b.Tid = c.Tid')
                    Tid dfName                   Tid dfName
 1 AES 01-01-02 11:53:00      b AES 01-01-02 11:53:00      b
 2 AES 01-01-05 10:58:00      b AES 01-01-05 10:58:00      b

 I'm grateful for your guidance on what I'm doing wrong with the natural join
 in sqldf.

 many thanks,
 Nick


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf: issues with natural joins

2010-05-20 Thread Nick Switanek
Thank you very much for these clarifying responses, Gabor.

I had mistakenly assumed that creating the index on Tid restricted the
natural join to joining on Tid. Can you describe when and how indices speed
up joins, or can you point me to resources that address this? Is it only for
natural joins or any joins (including, say, a select statement with where
clause)?

thanks,
nick


On Thu, May 20, 2010 at 11:42 AM, Gabor Grothendieck 
ggrothendi...@gmail.com wrote:

 Although that works I had meant to write:

  names(B)[2] - dfNameB
  # ... other commands
  sqldf('select * from main.A natural join main.B')

 so that now only Tid is in common so the natural join just picks it up
 and also the heuristic works again since we no longer retrieve
 duplicate column names.

 On Thu, May 20, 2010 at 12:32 PM, Gabor Grothendieck
 ggrothendi...@gmail.com wrote:
  There are two problems:
 
  1. A natural join will join all columns with the same names in the two
  tables and that includes not only Tid but also dfName and since there
  are no rows that have the same Tid and dfName the result has zero
  rows.
 
  2. the heuristic it uses fails when you retrieve the same column name
  from multiple tables so use method = raw to turn off the heuristic.
  The heuristic will be improved to cover this case in the future.
  Read FAQ #1 on the home page:
 
 http://code.google.com/p/sqldf/#1._How_does_sqldf_handle_classes_and_factors
 ?
 
  This should work:
 
  sqldf('select * from main.A join main.B using(Tid)', method = raw)
  Tid dfName dfName
  1  AES 01-01-02 11:53:00  a  b
  2 AES 01-01-05\n10:58:00  a  b
  3  AES 01-01-11 12:30:00  a  b
 
  This works too as the double dfName no longer exists to confuse the
 heuristic:
 
  names(B)[2] - dfNameB
  sqldf('select * from main.A join main.B using(Tid)')
 
 
 
  On Thu, May 20, 2010 at 12:04 PM, Nick Switanek nswita...@gmail.com
 wrote:
  Hello,
 
  I'm having trouble discovering what's going wrong with my use of natural
  joins via sqldf.
 
  Following the instructions under 4i at http://code.google.com/p/sqldf/,
  which discusses creating indices to speed joins, I have been only
 unreliably
  able to get natural joins to work.
 
  For example,
 
  Tid - c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES
 01-01-05
  10:58:00', 'AES 01-01-11 12:30:00')
  A - data.frame(Tid, dfName = 'a')
  B - data.frame(Tid = Tid[2:4], dfName = 'b')
  C - data.frame(Tid = Tid[1:3], dfName = 'c')
 
  # then use the sqldf library
  library(sqldf)
  sqldf()
 
  # to create indices on the Tid variable shared across data.frames
  sqldf('create index indA on A(Tid)')
  sqldf('create index indB on B(Tid)')
  sqldf('create index indC on C(Tid)')
 
  # check to make sure everything is there
  sqldf('select * from sqlite_master')
 
  # doing a natural join (implicitly on Tid)
  # does not give the expected joins
  sqldf('select * from main.A natural join main.B')
  [1] TiddfName
  0 rows (or 0-length row.names)
  sqldf('select * from main.A natural join main.C')
  [1] TiddfName
  0 rows (or 0-length row.names)
  sqldf('select * from main.B natural join main.C')
  [1] TiddfName
  0 rows (or 0-length row.names)
 
  # even using a where clause (which doesn't have the efficiency qualities
 I
  need the indexed natural joins for) is problematic, setting values of
 the
  dfName variable incorrectly for the data from C
  sqldf('select * from main.B b, main.C c where b.Tid = c.Tid')
 Tid dfName   Tid dfName
  1 AES 01-01-02 11:53:00  b AES 01-01-02 11:53:00  b
  2 AES 01-01-05 10:58:00  b AES 01-01-05 10:58:00  b
 
  I'm grateful for your guidance on what I'm doing wrong with the natural
 join
  in sqldf.
 
  many thanks,
  Nick
 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf: issues with natural joins

2010-05-20 Thread Gabor Grothendieck
They work on any join that is able to make use of them.   If you
preface the select statement with explain query plan then it will give
you some info, e.g.

 sqldf('explain query plan select * from main.A natural join main.B')
  order from  detail
1 00 TABLE A
2 11 TABLE B WITH INDEX indB

Note that its using the index on B but not the index on A so there was
actually no point in adding that one if this were the only query.

This is potentially a large topic so see the sqlite.org site and
mailing list for more info.

On Thu, May 20, 2010 at 1:28 PM, Nick Switanek nswita...@gmail.com wrote:
 Thank you very much for these clarifying responses, Gabor.

 I had mistakenly assumed that creating the index on Tid restricted the
 natural join to joining on Tid. Can you describe when and how indices speed
 up joins, or can you point me to resources that address this? Is it only for
 natural joins or any joins (including, say, a select statement with where
 clause)?

 thanks,
 nick


 On Thu, May 20, 2010 at 11:42 AM, Gabor Grothendieck
 ggrothendi...@gmail.com wrote:

 Although that works I had meant to write:

  names(B)[2] - dfNameB
  # ... other commands
  sqldf('select * from main.A natural join main.B')

 so that now only Tid is in common so the natural join just picks it up
 and also the heuristic works again since we no longer retrieve
 duplicate column names.

 On Thu, May 20, 2010 at 12:32 PM, Gabor Grothendieck
 ggrothendi...@gmail.com wrote:
  There are two problems:
 
  1. A natural join will join all columns with the same names in the two
  tables and that includes not only Tid but also dfName and since there
  are no rows that have the same Tid and dfName the result has zero
  rows.
 
  2. the heuristic it uses fails when you retrieve the same column name
  from multiple tables so use method = raw to turn off the heuristic.
  The heuristic will be improved to cover this case in the future.
  Read FAQ #1 on the home page:
 
  http://code.google.com/p/sqldf/#1._How_does_sqldf_handle_classes_and_factors?
 
  This should work:
 
  sqldf('select * from main.A join main.B using(Tid)', method = raw)
                      Tid dfName dfName
  1  AES 01-01-02 11:53:00      a      b
  2 AES 01-01-05\n10:58:00      a      b
  3  AES 01-01-11 12:30:00      a      b
 
  This works too as the double dfName no longer exists to confuse the
  heuristic:
 
  names(B)[2] - dfNameB
  sqldf('select * from main.A join main.B using(Tid)')
 
 
 
  On Thu, May 20, 2010 at 12:04 PM, Nick Switanek nswita...@gmail.com
  wrote:
  Hello,
 
  I'm having trouble discovering what's going wrong with my use of
  natural
  joins via sqldf.
 
  Following the instructions under 4i at http://code.google.com/p/sqldf/,
  which discusses creating indices to speed joins, I have been only
  unreliably
  able to get natural joins to work.
 
  For example,
 
  Tid - c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES
  01-01-05
  10:58:00', 'AES 01-01-11 12:30:00')
  A - data.frame(Tid, dfName = 'a')
  B - data.frame(Tid = Tid[2:4], dfName = 'b')
  C - data.frame(Tid = Tid[1:3], dfName = 'c')
 
  # then use the sqldf library
  library(sqldf)
  sqldf()
 
  # to create indices on the Tid variable shared across data.frames
  sqldf('create index indA on A(Tid)')
  sqldf('create index indB on B(Tid)')
  sqldf('create index indC on C(Tid)')
 
  # check to make sure everything is there
  sqldf('select * from sqlite_master')
 
  # doing a natural join (implicitly on Tid)
  # does not give the expected joins
  sqldf('select * from main.A natural join main.B')
  [1] Tid    dfName
  0 rows (or 0-length row.names)
  sqldf('select * from main.A natural join main.C')
  [1] Tid    dfName
  0 rows (or 0-length row.names)
  sqldf('select * from main.B natural join main.C')
  [1] Tid    dfName
  0 rows (or 0-length row.names)
 
  # even using a where clause (which doesn't have the efficiency
  qualities I
  need the indexed natural joins for) is problematic, setting values of
  the
  dfName variable incorrectly for the data from C
  sqldf('select * from main.B b, main.C c where b.Tid = c.Tid')
                     Tid dfName                   Tid dfName
  1 AES 01-01-02 11:53:00      b AES 01-01-02 11:53:00      b
  2 AES 01-01-05 10:58:00      b AES 01-01-05 10:58:00      b
 
  I'm grateful for your guidance on what I'm doing wrong with the natural
  join
  in sqldf.
 
  many thanks,
  Nick
 



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Gabor Grothendieck
Can you show the output of dput(x_data) and dput(y_data).

On Fri, Mar 12, 2010 at 11:56 AM, Newbie19_02 nvanzuy...@gmail.com wrote:

 Dear R users,

 I have two data frames that were read from text files as follows:

 x_data - read.table(x.txt, header = TRUE, sep = |, quote = \',
                dec = .,as.is = TRUE,na.strings = NA,colClasses = NA,
 nrows = 3864284,
                skip = 0, check.names = TRUE,fill=TRUE,
                strip.white = TRUE, blank.lines.skip = TRUE,
                comment.char = #, allowEscapes = FALSE, flush = FALSE,
                fileEncoding = , encoding = unknown)

 x_data

 prochi prescribed_date dataMonth item_code res_seqno quantity directions
 CAO713      22/06/2001      NULL    842752      NULL       60        1/D
 CAO713      28/04/2000      NULL      7800      NULL     100G       A/TD
 CAO713      10/04/2000      NULL    842652      NULL       60        1/D
 CAO713      03/07/2000      NULL    842652      NULL       60        1/D
 CAO713      09/01/2001      NULL    842752      NULL       60        1/D
 CAO713      16/10/2001      NULL    842752      NULL       60        1/D
 CAO713      16/08/2001      NULL    842752      NULL       60        1/D
 CAO713      17/09/1993      NULL     39620      NULL      5ML        NIL
 CAO713      01/05/2001      NULL    842752      NULL       60        1/D
 CAO713      05/03/2001      NULL    842752      NULL       60        1/D



 y_data

  item_code    name                              formulation_code  strength
 bnf_code
 100              NEONACLEX K                    TABS      NULL    2.2.8
 110                NEONACLEX                     TABS       5MG    2.2.1
 50                   MESORB                         DRESS 10CMX10CM   20.3.1
 160 ABSORBENT CELLULOSE MESO            DRESS 10CMX10CM   20.3.1
 161 ABSORBENT CELLULOSE MESO            DRESS 10CMX15CM   20.3.1
 164 ABSORBENT CELLULOSE MESO            DRESS 20CMX25CM   20.3.1
 200                  SEPTRIN                        TABS     480MG    5.1.8
 210          SEPTRIN PAED SF                    SUSP 240MG/5ML    5.1.8
 212            SEPTRIN ADULT                     SUSP 480MG/5ML    5.1.8
 220            SEPTRIN FORTE                     TABS     960MG    5.1.8
  etc


 contains all the information for the item codes
 y was read in in the same way.

 I then used the following code:

 z  - sqldf(select * from x left join y using (code))

 when I use this on my real data I get an output:
  prochi prescribed_date dataMonth item_code res_seqno quantity directions
 1  CAO713      22/06/2001      NULL    842752      NULL       60
 1/D
 2  CAO713      28/04/2000      NULL      7800      NULL     100G
 A/TD
 3  CAO713      10/04/2000      NULL    842652      NULL       60
 1/D
 4  CAO713      03/07/2000      NULL    842652      NULL       60
 1/D
 5  CAO713      09/01/2001      NULL    842752      NULL       60
 1/D
 6  CAO713      16/10/2001      NULL    842752      NULL       60
 1/D
 7  CAO713      16/08/2001      NULL    842752      NULL       60
 1/D
 8  CAO713      17/09/1993      NULL     39620      NULL      5ML
 NIL
 9  CAO713      01/05/2001      NULL    842752      NULL       60
 1/D
 10 CAO713      05/03/2001      NULL    842752      NULL       60
 1/D
   no_of_packs datasource scan_ref_no         name formulation_code strength
 1         NULL        TSF        NULL         NA             NA     NA
 2         NULL        TSF        NULL BETNOVATE RD             OINT   0.025%
 3         NULL        TSF        NULL         NA             NA     NA
 4         NULL        TSF        NULL         NA             NA     NA
 5         NULL        TSF        NULL         NA             NA     NA
 6         NULL        TSF        NULL         NA             NA     NA
 7         NULL        TSF        NULL         NA             NA     NA
 8         NULL        TSF        NULL   GAMMABULIN              INJ    320MG
 9         NULL        TSF        NULL         NA             NA     NA
 10        NULL        TSF        NULL         NA             NA     NA
   bnf_code
 1      NA
 2  13.4.1.2
 3      NA
 4      NA
 5      NA
 6      NA
 7      NA
 8      14.5
 9      NA
 10     NA


 There is absolutely no reason for there to be NA anywhere as the
 information for both the tables is complete.

 Not sure what the problem is?

 Thanks,
 Natalie
 --
 View this message in context: 
 http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590786.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list

Re: [R] sqldf not joining all the fields

2010-03-12 Thread Newbie19_02

http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt
feb09_267_presc_items_tsf.txt 

is the total file for y so if I use the command line with the total data for
y then I get the output specified in z

Thanks,
Natalie
-- 
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590804.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Newbie19_02

dput(x_data)

structure(list(prochi = c(CAO713, CAO713, CAO713, 
CAO713, CAO713, CAO713, CAO713, CAO713, 
CAO713, CAO713), prescribed_date = c(22/06/2001, 
28/04/2000, 10/04/2000, 03/07/2000, 09/01/2001, 16/10/2001, 
16/08/2001, 17/09/1993, 01/05/2001, 05/03/2001), dataMonth =
c(NULL, 
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 
NULL), item_code = c(842752, 7800, 842652, 842652, 
842752, 842752, 842752, 39620, 842752, 842752), res_seqno =
c(NULL, 
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 
NULL), quantity = c(60, 100G, 60, 60, 60, 60, 60, 
5ML, 60, 60), directions = c(1/D, A/TD, 1/D, 1/D, 
1/D, 1/D, 1/D, NIL, 1/D, 1/D), no_of_packs = c(NULL, 
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 
NULL), datasource = c(TSF, TSF, TSF, TSF, TSF, TSF, 
TSF, TSF, TSF, TSF), scan_ref_no = c(NULL, NULL, 
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL
)), .Names = c(prochi, prescribed_date, dataMonth, item_code, 
res_seqno, quantity, directions, no_of_packs, datasource, 
scan_ref_no), row.names = c(NA, 10L), class = data.frame)

-- 
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590821.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Newbie19_02

y_data - read.table(feb_267_presc_items_tsf.txt, header = TRUE, sep = |,
quote = \',
dec = .,as.is = TRUE,na.strings = NA,colClasses = NA,
nrows = 3864284,
skip = 0, check.names = TRUE,fill=TRUE,
strip.white = TRUE, blank.lines.skip = TRUE,
comment.char = #, allowEscapes = FALSE, flush = FALSE,
fileEncoding = , encoding = unknown) 

Will read the file in same that I have and I have posted the dput.

Sorry for not giving you what you originally wanted...

Natalie
-- 
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590826.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Gabor Grothendieck
Please provide code that I can just copy from your post and paste into
my session.   Either provide dput output as requested or provide the
files on the internet together with code that reads them off the
internet.

On Fri, Mar 12, 2010 at 12:06 PM, Newbie19_02 nvanzuy...@gmail.com wrote:

 http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt
 feb09_267_presc_items_tsf.txt

 is the total file for y so if I use the command line with the total data for
 y then I get the output specified in z

 Thanks,
 Natalie
 --
 View this message in context: 
 http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590804.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Gabor Grothendieck
What  about y_data?

On Fri, Mar 12, 2010 at 12:14 PM, Newbie19_02 nvanzuy...@gmail.com wrote:

 dput(x_data)

 structure(list(prochi = c(CAO713, CAO713, CAO713,
 CAO713, CAO713, CAO713, CAO713, CAO713,
 CAO713, CAO713), prescribed_date = c(22/06/2001,
 28/04/2000, 10/04/2000, 03/07/2000, 09/01/2001, 16/10/2001,
 16/08/2001, 17/09/1993, 01/05/2001, 05/03/2001), dataMonth =
 c(NULL,
 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 NULL), item_code = c(842752, 7800, 842652, 842652,
 842752, 842752, 842752, 39620, 842752, 842752), res_seqno =
 c(NULL,
 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 NULL), quantity = c(60, 100G, 60, 60, 60, 60, 60,
 5ML, 60, 60), directions = c(1/D, A/TD, 1/D, 1/D,
 1/D, 1/D, 1/D, NIL, 1/D, 1/D), no_of_packs = c(NULL,
 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 NULL), datasource = c(TSF, TSF, TSF, TSF, TSF, TSF,
 TSF, TSF, TSF, TSF), scan_ref_no = c(NULL, NULL,
 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL
 )), .Names = c(prochi, prescribed_date, dataMonth, item_code,
 res_seqno, quantity, directions, no_of_packs, datasource,
 scan_ref_no), row.names = c(NA, 10L), class = data.frame)

 --
 View this message in context: 
 http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590821.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Newbie19_02

The y_data file has over 9000 rows in it so I thought it would be more
practical to give you the file to download
-- 
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread David Winsemius
You have now given two different assignments to x_data and none to  
y_data:


The str( from the file access offering:

 str(x_data)
'data.frame':   2848 obs. of  5 variables:
 $ item_code   : int  100 110 150 160 161 164 200 210 212 220 ...
 $ name: chr  NEONACLEX K NEONACLEX MESORB  
ABSORBENT CELLULOSE MESO ...

 $ formulation_code: chr  TABS TABS DRESS DRESS ...
 $ strength: chr  NULL 5MG 10CMX10CM 10CMX10CM ...
 $ bnf_code: chr  2.2.8 2.2.1 20.3.1 20.3.1 ...

The str from assignment from the dput offering
 str(x_data)
'data.frame':   10 obs. of  10 variables:
 $ prochi : chr  CAO713 CAO713 CAO713  
CAO713 ...
 $ prescribed_date: chr  22/06/2001 28/04/2000 10/04/2000  
03/07/2000 ...

 $ dataMonth  : chr  NULL NULL NULL NULL ...
 $ item_code  : chr  842752 7800 842652 842652 ...
 $ res_seqno  : chr  NULL NULL NULL NULL ...
 $ quantity   : chr  60 100G 60 60 ...
 $ directions : chr  1/D A/TD 1/D 1/D ...
 $ no_of_packs: chr  NULL NULL NULL NULL ...
 $ datasource : chr  TSF TSF TSF TSF ...
 $ scan_ref_no: chr  NULL NULL NULL NULL ...

This code worked, but it is not clear that the x-y assignments were  
correct:


x_data - read.table(file=http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt 
, header = TRUE, sep = |, quote = \',
   dec = .,as.is = TRUE,na.strings = NA,colClasses =  
NA,

nrows = 3864284,
   skip = 0, check.names = TRUE,fill=TRUE,
   strip.white = TRUE, blank.lines.skip = TRUE,
   comment.char = #, allowEscapes = FALSE, flush = FALSE,
   fileEncoding = , encoding = unknown)

--
David.

On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote:



The y_data file has over 9000 rows in it so I thought it would be more
practical to give you the file to download
--
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Newbie19_02

Sorry!  It is the end of the day for me.

So 

dput(x)

structure(list(prochi = c(CAO713, CAO713, CAO713, 
CAO713, CAO713, CAO713, CAO713, CAO713, 
CAO713, CAO713), prescribed_date = c(22/06/2001, 
28/04/2000, 10/04/2000, 03/07/2000, 09/01/2001, 16/10/2001, 
16/08/2001, 17/09/1993, 01/05/2001, 05/03/2001), dataMonth =
c(NULL, 
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 
NULL), item_code = c(842752, 7800, 842652, 842652, 
842752, 842752, 842752, 39620, 842752, 842752), res_seqno =
c(NULL, 
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 
NULL), quantity = c(60, 100G, 60, 60, 60, 60, 60, 
5ML, 60, 60), directions = c(1/D, A/TD, 1/D, 1/D, 
1/D, 1/D, 1/D, NIL, 1/D, 1/D), no_of_packs = c(NULL, 
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 
NULL), datasource = c(TSF, TSF, TSF, TSF, TSF, TSF, 
TSF, TSF, TSF, TSF), scan_ref_no = c(NULL, NULL, 
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL
)), .Names = c(prochi, prescribed_date, dataMonth, item_code, 
res_seqno, quantity, directions, no_of_packs, datasource, 
scan_ref_no), row.names = c(NA, 10L), class = data.frame)


y_data -
read.table(file=http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt;,
header = TRUE, sep = |, quote = \',

  dec = .,as.is = TRUE,na.strings = NA,colClasses = NA,
nrows = 3864284,
  skip = 0, check.names = TRUE,fill=TRUE,
  strip.white = TRUE, blank.lines.skip = TRUE,
  comment.char = #, allowEscapes = FALSE, flush = FALSE,
  fileEncoding = , encoding = unknown)

So the y_data essentially contains the lookup table for the item codes in x.

Thanks and sorry for the mix up
-- 
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590849.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Gabor Grothendieck
Yes, its not clear what data was used.  Please provide a single email
that contains code that can be copied from R and pasted into another
session that will reproduce the problem.

x_data - ...dput output goes here...
y_data - dput output goes here ...
library(sqldf)
... rest of code ...

or else:

x_data - read.table(http://...;, ...)
y_data - read.table(http://...;. ...)
library(sqldf)
... rest of code ...

On Fri, Mar 12, 2010 at 12:29 PM, David Winsemius
dwinsem...@comcast.net wrote:
 You have now given two different assignments to x_data and none to y_data:

 The str( from the file access offering:

 str(x_data)
 'data.frame':   2848 obs. of  5 variables:
  $ item_code       : int  100 110 150 160 161 164 200 210 212 220 ...
  $ name            : chr  NEONACLEX K NEONACLEX MESORB ABSORBENT
 CELLULOSE MESO ...
  $ formulation_code: chr  TABS TABS DRESS DRESS ...
  $ strength        : chr  NULL 5MG 10CMX10CM 10CMX10CM ...
  $ bnf_code        : chr  2.2.8 2.2.1 20.3.1 20.3.1 ...

 The str from assignment from the dput offering
 str(x_data)
 'data.frame':   10 obs. of  10 variables:
  $ prochi         : chr  CAO713 CAO713 CAO713 CAO713
 ...
  $ prescribed_date: chr  22/06/2001 28/04/2000 10/04/2000 03/07/2000
 ...
  $ dataMonth      : chr  NULL NULL NULL NULL ...
  $ item_code      : chr  842752 7800 842652 842652 ...
  $ res_seqno      : chr  NULL NULL NULL NULL ...
  $ quantity       : chr  60 100G 60 60 ...
  $ directions     : chr  1/D A/TD 1/D 1/D ...
  $ no_of_packs    : chr  NULL NULL NULL NULL ...
  $ datasource     : chr  TSF TSF TSF TSF ...
  $ scan_ref_no    : chr  NULL NULL NULL NULL ...

 This code worked, but it is not clear that the x-y assignments were
 correct:

 x_data -
 read.table(file=http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt;,
 header = TRUE, sep = |, quote = \',
               dec = .,as.is = TRUE,na.strings = NA,colClasses = NA,
 nrows = 3864284,
               skip = 0, check.names = TRUE,fill=TRUE,
               strip.white = TRUE, blank.lines.skip = TRUE,
               comment.char = #, allowEscapes = FALSE, flush = FALSE,
               fileEncoding = , encoding = unknown)

 --
 David.

 On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote:


 The y_data file has over 9000 rows in it so I thought it would be more
 practical to give you the file to download
 --
 View this message in context:
 http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 David Winsemius, MD
 West Hartford, CT

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Natalie Van Zuydam
Sorry!  It is the end of the day for me.

So

dput(x)

structure(list(prochi = c(CAO713, CAO713, CAO713,
CAO713, CAO713, CAO713, CAO713, CAO713,
CAO713, CAO713), prescribed_date = c(22/06/2001,
28/04/2000, 10/04/2000, 03/07/2000, 09/01/2001, 16/10/2001,
16/08/2001, 17/09/1993, 01/05/2001, 05/03/2001), dataMonth =
c(NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL), item_code = c(842752, 7800, 842652, 842652,
842752, 842752, 842752, 39620, 842752, 842752), res_seqno =
c(NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL), quantity = c(60, 100G, 60, 60, 60, 60, 60,
5ML, 60, 60), directions = c(1/D, A/TD, 1/D, 1/D,
1/D, 1/D, 1/D, NIL, 1/D, 1/D), no_of_packs = c(NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL), datasource = c(TSF, TSF, TSF, TSF, TSF, TSF,
TSF, TSF, TSF, TSF), scan_ref_no = c(NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL
)), .Names = c(prochi, prescribed_date, dataMonth, item_code,
res_seqno, quantity, directions, no_of_packs, datasource,
scan_ref_no), row.names = c(NA, 10L), class = data.frame)


y_data -
read.table(file=
http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt;,
header = TRUE, sep = |, quote = \',

 dec = .,as.is = TRUE,na.strings = NA,colClasses = NA,
nrows = 3864284,
 skip = 0, check.names = TRUE,fill=TRUE,
 strip.white = TRUE, blank.lines.skip = TRUE,
 comment.char = #, allowEscapes = FALSE, flush = FALSE,
 fileEncoding = , encoding = unknown)

So the y_data essentially contains the lookup table for the item codes in x.

require(sqldf)
z - sqldf(select * from x left join y using (item_code))
z

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread David Winsemius

If I assign the file input to y_data and change you sqldf to

 z  - sqldf(select * from x_data left join y_data using  
(item_code)); z


I can replicate your result. Even after changing the types of the two  
item_code fields to match I still get the same result and when I see  
to what degree they share values I get:


 sum(x_data$item_code %in% y_data$item_code)
[1] 2
 sum(y_data$item_code %in% x_data$item_code)
[1] 2


So why are you so sure they are complete as you claimed in your  
first email.


--
David.


On Mar 12, 2010, at 12:29 PM, David Winsemius wrote:

You have now given two different assignments to x_data and none to  
y_data:


The str( from the file access offering:

 str(x_data)
'data.frame':   2848 obs. of  5 variables:
$ item_code   : int  100 110 150 160 161 164 200 210 212 220 ...
$ name: chr  NEONACLEX K NEONACLEX MESORB  
ABSORBENT CELLULOSE MESO ...

$ formulation_code: chr  TABS TABS DRESS DRESS ...
$ strength: chr  NULL 5MG 10CMX10CM 10CMX10CM ...
$ bnf_code: chr  2.2.8 2.2.1 20.3.1 20.3.1 ...

The str from assignment from the dput offering
 str(x_data)
'data.frame':   10 obs. of  10 variables:
$ prochi : chr  CAO713 CAO713 CAO713  
CAO713 ...
$ prescribed_date: chr  22/06/2001 28/04/2000 10/04/2000  
03/07/2000 ...

$ dataMonth  : chr  NULL NULL NULL NULL ...
$ item_code  : chr  842752 7800 842652 842652 ...
$ res_seqno  : chr  NULL NULL NULL NULL ...
$ quantity   : chr  60 100G 60 60 ...
$ directions : chr  1/D A/TD 1/D 1/D ...
$ no_of_packs: chr  NULL NULL NULL NULL ...
$ datasource : chr  TSF TSF TSF TSF ...
$ scan_ref_no: chr  NULL NULL NULL NULL ...

This code worked, but it is not clear that the x-y assignments  
were correct:


x_data - read.table(file=http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt 
, header = TRUE, sep = |, quote = \',
  dec = .,as.is = TRUE,na.strings = NA,colClasses =  
NA,

nrows = 3864284,
  skip = 0, check.names = TRUE,fill=TRUE,
  strip.white = TRUE, blank.lines.skip = TRUE,
  comment.char = #, allowEscapes = FALSE, flush = FALSE,
  fileEncoding = , encoding = unknown)

--
David.

On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote:



The y_data file has over 9000 rows in it so I thought it would be  
more

practical to give you the file to download
--
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf

2009-11-19 Thread Gabor Grothendieck
Google for sql join and see the examples in Example 4 on the sqldf home page:
http://code.google.com/p/sqldf/#Example_4._Join


On Thu, Nov 19, 2009 at 2:30 PM, JoK LoQ jok...@gmail.com wrote:

 Hello,

   I would like some help with sqldf syntax.

   Suppose I have table 1 and table 2.
   What do I have to do to generate a table with columns 2,5,6 from table 1
 (for example), and columns 3,4,5,9 from table 2, but only when values in
 column 2 from table 1 are equal to values in column 5 from table 2.

 tnks
 --
 View this message in context: 
 http://old.nabble.com/sqldf-tp26421449p26421449.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf with date (class date) variables ?

2009-01-06 Thread Gabor Grothendieck
sqldf has some facilities for Date class but has no knowledge of date class.

See example 7b on home page: http://sqldf.googlecode.com
and here is a second example:

 DF - data.frame(D = as.Date(c(21DEC2006,15DEC2006), %d%b%Y))
 DF
   D
1 2006-12-21
2 2006-12-15
 sqldf(select * from DF)
   D
1 2006-12-21
2 2006-12-15


On Tue, Jan 6, 2009 at 1:09 PM,  si...@ntlworld.com wrote:
 Hello,

 I am having trouble with retrieving some data in queries involve with 
 variables
 with data type date. See the enclosed example:

 ll-c(21DEC2006,15DEC2006)
 ss-data.frame(ll)
 ss-transform(ss,ll=as.date(as.character(ll)))
 (ss)
 ll
 1 17156
 2 17150

 tt-sqldf(select ll from ss)

 (tt)

  ll
 1 NA
 2 NA

 str(ss)
 'data.frame':   2 obs. of  1 variable:
  $ ll:Class 'date'  int [1:2] 17156 17150

 str(tt)
 'data.frame':   2 obs. of  1 variable:
  $ ll:Class 'date'  int [1:2] NA NA

 So all the elements of ll (of tt) has been converted into NA. Is there anyway 
 that I can query the variables with date data type using sqldf?


 Many Thanks,

 Si.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf file specification, non-ASCII

2008-04-03 Thread Duncan Murdoch
On 4/3/2008 10:22 AM, Peter Jepsen wrote:
 Dear R-Listers,
 
 I am a Windows user (R 2.6.2) using the development version of sqldf to
 try to read a 3GB file originally stored in .sas7bdat-format. I convert
 it to comma-delimited ASCII format with StatTransfer before trying to
 import just the rows I need into R. The problem is that I get this
 error:
 
 f - file(hugedata.csv)
 DF - sqldf(select * from f where C_OPR like 'KKA2%',
 file.format=list(header=T, row.names=F))
 Error in try({ : 
   RS-DBI driver: (RS_sqlite_import: hugedata.csv line 1562740 expected
 52 columns of data but found 19)
 Error in sqliteExecStatement(con, statement, bind.data) : 
   RS-DBI driver: (error in statement: no such table: f)

That error message looks pretty clear:  there's a problem on line 
1562740.  Can you look at that line and spot what the problem is?  If 
you don't have a text editor that can handle big files, you should be 
able to do it with something like this:

f - file(hugedata.csv, r)

skip - 1562739
while (skip  1) {
   junk - readLines(f, 1)
   skip - skip - 1
}
junk - readLines(f, skip)
readLines(f, 1)



 
 Now, I know that my SAS-using colleagues are able to use this file with
 SAS, so I was wondering whether StatTransfer'ing it to the SAS XPORT
 format which can be read with the 'read.xport' function in the 'foreign'
 package would be a better approach. 

R can usually read CSV files without a problem.  You've likely got a 
problem in your file on that line; you just need to figure out what it 
is, and fix it.  (It's possible the sqldf function has a bug, but I'd 
suspect the file, first.)

Duncan Murdoch

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf file specification, non-ASCII

2008-04-03 Thread Peter Jepsen
Thank you for your help, Duncan and Gabor. Yes, I found an early line
feed in line 1562740, so I have corrected that error. The thing is, it
takes me many, many hours to save the file, so I would like to confirm
that there are no more errors further down the file. The ffe tool sounds
like a perfect tool for this job, but it doesn't seem to be available
for Windows. Is anybody out there aware of a similar Windows tool?

Thank you again for your help.
Peter.

-Original Message-
From: Gabor Grothendieck [mailto:[EMAIL PROTECTED] 
Sent: 3. april 2008 17:08
To: Peter Jepsen
Subject: Re: [R] sqldf file specification, non-ASCII

One other thing you could try would be to run it through
ffe (fast file extractor) which is a free utility that you can
find via google.  Use the ffe's loose argument.  It can find
bad lines and since its not dependent on R would give
you and independent check.  Regards.

On Thu, Apr 3, 2008 at 10:36 AM, Gabor Grothendieck
[EMAIL PROTECTED] wrote:
 Hi, Can you try it with the first 100 lines, say, of the data and
 also try reading it with read.csv to double check your arguments
 (note that sql args are similar but not entirely identical to
read.csv)
 and if it still gives this error send me that 100 line file and I will
 look at it tonight or tomorrow.  Regards.


 On Thu, Apr 3, 2008 at 10:22 AM, Peter Jepsen [EMAIL PROTECTED] wrote:
  Dear R-Listers,
 
  I am a Windows user (R 2.6.2) using the development version of sqldf
to
  try to read a 3GB file originally stored in .sas7bdat-format. I
convert
  it to comma-delimited ASCII format with StatTransfer before trying
to
  import just the rows I need into R. The problem is that I get this
  error:
 
   f - file(hugedata.csv)
   DF - sqldf(select * from f where C_OPR like 'KKA2%',
  file.format=list(header=T, row.names=F))
  Error in try({ :
   RS-DBI driver: (RS_sqlite_import: hugedata.csv line 1562740
expected
  52 columns of data but found 19)
  Error in sqliteExecStatement(con, statement, bind.data) :
   RS-DBI driver: (error in statement: no such table: f)
 
  Now, I know that my SAS-using colleagues are able to use this file
with
  SAS, so I was wondering whether StatTransfer'ing it to the SAS XPORT
  format which can be read with the 'read.xport' function in the
'foreign'
  package would be a better approach. The problem is, I don't know
  how/whether I can do that at all with sqldf. I tried various ways
like
  f - file(read.xport(hugedata.xport))
  but I consistently got an error message from the sqldf command. I
don't
  recall the exact error message, unfortunately, but can anybody tell
me
  whether it is at all possible to read in files in non-ASCII format
  without having to put them in R memory?
 
  Thank you for your assistance.
  Peter.
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf file specification, non-ASCII

2008-04-03 Thread Gabor Grothendieck
The Windows version is on sourceforge.

On Thu, Apr 3, 2008 at 1:29 PM, Peter Jepsen [EMAIL PROTECTED] wrote:
 Thank you for your help, Duncan and Gabor. Yes, I found an early line
 feed in line 1562740, so I have corrected that error. The thing is, it
 takes me many, many hours to save the file, so I would like to confirm
 that there are no more errors further down the file. The ffe tool sounds
 like a perfect tool for this job, but it doesn't seem to be available
 for Windows. Is anybody out there aware of a similar Windows tool?

 Thank you again for your help.
 Peter.


 -Original Message-
 From: Gabor Grothendieck [mailto:[EMAIL PROTECTED]
 Sent: 3. april 2008 17:08
 To: Peter Jepsen
 Subject: Re: [R] sqldf file specification, non-ASCII

 One other thing you could try would be to run it through
 ffe (fast file extractor) which is a free utility that you can
 find via google.  Use the ffe's loose argument.  It can find
 bad lines and since its not dependent on R would give
 you and independent check.  Regards.

 On Thu, Apr 3, 2008 at 10:36 AM, Gabor Grothendieck
 [EMAIL PROTECTED] wrote:
  Hi, Can you try it with the first 100 lines, say, of the data and
  also try reading it with read.csv to double check your arguments
  (note that sql args are similar but not entirely identical to
 read.csv)
  and if it still gives this error send me that 100 line file and I will
  look at it tonight or tomorrow.  Regards.
 
 
  On Thu, Apr 3, 2008 at 10:22 AM, Peter Jepsen [EMAIL PROTECTED] wrote:
   Dear R-Listers,
  
   I am a Windows user (R 2.6.2) using the development version of sqldf
 to
   try to read a 3GB file originally stored in .sas7bdat-format. I
 convert
   it to comma-delimited ASCII format with StatTransfer before trying
 to
   import just the rows I need into R. The problem is that I get this
   error:
  
f - file(hugedata.csv)
DF - sqldf(select * from f where C_OPR like 'KKA2%',
   file.format=list(header=T, row.names=F))
   Error in try({ :
RS-DBI driver: (RS_sqlite_import: hugedata.csv line 1562740
 expected
   52 columns of data but found 19)
   Error in sqliteExecStatement(con, statement, bind.data) :
RS-DBI driver: (error in statement: no such table: f)
  
   Now, I know that my SAS-using colleagues are able to use this file
 with
   SAS, so I was wondering whether StatTransfer'ing it to the SAS XPORT
   format which can be read with the 'read.xport' function in the
 'foreign'
   package would be a better approach. The problem is, I don't know
   how/whether I can do that at all with sqldf. I tried various ways
 like
   f - file(read.xport(hugedata.xport))
   but I consistently got an error message from the sqldf command. I
 don't
   recall the exact error message, unfortunately, but can anybody tell
 me
   whether it is at all possible to read in files in non-ASCII format
   without having to put them in R memory?
  
   Thank you for your assistance.
   Peter.
  
   __
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf error

2008-01-29 Thread Gabor Grothendieck
I can't reproduce your error:

 library(sqldf)
 sqldf(select  *  from  warpbreaks  limit 6)
  breaks wool tension
1 26A   L
2 30A   L
3 54A   L
4 25A   L
5 70A   L
6 52A   L
 R.version.string # Vista
[1] R version 2.6.2 alpha (2008-01-26 r44181)

A few things to try:

1. maybe you have some other package loaded that is interfering?  The error
occurs in 'combine' but combine is not a function in sqldf nor in RSQLite, DBI
or proto -- the packages on which it depends.  Try it from a new session.  Be
sure you don't have an .Rprofile file that is loading other packages.

Rgui --vanilla
library(sqldf)
sqldf(...command...)

2. Try issuing the source statement in example 6 on the sqldf home page first
(that will get you the development version) and try that.  I doubt that will
solve it but its worth a try. http://sqldf.googlecode.com




On Jan 29, 2008 5:58 AM, Werner Wernersen [EMAIL PROTECTED] wrote:
 Hi,

 sqldf sounds like a very useful package but I don't
 even get the example to run:
  a1s  -  sqldf(select  *  from  warpbreaks  limit
 6)
 Error in combine(FUN(...)) : argument value is
 missing, with no default
 

 I am using R 2.6.1 on Windows Vista Business and have
 updated all packages.

 Some help would be very much appreciated.

 Many thanks,
   Werner

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf error

2008-01-29 Thread Werner Wernersen
I cleaned up the environment and restarted everything
and it does work. But I have no idea what it is what
is different now. Anyway, it works!

Thanks a lot Gabor!

Best,
  Werner

 I can't reproduce your error:
 
  library(sqldf)
  sqldf(select  *  from  warpbreaks  limit 6)
   breaks wool tension
 1 26A   L
 2 30A   L
 3 54A   L
 4 25A   L
 5 70A   L
 6 52A   L
  R.version.string # Vista
 [1] R version 2.6.2 alpha (2008-01-26 r44181)
 
 A few things to try:
 
 1. maybe you have some other package loaded that is
 interfering?  The error
 occurs in 'combine' but combine is not a function in
 sqldf nor in RSQLite, DBI
 or proto -- the packages on which it depends.  Try
 it from a new session.  Be
 sure you don't have an .Rprofile file that is
 loading other packages.
 
 Rgui --vanilla
 library(sqldf)
 sqldf(...command...)
 
 2. Try issuing the source statement in example 6 on
 the sqldf home page first
 (that will get you the development version) and try
 that.  I doubt that will
 solve it but its worth a try.
 http://sqldf.googlecode.com
 
 
 
 
 On Jan 29, 2008 5:58 AM, Werner Wernersen
 [EMAIL PROTECTED] wrote:
  Hi,
 
  sqldf sounds like a very useful package but I
 don't
  even get the example to run:
   a1s  -  sqldf(select  *  from  warpbreaks 
 limit
  6)
  Error in combine(FUN(...)) : argument value is
  missing, with no default
  
 
  I am using R 2.6.1 on Windows Vista Business and
 have
  updated all packages.
 
  Some help would be very much appreciated.
 
  Many thanks,
Werner
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained,
 reproducible code.
 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.