Re: [R] sqldf returns Error: database or disk is full

2022-06-27 Thread Ebert,Timothy Aaron
The simple interpretation is that you have run out of space and it takes sqldf 
30 minutes to work through the error process and then return the error. A 
reason for sql is the more compact representation of a sparse matrix. Changing 
it into a data frame may result in a data structure with many empty cells that 
consumes a great deal more memory.

-Original Message-
From: R-help  On Behalf Of Ian Worthington via 
R-help
Sent: Monday, June 27, 2022 5:06 PM
Cc: R-help 
Subject: Re: [R] sqldf returns Error: database or disk is full

[External Email]

Well, iiuc the df is copied to the db early on in the process. This is constant 
with the temporarily file growing rapidly in size at the start of the process 
and then not challenging for 30 minutes until the error message is produced.
I have no idea what sqldf is doing under the hood.

Sent from Yahoo Mail on Android

  On Mon, 27 Jun 2022 at 23:00, Ebert,Timothy Aaron wrote:   I 
would try answering these questions:
1) What is the expected size?
2) Is the current size smaller or larger?
3) Every program running on your computer takes up memory. Given all the other 
programs currently running, do you have space?

Maybe someone can help by suggesting a way to get R to confess how much memory 
it has available given current system resources.


Tim
-Original Message-
From: R-help  On Behalf Of Bert Gunter
Sent: Monday, June 27, 2022 1:44 PM
To: Ian Worthington 
Cc: R-help 
Subject: Re: [R] sqldf returns Error: database or disk is full

[External Email]

You should probably heed the warning messages you received before doing 
anything further.

Bert

On Mon, Jun 27, 2022, 2:55 PM Ian Worthington via R-help < 
r-help@r-project.org> wrote:

> I'm using sqldf() to execute a query using a named dbname.  It appears 
> to create an external db but after 30 minutes or so returns:
>
> Error: database or disk is full
> The disk is not full, so have I hit some internal db size limit?  The 
> external db created is about 610MB and doesn't grow, so maybe it's 
> running out  of room for the result set, where ever that is stored?
>
> Code being executed is:
>  sqlCmds <- c(#"create index ixsip on ipTrace100(source)",
>    "create index ixdip on ipTrace100(destination)",
>    #"create index ixspo on ipTrace100(SrcPo)",
>    "create index ixdpo on ipTrace100(DstPo)",
>    #"create index ixno  on ipTrace100(`No.`)",
>    # "explain query plan
>    "select tr1.`No.` InNo,  tr1.Time InTime, tr1.source
> InSource, /*tr1.destination InDest,*/ tr1.SrcPo InSrcPo, tr1.DstPo InDstPo,
>    min(tr2.`No.`) OutNo, tr2.Time OutTime /*,  
>tr2.source OutSource,*/ /*tr2.destination OutDest,*/ /*tr2.SrcPo  
>OutSrcPo*/ /*, tr2.DstPo OutDstPo */
>    from main.ipTrace100 tr1
>    join main.ipTrace100 tr2
>    on tr1.destination = '10.27.187.1' and
>  tr1.source = tr2.destination and
>  tr1.SrcPo = tr2.DstPo and
>  tr1.`No.` < tr2.`No.`
>    group by tr1.`No.`"
>    )
>  (tempfile <- tempfile())
>  system.time( sqldf(sqlCmds, dbname=tempfile, verbose=TRUE) ) and  
>console log:
>
> |
> |  |
>
> >  system.time( sqldf(sqlCmds, dbname=tempfile, verbose=TRUE) )
> sqldf: library(RSQLite)
> sqldf: m <- dbDriver("SQLite")
> sqldf: connection <- dbConnect(m, dbname =
> "C:\Users\PIWN01~1\AppData\Local\Temp\RtmpSycE1E\file78e464e72860")
> sqldf: initExtension(connection)
> sqldf: dbWriteTable(connection, 'ipTrace100', ipTrace100, row.names =
> FALSE)
> sqldf: dbGetQuery(connection, 'create index ixdip on
> ipTrace100(destination)')
> Warning in result_fetch(res@ptr, n = n) :
>  SQL statements must be issued with dbExecute() or dbSendStatement()  
>instead of dbGetQuery() or dbSendQuery().
> sqldf: dbGetQuery(connection, 'create index ixdpo on
> ipTrace100(DstPo)') Warning in result_fetch(res@ptr, n = n) :
>  SQL statements must be issued with dbExecute() or dbSendStatement()  
>instead of dbGetQuery() or dbSendQuery().
> sqldf: dbGetQuery(connection, 'select tr1.`No.` InNo,  tr1.Time
> InTime, tr1.source InSource, /*tr1.destination InDest,*/ tr1.SrcPo  
>InSrcPo, tr1.DstPo InDstPo,
>    min(tr2.`No.`) OutNo, tr2.Time OutTime /*,  
>tr2.source OutSource,*/ /*tr2.destination OutDest,*/ /*tr2.SrcPo  
>OutSrcPo*/ /*, tr2.DstPo OutDstPo */
>    from main.ipTrace100 tr1
>    join main.ipTrace100 tr2
>    on tr1.destination = '10.27.187.1' and
>  tr1.source = tr2.destination and
>  tr1.SrcPo = tr2.Ds

Re: [R] sqldf and number of records affected

2020-06-11 Thread Ravi Jeyaraman
You're correct.  It does work.  I was looking at some other result printed.  My 
bad.

Looks like we can also get the same result using 'SELECT changes()'. 

Approach 1:

con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
sqldf()
suppressWarnings(sqldf(c(" update con set V1 = 0 where V1 > 5 ", "select 
changes()")))
sqldf("select * from main.con")
sqldf()

Approach 2:

con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
sqldf()
suppressWarnings(sqldf(c("pragma count_changes = 1", "update con set V1 = 0 
where V1 > 5 ")))
sqldf("select * from main.con")
sqldf()



-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Thursday, June 11, 2020 10:30 AM
To: Ravi Jeyaraman 
Cc: r-help@r-project.org
Subject: Re: [R] sqldf and number of records affected

There is no real difference between your example and the example I provided.
Both use a data.frame in R and both work for me under R 3.5 with RSQLite 2.2.0 
See log below.

Note that there is a bug in R 4.0 related to tcltk that could possibly affect 
sqldf as it uses tcltk.  A fix has been announced for R 4.0.2.

> library(sqldf)
Loading required package: gsubfn
Loading required package: proto
Loading required package: RSQLite
> con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
> sqldf()

  Path: :memory:
  Extensions: TRUE
> sqldf(c("pragma count_changes = 1", "update con set V1 = 0 where V1 > 
> 5 "))
  rows updated
15
Warning message:
In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement() instead 
of dbGetQuery() or dbSendQuery().
> ans <- sqldf("select * from main.con")
> sqldf()
NULL
> ans
   V1
1   1
2   2
3   3
4   4
5   5
6   0
7   0
8   0
9   0
10  0
> R.version.string
[1] "R version 3.5.3 (2019-03-11)"
> packageVersion("sqldf")
[1] ‘0.4.11’
> packageVersion("RSQLite")
[1] ‘2.2.0’
> packageVersion("DBI")
[1] ‘1.1.0’



On Thu, Jun 11, 2020 at 10:06 AM Ravi Jeyaraman  wrote:
>
> Thanks for the response Gabor.  Looks like the below example will work when 
> using SQLite, but in my case I'm just creating a dataframe in R and trying to 
> update it using sqldf as below and it doesn't seem to work ...
>
> con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
> sqldf()
> sqldf(c("pragma count_changes = 1", "update con set V1 = 0 where V1 > 
> 5 ")) ans <- sqldf("select * from main.con")
> sqldf()
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Thursday, June 11, 2020 9:12 AM
> To: Ravi Jeyaraman 
> Cc: r-help@r-project.org
> Subject: Re: [R] sqldf and number of records affected
>
> Here is an example.  Ignore the warning or use the workaround 
> discussed here
> https://github.com/ggrothendieck/sqldf/issues/40
> to avoid the warning.
>
>   library(sqldf)
>   sqldf()  # use same connection until next sqldf()
>   sqldf(c("pragma count_changes = 1", "update BOD set demand = 99 where Time 
> > 4"))
>   sqldf("select * from main.BOD")
>   sqldf()
>
>
> On Thu, Jun 11, 2020 at 9:01 AM Ravi Jeyaraman  wrote:
> >
> > Hello all, When I execute a SQL using SQLDF, how do I get the number 
> > of records affected?  I mean, if I run an UPDATE on a data frame, it 
> > doesn't tell me if and how many records got updated.  I've read 
> > through the documentation and there don't seem to be a way to get 
> > this info unless it's done on a database.  Any ideas?
> >
> > Thanks
> > Ravi
> >
> >
> > --
> > This email has been checked for viruses by AVG.
> > https://www.avg.com
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
>
> --
> This email has been checked for viruses by AVG.
> https://www.avg.com
>


--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf and number of records affected

2020-06-11 Thread Gabor Grothendieck
There is no real difference between your example and the example I provided.
Both use a data.frame in R and both work for me under R 3.5 with RSQLite 2.2.0
See log below.

Note that there is a bug in R 4.0 related to tcltk that could possibly affect
sqldf as it uses tcltk.  A fix has been announced for R 4.0.2.

> library(sqldf)
Loading required package: gsubfn
Loading required package: proto
Loading required package: RSQLite
> con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
> sqldf()

  Path: :memory:
  Extensions: TRUE
> sqldf(c("pragma count_changes = 1", "update con set V1 = 0 where V1 > 5 "))
  rows updated
15
Warning message:
In result_fetch(res@ptr, n = n) :
  SQL statements must be issued with dbExecute() or dbSendStatement()
instead of dbGetQuery() or dbSendQuery().
> ans <- sqldf("select * from main.con")
> sqldf()
NULL
> ans
   V1
1   1
2   2
3   3
4   4
5   5
6   0
7   0
8   0
9   0
10  0
> R.version.string
[1] "R version 3.5.3 (2019-03-11)"
> packageVersion("sqldf")
[1] ‘0.4.11’
> packageVersion("RSQLite")
[1] ‘2.2.0’
> packageVersion("DBI")
[1] ‘1.1.0’



On Thu, Jun 11, 2020 at 10:06 AM Ravi Jeyaraman  wrote:
>
> Thanks for the response Gabor.  Looks like the below example will work when 
> using SQLite, but in my case I'm just creating a dataframe in R and trying to 
> update it using sqldf as below and it doesn't seem to work ...
>
> con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
> sqldf()
> sqldf(c("pragma count_changes = 1", "update con set V1 = 0 where V1 > 5 "))
> ans <- sqldf("select * from main.con")
> sqldf()
>
> -Original Message-
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Thursday, June 11, 2020 9:12 AM
> To: Ravi Jeyaraman 
> Cc: r-help@r-project.org
> Subject: Re: [R] sqldf and number of records affected
>
> Here is an example.  Ignore the warning or use the workaround discussed here
> https://github.com/ggrothendieck/sqldf/issues/40
> to avoid the warning.
>
>   library(sqldf)
>   sqldf()  # use same connection until next sqldf()
>   sqldf(c("pragma count_changes = 1", "update BOD set demand = 99 where Time 
> > 4"))
>   sqldf("select * from main.BOD")
>   sqldf()
>
>
> On Thu, Jun 11, 2020 at 9:01 AM Ravi Jeyaraman  wrote:
> >
> > Hello all, When I execute a SQL using SQLDF, how do I get the number
> > of records affected?  I mean, if I run an UPDATE on a data frame, it
> > doesn't tell me if and how many records got updated.  I've read
> > through the documentation and there don't seem to be a way to get this
> > info unless it's done on a database.  Any ideas?
> >
> > Thanks
> > Ravi
> >
> >
> > --
> > This email has been checked for viruses by AVG.
> > https://www.avg.com
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
>
> --
> This email has been checked for viruses by AVG.
> https://www.avg.com
>


--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf and number of records affected

2020-06-11 Thread Ravi Jeyaraman
Thanks for the response Gabor.  Looks like the below example will work when 
using SQLite, but in my case I'm just creating a dataframe in R and trying to 
update it using sqldf as below and it doesn't seem to work ...

con <- data.frame(V1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
sqldf()
sqldf(c("pragma count_changes = 1", "update con set V1 = 0 where V1 > 5 "))
ans <- sqldf("select * from main.con")
sqldf()

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Thursday, June 11, 2020 9:12 AM
To: Ravi Jeyaraman 
Cc: r-help@r-project.org
Subject: Re: [R] sqldf and number of records affected

Here is an example.  Ignore the warning or use the workaround discussed here
https://github.com/ggrothendieck/sqldf/issues/40
to avoid the warning.

  library(sqldf)
  sqldf()  # use same connection until next sqldf()
  sqldf(c("pragma count_changes = 1", "update BOD set demand = 99 where Time > 
4"))
  sqldf("select * from main.BOD")
  sqldf()


On Thu, Jun 11, 2020 at 9:01 AM Ravi Jeyaraman  wrote:
>
> Hello all, When I execute a SQL using SQLDF, how do I get the number 
> of records affected?  I mean, if I run an UPDATE on a data frame, it 
> doesn't tell me if and how many records got updated.  I've read 
> through the documentation and there don't seem to be a way to get this 
> info unless it's done on a database.  Any ideas?
>
> Thanks
> Ravi
>
>
> --
> This email has been checked for viruses by AVG.
> https://www.avg.com
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


-- 
This email has been checked for viruses by AVG.
https://www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf and number of records affected

2020-06-11 Thread Gabor Grothendieck
Here is an example.  Ignore the warning or use the workaround discussed here
https://github.com/ggrothendieck/sqldf/issues/40
to avoid the warning.

  library(sqldf)
  sqldf()  # use same connection until next sqldf()
  sqldf(c("pragma count_changes = 1", "update BOD set demand = 99
where Time > 4"))
  sqldf("select * from main.BOD")
  sqldf()


On Thu, Jun 11, 2020 at 9:01 AM Ravi Jeyaraman  wrote:
>
> Hello all, When I execute a SQL using SQLDF, how do I get the number of
> records affected?  I mean, if I run an UPDATE on a data frame, it doesn't
> tell me if and how many records got updated.  I've read through the
> documentation and there don't seem to be a way to get this info unless it's
> done on a database.  Any ideas?
>
> Thanks
> Ravi
>
>
> --
> This email has been checked for viruses by AVG.
> https://www.avg.com
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf and number of records affected

2020-06-11 Thread Ravi Jeyaraman
Hello all, When I execute a SQL using SQLDF, how do I get the number of
records affected?  I mean, if I run an UPDATE on a data frame, it doesn't
tell me if and how many records got updated.  I've read through the
documentation and there don't seem to be a way to get this info unless it's
done on a database.  Any ideas?

Thanks
Ravi


-- 
This email has been checked for viruses by AVG.
https://www.avg.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf --Warning message:

2016-02-19 Thread Gabor Grothendieck
sqldf does not use Tk so you can ignore this.

On Fri, Feb 19, 2016 at 12:32 PM, Divakar Reddy
 wrote:
> Dear R users,
>
> I'm getting Waring message while trying to load "sqldf" package in R3.2.3
> and assuming that we can ignore this as it's WARNING Message and not an
> error message.
> Can you guide me if my assumption is wrong?
>
>
>> library(sqldf);
> Loading required package: gsubfn
> Loading required package: proto
> Loading required package: RSQLite
> Loading required package: DBI
> Warning message:
> no DISPLAY variable so Tk is not available
>
>> version   _
> platform   x86_64-redhat-linux-gnu
> version.string R version 3.2.3 (2015-12-10)
>>
>
> Thanks,
> Divakar
> Phoenix,USA
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf --Warning message:

2016-02-19 Thread Divakar Reddy
Dear R users,

I'm getting Waring message while trying to load "sqldf" package in R3.2.3
and assuming that we can ignore this as it's WARNING Message and not an
error message.
Can you guide me if my assumption is wrong?


> library(sqldf);
Loading required package: gsubfn
Loading required package: proto
Loading required package: RSQLite
Loading required package: DBI
Warning message:
no DISPLAY variable so Tk is not available

> version   _
platform   x86_64-redhat-linux-gnu
version.string R version 3.2.3 (2015-12-10)
>

Thanks,
Divakar
Phoenix,USA

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf() difference between R 3.1.2 and 3.0.1

2015-02-11 Thread Doran, Harold
That seems to have worked, both in the new and old version of R. I'll do more 
unit testing on other files.

Thank you, Gabor.

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Wednesday, February 11, 2015 10:22 AM
To: Doran, Harold
Cc: r-help@r-project.org
Subject: Re: [R] sqldf() difference between R 3.1.2 and 3.0.1

On Wed, Feb 11, 2015 at 9:45 AM, Doran, Harold hdo...@air.org wrote:
 I have a function written and tested using R 3.0.1 and sqldf_0.4-7.1 that 
 works perfectly. However, using this same code with R 3.1.2 and sqldf_0.4-10 
 yields the error below that I am having a difficult time deciphering. Hence, 
 same code behaves differently on different versions of R and sqldf().

 Error in sqliteSendQuery(con, statement, bind.data) :
   error in statement: no such column: V1


 Reproducible example below as well as complete sessionInfo all provided below.


 My function and code using the function are below.

 dorReader - function(dorFile, layout, sepChar = '\n'){
 sepChar - as.character(sepChar)
 dorFile - as.character(dorFile)
 layout$type2 - ifelse(layout$type == 'C', 'character',
 
 ifelse(layout$type == 'N', 'numeric', 'Date'))
 dor - file(dorFile)
 attr(dor, file.format) - list(sep = sepChar)
 getVars - paste(select,
paste(substr(V1, , layout$Start, , ,
  layout$Length, ) ', layout$Variable.Name, ', 
 collapse = , ), from dor)
 dat - sqldf(getVars)

 classConverter - function(obj, types){
 out - lapply(1:length(obj),FUN = 
 function(i){FUN1 - switch(types[i],character = as.character,numeric = 
 as.numeric,factor = as.factor, Date = as.character); FUN1(obj[,i])})
 names(out) - colnames(obj)
 as.data.frame(out)
 }
 dat - classConverter(dat, layout$type2)
 names(dat) - layout$Variable.Name
 dat
 }

 ### contents of fwf file 'sample.txt'
 1234567
 1234567
 1234567
 1234567
 1234567
 1234567
 1234567
 1234567

 layout - data.frame(Variable.Name =c('test1', 'test2'), Length = 
 c(3,4), Start =c(1,4), End = c(3,7), type = c('N', 'N'))

 tmp - dorReader('sample.txt', layout)

sqldf is documented to use the sqliteImportFile defaults for file.format 
components.  It may be that RSQLite 1.0 has changed the default for header in 
sqliteImportFile.

Try replacing your statement that sets file.format with this:

attr(dor, file.format) - list(sep = sepChar, header = FALSE)
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf() difference between R 3.1.2 and 3.0.1

2015-02-11 Thread Gabor Grothendieck
On Wed, Feb 11, 2015 at 9:45 AM, Doran, Harold hdo...@air.org wrote:
 I have a function written and tested using R 3.0.1 and sqldf_0.4-7.1 that 
 works perfectly. However, using this same code with R 3.1.2 and sqldf_0.4-10 
 yields the error below that I am having a difficult time deciphering. Hence, 
 same code behaves differently on different versions of R and sqldf().

 Error in sqliteSendQuery(con, statement, bind.data) :
   error in statement: no such column: V1


 Reproducible example below as well as complete sessionInfo all provided below.


 My function and code using the function are below.

 dorReader - function(dorFile, layout, sepChar = '\n'){
 sepChar - as.character(sepChar)
 dorFile - as.character(dorFile)
 layout$type2 - ifelse(layout$type == 'C', 'character',
 
 ifelse(layout$type == 'N', 'numeric', 'Date'))
 dor - file(dorFile)
 attr(dor, file.format) - list(sep = sepChar)
 getVars - paste(select,
paste(substr(V1, , layout$Start, , ,
  layout$Length, ) ', layout$Variable.Name, ', 
 collapse = , ), from dor)
 dat - sqldf(getVars)

 classConverter - function(obj, types){
 out - lapply(1:length(obj),FUN = 
 function(i){FUN1 - switch(types[i],character = as.character,numeric = 
 as.numeric,factor = as.factor, Date = as.character); FUN1(obj[,i])})
 names(out) - colnames(obj)
 as.data.frame(out)
 }
 dat - classConverter(dat, layout$type2)
 names(dat) - layout$Variable.Name
 dat
 }

 ### contents of fwf file 'sample.txt'
 1234567
 1234567
 1234567
 1234567
 1234567
 1234567
 1234567
 1234567

 layout - data.frame(Variable.Name =c('test1', 'test2'), Length = c(3,4), 
 Start =c(1,4), End = c(3,7), type = c('N', 'N'))

 tmp - dorReader('sample.txt', layout)

sqldf is documented to use the sqliteImportFile defaults for
file.format components.  It may be that RSQLite 1.0 has changed the
default for header in sqliteImportFile.

Try replacing your statement that sets file.format with this:

attr(dor, file.format) - list(sep = sepChar, header = FALSE)

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf() difference between R 3.1.2 and 3.0.1

2015-02-11 Thread Doran, Harold
I have a function written and tested using R 3.0.1 and sqldf_0.4-7.1 that works 
perfectly. However, using this same code with R 3.1.2 and sqldf_0.4-10 yields 
the error below that I am having a difficult time deciphering. Hence, same code 
behaves differently on different versions of R and sqldf().

Error in sqliteSendQuery(con, statement, bind.data) :
  error in statement: no such column: V1


Reproducible example below as well as complete sessionInfo all provided below.


My function and code using the function are below.

dorReader - function(dorFile, layout, sepChar = '\n'){
sepChar - as.character(sepChar)
dorFile - as.character(dorFile)
layout$type2 - ifelse(layout$type == 'C', 'character',

ifelse(layout$type == 'N', 'numeric', 'Date'))
dor - file(dorFile)
attr(dor, file.format) - list(sep = sepChar)
getVars - paste(select,
   paste(substr(V1, , layout$Start, , ,
 layout$Length, ) ', layout$Variable.Name, ', collapse 
= , ), from dor)
dat - sqldf(getVars)

classConverter - function(obj, types){
out - lapply(1:length(obj),FUN = 
function(i){FUN1 - switch(types[i],character = as.character,numeric = 
as.numeric,factor = as.factor, Date = as.character); FUN1(obj[,i])})
names(out) - colnames(obj)
as.data.frame(out)
}
dat - classConverter(dat, layout$type2)
names(dat) - layout$Variable.Name
dat
}

### contents of fwf file 'sample.txt'
1234567
1234567
1234567
1234567
1234567
1234567
1234567
1234567

layout - data.frame(Variable.Name =c('test1', 'test2'), Length = c(3,4), 
Start =c(1,4), End = c(3,7), type = c('N', 'N'))

tmp - dorReader('sample.txt', layout)

### SessionInfo where functions behaves as expected
 sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  
  LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C   LC_TIME=English_United States.1252

attached base packages:
[1] tcltk stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] sqldf_0.4-7.1 RSQLite.extfuns_0.0.1 RSQLite_0.11.4DBI_0.2-7 
gsubfn_0.6-5
[6] proto_0.3-10  MiscPsycho_1.6statmod_1.4.18

loaded via a namespace (and not attached):
[1] chron_2.3-45 tools_3.0.1




### SessionInfo for version not working
 sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] tcltk stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] sqldf_0.4-10  RSQLite_1.0.0 DBI_0.3.1 gsubfn_0.6-6  proto_0.3-10

loaded via a namespace (and not attached):
[1] chron_2.3-45 tools_3.1.2

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf problems

2014-07-14 Thread Juan Daniel García
Hello:
I'm trying to run this code

data2 - sqldf ( SELECT plot, age,  avg(N) as N FROM data1 GROUP BY plot,
t)

The problem is that when calling sqldf with
require(sqldf)

appears this message

Warning message:
In sqliteCloseConnection(conn, ...) :
  RS-DBI driver warning: (closing pending result sets before closing this
connection)

If try again

require(sqldf)

the message does not appear, but when trying the code afterwards, I get a
new message

Error in sqliteFetch(rs, n = -1, ...) :
  RSQLite driver: (RS_SQLite_fetch: failed: Domain error)

I have a new Windows 8.1

Do I need any other installations o packages or something else?
The code was working in older Windows XP and Linux

Thanks

-- 
**
Juan Daniel García Villabrille, Ingeniero de Montes, colegiado nº 5.319
627 188 682
e-mail: juandaniel...@gmail.com
**
Cuando imprima este correo, no olvide reciclarlo. Porque el papel es el
soporte natural, renovable y reciclable de lo más humano: la palabra.

##-- AVISO --##
La información a incluir en este e-mail es CONFIDENCIAL, siendo de uso
exclusivo del destinatario arriba mencionado. Si usted lee este mensaje y
no es el destinatario indicado, le informamos que está totalmente prohibida
cualquier utilización, divulgación, distribución y/o reproducción de esta
comunicación sin autorización expresa en virtud de la legislación vigente.
Si ha recibido este mensaje por error, le rogamos nos lo notifique
inmediatamente por esta misma vía y proceda a su eliminación.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf problems

2014-07-14 Thread Gabor Grothendieck
2014-07-14 7:52 GMT-04:00 Juan Daniel García juandaniel...@gmail.com:
 Hello:
 I'm trying to run this code

 data2 - sqldf ( SELECT plot, age,  avg(N) as N FROM data1 GROUP BY plot,
 t)

 The problem is that when calling sqldf with
require(sqldf)

 appears this message

 Warning message:
 In sqliteCloseConnection(conn, ...) :
   RS-DBI driver warning: (closing pending result sets before closing this
 connection)

 If try again

require(sqldf)

 the message does not appear, but when trying the code afterwards, I get a
 new message

 Error in sqliteFetch(rs, n = -1, ...) :
   RSQLite driver: (RS_SQLite_fetch: failed: Domain error)

 I have a new Windows 8.1

 Do I need any other installations o packages or something else?
 The code was working in older Windows XP and Linux


See the Troubleshooting section of the sqldf home page.
http://sqldf.googlecode.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf problems

2014-07-14 Thread Jeff Newmiller
One doesn't call a package... so your description is unclear. One doesn't 
load (using the require or library functions) the relevant package AFTER 
calling functions in the package. Since it appears you have left out some steps 
or described them out of order, please supply a reproducible example [1] 
(including a test data set) as the Posting Guide requests.

NB I don't have Windows 8 or 8.1, so if that turns out to be the problem then I 
won't be able to tell, but I have a feeling there is something more basic going 
on here.

[1] 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

On July 14, 2014 4:52:18 AM PDT, Juan Daniel García juandaniel...@gmail.com 
wrote:
Hello:
I'm trying to run this code

data2 - sqldf ( SELECT plot, age,  avg(N) as N FROM data1 GROUP BY
plot,
t)

The problem is that when calling sqldf with
require(sqldf)

appears this message

Warning message:
In sqliteCloseConnection(conn, ...) :
RS-DBI driver warning: (closing pending result sets before closing this
connection)

If try again

require(sqldf)

the message does not appear, but when trying the code afterwards, I get
a
new message

Error in sqliteFetch(rs, n = -1, ...) :
  RSQLite driver: (RS_SQLite_fetch: failed: Domain error)

I have a new Windows 8.1

Do I need any other installations o packages or something else?
The code was working in older Windows XP and Linux

Thanks

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] SQLDF column errors

2013-01-15 Thread Matthew Liebers
I am trying to exclude integer values from a small data frame 1, d1 that
have matching hits in data frame 2, d2 (Very big) which involves matching
those hits first.  I am trying to use sqldf on the df's in the following
fashion:

df1:
V1
12675
14753
16222
18765

df2: head(df2)
V1  V2
13647 rd1500
14753 rd1580
15987 rd1590
16222 rd2020.

df1_new-sqldf(select df1.V1, df2.V2 where rs10.V1 = d10.pos”) - Ideally I
would like to try to use delete or not equal to != though I can only
find that delete works with sqldf.
but it returns this error:
 Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: no such column: df1.V1)
I am also trying this:
df1_new-sqldf(select V1 from df1, V2 from df2 where df1.V1 = df2.V1)
which returns this error:
Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: near from: syntax error)

If anyone with sqldf knowledge could lend me a hand that would be great.
Thanks!

Matt

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SQLDF column errors

2013-01-15 Thread Anthony Damico
i think this is what you want..  :)

http://stackoverflow.com/questions/4765936/using-joined-tables-to-exclude-certain-records

library(sqldf)

# use the mtcars example table
mtcars

# keep the first eight records in a second, separate data set
x - mtcars[ 1:8 , ]

# keep all the records from mtcars where the mpg value doesn't match a
record in the `x` data table
y - sqldf( select * from mtcars where not exists ( select * from x where
x.mpg == mtcars.mpg ) )


# the above example assumes your tables won't fit in RAM.
# if they do, this method is obviously much easier
y - mtcars[ !( mtcars$mpg %in% x$mpg ) , ]




On Tue, Jan 15, 2013 at 4:02 PM, Matthew Liebers mr...@cornell.edu wrote:

 I am trying to exclude integer values from a small data frame 1, d1 that
 have matching hits in data frame 2, d2 (Very big) which involves matching
 those hits first.  I am trying to use sqldf on the df's in the following
 fashion:

 df1:
 V1
 12675
 14753
 16222
 18765

 df2: head(df2)
 V1  V2
 13647 rd1500
 14753 rd1580
 15987 rd1590
 16222 rd2020.

 df1_new-sqldf(select df1.V1, df2.V2 where rs10.V1 = d10.pos”) - Ideally I
 would like to try to use delete or not equal to != though I can only
 find that delete works with sqldf.
 but it returns this error:
  Error in sqliteExecStatement(con, statement, bind.data) :
   RS-DBI driver: (error in statement: no such column: df1.V1)
 I am also trying this:
 df1_new-sqldf(select V1 from df1, V2 from df2 where df1.V1 = df2.V1)
 which returns this error:
 Error in sqliteExecStatement(con, statement, bind.data) :
   RS-DBI driver: (error in statement: near from: syntax error)

 If anyone with sqldf knowledge could lend me a hand that would be great.
 Thanks!

 Matt

 [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf package: using variables in where condition

2013-01-14 Thread Ravi Sreedhar


Sent from my iPhone

On Jan 13, 2013, at 9:47 PM, ravsre ravishre...@hotmail.com wrote:

 I am trying to use the sqldf package to create independent data frames from a
 master dataframe. 
 I want to use sqldf package and perform a simple select statement. However,
 what  I want to do is to create a loop and repeatedly send a variable to the
 where clause condition such as a: select * from mydfrm where Symbol =
 list[i]
 The 'list' is another data frame with a single column with 15 values.  All I
 what to do is to create loop for i in 1:15 and pass on into the select
 statement as list[1], list[2 }..tile list[15]
 
 Can someone help since, I am unable to reference the element location  as 
 variable in the sqldf function?
 
 also, is there a better way?
 would appreciate your help
 
 Rgds, Ravi
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/sqldf-package-using-variables-in-where-condition-tp4655445.html
 Sent from the R help mailing list archive at Nabble.com.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf package: using variables in where condition

2013-01-14 Thread Gabor Grothendieck
On Mon, Jan 14, 2013 at 8:36 AM, Ravi Sreedhar ravishre...@hotmail.com wrote:


 Sent from my iPhone

 On Jan 13, 2013, at 9:47 PM, ravsre ravishre...@hotmail.com wrote:

 I am trying to use the sqldf package to create independent data frames from a
 master dataframe.
 I want to use sqldf package and perform a simple select statement. However,
 what  I want to do is to create a loop and repeatedly send a variable to the
 where clause condition such as a: select * from mydfrm where Symbol =
 list[i]
 The 'list' is another data frame with a single column with 15 values.  All I
 what to do is to create loop for i in 1:15 and pass on into the select
 statement as list[1], list[2 }..tile list[15]

 Can someone help since, I am unable to reference the element location  as
 variable in the sqldf function?


See Example 5 on the sqldf home page.
http://code.google.com/p/sqldf/#Example_5._Insert_Variables

--
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf merging with subset in specific range

2012-12-26 Thread Matthew Liebers
Hi all:

I have two data sets.  Set A includes a long list of hits in a single
column, say:
m$V1
10
15
36
37
38
44
45
57
61
62
69 ...and so on

Set B includes just a few key ranges set up by way of a minimum in column X
and a maximum in column Y.  Say,
n$X n$Y
30   38   # range from 30 to 38
52   62   # range from 52 to 62

I would like the output to be the rows containing the following columns:
m$V1
36
37
38
57
61
62

I am interested in isolating the hits in data set A that correspond to any
of the hotspot ranges in data set B.  I have downloaded sqldf and tried a
couple things but I cannot do a traditional merge since set B is based on a
range.  I can always do a manual subset but I am trying to figure out if
there is anything more expedient since these df's will be quite large.

Thanks!

Matt

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf merging with subset in specific range

2012-12-26 Thread jim holtman
Is this what you want:


  m - read.table(text = 10
+  15
+  36
+  37
+  38
+  44
+  45
+  57
+  61
+  62
+  69 )
 n - read.table(text = 30   38
+  52   62   )

  require(sqldf)
  sqldf(select m.V1
+ from m, n
+ where m.V1 between n.V1 and n.V2
+  )
  V1
1 36
2 37
3 38
4 57
5 61
6 62



On Wed, Dec 26, 2012 at 7:00 PM, Matthew Liebers mr...@cornell.edu wrote:
 Hi all:

 I have two data sets.  Set A includes a long list of hits in a single
 column, say:
 m$V1
 10
 15
 36
 37
 38
 44
 45
 57
 61
 62
 69 ...and so on

 Set B includes just a few key ranges set up by way of a minimum in column X
 and a maximum in column Y.  Say,
 n$X n$Y
 30   38   # range from 30 to 38
 52   62   # range from 52 to 62

 I would like the output to be the rows containing the following columns:
 m$V1
 36
 37
 38
 57
 61
 62

 I am interested in isolating the hits in data set A that correspond to any
 of the hotspot ranges in data set B.  I have downloaded sqldf and tried a
 couple things but I cannot do a traditional merge since set B is based on a
 range.  I can always do a manual subset but I am trying to figure out if
 there is anything more expedient since these df's will be quite large.

 Thanks!

 Matt

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf Date problem

2012-11-04 Thread Andreas Recktenwald


Zitat von jim holtman jholt...@gmail.com:


Most likely your Date is either a character or a factor (you need to
provide an 'str' of the dataframe).  You are therefore most likely
doing a character compare and that is the reason for your problem.
You need to convert to a character string of the format -MM-DD to
do the correct character comparison.

##

x - data.frame(Date = paste0('1/', 1:31, '/2011'))
str(x)

'data.frame':   31 obs. of  1 variable:
 $ Date: Factor w/ 31 levels 1/1/2011,1/10/2011,..: 1 12 23 26 27
28 29 30 31 2 ...

x

Date
1   1/1/2011
2   1/2/2011
3   1/3/2011
4   1/4/2011
5   1/5/2011
6   1/6/2011
7   1/7/2011
8   1/8/2011
9   1/9/2011
10 1/10/2011
11 1/11/2011
12 1/12/2011
13 1/13/2011
14 1/14/2011
15 1/15/2011
16 1/16/2011
17 1/17/2011
18 1/18/2011
19 1/19/2011
20 1/20/2011
21 1/21/2011
22 1/22/2011
23 1/23/2011
24 1/24/2011
25 1/25/2011
26 1/26/2011
27 1/27/2011
28 1/28/2011
29 1/29/2011
30 1/30/2011
31 1/31/2011


require(sqldf)
# not correct because of character compares
sqldf('select * from x where Date  1/13/2011 and Date  1/25/2011')

Date
1   1/2/2011
2  1/14/2011
3  1/15/2011
4  1/16/2011
5  1/17/2011
6  1/18/2011
7  1/19/2011
8  1/20/2011
9  1/21/2011
10 1/22/2011
11 1/23/2011
12 1/24/2011

# convert the date to /MM/DD for character compares
x$newDate - as.character(as.Date(as.character(x$Date), format =  
%m/%d/%Y))

# now do the select
sqldf('select * from x where newDate between 2011-01-13 and 2011-01-25')

DatenewDate
1  1/13/2011 2011-01-13
2  1/14/2011 2011-01-14
3  1/15/2011 2011-01-15
4  1/16/2011 2011-01-16
5  1/17/2011 2011-01-17
6  1/18/2011 2011-01-18
7  1/19/2011 2011-01-19
8  1/20/2011 2011-01-20
9  1/21/2011 2011-01-21
10 1/22/2011 2011-01-22
11 1/23/2011 2011-01-23
12 1/24/2011 2011-01-24
13 1/25/2011 2011-01-25


On Sat, Nov 3, 2012 at 4:22 PM, Andreas Recktenwald
a.recktenw...@mx.uni-saarland.de wrote:

Dear R-help readers,

i've created a database for quotes data (for 4 years; 2007 -- 2010) with the
sqldf package. This database contains a column Date in the format
mm/dd/.

The table in the database is called main.data and the database itself
Honda. I tried to get the Data just for certain period, say from
01/01/2007 until 01/10/2007 with the following code:

sqldf(select * from main.data where Date='01/10/2007' and
Date='01/01/2007'),
   dbname=Honda)


I get the data for this period for every year(2007,2008,2009,2010) not only
for 2007. It seems that the year is overlooked and just looked for the
fitting days and months.

Because I haven't really much experience with sql I decide to send my
problem to the list.

Many thanks in advance.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Thanks for your quick response Jim,

you are right the entries in my Date column are characters (my fault  
not to mention this in my first post).



Now i know the reasons for my problem and can solve it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf Date problem

2012-11-03 Thread Andreas Recktenwald


Dear R-help readers,

i've created a database for quotes data (for 4 years; 2007 -- 2010)  
with the sqldf package. This database contains a column Date in the  
format mm/dd/.


The table in the database is called main.data and the database  
itself Honda. I tried to get the Data just for certain period, say  
from 01/01/2007 until 01/10/2007 with the following code:


sqldf(select * from main.data where Date='01/10/2007' and  
Date='01/01/2007'),

   dbname=Honda)


I get the data for this period for every year(2007,2008,2009,2010) not  
only for 2007. It seems that the year is overlooked and just looked  
for the fitting days and months.


Because I haven't really much experience with sql I decide to send my  
problem to the list.


Many thanks in advance.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf Date problem

2012-11-03 Thread Andreas Recktenwald

Dear R-help readers,

i've created a database for quotes data (for 4 years; 2007 -- 2010)  
with the sqldf package. This database contains a column Date in the  
format mm/dd/.


The table in the database is called main.data and the database  
itself Honda. I tried to get the Data just for certain period, say  
from 01/01/2007 until 01/10/2007 with the following code:


sqldf(select * from main.data where Date='01/10/2007' and  
Date='01/01/2007'),

   dbname=Honda)


I get the data for this period for every year(2007,2008,2009,2010) not  
only for 2007. It seems that the year is overlooked and just looked  
for the fitting days and months.


Because I haven't really much experience with sql I decide to send my  
problem to the list.


Many thanks in advance.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf Date problem

2012-11-03 Thread jim holtman
Most likely your Date is either a character or a factor (you need to
provide an 'str' of the dataframe).  You are therefore most likely
doing a character compare and that is the reason for your problem.
You need to convert to a character string of the format -MM-DD to
do the correct character comparison.

##
 x - data.frame(Date = paste0('1/', 1:31, '/2011'))
 str(x)
'data.frame':   31 obs. of  1 variable:
 $ Date: Factor w/ 31 levels 1/1/2011,1/10/2011,..: 1 12 23 26 27
28 29 30 31 2 ...
 x
Date
1   1/1/2011
2   1/2/2011
3   1/3/2011
4   1/4/2011
5   1/5/2011
6   1/6/2011
7   1/7/2011
8   1/8/2011
9   1/9/2011
10 1/10/2011
11 1/11/2011
12 1/12/2011
13 1/13/2011
14 1/14/2011
15 1/15/2011
16 1/16/2011
17 1/17/2011
18 1/18/2011
19 1/19/2011
20 1/20/2011
21 1/21/2011
22 1/22/2011
23 1/23/2011
24 1/24/2011
25 1/25/2011
26 1/26/2011
27 1/27/2011
28 1/28/2011
29 1/29/2011
30 1/30/2011
31 1/31/2011

 require(sqldf)
 # not correct because of character compares
 sqldf('select * from x where Date  1/13/2011 and Date  1/25/2011')
Date
1   1/2/2011
2  1/14/2011
3  1/15/2011
4  1/16/2011
5  1/17/2011
6  1/18/2011
7  1/19/2011
8  1/20/2011
9  1/21/2011
10 1/22/2011
11 1/23/2011
12 1/24/2011
 # convert the date to /MM/DD for character compares
 x$newDate - as.character(as.Date(as.character(x$Date), format = %m/%d/%Y))
 # now do the select
 sqldf('select * from x where newDate between 2011-01-13 and 2011-01-25')
DatenewDate
1  1/13/2011 2011-01-13
2  1/14/2011 2011-01-14
3  1/15/2011 2011-01-15
4  1/16/2011 2011-01-16
5  1/17/2011 2011-01-17
6  1/18/2011 2011-01-18
7  1/19/2011 2011-01-19
8  1/20/2011 2011-01-20
9  1/21/2011 2011-01-21
10 1/22/2011 2011-01-22
11 1/23/2011 2011-01-23
12 1/24/2011 2011-01-24
13 1/25/2011 2011-01-25


On Sat, Nov 3, 2012 at 4:22 PM, Andreas Recktenwald
a.recktenw...@mx.uni-saarland.de wrote:
 Dear R-help readers,

 i've created a database for quotes data (for 4 years; 2007 -- 2010) with the
 sqldf package. This database contains a column Date in the format
 mm/dd/.

 The table in the database is called main.data and the database itself
 Honda. I tried to get the Data just for certain period, say from
 01/01/2007 until 01/10/2007 with the following code:

 sqldf(select * from main.data where Date='01/10/2007' and
 Date='01/01/2007'),
dbname=Honda)


 I get the data for this period for every year(2007,2008,2009,2010) not only
 for 2007. It seems that the year is overlooked and just looked for the
 fitting days and months.

 Because I haven't really much experience with sql I decide to send my
 problem to the list.

 Many thanks in advance.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-04 Thread Steve Lianoglou
Hi,

On Fri, Feb 3, 2012 at 1:12 PM, HC hca...@yahoo.co.in wrote:
 Thank you.

 The readLines command is working fine and I am able to read 10^6 lines in
 one go and write them using the write.table command.

 Does this readLines command using a block concept to optimize or goes line
 by line?

 Steve has mentioned about *nix and split commands. Would there be any speed
 benefit as compared to readLines?

It's easy to test, right? Just give it a shot and see ...

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-04 Thread HC
Thank you Jim for your reply.

I could figure out that readLines works fine until 35,841,335 lines
(records). 

When the next line is read to be read, a window with R for Windows GUI
front-end has stopped working message comes, with an option to close
program or checking online for a solution.

The tab-separated .txt file I am working on has 9 columns and about 3.25
billion rows. I suspect that the 35,841,336th line is becoming so very big
that RAM out runs its capacity. Perhaps all tabs that are expected after
each values are missing, making the line so very big, I am not sure.

Is there any way to skip this line with readLines or any other function? I
am only hoping that the data after this bad line is good and I can read them
if I can skip the bad one. 

Thank you.
HC

--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4357730.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread HC
Thank you for indicating that SQLite may not handle a file as big as 160 GB.

Would you know of any utility for *physically splitting *the 160 GB text
file into pieces. And if one can control the splitting at the  end of a
record.

Thank you again.
HC

--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4354285.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread Gabor Grothendieck
On Fri, Feb 3, 2012 at 6:03 AM, HC hca...@yahoo.co.in wrote:
 Thank you for indicating that SQLite may not handle a file as big as 160 GB.

 Would you know of any utility for *physically splitting *the 160 GB text
 file into pieces. And if one can control the splitting at the  end of a
 record.


If they are csv files or similar data files then you could use R or
any scripting language to do that.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread Steve Lianoglou
On Fri, Feb 3, 2012 at 7:37 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 On Fri, Feb 3, 2012 at 6:03 AM, HC hca...@yahoo.co.in wrote:
 Thank you for indicating that SQLite may not handle a file as big as 160 GB.

 Would you know of any utility for *physically splitting *the 160 GB text
 file into pieces. And if one can control the splitting at the  end of a
 record.


 If they are csv files or similar data files then you could use R or
 any scripting language to do that.

Or even the *nix `split` command ...

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread HC
This is a 160 GB tab-separated .txt file. It has 9 columns and 3.25x10^9
rows.

Can R handle it?  

Thank you.
HC



--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4354556.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread Gabor Grothendieck
On Fri, Feb 3, 2012 at 8:08 AM, HC hca...@yahoo.co.in wrote:
 This is a 160 GB tab-separated .txt file. It has 9 columns and 3.25x10^9
 rows.

 Can R handle it?


You can process a file N lines at time like this:

con - file(myfile.dat, r)
while(length(Lines - readLines(con, n = N))  0) {
  ... whatever...
}

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread HC
Thank you.

The readLines command is working fine and I am able to read 10^6 lines in
one go and write them using the write.table command.

Does this readLines command using a block concept to optimize or goes line
by line?

Steve has mentioned about *nix and split commands. Would there be any speed
benefit as compared to readLines?

Thank you.
HC

--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4355362.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread HC
Bad news!

The readLines command works fine upto a certain limit. Once a few files have
been written the R program crashes.

I used the following code:
*
iFile-Test.txt
con - file(iFile, r)

N-125; 
iLoop-1
 
while(length(Lines - readLines(con, n = N))  0  iLoop41) { 
oFile-paste(Split_,iLoop,.txt,sep=)
  write.table(Lines, oFile, sep = \t, quote = FALSE, col.names= FALSE,
row.names = FALSE)
  iLoop-iLoop+1
} 
close(con)


With above N=1.25 million, it wrote 28 files of about 57 mb each. That is a
total of about 1.6 GB and then crashed.
I tried with other values on N and it crashes at about the same place in
terms of total size output, i.e., about 1.6 GB.

Is this due to any limitation of Windows 7, in terms of not having the
pointer after this size?

Your insight would be very helpful.

Thank you.
HC






--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4355679.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread jim holtman
Exactly what does crashed mean?  What was the error message?  How
you tried to put:

rm(Lines)
gc()

at the end of the loop to free up and compact memory?  If you watch
the performance, does the R process seem to be growing in terms of the
amount of memory that is being used?  You can add:

memory.size()

before the above statements to see how much memory is being used.
This is just some more elementary debugging that you will have to
learn when using any system.

On Fri, Feb 3, 2012 at 3:22 PM, HC hca...@yahoo.co.in wrote:
 Bad news!

 The readLines command works fine upto a certain limit. Once a few files have
 been written the R program crashes.

 I used the following code:
 *
 iFile-Test.txt
 con - file(iFile, r)

 N-125;
 iLoop-1

 while(length(Lines - readLines(con, n = N))  0  iLoop41) {
 oFile-paste(Split_,iLoop,.txt,sep=)
  write.table(Lines, oFile, sep = \t, quote = FALSE, col.names= FALSE,
 row.names = FALSE)
  iLoop-iLoop+1
 }
 close(con)
 

 With above N=1.25 million, it wrote 28 files of about 57 mb each. That is a
 total of about 1.6 GB and then crashed.
 I tried with other values on N and it crashes at about the same place in
 terms of total size output, i.e., about 1.6 GB.

 Is this due to any limitation of Windows 7, in terms of not having the
 pointer after this size?

 Your insight would be very helpful.

 Thank you.
 HC






 --
 View this message in context: 
 http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4355679.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-02 Thread Gabor Grothendieck
On Wed, Feb 1, 2012 at 11:57 PM, HC hca...@yahoo.co.in wrote:
 Hi All,

 I have a very (very) large tab-delimited text file without headers. There
 are only 8 columns and millions of rows. I want to make numerous pieces of
 this file by sub-setting it for individual stations. Station is given as in
 the first column. I am trying to learn and use sqldf package for this but am
 stuck in a couple of places.

 To simulate my requirement, I have taken iris dataset as an example and have
 done the following:
 (1) create a tab-delimited file without headers.
 (2) read it using read.csv.sql command
 (3) write the result of a query, getting first 10 records

 Here is the reproducible code that I am trying:
 # Text data file
 write.table(iris, irisNoH.txt, sep = \t, quote = FALSE,
 col.names=FALSE,row.names = FALSE)
 # create an empty database (can skip this step if database already exists)
 sqldf(attach myTestdbT as new)
 f1-file(irisNoH.txt)
 attr(f1, file.format) - list(header=FALSE,sep=\t)
 # read into table called irisTab in the mytestdb sqlite database
 read.csv.sql(irisNoH.txt, sql = create table main.irisTab1 as select *
 from file, dbname = mytestdb)
 res1-sqldf(select * from main.irisTab1 limit 10, dbname = mytestdb)
 write.table(res1, iris10.txt, sep = \t, quote = FALSE,
 col.names=FALSE,row.names = FALSE)

 # For querying records of a particular species - unresolved problems
 #a1-virginica
 #attr(f1, names) - c(A1,A2,A3,A4,A5)
 #res2-fn$sqldf(select * from main.irisTab1 where A5 = '$a1')

 In the above, I am not able to:
 (1) assign the names to various columns
 (2) query for particular value of a column; in this case for particular
 species, say virginica
 (3) I guess fn$sqldf can do the job but it requires assigning column names

 Any help would be most appreciated.


Ignoring your iris file for a moment, to query the 5th column (getting
its name via sql rather than via R) we can do this:

library(sqldf)
species - virginica
nms - names(dbGetQuery(con, select * from iris limit 0))
fn$dbGetQuery(con, select * from iris where `nms[5]` = '$species' limit 3)

Now, sqldf is best used when you are getting the data from R but if
you want to store it in a database and just leave it there then you
might be better off using RSQLite directly like this (the eol = \r\n
in the dbWriteTable statement was needed on my Windows system but you
may not need that depending on your platform):


write.table(iris, irisNoH.txt, sep = \t, quote = FALSE, col.names
= FALSE, row.names = FALSE)

library(sqldf)
library(RSQLite)

con - dbConnect(SQLite(), dbname = mytestdb)

dbWriteTable(con, iris, irisNoH.txt, sep = \t, eol = \r\n)

species - virginica
nms - names(dbGetQuery(con, select * from iris limit 0))
fn$dbGetQuery(con, select * from iris where `nms[5]` = '$species' limit 3)

dbDisconnect(con)


-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-02 Thread Gabor Grothendieck
On Thu, Feb 2, 2012 at 3:11 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 On Wed, Feb 1, 2012 at 11:57 PM, HC hca...@yahoo.co.in wrote:
 Hi All,

 I have a very (very) large tab-delimited text file without headers. There
 are only 8 columns and millions of rows. I want to make numerous pieces of
 this file by sub-setting it for individual stations. Station is given as in
 the first column. I am trying to learn and use sqldf package for this but am
 stuck in a couple of places.

 To simulate my requirement, I have taken iris dataset as an example and have
 done the following:
 (1) create a tab-delimited file without headers.
 (2) read it using read.csv.sql command
 (3) write the result of a query, getting first 10 records

 Here is the reproducible code that I am trying:
 # Text data file
 write.table(iris, irisNoH.txt, sep = \t, quote = FALSE,
 col.names=FALSE,row.names = FALSE)
 # create an empty database (can skip this step if database already exists)
 sqldf(attach myTestdbT as new)
 f1-file(irisNoH.txt)
 attr(f1, file.format) - list(header=FALSE,sep=\t)
 # read into table called irisTab in the mytestdb sqlite database
 read.csv.sql(irisNoH.txt, sql = create table main.irisTab1 as select *
 from file, dbname = mytestdb)
 res1-sqldf(select * from main.irisTab1 limit 10, dbname = mytestdb)
 write.table(res1, iris10.txt, sep = \t, quote = FALSE,
 col.names=FALSE,row.names = FALSE)

 # For querying records of a particular species - unresolved problems
 #a1-virginica
 #attr(f1, names) - c(A1,A2,A3,A4,A5)
 #res2-fn$sqldf(select * from main.irisTab1 where A5 = '$a1')

 In the above, I am not able to:
 (1) assign the names to various columns
 (2) query for particular value of a column; in this case for particular
 species, say virginica
 (3) I guess fn$sqldf can do the job but it requires assigning column names

 Any help would be most appreciated.


 Ignoring your iris file for a moment, to query the 5th column (getting
 its name via sql rather than via R) we can do this:

 library(sqldf)
 species - virginica
 nms - names(dbGetQuery(con, select * from iris limit 0))
 fn$dbGetQuery(con, select * from iris where `nms[5]` = '$species' limit 3)

 Now, sqldf is best used when you are getting the data from R but if
 you want to store it in a database and just leave it there then you
 might be better off using RSQLite directly like this (the eol = \r\n
 in the dbWriteTable statement was needed on my Windows system but you
 may not need that depending on your platform):


 write.table(iris, irisNoH.txt, sep = \t, quote = FALSE, col.names
 = FALSE, row.names = FALSE)

 library(sqldf)
 library(RSQLite)

 con - dbConnect(SQLite(), dbname = mytestdb)

 dbWriteTable(con, iris, irisNoH.txt, sep = \t, eol = \r\n)

 species - virginica
 nms - names(dbGetQuery(con, select * from iris limit 0))
 fn$dbGetQuery(con, select * from iris where `nms[5]` = '$species' limit 3)

 dbDisconnect(con)

There seems to have been a pasting error here.  The first part was
intended to show how to do this using sqldf and the second using
RSQLite.Thus the first part was intended to be:

library(sqldf)
species - virginica

# obviously we could just do nms - names(iris) but to get
# names from database instead
nms - names(dbGetQuery(con, select * from iris limit 0))

# use 5th column
fn$sqldf(select * from iris where `nms[5]` = '$species' limit 3)


and the second part that illustrates RSQLite was ok.  Note that fn$
comes from the gsubfn package which sqldf loads.





-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-02 Thread HC
Hi Gabor,

Thank you very much for your guidance and help.

I could run the following code successfully on a 500 mb test data file. A
snapshot of the data file is attached herewith.

code start***
library(sqldf)
library(RSQLite)

iFile-Test100.txt
con - dbConnect(SQLite(),dbname = myTest100)
dbWriteTable(con, TestDB100, iFile, sep = \t) #, eol = \r\n)
nms - names(dbGetQuery(con, select * from TestDB100 limit 0))

nRec-fn$dbGetQuery(con, select count(*)from TestDB100)
aL1-1;

while (aL1=nRec){
res1-fn$dbGetQuery(con, select * from (select * from TestDB100 limit
'$aL1',1))
istn-res1[1,1]
res1-fn$dbGetQuery(con, select * from TestDB100 where `nms[1]` = '$istn')
icount-dim(res1)[1]
oFile-paste(istn,_Test.txt,sep=)
write.table(res1, oFile, sep = \t, quote = FALSE, col.names= FALSE,
row.names = FALSE)
aL1-aL1+icount
}
dbDisconnect(con)
code end***

However, the actual data file that I want to handle is about *160 GB*. And
when I use the same above code on that file, it gives following error for
dbWriteTable(con, ...) statement
error start**
dbWriteTable(con, TestDB, iFile, sep = \t) #, eol = \r\n)
Error in try({ : RS-DBI driver: (RS_sqlite_getline could not realloc)
[1] FALSE
error end**

I am not sure about the reason of this error. Is this due to the big file
size? I understood from sqldf webpage that SQLite can work for even a larger
file than this and is only restricted by the disc space and not RAM. I have
about 400GB free space on the PC I am using, with Windows 7 as the operating
system. I am assuming that the about dbWriteTable command is using the disc
memory only and is not the issue.

In fact this file has been created using MySQLdump and I do not have access
to the original MYSQL database file. 
I want to know the following:
(1)  Am I missing something in the above code that is preventing handling of
this big 160  GB file?
(2)  Should this be handled outside of R, if R is becoming a limitation in
this? And if yes then what is a possible way forward?

Thank you again for your quick response and all the help.
HC
http://r.789695.n4.nabble.com/file/n4353362/Test100.txt Test100.txt 
 





--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4353362.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-02 Thread Gabor Grothendieck
On Thu, Feb 2, 2012 at 8:07 PM, HC hca...@yahoo.co.in wrote:
 Hi Gabor,

 Thank you very much for your guidance and help.

 I could run the following code successfully on a 500 mb test data file. A
 snapshot of the data file is attached herewith.

 code start***
 library(sqldf)
 library(RSQLite)

 iFile-Test100.txt
 con - dbConnect(SQLite(),dbname = myTest100)
 dbWriteTable(con, TestDB100, iFile, sep = \t) #, eol = \r\n)
 nms - names(dbGetQuery(con, select * from TestDB100 limit 0))

 nRec-fn$dbGetQuery(con, select count(*)from TestDB100)
 aL1-1;

 while (aL1=nRec){
 res1-fn$dbGetQuery(con, select * from (select * from TestDB100 limit
 '$aL1',1))
 istn-res1[1,1]
 res1-fn$dbGetQuery(con, select * from TestDB100 where `nms[1]` = '$istn')
 icount-dim(res1)[1]
 oFile-paste(istn,_Test.txt,sep=)
 write.table(res1, oFile, sep = \t, quote = FALSE, col.names= FALSE,
 row.names = FALSE)
 aL1-aL1+icount
 }
 dbDisconnect(con)
 code end***

 However, the actual data file that I want to handle is about *160 GB*. And
 when I use the same above code on that file, it gives following error for
 dbWriteTable(con, ...) statement
 error start**
 dbWriteTable(con, TestDB, iFile, sep = \t) #, eol = \r\n)
 Error in try({ : RS-DBI driver: (RS_sqlite_getline could not realloc)
 [1] FALSE
 error end**

 I am not sure about the reason of this error. Is this due to the big file
 size? I understood from sqldf webpage that SQLite can work for even a larger
 file than this and is only restricted by the disc space and not RAM. I have
 about 400GB free space on the PC I am using, with Windows 7 as the operating
 system. I am assuming that the about dbWriteTable command is using the disc
 memory only and is not the issue.

 In fact this file has been created using MySQLdump and I do not have access
 to the original MYSQL database file.
 I want to know the following:
 (1)  Am I missing something in the above code that is preventing handling of
 this big 160  GB file?
 (2)  Should this be handled outside of R, if R is becoming a limitation in
 this? And if yes then what is a possible way forward?

 Thank you again for your quick response and all the help.
 HC
 http://r.789695.n4.nabble.com/file/n4353362/Test100.txt Test100.txt


I think its unlikely SQLite could handle a database that large unless
you can divide it into multiple separate databases.  At one time the
SQLite site said it did not handle databases over 1 GB and although I
think that is outdated by more recent versions of SQLite its still
likely true that your size is too large for it.


-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf for Very Large Tab Delimited Files

2012-02-01 Thread HC
Hi All,

I have a very (very) large tab-delimited text file without headers. There
are only 8 columns and millions of rows. I want to make numerous pieces of
this file by sub-setting it for individual stations. Station is given as in
the first column. I am trying to learn and use sqldf package for this but am
stuck in a couple of places.

To simulate my requirement, I have taken iris dataset as an example and have
done the following:
(1) create a tab-delimited file without headers.
(2) read it using read.csv.sql command
(3) write the result of a query, getting first 10 records

Here is the reproducible code that I am trying:
# Text data file
write.table(iris, irisNoH.txt, sep = \t, quote = FALSE,
col.names=FALSE,row.names = FALSE)
# create an empty database (can skip this step if database already exists)
sqldf(attach myTestdbT as new)
f1-file(irisNoH.txt)
attr(f1, file.format) - list(header=FALSE,sep=\t)
# read into table called irisTab in the mytestdb sqlite database
read.csv.sql(irisNoH.txt, sql = create table main.irisTab1 as select *
from file, dbname = mytestdb)
res1-sqldf(select * from main.irisTab1 limit 10, dbname = mytestdb)
write.table(res1, iris10.txt, sep = \t, quote = FALSE,
col.names=FALSE,row.names = FALSE)

# For querying records of a particular species - unresolved problems
#a1-virginica
#attr(f1, names) - c(A1,A2,A3,A4,A5)
#res2-fn$sqldf(select * from main.irisTab1 where A5 = '$a1')

In the above, I am not able to:
(1) assign the names to various columns
(2) query for particular value of a column; in this case for particular
species, say virginica
(3) I guess fn$sqldf can do the job but it requires assigning column names

Any help would be most appreciated.

Thanks
HC




--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4350555.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf + Date class. Ordering and summary statistics appear to be incorrect.

2012-01-22 Thread Grant Farnsworth
I've been using sqldf heavily lately but have encountered problems
with ordering of observations or calculating statistics such as max()
and min() when the variable used is of class Date.

For example, if I run the following code:

=== begin code =
library(sqldf)
A-data.frame(Dates=as.Date(c(1994-02-14,1977-02-23,2001-09-18,2009-08-01)),Ret=rnorm(4))
OrderedA-sqldf('select * from A order by Dates')
MaxA-sqldf('select max(Dates) as Dates from A')[1,1]
MinA-sqldf('select min(Dates) as Dates from A')[1,1]
=== end code =

Then the result is this:

 A
   DatesRet
1 1994-02-14  1.2414706
2 1977-02-23 -0.7728146
3 2001-09-18  1.2551331
4 2009-08-01 -0.2538359

 OrderedA
   DatesRet
1 2001-09-18  1.2551331
2 2009-08-01 -0.2538359
3 1977-02-23 -0.7728146
4 1994-02-14  1.2414706

 MaxA
[1] 1994-02-14

 MinA
[1] 2001-09-18

Completely wrong order, no warnings issued, and the summary stats are
wrong as well (but consistent with the ordering).

According to the sqldf manual found at the following URL

http://code.google.com/p/sqldf/#4._How_does_sqldf_work_with_%22Date%22_class_variables?

this type of query should work correctly.  Any clue why it is not
doing so?  User error or bug?

=== debug info =
 sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] tcltk stats graphics  grDevices utils datasets  methods
[8] base

other attached packages:
[1] sqldf_0.4-2   chron_2.3-42  gsubfn_0.5-7
[4] proto_0.3-9.2 RSQLite.extfuns_0.0.1 RSQLite_0.10.0
[7] DBI_0.2-5
 end debug info =

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf + Date class. Ordering and summary statistics appear to be incorrect.

2012-01-22 Thread Gabor Grothendieck
On Sun, Jan 22, 2012 at 10:59 PM, Grant Farnsworth gvfa...@gmail.com wrote:
 I've been using sqldf heavily lately but have encountered problems
 with ordering of observations or calculating statistics such as max()
 and min() when the variable used is of class Date.

 For example, if I run the following code:

 === begin code =
 library(sqldf)
 A-data.frame(Dates=as.Date(c(1994-02-14,1977-02-23,2001-09-18,2009-08-01)),Ret=rnorm(4))
 OrderedA-sqldf('select * from A order by Dates')
 MaxA-sqldf('select max(Dates) as Dates from A')[1,1]
 MinA-sqldf('select min(Dates) as Dates from A')[1,1]
 === end code =

 Then the result is this:

 A
       Dates        Ret
 1 1994-02-14  1.2414706
 2 1977-02-23 -0.7728146
 3 2001-09-18  1.2551331
 4 2009-08-01 -0.2538359

 OrderedA
       Dates        Ret
 1 2001-09-18  1.2551331
 2 2009-08-01 -0.2538359
 3 1977-02-23 -0.7728146
 4 1994-02-14  1.2414706

 MaxA
 [1] 1994-02-14

 MinA
 [1] 2001-09-18

 Completely wrong order, no warnings issued, and the summary stats are
 wrong as well (but consistent with the ordering).

 According to the sqldf manual found at the following URL

 http://code.google.com/p/sqldf/#4._How_does_sqldf_work_with_%22Date%22_class_variables?

 this type of query should work correctly.  Any clue why it is not
 doing so?  User error or bug?


You are using an old version.  Update to latest version of R and sqldf.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf + Date class. Ordering and summary statistics appear to be incorrect.

2012-01-22 Thread Grant Farnsworth
On Mon, Jan 23, 2012 at 12:46 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 On Sun, Jan 22, 2012 at 10:59 PM, Grant Farnsworth gvfa...@gmail.com wrote:
 I've been using sqldf heavily lately but have encountered problems
 with ordering of observations or calculating statistics such as max()
 and min() when the variable used is of class Date.

 For example, if I run the following code:

 === begin code =
 library(sqldf)
 A-data.frame(Dates=as.Date(c(1994-02-14,1977-02-23,2001-09-18,2009-08-01)),Ret=rnorm(4))
 OrderedA-sqldf('select * from A order by Dates')
 MaxA-sqldf('select max(Dates) as Dates from A')[1,1]
 MinA-sqldf('select min(Dates) as Dates from A')[1,1]
 === end code =

 Then the result is this:

 A
       Dates        Ret
 1 1994-02-14  1.2414706
 2 1977-02-23 -0.7728146
 3 2001-09-18  1.2551331
 4 2009-08-01 -0.2538359

 OrderedA
       Dates        Ret
 1 2001-09-18  1.2551331
 2 2009-08-01 -0.2538359
 3 1977-02-23 -0.7728146
 4 1994-02-14  1.2414706

 MaxA
 [1] 1994-02-14

 MinA
 [1] 2001-09-18

 Completely wrong order, no warnings issued, and the summary stats are
 wrong as well (but consistent with the ordering).

 According to the sqldf manual found at the following URL

 http://code.google.com/p/sqldf/#4._How_does_sqldf_work_with_%22Date%22_class_variables?

 this type of query should work correctly.  Any clue why it is not
 doing so?  User error or bug?


 You are using an old version.  Update to latest version of R and sqldf.

 --
 Statistics  Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com


Thanks, that worked.  Known bug, I guess.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf and not converting integers to floating point in SQLite

2012-01-03 Thread Frederik Vanrenterghem
Hi,

I have following 2 tables:

Table 1:
POSTAL | VALUE
1000|49
1010|100
1020|50

Table 2:
INSEE | POSTAL
A|1000
B|1000
C|1010
D|1020

I would like to convert this to the following:

INSEE | VALUE_SPREAD
A|24.5
B|24.5
C|100
D|50

I can achieve this with a nested SQL query (through counting the
number of POSTAL that belong to any given INSEE, and diving the value
of the postal in that INSEE by that number).

library(sqldf)
table1 - read.csv(c:/R/table1.csv, sep=;)
table2 - read.csv(c:/R/table2.csv, sep=;)
table3 - sqldf(select table2.INSEE, table1.VALUE / counts.nPostals
as value_spread from table1, table2,(select POSTAL, count(INSEE) as
nPostals from table2 group by POSTAL) counts where table1.POSTAL =
counts.POSTAL and table1.POSTAL=table2.POSTAL)

Unfortunately, the value I'm working with is an integer. In SQLite,
this results in the computed value also not being a float - so it gets
rounded up or down. In this case, I'm getting 24 for A  B instead of
24.5.

Is there a way to take care of this using other R concepts, avoiding
that problem (for instance using melt  cast)?

Thanks,
Frederik

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf and not converting integers to floating point in SQLite

2012-01-03 Thread jim holtman
try this:

 library(sqldf)
 table1 - read.csv(text = POSTAL | VALUE
+ 1000|49
+ 1010|100
+ 1020|50, sep=|)
 table2 - read.csv(text = INSEE | POSTAL
+ A|1000
+ B|1000
+ C|1010
+ D|1020, sep=|)
 table3 - sqldf(
+ select table2.INSEE
+ , 1.0 * table1.VALUE / counts.nPostals as value_spread
+ from table1
+ , table2
+ ,(select POSTAL
+ , count(INSEE) as nPostals
+ from table2
+ group by POSTAL) counts
+ where table1.POSTAL = counts.POSTAL
+ and table1.POSTAL=table2.POSTAL
+ )
 table3
  INSEE value_spread
1 A 24.5
2 B 24.5
3 C100.0
4 D 50.0



On Tue, Jan 3, 2012 at 3:13 PM, Frederik Vanrenterghem
frede...@vanrenterghem.biz wrote:
 Hi,

 I have following 2 tables:

 Table 1:
 POSTAL | VALUE
 1000|49
 1010|100
 1020|50

 Table 2:
 INSEE | POSTAL
 A|1000
 B|1000
 C|1010
 D|1020

 I would like to convert this to the following:

 INSEE | VALUE_SPREAD
 A|24.5
 B|24.5
 C|100
 D|50

 I can achieve this with a nested SQL query (through counting the
 number of POSTAL that belong to any given INSEE, and diving the value
 of the postal in that INSEE by that number).

 library(sqldf)
 table1 - read.csv(c:/R/table1.csv, sep=;)
 table2 - read.csv(c:/R/table2.csv, sep=;)
 table3 - sqldf(select table2.INSEE, table1.VALUE / counts.nPostals
 as value_spread from table1, table2,(select POSTAL, count(INSEE) as
 nPostals from table2 group by POSTAL) counts where table1.POSTAL =
 counts.POSTAL and table1.POSTAL=table2.POSTAL)

 Unfortunately, the value I'm working with is an integer. In SQLite,
 this results in the computed value also not being a float - so it gets
 rounded up or down. In this case, I'm getting 24 for A  B instead of
 24.5.

 Is there a way to take care of this using other R concepts, avoiding
 that problem (for instance using melt  cast)?

 Thanks,
 Frederik

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf if iif

2011-11-26 Thread Carlos Rivera
Dear all,

 

I have problems with iif function using sqldf library.

I counted abundance (Num) of different SPECIES in two moments (esf) saving
the information in two Tables (esf50, esf100):

esf50

SAMPLE  SPECIES  Num esf

1289diso1   44  50

1289diso2   5 50

1289diso3   1 50

diso1   44  50

diso2   5 50

diso3   1 50

   

esf100

SAMPLE  SPECIES  Num esf

1289diso1   82  100

1289diso2   13  100

1289diso3   2 100

1289diso4   3 100

diso1   82  100

diso2   13  100

diso3   2 100

diso4   3 100

 

I would like subtract column Num between the two moments considering only
the changes, therefore I use the conditional if:

 

var100-sqldf(select esf100.SAMPLE, esf100.SPECIES, esf100.Num, esf100.esf,

   iif esf100.Num - esf50.Num =0, esf100.Num-esf50.Num,
esf100.Num as PIPAS 

   from esf100 left join esf50 on esf100.SAMPLE =
esf50.SAMPLE

   and esf100.SPECIES = esf50.SPECIES)

 

I think the structure is right because the SQL query run ok in Access. Is
the if syntax the problems?

 

Thank in advanced.

 

Best wishes,

 

Carlos Rivera

 

 

 

 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf if iif

2011-11-26 Thread Jeff Newmiller
sqldf uses the SQLite database by default for backend processing. The iif 
function is specific to the Jet database engine syntax (which underlies MS 
Access). You could read up on SQLite syntax, or you could avoid using 
nonstandard SQL syntax, retrieve the data into a data frame, and use R code to 
do your logical merging into one column.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Carlos Rivera limnoriv...@gmail.com wrote:

Dear all,

 

I have problems with iif function using sqldf library.

I counted abundance (Num) of different SPECIES in two moments (esf)
saving
the information in two Tables (esf50, esf100):

esf50

SAMPLE  SPECIES  Num esf

1289diso1   44  50

1289diso2   5 50

1289diso3   1 50

diso1   44  50

diso2   5 50

diso3   1 50

   

esf100

SAMPLE  SPECIES  Num esf

1289diso1   82  100

1289diso2   13  100

1289diso3   2 100

1289diso4   3 100

diso1   82  100

diso2   13  100

diso3   2 100

diso4   3 100

 

I would like subtract column Num between the two moments considering
only
the changes, therefore I use the conditional if:

 

var100-sqldf(select esf100.SAMPLE, esf100.SPECIES, esf100.Num,
esf100.esf,

  iif esf100.Num - esf50.Num =0, esf100.Num-esf50.Num,
esf100.Num as PIPAS 

   from esf100 left join esf50 on esf100.SAMPLE =
esf50.SAMPLE

   and esf100.SPECIES = esf50.SPECIES)

 

I think the structure is right because the SQL query run ok in Access.
Is
the if syntax the problems?

 

Thank in advanced.

 

Best wishes,

 

Carlos Rivera

 

 

 

 


   [[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf syntax, selecting rows, and skipping

2011-09-29 Thread Juliet Hannah
I am using the example in this post:

https://stat.ethz.ch/pipermail/r-help/2010-October/257204.html

# create a file
write.table(iris,iris.csv,row.names=FALSE,sep=,,quote=FALSE)


# this does not work
# has the syntax changed or  is there a mistake in my usage?
# the line from the post above is:
#  read.csv.sql(myfile.csv, sql = select * from file 2000, 1000)

library(sqldf)
read.csv.sql(iris.csv, sql = select * from file 5, 5)

# this works
# but i would like to keep the header

 read.csv.sql(iris.csv, sql = select * from file limit
5,skip=5,header=FALSE)

# thanks


 sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] tcltk stats graphics  grDevices utils datasets
methods   base

other attached packages:
[1] sqldf_0.4-2   chron_2.3-42  gsubfn_0.5-7
proto_0.3-9.2 RSQLite.extfuns_0.0.1 RSQLite_0.9-4
DBI_0.2-5 myfunctions_1.0

loaded via a namespace (and not attached):
[1] tools_2.13.1

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf syntax, selecting rows, and skipping

2011-09-29 Thread David Winsemius


On Sep 29, 2011, at 10:06 AM, Juliet Hannah wrote:


I am using the example in this post:

https://stat.ethz.ch/pipermail/r-help/2010-October/257204.html

# create a file
write.table(iris,iris.csv,row.names=FALSE,sep=,,quote=FALSE)


# this does not work
# has the syntax changed or  is there a mistake in my usage?
# the line from the post above is:
#  read.csv.sql(myfile.csv, sql = select * from file 2000, 1000)


You didn't read to the end of that thread. The two errors above were  
corrected.




library(sqldf)
read.csv.sql(iris.csv, sql = select * from file 5, 5)

# this works
# but i would like to keep the header

read.csv.sql(iris.csv, sql = select * from file limit
5,skip=5,header=FALSE)

# thanks



sessionInfo()

R version 2.13.1 (2011-07-08)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] tcltk stats graphics  grDevices utils datasets
methods   base

other attached packages:
[1] sqldf_0.4-2   chron_2.3-42  gsubfn_0.5-7
proto_0.3-9.2 RSQLite.extfuns_0.0.1 RSQLite_0.9-4
DBI_0.2-5 myfunctions_1.0

loaded via a namespace (and not attached):
[1] tools_2.13.1

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] SQldf with sqlite and H2

2011-07-14 Thread Mandans
SQldf with sqlite and H2

I have a large csv file (about 2GB) and wanted to import the file into R and do 
some filtering and analysis. Came across sqldf ( a great idea and product) and 
was trying to play around to see what would be the best method of doing this. 
csv file is comma delimited with some columns having comma inside the quoation 
like this John, Doe. 

I tried this first 

###
library(sqldf)
sqldf(attach testdb as new)
In.File - C:/JP/Temp/2008.csv
read.csv.sql(In.File, sql = create table table1 as select * from file, 
  dbname = testdb)

It errored out with message

NULL
Warning message:
closing unused connection 3 (C:/JP/Temp/2008.csv)   

When this failed, I converted this file from comma delimited to tab delimited 
and used this command  

#  
read.csv.sql(In.File, sql = create table table1 as select * from file, 
  dbname = testdb, sep = \t)

and this worked, it created testdb sqlite file with the size of 3GB

now my question is in 3 parts.

1. Is it possible to create a dataframe with appropriate column classes and use 
that column classes when I use the read.csv.sql command to create the table. 
Something like may be create the table from that DF and then update with 
read.csv.sql.?

Any example code will be really helpful.

2. If we use the H2 database instead of default sqlite and use the readcsv 
option, will that be faster and is there a way we can specify the above thought 
of applying a DF class to table column properties and update with CSVREAD

library(RH2)
something like SELECT * FROM CSVREAD('C:/JP/Temp/2008.csv')

Any example code will be really helpful.

3. How do we specify where the H2 file is saved. Saw something like this, when 
I ran this example from RH2 package, couldn't find the file in the working 
directory.

con - dbConnect(H2(), jdbc:h2:~/test, sa, )

Sorry for the long mail. Appreciate all for building a great community and for 
the wonderful software in R.
Thanks for Gabor Grothendieck for bring sqldf to this great community.

Any help or direction you can provide in this is highly appreciated.

Thanks all.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SQldf with sqlite and H2

2011-07-14 Thread Gabor Grothendieck
On Thu, Jul 14, 2011 at 10:33 AM, Mandans mandan...@yahoo.com wrote:
 SQldf with sqlite and H2

 I have a large csv file (about 2GB) and wanted to import the file into R and 
 do some filtering and analysis. Came across sqldf ( a great idea and product) 
 and was trying to play around to see what would be the best method of doing 
 this. csv file is comma delimited with some columns having comma inside the 
 quoation like this John, Doe.

 I tried this first

 ###
 library(sqldf)
 sqldf(attach testdb as new)
 In.File - C:/JP/Temp/2008.csv
 read.csv.sql(In.File, sql = create table table1 as select * from file,
  dbname = testdb)

 It errored out with message

 NULL
 Warning message:
 closing unused connection 3 (C:/JP/Temp/2008.csv)

 When this failed, I converted this file from comma delimited to tab delimited 
 and used this command

 #
 read.csv.sql(In.File, sql = create table table1 as select * from file,
  dbname = testdb, sep = \t)

 and this worked, it created testdb sqlite file with the size of 3GB

 now my question is in 3 parts.

 1. Is it possible to create a dataframe with appropriate column classes and 
 use that column classes when I use the read.csv.sql command to create the 
 table. Something like may be create the table from that DF and then update 
 with read.csv.sql.?

 Any example code will be really helpful.

Here is an example of using method = name__class.  Note there are
two underscores in a row.  It appears I neglected to document that
Date2 means convert from character representation whereas Date means
convert from numeric representation.  It would also be possible to use
method = raw and then coerce the columns yourself afterwards.

# create test file
Lines - 'A__Date2|B
2000-01-01|x,y
2000-01-02|c,d
'
tf - tempfile()
cat(Lines, file = tf)


library(sqldf)
DF - read.csv.sql(tf, sep = |, method = name__class)
str(DF)


 2. If we use the H2 database instead of default sqlite and use the readcsv 
 option, will that be faster and is there a way we can specify the above 
 thought of applying a DF class to table column properties and update with 
 CSVREAD

 library(RH2)
 something like SELECT * FROM CSVREAD('C:/JP/Temp/2008.csv')

 Any example code will be really helpful.

Sorry, I haven't tested the speed of this.  postgresql and mysql, both
supported by sqldf, also have builtin methods to read files. If I had
to guess I would guess that mysql would be fastest but this would have
to be tested.


 3. How do we specify where the H2 file is saved. Saw something like this, 
 when I ran this example from RH2 package, couldn't find the file in the 
 working directory.

 con - dbConnect(H2(), jdbc:h2:~/test, sa, )

~ means your home directory so ~/test means test is in the home directory.

Try

normalizePath(~)
normalizePath(~/test)
etc.

to see what they refer to.

Regards.


 Sorry for the long mail. Appreciate all for building a great community and 
 for the wonderful software in R.
 Thanks for Gabor Grothendieck for bring sqldf to this great community.

 Any help or direction you can provide in this is highly appreciated.

 Thanks all.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SQldf with sqlite and H2

2011-07-14 Thread Mandans
Thanks a lot Gabor. It helped a lot. Appreciate your time and effort.

Thanks

--- On Thu, 7/14/11, Gabor Grothendieck ggrothendi...@gmail.com wrote:

 From: Gabor Grothendieck ggrothendi...@gmail.com
 Subject: Re: [R] SQldf with sqlite and H2
 To: Mandans mandan...@yahoo.com
 Cc: r-help@r-project.org
 Date: Thursday, July 14, 2011, 2:22 PM
 On Thu, Jul 14, 2011 at 10:33 AM,
 Mandans mandan...@yahoo.com
 wrote:
  SQldf with sqlite and H2
 
  I have a large csv file (about 2GB) and wanted to
 import the file into R and do some filtering and analysis.
 Came across sqldf ( a great idea and product) and was trying
 to play around to see what would be the best method of doing
 this. csv file is comma delimited with some columns having
 comma inside the quoation like this John, Doe.
 
  I tried this first
 
  ###
  library(sqldf)
  sqldf(attach testdb as new)
  In.File - C:/JP/Temp/2008.csv
  read.csv.sql(In.File, sql = create table table1 as
 select * from file,
   dbname = testdb)
 
  It errored out with message
 
  NULL
  Warning message:
  closing unused connection 3 (C:/JP/Temp/2008.csv)
 
  When this failed, I converted this file from comma
 delimited to tab delimited and used this command
 
  #
  read.csv.sql(In.File, sql = create table table1 as
 select * from file,
   dbname = testdb, sep = \t)
 
  and this worked, it created testdb sqlite file with
 the size of 3GB
 
  now my question is in 3 parts.
 
  1. Is it possible to create a dataframe with
 appropriate column classes and use that column classes when
 I use the read.csv.sql command to create the table.
 Something like may be create the table from that DF and then
 update with read.csv.sql.?
 
  Any example code will be really helpful.
 
 Here is an example of using method = name__class. 
 Note there are
 two underscores in a row.  It appears I neglected to
 document that
 Date2 means convert from character representation whereas
 Date means
 convert from numeric representation.  It would also be
 possible to use
 method = raw and then coerce the columns yourself
 afterwards.
 
 # create test file
 Lines - 'A__Date2|B
 2000-01-01|x,y
 2000-01-02|c,d
 '
 tf - tempfile()
 cat(Lines, file = tf)
 
 
 library(sqldf)
 DF - read.csv.sql(tf, sep = |, method =
 name__class)
 str(DF)
 
 
  2. If we use the H2 database instead of default sqlite
 and use the readcsv option, will that be faster and is there
 a way we can specify the above thought of applying a DF
 class to table column properties and update with CSVREAD
 
  library(RH2)
  something like SELECT * FROM
 CSVREAD('C:/JP/Temp/2008.csv')
 
  Any example code will be really helpful.
 
 Sorry, I haven't tested the speed of this.  postgresql
 and mysql, both
 supported by sqldf, also have builtin methods to read
 files. If I had
 to guess I would guess that mysql would be fastest but this
 would have
 to be tested.
 
 
  3. How do we specify where the H2 file is saved. Saw
 something like this, when I ran this example from RH2
 package, couldn't find the file in the working directory.
 
  con - dbConnect(H2(), jdbc:h2:~/test, sa, )
 
 ~ means your home directory so ~/test means test is in the
 home directory.
 
 Try
 
 normalizePath(~)
 normalizePath(~/test)
 etc.
 
 to see what they refer to.
 
 Regards.
 
 
  Sorry for the long mail. Appreciate all for building a
 great community and for the wonderful software in R.
  Thanks for Gabor Grothendieck for bring sqldf to this
 great community.
 
  Any help or direction you can provide in this is
 highly appreciated.
 
  Thanks all.
 
  __
  R-help@r-project.org
 mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained,
 reproducible code.
 
 
 
 
 -- 
 Statistics  Software Consulting
 GKX Group, GKX Associates Inc.
 tel: 1-877-GKX-GROUP
 email: ggrothendieck at gmail.com


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Sqldf INSERT INTO

2011-04-20 Thread new2R
 Hi,

I am new to R and trying to migrate from SAS. I am trying to copy data from
one table to another table which have same columns using sqldf. but not
working and showing NULL

I wrote statement as sqldf(INSERT INTO new select * from data) but showing
NULL

Please help me in this regard.

Thank you

--
View this message in context: 
http://r.789695.n4.nabble.com/Sqldf-INSERT-INTO-tp3463533p3463533.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sqldf INSERT INTO

2011-04-20 Thread Gabor Grothendieck
On Wed, Apr 20, 2011 at 12:39 PM, new2R bv_agr...@yahoo.co.in wrote:
  Hi,

 I am new to R and trying to migrate from SAS. I am trying to copy data from
 one table to another table which have same columns using sqldf. but not
 working and showing NULL

 I wrote statement as sqldf(INSERT INTO new select * from data) but showing
 NULL

 Please help me in this regard.


In your example new is a table in the sqlite database, not in R's
workspace, so you have to return it:

 library(sqldf)
 BOD
  Time demand
118.3
22   10.3
33   19.0
44   16.0
55   15.6
67   19.8
 New - BOD[1, ]
 BOD1 - BOD[2:3,]
 sqldf(c(insert into New select * from BOD1, select * from New))
  Time demand
118.3
22   10.3
33   19.0


-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SQLDF syntax

2011-04-19 Thread Gabor Grothendieck
On Mon, Apr 18, 2011 at 6:34 PM, new2R bv_agr...@yahoo.co.in wrote:
 Hi,

 I am new to R and trying to migrate from SAS. I am trying to use sqldf to
 create a new table from existed table and change some of the columns. I have
 table called DataOld with columns commodity, rate and total and I am trying
 to create new table called DataNew with columns commodity, ratenew and
 totalNew.

 sqldf(create table datanew as select commodity, ratenew as rate * 10,
 totalnew as total *10 from DataOld)

 I got error message  Error in sqliteExecStatement(con, statement,
 bind.data) :
  RS-DBI driver: (error in statement: near *: syntax error)


Its expression as name, not name as expression.  Try this:

 library(sqldf)
 BODnew - sqldf(select demand, Time, demand + 1 as demandPlusOne from BOD)
 BODnew
  demand Time demandPlusOne
18.31   9.3
2   10.32  11.3
3   19.03  20.0
4   16.04  17.0
5   15.65  16.6
6   19.87  20.8

For more, the sqldf home page at http://sqldf.googlecode.com has links
to sqlite site where you can find sql syntax diagrams. See the links
along the left side of the page there.




-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SQLDF syntax

2011-04-19 Thread new2R
Thank you very much. Its working.

--
View this message in context: 
http://r.789695.n4.nabble.com/SQLDF-syntax-tp3458919p3460448.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] SQLDF syntax

2011-04-18 Thread new2R
Hi, 

I am new to R and trying to migrate from SAS. I am trying to use sqldf to
create a new table from existed table and change some of the columns. I have
table called DataOld with columns commodity, rate and total and I am trying
to create new table called DataNew with columns commodity, ratenew and
totalNew.

 sqldf(create table datanew as select commodity, ratenew as rate * 10,
 totalnew as total *10 from DataOld)

I got error message  Error in sqliteExecStatement(con, statement,
bind.data) : 
  RS-DBI driver: (error in statement: near *: syntax error)

Please help me in this regard.

Thank you



--
View this message in context: 
http://r.789695.n4.nabble.com/SQLDF-syntax-tp3458919p3458919.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] SQLDF - Submitting Queries with R Objects as Columns

2011-03-09 Thread Mike Schumacher
Fellow R programmers,

I'd like to submit SQLDF statements with R objects as column names.

For example, I want to assign X to var1  (var1-X) and then refer to
var1 in the SQLDF statement.  SQLDF needs to understand that when I
reference var1, it should look for X in the dataframe.

This is necessary because my SQLDF is part of a larger function that I call
that repeatedly with different column names.

Code below... thank you in advance!

Mike


library(sqldf)

testdf-data.frame(c(1,2,3,4,5,6,7,8,9,10),c(1,1,1,2,2,2,3,3,3,3))
names(testdf)-c(X,Y)

# Works as intended
sqldf(select sum(X) as XSUM,
   Y as Y
   from testdf
   group by Y)

# Now... can I reference var1 in the code?
var1-X

# Unsuccessful Atteps
sqldf(select sum(var1) as XSUM,
   Y as Y
   from testdf
   group by Y)

sqldf(select sum(get(var1)) as XSUM,
   Y as Y
   from testdf
   group by Y)

sqldf(select sum(return(var1)) as XSUM,
   Y as Y
   from testdf
   group by Y)



-- 
Michael Schumacher
mike.schumac...@gmail.com
Manager Data  Analytics, ValueClick
818-851-8638

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SQLDF - Submitting Queries with R Objects as Columns

2011-03-09 Thread Gabor Grothendieck
On Wed, Mar 9, 2011 at 11:41 AM, Mike Schumacher
mike.schumac...@gmail.com wrote:
 Fellow R programmers,

 I'd like to submit SQLDF statements with R objects as column names.

 For example, I want to assign X to var1  (var1-X) and then refer to
 var1 in the SQLDF statement.  SQLDF needs to understand that when I
 reference var1, it should look for X in the dataframe.

 This is necessary because my SQLDF is part of a larger function that I call
 that repeatedly with different column names.

 Code below... thank you in advance!

 Mike


 library(sqldf)

 testdf-data.frame(c(1,2,3,4,5,6,7,8,9,10),c(1,1,1,2,2,2,3,3,3,3))
 names(testdf)-c(X,Y)

 # Works as intended
 sqldf(select sum(X) as XSUM,
       Y             as Y
       from testdf
       group by Y)

 # Now... can I reference var1 in the code?
 var1-X


Here are two ways:

sqldf(sprintf(select sum(%s) XSUM, Y from testdf group by Y, var1))

fn$sqldf(select sum($var1) XSUM, Y from testdf group by Y)

See ?sprintf

fn comes from the gsubfn package (which is automatically pulled in by
sqldf) and adds quasi perl style string interpolation to the arguments
passed to the function call it prefaces.  See
http://gsubfn.googlecode.com and ?fn


-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SQLDF - Submitting Queries with R Objects as Columns

2011-03-09 Thread Rob Tirrell
You're submitting queries for SQLDF to execute as strings. So, if you want
to use a variable column name, sprintf() or paste() your statement together,
like:

sqldf(sprintf('select sum(%s) as XSUM, Y as Y from testdf group by Y',
var1))

--
Robert Tirrell | r...@stanford.edu | (607) 437-6532
Program in Biomedical Informatics | Butte Lab | Stanford University


 sqldf(select sum(return(var1)) as XSUM,
   Y as Y
   from testdf
   group by Y)



 --
 Michael Schumacher
 mike.schumac...@gmail.com
 Manager Data  Analytics, ValueClick
 818-851-8638

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf hanging on macintosh - works on windows

2010-11-02 Thread GL

Marc: Installing Simon's package worked perfectly. Thanks so much! 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-hanging-on-macintosh-works-on-windows-tp3022193p3023736.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf error only on Unix not Windows

2010-11-01 Thread Alex Bryant
Hello Group,

I am having trouble with the sqldf package on unix.  The same 
code works fine on windows.

Silly Example script:

# Load the package
library(sqldf)

# Use the titanic data set

data(women)
colnames(women)
head(women)

sqldf('select height, count(*) from women where height is not null group by 
weight')

Unix Output and error:

bash-3.00$ R --vanilla testR.R

R version 2.10.1 (2009-12-14)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

 # Load the package
 library(sqldf)
Loading required package: DBI
Loading required package: RSQLite
Loading required package: RSQLite.extfuns
Loading required package: gsubfn
Loading required package: proto
Loading required package: tcltk
Loading Tcl/Tk interface ... done
Loading required package: chron


 # Use the titanic data set

 data(women)
 colnames(women)
[1] height weight
 head(women)
  height weight
1 58115
2 59117
3 60120
4 61123
5 62126
6 63129

 sqldf('select height, count(*) from women where height is not null group by 
 weight')
Error in sqliteFetch(rs, n = -1, ...) :
  RSQLite driver: (RS_SQLite_fetch: failed first step: SQL logic error or 
missing database)
Calls: sqldf ... dbGetQuery - sqliteQuickSQL - sqliteFetch - .Call
Error in !dbPreExists : invalid argument type
Calls: sqldf ... dbGetQuery - sqliteQuickSQL - sqliteFetch - .Call
Execution halted
Warning message:
RS-DBI driver warning: (closing pending result sets before closing this 
connection)


Box:
Solaris: SunOS icsun7 5.10 Generic_118822-25 sun4u sparc SUNW,Sun-Fire-280R



I am unable to find any hits on the generated errors which makes me think 
either something is really wrong (resource issues or something) or this is not 
a common use of sqldf??  Is there a better package I should be using?

Thank you very much for your help,

 Alex Bryant
Software Development
Integrated Clinical Systems, Inc.
908-996-7208



Confidentiality Note: This e-mail, and any attachment to...{{dropped:13}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf error only on Unix not Windows

2010-11-01 Thread Gabor Grothendieck
On Mon, Nov 1, 2010 at 9:28 AM, Alex Bryant abry...@i-review.com wrote:
 Hello Group,

                I am having trouble with the sqldf package on unix.  The same 
 code works fine on windows.

 Silly Example script:

 # Load the package
 library(sqldf)

 # Use the titanic data set

 data(women)
 colnames(women)
 head(women)

 sqldf('select height, count(*) from women where height is not null group by 
 weight')


Some things to try:

-  try adding dbname = tempfile() argument to your sqldf statement and
see if that makes any difference

- try it with the H2 database rather than sqlite (or with PostgreSQL)
  To use it with H2 make sure you have Java and the CRAN package, RH2,
installed.
  RH2 includes the H2 database itself so you don't need to install that.
  Then issue this line in R any time before your first sqldf call
 library(RH2)
  sqldf will notice it and automatically use the H2 database instead of sqlite

- try it with R 2.11

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf hanging on macintosh - works on windows

2010-11-01 Thread GL

Have a long script that runs fine on windows (32 bit). When I try to run in
on two different macs (64 bit), however, it hangs with identical behavior.

I start with:
library(sqldf)

This results in messages:
Loading required package: DBI
Loading required package: RSQLite
Loading required package: RSQLite.extfuns
Loading required package: gsubfn
Loading required package: proto
Loading required package: chron

I then read some data, etc.

I execute the following:

#merge raw data and all possible combinations
  df.final - sqldf('select Date, Hour, x as RoomsInUse from
df.possible.combos
left join df.aggregate using (Hour, Date)')

I receive the messages:
Loading required package: tcltk
Loading Tcl/Tk interface ... 
+ 

Then I get into some kind of loop. Message at bottom ribbon says:

executing:
try(gsub('\\s+','',paste(capture.output(print(arg(summary))),collapse=)),silent=TRUE)

On the pc implementation it runs flawlessly, and quickly. 

Truly appreciate any ideas.

-- 
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-hanging-on-macintosh-works-on-windows-tp3022193p3022193.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf hanging on macintosh - works on windows

2010-11-01 Thread Gabor Grothendieck
On Mon, Nov 1, 2010 at 9:59 AM, GL pfl...@shands.ufl.edu wrote:

 Have a long script that runs fine on windows (32 bit). When I try to run in
 on two different macs (64 bit), however, it hangs with identical behavior.

 I start with:
 library(sqldf)

 This results in messages:
 Loading required package: DBI
 Loading required package: RSQLite
 Loading required package: RSQLite.extfuns
 Loading required package: gsubfn
 Loading required package: proto
 Loading required package: chron

 I then read some data, etc.

 I execute the following:

 #merge raw data and all possible combinations
  df.final - sqldf('select Date, Hour, x as RoomsInUse from
 df.possible.combos
    left join df.aggregate using (Hour, Date)')

 I receive the messages:
 Loading required package: tcltk
 Loading Tcl/Tk interface ...
 +

 Then I get into some kind of loop. Message at bottom ribbon says:

 executing:
 try(gsub('\\s+','',paste(capture.output(print(arg(summary))),collapse=)),silent=TRUE)


That is not a line that appears in the sqldf source code.  Try these
suggestions anyways:

http://permalink.gmane.org/gmane.comp.lang.r.general/209443


-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf hanging on macintosh - works on windows

2010-11-01 Thread GL

added library(RH2)

Still get message:

Loading required package: tcltk
Loading Tcl/Tk interface
+

directly after sqldf statement 

   df.final - sqldf('select Date, Hour, x as RoomsInUse from
 df.possible.combos
+ left join df.aggregate using (Hour, Date)')

There is no progress spinner. If I hit enter I get a 

At that point I start to enter any command (just summary, for instance), I
get the progress spinner, the
try(gsub('\\s+','',paste(capture.output(print(arg(summary))),collapse=)),silent=TRUE)
 message in the bottom ribbon, and the system apparently hangs. 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-hanging-on-macintosh-works-on-windows-tp3022193p3022233.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf hanging on macintosh - works on windows

2010-11-01 Thread Gabor Grothendieck
On Mon, Nov 1, 2010 at 10:32 AM, GL pfl...@shands.ufl.edu wrote:

 added library(RH2)

 Still get message:

 Loading required package: tcltk
 Loading Tcl/Tk interface
 +

 directly after sqldf statement

   df.final - sqldf('select Date, Hour, x as RoomsInUse from
 df.possible.combos
 +     left join df.aggregate using (Hour, Date)')

 There is no progress spinner. If I hit enter I get a 

 At that point I start to enter any command (just summary, for instance), I
 get the progress spinner, the
 try(gsub('\\s+','',paste(capture.output(print(arg(summary))),collapse=)),silent=TRUE)
  message in the bottom ribbon, and the system apparently hangs.

I don't have a Mac but if you wish to pursue it try this:

library(sqldf)
debug(sqldf)
sqldf(...whatever...)
# now step through it by repeatedly pressing Enter and send me the
console output of the session

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf hanging on macintosh - works on windows

2010-11-01 Thread GL


 
 library(sqldf)
Loading required package: DBI
Loading required package: RSQLite
Loading required package: RSQLite.extfuns
Loading required package: gsubfn
Loading required package: proto
Loading required package: chron
 debug(sqldf)  
   df.final - sqldf('select Date, Hour, x as RoomsInUse from
 df.possible.combos
+ left join df.aggregate using (Hour, Date)')
debugging in: sqldf(select Date, Hour, x as RoomsInUse from
\df.possible.combos\\nleft join \df.aggregate\ using (Hour, Date))
debug: {
as.POSIXct.character - function(x) structure(as.numeric(x), 
class = c(POSIXt, POSIXct))
as.Date.character - function(x) structure(as.numeric(x), 
class = Date)
as.Date.numeric - function(x, origin = 1970-01-01, ...)
base::as.Date.numeric(x, 
origin = origin, ...)
as.dates.character - function(x) structure(as.numeric(x), 
class = c(dates, times))
as.times.character - function(x) structure(as.numeric(x), 
class = times)
overwrite - FALSE
request.open - missing(x)  is.null(connection)
request.close - missing(x)  !is.null(connection)
request.con - !missing(x)  !is.null(connection)
request.nocon - !missing(x)  is.null(connection)
dfnames - fileobjs - character(0)
if (request.close || request.nocon) {
on.exit({
dbPreExists - attr(connection, dbPreExists)
dbname - attr(connection, dbname)
if (!missing(dbname)  !is.null(dbname)  dbname == 
:memory:) {
dbDisconnect(connection)
} else if (!dbPreExists  drv == sqlite) {
dbDisconnect(connection)
file.remove(dbname)
} else {
for (nam in dfnames) dbRemoveTable(connection, 
  nam)
for (fo in fileobjs) dbRemoveTable(connection, 
  fo)
dbDisconnect(connection)
}
})
if (request.close) {
if (identical(connection, getOption(sqldf.connection))) 
options(sqldf.connection = NULL)
return()
}
}
if (request.open || request.nocon) {
if (is.null(drv)) {
drv - if (package:RpgSQL %in% search()) {
pgSQL
}
else if (package:RMySQL %in% search()) {
MySQL
}
else if (package:RH2 %in% search()) {
H2
}
else SQLite
}
drv - tolower(drv)
if (drv == mysql) {
m - dbDriver(MySQL)
connection - if (missing(dbname) || dbname == :memory:) {
dbConnect(m)
}
else dbConnect(m, dbname = dbname)
dbPreExists - TRUE
}
else if (drv == pgsql) {
m - dbDriver(pgSQL)
if (missing(dbname) || is.null(dbname)) {
dbname - getOption(RpgSQL.dbname)
if (is.null(dbname)) 
  dbname - test
}
connection - dbConnect(m, dbname = dbname)
dbPreExists - TRUE
}
else if (drv == h2) {
m - H2()
if (missing(dbname) || is.null(dbname)) 
dbname - :memory:
dbPreExists - dbname != :memory:  file.exists(dbname)
connection - if (missing(dbname) || dbname == :memory:) {
dbConnect(m, jdbc:h2:mem:, sa, )
}
else {
jdbc.string - paste(jdbc:h2, dbname, sep = :)
dbConnect(m, jdbc.string)
}
}
else {
m - dbDriver(SQLite)
if (missing(dbname)) 
dbname - :memory:
dbPreExists - dbname != :memory:  file.exists(dbname)
if (is.null(getOption(sqldf.dll))) {
dll - Sys.which(libspatialite-1.dll)
if (dll != ) 
  options(sqldf.dll = dll)
else options(sqldf.dll = FALSE)
}
dll - getOption(sqldf.dll)
if (length(dll) != 1 || identical(dll, FALSE) || 
nchar(dll) == 0) {
dll - FALSE
}
else {
if (dll == basename(dll)) 
  dll - Sys.which(dll)
}
options(sqldf.dll = dll)
if (!identical(dll, FALSE)) {
connection - dbConnect(m, dbname = dbname,
loadable.extensions = TRUE)
s - sprintf(select load_extension('%s'), dll)
dbGetQuery(connection, s)
}
else connection - dbConnect(m, dbname = dbname)
init_extensions(connection)
}
attr(connection, dbPreExists) - dbPreExists
if (missing(dbname)  drv == sqlite) 
dbname - :memory:
attr(connection, dbname) - dbname
if (request.open) {
options(sqldf.connection = connection)

Re: [R] sqldf hanging on macintosh - works on windows

2010-11-01 Thread Gabor Grothendieck
On Mon, Nov 1, 2010 at 10:55 AM, GL pfl...@shands.ufl.edu wrote:



 library(sqldf)
 Loading required package: DBI
 Loading required package: RSQLite
 Loading required package: RSQLite.extfuns
 Loading required package: gsubfn
 Loading required package: proto
 Loading required package: chron
 debug(sqldf)
   df.final - sqldf('select Date, Hour, x as RoomsInUse from
 df.possible.combos
 +     left join df.aggregate using (Hour, Date)')
 debugging in: sqldf(select Date, Hour, x as RoomsInUse from
...
 debug: words. - words - strapply(x, [[:alnum:]._]+)
 Browse[2]
 Loading required package: tcltk
 Loading Tcl/Tk interface ...
 +

There is something wrong with tcltk on your system.  You can tell it
not to use tcltk by setting the appropriate option as discussed in
sqldf FAQ #5:

http://code.google.com/p/sqldf/#5._I_get_a_message_about_tcl_being_missing.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf hanging on macintosh - works on windows

2010-11-01 Thread Marc Schwartz
On Nov 1, 2010, at 10:55 AM, Gabor Grothendieck wrote:

 On Mon, Nov 1, 2010 at 10:55 AM, GL pfl...@shands.ufl.edu wrote:
 
 
 
 library(sqldf)
 Loading required package: DBI
 Loading required package: RSQLite
 Loading required package: RSQLite.extfuns
 Loading required package: gsubfn
 Loading required package: proto
 Loading required package: chron
 debug(sqldf)
   df.final - sqldf('select Date, Hour, x as RoomsInUse from
 df.possible.combos
 + left join df.aggregate using (Hour, Date)')
 debugging in: sqldf(select Date, Hour, x as RoomsInUse from
 ...
 debug: words. - words - strapply(x, [[:alnum:]._]+)
 Browse[2]
 Loading required package: tcltk
 Loading Tcl/Tk interface ...
 +
 
 There is something wrong with tcltk on your system.  You can tell it
 not to use tcltk by setting the appropriate option as discussed in
 sqldf FAQ #5:
 
 http://code.google.com/p/sqldf/#5._I_get_a_message_about_tcl_being_missing.


GL,

If you installed R using the OSX binary from CRAN, it does not include tcl/tk. 
You need to install the separate tcltk package that Simon has put together and 
is available from:

  http://cran.us.r-project.org/bin/macosx/tools/

You also need to have X11 installed, which is available from the OSX DVD in the 
Optional Installs section.

HTH,

Marc Schwartz

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf hanging on macintosh - works on windows

2010-11-01 Thread Gabor Grothendieck
On Mon, Nov 1, 2010 at 12:10 PM, Marc Schwartz marc_schwa...@me.com wrote:
 On Nov 1, 2010, at 10:55 AM, Gabor Grothendieck wrote:

 On Mon, Nov 1, 2010 at 10:55 AM, GL pfl...@shands.ufl.edu wrote:



 library(sqldf)
 Loading required package: DBI
 Loading required package: RSQLite
 Loading required package: RSQLite.extfuns
 Loading required package: gsubfn
 Loading required package: proto
 Loading required package: chron
 debug(sqldf)
   df.final - sqldf('select Date, Hour, x as RoomsInUse from
 df.possible.combos
 +     left join df.aggregate using (Hour, Date)')
 debugging in: sqldf(select Date, Hour, x as RoomsInUse from
 ...
 debug: words. - words - strapply(x, [[:alnum:]._]+)
 Browse[2]
 Loading required package: tcltk
 Loading Tcl/Tk interface ...
 +

 There is something wrong with tcltk on your system.  You can tell it
 not to use tcltk by setting the appropriate option as discussed in
 sqldf FAQ #5:

 http://code.google.com/p/sqldf/#5._I_get_a_message_about_tcl_being_missing.


 GL,

 If you installed R using the OSX binary from CRAN, it does not include 
 tcl/tk. You need to install the separate tcltk package that Simon has put 
 together and is available from:

  http://cran.us.r-project.org/bin/macosx/tools/

 You also need to have X11 installed, which is available from the OSX DVD in 
 the Optional Installs section.

 HTH,

 Marc Schwartz




Note that sqldf can work without tcltk, as well.  The gsubfn package
does check for tcltk and and sets the engine to R rather than
tcltk if

capabilities()[[tcltk]]

is FALSE.  There may be a bug in R or a problem with the installation.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf syntax

2010-08-27 Thread Bond, Stephen
I had checked those references before posting, actually. SQLite has a very 
limited implementation of the standard. To do a single table update I would not 
go to sql. It's easy enough to do in R.

The problem is when I need to do an update from a left outer join, which I had 
to do with sqlSave (to a mySQL table), then sqlQuery, then sqlFetch.
sqlSave is amazingly slow, takes half an hour. (Would never do that at home :-) 
just too lazy to write a formal table def and use load data infile from a csv 
dump.

Also not happy with Dates becoming years in the transition. 
Will check the other suggestion about data.table and report.

Cheers everybody.

Stephen B

-Original Message-
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] 
Sent: Thursday, August 26, 2010 4:26 PM
To: Bond, Stephen
Cc: r-help@r-project.org
Subject: Re: [R] sqldf syntax

On Thu, Aug 26, 2010 at 2:31 PM, Bond, Stephen stephen.b...@cibc.com wrote:
 Please correct the following

 sqldf(update esc left join forwagg  on esc.ym=forwagg.Date set 
 esc.ri2=forwagg.N1 where esc.age=12,select * from main.esc)
 Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: near left: syntax error)


1. sqldf takes one sql argument whereas the above has two sql
arguments; however, the one argument may be a vector of sql commands.
 See ?sqldf and the examples on the sqldf home page
http://sqldf.googlecode.com

2. there is an error in the syntax of your update statement.  For
correct syntax see the sqlite site:

http://sqlite.org/lang_update.html

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf syntax

2010-08-26 Thread Bond, Stephen
Please correct the following

 sqldf(update esc left join forwagg  on esc.ym=forwagg.Date set 
 esc.ri2=forwagg.N1 where esc.age=12,select * from main.esc)
Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: near left: syntax error)

Thanks.

Stephen


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf syntax

2010-08-26 Thread Gabor Grothendieck
On Thu, Aug 26, 2010 at 2:31 PM, Bond, Stephen stephen.b...@cibc.com wrote:
 Please correct the following

 sqldf(update esc left join forwagg  on esc.ym=forwagg.Date set 
 esc.ri2=forwagg.N1 where esc.age=12,select * from main.esc)
 Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: near left: syntax error)


1. sqldf takes one sql argument whereas the above has two sql
arguments; however, the one argument may be a vector of sql commands.
 See ?sqldf and the examples on the sqldf home page
http://sqldf.googlecode.com

2. there is an error in the syntax of your update statement.  For
correct syntax see the sqlite site:

http://sqlite.org/lang_update.html

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] SQLDF from Variable Matrix

2010-08-04 Thread Gabor Grothendieck
On Wed, Aug 4, 2010 at 12:29 AM, Suphajak Ngamlak
supha...@phatrasecurities.com wrote:
 Dear all,

 I would like to do sample statistics, e.g. mean, median from very large
 dataset. This is part of commands I use routinely with several dataset
 so I would like to make it into function. The simplified examples are

 Test-data.frame(A=c('a','b','c','a','b','c'),B=c(1,2,3,4,5,6))

 #Create function (This one work)

 GetAvg-function(Input,Bygroup){
 AVG-fn$sqldf(select A, avg(B) as Average, median(B) as Median
                from Test
    group by $Bygroup)
 return(AVG)
 }

 Result-GetAvg(Test,'A')

 #Create function (This one does not work)

 GetAvg-function(Input,Bygroup){
 AVG-fn$sqldf(select A, avg(B) as Average, median(B) as Median
                from $Input
    group by $Bygroup)
 return(AVG)
 }

 Result-GetAvg(Test,'A')

That should be GetAvg('Test', 'A') with quotes around Test or if you
want to be able to specify the data unquoted then insert this line at
the top of your function:

Input - deparse(substitute(Input))

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] SQLDF from Variable Matrix

2010-08-03 Thread Suphajak Ngamlak
Dear all,

I would like to do sample statistics, e.g. mean, median from very large
dataset. This is part of commands I use routinely with several dataset
so I would like to make it into function. The simplified examples are

Test-data.frame(A=c('a','b','c','a','b','c'),B=c(1,2,3,4,5,6))

#Create function (This one work)

GetAvg-function(Input,Bygroup){
AVG-fn$sqldf(select A, avg(B) as Average, median(B) as Median
from Test
group by $Bygroup)
return(AVG)
}

Result-GetAvg(Test,'A')

#Create function (This one does not work)

GetAvg-function(Input,Bygroup){
AVG-fn$sqldf(select A, avg(B) as Average, median(B) as Median
from $Input
group by $Bygroup)
return(AVG)
}

Result-GetAvg(Test,'A')

The only difference is with $Input. How can I refer to this variable in
sqldf. Thank you


Best Regards,

Suphajak Ngamlak
Equity and Derivatives Trading
Phatra Securities Public Company Limited
Tel: +662-305-9179   
Email: supha...@phatrasecurities.com



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf 0.3-5 package or tcltk problem

2010-07-28 Thread Gabor Grothendieck
On Wed, Jul 28, 2010 at 1:21 AM,  erickso...@aol.com wrote:



  This is my first post. I am running Mac OS X version 10.6.3. I am running R 
 2.11.0 GUI 1.33 64 bit.

 This may or may not be related to sqldf, but I experienced this problem while 
 attempting to use an sqldf query. The same code runs with no problem on my 
 Windows machine. Here is what happens:

 r=sqldf(select ... )
 Loading required package: tcltk
 Loading Tcl/Tk interface ...

 Then it never loads.

 I have X11 open.

 I have all the latest versions of all the necessary packages for sqldf 0.3-5:

 DBI 0.2-5
 RSQLite 0.9-1
 RSQLite.extfuns 0.0.1
 gsubfn 0.5-3
 proto 0.3-8
 chron 2.3-35

 Although it gives warning messages for these:

 package 'sqldf' was built under R version 2.11.1
 package 'RSQLite' was built under R version 2.11.1
 package 'RSQLite.extfuns' was built under R version 2.11.1
 package 'gsubfun' was built under R version 2.11.1

 What can I do to load the Tcl/Tk interface?

Some things to try:

- upgrade to R 2.11.1
- try this alone: library(tcltk)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf 0.3-5 package or tcltk problem

2010-07-27 Thread ericksonsm

 

 This is my first post. I am running Mac OS X version 10.6.3. I am running R 
2.11.0 GUI 1.33 64 bit.

This may or may not be related to sqldf, but I experienced this problem while 
attempting to use an sqldf query. The same code runs with no problem on my 
Windows machine. Here is what happens:

 r=sqldf(select ... )
Loading required package: tcltk
Loading Tcl/Tk interface ...

Then it never loads.

I have X11 open.

I have all the latest versions of all the necessary packages for sqldf 0.3-5:

DBI 0.2-5
RSQLite 0.9-1
RSQLite.extfuns 0.0.1
gsubfn 0.5-3
proto 0.3-8
chron 2.3-35

Although it gives warning messages for these:

package 'sqldf' was built under R version 2.11.1
package 'RSQLite' was built under R version 2.11.1
package 'RSQLite.extfuns' was built under R version 2.11.1
package 'gsubfun' was built under R version 2.11.1

What can I do to load the Tcl/Tk interface?



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf modify table

2010-07-16 Thread PeterTucker

Hi - I am something of a newbie and am a little perplexed.  When (trying to)
modify a table I issue the following commands with subsequent errors

sqldf(alter table Korea drop column code, dbname = mydb)
error in statement: near drop: syntax error

or

sqldf(alter table Korea rename column hyr to hyrI, dbname = mydb)
error in statement: near column: syntax error

These are simple commands - am I missing something obvious?  I can retrieve
data from them, and retrieve their table_info

Thanks

Peter
-- 
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-modify-table-tp2291804p2291804.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf modify table

2010-07-16 Thread Gabor Grothendieck
On Fri, Jul 16, 2010 at 2:46 PM, PeterTucker pthet...@gmail.com wrote:

 Hi - I am something of a newbie and am a little perplexed.  When (trying to)
 modify a table I issue the following commands with subsequent errors

 sqldf(alter table Korea drop column code, dbname = mydb)
 error in statement: near drop: syntax error

 or

 sqldf(alter table Korea rename column hyr to hyrI, dbname = mydb)
 error in statement: near column: syntax error

 These are simple commands - am I missing something obvious?  I can retrieve
 data from them, and retrieve their table_info


SQLite does not support dropping columns. See:

   http://www.sqlite.org/lang_altertable.html

however, sqldf does support the H2 and PostgreSQL databases in
addition to sqlite so you can try one of those if this feature is
important to you.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf: issues with natural joins

2010-05-20 Thread Nick Switanek
Hello,

I'm having trouble discovering what's going wrong with my use of natural
joins via sqldf.

Following the instructions under 4i at http://code.google.com/p/sqldf/,
which discusses creating indices to speed joins, I have been only unreliably
able to get natural joins to work.

For example,

 Tid - c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES 01-01-05
10:58:00', 'AES 01-01-11 12:30:00')
 A - data.frame(Tid, dfName = 'a')
 B - data.frame(Tid = Tid[2:4], dfName = 'b')
 C - data.frame(Tid = Tid[1:3], dfName = 'c')

# then use the sqldf library
 library(sqldf)
 sqldf()

# to create indices on the Tid variable shared across data.frames
 sqldf('create index indA on A(Tid)')
 sqldf('create index indB on B(Tid)')
 sqldf('create index indC on C(Tid)')

# check to make sure everything is there
 sqldf('select * from sqlite_master')

# doing a natural join (implicitly on Tid)
# does not give the expected joins
 sqldf('select * from main.A natural join main.B')
[1] TiddfName
0 rows (or 0-length row.names)
 sqldf('select * from main.A natural join main.C')
[1] TiddfName
0 rows (or 0-length row.names)
 sqldf('select * from main.B natural join main.C')
[1] TiddfName
0 rows (or 0-length row.names)

# even using a where clause (which doesn't have the efficiency qualities I
need the indexed natural joins for) is problematic, setting values of the
dfName variable incorrectly for the data from C
 sqldf('select * from main.B b, main.C c where b.Tid = c.Tid')
Tid dfName   Tid dfName
1 AES 01-01-02 11:53:00  b AES 01-01-02 11:53:00  b
2 AES 01-01-05 10:58:00  b AES 01-01-05 10:58:00  b

I'm grateful for your guidance on what I'm doing wrong with the natural join
in sqldf.

many thanks,
Nick

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf: issues with natural joins

2010-05-20 Thread Gabor Grothendieck
There are two problems:

1. A natural join will join all columns with the same names in the two
tables and that includes not only Tid but also dfName and since there
are no rows that have the same Tid and dfName the result has zero
rows.

2. the heuristic it uses fails when you retrieve the same column name
from multiple tables so use method = raw to turn off the heuristic.
The heuristic will be improved to cover this case in the future.
Read FAQ #1 on the home page:
http://code.google.com/p/sqldf/#1._How_does_sqldf_handle_classes_and_factors?

This should work:

 sqldf('select * from main.A join main.B using(Tid)', method = raw)
 Tid dfName dfName
1  AES 01-01-02 11:53:00  a  b
2 AES 01-01-05\n10:58:00  a  b
3  AES 01-01-11 12:30:00  a  b

This works too as the double dfName no longer exists to confuse the heuristic:

names(B)[2] - dfNameB
sqldf('select * from main.A join main.B using(Tid)')



On Thu, May 20, 2010 at 12:04 PM, Nick Switanek nswita...@gmail.com wrote:
 Hello,

 I'm having trouble discovering what's going wrong with my use of natural
 joins via sqldf.

 Following the instructions under 4i at http://code.google.com/p/sqldf/,
 which discusses creating indices to speed joins, I have been only unreliably
 able to get natural joins to work.

 For example,

 Tid - c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES 01-01-05
 10:58:00', 'AES 01-01-11 12:30:00')
 A - data.frame(Tid, dfName = 'a')
 B - data.frame(Tid = Tid[2:4], dfName = 'b')
 C - data.frame(Tid = Tid[1:3], dfName = 'c')

 # then use the sqldf library
 library(sqldf)
 sqldf()

 # to create indices on the Tid variable shared across data.frames
 sqldf('create index indA on A(Tid)')
 sqldf('create index indB on B(Tid)')
 sqldf('create index indC on C(Tid)')

 # check to make sure everything is there
 sqldf('select * from sqlite_master')

 # doing a natural join (implicitly on Tid)
 # does not give the expected joins
 sqldf('select * from main.A natural join main.B')
 [1] Tid    dfName
 0 rows (or 0-length row.names)
 sqldf('select * from main.A natural join main.C')
 [1] Tid    dfName
 0 rows (or 0-length row.names)
 sqldf('select * from main.B natural join main.C')
 [1] Tid    dfName
 0 rows (or 0-length row.names)

 # even using a where clause (which doesn't have the efficiency qualities I
 need the indexed natural joins for) is problematic, setting values of the
 dfName variable incorrectly for the data from C
 sqldf('select * from main.B b, main.C c where b.Tid = c.Tid')
                    Tid dfName                   Tid dfName
 1 AES 01-01-02 11:53:00      b AES 01-01-02 11:53:00      b
 2 AES 01-01-05 10:58:00      b AES 01-01-05 10:58:00      b

 I'm grateful for your guidance on what I'm doing wrong with the natural join
 in sqldf.

 many thanks,
 Nick

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf: issues with natural joins

2010-05-20 Thread Gabor Grothendieck
Although that works I had meant to write:

 names(B)[2] - dfNameB
 # ... other commands
 sqldf('select * from main.A natural join main.B')

so that now only Tid is in common so the natural join just picks it up
and also the heuristic works again since we no longer retrieve
duplicate column names.

On Thu, May 20, 2010 at 12:32 PM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 There are two problems:

 1. A natural join will join all columns with the same names in the two
 tables and that includes not only Tid but also dfName and since there
 are no rows that have the same Tid and dfName the result has zero
 rows.

 2. the heuristic it uses fails when you retrieve the same column name
 from multiple tables so use method = raw to turn off the heuristic.
 The heuristic will be improved to cover this case in the future.
 Read FAQ #1 on the home page:
 http://code.google.com/p/sqldf/#1._How_does_sqldf_handle_classes_and_factors?

 This should work:

 sqldf('select * from main.A join main.B using(Tid)', method = raw)
                     Tid dfName dfName
 1  AES 01-01-02 11:53:00      a      b
 2 AES 01-01-05\n10:58:00      a      b
 3  AES 01-01-11 12:30:00      a      b

 This works too as the double dfName no longer exists to confuse the heuristic:

 names(B)[2] - dfNameB
 sqldf('select * from main.A join main.B using(Tid)')



 On Thu, May 20, 2010 at 12:04 PM, Nick Switanek nswita...@gmail.com wrote:
 Hello,

 I'm having trouble discovering what's going wrong with my use of natural
 joins via sqldf.

 Following the instructions under 4i at http://code.google.com/p/sqldf/,
 which discusses creating indices to speed joins, I have been only unreliably
 able to get natural joins to work.

 For example,

 Tid - c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES 01-01-05
 10:58:00', 'AES 01-01-11 12:30:00')
 A - data.frame(Tid, dfName = 'a')
 B - data.frame(Tid = Tid[2:4], dfName = 'b')
 C - data.frame(Tid = Tid[1:3], dfName = 'c')

 # then use the sqldf library
 library(sqldf)
 sqldf()

 # to create indices on the Tid variable shared across data.frames
 sqldf('create index indA on A(Tid)')
 sqldf('create index indB on B(Tid)')
 sqldf('create index indC on C(Tid)')

 # check to make sure everything is there
 sqldf('select * from sqlite_master')

 # doing a natural join (implicitly on Tid)
 # does not give the expected joins
 sqldf('select * from main.A natural join main.B')
 [1] Tid    dfName
 0 rows (or 0-length row.names)
 sqldf('select * from main.A natural join main.C')
 [1] Tid    dfName
 0 rows (or 0-length row.names)
 sqldf('select * from main.B natural join main.C')
 [1] Tid    dfName
 0 rows (or 0-length row.names)

 # even using a where clause (which doesn't have the efficiency qualities I
 need the indexed natural joins for) is problematic, setting values of the
 dfName variable incorrectly for the data from C
 sqldf('select * from main.B b, main.C c where b.Tid = c.Tid')
                    Tid dfName                   Tid dfName
 1 AES 01-01-02 11:53:00      b AES 01-01-02 11:53:00      b
 2 AES 01-01-05 10:58:00      b AES 01-01-05 10:58:00      b

 I'm grateful for your guidance on what I'm doing wrong with the natural join
 in sqldf.

 many thanks,
 Nick


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf: issues with natural joins

2010-05-20 Thread Nick Switanek
Thank you very much for these clarifying responses, Gabor.

I had mistakenly assumed that creating the index on Tid restricted the
natural join to joining on Tid. Can you describe when and how indices speed
up joins, or can you point me to resources that address this? Is it only for
natural joins or any joins (including, say, a select statement with where
clause)?

thanks,
nick


On Thu, May 20, 2010 at 11:42 AM, Gabor Grothendieck 
ggrothendi...@gmail.com wrote:

 Although that works I had meant to write:

  names(B)[2] - dfNameB
  # ... other commands
  sqldf('select * from main.A natural join main.B')

 so that now only Tid is in common so the natural join just picks it up
 and also the heuristic works again since we no longer retrieve
 duplicate column names.

 On Thu, May 20, 2010 at 12:32 PM, Gabor Grothendieck
 ggrothendi...@gmail.com wrote:
  There are two problems:
 
  1. A natural join will join all columns with the same names in the two
  tables and that includes not only Tid but also dfName and since there
  are no rows that have the same Tid and dfName the result has zero
  rows.
 
  2. the heuristic it uses fails when you retrieve the same column name
  from multiple tables so use method = raw to turn off the heuristic.
  The heuristic will be improved to cover this case in the future.
  Read FAQ #1 on the home page:
 
 http://code.google.com/p/sqldf/#1._How_does_sqldf_handle_classes_and_factors
 ?
 
  This should work:
 
  sqldf('select * from main.A join main.B using(Tid)', method = raw)
  Tid dfName dfName
  1  AES 01-01-02 11:53:00  a  b
  2 AES 01-01-05\n10:58:00  a  b
  3  AES 01-01-11 12:30:00  a  b
 
  This works too as the double dfName no longer exists to confuse the
 heuristic:
 
  names(B)[2] - dfNameB
  sqldf('select * from main.A join main.B using(Tid)')
 
 
 
  On Thu, May 20, 2010 at 12:04 PM, Nick Switanek nswita...@gmail.com
 wrote:
  Hello,
 
  I'm having trouble discovering what's going wrong with my use of natural
  joins via sqldf.
 
  Following the instructions under 4i at http://code.google.com/p/sqldf/,
  which discusses creating indices to speed joins, I have been only
 unreliably
  able to get natural joins to work.
 
  For example,
 
  Tid - c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES
 01-01-05
  10:58:00', 'AES 01-01-11 12:30:00')
  A - data.frame(Tid, dfName = 'a')
  B - data.frame(Tid = Tid[2:4], dfName = 'b')
  C - data.frame(Tid = Tid[1:3], dfName = 'c')
 
  # then use the sqldf library
  library(sqldf)
  sqldf()
 
  # to create indices on the Tid variable shared across data.frames
  sqldf('create index indA on A(Tid)')
  sqldf('create index indB on B(Tid)')
  sqldf('create index indC on C(Tid)')
 
  # check to make sure everything is there
  sqldf('select * from sqlite_master')
 
  # doing a natural join (implicitly on Tid)
  # does not give the expected joins
  sqldf('select * from main.A natural join main.B')
  [1] TiddfName
  0 rows (or 0-length row.names)
  sqldf('select * from main.A natural join main.C')
  [1] TiddfName
  0 rows (or 0-length row.names)
  sqldf('select * from main.B natural join main.C')
  [1] TiddfName
  0 rows (or 0-length row.names)
 
  # even using a where clause (which doesn't have the efficiency qualities
 I
  need the indexed natural joins for) is problematic, setting values of
 the
  dfName variable incorrectly for the data from C
  sqldf('select * from main.B b, main.C c where b.Tid = c.Tid')
 Tid dfName   Tid dfName
  1 AES 01-01-02 11:53:00  b AES 01-01-02 11:53:00  b
  2 AES 01-01-05 10:58:00  b AES 01-01-05 10:58:00  b
 
  I'm grateful for your guidance on what I'm doing wrong with the natural
 join
  in sqldf.
 
  many thanks,
  Nick
 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf: issues with natural joins

2010-05-20 Thread Gabor Grothendieck
They work on any join that is able to make use of them.   If you
preface the select statement with explain query plan then it will give
you some info, e.g.

 sqldf('explain query plan select * from main.A natural join main.B')
  order from  detail
1 00 TABLE A
2 11 TABLE B WITH INDEX indB

Note that its using the index on B but not the index on A so there was
actually no point in adding that one if this were the only query.

This is potentially a large topic so see the sqlite.org site and
mailing list for more info.

On Thu, May 20, 2010 at 1:28 PM, Nick Switanek nswita...@gmail.com wrote:
 Thank you very much for these clarifying responses, Gabor.

 I had mistakenly assumed that creating the index on Tid restricted the
 natural join to joining on Tid. Can you describe when and how indices speed
 up joins, or can you point me to resources that address this? Is it only for
 natural joins or any joins (including, say, a select statement with where
 clause)?

 thanks,
 nick


 On Thu, May 20, 2010 at 11:42 AM, Gabor Grothendieck
 ggrothendi...@gmail.com wrote:

 Although that works I had meant to write:

  names(B)[2] - dfNameB
  # ... other commands
  sqldf('select * from main.A natural join main.B')

 so that now only Tid is in common so the natural join just picks it up
 and also the heuristic works again since we no longer retrieve
 duplicate column names.

 On Thu, May 20, 2010 at 12:32 PM, Gabor Grothendieck
 ggrothendi...@gmail.com wrote:
  There are two problems:
 
  1. A natural join will join all columns with the same names in the two
  tables and that includes not only Tid but also dfName and since there
  are no rows that have the same Tid and dfName the result has zero
  rows.
 
  2. the heuristic it uses fails when you retrieve the same column name
  from multiple tables so use method = raw to turn off the heuristic.
  The heuristic will be improved to cover this case in the future.
  Read FAQ #1 on the home page:
 
  http://code.google.com/p/sqldf/#1._How_does_sqldf_handle_classes_and_factors?
 
  This should work:
 
  sqldf('select * from main.A join main.B using(Tid)', method = raw)
                      Tid dfName dfName
  1  AES 01-01-02 11:53:00      a      b
  2 AES 01-01-05\n10:58:00      a      b
  3  AES 01-01-11 12:30:00      a      b
 
  This works too as the double dfName no longer exists to confuse the
  heuristic:
 
  names(B)[2] - dfNameB
  sqldf('select * from main.A join main.B using(Tid)')
 
 
 
  On Thu, May 20, 2010 at 12:04 PM, Nick Switanek nswita...@gmail.com
  wrote:
  Hello,
 
  I'm having trouble discovering what's going wrong with my use of
  natural
  joins via sqldf.
 
  Following the instructions under 4i at http://code.google.com/p/sqldf/,
  which discusses creating indices to speed joins, I have been only
  unreliably
  able to get natural joins to work.
 
  For example,
 
  Tid - c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES
  01-01-05
  10:58:00', 'AES 01-01-11 12:30:00')
  A - data.frame(Tid, dfName = 'a')
  B - data.frame(Tid = Tid[2:4], dfName = 'b')
  C - data.frame(Tid = Tid[1:3], dfName = 'c')
 
  # then use the sqldf library
  library(sqldf)
  sqldf()
 
  # to create indices on the Tid variable shared across data.frames
  sqldf('create index indA on A(Tid)')
  sqldf('create index indB on B(Tid)')
  sqldf('create index indC on C(Tid)')
 
  # check to make sure everything is there
  sqldf('select * from sqlite_master')
 
  # doing a natural join (implicitly on Tid)
  # does not give the expected joins
  sqldf('select * from main.A natural join main.B')
  [1] Tid    dfName
  0 rows (or 0-length row.names)
  sqldf('select * from main.A natural join main.C')
  [1] Tid    dfName
  0 rows (or 0-length row.names)
  sqldf('select * from main.B natural join main.C')
  [1] Tid    dfName
  0 rows (or 0-length row.names)
 
  # even using a where clause (which doesn't have the efficiency
  qualities I
  need the indexed natural joins for) is problematic, setting values of
  the
  dfName variable incorrectly for the data from C
  sqldf('select * from main.B b, main.C c where b.Tid = c.Tid')
                     Tid dfName                   Tid dfName
  1 AES 01-01-02 11:53:00      b AES 01-01-02 11:53:00      b
  2 AES 01-01-05 10:58:00      b AES 01-01-05 10:58:00      b
 
  I'm grateful for your guidance on what I'm doing wrong with the natural
  join
  in sqldf.
 
  many thanks,
  Nick
 



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf not joining all the fields

2010-03-12 Thread Newbie19_02

Dear R users,

I have two data frames that were read from text files as follows:

x_data - read.table(x.txt, header = TRUE, sep = |, quote = \',
dec = .,as.is = TRUE,na.strings = NA,colClasses = NA,
nrows = 3864284,
skip = 0, check.names = TRUE,fill=TRUE,
strip.white = TRUE, blank.lines.skip = TRUE,
comment.char = #, allowEscapes = FALSE, flush = FALSE,
fileEncoding = , encoding = unknown)

x_data

prochi prescribed_date dataMonth item_code res_seqno quantity directions
CAO713  22/06/2001  NULL842752  NULL   601/D
CAO713  28/04/2000  NULL  7800  NULL 100G   A/TD
CAO713  10/04/2000  NULL842652  NULL   601/D
CAO713  03/07/2000  NULL842652  NULL   601/D
CAO713  09/01/2001  NULL842752  NULL   601/D
CAO713  16/10/2001  NULL842752  NULL   601/D
CAO713  16/08/2001  NULL842752  NULL   601/D
CAO713  17/09/1993  NULL 39620  NULL  5MLNIL
CAO713  01/05/2001  NULL842752  NULL   601/D
CAO713  05/03/2001  NULL842752  NULL   601/D



y_data

 item_codename  formulation_code  strength
bnf_code
100  NEONACLEX KTABS  NULL2.2.8
110NEONACLEX TABS   5MG2.2.1
50   MESORB DRESS 10CMX10CM   20.3.1
160 ABSORBENT CELLULOSE MESODRESS 10CMX10CM   20.3.1
161 ABSORBENT CELLULOSE MESODRESS 10CMX15CM   20.3.1
164 ABSORBENT CELLULOSE MESODRESS 20CMX25CM   20.3.1
200  SEPTRINTABS 480MG5.1.8
210  SEPTRIN PAED SFSUSP 240MG/5ML5.1.8
212SEPTRIN ADULT SUSP 480MG/5ML5.1.8
220SEPTRIN FORTE TABS 960MG5.1.8
 etc


contains all the information for the item codes
y was read in in the same way.

I then used the following code:

z  - sqldf(select * from x left join y using (code))

when I use this on my real data I get an output:
 prochi prescribed_date dataMonth item_code res_seqno quantity directions
1  CAO713  22/06/2001  NULL842752  NULL   60   
1/D
2  CAO713  28/04/2000  NULL  7800  NULL 100G  
A/TD
3  CAO713  10/04/2000  NULL842652  NULL   60   
1/D
4  CAO713  03/07/2000  NULL842652  NULL   60   
1/D
5  CAO713  09/01/2001  NULL842752  NULL   60   
1/D
6  CAO713  16/10/2001  NULL842752  NULL   60   
1/D
7  CAO713  16/08/2001  NULL842752  NULL   60   
1/D
8  CAO713  17/09/1993  NULL 39620  NULL  5ML   
NIL
9  CAO713  01/05/2001  NULL842752  NULL   60   
1/D
10 CAO713  05/03/2001  NULL842752  NULL   60   
1/D
   no_of_packs datasource scan_ref_no name formulation_code strength
1 NULLTSFNULL NA NA NA
2 NULLTSFNULL BETNOVATE RD OINT   0.025%
3 NULLTSFNULL NA NA NA
4 NULLTSFNULL NA NA NA
5 NULLTSFNULL NA NA NA
6 NULLTSFNULL NA NA NA
7 NULLTSFNULL NA NA NA
8 NULLTSFNULL   GAMMABULIN  INJ320MG
9 NULLTSFNULL NA NA NA
10NULLTSFNULL NA NA NA
   bnf_code
1  NA
2  13.4.1.2
3  NA
4  NA
5  NA
6  NA
7  NA
8  14.5
9  NA
10 NA


There is absolutely no reason for there to be NA anywhere as the
information for both the tables is complete.

Not sure what the problem is?

Thanks,
Natalie
-- 
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590786.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Gabor Grothendieck
Can you show the output of dput(x_data) and dput(y_data).

On Fri, Mar 12, 2010 at 11:56 AM, Newbie19_02 nvanzuy...@gmail.com wrote:

 Dear R users,

 I have two data frames that were read from text files as follows:

 x_data - read.table(x.txt, header = TRUE, sep = |, quote = \',
                dec = .,as.is = TRUE,na.strings = NA,colClasses = NA,
 nrows = 3864284,
                skip = 0, check.names = TRUE,fill=TRUE,
                strip.white = TRUE, blank.lines.skip = TRUE,
                comment.char = #, allowEscapes = FALSE, flush = FALSE,
                fileEncoding = , encoding = unknown)

 x_data

 prochi prescribed_date dataMonth item_code res_seqno quantity directions
 CAO713      22/06/2001      NULL    842752      NULL       60        1/D
 CAO713      28/04/2000      NULL      7800      NULL     100G       A/TD
 CAO713      10/04/2000      NULL    842652      NULL       60        1/D
 CAO713      03/07/2000      NULL    842652      NULL       60        1/D
 CAO713      09/01/2001      NULL    842752      NULL       60        1/D
 CAO713      16/10/2001      NULL    842752      NULL       60        1/D
 CAO713      16/08/2001      NULL    842752      NULL       60        1/D
 CAO713      17/09/1993      NULL     39620      NULL      5ML        NIL
 CAO713      01/05/2001      NULL    842752      NULL       60        1/D
 CAO713      05/03/2001      NULL    842752      NULL       60        1/D



 y_data

  item_code    name                              formulation_code  strength
 bnf_code
 100              NEONACLEX K                    TABS      NULL    2.2.8
 110                NEONACLEX                     TABS       5MG    2.2.1
 50                   MESORB                         DRESS 10CMX10CM   20.3.1
 160 ABSORBENT CELLULOSE MESO            DRESS 10CMX10CM   20.3.1
 161 ABSORBENT CELLULOSE MESO            DRESS 10CMX15CM   20.3.1
 164 ABSORBENT CELLULOSE MESO            DRESS 20CMX25CM   20.3.1
 200                  SEPTRIN                        TABS     480MG    5.1.8
 210          SEPTRIN PAED SF                    SUSP 240MG/5ML    5.1.8
 212            SEPTRIN ADULT                     SUSP 480MG/5ML    5.1.8
 220            SEPTRIN FORTE                     TABS     960MG    5.1.8
  etc


 contains all the information for the item codes
 y was read in in the same way.

 I then used the following code:

 z  - sqldf(select * from x left join y using (code))

 when I use this on my real data I get an output:
  prochi prescribed_date dataMonth item_code res_seqno quantity directions
 1  CAO713      22/06/2001      NULL    842752      NULL       60
 1/D
 2  CAO713      28/04/2000      NULL      7800      NULL     100G
 A/TD
 3  CAO713      10/04/2000      NULL    842652      NULL       60
 1/D
 4  CAO713      03/07/2000      NULL    842652      NULL       60
 1/D
 5  CAO713      09/01/2001      NULL    842752      NULL       60
 1/D
 6  CAO713      16/10/2001      NULL    842752      NULL       60
 1/D
 7  CAO713      16/08/2001      NULL    842752      NULL       60
 1/D
 8  CAO713      17/09/1993      NULL     39620      NULL      5ML
 NIL
 9  CAO713      01/05/2001      NULL    842752      NULL       60
 1/D
 10 CAO713      05/03/2001      NULL    842752      NULL       60
 1/D
   no_of_packs datasource scan_ref_no         name formulation_code strength
 1         NULL        TSF        NULL         NA             NA     NA
 2         NULL        TSF        NULL BETNOVATE RD             OINT   0.025%
 3         NULL        TSF        NULL         NA             NA     NA
 4         NULL        TSF        NULL         NA             NA     NA
 5         NULL        TSF        NULL         NA             NA     NA
 6         NULL        TSF        NULL         NA             NA     NA
 7         NULL        TSF        NULL         NA             NA     NA
 8         NULL        TSF        NULL   GAMMABULIN              INJ    320MG
 9         NULL        TSF        NULL         NA             NA     NA
 10        NULL        TSF        NULL         NA             NA     NA
   bnf_code
 1      NA
 2  13.4.1.2
 3      NA
 4      NA
 5      NA
 6      NA
 7      NA
 8      14.5
 9      NA
 10     NA


 There is absolutely no reason for there to be NA anywhere as the
 information for both the tables is complete.

 Not sure what the problem is?

 Thanks,
 Natalie
 --
 View this message in context: 
 http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590786.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list

Re: [R] sqldf not joining all the fields

2010-03-12 Thread Newbie19_02

http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt
feb09_267_presc_items_tsf.txt 

is the total file for y so if I use the command line with the total data for
y then I get the output specified in z

Thanks,
Natalie
-- 
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590804.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Newbie19_02

dput(x_data)

structure(list(prochi = c(CAO713, CAO713, CAO713, 
CAO713, CAO713, CAO713, CAO713, CAO713, 
CAO713, CAO713), prescribed_date = c(22/06/2001, 
28/04/2000, 10/04/2000, 03/07/2000, 09/01/2001, 16/10/2001, 
16/08/2001, 17/09/1993, 01/05/2001, 05/03/2001), dataMonth =
c(NULL, 
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 
NULL), item_code = c(842752, 7800, 842652, 842652, 
842752, 842752, 842752, 39620, 842752, 842752), res_seqno =
c(NULL, 
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 
NULL), quantity = c(60, 100G, 60, 60, 60, 60, 60, 
5ML, 60, 60), directions = c(1/D, A/TD, 1/D, 1/D, 
1/D, 1/D, 1/D, NIL, 1/D, 1/D), no_of_packs = c(NULL, 
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 
NULL), datasource = c(TSF, TSF, TSF, TSF, TSF, TSF, 
TSF, TSF, TSF, TSF), scan_ref_no = c(NULL, NULL, 
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL
)), .Names = c(prochi, prescribed_date, dataMonth, item_code, 
res_seqno, quantity, directions, no_of_packs, datasource, 
scan_ref_no), row.names = c(NA, 10L), class = data.frame)

-- 
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590821.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Newbie19_02

y_data - read.table(feb_267_presc_items_tsf.txt, header = TRUE, sep = |,
quote = \',
dec = .,as.is = TRUE,na.strings = NA,colClasses = NA,
nrows = 3864284,
skip = 0, check.names = TRUE,fill=TRUE,
strip.white = TRUE, blank.lines.skip = TRUE,
comment.char = #, allowEscapes = FALSE, flush = FALSE,
fileEncoding = , encoding = unknown) 

Will read the file in same that I have and I have posted the dput.

Sorry for not giving you what you originally wanted...

Natalie
-- 
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590826.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Gabor Grothendieck
Please provide code that I can just copy from your post and paste into
my session.   Either provide dput output as requested or provide the
files on the internet together with code that reads them off the
internet.

On Fri, Mar 12, 2010 at 12:06 PM, Newbie19_02 nvanzuy...@gmail.com wrote:

 http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt
 feb09_267_presc_items_tsf.txt

 is the total file for y so if I use the command line with the total data for
 y then I get the output specified in z

 Thanks,
 Natalie
 --
 View this message in context: 
 http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590804.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Gabor Grothendieck
What  about y_data?

On Fri, Mar 12, 2010 at 12:14 PM, Newbie19_02 nvanzuy...@gmail.com wrote:

 dput(x_data)

 structure(list(prochi = c(CAO713, CAO713, CAO713,
 CAO713, CAO713, CAO713, CAO713, CAO713,
 CAO713, CAO713), prescribed_date = c(22/06/2001,
 28/04/2000, 10/04/2000, 03/07/2000, 09/01/2001, 16/10/2001,
 16/08/2001, 17/09/1993, 01/05/2001, 05/03/2001), dataMonth =
 c(NULL,
 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 NULL), item_code = c(842752, 7800, 842652, 842652,
 842752, 842752, 842752, 39620, 842752, 842752), res_seqno =
 c(NULL,
 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 NULL), quantity = c(60, 100G, 60, 60, 60, 60, 60,
 5ML, 60, 60), directions = c(1/D, A/TD, 1/D, 1/D,
 1/D, 1/D, 1/D, NIL, 1/D, 1/D), no_of_packs = c(NULL,
 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
 NULL), datasource = c(TSF, TSF, TSF, TSF, TSF, TSF,
 TSF, TSF, TSF, TSF), scan_ref_no = c(NULL, NULL,
 NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL
 )), .Names = c(prochi, prescribed_date, dataMonth, item_code,
 res_seqno, quantity, directions, no_of_packs, datasource,
 scan_ref_no), row.names = c(NA, 10L), class = data.frame)

 --
 View this message in context: 
 http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590821.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Newbie19_02

The y_data file has over 9000 rows in it so I thought it would be more
practical to give you the file to download
-- 
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread David Winsemius
You have now given two different assignments to x_data and none to  
y_data:


The str( from the file access offering:

 str(x_data)
'data.frame':   2848 obs. of  5 variables:
 $ item_code   : int  100 110 150 160 161 164 200 210 212 220 ...
 $ name: chr  NEONACLEX K NEONACLEX MESORB  
ABSORBENT CELLULOSE MESO ...

 $ formulation_code: chr  TABS TABS DRESS DRESS ...
 $ strength: chr  NULL 5MG 10CMX10CM 10CMX10CM ...
 $ bnf_code: chr  2.2.8 2.2.1 20.3.1 20.3.1 ...

The str from assignment from the dput offering
 str(x_data)
'data.frame':   10 obs. of  10 variables:
 $ prochi : chr  CAO713 CAO713 CAO713  
CAO713 ...
 $ prescribed_date: chr  22/06/2001 28/04/2000 10/04/2000  
03/07/2000 ...

 $ dataMonth  : chr  NULL NULL NULL NULL ...
 $ item_code  : chr  842752 7800 842652 842652 ...
 $ res_seqno  : chr  NULL NULL NULL NULL ...
 $ quantity   : chr  60 100G 60 60 ...
 $ directions : chr  1/D A/TD 1/D 1/D ...
 $ no_of_packs: chr  NULL NULL NULL NULL ...
 $ datasource : chr  TSF TSF TSF TSF ...
 $ scan_ref_no: chr  NULL NULL NULL NULL ...

This code worked, but it is not clear that the x-y assignments were  
correct:


x_data - read.table(file=http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt 
, header = TRUE, sep = |, quote = \',
   dec = .,as.is = TRUE,na.strings = NA,colClasses =  
NA,

nrows = 3864284,
   skip = 0, check.names = TRUE,fill=TRUE,
   strip.white = TRUE, blank.lines.skip = TRUE,
   comment.char = #, allowEscapes = FALSE, flush = FALSE,
   fileEncoding = , encoding = unknown)

--
David.

On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote:



The y_data file has over 9000 rows in it so I thought it would be more
practical to give you the file to download
--
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Newbie19_02

Sorry!  It is the end of the day for me.

So 

dput(x)

structure(list(prochi = c(CAO713, CAO713, CAO713, 
CAO713, CAO713, CAO713, CAO713, CAO713, 
CAO713, CAO713), prescribed_date = c(22/06/2001, 
28/04/2000, 10/04/2000, 03/07/2000, 09/01/2001, 16/10/2001, 
16/08/2001, 17/09/1993, 01/05/2001, 05/03/2001), dataMonth =
c(NULL, 
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 
NULL), item_code = c(842752, 7800, 842652, 842652, 
842752, 842752, 842752, 39620, 842752, 842752), res_seqno =
c(NULL, 
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 
NULL), quantity = c(60, 100G, 60, 60, 60, 60, 60, 
5ML, 60, 60), directions = c(1/D, A/TD, 1/D, 1/D, 
1/D, 1/D, 1/D, NIL, 1/D, 1/D), no_of_packs = c(NULL, 
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 
NULL), datasource = c(TSF, TSF, TSF, TSF, TSF, TSF, 
TSF, TSF, TSF, TSF), scan_ref_no = c(NULL, NULL, 
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL
)), .Names = c(prochi, prescribed_date, dataMonth, item_code, 
res_seqno, quantity, directions, no_of_packs, datasource, 
scan_ref_no), row.names = c(NA, 10L), class = data.frame)


y_data -
read.table(file=http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt;,
header = TRUE, sep = |, quote = \',

  dec = .,as.is = TRUE,na.strings = NA,colClasses = NA,
nrows = 3864284,
  skip = 0, check.names = TRUE,fill=TRUE,
  strip.white = TRUE, blank.lines.skip = TRUE,
  comment.char = #, allowEscapes = FALSE, flush = FALSE,
  fileEncoding = , encoding = unknown)

So the y_data essentially contains the lookup table for the item codes in x.

Thanks and sorry for the mix up
-- 
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590849.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Gabor Grothendieck
Yes, its not clear what data was used.  Please provide a single email
that contains code that can be copied from R and pasted into another
session that will reproduce the problem.

x_data - ...dput output goes here...
y_data - dput output goes here ...
library(sqldf)
... rest of code ...

or else:

x_data - read.table(http://...;, ...)
y_data - read.table(http://...;. ...)
library(sqldf)
... rest of code ...

On Fri, Mar 12, 2010 at 12:29 PM, David Winsemius
dwinsem...@comcast.net wrote:
 You have now given two different assignments to x_data and none to y_data:

 The str( from the file access offering:

 str(x_data)
 'data.frame':   2848 obs. of  5 variables:
  $ item_code       : int  100 110 150 160 161 164 200 210 212 220 ...
  $ name            : chr  NEONACLEX K NEONACLEX MESORB ABSORBENT
 CELLULOSE MESO ...
  $ formulation_code: chr  TABS TABS DRESS DRESS ...
  $ strength        : chr  NULL 5MG 10CMX10CM 10CMX10CM ...
  $ bnf_code        : chr  2.2.8 2.2.1 20.3.1 20.3.1 ...

 The str from assignment from the dput offering
 str(x_data)
 'data.frame':   10 obs. of  10 variables:
  $ prochi         : chr  CAO713 CAO713 CAO713 CAO713
 ...
  $ prescribed_date: chr  22/06/2001 28/04/2000 10/04/2000 03/07/2000
 ...
  $ dataMonth      : chr  NULL NULL NULL NULL ...
  $ item_code      : chr  842752 7800 842652 842652 ...
  $ res_seqno      : chr  NULL NULL NULL NULL ...
  $ quantity       : chr  60 100G 60 60 ...
  $ directions     : chr  1/D A/TD 1/D 1/D ...
  $ no_of_packs    : chr  NULL NULL NULL NULL ...
  $ datasource     : chr  TSF TSF TSF TSF ...
  $ scan_ref_no    : chr  NULL NULL NULL NULL ...

 This code worked, but it is not clear that the x-y assignments were
 correct:

 x_data -
 read.table(file=http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt;,
 header = TRUE, sep = |, quote = \',
               dec = .,as.is = TRUE,na.strings = NA,colClasses = NA,
 nrows = 3864284,
               skip = 0, check.names = TRUE,fill=TRUE,
               strip.white = TRUE, blank.lines.skip = TRUE,
               comment.char = #, allowEscapes = FALSE, flush = FALSE,
               fileEncoding = , encoding = unknown)

 --
 David.

 On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote:


 The y_data file has over 9000 rows in it so I thought it would be more
 practical to give you the file to download
 --
 View this message in context:
 http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 David Winsemius, MD
 West Hartford, CT

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread Natalie Van Zuydam
Sorry!  It is the end of the day for me.

So

dput(x)

structure(list(prochi = c(CAO713, CAO713, CAO713,
CAO713, CAO713, CAO713, CAO713, CAO713,
CAO713, CAO713), prescribed_date = c(22/06/2001,
28/04/2000, 10/04/2000, 03/07/2000, 09/01/2001, 16/10/2001,
16/08/2001, 17/09/1993, 01/05/2001, 05/03/2001), dataMonth =
c(NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL), item_code = c(842752, 7800, 842652, 842652,
842752, 842752, 842752, 39620, 842752, 842752), res_seqno =
c(NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL), quantity = c(60, 100G, 60, 60, 60, 60, 60,
5ML, 60, 60), directions = c(1/D, A/TD, 1/D, 1/D,
1/D, 1/D, 1/D, NIL, 1/D, 1/D), no_of_packs = c(NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
NULL), datasource = c(TSF, TSF, TSF, TSF, TSF, TSF,
TSF, TSF, TSF, TSF), scan_ref_no = c(NULL, NULL,
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL
)), .Names = c(prochi, prescribed_date, dataMonth, item_code,
res_seqno, quantity, directions, no_of_packs, datasource,
scan_ref_no), row.names = c(NA, 10L), class = data.frame)


y_data -
read.table(file=
http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt;,
header = TRUE, sep = |, quote = \',

 dec = .,as.is = TRUE,na.strings = NA,colClasses = NA,
nrows = 3864284,
 skip = 0, check.names = TRUE,fill=TRUE,
 strip.white = TRUE, blank.lines.skip = TRUE,
 comment.char = #, allowEscapes = FALSE, flush = FALSE,
 fileEncoding = , encoding = unknown)

So the y_data essentially contains the lookup table for the item codes in x.

require(sqldf)
z - sqldf(select * from x left join y using (item_code))
z

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf not joining all the fields

2010-03-12 Thread David Winsemius

If I assign the file input to y_data and change you sqldf to

 z  - sqldf(select * from x_data left join y_data using  
(item_code)); z


I can replicate your result. Even after changing the types of the two  
item_code fields to match I still get the same result and when I see  
to what degree they share values I get:


 sum(x_data$item_code %in% y_data$item_code)
[1] 2
 sum(y_data$item_code %in% x_data$item_code)
[1] 2


So why are you so sure they are complete as you claimed in your  
first email.


--
David.


On Mar 12, 2010, at 12:29 PM, David Winsemius wrote:

You have now given two different assignments to x_data and none to  
y_data:


The str( from the file access offering:

 str(x_data)
'data.frame':   2848 obs. of  5 variables:
$ item_code   : int  100 110 150 160 161 164 200 210 212 220 ...
$ name: chr  NEONACLEX K NEONACLEX MESORB  
ABSORBENT CELLULOSE MESO ...

$ formulation_code: chr  TABS TABS DRESS DRESS ...
$ strength: chr  NULL 5MG 10CMX10CM 10CMX10CM ...
$ bnf_code: chr  2.2.8 2.2.1 20.3.1 20.3.1 ...

The str from assignment from the dput offering
 str(x_data)
'data.frame':   10 obs. of  10 variables:
$ prochi : chr  CAO713 CAO713 CAO713  
CAO713 ...
$ prescribed_date: chr  22/06/2001 28/04/2000 10/04/2000  
03/07/2000 ...

$ dataMonth  : chr  NULL NULL NULL NULL ...
$ item_code  : chr  842752 7800 842652 842652 ...
$ res_seqno  : chr  NULL NULL NULL NULL ...
$ quantity   : chr  60 100G 60 60 ...
$ directions : chr  1/D A/TD 1/D 1/D ...
$ no_of_packs: chr  NULL NULL NULL NULL ...
$ datasource : chr  TSF TSF TSF TSF ...
$ scan_ref_no: chr  NULL NULL NULL NULL ...

This code worked, but it is not clear that the x-y assignments  
were correct:


x_data - read.table(file=http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt 
, header = TRUE, sep = |, quote = \',
  dec = .,as.is = TRUE,na.strings = NA,colClasses =  
NA,

nrows = 3864284,
  skip = 0, check.names = TRUE,fill=TRUE,
  strip.white = TRUE, blank.lines.skip = TRUE,
  comment.char = #, allowEscapes = FALSE, flush = FALSE,
  fileEncoding = , encoding = unknown)

--
David.

On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote:



The y_data file has over 9000 rows in it so I thought it would be  
more

practical to give you the file to download
--
View this message in context: 
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf

2009-11-19 Thread JoK LoQ

Hello,

   I would like some help with sqldf syntax.
   
   Suppose I have table 1 and table 2.
   What do I have to do to generate a table with columns 2,5,6 from table 1
(for example), and columns 3,4,5,9 from table 2, but only when values in
column 2 from table 1 are equal to values in column 5 from table 2. 

tnks 
-- 
View this message in context: 
http://old.nabble.com/sqldf-tp26421449p26421449.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] sqldf

2009-11-19 Thread Gabor Grothendieck
Google for sql join and see the examples in Example 4 on the sqldf home page:
http://code.google.com/p/sqldf/#Example_4._Join


On Thu, Nov 19, 2009 at 2:30 PM, JoK LoQ jok...@gmail.com wrote:

 Hello,

   I would like some help with sqldf syntax.

   Suppose I have table 1 and table 2.
   What do I have to do to generate a table with columns 2,5,6 from table 1
 (for example), and columns 3,4,5,9 from table 2, but only when values in
 column 2 from table 1 are equal to values in column 5 from table 2.

 tnks
 --
 View this message in context: 
 http://old.nabble.com/sqldf-tp26421449p26421449.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] sqldf with date (class date) variables ?

2009-01-06 Thread si.fu
Hello,

I am having trouble with retrieving some data in queries involve with variables
with data type date. See the enclosed example:

ll-c(21DEC2006,15DEC2006)
ss-data.frame(ll)
ss-transform(ss,ll=as.date(as.character(ll)))
(ss)
 ll
1 17156
2 17150

tt-sqldf(select ll from ss)

(tt)

  ll
1 NA
2 NA

 str(ss)
'data.frame':   2 obs. of  1 variable:
 $ ll:Class 'date'  int [1:2] 17156 17150

 str(tt)
'data.frame':   2 obs. of  1 variable:
 $ ll:Class 'date'  int [1:2] NA NA

So all the elements of ll (of tt) has been converted into NA. Is there anyway 
that I can query the variables with date data type using sqldf?


Many Thanks,

Si.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


  1   2   >