
Thank you for indicating that SQLite may not handle a file as big as 160 GB.

Would you know of any utility for *physically splitting* the 160 GB text
file into pieces? And can one control the splitting so that it happens at
the end of a record?

Thank you again.
HC

--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4354285.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread Gabor Grothendieck

On Fri, Feb 3, 2012 at 6:03 AM, HC hca...@yahoo.co.in wrote:
 Thank you for indicating that SQLite may not handle a file as big as 160 GB.

 Would you know of any utility for *physically splitting* the 160 GB text
 file into pieces? And can one control the splitting so that it happens at
 the end of a record?


If they are csv files or similar data files then you could use R or
any scripting language to do that.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread Steve Lianoglou

On Fri, Feb 3, 2012 at 7:37 AM, Gabor Grothendieck
ggrothendi...@gmail.com wrote:
 On Fri, Feb 3, 2012 at 6:03 AM, HC hca...@yahoo.co.in wrote:
 Thank you for indicating that SQLite may not handle a file as big as 160 GB.

 Would you know of any utility for *physically splitting* the 160 GB text
 file into pieces? And can one control the splitting so that it happens at
 the end of a record?


 If they are csv files or similar data files then you could use R or
 any scripting language to do that.

Or even the *nix `split` command ...

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


Re: [R] sqldf for Very Large Tab Delimited Files

This is a 160 GB tab-separated .txt file. It has 9 columns and 3.25x10^9
rows.

Can R handle it?  

Thank you.
HC



--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4354556.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread Gabor Grothendieck

On Fri, Feb 3, 2012 at 8:08 AM, HC hca...@yahoo.co.in wrote:
 This is a 160 GB tab-separated .txt file. It has 9 columns and 3.25x10^9
 rows.

 Can R handle it?


You can process a file N lines at a time like this:

con <- file("myfile.dat", "r")
while (length(Lines <- readLines(con, n = N)) > 0) {
  ... whatever...
}
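
The same loop can do the physical splitting asked about earlier in the thread. The following is a minimal sketch, not a tested recipe for a 160 GB file: the function name split_by_lines and its arguments are hypothetical, and it assumes one record per line, so every piece ends on a record boundary.

```r
# Split a text file into pieces of at most n_per_file complete lines each.
# split_by_lines() is an illustrative helper, not part of any package.
split_by_lines <- function(path, n_per_file, prefix = "Split_") {
  con <- file(path, "r")
  on.exit(close(con))
  out_files <- character(0)
  i <- 1
  while (length(lines <- readLines(con, n = n_per_file)) > 0) {
    out <- paste0(prefix, i, ".txt")
    writeLines(lines, out)  # whole lines only, so records are never cut
    out_files <- c(out_files, out)
    i <- i + 1
  }
  out_files
}

# Small demonstration on a temporary file: 10 records, 4 per piece
src <- tempfile(fileext = ".txt")
writeLines(paste("station", 1:10, sep = "\t"), src)
pieces <- split_by_lines(src, n_per_file = 4,
                         prefix = file.path(tempdir(), "Split_"))
print(basename(pieces))
```

Writing with writeLines keeps the lines byte-for-byte as read, which avoids any re-quoting or re-formatting of the fields.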

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] sqldf for Very Large Tab Delimited Files

Thank you.

The readLines command is working fine and I am able to read 10^6 lines in
one go and write them out using the write.table command.

Does readLines use a block concept to optimize, or does it go line
by line?

Steve mentioned the *nix split command. Would there be any speed
benefit compared to readLines?

Thank you.
HC

--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4355362.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] sqldf for Very Large Tab Delimited Files

Bad news!

The readLines command works fine up to a certain limit. Once a few files have
been written, the R program crashes.

I used the following code:

iFile <- "Test.txt"
con <- file(iFile, "r")

N <- 1250000
iLoop <- 1

while (length(Lines <- readLines(con, n = N)) > 0 & iLoop < 41) {
  oFile <- paste("Split_", iLoop, ".txt", sep = "")
  write.table(Lines, oFile, sep = "\t", quote = FALSE, col.names = FALSE,
              row.names = FALSE)
  iLoop <- iLoop + 1
}
close(con)


With the above N = 1.25 million, it wrote 28 files of about 57 MB each. That is a
total of about 1.6 GB, and then it crashed.
I tried other values of N and it crashes at about the same place in
terms of total output size, i.e., about 1.6 GB.

Is this due to any limitation of Windows 7, in terms of not having the
pointer after this size?

Your insight would be very helpful.

Thank you.
HC






--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4355679.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-03 Thread jim holtman

Exactly what does "crashed" mean? What was the error message? Have
you tried to put:

rm(Lines)
gc()

at the end of the loop to free up and compact memory? If you watch
the performance, does the R process seem to be growing in terms of the
amount of memory that is being used? You can add:

memory.size()

before the above statements to see how much memory is being used.
This is just some more elementary debugging that you will have to
learn when using any system.
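
A minimal sketch of that kind of check, under some assumptions: it uses a small temporary file in place of the real data, and it reads memory use from the matrix returned by gc() instead of memory.size(), since memory.size() is Windows-only. The variable used_mb is a name chosen here for illustration.

```r
# Build a small stand-in input file
src <- tempfile(fileext = ".txt")
writeLines(sprintf("row\t%d", 1:5000), src)

con <- file(src, "r")
used_mb <- numeric(0)
while (length(Lines <- readLines(con, n = 1000)) > 0) {
  # ... write Lines to an output file here ...
  rm(Lines)
  # gc() returns a matrix; its second column is the memory in use, in Mb
  used_mb <- c(used_mb, sum(gc()[, 2]))
}
close(con)

# One value per chunk; a steady climb would suggest memory is not released
print(used_mb)
```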

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-02 Thread Gabor Grothendieck

On Wed, Feb 1, 2012 at 11:57 PM, HC hca...@yahoo.co.in wrote:
 Hi All,

 I have a very (very) large tab-delimited text file without headers. There
 are only 8 columns and millions of rows. I want to make numerous pieces of
 this file by sub-setting it for individual stations. Station is given as in
 the first column. I am trying to learn and use sqldf package for this but am
 stuck in a couple of places.

 To simulate my requirement, I have taken iris dataset as an example and have
 done the following:
 (1) create a tab-delimited file without headers.
 (2) read it using read.csv.sql command
 (3) write the result of a query, getting first 10 records

 Here is the reproducible code that I am trying:
 # Text data file
 write.table(iris, "irisNoH.txt", sep = "\t", quote = FALSE,
 col.names=FALSE,row.names = FALSE)
 # create an empty database (can skip this step if database already exists)
 sqldf("attach myTestdbT as new")
 f1<-file("irisNoH.txt")
 attr(f1, "file.format") <- list(header=FALSE,sep="\t")
 # read into table called irisTab in the mytestdb sqlite database
 read.csv.sql("irisNoH.txt", sql = "create table main.irisTab1 as select *
 from file", dbname = "mytestdb")
 res1<-sqldf("select * from main.irisTab1 limit 10", dbname = "mytestdb")
 write.table(res1, "iris10.txt", sep = "\t", quote = FALSE,
 col.names=FALSE,row.names = FALSE)

 # For querying records of a particular species - unresolved problems
 #a1<-"virginica"
 #attr(f1, "names") <- c("A1","A2","A3","A4","A5")
 #res2<-fn$sqldf("select * from main.irisTab1 where A5 = '$a1'")

 In the above, I am not able to:
 (1) assign the names to various columns
 (2) query for particular value of a column; in this case for particular
 species, say virginica
 (3) I guess fn$sqldf can do the job but it requires assigning column names

 Any help would be most appreciated.


Ignoring your iris file for a moment, to query the 5th column (getting
its name via sql rather than via R) we can do this:

library(sqldf)
species <- "virginica"
nms <- names(dbGetQuery(con, "select * from iris limit 0"))
fn$dbGetQuery(con, "select * from iris where `nms[5]` = '$species' limit 3")

Now, sqldf is best used when you are getting the data from R but if
you want to store it in a database and just leave it there then you
might be better off using RSQLite directly like this (the eol = "\r\n"
in the dbWriteTable statement was needed on my Windows system but you
may not need that depending on your platform):


write.table(iris, "irisNoH.txt", sep = "\t", quote = FALSE, col.names
= FALSE, row.names = FALSE)

library(sqldf)
library(RSQLite)

con <- dbConnect(SQLite(), dbname = "mytestdb")

dbWriteTable(con, "iris", "irisNoH.txt", sep = "\t", eol = "\r\n")

species <- "virginica"
nms <- names(dbGetQuery(con, "select * from iris limit 0"))
fn$dbGetQuery(con, "select * from iris where `nms[5]` = '$species' limit 3")

dbDisconnect(con)


-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-02 Thread Gabor Grothendieck

There seems to have been a pasting error here.  The first part was
intended to show how to do this using sqldf and the second using
RSQLite.  Thus the first part was intended to be:

library(sqldf)
species <- "virginica"

# obviously we could just do nms <- names(iris) but to get
# names from database instead
nms <- names(dbGetQuery(con, "select * from iris limit 0"))

# use 5th column
fn$sqldf("select * from iris where `nms[5]` = '$species' limit 3")


and the second part that illustrates RSQLite was ok.  Note that fn$
comes from the gsubfn package which sqldf loads.





-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-02 Thread HC

Hi Gabor,

Thank you very much for your guidance and help.

I could run the following code successfully on a 500 mb test data file. A
snapshot of the data file is attached herewith.

code start***
library(sqldf)
library(RSQLite)

iFile<-"Test100.txt"
con <- dbConnect(SQLite(),dbname = "myTest100")
dbWriteTable(con, "TestDB100", iFile, sep = "\t") #, eol = "\r\n")
nms <- names(dbGetQuery(con, "select * from TestDB100 limit 0"))

nRec<-fn$dbGetQuery(con, "select count(*)from TestDB100")
aL1<-1;

while (aL1<=nRec){
res1<-fn$dbGetQuery(con, "select * from (select * from TestDB100 limit
'$aL1',1)")
istn<-res1[1,1]
res1<-fn$dbGetQuery(con, "select * from TestDB100 where `nms[1]` = '$istn'")
icount<-dim(res1)[1]
oFile<-paste(istn,"_Test.txt",sep="")
write.table(res1, oFile, sep = "\t", quote = FALSE, col.names= FALSE,
row.names = FALSE)
aL1<-aL1+icount
}
dbDisconnect(con)
code end***

However, the actual data file that I want to handle is about *160 GB*. And
when I use the same code on that file, it gives the following error for
the dbWriteTable(con, ...) statement:
error start**
dbWriteTable(con, "TestDB", iFile, sep = "\t") #, eol = "\r\n")
Error in try({ : RS-DBI driver: (RS_sqlite_getline could not realloc)
[1] FALSE
error end**

I am not sure about the reason for this error. Is it due to the big file
size? I understood from the sqldf webpage that SQLite can work with even a larger
file than this and is restricted only by disk space and not RAM. I have
about 400 GB of free space on the PC I am using, with Windows 7 as the operating
system. I am assuming that the above dbWriteTable command uses only disk
space, so that memory is not the issue.

In fact this file was created using mysqldump and I do not have access
to the original MySQL database file.
I want to know the following:
(1)  Am I missing something in the above code that is preventing handling of
this big 160  GB file?
(2)  Should this be handled outside of R, if R is becoming a limitation in
this? And if yes then what is a possible way forward?

Thank you again for your quick response and all the help.
HC
http://r.789695.n4.nabble.com/file/n4353362/Test100.txt Test100.txt 
 





--
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-for-Very-Large-Tab-Delimited-Files-tp4350555p4353362.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] sqldf for Very Large Tab Delimited Files

2012-02-02 Thread Gabor Grothendieck

I think it's unlikely SQLite could handle a database that large unless
you can divide it into multiple separate databases.  At one time the
SQLite site said it did not handle databases over 1 GB and although I
think that is outdated by more recent versions of SQLite it's still
likely true that your size is too large for it.
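
For the underlying goal in this thread (one output file per station, with the station in the first column), a chunked pass in base R may avoid the database altogether. The following is only a sketch under assumptions: the function name split_by_first_column, the chunk size, and the "_Test.txt" suffix borrowed from the code above are illustrative, and it has not been tried on a file anywhere near 160 GB.

```r
# Append each line to a per-station file, reading the input in chunks.
# split_by_first_column() is an illustrative helper, not a package function.
split_by_first_column <- function(path, out_dir, chunk_size = 1e6) {
  con <- file(path, "r")
  on.exit(close(con))
  while (length(lines <- readLines(con, n = chunk_size)) > 0) {
    station <- sub("\t.*$", "", lines)  # text before the first tab
    groups <- split(lines, station)
    for (s in names(groups)) {
      # open in append mode so later chunks add to the same station file
      out <- file(file.path(out_dir, paste0(s, "_Test.txt")), open = "a")
      writeLines(groups[[s]], out)
      close(out)
    }
  }
  invisible(NULL)
}

# Small demonstration with three stations and a chunk size of 2
out_dir <- tempfile("stations")
dir.create(out_dir)
src <- tempfile(fileext = ".txt")
writeLines(c("A\t1", "B\t2", "A\t3", "C\t4", "B\t5"), src)
split_by_first_column(src, out_dir, chunk_size = 2)
print(list.files(out_dir))
```

Because files are opened in append mode, the input does not need to be sorted by station, but the output directory should start empty.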


-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] sqldf + Date class. Ordering and summary statistics appear to be incorrect.

2012-01-22 Thread Gabor Grothendieck

On Sun, Jan 22, 2012 at 10:59 PM, Grant Farnsworth gvfa...@gmail.com wrote:
 I've been using sqldf heavily lately but have encountered problems
 with ordering of observations or calculating statistics such as max()
 and min() when the variable used is of class Date.

 For example, if I run the following code:

 === begin code =
 library(sqldf)
 A<-data.frame(Dates=as.Date(c("1994-02-14","1977-02-23","2001-09-18","2009-08-01")),Ret=rnorm(4))
 OrderedA-sqldf('select * from A order by Dates')
 MaxA-sqldf('select max(Dates) as Dates from A')[1,1]
 MinA-sqldf('select min(Dates) as Dates from A')[1,1]
 === end code =

 Then the result is this:

 A
       Dates        Ret
 1 1994-02-14  1.2414706
 2 1977-02-23 -0.7728146
 3 2001-09-18  1.2551331
 4 2009-08-01 -0.2538359

 OrderedA
       Dates        Ret
 1 2001-09-18  1.2551331
 2 2009-08-01 -0.2538359
 3 1977-02-23 -0.7728146
 4 1994-02-14  1.2414706

 MaxA
 [1] 1994-02-14

 MinA
 [1] 2001-09-18

 Completely wrong order, no warnings issued, and the summary stats are
 wrong as well (but consistent with the ordering).

 According to the sqldf manual found at the following URL

 http://code.google.com/p/sqldf/#4._How_does_sqldf_work_with_%22Date%22_class_variables?

 this type of query should work correctly.  Any clue why it is not
 doing so?  User error or bug?


You are using an old version.  Update to latest version of R and sqldf.
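
The reported ordering is at least consistent with the dates having been compared as the text of their underlying day counts instead of as numbers. A small base-R sketch of that arithmetic follows; it is an illustration only and does not claim to show what the old sqldf version did internally.

```r
dates <- as.Date(c("1994-02-14", "1977-02-23", "2001-09-18", "2009-08-01"))

# A Date is stored as the number of days since 1970-01-01
days <- as.numeric(dates)

# Ordering the day counts as text puts "11583" before "2610"
text_order <- dates[order(as.character(days), method = "radix")]

# Ordering them as numbers gives the chronological order
numeric_order <- dates[order(days)]

print(text_order)
print(numeric_order)
```

The text ordering reproduces the OrderedA result quoted above, and its first and last elements match the reported MinA and MaxA.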

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] sqldf + Date class. Ordering and summary statistics appear to be incorrect.

2012-01-22 Thread Grant Farnsworth

Thanks, that worked.  Known bug, I guess.


Re: [R] sqldf and not converting integers to floating point in SQLite

2012-01-03 Thread jim holtman

try this:

> library(sqldf)
> table1 <- read.csv(text = "POSTAL | VALUE
+ 1000|49
+ 1010|100
+ 1020|50", sep="|")
> table2 <- read.csv(text = "INSEE | POSTAL
+ A|1000
+ B|1000
+ C|1010
+ D|1020", sep="|")
> table3 <- sqldf("
+ select table2.INSEE
+ , 1.0 * table1.VALUE / counts.nPostals as value_spread
+ from table1
+ , table2
+ ,(select POSTAL
+ , count(INSEE) as nPostals
+ from table2
+ group by POSTAL) counts
+ where table1.POSTAL = counts.POSTAL
+ and table1.POSTAL=table2.POSTAL
+ ")
> table3
  INSEE value_spread
1     A         24.5
2     B         24.5
3     C        100.0
4     D         50.0
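
For the question of whether other R concepts can do this, one option is merge() plus ave() in base R, where "/" always yields a floating-point result. This is a sketch that builds the two small tables inline (as an assumption standing in for the CSV files) and is not the approach given in the reply above.

```r
table1 <- data.frame(POSTAL = c(1000, 1010, 1020),
                     VALUE  = c(49L, 100L, 50L))
table2 <- data.frame(INSEE  = c("A", "B", "C", "D"),
                     POSTAL = c(1000, 1000, 1010, 1020))

# Attach each postal code's VALUE to the INSEE rows that share it
m <- merge(table2, table1, by = "POSTAL")

# ave(..., FUN = length) gives, per row, the number of rows with that POSTAL
n_per_postal <- ave(m$VALUE, m$POSTAL, FUN = length)

# Division in R is floating point even when both inputs are integers
m$value_spread <- m$VALUE / n_per_postal

res <- m[order(m$INSEE), c("INSEE", "value_spread")]
print(res)
```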



On Tue, Jan 3, 2012 at 3:13 PM, Frederik Vanrenterghem
frede...@vanrenterghem.biz wrote:
 Hi,

 I have following 2 tables:

 Table 1:
 POSTAL | VALUE
 1000|49
 1010|100
 1020|50

 Table 2:
 INSEE | POSTAL
 A|1000
 B|1000
 C|1010
 D|1020

 I would like to convert this to the following:

 INSEE | VALUE_SPREAD
 A|24.5
 B|24.5
 C|100
 D|50

 I can achieve this with a nested SQL query (through counting the
 number of INSEE that belong to any given POSTAL, and dividing the value
 of that postal code by that number).

 library(sqldf)
 table1 <- read.csv("c:/R/table1.csv", sep=";")
 table2 <- read.csv("c:/R/table2.csv", sep=";")
 table3 <- sqldf("select table2.INSEE, table1.VALUE / counts.nPostals
 as value_spread from table1, table2,(select POSTAL, count(INSEE) as
 nPostals from table2 group by POSTAL) counts where table1.POSTAL =
 counts.POSTAL and table1.POSTAL=table2.POSTAL")

 Unfortunately, the value I'm working with is an integer. In SQLite,
 this results in the computed value also not being a float - so it gets
 rounded up or down. In this case, I'm getting 24 for A & B instead of
 24.5.

 Is there a way to take care of this using other R concepts, avoiding
 that problem (for instance using melt & cast)?

 Thanks,
 Frederik




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Re: [R] sqldf if iif

2011-11-26 Thread Jeff Newmiller

sqldf uses the SQLite database by default for backend processing. The iif 
function is specific to the Jet database engine syntax (which underlies MS 
Access). You could read up on SQLite syntax, or you could avoid using 
nonstandard SQL syntax, retrieve the data into a data frame, and use R code to 
do your logical merging into one column.
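
A minimal sketch of the second route, doing the conditional in R after a merge. Several things here are assumptions: only the rows with SAMPLE 1289 from the tables quoted below are used, the comparison is taken to be ">= 0" (the operator is garbled in the quoted query), and species missing from esf50 keep their esf100 count. In SQL itself, the portable counterpart of iif would be a CASE expression.

```r
esf50 <- data.frame(SAMPLE = 1289,
                    SPECIES = c("diso1", "diso2", "diso3"),
                    Num = c(44, 5, 1))
esf100 <- data.frame(SAMPLE = 1289,
                     SPECIES = c("diso1", "diso2", "diso3", "diso4"),
                     Num = c(82, 13, 2, 3))

# Left join: keep every esf100 row, with NA where esf50 has no match
both <- merge(esf100, esf50, by = c("SAMPLE", "SPECIES"),
              all.x = TRUE, suffixes = c("100", "50"))

# Use the difference where it exists and is non-negative,
# otherwise fall back to the esf100 count
d <- both$Num100 - both$Num50
both$PIPAS <- ifelse(!is.na(d) & d >= 0, d, both$Num100)

print(both[, c("SAMPLE", "SPECIES", "Num100", "PIPAS")])
```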
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Carlos Rivera limnoriv...@gmail.com wrote:

Dear all,

I have problems with the iif function using the sqldf library.

I counted the abundance (Num) of different SPECIES at two moments (esf),
saving the information in two tables (esf50, esf100):

esf50

SAMPLE  SPECIES  Num  esf
1289    diso1    44   50
1289    diso2    5    50
1289    diso3    1    50
        diso1    44   50
        diso2    5    50
        diso3    1    50

esf100

SAMPLE  SPECIES  Num  esf
1289    diso1    82   100
1289    diso2    13   100
1289    diso3    2    100
1289    diso4    3    100
        diso1    82   100
        diso2    13   100
        diso3    2    100
        diso4    3    100

I would like to subtract column Num between the two moments, considering only
the changes; therefore I use the conditional if:

var100<-sqldf("select esf100.SAMPLE, esf100.SPECIES, esf100.Num, esf100.esf,
  iif esf100.Num - esf50.Num =0, esf100.Num-esf50.Num, esf100.Num as PIPAS
  from esf100 left join esf50 on esf100.SAMPLE = esf50.SAMPLE
  and esf100.SPECIES = esf50.SPECIES")

I think the structure is right because the SQL query runs OK in Access. Is
the if syntax the problem?

Thanks in advance.

Best wishes,

Carlos Rivera


   [[alternative HTML version deleted]]



Re: [R] sqldf syntax, selecting rows, and skipping

2011-09-29 Thread David Winsemius



On Sep 29, 2011, at 10:06 AM, Juliet Hannah wrote:


I am using the example in this post:

https://stat.ethz.ch/pipermail/r-help/2010-October/257204.html

# create a file
write.table(iris,"iris.csv",row.names=FALSE,sep=",",quote=FALSE)


# this does not work
# has the syntax changed or  is there a mistake in my usage?
# the line from the post above is:
#  read.csv.sql("myfile.csv", sql = "select * from file 2000, 1000")


You didn't read to the end of that thread. The two errors above were
corrected.


library(sqldf)
read.csv.sql("iris.csv", sql = "select * from file 5, 5")

# this works
# but i would like to keep the header

read.csv.sql("iris.csv", sql = "select * from file limit 5",skip=5,header=FALSE)

# thanks
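
For the header question just above, one base-R alternative (a sketch that does not use read.csv.sql at all) is to read the header line on its own and reapply the names after skipping. The object names csv, nms and rows_6_to_10 are chosen here for illustration, and a temporary file stands in for iris.csv.

```r
csv <- tempfile(fileext = ".csv")
write.table(iris, csv, row.names = FALSE, sep = ",", quote = FALSE)

# Read just enough of the file to recover the column names
nms <- names(read.csv(csv, nrows = 1))

# Skip the header line plus the first 5 data rows, then read the next 5 rows
rows_6_to_10 <- read.csv(csv, skip = 1 + 5, nrows = 5,
                         header = FALSE, col.names = nms)
print(rows_6_to_10)
```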



sessionInfo()

R version 2.13.1 (2011-07-08)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
States.1252LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] tcltk stats graphics  grDevices utils datasets
methods   base

other attached packages:
[1] sqldf_0.4-2   chron_2.3-42  gsubfn_0.5-7
proto_0.3-9.2 RSQLite.extfuns_0.0.1 RSQLite_0.9-4
DBI_0.2-5 myfunctions_1.0

loaded via a namespace (and not attached):
[1] tools_2.13.1



David Winsemius, MD
West Hartford, CT


Re: [R] SQldf with sqlite and H2

2011-07-14 Thread Gabor Grothendieck

On Thu, Jul 14, 2011 at 10:33 AM, Mandans mandan...@yahoo.com wrote:
 SQldf with sqlite and H2

 I have a large csv file (about 2GB) and wanted to import the file into R and 
 do some filtering and analysis. Came across sqldf (a great idea and product) 
 and was trying to play around to see what would be the best method of doing 
 this. The csv file is comma delimited, with some columns having a comma inside 
 the quotation marks, like this: "John, Doe".

 I tried this first

 ###
 library(sqldf)
 sqldf("attach testdb as new")
 In.File <- "C:/JP/Temp/2008.csv"
 read.csv.sql(In.File, sql = "create table table1 as select * from file",
  dbname = "testdb")

 It errored out with message

 NULL
 Warning message:
 closing unused connection 3 (C:/JP/Temp/2008.csv)

 When this failed, I converted this file from comma delimited to tab delimited 
 and used this command

 #
 read.csv.sql(In.File, sql = "create table table1 as select * from file",
  dbname = "testdb", sep = "\t")

 and this worked, it created testdb sqlite file with the size of 3GB

 now my question is in 3 parts.

 1. Is it possible to create a dataframe with appropriate column classes and 
 use that column classes when I use the read.csv.sql command to create the 
 table. Something like may be create the table from that DF and then update 
 with read.csv.sql.?

 Any example code will be really helpful.

Here is an example of using method = "name__class".  Note there are
two underscores in a row.  It appears I neglected to document that
Date2 means convert from character representation whereas Date means
convert from numeric representation.  It would also be possible to use
method = "raw" and then coerce the columns yourself afterwards.

# create test file
Lines <- 'A__Date2|B
2000-01-01|x,y
2000-01-02|c,d
'
tf <- tempfile()
cat(Lines, file = tf)


library(sqldf)
DF <- read.csv.sql(tf, sep = "|", method = "name__class")
str(DF)
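
The method = "raw" route mentioned above could look roughly like the
following. This is only a sketch: the file mirrors the test file above but
with a plain header, the read.table() fallback is an assumption added so the
snippet still runs where sqldf is not installed, and as.Date() is just one
possible coercion.

```r
# Test file with a plain header (no name__class suffix).
Lines <- "A|B\n2000-01-01|x,y\n2000-01-02|c,d\n"
tf <- tempfile()
cat(Lines, file = tf)

if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  # method = "raw": take the columns as the database returns them
  DF <- read.csv.sql(tf, sep = "|", method = "raw")
} else {
  # Fallback (assumption, not part of sqldf): read everything as character
  DF <- read.table(tf, sep = "|", header = TRUE, colClasses = "character")
}

# Coerce the columns yourself afterwards
DF$A <- as.Date(as.character(DF$A))
str(DF)
```

The same pattern applies to any other column that needs a class the
database cannot express directly.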


 2. If we use the H2 database instead of default sqlite and use the readcsv 
 option, will that be faster and is there a way we can specify the above 
 thought of applying a DF class to table column properties and update with 
 CSVREAD

 library(RH2)
 something like SELECT * FROM CSVREAD('C:/JP/Temp/2008.csv')

 Any example code will be really helpful.

Sorry, I haven't tested the speed of this.  postgresql and mysql, both
supported by sqldf, also have builtin methods to read files. If I had
to guess I would guess that mysql would be fastest but this would have
to be tested.


 3. How do we specify where the H2 file is saved. Saw something like this, 
 when I ran this example from RH2 package, couldn't find the file in the 
 working directory.

 con <- dbConnect(H2(), "jdbc:h2:~/test", "sa", "")

~ means your home directory so ~/test means test is in the home directory.

Try

normalizePath("~")
normalizePath("~/test")
etc.

to see what they refer to.
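
A slightly fuller version of that check, as a sketch in base R only. The
mustWork = FALSE argument and the list.files() call are additions of mine;
as far as I know H2 appends its own extension to the database name, so the
file on disk will probably not be called exactly "test".

```r
# "~" is expanded to the user's home directory
home <- path.expand("~")
print(home)

# mustWork = FALSE avoids an error or warning if ~/test does not exist
test_path <- normalizePath("~/test", mustWork = FALSE)
print(test_path)

# Look for anything in the home directory whose name starts with "test"
print(list.files(home, pattern = "^test"))
```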

Regards.


 Sorry for the long mail. Appreciate all for building a great community and 
 for the wonderful software in R.
 Thanks to Gabor Grothendieck for bringing sqldf to this great community.

 Any help or direction you can provide in this is highly appreciated.

 Thanks all.





-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] SQldf with sqlite and H2

2011-07-14 Thread Mandans

Thanks a lot Gabor. It helped a lot. Appreciate your time and effort.

Thanks


Re: [R] Sqldf INSERT INTO

2011-04-20 Thread Gabor Grothendieck

On Wed, Apr 20, 2011 at 12:39 PM, new2R bv_agr...@yahoo.co.in wrote:
  Hi,

 I am new to R and trying to migrate from SAS. I am trying to copy data from
 one table to another table with the same columns using sqldf, but it is not
 working and shows NULL.

 I wrote the statement as sqldf("INSERT INTO new select * from data") but it
 shows NULL.

 Please help me in this regard.


In your example new is a table in the sqlite database, not in R's
workspace, so you have to return it:

> library(sqldf)
> BOD
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8
> New <- BOD[1, ]
> BOD1 <- BOD[2:3,]
> sqldf(c("insert into New select * from BOD1", "select * from New"))
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
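
If the goal is only to get the combined rows back into R, an insert is not
strictly needed. A sketch of two alternatives follows; the data frame names
first_row and more_rows are made up for illustration, and the sqldf part is
skipped when the package is not installed.

```r
first_row <- BOD[1, ]
more_rows <- BOD[2:3, ]

# Base R: append the rows directly
combined <- rbind(first_row, more_rows)
print(combined)

if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  # A single select returns the combined rows to R
  combined_sql <- sqldf("select * from first_row union all select * from more_rows")
  print(combined_sql)
}
```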


-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] SQLDF syntax

2011-04-19 Thread Gabor Grothendieck

On Mon, Apr 18, 2011 at 6:34 PM, new2R bv_agr...@yahoo.co.in wrote:
 Hi,

 I am new to R and trying to migrate from SAS. I am trying to use sqldf to
 create a new table from existed table and change some of the columns. I have
 table called DataOld with columns commodity, rate and total and I am trying
 to create new table called DataNew with columns commodity, ratenew and
 totalNew.

 sqldf("create table datanew as select commodity, ratenew as rate * 10,
 totalnew as total *10 from DataOld")

 I got error message  Error in sqliteExecStatement(con, statement,
 bind.data) :
  RS-DBI driver: (error in statement: near "*": syntax error)


It's expression as name, not name as expression.  Try this:

> library(sqldf)
> BODnew <- sqldf("select demand, Time, demand + 1 as demandPlusOne from BOD")
> BODnew
  demand Time demandPlusOne
1    8.3    1           9.3
2   10.3    2          11.3
3   19.0    3          20.0
4   16.0    4          17.0
5   15.6    5          16.6
6   19.8    7          20.8

For more, the sqldf home page at http://sqldf.googlecode.com has links
to sqlite site where you can find sql syntax diagrams. See the links
along the left side of the page there.
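
Applied to the columns in the question, the same rule might look like the
sketch below. The contents of DataOld are invented for illustration; only
the column names come from the original post.

```r
# Hypothetical stand-in for the poster's DataOld table
DataOld <- data.frame(commodity = c("corn", "wheat"),
                      rate = c(1.5, 2.0),
                      total = c(10, 20))

# Expression first, then "as", then the new column name
query <- "select commodity, rate * 10 as ratenew, total * 10 as totalnew from DataOld"

if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  DataNew <- sqldf(query)
  print(DataNew)
}
```

Assigning the result in R replaces the "create table ... as" step, since
the returned data frame is what ends up in the workspace.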




-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] SQLDF syntax

2011-04-19 Thread new2R

Thank you very much. It's working.

--
View this message in context: 
http://r.789695.n4.nabble.com/SQLDF-syntax-tp3458919p3460448.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] SQLDF - Submitting Queries with R Objects as Columns

2011-03-09 Thread Gabor Grothendieck

On Wed, Mar 9, 2011 at 11:41 AM, Mike Schumacher
mike.schumac...@gmail.com wrote:
 Fellow R programmers,

 I'd like to submit SQLDF statements with R objects as column names.

 For example, I want to assign "X" to var1 (var1 <- "X") and then refer to
 var1 in the SQLDF statement.  SQLDF needs to understand that when I
 reference var1, it should look for X in the dataframe.

 This is necessary because my SQLDF is part of a larger function that I call
 repeatedly with different column names.

 Code below... thank you in advance!

 Mike


 library(sqldf)

 testdf <- data.frame(c(1,2,3,4,5,6,7,8,9,10), c(1,1,1,2,2,2,3,3,3,3))
 names(testdf) <- c("X", "Y")

 # Works as intended
 sqldf("select sum(X) as XSUM,
        Y             as Y
        from testdf
        group by Y")

 # Now... can I reference var1 in the code?
 var1 <- "X"


Here are two ways:

sqldf(sprintf("select sum(%s) XSUM, Y from testdf group by Y", var1))

fn$sqldf("select sum($var1) XSUM, Y from testdf group by Y")

See ?sprintf

fn comes from the gsubfn package (which is automatically pulled in by
sqldf) and adds quasi perl style string interpolation to the arguments
passed to the function call it prefaces.  See
http://gsubfn.googlecode.com and ?fn
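
Since the query is meant to sit inside a larger function, the sprintf
approach can be wrapped as in the sketch below. The helper name sum_query
and the check of the column name against names(data) are my own additions,
not part of sqldf.

```r
testdf <- data.frame(X = 1:10, Y = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3))

# Build the query for a column whose name is held in a variable.
# Checking the name against names(data) guards against typos and
# against pasting arbitrary text into the SQL.
sum_query <- function(column, data = testdf, data_name = "testdf") {
  stopifnot(is.character(column), length(column) == 1,
            column %in% names(data))
  sprintf("select sum(%s) as XSUM, Y from %s group by Y", column, data_name)
}

var1 <- "X"
query <- sum_query(var1)
print(query)

if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  print(sqldf(query))
}
```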


-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] SQLDF - Submitting Queries with R Objects as Columns

2011-03-09 Thread Rob Tirrell

You're submitting queries for SQLDF to execute as strings. So, if you want
to use a variable column name, sprintf() or paste() your statement together,
like:

sqldf(sprintf('select sum(%s) as XSUM, Y as Y from testdf group by Y',
var1))

--
Robert Tirrell | r...@stanford.edu | (607) 437-6532
Program in Biomedical Informatics | Butte Lab | Stanford University


 sqldf("select sum(return(var1)) as XSUM,
   Y as Y
   from testdf
   group by Y")



 --
 Michael Schumacher
 mike.schumac...@gmail.com
 Manager Data  Analytics, ValueClick
 818-851-8638


Re: [R] sqldf hanging on macintosh - works on windows

2010-11-02 Thread GL


Marc: Installing Simon's package worked perfectly. Thanks so much! 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-hanging-on-macintosh-works-on-windows-tp3022193p3023736.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] sqldf error only on Unix not Windows

On Mon, Nov 1, 2010 at 9:28 AM, Alex Bryant abry...@i-review.com wrote:
 Hello Group,

                I am having trouble with the sqldf package on unix.  The same 
 code works fine on windows.

 Silly Example script:

 # Load the package
 library(sqldf)

 # Use the titanic data set

 data(women)
 colnames(women)
 head(women)

 sqldf('select height, count(*) from women where height is not null group by 
 weight')


Some things to try:

-  try adding dbname = tempfile() argument to your sqldf statement and
see if that makes any difference

- try it with the H2 database rather than sqlite (or with PostgreSQL)
  To use it with H2 make sure you have Java and the CRAN package, RH2,
installed.
  RH2 includes the H2 database itself so you don't need to install that.
  Then issue this line in R any time before your first sqldf call
 library(RH2)
  sqldf will notice it and automatically use the H2 database instead of sqlite

- try it with R 2.11
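
The first suggestion might be tried as in the sketch below. Note that I
group by height here so each height appears once (the posted query grouped
by weight), and the whole block is skipped if sqldf is not installed.

```r
if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  # Use an on-disk scratch database instead of the default in-memory one
  res <- sqldf("select height, count(*) as n from women group by height",
               dbname = tempfile())
  print(head(res))
}
```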

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] sqldf hanging on macintosh - works on windows

On Mon, Nov 1, 2010 at 9:59 AM, GL pfl...@shands.ufl.edu wrote:

 Have a long script that runs fine on windows (32 bit). When I try to run in
 on two different macs (64 bit), however, it hangs with identical behavior.

 I start with:
 library(sqldf)

 This results in messages:
 Loading required package: DBI
 Loading required package: RSQLite
 Loading required package: RSQLite.extfuns
 Loading required package: gsubfn
 Loading required package: proto
 Loading required package: chron

 I then read some data, etc.

 I execute the following:

 #merge raw data and all possible combinations
  df.final <- sqldf('select Date, Hour, x as RoomsInUse from
 "df.possible.combos"
    left join "df.aggregate" using (Hour, Date)')

 I receive the messages:
 Loading required package: tcltk
 Loading Tcl/Tk interface ...
 +

 Then I get into some kind of loop. Message at bottom ribbon says:

 executing:
 try(gsub('\\s+','',paste(capture.output(print(arg(summary))),collapse="")),silent=TRUE)


That is not a line that appears in the sqldf source code.  Try these
suggestions anyways:

http://permalink.gmane.org/gmane.comp.lang.r.general/209443


-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] sqldf hanging on macintosh - works on windows

2010-11-01 Thread GL


added library(RH2)

Still get message:

Loading required package: tcltk
Loading Tcl/Tk interface
+

directly after sqldf statement 

>   df.final <- sqldf('select Date, Hour, x as RoomsInUse from
 "df.possible.combos"
+ left join "df.aggregate" using (Hour, Date)')

There is no progress spinner. If I hit enter I get a 

At that point I start to enter any command (just summary, for instance), I
get the progress spinner, the
try(gsub('\\s+','',paste(capture.output(print(arg(summary))),collapse="")),silent=TRUE)
 message in the bottom ribbon, and the system apparently hangs. 
-- 
View this message in context: 
http://r.789695.n4.nabble.com/sqldf-hanging-on-macintosh-works-on-windows-tp3022193p3022233.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] sqldf hanging on macintosh - works on windows

I don't have a Mac but if you wish to pursue it try this:

library(sqldf)
debug(sqldf)
sqldf(...whatever...)
# now step through it by repeatedly pressing Enter and send me the
console output of the session

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] sqldf hanging on macintosh - works on windows

2010-11-01 Thread GL



 
> library(sqldf)
Loading required package: DBI
Loading required package: RSQLite
Loading required package: RSQLite.extfuns
Loading required package: gsubfn
Loading required package: proto
Loading required package: chron
> debug(sqldf)
>   df.final <- sqldf('select Date, Hour, x as RoomsInUse from
 "df.possible.combos"
+ left join "df.aggregate" using (Hour, Date)')
debugging in: sqldf("select Date, Hour, x as RoomsInUse from
\"df.possible.combos\"\nleft join \"df.aggregate\" using (Hour, Date)")
debug: {
    as.POSIXct.character <- function(x) structure(as.numeric(x),
        class = c("POSIXt", "POSIXct"))
    as.Date.character <- function(x) structure(as.numeric(x),
        class = "Date")
    as.Date.numeric <- function(x, origin = "1970-01-01", ...)
        base::as.Date.numeric(x, origin = origin, ...)
    as.dates.character <- function(x) structure(as.numeric(x),
        class = c("dates", "times"))
    as.times.character <- function(x) structure(as.numeric(x),
        class = "times")
    overwrite <- FALSE
    request.open <- missing(x) && is.null(connection)
    request.close <- missing(x) && !is.null(connection)
    request.con <- !missing(x) && !is.null(connection)
    request.nocon <- !missing(x) && is.null(connection)
    dfnames <- fileobjs <- character(0)
    if (request.close || request.nocon) {
        on.exit({
            dbPreExists <- attr(connection, "dbPreExists")
            dbname <- attr(connection, "dbname")
            if (!missing(dbname) && !is.null(dbname) && dbname ==
                ":memory:") {
                dbDisconnect(connection)
            } else if (!dbPreExists && drv == "sqlite") {
                dbDisconnect(connection)
                file.remove(dbname)
            } else {
                for (nam in dfnames) dbRemoveTable(connection,
                  nam)
                for (fo in fileobjs) dbRemoveTable(connection,
                  fo)
                dbDisconnect(connection)
            }
        })
        if (request.close) {
            if (identical(connection, getOption("sqldf.connection")))
                options(sqldf.connection = NULL)
            return()
        }
    }
    if (request.open || request.nocon) {
        if (is.null(drv)) {
            drv <- if ("package:RpgSQL" %in% search()) {
                "pgSQL"
            }
            else if ("package:RMySQL" %in% search()) {
                "MySQL"
            }
            else if ("package:RH2" %in% search()) {
                "H2"
            }
            else "SQLite"
        }
        drv <- tolower(drv)
        if (drv == "mysql") {
            m <- dbDriver("MySQL")
            connection <- if (missing(dbname) || dbname == ":memory:") {
                dbConnect(m)
            }
            else dbConnect(m, dbname = dbname)
            dbPreExists <- TRUE
        }
        else if (drv == "pgsql") {
            m <- dbDriver("pgSQL")
            if (missing(dbname) || is.null(dbname)) {
                dbname <- getOption("RpgSQL.dbname")
                if (is.null(dbname))
                  dbname <- "test"
            }
            connection <- dbConnect(m, dbname = dbname)
            dbPreExists <- TRUE
        }
        else if (drv == "h2") {
            m <- H2()
            if (missing(dbname) || is.null(dbname))
                dbname <- ":memory:"
            dbPreExists <- dbname != ":memory:" && file.exists(dbname)
            connection <- if (missing(dbname) || dbname == ":memory:") {
                dbConnect(m, "jdbc:h2:mem:", "sa", "")
            }
            else {
                jdbc.string <- paste("jdbc:h2", dbname, sep = ":")
                dbConnect(m, jdbc.string)
            }
        }
        else {
            m <- dbDriver("SQLite")
            if (missing(dbname))
                dbname <- ":memory:"
            dbPreExists <- dbname != ":memory:" && file.exists(dbname)
            if (is.null(getOption("sqldf.dll"))) {
                dll <- Sys.which("libspatialite-1.dll")
                if (dll != "")
                  options(sqldf.dll = dll)
                else options(sqldf.dll = FALSE)
            }
            dll <- getOption("sqldf.dll")
            if (length(dll) != 1 || identical(dll, FALSE) ||
                nchar(dll) == 0) {
                dll <- FALSE
            }
            else {
                if (dll == basename(dll))
                  dll <- Sys.which(dll)
            }
            options(sqldf.dll = dll)
            if (!identical(dll, FALSE)) {
                connection <- dbConnect(m, dbname = dbname,
                  loadable.extensions = TRUE)
                s <- sprintf("select load_extension('%s')", dll)
                dbGetQuery(connection, s)
            }
            else connection <- dbConnect(m, dbname = dbname)
            init_extensions(connection)
        }
        attr(connection, "dbPreExists") <- dbPreExists
        if (missing(dbname) && drv == "sqlite")
            dbname <- ":memory:"
        attr(connection, "dbname") <- dbname
        if (request.open) {
            options(sqldf.connection = connection)

Re: [R] sqldf hanging on macintosh - works on windows

On Mon, Nov 1, 2010 at 10:55 AM, GL pfl...@shands.ufl.edu wrote:



 library(sqldf)
 Loading required package: DBI
 Loading required package: RSQLite
 Loading required package: RSQLite.extfuns
 Loading required package: gsubfn
 Loading required package: proto
 Loading required package: chron
 debug(sqldf)
   df.final <- sqldf('select Date, Hour, x as RoomsInUse from
 "df.possible.combos"
 +     left join "df.aggregate" using (Hour, Date)')
 debugging in: sqldf("select Date, Hour, x as RoomsInUse from
...
 debug: words. <- words <- strapply(x, "[[:alnum:]._]+")
 Browse[2]>
 Loading required package: tcltk
 Loading Tcl/Tk interface ...
 +

There is something wrong with tcltk on your system.  You can tell it
not to use tcltk by setting the appropriate option as discussed in
sqldf FAQ #5:

http://code.google.com/p/sqldf/#5._I_get_a_message_about_tcl_being_missing.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] sqldf hanging on macintosh - works on windows

2010-11-01 Thread Marc Schwartz

GL,

If you installed R using the OSX binary from CRAN, it does not include tcl/tk. 
You need to install the separate tcltk package that Simon has put together and 
is available from:

  http://cran.us.r-project.org/bin/macosx/tools/

You also need to have X11 installed, which is available from the OSX DVD in the 
Optional Installs section.

HTH,

Marc Schwartz


Re: [R] sqldf hanging on macintosh - works on windows

Note that sqldf can work without tcltk as well.  The gsubfn package
checks for tcltk and sets the engine to R rather than
tcltk if

capabilities()[["tcltk"]]

is FALSE.  There may be a bug in R or a problem with the installation.
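
To see what R reports, and to force the R engine regardless of that
report, something like the following sketch can be run before sqldf is
loaded. As I understand it, gsubfn.engine is the option that sqldf FAQ #5
refers to; the variable name has_tcltk is only for illustration.

```r
# What does this R build report about tcltk?
has_tcltk <- isTRUE(capabilities()[["tcltk"]])
cat("tcltk reported as available:", has_tcltk, "\n")

# Ask gsubfn for its pure R engine even if tcltk is reported as available.
# This must be set before sqldf (and hence gsubfn) is loaded.
options(gsubfn.engine = "R")
```

Setting the option unconditionally is useful in a case like this one,
where tcltk is reported as present but hangs while loading.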

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] sqldf syntax

2010-08-27 Thread Bond, Stephen

I had checked those references before posting, actually. SQLite has a very 
limited implementation of the standard. To do a single table update I would not 
go to sql. It's easy enough to do in R.

The problem is when I need to do an update from a left outer join, which I had 
to do with sqlSave (to a mySQL table), then sqlQuery, then sqlFetch.
sqlSave is amazingly slow, takes half an hour. (Would never do that at home :-) 
just too lazy to write a formal table def and use load data infile from a csv 
dump.

Also not happy with Dates becoming years in the transition. 
Will check the other suggestion about data.table and report.

Cheers everybody.

Stephen B


Re: [R] sqldf syntax

2010-08-26 Thread Gabor Grothendieck

On Thu, Aug 26, 2010 at 2:31 PM, Bond, Stephen stephen.b...@cibc.com wrote:
 Please correct the following

 sqldf("update esc left join forwagg  on esc.ym=forwagg.Date set 
 esc.ri2=forwagg.N1 where esc.age=12","select * from main.esc")
 Error in sqliteExecStatement(con, statement, bind.data) :
  RS-DBI driver: (error in statement: near "left": syntax error)


1. sqldf takes one sql argument whereas the above has two sql
arguments; however, the one argument may be a vector of sql commands.
 See ?sqldf and the examples on the sqldf home page
http://sqldf.googlecode.com

2. there is an error in the syntax of your update statement.  For
correct syntax see the sqlite site:

http://sqlite.org/lang_update.html
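
Putting both points together, one possible shape is sketched below. The
table contents are invented for illustration, and the correlated subquery
is my suggested substitute for the join, which SQLite's update statement
does not accept in that form; whether it matches the intended result
depends on the real data.

```r
# Hypothetical stand-ins for the poster's tables
esc <- data.frame(ym = c("2010-01", "2010-02"), age = c(12, 24), ri2 = c(0, 0))
forwagg <- data.frame(Date = c("2010-01", "2010-02"), N1 = c(1.1, 2.2))

# One character vector holding both statements
statements <- c(
  # A correlated subquery does the lookup in place of the join
  "update esc
      set ri2 = (select N1 from forwagg where forwagg.Date = esc.ym)
    where age = 12",
  # main.esc is the updated copy inside the database
  "select * from main.esc"
)

if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  updated <- sqldf(statements)
  print(updated)
}
```

Rows with age = 12 that have no match in forwagg would get NULL from the
subquery, so a further condition may be needed to leave them unchanged.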

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] SQLDF from Variable Matrix

2010-08-04 Thread Gabor Grothendieck

On Wed, Aug 4, 2010 at 12:29 AM, Suphajak Ngamlak
supha...@phatrasecurities.com wrote:
 Dear all,

 I would like to do sample statistics, e.g. mean, median from very large
 dataset. This is part of commands I use routinely with several dataset
 so I would like to make it into function. The simplified examples are

 Test <- data.frame(A=c('a','b','c','a','b','c'), B=c(1,2,3,4,5,6))

 # Create function (this one works)

 GetAvg <- function(Input, Bygroup){
 AVG <- fn$sqldf("select A, avg(B) as Average, median(B) as Median
                from Test
    group by $Bygroup")
 return(AVG)
 }

 Result <- GetAvg(Test,'A')

 # Create function (this one does not work)

 GetAvg <- function(Input, Bygroup){
 AVG <- fn$sqldf("select A, avg(B) as Average, median(B) as Median
                from $Input
    group by $Bygroup")
 return(AVG)
 }

 Result <- GetAvg(Test,'A')

That should be GetAvg('Test', 'A') with quotes around Test or if you
want to be able to specify the data unquoted then insert this line at
the top of your function:

Input <- deparse(substitute(Input))
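
A sketch of the second variant follows. To keep it runnable without sqldf
it builds the query with sprintf rather than fn$ and leaves out median();
the helper name GetAvgQuery is made up for illustration.

```r
Test <- data.frame(A = c("a", "b", "c", "a", "b", "c"), B = 1:6)

GetAvgQuery <- function(Input, Bygroup) {
  # Capture the name of the data frame as typed by the caller
  Input <- deparse(substitute(Input))
  sprintf("select %s, avg(B) as Average from %s group by %s",
          Bygroup, Input, Bygroup)
}

query <- GetAvgQuery(Test, "A")   # Test is passed unquoted
print(query)

if (requireNamespace("sqldf", quietly = TRUE)) {
  library(sqldf)
  print(sqldf(query))
}
```

The data frame named in the query still has to be visible from wherever
sqldf is called, which is the case here because Test lives in the
workspace.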


Re: [R] sqldf 0.3-5 package or tcltk problem

2010-07-28 Thread Gabor Grothendieck

On Wed, Jul 28, 2010 at 1:21 AM,  erickso...@aol.com wrote:



  This is my first post. I am running Mac OS X version 10.6.3. I am running R 
 2.11.0 GUI 1.33 64 bit.

 This may or may not be related to sqldf, but I experienced this problem while 
 attempting to use an sqldf query. The same code runs with no problem on my 
 Windows machine. Here is what happens:

 r=sqldf("select ... ")
 Loading required package: tcltk
 Loading Tcl/Tk interface ...

 Then it never loads.

 I have X11 open.

 I have all the latest versions of all the necessary packages for sqldf 0.3-5:

 DBI 0.2-5
 RSQLite 0.9-1
 RSQLite.extfuns 0.0.1
 gsubfn 0.5-3
 proto 0.3-8
 chron 2.3-35

 Although it gives warning messages for these:

 package 'sqldf' was built under R version 2.11.1
 package 'RSQLite' was built under R version 2.11.1
 package 'RSQLite.extfuns' was built under R version 2.11.1
 package 'gsubfn' was built under R version 2.11.1

 What can I do to load the Tcl/Tk interface?

Some things to try:

- upgrade to R 2.11.1
- try this alone: library(tcltk)
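
Before loading tcltk on its own, it may also help to confirm what the running
R session reports about itself. This is a small sketch using only base R; it
does not load Tcl/Tk, so it should return even on a machine where
library(tcltk) hangs.

```r
# TRUE when this build of R was compiled with Tcl/Tk support
has_tcltk <- unname(capabilities("tcltk"))
print(has_tcltk)

# Version of R actually running, for comparison with the
# "was built under R version ..." warnings
print(R.version.string)
```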


Re: [R] sqldf modify table

2010-07-16 Thread Gabor Grothendieck

On Fri, Jul 16, 2010 at 2:46 PM, PeterTucker pthet...@gmail.com wrote:

 Hi - I am something of a newbie and am a little perplexed.  When (trying to)
 modify a table I issue the following commands with subsequent errors

 sqldf("alter table Korea drop column code", dbname = mydb)
 error in statement: near "drop": syntax error

 or

 sqldf("alter table Korea rename column hyr to hyrI", dbname = mydb)
 error in statement: near "column": syntax error

 These are simple commands - am I missing something obvious?  I can retrieve
 data from them, and retrieve their table_info


SQLite does not support dropping columns. See:

   http://www.sqlite.org/lang_altertable.html

However, sqldf also supports the H2 and PostgreSQL databases in
addition to SQLite, so you can try one of those if this feature is
important to you.


Re: [R] sqldf: issues with natural joins

2010-05-20 Thread Gabor Grothendieck

There are two problems:

1. A natural join joins on all columns that have the same name in the two
tables. That includes not only Tid but also dfName, and since no rows
have the same Tid and dfName the result has zero rows.

2. The heuristic sqldf uses to assign classes to the result columns fails
when you retrieve the same column name from multiple tables, so use
method = "raw" to turn off the heuristic. The heuristic will be improved
to cover this case in the future. Read FAQ #1 on the home page:
http://code.google.com/p/sqldf/#1._How_does_sqldf_handle_classes_and_factors?

This should work:

 sqldf('select * from main.A join main.B using(Tid)', method = "raw")
                     Tid dfName dfName
1  AES 01-01-02 11:53:00      a      b
2 AES 01-01-05\n10:58:00      a      b
3  AES 01-01-11 12:30:00      a      b

This works too as the double dfName no longer exists to confuse the heuristic:

names(B)[2] <- "dfNameB"
sqldf('select * from main.A join main.B using(Tid)')
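
Point 1 can be reproduced without a database. The sketch below is only an
analogy in base R: merge() stands in for the SQL join and is not what sqldf
runs, and the Tid strings are retyped without the stray line break.

```r
Tid <- c("AES 01-01-02 10:58:00", "AES 01-01-02 11:53:00",
         "AES 01-01-05 10:58:00", "AES 01-01-11 12:30:00")
A <- data.frame(Tid, dfName = "a")
B <- data.frame(Tid = Tid[2:4], dfName = "b")

# merge() defaults to joining on every shared column name (Tid and dfName),
# which mirrors what a natural join does
natural <- merge(A, B)
print(nrow(natural))

# Naming the key explicitly corresponds to "join ... using(Tid)"
on_tid <- merge(A, B, by = "Tid")
print(names(on_tid))
```

The first merge finds no rows because no row of A has dfName equal to 'b';
the second matches on Tid alone and keeps both dfName columns under
disambiguated names.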



On Thu, May 20, 2010 at 12:04 PM, Nick Switanek nswita...@gmail.com wrote:
 Hello,

 I'm having trouble discovering what's going wrong with my use of natural
 joins via sqldf.

 Following the instructions under 4i at http://code.google.com/p/sqldf/,
 which discusses creating indices to speed joins, I have been only unreliably
 able to get natural joins to work.

 For example,

 Tid <- c('AES 01-01-02 10:58:00', 'AES 01-01-02 11:53:00', 'AES 01-01-05
 10:58:00', 'AES 01-01-11 12:30:00')
 A <- data.frame(Tid, dfName = 'a')
 B <- data.frame(Tid = Tid[2:4], dfName = 'b')
 C <- data.frame(Tid = Tid[1:3], dfName = 'c')

 # then use the sqldf library
 library(sqldf)
 sqldf()

 # to create indices on the Tid variable shared across data.frames
 sqldf('create index indA on A(Tid)')
 sqldf('create index indB on B(Tid)')
 sqldf('create index indC on C(Tid)')

 # check to make sure everything is there
 sqldf('select * from sqlite_master')

 # doing a natural join (implicitly on Tid)
 # does not give the expected joins
 sqldf('select * from main.A natural join main.B')
 [1] Tid    dfName
 <0 rows> (or 0-length row.names)
 sqldf('select * from main.A natural join main.C')
 [1] Tid    dfName
 <0 rows> (or 0-length row.names)
 sqldf('select * from main.B natural join main.C')
 [1] Tid    dfName
 <0 rows> (or 0-length row.names)

 # even using a where clause (which doesn't have the efficiency qualities I
 need the indexed natural joins for) is problematic, setting values of the
 dfName variable incorrectly for the data from C
 sqldf('select * from main.B b, main.C c where b.Tid = c.Tid')
                    Tid dfName                   Tid dfName
 1 AES 01-01-02 11:53:00      b AES 01-01-02 11:53:00      b
 2 AES 01-01-05 10:58:00      b AES 01-01-05 10:58:00      b

 I'm grateful for your guidance on what I'm doing wrong with the natural join
 in sqldf.

 many thanks,
 Nick


Re: [R] sqldf: issues with natural joins

2010-05-20 Thread Gabor Grothendieck

Although that works I had meant to write:

 names(B)[2] <- "dfNameB"
 # ... other commands
 sqldf('select * from main.A natural join main.B')

Now Tid is the only column in common, so the natural join picks it up
by itself, and the heuristic works again since we no longer retrieve
duplicate column names.


Re: [R] sqldf: issues with natural joins

2010-05-20 Thread Nick Switanek

Thank you very much for these clarifying responses, Gabor.

I had mistakenly assumed that creating the index on Tid restricted the
natural join to joining on Tid. Can you describe when and how indices speed
up joins, or can you point me to resources that address this? Is it only for
natural joins or any joins (including, say, a select statement with where
clause)?

thanks,
nick



Re: [R] sqldf: issues with natural joins

2010-05-20 Thread Gabor Grothendieck

Indices speed up any join that is able to make use of them, not only
natural joins. If you preface the select statement with explain query
plan then it will give you some info, e.g.

 sqldf('explain query plan select * from main.A natural join main.B')
  order from                  detail
1     0    0                 TABLE A
2     1    1 TABLE B WITH INDEX indB

Note that it is using the index on B but not the index on A, so there
was actually no point in adding that one if this were the only query.

This is potentially a large topic, so see the sqlite.org site and
mailing list for more info.


Re: [R] sqldf not joining all the fields

Can you show the output of dput(x_data) and dput(y_data)?

On Fri, Mar 12, 2010 at 11:56 AM, Newbie19_02 nvanzuy...@gmail.com wrote:

 Dear R users,

 I have two data frames that were read from text files as follows:

 x_data <- read.table("x.txt", header = TRUE, sep = "|", quote = "\"'",
                dec = ".", as.is = TRUE, na.strings = "NA", colClasses = NA,
                nrows = 3864284,
                skip = 0, check.names = TRUE, fill = TRUE,
                strip.white = TRUE, blank.lines.skip = TRUE,
                comment.char = "#", allowEscapes = FALSE, flush = FALSE,
                fileEncoding = "", encoding = "unknown")

 x_data

 prochi prescribed_date dataMonth item_code res_seqno quantity directions
 CAO713      22/06/2001      NULL    842752      NULL       60        1/D
 CAO713      28/04/2000      NULL      7800      NULL     100G       A/TD
 CAO713      10/04/2000      NULL    842652      NULL       60        1/D
 CAO713      03/07/2000      NULL    842652      NULL       60        1/D
 CAO713      09/01/2001      NULL    842752      NULL       60        1/D
 CAO713      16/10/2001      NULL    842752      NULL       60        1/D
 CAO713      16/08/2001      NULL    842752      NULL       60        1/D
 CAO713      17/09/1993      NULL     39620      NULL      5ML        NIL
 CAO713      01/05/2001      NULL    842752      NULL       60        1/D
 CAO713      05/03/2001      NULL    842752      NULL       60        1/D



 y_data

 item_code                     name formulation_code  strength bnf_code
       100              NEONACLEX K             TABS      NULL    2.2.8
       110                NEONACLEX             TABS       5MG    2.2.1
       150                   MESORB            DRESS 10CMX10CM   20.3.1
       160 ABSORBENT CELLULOSE MESO            DRESS 10CMX10CM   20.3.1
       161 ABSORBENT CELLULOSE MESO            DRESS 10CMX15CM   20.3.1
       164 ABSORBENT CELLULOSE MESO            DRESS 20CMX25CM   20.3.1
       200                  SEPTRIN             TABS     480MG    5.1.8
       210          SEPTRIN PAED SF             SUSP 240MG/5ML    5.1.8
       212            SEPTRIN ADULT             SUSP 480MG/5ML    5.1.8
       220            SEPTRIN FORTE             TABS     960MG    5.1.8
  etc


 y_data contains all the information for the item codes and was read in
 in the same way.

 I then used the following code:

 z <- sqldf("select * from x left join y using (code)")

 when I use this on my real data I get an output:
    prochi prescribed_date dataMonth item_code res_seqno quantity directions
 1  CAO713      22/06/2001      NULL    842752      NULL       60        1/D
 2  CAO713      28/04/2000      NULL      7800      NULL     100G       A/TD
 3  CAO713      10/04/2000      NULL    842652      NULL       60        1/D
 4  CAO713      03/07/2000      NULL    842652      NULL       60        1/D
 5  CAO713      09/01/2001      NULL    842752      NULL       60        1/D
 6  CAO713      16/10/2001      NULL    842752      NULL       60        1/D
 7  CAO713      16/08/2001      NULL    842752      NULL       60        1/D
 8  CAO713      17/09/1993      NULL     39620      NULL      5ML        NIL
 9  CAO713      01/05/2001      NULL    842752      NULL       60        1/D
 10 CAO713      05/03/2001      NULL    842752      NULL       60        1/D
   no_of_packs datasource scan_ref_no         name formulation_code strength
 1         NULL        TSF        NULL         NA             NA     NA
 2         NULL        TSF        NULL BETNOVATE RD             OINT   0.025%
 3         NULL        TSF        NULL         NA             NA     NA
 4         NULL        TSF        NULL         NA             NA     NA
 5         NULL        TSF        NULL         NA             NA     NA
 6         NULL        TSF        NULL         NA             NA     NA
 7         NULL        TSF        NULL         NA             NA     NA
 8         NULL        TSF        NULL   GAMMABULIN              INJ    320MG
 9         NULL        TSF        NULL         NA             NA     NA
 10        NULL        TSF        NULL         NA             NA     NA
   bnf_code
 1      NA
 2  13.4.1.2
 3      NA
 4      NA
 5      NA
 6      NA
 7      NA
 8      14.5
 9      NA
 10     NA


 There is absolutely no reason for there to be NA anywhere as the
 information for both the tables is complete.

 Not sure what the problem is?

 Thanks,
 Natalie



Re: [R] sqldf not joining all the fields


http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt
feb09_267_presc_items_tsf.txt 

This is the complete file for y. If I run the command line above on the
complete data for y, I get the output shown for z.

Thanks,
Natalie

Re: [R] sqldf not joining all the fields


dput(x_data)

structure(list(prochi = c("CAO713", "CAO713", "CAO713",
"CAO713", "CAO713", "CAO713", "CAO713", "CAO713",
"CAO713", "CAO713"), prescribed_date = c("22/06/2001",
"28/04/2000", "10/04/2000", "03/07/2000", "09/01/2001", "16/10/2001",
"16/08/2001", "17/09/1993", "01/05/2001", "05/03/2001"), dataMonth = c("NULL",
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL",
"NULL"), item_code = c("842752", "7800", "842652", "842652",
"842752", "842752", "842752", "39620", "842752", "842752"), res_seqno = c("NULL",
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL",
"NULL"), quantity = c("60", "100G", "60", "60", "60", "60", "60",
"5ML", "60", "60"), directions = c("1/D", "A/TD", "1/D", "1/D",
"1/D", "1/D", "1/D", "NIL", "1/D", "1/D"), no_of_packs = c("NULL",
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL",
"NULL"), datasource = c("TSF", "TSF", "TSF", "TSF", "TSF", "TSF",
"TSF", "TSF", "TSF", "TSF"), scan_ref_no = c("NULL", "NULL",
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"
)), .Names = c("prochi", "prescribed_date", "dataMonth", "item_code",
"res_seqno", "quantity", "directions", "no_of_packs", "datasource",
"scan_ref_no"), row.names = c(NA, 10L), class = "data.frame")


Re: [R] sqldf not joining all the fields


y_data <- read.table("feb_267_presc_items_tsf.txt", header = TRUE, sep = "|",
quote = "\"'",
dec = ".", as.is = TRUE, na.strings = "NA", colClasses = NA,
nrows = 3864284,
skip = 0, check.names = TRUE, fill = TRUE,
strip.white = TRUE, blank.lines.skip = TRUE,
comment.char = "#", allowEscapes = FALSE, flush = FALSE,
fileEncoding = "", encoding = "unknown")

This reads the file in the same way that I did, and I have posted the dput
output.

Sorry for not giving you what you originally wanted...

Natalie

Re: [R] sqldf not joining all the fields

Please provide code that I can just copy from your post and paste into
my session.   Either provide dput output as requested or provide the
files on the internet together with code that reads them off the
internet.
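
As an illustration of why dput output is convenient for this, here is a small
sketch in base R. The data frame lookup is a two-row stand-in typed from the
y_data listing above, not the poster's actual file.

```r
# Small stand-in data frame (two rows retyped from the listing)
lookup <- data.frame(item_code = c(100L, 110L),
                     name = c("NEONACLEX K", "NEONACLEX"),
                     stringsAsFactors = FALSE)

# dput() prints R code that rebuilds the object; capture it as text,
# which is what would be pasted into a post
code <- paste(capture.output(dput(lookup)), collapse = "\n")
cat(code, "\n")

# A reader can evaluate the pasted text to get the data frame back
rebuilt <- eval(parse(text = code))
print(identical(rebuilt, lookup))
```

For a large table, posting dput(head(y_data, 20)) keeps the message short
while still giving readers something they can paste and run.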



Re: [R] sqldf not joining all the fields

What about y_data?


Re: [R] sqldf not joining all the fields


The y_data file has over 9000 rows in it, so I thought it would be more
practical to give you the file to download.

Re: [R] sqldf not joining all the fields

2010-03-12 Thread David Winsemius

You have now given two different assignments to x_data and none to
y_data.

The str() output from the file-download offering:

 str(x_data)
'data.frame':   2848 obs. of  5 variables:
 $ item_code       : int  100 110 150 160 161 164 200 210 212 220 ...
 $ name            : chr  "NEONACLEX K" "NEONACLEX" "MESORB" "ABSORBENT CELLULOSE MESO" ...
 $ formulation_code: chr  "TABS" "TABS" "DRESS" "DRESS" ...
 $ strength        : chr  "NULL" "5MG" "10CMX10CM" "10CMX10CM" ...
 $ bnf_code        : chr  "2.2.8" "2.2.1" "20.3.1" "20.3.1" ...

The str() output from the assignment made with the dput offering:

 str(x_data)
'data.frame':   10 obs. of  10 variables:
 $ prochi         : chr  "CAO713" "CAO713" "CAO713" "CAO713" ...
 $ prescribed_date: chr  "22/06/2001" "28/04/2000" "10/04/2000" "03/07/2000" ...
 $ dataMonth      : chr  "NULL" "NULL" "NULL" "NULL" ...
 $ item_code      : chr  "842752" "7800" "842652" "842652" ...
 $ res_seqno      : chr  "NULL" "NULL" "NULL" "NULL" ...
 $ quantity       : chr  "60" "100G" "60" "60" ...
 $ directions     : chr  "1/D" "A/TD" "1/D" "1/D" ...
 $ no_of_packs    : chr  "NULL" "NULL" "NULL" "NULL" ...
 $ datasource     : chr  "TSF" "TSF" "TSF" "TSF" ...
 $ scan_ref_no    : chr  "NULL" "NULL" "NULL" "NULL" ...

This code worked, but it is not clear that the x-y assignments were
correct:

x_data <- read.table(file="http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt",
   header = TRUE, sep = "|", quote = "\"'",
   dec = ".", as.is = TRUE, na.strings = "NA", colClasses = NA,
   nrows = 3864284,
   skip = 0, check.names = TRUE, fill = TRUE,
   strip.white = TRUE, blank.lines.skip = TRUE,
   comment.char = "#", allowEscapes = FALSE, flush = FALSE,
   fileEncoding = "", encoding = "unknown")

--
David.


David Winsemius, MD
West Hartford, CT


Re: [R] sqldf not joining all the fields


Sorry!  It is the end of the day for me.

So 

dput(x)

structure(list(prochi = c("CAO713", "CAO713", "CAO713",
"CAO713", "CAO713", "CAO713", "CAO713", "CAO713",
"CAO713", "CAO713"), prescribed_date = c("22/06/2001",
"28/04/2000", "10/04/2000", "03/07/2000", "09/01/2001", "16/10/2001",
"16/08/2001", "17/09/1993", "01/05/2001", "05/03/2001"), dataMonth = c("NULL",
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL",
"NULL"), item_code = c("842752", "7800", "842652", "842652",
"842752", "842752", "842752", "39620", "842752", "842752"), res_seqno = c("NULL",
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL",
"NULL"), quantity = c("60", "100G", "60", "60", "60", "60", "60",
"5ML", "60", "60"), directions = c("1/D", "A/TD", "1/D", "1/D",
"1/D", "1/D", "1/D", "NIL", "1/D", "1/D"), no_of_packs = c("NULL",
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL",
"NULL"), datasource = c("TSF", "TSF", "TSF", "TSF", "TSF", "TSF",
"TSF", "TSF", "TSF", "TSF"), scan_ref_no = c("NULL", "NULL",
"NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"
)), .Names = c("prochi", "prescribed_date", "dataMonth", "item_code",
"res_seqno", "quantity", "directions", "no_of_packs", "datasource",
"scan_ref_no"), row.names = c(NA, 10L), class = "data.frame")


y_data <-
read.table(file="http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt",
  header = TRUE, sep = "|", quote = "\"'",
  dec = ".", as.is = TRUE, na.strings = "NA", colClasses = NA,
  nrows = 3864284,
  skip = 0, check.names = TRUE, fill = TRUE,
  strip.white = TRUE, blank.lines.skip = TRUE,
  comment.char = "#", allowEscapes = FALSE, flush = FALSE,
  fileEncoding = "", encoding = "unknown")

So the y_data essentially contains the lookup table for the item codes in x.

Thanks, and sorry for the mix-up.

Re: [R] sqldf not joining all the fields