Re: [R] SQldf with sqlite and H2

2011-07-14 Thread Mandans
Thanks a lot Gabor. It helped a lot. Appreciate your time and effort.

Thanks

--- On Thu, 7/14/11, Gabor Grothendieck  wrote:

> From: Gabor Grothendieck 
> Subject: Re: [R] SQldf with sqlite and H2
> To: "Mandans" 
> Cc: r-help@r-project.org
> Date: Thursday, July 14, 2011, 2:22 PM
> On Thu, Jul 14, 2011 at 10:33 AM,
> Mandans 
> wrote:
> > SQldf with sqlite and H2
> >
> > I have a large csv file (about 2GB) and wanted to
> import the file into R and do some filtering and analysis.
> Came across sqldf ( a great idea and product) and was trying
> to play around to see what would be the best method of doing
> this. csv file is comma delimited with some columns having
> comma inside the quoation like this "John, Doe".
> >
> > I tried this first
> >
> > ###
> > library(sqldf)
> > sqldf("attach testdb as new")
> > In.File <- "C:/JP/Temp/2008.csv"
> > read.csv.sql(In.File, sql = "create table table1 as
> select * from file",
> >  dbname = "testdb")
> >
> > It errored out with message
> >
> > NULL
> > Warning message:
> > closing unused connection 3 (C:/JP/Temp/2008.csv)
> >
> > When this failed, I converted this file from comma
> delimited to tab delimited and used this command
> >
> > #
> > read.csv.sql(In.File, sql = "create table table1 as
> select * from file",
> >  dbname = "testdb", sep = "\t")
> >
> > and this worked, it created testdb sqlite file with
> the size of 3GB
> >
> > now my question is in 3 parts.
> >
> > 1. Is it possible to create a dataframe with
> appropriate column classes and use that column classes when
> I use the read.csv.sql command to create the table.
> Something like may be create the table from that DF and then
> update with read.csv.sql.?
> >
> > Any example code will be really helpful.
> 
> Here is an example of using method = "name__class". 
> Note there are
> two underscores in a row.  It appears I neglected to
> document that
> Date2 means convert from character representation whereas
> Date means
> convert from numeric representation.  It would also be
> possible to use
> method = "raw" and then coerce the columns yourself
> afterwards.
> 
> # create test file
> Lines <- 'A__Date2|B
> 2000-01-01|x,y
> 2000-01-02|c,d
> '
> tf <- tempfile()
> cat(Lines, file = tf)
> 
> 
> library(sqldf)
> DF <- read.csv.sql(tf, sep = "|", method =
> "name__class")
> str(DF)
> 
> >
> > 2. If we use the H2 database instead of default sqlite
> and use the readcsv option, will that be faster and is there
> a way we can specify the above thought of applying a DF
> class to table column properties and update with CSVREAD
> >
> > library(RH2)
> > something like SELECT * FROM
> CSVREAD('C:/JP/Temp/2008.csv')
> >
> > Any example code will be really helpful.
> 
> Sorry, I haven't tested the speed of this.  postgresql
> and mysql, both
> supported by sqldf, also have builtin methods to read
> files. If I had
> to guess I would guess that mysql would be fastest but this
> would have
> to be tested.
> 
> >
> > 3. How do we specify where the H2 file is saved. Saw
> something like this, when I ran this example from RH2
> package, couldn't find the file in the working directory.
> >
> > con <- dbConnect(H2(), "jdbc:h2:~/test", "sa", "")
> 
> ~ means your home directory so ~/test means test is in the
> home directory.
> 
> Try
> 
> normalizePath("~")
> normalizePath("~/test")
> etc.
> 
> to see what they refer to.
> 
> Regards.
> 
> >
> > Sorry for the long mail. Appreciate all for building a
> great community and for the wonderful software in R.
> > Thanks for Gabor Grothendieck for bring sqldf to this
> great community.
> >
> > Any help or direction you can provide in this is
> highly appreciated.
> >
> > Thanks all.
> >
> > __
> > R-help@r-project.org
> mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained,
> reproducible code.
> >
> 
> 
> 
> -- 
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] SQldf with sqlite and H2

2011-07-14 Thread Mandans
SQldf with sqlite and H2

I have a large csv file (about 2GB) and wanted to import the file into R and do 
some filtering and analysis. Came across sqldf ( a great idea and product) and 
was trying to play around to see what would be the best method of doing this. 
csv file is comma delimited with some columns having comma inside the quoation 
like this "John, Doe". 

I tried this first 

###
library(sqldf)
sqldf("attach testdb as new")
In.File <- "C:/JP/Temp/2008.csv"
read.csv.sql(In.File, sql = "create table table1 as select * from file", 
  dbname = "testdb")

It errored out with message

NULL
Warning message:
closing unused connection 3 (C:/JP/Temp/2008.csv)   

When this failed, I converted this file from comma delimited to tab delimited 
and used this command  

#  
read.csv.sql(In.File, sql = "create table table1 as select * from file", 
  dbname = "testdb", sep = "\t")

and this worked, it created testdb sqlite file with the size of 3GB

now my question is in 3 parts.

1. Is it possible to create a dataframe with appropriate column classes and use 
that column classes when I use the read.csv.sql command to create the table. 
Something like may be create the table from that DF and then update with 
read.csv.sql.?

Any example code will be really helpful.

2. If we use the H2 database instead of default sqlite and use the readcsv 
option, will that be faster and is there a way we can specify the above thought 
of applying a DF class to table column properties and update with CSVREAD

library(RH2)
something like SELECT * FROM CSVREAD('C:/JP/Temp/2008.csv')

Any example code will be really helpful.

3. How do we specify where the H2 file is saved. Saw something like this, when 
I ran this example from RH2 package, couldn't find the file in the working 
directory.

con <- dbConnect(H2(), "jdbc:h2:~/test", "sa", "")

Sorry for the long mail. Appreciate all for building a great community and for 
the wonderful software in R.
Thanks for Gabor Grothendieck for bring sqldf to this great community.

Any help or direction you can provide in this is highly appreciated.

Thanks all.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.