Re: [R] a question on sqldf's handling of missing value and factor

2011-03-16 Thread xin wei
Dear Gabor:
I did not have time to look at this issue these few days. Now I saw your
investigation. I am very grateful that you quickly identify the root cause
of this. It seems that a little caution needs to be exercised when applying
sqldf to text file with large number of blanks (I have no control over how
the client files are produced). 
thank you again for such a good solution!

--
View this message in context: 
http://r.789695.n4.nabble.com/a-question-on-sqldf-s-handling-of-missing-value-and-factor-tp3331007p3381867.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] a question on sqldf's handling of missing value and factor

2011-03-02 Thread Gabor Grothendieck
On Wed, Mar 2, 2011 at 10:17 AM, xin wei  wrote:
> I am sorry for posting the wrong source file. the correct source file is as
> follows:
> a       b       c
> aa              23
> aaa     34.6
>             77.8
>
> They are tab delimited but somehow could not be displayed correctly in
> browser.

The problem is that you are using empty fields to represent missing
values but SQLite regarded them as zero length character fields.  See
FAQ 14 on  the sqldf home page for a solution:
http://code.google.com/p/sqldf/#14._How_does_one_read_files_where_numeric_NAs_are_represented_as

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] a question on sqldf's handling of missing value and factor

2011-03-02 Thread xin wei
Dear Mr. Grothendieck :
thank you so much for your attention. You are the real expert here. the
following is a mock text file:
a   b   c
aa  23
aaa 34   
77

note that both b and c column contain missing value (blank)
I save it under my C drive and use both read.table and sqldf to import it to
R and then use identical() function to compare the result. The following is
the result:

> setwd("c:/")
> library(sqldf)
> test <- file("test.txt") 
> testx <- sqldf("select * from test", 
+ dbname = tempfile(), file.format = list(header = T,
sep="\t", row.names = F))
> testy<- read.table("test.txt", header = T, sep="\t")
> identical(testx, testy)
[1] FALSE
> testx
 abc
1   aa  23.0
2  aaa 34.6  0.0
3   77.8
> testy
 abc
1   aa   NA 23.0
2  aaa 34.6   NA
3    NA 77.8
> class(testx$b)
[1] "factor"
> class(testy$b)
[1] "numeric"
> 
 
read.table seems to get it right while sqldf treats b as factor (if I add
method="raw", b become character). what is more troubling is that column C
has number 0 at the second row while in the original file it is missing. In
my real world situation with a much larger text file, the problem is that
many cells are empty when they all actually have values in the original text
file. 

I would greatly appreciate your help if you can shed some light on this.

thanks

--
View this message in context: 
http://r.789695.n4.nabble.com/a-question-on-sqldf-s-handling-of-missing-value-and-factor-tp3331007p3331662.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] a question on sqldf's handling of missing value and factor

2011-03-02 Thread xin wei
I am sorry for posting the wrong source file. the correct source file is as
follows:
a   b   c
aa  23
aaa 34.6 
77.8

They are tab delimited but somehow could not be displayed correctly in
browser.

--
View this message in context: 
http://r.789695.n4.nabble.com/a-question-on-sqldf-s-handling-of-missing-value-and-factor-tp3331007p3331667.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] a question on sqldf's handling of missing value and factor

2011-03-02 Thread Gabor Grothendieck
On Tue, Mar 1, 2011 at 11:52 PM, xin wei  wrote:
> Dear subscribers:
>
> I am using the following code to read a large number of big text files:
> library(sqldf)
> tempd <- file()
> tempdx <- sqldf("select * from tempd", dbname = tempfile(), file.format =
> list(header = T, sep="\t", row.names = F))
>
> The problem is: all my numberical variable become factor (maybe because
> these columns all contain missing value). It would be quite cubersome to
> convert them to numeric variable using as.numeric one by one. Does anyone
> know how to re-set SQLDF so that it would automatically read the numeric
> column with missing row as real numeric instead of factor?
>

If you can provide a minimal ***reproducible*** example it would help.
 Maybe sqldf(..., method = "raw") will give you what you want but I
can't say for sure without the example.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] a question on sqldf's handling of missing value and factor

2011-03-01 Thread xin wei
Dear subscribers:

I am using the following code to read a large number of big text files:
library(sqldf)
tempd <- file()
tempdx <- sqldf("select * from tempd", dbname = tempfile(), file.format =
list(header = T, sep="\t", row.names = F))

The problem is: all my numberical variable become factor (maybe because
these columns all contain missing value). It would be quite cubersome to
convert them to numeric variable using as.numeric one by one. Does anyone
know how to re-set SQLDF so that it would automatically read the numeric
column with missing row as real numeric instead of factor?

many thanks

-- 
View this message in context: 
http://r.789695.n4.nabble.com/a-question-on-sqldf-s-handling-of-missing-value-and-factor-tp3331007p3331007.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.