Re: [R] a question on sqldf's handling of missing value and factor
Dear Gabor: I did not have time to look at this issue these few days. Now I saw your investigation. I am very grateful that you quickly identify the root cause of this. It seems that a little caution needs to be exercised when applying sqldf to text file with large number of blanks (I have no control over how the client files are produced). thank you again for such a good solution! -- View this message in context: http://r.789695.n4.nabble.com/a-question-on-sqldf-s-handling-of-missing-value-and-factor-tp3331007p3381867.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] a question on sqldf's handling of missing value and factor
On Wed, Mar 2, 2011 at 10:17 AM, xin wei wrote: > I am sorry for posting the wrong source file. the correct source file is as > follows: > a b c > aa 23 > aaa 34.6 > 77.8 > > They are tab delimited but somehow could not be displayed correctly in > browser. The problem is that you are using empty fields to represent missing values but SQLite regarded them as zero length character fields. See FAQ 14 on the sqldf home page for a solution: http://code.google.com/p/sqldf/#14._How_does_one_read_files_where_numeric_NAs_are_represented_as -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] a question on sqldf's handling of missing value and factor
Dear Mr. Grothendieck : thank you so much for your attention. You are the real expert here. the following is a mock text file: a b c aa 23 aaa 34 77 note that both b and c column contain missing value (blank) I save it under my C drive and use both read.table and sqldf to import it to R and then use identical() function to compare the result. The following is the result: > setwd("c:/") > library(sqldf) > test <- file("test.txt") > testx <- sqldf("select * from test", + dbname = tempfile(), file.format = list(header = T, sep="\t", row.names = F)) > testy<- read.table("test.txt", header = T, sep="\t") > identical(testx, testy) [1] FALSE > testx abc 1 aa 23.0 2 aaa 34.6 0.0 3 77.8 > testy abc 1 aa NA 23.0 2 aaa 34.6 NA 3 NA 77.8 > class(testx$b) [1] "factor" > class(testy$b) [1] "numeric" > read.table seems to get it right while sqldf treats b as factor (if I add method="raw", b become character). what is more troubling is that column C has number 0 at the second row while in the original file it is missing. In my real world situation with a much larger text file, the problem is that many cells are empty when they all actually have values in the original text file. I would greatly appreciate your help if you can shed some light on this. thanks -- View this message in context: http://r.789695.n4.nabble.com/a-question-on-sqldf-s-handling-of-missing-value-and-factor-tp3331007p3331662.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] a question on sqldf's handling of missing value and factor
I am sorry for posting the wrong source file. the correct source file is as follows: a b c aa 23 aaa 34.6 77.8 They are tab delimited but somehow could not be displayed correctly in browser. -- View this message in context: http://r.789695.n4.nabble.com/a-question-on-sqldf-s-handling-of-missing-value-and-factor-tp3331007p3331667.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] a question on sqldf's handling of missing value and factor
On Tue, Mar 1, 2011 at 11:52 PM, xin wei wrote: > Dear subscribers: > > I am using the following code to read a large number of big text files: > library(sqldf) > tempd <- file() > tempdx <- sqldf("select * from tempd", dbname = tempfile(), file.format = > list(header = T, sep="\t", row.names = F)) > > The problem is: all my numberical variable become factor (maybe because > these columns all contain missing value). It would be quite cubersome to > convert them to numeric variable using as.numeric one by one. Does anyone > know how to re-set SQLDF so that it would automatically read the numeric > column with missing row as real numeric instead of factor? > If you can provide a minimal ***reproducible*** example it would help. Maybe sqldf(..., method = "raw") will give you what you want but I can't say for sure without the example. -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] a question on sqldf's handling of missing value and factor
Dear subscribers: I am using the following code to read a large number of big text files: library(sqldf) tempd <- file() tempdx <- sqldf("select * from tempd", dbname = tempfile(), file.format = list(header = T, sep="\t", row.names = F)) The problem is: all my numberical variable become factor (maybe because these columns all contain missing value). It would be quite cubersome to convert them to numeric variable using as.numeric one by one. Does anyone know how to re-set SQLDF so that it would automatically read the numeric column with missing row as real numeric instead of factor? many thanks -- View this message in context: http://r.789695.n4.nabble.com/a-question-on-sqldf-s-handling-of-missing-value-and-factor-tp3331007p3331007.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.