On Mar 4, 2010, at 10:58 PM, Duncan Murdoch wrote:
> On 04/03/2010 10:32 PM, David Winsemius wrote:
>> On Mar 4, 2010, at 9:47 PM, jonas garcia wrote:
>>> When I opened the file with a hex-editor, the problematic character turned out to be “1a”. I am attaching a sample DAT file with 3 lines (the second line is the one with the undesirable character).
>>
>> I got a different "interpretation" of that character when I let R look at it. And I cannot figure out why \032 should be causing problems???
>>
>>> The furthest I could get was through readBin:
>>>
>>> tmp<- readBin("new.dat", what = "raw", n=100000000)
>>> [1] 30 32 3a 33 35 3a 33 32 2c 20 34 34 30 33 2c 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36 2c 31
>>> [33] 35 35 2e 39 2c 30 30 2e 37 36 2c 31 31 35 36 0d 0a 30 32 3a 33 35 3a 33 35 2c 20 34 34 33 32 2c
>>> [65] 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36 2c 31 35 35 2e 38 2c 1a 30 2e 38 31 2c 31 31 35 37
>>> [97] 0d 0a 30 32 3a 33 35 3a 33 39 2c 20 34 34 36 37 2c 20 33 37 2e 31 31 34 2c 2d 32 30 2e 38 33 36
>>> [129] 2c 31 35 35 2e 38 2c 30 30 2e 38 31 2c 31 31 35 38
>>> tmp[87]
>>> [1] 1a
>
> Hex 1a and octal 032 both correspond to Ctrl-Z, which is the MSDOS EOF marker. I forget whether R's text reading routines pay attention to that, or whether it's the C runtime, but it makes sense that it would cause problems on Windows.
>
> Duncan Murdoch
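A minimal sketch of finding the offending byte programmatically instead of scanning the readBin dump by eye, assuming the whole file fits in memory. `find_ctrl_z` is a hypothetical helper, and the demo file is rebuilt from the three records quoted in this thread (CRLF line endings and no final newline, matching the byte dump) rather than taken from the actual attachment.

```r
# Hypothetical helper (not from the thread): 1-based byte offsets of every
# Ctrl-Z (0x1A) byte in a file. readBin() on a file name opens a binary-mode
# connection, so the byte comes back like any other byte.
find_ctrl_z <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  which(bytes == as.raw(0x1a))
}

# Demo file rebuilt from the three records in this thread (assumption: CRLF
# line endings and no final newline, as in the byte dump).
demo_lines <- c("02:35:32, 4403, 37.114,-20.836,155.9,00.76,1156",
                "02:35:35, 4432, 37.114,-20.836,155.8,\0320.81,1157",
                "02:35:39, 4467, 37.114,-20.836,155.8,00.81,1158")
path <- tempfile(fileext = ".dat")
writeBin(charToRaw(paste(demo_lines, collapse = "\r\n")), path)

find_ctrl_z(path)   # 87, the same position as tmp[87]
```

On a real file you would call `find_ctrl_z("new.dat")` directly; an empty result means no Ctrl-Z bytes are present.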
Thanks. I was interpreting \032 as decimal, so couldn't figure out why it should equal 0x1A. You've explained the basis (or base) of my confusion.
-- David
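A quick check of the base arithmetic behind the confusion: in an R string, `\032` is an octal escape, so it denotes decimal 26, which is also hex 1a and the ASCII code for Ctrl-Z. This only uses base R functions; the "letter code minus 64" rule is the standard ASCII convention for control characters.

```r
# "\032" is an octal escape: 3 * 8 + 2 = 26 in decimal.
strtoi("032", base = 8L)                          # 26
# Hex 1a is 1 * 16 + 10 = 26 as well.
strtoi("1a", base = 16L)                          # 26
# Ctrl-<letter> is the letter's ASCII code minus 64; for Z that is 90 - 64.
utf8ToInt("Z") - 64L                              # 26
# So the octal escape, the hex escape and raw byte 0x1a are the same byte.
identical(charToRaw("\032"), as.raw(0x1a))        # TRUE
identical(charToRaw("\x1a"), charToRaw("\032"))   # TRUE
```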
> tmporg <- readLines(con="/Users/davidwinsemius/Library/Mail Downloads/new.dat")
Warning message:
In readLines(con = "/Users/davidwinsemius/Library/Mail Downloads/new.dat") :
  incomplete final line found on '/Users/davidwinsemius/Library/Mail Downloads/new.dat'
> tmporg
[1] "02:35:32, 4403, 37.114,-20.836,155.9,00.76,1156"
[2] "02:35:35, 4432, 37.114,-20.836,155.8,\0320.81,1157"
[3] "02:35:39, 4467, 37.114,-20.836,155.8,00.81,1158"
> gsub("\\\032", ' ', tmporg)
[1] "02:35:32, 4403, 37.114,-20.836,155.9,00.76,1156" "02:35:35, 4432, 37.114,-20.836,155.8, 0.81,1157"
[3] "02:35:39, 4467, 37.114,-20.836,155.8,00.81,1158"
> read.table(textConnection(gsub("\\\032", ' ', tmporg) ) ,sep=",")
        V1   V2     V3      V4    V5   V6   V7
1 02:35:32 4403 37.114 -20.836 155.9 0.76 1156
2 02:35:35 4432 37.114 -20.836 155.8 0.81 1157
3 02:35:39 4467 37.114 -20.836 155.8 0.81 1158

Looks like gsub might work well .... as long as you can get agreement on what the character really is.

> sessionInfo()
R version 2.10.1 RC (2009-12-09 r50695)
x86_64-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Design_2.3-0    Hmisc_3.7-0     survival_2.35-7

loaded via a namespace (and not attached):
[1] cluster_1.12.1  grid_2.10.1     lattice_0.17-26 tools_2.10.1
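An alternative sketch, assuming you would rather never let a text-mode reader see the Ctrl-Z at all: read the file as raw bytes, swap every 0x1A byte before any line splitting, then hand the cleaned lines to read.table through a textConnection as in the session above. `read_without_ctrl_z` is hypothetical. Its default replacement of a space mirrors the gsub call above, but whether the stray byte should become a space, a "0", or be dropped is a judgment about the data that this sketch does not make. The demo file is rebuilt from the records in this thread, not the real attachment.

```r
# Hypothetical helper (not from the thread): read a comma-separated file after
# replacing every Ctrl-Z (0x1A) byte. The file is read as raw bytes through a
# binary-mode connection, so text-mode EOF handling never sees the byte.
read_without_ctrl_z <- function(path, replacement = " ", ...) {
  repl <- charToRaw(replacement)
  stopifnot(length(repl) == 1L)  # keep the swap byte-for-byte
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  bytes[bytes == as.raw(0x1a)] <- repl
  # Split into lines, accepting CRLF or LF endings and a missing final newline.
  lines <- strsplit(rawToChar(bytes), "\r?\n")[[1]]
  con <- textConnection(lines)
  on.exit(close(con))
  read.table(con, sep = ",", ...)
}

# Demo file mimicking the attached sample (assumption: rebuilt from the thread).
demo_lines <- c("02:35:32, 4403, 37.114,-20.836,155.9,00.76,1156",
                "02:35:35, 4432, 37.114,-20.836,155.8,\0320.81,1157",
                "02:35:39, 4467, 37.114,-20.836,155.8,00.81,1158")
path <- tempfile(fileext = ".dat")
writeBin(charToRaw(paste(demo_lines, collapse = "\r\n")), path)

df <- read_without_ctrl_z(path)
print(df)
```

On the demo file this should give the same three rows as the read.table call above; extra arguments such as `col.names` pass straight through to read.table.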
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help