... and yet another line I left out below! I apologize for this baloney!

On Fri, Nov 18, 2011 at 10:48 AM, Bert Gunter <bgun...@gene.com> wrote:
> ... I failed to correctly paste the first line of an example:
>
> On Fri, Nov 18, 2011 at 10:44 AM, Bert Gunter <bgun...@gene.com> wrote:
>> David:
>>
>> As you now realize "\t" etc. is a perfectly legal single tab character.
>>
>> Now consider:
> ------------- left this out --------------
>> gsub("\\","a","\\")
> -----------------------------------------------
>> Error in gsub("\\", "a", "\\") :
>> invalid regular expression '\', reason 'Trailing backslash'
>>
>> BUT
>>
>>> gsub("\\\\","a","\\")
>> [1] "a"
>>
>> ???
>>
>> The issue is there are two levels of escapes here -- the R parser's
>> and the reg expression's. The R parser recognizes "\\" as a single
>> backslash character in the third argument of gsub above. In the first
>> incorrect version, this single backslash is passed on to the reg
>> expression engine and it sees a single backslash, which is meaningless
>> to it. For example, a backreference would be something like "\\2" =
>> "backslash 2."
>>
>> The second incantation's first argument is correct and is passed on to
>> the reg expression engine as "backslash backslash," which it
>> interprets as an escaped "\", which is a literal "\", per the
>> documentation.
>>
>> So what about:
>>
--------------- also left this out ---------------
z <- "ab\tcd"
-----------------------------------------------
>>> cat(z)
>> ab cd>
>>> cat(sub("\\\t","\n",z))
>> ab
>> cd>
>>
>> R passes "backslash tab_character" to the regexp engine, which also
>> looks to me like an error; however, this may be one of those
>> "implementation dependent" details mentioned in the Help file. It
>> seems to me that the engine sees a meaningless escape sequence and
>> just throws away the escape to interpret the character literally. As
>> support for this, "\h" is not a meaningful escape sequence in R:
>>
>>> gsub("\\h","a","\h")
>> Error: '\h' is an unrecognized escape in character string starting "\h"
>>
>> and
>>
>>> gsub("\\h","a","h")
>> [1] "a"
>>
>> But I may be wrong, and I am hoping that this post will prompt someone
>> more knowledgeable than I to respond (if only just to confirm my
>> "explanation" if it's correct).
>>
>> Cheers,
>> Bert
>>
>> On Fri, Nov 18, 2011 at 7:26 AM, David Winsemius <dwinsem...@comcast.net>
>> wrote:
>>>
>>> On Nov 18, 2011, at 9:28 AM, jim holtman wrote:
>>>
>>>> It is pretty straightforward in R:
>>>>
>>>>> x <-
>>>>> readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
>>>>> closeAllConnections()
>>>>> # convert tabs to newlines
>>>>> x <- gsub("\t", "\n", x)
>>>
>>> Did the rules get liberalized for escaping patterns? Or have I been
>>> unnecessarily expending backslashes all these years? I thought that one
>>> needed 3 backslashes. This code does work and I am wondering if/when I
>>> "didn't get the memo". (I do see that there is a line early in the ?regex
>>> page that suggests I have been deluded all along.)
>>>
>>> "The current implementation interprets \a as BEL, \e as ESC, \f as FF, \n as
>>> LF, \r as CR and \t as TAB."
>>>
>>>> x <-
>>>> readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
>>>> closeAllConnections()
>>>> # convert tabs to newlines
>>>> x2 <- gsub("\\\t", "\n", x)
>>>> x2
>>> [1] "sadf|asdf|asdf\nqwer|qwer|qwer\nzxcv|zxcv|zxfcgv"
>>>
>>> So I guess my question is (now) why the triple-slash technique even works?
>>>
>>> --
>>> David.
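One way to see the two escape levels discussed above is to look at the characters the R parser actually hands to the regex engine. What follows is only a minimal sketch: the variable names (p_tab, p_triple, p_double) are made up for illustration, and the comment about the backslash-plus-tab pattern simply repeats what was reported in this thread rather than documented behaviour.

```r
# Level 1: the R parser converts escape sequences in string literals
# into single characters before any function sees them.
p_tab    <- "\t"     # parser yields one character: TAB
p_triple <- "\\\t"   # parser yields two characters: backslash, TAB
p_double <- "\\\\"   # parser yields two characters: backslash, backslash

# utf8ToInt() shows the code points of those characters
# (92 is the backslash, 9 is the tab).
print(utf8ToInt(p_tab))
print(utf8ToInt(p_triple))
print(utf8ToInt(p_double))

# Level 2: the regex engine interprets whatever characters it receives.
z <- "ab\tcd"

# A literal tab character in the pattern matches a tab.
res_tab <- gsub(p_tab, "\n", z)

# Backslash followed by a tab: in the thread this also matched the tab,
# apparently because the engine drops the meaningless escape
# (treat that as implementation dependent).
res_triple <- gsub(p_triple, "\n", z)

# Two backslashes reach the engine, which reads them as one literal backslash.
res_double <- gsub(p_double, "a", "\\")

# fixed = TRUE bypasses the regex level altogether, so a single
# backslash is a valid (literal) pattern here.
res_fixed <- gsub("\\", "a", "\\", fixed = TRUE)

print(c(res_double, res_fixed))
```

If the aim is just to replace a literal character, fixed = TRUE avoids the second escape level entirely.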
>>>
>>>>> # write out to a temp file and then read in as a data frame
>>>>> myFile <- tempfile()
>>>>> writeLines(x, con = myFile)
>>>>> x.df <- read.table(myFile, sep = "|")
>>>>>
>>>>> x.df
>>>>
>>>>     V1   V2     V3
>>>> 1 sadf asdf   asdf
>>>> 2 qwer qwer   qwer
>>>> 3 zxcv zxcv zxfcgv
>>>>
>>>> On Fri, Nov 18, 2011 at 9:13 AM, Langston, Jim
>>>> <jim.langs...@compuware.com> wrote:
>>>>>
>>>>> Thanks Paul,
>>>>>
>>>>> That's the path I was marching down. I was hoping for something
>>>>> a little cleaner; I do the same with Perl or Java.
>>>>>
>>>>> Jim
>>>>>
>>>>> On 11/18/11 8:35 AM, "Paul Hiemstra" <paul.hiems...@knmi.nl> wrote:
>>>>>
>>>>>> Hi Jim,
>>>>>>
>>>>>> You can read the text file using readLines. This puts each line in the
>>>>>> file into an element of a character vector. Then you can go through the
>>>>>> lines manually (e.g. using grep, sub, strsplit) and create your
>>>>>> data.frame.
>>>>>>
>>>>>> cheers,
>>>>>> Paul
>>>>>>
>>>>>> On 11/18/2011 12:37 PM, Langston, Jim wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I've been scratching and poking, but basically, the file I need to read
>>>>>>> has two delimiters that I need to contend with. The first is that the
>>>>>>> file contains tabs (\t) instead of newlines (\n), and the second is
>>>>>>> that the fields have | for the separators. I can easily do a read if I
>>>>>>> first convert the \t to \n and then use read.table to get the file read
>>>>>>> with the | separator. But what I would really like to do is do this all
>>>>>>> within R. I have a lot of files to read and do analysis on.
>>>>>>>
>>>>>>> I can read the data into a table using the \t as delimiter, but can't
>>>>>>> figure out how to take that table data and use the | for separation.
>>>>>>> I've looked at string splits, etc., but haven't figured out how to
>>>>>>> split the whole table.
>>>>>>>
>>>>>>> Any thoughts? Hints?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Jim
>>>>>>>
>>>>> The contents of this e-mail are intended for the named addressee only. It
>>>>> contains information that may be confidential. Unless you are the named
>>>>> addressee or an authorized designee, you may not copy or use it, or
>>>>> disclose it to anyone else. If you received it in error please notify us
>>>>> immediately and then destroy it.
>>>>>
>>>>>>> R-help@r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>> --
>>>>>> Paul Hiemstra, Ph.D.
>>>>>> Global Climate Division
>>>>>> Royal Netherlands Meteorological Institute (KNMI)
>>>>>> Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
>>>>>> P.O. Box 201 | 3730 AE | De Bilt
>>>>>> tel: +31 30 2206 494
>>>>>>
>>>>>> http://intamap.geo.uu.nl/~paul
>>>>>> http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
>>>>
>>>> --
>>>> Jim Holtman
>>>> Data Munger Guru
>>>>
>>>> What is the problem that you are trying to solve?
>>>> Tell me what you want to do, not how you want to do it.
>>>
>>> David Winsemius, MD
>>> West Hartford, CT
>>>
--

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
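
For the question that started this thread (records separated by tabs, fields separated by |), the convert-then-read approach can also be done without a temporary copy of the data. The following is a rough sketch, not anyone's posted solution: read_tab_records is a hypothetical helper name, and it assumes an R version in which read.table() accepts a text argument.

```r
# Hypothetical helper: read a file whose records are separated by tabs
# and whose fields are separated by "|".
read_tab_records <- function(path, field_sep = "|") {
  lines <- readLines(path, warn = FALSE)
  # Split each physical line at tab characters; fixed = TRUE means the
  # tab is matched literally, so no regex escaping is involved.
  records <- unlist(strsplit(lines, "\t", fixed = TRUE))
  # Drop empty records, e.g. those produced by consecutive tabs.
  records <- records[nzchar(records)]
  # Each element of 'records' is parsed as one row.
  read.table(text = records, sep = field_sep, stringsAsFactors = FALSE)
}

# Usage with the sample data from this thread, written to a temp file
# only so that the example is self-contained.
tmp <- tempfile()
writeLines("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv", tmp)
x.df <- read_tab_records(tmp)
print(x.df)
unlink(tmp)
```

With many files, something like lapply(file_names, read_tab_records) would apply the same steps to each one, where file_names stands for a character vector of paths that you supply.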