... I failed to correctly paste the first line of an example:

On Fri, Nov 18, 2011 at 10:44 AM, Bert Gunter <bgun...@gene.com> wrote:
> David:
>
> As you now realize, "\t" etc. is a perfectly legal single tab character.
>
> Now consider:
------------- left this out --------------
> gsub("\\","a","\\")
-----------------------------------------------
> Error in gsub("\\", "a", "\\") :
>   invalid regular expression '\', reason 'Trailing backslash'
>
> BUT
>
>> gsub("\\\\","a","\\")
> [1] "a"
>
> ???
>
> The issue is that there are two levels of escapes here: the R parser's
> and the reg expression engine's. The R parser turns "\\" into a single
> backslash character, as in the third argument of gsub above. In the
> first (incorrect) version, the first argument is that same "\\", so a
> single backslash is passed on to the reg expression engine, which sees
> a lone backslash that is meaningless to it. (For comparison, a
> backreference would be something like "\\2" = "backslash 2.")
>
> The second incantation's first argument is correct and is passed on to
> the reg expression engine as "backslash backslash," which it
> interprets as an escaped "\", i.e. a literal "\", per the
> documentation.
>
> So what about:
>
>> cat(z)
> ab cd>
>> cat(sub("\\\t","\n",z))
> ab
> cd>
>
> R passes "backslash tab_character" to the regexp engine, which also
> looks to me like an error; however, this may be one of those
> "implementation dependent" details mentioned in the Help file. It
> seems to me that the engine sees a meaningless escape sequence and
> simply discards the backslash, interpreting the character literally.
> As support for this, "\h" is not a meaningful escape sequence in R:
>
>> gsub("\\h","a","\h")
> Error: '\h' is an unrecognized escape in character string starting "\h"
>
> and
>
>> gsub("\\h","a","h")
> [1] "a"
>
> But I may be wrong, and I am hoping that this post will prompt someone
> more knowledgeable than I to respond (if only to confirm my
> "explanation" if it's correct).
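The two levels of escaping described above can be checked directly. This is a minimal sketch (the variable names `one_bs`, `two_bs`, `res` and `z` are illustrative, not from the thread): `nchar()` shows what the R parser actually stored, `tryCatch()` captures the regex compilation error, and `fixed = TRUE` shows what happens when the regex layer is skipped altogether. The final backslash-plus-TAB line repeats the pattern discussed in this thread; its success depends on the engine treating an unneeded escape as a literal character, so no output is claimed for it.

```r
# Layer 1: the R parser turns escape sequences in string literals into characters.
one_bs <- "\\"          # parser stores ONE backslash character
two_bs <- "\\\\"        # parser stores TWO backslash characters
nchar(one_bs)           # returns 1
nchar(two_bs)           # returns 2

# Layer 2: the regex engine interprets the string it receives.
# It receives two_bs as \\ (an escaped backslash), matching one literal backslash.
gsub(two_bs, "a", one_bs)            # returns "a"

# It receives one_bs as a lone trailing \, so pattern compilation fails.
# suppressWarnings() hides any accompanying warning; tryCatch() catches the error.
res <- tryCatch(suppressWarnings(gsub(one_bs, "a", one_bs)),
                error = function(e) "error")
res                                  # returns "error"

# fixed = TRUE skips layer 2, so a single (parsed) backslash is enough.
gsub("\\", "a", "\\", fixed = TRUE)  # returns "a"

# Tab patterns: "\t" is already a real TAB after parsing; "\\t" hands the
# regex engine the escape \t, which ?regex documents as TAB.
z <- "ab\tcd"
sub("\t", "\n", z)                   # returns "ab\ncd"
sub("\\t", "\n", z)                  # returns "ab\ncd"

# "\\\t" hands the engine a backslash followed by a real TAB; in this thread
# it worked because the engine treated the unneeded escape as a literal TAB.
sub("\\\t", "\n", z)
```

In short: count one layer of backslash-doubling for the parser and another for the regex engine, or use `fixed = TRUE` when the pattern is a plain literal string.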
>
> Cheers,
> Bert
>
>
> On Fri, Nov 18, 2011 at 7:26 AM, David Winsemius <dwinsem...@comcast.net>
> wrote:
>>
>> On Nov 18, 2011, at 9:28 AM, jim holtman wrote:
>>
>>> It is pretty straightforward in R:
>>>
>>>> x <-
>>>> readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
>>>> closeAllConnections()
>>>> # convert tabs to newlines
>>>> x <- gsub("\t", "\n", x)
>>
>> Did the rules get liberalized for escaping patterns? Or have I been
>> unnecessarily expending backslashes all these years? I thought that one
>> needed 3 backslashes. This code does work and I am wondering if/when I
>> "didn't get the memo". (I do see that there is a line early in the ?regex
>> page that suggests I have been deluded all along.)
>>
>> "The current implementation interprets \a as BEL, \e as ESC, \f as FF, \n as
>> LF, \r as CR and \t as TAB."
>>
>>> x <-
>>> readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
>>> closeAllConnections()
>>> # convert tabs to newlines
>>> x2 <- gsub("\\\t", "\n", x)
>>> x2
>> [1] "sadf|asdf|asdf\nqwer|qwer|qwer\nzxcv|zxcv|zxfcgv"
>>
>> So I guess my question is (now): why does the triple-backslash technique
>> even work?
>>
>> --
>> David.
>>
>>
>>
>>>> # write out to a temp file and then read in as a data frame
>>>> myFile <- tempfile()
>>>> writeLines(x, con = myFile)
>>>> x.df <- read.table(myFile, sep = "|")
>>>>
>>>>
>>>> x.df
>>>
>>>     V1   V2     V3
>>> 1 sadf asdf   asdf
>>> 2 qwer qwer   qwer
>>> 3 zxcv zxcv zxfcgv
>>>>
>>>
>>> On Fri, Nov 18, 2011 at 9:13 AM, Langston, Jim
>>> <jim.langs...@compuware.com> wrote:
>>>>
>>>> Thanks Paul,
>>>>
>>>> That's the path I was marching down; I was hoping for something
>>>> a little cleaner. I do the same with Perl or Java.
>>>>
>>>> Jim
>>>>
>>>> On 11/18/11 8:35 AM, "Paul Hiemstra" <paul.hiems...@knmi.nl> wrote:
>>>>
>>>>> Hi Jim,
>>>>>
>>>>> You can read the text file using readLines. This puts each line in the
>>>>> file into an element of a character vector.
>>>>> Then you can go through the lines manually (e.g. using grep, sub,
>>>>> strsplit) and create your data.frame.
>>>>>
>>>>> cheers,
>>>>> Paul
>>>>>
>>>>> On 11/18/2011 12:37 PM, Langston, Jim wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I've been scratching and poking, but basically, the file I need to read
>>>>>> has two delimiters that I need to contend with. The first is that the
>>>>>> file contains tabs (\t) instead of newlines (\n), and the second is that
>>>>>> the fields have | for the separators. I can easily do a read if I first
>>>>>> convert the \t to \n and then use read.table to get the file read with
>>>>>> the | separator. But what I would really like to do is do this all
>>>>>> within R. I have a lot of files to read and do analysis on.
>>>>>>
>>>>>> I can read the data into a table using \t as the delimiter, but can't
>>>>>> figure out how to take that table data and use the | for separation.
>>>>>> I've looked at string splits, etc., but haven't figured out how to split
>>>>>> the whole table.
>>>>>>
>>>>>> Any thoughts? Hints?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jim
>>>>>>
>>>>>>
>>>>
>>>> The contents of this e-mail are intended for the named addressee only. It
>>>> contains information that may be confidential. Unless you are the named
>>>> addressee or an authorized designee, you may not copy or use it, or
>>>> disclose it to anyone else. If you received it in error please notify us
>>>> immediately and then destroy it.
>>>>
>>>>>> R-help@r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>>
>>>>> --
>>>>> Paul Hiemstra, Ph.D.
>>>>> Global Climate Division
>>>>> Royal Netherlands Meteorological Institute (KNMI)
>>>>> Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
>>>>> P.O. Box 201 | 3730 AE | De Bilt
>>>>> tel: +31 30 2206 494
>>>>>
>>>>> http://intamap.geo.uu.nl/~paul
>>>>> http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
>>>>>
>>>
>>> --
>>> Jim Holtman
>>> Data Munger Guru
>>>
>>> What is the problem that you are trying to solve?
>>> Tell me what you want to do, not how you want to do it.
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>
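For Jim's original question (records separated by TABs, fields separated by |), here is a minimal sketch that stays entirely within R. It assumes the file contents are already in one string, `raw_text` (a hypothetical name; the sample data is Jim Holtman's), and it swaps the temporary file from his reply for `read.table(text = ...)`. A second variant does the splitting by hand with `strsplit()`, along the lines Paul suggested, and assumes every record has the same number of fields.

```r
# Sample contents: records separated by TAB, fields separated by "|".
# For a real file you might build this string with, e.g.,
#   raw_text <- paste(readLines("data.txt"), collapse = "\t")
# ("data.txt" is a hypothetical file name).
raw_text <- "sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"

# Step 1: split the single string into records at each TAB.
records <- strsplit(raw_text, "\t", fixed = TRUE)[[1]]

# Variant A: let read.table() parse the "|" fields straight from the
# character vector via its text argument, so no temporary file is needed.
df1 <- read.table(text = records, sep = "|", stringsAsFactors = FALSE)

# Variant B: split the fields by hand. fixed = TRUE matters here: as a
# regular expression, "|" is the alternation operator, not a literal pipe.
fields <- strsplit(records, "|", fixed = TRUE)   # list of character vectors
df2 <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)

print(df1)
```

Variant A also applies read.table's usual type conversion (numeric-looking fields become numbers), while variant B leaves every column as character; for this all-text sample the two data frames hold the same values.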