On Fri, Nov 18, 2011 at 10:26 AM, David Winsemius <dwinsem...@comcast.net> wrote:
>
> On Nov 18, 2011, at 9:28 AM, jim holtman wrote:
>
>> It is pretty straightforward in R:
>>
>>> x <-
>>> readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
>>> closeAllConnections()
>>> # convert tabs to newlines
>>> x <- gsub("\t", "\n", x)
>
> Did the rules get liberalized for escaping patterns? Or have I been
> unnecessarily expending backslashes all these years? I thought that one
> needed 3 backslashes. This code does work and I am wondering if/when I
> "didn't get the memo". (I do see that there is a line early in the ?regex
> page that suggests I have been deluded all along.)
>
> "The current implementation interprets \a as BEL, \e as ESC, \f as FF, \n as
> LF, \r as CR and \t as TAB."
>
>> x <-
>> readLines(textConnection("sadf|asdf|asdf\tqwer|qwer|qwer\tzxcv|zxcv|zxfcgv"))
>> closeAllConnections()
>> # convert tabs to newlines
>> x2 <- gsub("\\\t", "\n", x)
>> x2
> [1] "sadf|asdf|asdf\nqwer|qwer|qwer\nzxcv|zxcv|zxfcgv"
>
> So I guess my question is (now) why the triple-slash technique even works?
>
There are two levels of parsing. First, R converts the literal to a character string, and in that parse "\\\t" becomes a backslash character followed by a tab character (2 characters). Second, the regular expression parser interprets those two characters as a tab. For example, consider these:

> gsub("\\\t", "x", "\\\t,\t") # 1
[1] "\\x,x"
> gsub("\\\t", "x", "\\\t,\t", fixed = TRUE) # 2
[1] "x,\t"

The first arg in 1 is processed into backslash tab by R, and then the regular expression parser processes that into just tab. The third argument in 1, however, is processed by R into backslash tab comma tab and is not processed further, since it is not regarded as a regular expression. Both tabs in it are therefore replaced while the backslash is left alone, which gives the result shown.

In contrast, the first arg in 2 is processed into backslash tab by R as before, but now it is not regarded as a regular expression, so the second level of interpretation that occurred in 1 is not performed. Only occurrences of backslash tab get replaced, rather than occurrences of tab.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
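P.S. The first level of parsing can be checked directly by looking at the characters R stores for each literal before any regular expression engine sees them. The following is only a small sketch; the names p1, p2, p3, x, as_regex and as_fixed are made up for illustration and do not come from the code earlier in the thread.

```r
# What R stores after parsing each string literal
p1 <- "\t"     # one character: tab
p2 <- "\\t"    # two characters: backslash, letter t
p3 <- "\\\t"   # two characters: backslash, tab

nchar(c(p1, p2, p3))   # 1 2 2
utf8ToInt(p1)          # 9
utf8ToInt(p2)          # 92 116
utf8ToInt(p3)          # 92 9

x <- "a\tb"

# Second level: as regular expressions, all three patterns match a tab
as_regex <- c(gsub(p1, "-", x), gsub(p2, "-", x), gsub(p3, "-", x))
as_regex               # "a-b" "a-b" "a-b"

# With fixed = TRUE the second level is skipped, so only the
# one-character literal tab is found in x
as_fixed <- c(gsub(p1, "-", x, fixed = TRUE),
              gsub(p2, "-", x, fixed = TRUE),
              gsub(p3, "-", x, fixed = TRUE))
as_fixed               # "a-b" "a\tb" "a\tb"
```

So one, two or three backslashes all end up matching a tab in a regular expression, but they reach the regular expression parser as different character sequences, and the difference shows up as soon as fixed = TRUE is used.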