Re: [R] regular expression in gsub() for strings with leading backslash
On 29/04/2011 7:41 PM, Miao wrote:

> Hello,
>
> Can anyone help with gsub() in R? I have a string like the one below, and I want to delete all the substrings with a leading backslash, including \xa0On, \023, \xab, and many others. How should I write the regular expression pattern for gsub()? I don't care how many characters follow the backslash.

If those are R strings, none of them contains a backslash. In R, a backslash would always be printed as \\. \x is the introduction to a hexadecimal encoding for a character; the next two characters are the hex digits. So your first string contains a single character \xa0, the third one contains \xab, and so on. The \023 is an octal encoding for a single character.

Duncan Murdoch

> txt <- "Is This Thing\xa0On? http://bit.ly/jAbKem wait \023 for people \xab and be patient :"
>
> Thanks in advance,
> Miao

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
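A minimal sketch of the point about escapes, assuming the string from the original post is entered as an R literal. The variable names other than `txt` are illustrative only. Inspecting the raw bytes suggests that each escape denotes a single byte and that no backslash byte (0x5c) is present in the string:

```r
# The string from the original post, entered as an R literal.
txt <- "Is This Thing\xa0On? http://bit.ly/jAbKem wait \023 for people \xab and be patient :"

# Each escape sequence denotes exactly one byte.
hex_byte   <- charToRaw("\xa0")   # hexadecimal escape, byte value 0xa0
octal_byte <- charToRaw("\023")   # octal escape, octal 23 = decimal 19

# A literal backslash is written "\\" in source and is the single byte 0x5c.
backslash <- charToRaw("\\")

# Check whether any byte of txt is a backslash.
has_backslash <- any(as.integer(charToRaw(txt)) == as.integer(backslash))
print(has_backslash)
```

Working on bytes with charToRaw() sidesteps the question of whether the string is valid in the session's encoding, which is why it is used here rather than nchar().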
Re: [R] regular expression in gsub() for strings with leading backslash
Thanks Duncan for clarifying this. I'm pretty much a newbie with this type of character and with special characters. In R's gsub(), what regular expression should I use to handle all these situations?

-- 
proceed everyday
Re: [R] regular expression in gsub() for strings with leading backslash
On 29/04/2011 9:34 PM, Miao wrote:

> Thanks Duncan for clarifying this. I'm pretty much a newbie with this type of character and with special characters. In R's gsub(), what regular expression should I use to handle all these situations?

I don't know. This might work:

gsub("[\x01-\x1f\x7f-\xff]", "", x)

(i.e. the range from character 1 to character 31, and 127 to 255) but I don't know if our regular expression matcher will accept those characters.

Duncan Murdoch
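Whether a pattern containing bytes above \x7f is accepted may depend on the session's locale, since such bytes are not valid on their own in UTF-8. A hedged alternative sketch, assuming the goal is to keep only printable ASCII: negate the printable range (space to tilde) and match byte-wise with perl = TRUE and useBytes = TRUE. The name `clean` is illustrative, and this is one possible approach rather than the one proposed in the thread.

```r
# The string from the original post, entered as an R literal.
txt <- "Is This Thing\xa0On? http://bit.ly/jAbKem wait \023 for people \xab and be patient :"

# "[^ -~]" matches any byte outside the printable ASCII range 0x20-0x7e.
# useBytes = TRUE makes the match byte-wise, so bytes that are invalid in
# the current encoding do not stop the match.
clean <- gsub("[^ -~]", "", txt, perl = TRUE, useBytes = TRUE)
print(clean)
```

This removes the same byte values as the ranges 1 to 31 and 127 to 255. Note that tabs and newlines fall inside those ranges and are removed as well.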
Re: [R] regular expression in gsub() for strings with leading backslash
On Fri, 29 Apr 2011, Duncan Murdoch wrote:

> On 29/04/2011 7:41 PM, Miao wrote:
>
>> Can anyone help with gsub() in R? I have a string like the one below, and I want to delete all the substrings with a leading backslash, including \xa0On, \023, \xab, and many others. How should I write the regular expression pattern for gsub()? I don't care how many characters follow the backslash.
>
> If those are R strings, none of them contains a backslash. In R, a backslash would always be printed as \\. \x is the introduction to a hexadecimal encoding for a character; the next two characters are the hex digits. So your first string contains a single character \xa0, the third one contains \xab, and so on. The \023 is an octal encoding for a single character.

If we were dealing with a leading backslash, I guess this would do it:

    gsub("^\\\\.*", "", txt)

R would display a double backslash, but I believe that represents a single backslash. So if the string were saved using write.table, say, only a single backslash would be stored.

    > a <- "\\This is a string."
    > a
    [1] "\\This is a string."
    > gsub("^\\\\", "", a)
    [1] "This is a string."
    > a
    [1] "\\This is a string."
    > gsub("^\\\\.*", "", a)
    [1] ""
    > gsub("^\\\\.*", "", c(a, "Another string", "\\more"))
    [1] ""               "Another string" ""
    > write.table(a, file="a.txt", quote=F, row.names=F, col.names=F)

    $ cat a.txt
    \This is a string.

Apparently this is not what the OP really wanted. The OP probably wanted to remove characters that were not from the regular ASCII set.

Mike

-- 
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota
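The literal-backslash case can be sketched as follows. The example string is the one used in the message; the variable names are illustrative. The sketch contrasts a regular-expression pattern, where the backslash must be escaped twice, with a fixed = TRUE pattern, where it is escaped only once for the R parser:

```r
# "\\" in R source is a single backslash character.
a <- "\\This is a string."

# As a regular expression, the backslash must itself be escaped,
# which gives four backslashes in R source.
stripped_regex <- sub("^\\\\", "", a)

# With fixed = TRUE the pattern is a literal string, so two suffice.
# This removes the first backslash wherever it occurs, not only a leading one.
stripped_fixed <- sub("\\", "", a, fixed = TRUE)

# Blank out every element that starts with a backslash.
blanked <- gsub("^\\\\.*", "", c(a, "Another string", "\\more"))

# writeLines() shows the actual characters, without print()'s escaping.
writeLines(a)
```

Using writeLines() or cat() is a quick way to see what a string really contains without writing it to a file.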
Re: [R] regular expression in gsub() for strings with leading backslash
That works like a charm! Thanks so much, Duncan.

On Fri, Apr 29, 2011 at 6:37 PM, Duncan Murdoch murdoch.dun...@gmail.com wrote:

> I don't know. This might work:
>
> gsub("[\x01-\x1f\x7f-\xff]", "", x)
>
> (i.e. the range from character 1 to character 31, and 127 to 255) but I don't know if our regular expression matcher will accept those characters.
>
> Duncan Murdoch

-- 
proceed everyday