Hi Thomas,

On Thu, May 29, 2014 at 9:15 AM, Thomas Stewart <tgs.public.m...@gmail.com> wrote:
> Thanks to Ista Zahn, I was able to find a work around solution. The key
> seems to be that string1 needs to be encoded as UTF-8 prior to being passed
> to gsub. For whatever reason,
>
> Encoding(string1) <- "UTF-8"
>
> does not change the encoding on my Windows machine.

Right, because "ASCII strings will never be marked with a declared
encoding" (read ?Encoding again).

> The work around: I
> paste an obvious UTF-8 character "\u00A0" to the start of the string, send
> the string through gsub, then remove the "\u00A0" character from the output.
>
> string1 <- "\u00A0text X"; string1
> Encoding(string1)
> new_string1 <- gsub("X","\u2265",string1); new_string1
> new_string2 <- substring(new_string1,2); new_string2
>
> If you know of a less hackish way to accomplish this, I'm interested to
> hear it.

Why not just set the encoding after the fact, as I suggested?

string1 <- "X"; string1
new_string1 <- gsub("X","\u2265",string1); new_string1
Encoding(new_string1) <- "UTF-8"; new_string1

Best,
Ista

> However, this work around is sufficient for now.
>
> Thanks,
> -tgs
>
>
> On Wed, May 28, 2014 at 10:25 PM, Thomas Stewart <tgs.public.m...@gmail.com>
> wrote:
>
>> Can anyone help me understand the following behavior?
>>
>> I want to replace the letter 'X' in the string 'text X' with '≥' (\u2265).
>> The output from gsub is not what I expect. It gives: "text ≥".
>>
>> Now, suppose I want to replace the character '≤' in the string 'text ≤'
>> with '≥'. Then, gsub gives the expected, desired output.
>>
>> What am I missing?
>>
>> Thanks for any insight.
>> -tgs
>>
>> Minimal Working Example:
>>
>> string1 <- "text X"; string1
>> new_string1 <- gsub("X","\u2265",string1); new_string1
>>
>> string2 <- "text \u2264"; string2
>> new_string2 <- gsub("\u2264","\u2265",string2); new_string2
>>
>> charToRaw(new_string1)
>> charToRaw(new_string2)
>>
>> sessionInfo()
>>
>> ## OUTPUT
>>
>> > string1 <- "text X"; string1
>> [1] "text X"
>>
>> > new_string1 <- gsub("X","\u2265",string1); new_string1
>> [1] "text ≥"
>>
>> > string2 <- "text \u2264"; string2
>> [1] "text ≤"
>>
>> > new_string2 <- gsub("\u2264","\u2265",string2); new_string2
>> [1] "text ≥"
>>
>> > charToRaw(new_string1)
>> [1] 74 65 78 74 20 e2 89 a5
>>
>> > charToRaw(new_string2)
>> [1] 74 65 78 74 20 e2 89 a5
>>
>> > sessionInfo()
>> R version 3.0.2 (2013-09-25)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
>> [3] LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C                           LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> loaded via a namespace (and not attached):
>> [1] tools_3.0.2
>>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
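
For reference, the two points made in the reply above (an ASCII string cannot be marked as UTF-8, and the encoding can be declared on the result after gsub) can be checked with a small base-R sketch. The names enc_before, enc_after and bytes are introduced here only for illustration and do not appear in the original messages. How the string prints will still depend on the platform and locale.

```r
# ASCII-only input: Encoding<- has no effect, so the mark stays "unknown"
# (see ?Encoding: ASCII strings are never marked with a declared encoding).
string1 <- "text X"
Encoding(string1) <- "UTF-8"
enc_before <- Encoding(string1)

# Substitute first, then declare the (now non-ASCII) result as UTF-8.
new_string1 <- gsub("X", "\u2265", string1)
Encoding(new_string1) <- "UTF-8"
enc_after <- Encoding(new_string1)

# Declaring the encoding does not change the bytes, only the mark.
bytes <- charToRaw(new_string1)

print(enc_before)
print(enc_after)
print(bytes)
```

Comparing enc_before and enc_after shows whether the declaration took effect, and charToRaw() confirms that the underlying bytes are the same UTF-8 sequence reported in the original post.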