Dear Milan and David,

Thank you both very much for your help! I finally figured it out.

The text on the website was UTF-8, but while it was being downloaded with the RDF package, the non-ASCII characters were converted to Java/JavaScript-style \uXXXX escape sequences. To convert them back to UTF-8:

> test <- "4.5\\u00B5g of cDNA was used"
> iconv(test, "JAVA", "UTF-8")
[1] "4.5µg of cDNA was used"

This may also affect anyone using JSON with R. Posting here in case it helps anyone else. =)

-Emily

On Sat, Apr 6, 2013 at 10:37 AM, David Winsemius <dwinsem...@comcast.net> wrote:

>
> On Apr 5, 2013, at 11:30 AM, Emily Ottensmeyer wrote:
>
> > Dear R-Help,
> >
> > I am using the RDF package/ R 2.14 with the RDF package to download data
> > from a website, and then use R to manipulate it.
> >
> > Text on the website is UTF-8. The RDF package's rdf_load command is
> > converting it into a different encoding, which converts non-ASCII
> > characters to unicode codes.
> >
> > On the webpage/sparql RDF: "4.5µg of cDNA was used"
> >
> > In R, the RDF triple gives: "4.5\\u00B5g of cDNA was used"
> >
> > I can't seem to convert it back from \\u00B5 into "µ".
> >
> > I've tried iconv with various settings without success:
> >> iconv(test, "latin1", "UTF-8")
> > [1] "4.5\\u00B5g of cDNA was used"
> >
> > And, I tried Encoding, to see if I could figure that out, but it returns
> > "unknown" on my string.
> >> Encoding(test)
> > [1] "unknown"
>
>
> On my device entering this: "4.5\\u00B5g of cDNA was used"
>
> ... returns [1] "4.5\\u00B5g of cDNA was used"
>
> But entering: "4.5\u00B5g of cDNA was used" returns:
>
> [1] "4.5µg of cDNA was used"
>
> > nchar("4.5\\u00B5g of cDNA was used")
> [1] 27
> > nchar("4.5\u00B5g of cDNA was used")
> [1] 22
>
> So the doubled "\" is really a single character in the first case and has
> no effect in escaping the next four hex digits but "\u00B5" in the second
> case is a correct "micro-character" (for my setup with my fonts)
>
> If this is a systematic problem then you should contact the maintainer
> with a full problem description and a link to the website.
> If this is just a one-off problem just remove the extraneous backslash.
>
> --
> David.
>
> > sessionInfo()
> R version 3.0.0 RC (2013-03-31 r62463)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> <snipped>
>
> > Anyone have any ideas on how to correct/convert the text encoding?
> >
> > Thanks!
> > -Emily
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
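
A note for anyone who finds that iconv() returns NA for the conversion discussed in this thread: as far as I know, the "JAVA" converter is not offered by every iconv implementation, so it is worth checking iconvlist() on your own system first. If it is missing, the literal \uXXXX escapes can also be decoded in base R. What follows is only a minimal sketch under stated assumptions: unescape_unicode is a made-up helper name (it is not part of the RDF package or of base R), it assumes every escape is a backslash, a "u", and exactly four hex digits, and it does not combine surrogate pairs for characters outside the Basic Multilingual Plane.

```r
# Hypothetical helper (not from any package): replace literal \uXXXX escape
# sequences in a character vector with the characters they encode.
unescape_unicode <- function(x) {
  # Regex for a literal backslash, "u", and exactly four hex digits.
  pattern <- "\\\\u[0-9A-Fa-f]{4}"
  m <- gregexpr(pattern, x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    # Drop the leading "\u", read the rest as a hex code point, and turn
    # each code point into a one-character UTF-8 string.
    intToUtf8(strtoi(substring(hits, 3), base = 16L), multiple = TRUE)
  })
  x
}

test <- "4.5\\u00B5g of cDNA was used"
fixed <- unescape_unicode(test)
nchar(test)   # 27, the escape is six separate characters
nchar(fixed)  # 22, the escape has become a single character
```

Strings without any escapes pass through unchanged, so the helper can be applied to a whole column of downloaded text in one call.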