Re: [R] Text Encoding

2013-04-09 Thread Emily Ottensmeyer
Dear Milan and David,

Thank you both very much for your help!  I finally figured it out.

Text on the website was UTF-8, but in the process of downloading it using
RDF, the non-ASCII characters were replaced by Java/JavaScript-style \uXXXX
escape sequences.  To convert them back to UTF-8:

> test <- "4.5\\u00B5g of cDNA was used"
> iconv(test, "JAVA", "UTF-8")
[1] "4.5µg of cDNA was used"

This may also impact anyone using JSON with R.  Posting here in case it
helps anyone else.  =)
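
As far as I know, the "JAVA" converter used above comes from GNU libiconv
and is not present in every iconv build (iconvlist() shows what a given
installation supports).  Where it is missing, the escapes can be decoded
in base R instead.  The following is only a sketch: decode_u_escapes is a
hypothetical helper name, not part of the RDF package or of base R, and
it assumes the escapes are exactly a backslash, "u" and four hex digits.

```r
# Hypothetical helper: replace each literal \uXXXX escape in a character
# vector with the Unicode character it names.
decode_u_escapes <- function(x) {
  pattern <- "\\\\u[0-9A-Fa-f]{4}"   # literal backslash, "u", four hex digits
  hits <- gregexpr(pattern, x)
  regmatches(x, hits) <- lapply(regmatches(x, hits), function(esc) {
    # Drop the leading "\u", read the hex digits, build the character.
    vapply(esc,
           function(e) intToUtf8(strtoi(substring(e, 3), 16L)),
           character(1), USE.NAMES = FALSE)
  })
  x
}

test  <- "4.5\\u00B5g of cDNA was used"
fixed <- decode_u_escapes(test)
nchar(test)    # 27
nchar(fixed)   # 22
```

This sketch does not combine UTF-16 surrogate pairs, so characters outside
the Basic Multilingual Plane would need extra handling.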

-Emily


On Sat, Apr 6, 2013 at 10:37 AM, David Winsemius wrote:

>
> On Apr 5, 2013, at 11:30 AM, Emily Ottensmeyer wrote:
>
> > Dear R-Help,
> >
> > I am using R 2.14 with the RDF package to download data
> > from a website, and then use R to manipulate it.
> >
> > Text on the website is UTF-8.  The RDF package's rdf_load command is
> > converting it into a different encoding, which replaces non-ASCII
> > characters with Unicode escape codes.
> >
> > On the webpage/SPARQL RDF: "4.5µg of cDNA was used"
> >
> > In R, the RDF triple gives: "4.5\\u00B5g of cDNA was used"
> >
> > I can't seem to convert it back from \\u00B5  into "µ".
> >
> > I've tried iconv with various settings without success:
> >> iconv(test, "latin1", "UTF-8")
> > [1] "4.5\\u00B5g of cDNA was used"
> >
> > And, I tried Encoding, to see if I could figure that out, but it returns
> > "unknown" on my string.
> >> Encoding(test)
> > [1] "unknown"
> >
> On my device entering this: "4.5\\u00B5g of cDNA was used"
>
> ... returns [1] "4.5\\u00B5g of cDNA was used"
>
> But entering: "4.5\u00B5g of cDNA was used" returns:
>
> [1] "4.5µg of cDNA was used"
>
> > nchar("4.5\\u00B5g of cDNA was used")
> [1] 27
> > nchar("4.5\u00B5g of cDNA was used")
> [1] 22
>
> So the doubled "\" in the first case is really a single literal backslash
> character, which does not escape the four hex digits that follow, whereas
> "\u00B5" in the second case is a proper micro sign (for my setup with my
> fonts).
>
> If this is a systematic problem then you should contact the maintainer
> with a full problem description and a link to the website. If this is just
> a one-off problem just remove the extraneous backslash.
>
> --
> David.
>
> > sessionInfo()
> R version 3.0.0 RC (2013-03-31 r62463)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> 
>
> > Anyone have any ideas on how to correct/convert the text encoding?
> >
> >
> > Thanks!
> > -Emily
> >
> >   [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
>



[R] Text Encoding

2013-04-05 Thread Emily Ottensmeyer
Dear R-Help,

I am using R 2.14 with the RDF package to download data
from a website, and then use R to manipulate it.

Text on the website is UTF-8.  The RDF package's rdf_load command is
converting it into a different encoding, which replaces non-ASCII
characters with Unicode escape codes.

On the webpage/SPARQL RDF: "4.5µg of cDNA was used"

In R, the RDF triple gives: "4.5\\u00B5g of cDNA was used"

I can't seem to convert it back from \\u00B5  into "µ".
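
A quick way to see what such a string really contains, as opposed to how
R prints it, is sketched below.  Here test is typed in by hand as a
stand-in for the value that comes back from the RDF package, and micro is
an illustrative name for the string one would like to end up with.

```r
test <- "4.5\\u00B5g of cDNA was used"   # stand-in for the RDF result
print(test)        # printing escapes the backslash, so it is shown doubled
writeLines(test)   # raw contents: one backslash followed by u00B5
nchar(test)        # 27, the escape is six separate ASCII characters

micro <- "4.5\u00B5g of cDNA was used"   # here the parser resolves \u00B5
nchar(micro)       # 22, the micro sign is a single character
```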

I've tried iconv with various settings without success:
> iconv(test, "latin1", "UTF-8")
[1] "4.5\\u00B5g of cDNA was used"

And, I tried Encoding, to see if I could figure that out, but it returns
"unknown" on my string.
> Encoding(test)
[1] "unknown"
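
Both results are consistent with the string being plain ASCII: the
escape is spelled out in ordinary characters, so there is nothing for
iconv to re-encode, and R does not put an encoding mark on ASCII-only
strings.  A small check along those lines, with test again typed in by
hand as an assumed stand-in for the RDF result:

```r
test <- "4.5\\u00B5g of cDNA was used"

# Every byte is below 0x80, so the string is pure ASCII.
all(as.integer(charToRaw(test)) < 128L)           # TRUE

# ASCII bytes mean the same thing in latin1 and UTF-8, so nothing changes.
identical(iconv(test, "latin1", "UTF-8"), test)   # TRUE

# ASCII-only strings carry no encoding mark.
Encoding(test)                                    # "unknown"
```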


Anyone have any ideas on how to correct/convert the text encoding?


Thanks!
-Emily
