On 13-01-23 8:19 PM, Hui Du wrote:
Hi all,
I am planning to parse some information on a website which includes lots of
Chinese characters. Does someone know how to read/display Chinese in R? Thanks.
url = "http://www.teec.org.cn/html/renwujieshao/"
x = readLines(url)
If you look at the first few lines of x you'll see this:
> head(x)
[1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\t\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
[2] "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
[3] "<head>"
[4] "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=gb2312\" />"
At the end of line 4 it shows "charset=gb2312". I didn't think that was
an encoding, but this seems to do the conversion:
y <- iconv(x, "gb2312", "utf-8")
y
(I don't know if that will display properly on your Windows machine; it
doesn't work on mine, because I don't have the fonts installed. But it
does work on my Mac.)
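
To see the conversion step in isolation, here is a minimal sketch that does not need the website at all. It assumes the four bytes below are the GB2312 encoding of the two characters "中文" (they stand in for one line returned by readLines()), and that your platform's iconv recognises the name "GB2312". The variable names are only for illustration.

```r
# Four raw bytes standing in for a line read from a gb2312 page.
# Assumption: D6 D0 CE C4 is "中文" in GB2312.
gb_bytes <- as.raw(c(0xD6, 0xD0, 0xCE, 0xC4))
x1 <- rawToChar(gb_bytes)

# Re-encode from the charset declared in the page's <meta> tag to UTF-8.
# iconv() returns NA for a string it cannot convert.
y1 <- iconv(x1, from = "GB2312", to = "UTF-8")

# Inspect the result as bytes, which avoids any font or console issue.
charToRaw(y1)

# Possible alternative (not run here): let the connection convert on input.
# con <- url("http://www.teec.org.cn/html/renwujieshao/", encoding = "gb2312")
# x <- readLines(con)
# close(con)
```

As far as I understand it, the encoding argument of readLines() only marks the strings as Latin-1 or UTF-8 and does not re-encode them, which would explain why encoding = 'UTF-8' made no difference on a page that is really gb2312.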
Duncan Murdoch
I tried encoding = 'UTF-8' already but it didn't help.
My R version is
$platform
[1] "i386-pc-mingw32"
$arch
[1] "i386"
$os
[1] "mingw32"
$system
[1] "i386, mingw32"
$status
[1] ""
$major
[1] "2"
$minor
[1] "15.0"
HXD
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.