On 13-01-23 8:19 PM, Hui Du wrote:
Hi all,

I am planning to parse some information on a website which includes lots of 
Chinese characters. Does someone know how to read/display Chinese in R? Thanks.


url = "http://www.teec.org.cn/html/renwujieshao/"
x = readLines(url)

If you look at the first few lines of x you'll see this:

> head(x)
[1] "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\t\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
[2] "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
[3] "<head>"
[4] "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=gb2312\" />"

At the end of line 4 it shows "charset=gb2312". I didn't think that was an encoding, but this seems to do the conversion:

y <- iconv(x, "gb2312", "utf-8")
y

(I don't know if that will display properly on your Windows machine; it doesn't work on mine, because I don't have the fonts installed. But it does work on my Mac.)
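Because console display depends on installed fonts, checking the bytes may be a more reliable test of whether the conversion worked. The following is a minimal sketch, not taken from the page above: it assumes the local iconv build recognizes the name "GB2312", and it builds a two-character sample (the GB2312 bytes for the word meaning "Chinese text") by hand so that it runs without network access. The names gb_bytes, gb_text and utf8_text are made up for illustration.

```r
# Hypothetical sample: the GB2312 bytes 0xD6D0 and 0xCEC4 (two Chinese characters).
gb_bytes <- as.raw(c(0xD6, 0xD0, 0xCE, 0xC4))
gb_text <- rawToChar(gb_bytes)

# Re-encode from GB2312 to UTF-8. iconv() returns NA when the input
# cannot be converted from the source encoding.
utf8_text <- iconv(gb_text, from = "GB2312", to = "UTF-8")

# Inspect the raw bytes instead of relying on the console font:
# each character should now occupy three bytes.
print(charToRaw(utf8_text))  # e4 b8 ad e6 96 87
print(is.na(utf8_text))      # FALSE if the conversion succeeded
```

If is.na() reports TRUE for elements of y, the input was probably not valid GB2312, and the problem is in the data rather than in the fonts.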

Duncan Murdoch

I tried encoding = 'UTF-8' already but it didn't help.
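A likely reason this did not help: the encoding argument of readLines() only marks the strings as UTF-8, it does not re-encode them, and the page is served as GB2312. The sketch below illustrates the difference between marking and converting on a hand-built sample. It assumes R 3.3.0 or later (for validUTF8()) and an iconv build that knows "GB2312"; the variable names are hypothetical.

```r
# Hypothetical sample: GB2312 bytes for two Chinese characters.
gb_text <- rawToChar(as.raw(c(0xD6, 0xD0, 0xCE, 0xC4)))

# Marking: declares the bytes to be UTF-8 without changing them.
marked <- gb_text
Encoding(marked) <- "UTF-8"

# Converting: actually rewrites the bytes from GB2312 to UTF-8.
converted <- iconv(gb_text, from = "GB2312", to = "UTF-8")

print(identical(charToRaw(marked), charToRaw(gb_text)))  # TRUE: bytes untouched
print(validUTF8(marked))     # FALSE: the GB2312 bytes are not valid UTF-8
print(validUTF8(converted))  # TRUE
```

Another route that may work is to declare the encoding on the connection, for example url(u, encoding = "GB2312"), so that R re-encodes to the native encoding while reading. That variant is untested against this site and depends on the native encoding being able to represent Chinese characters.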

My R version is
$platform
[1] "i386-pc-mingw32"

$arch
[1] "i386"

$os
[1] "mingw32"

$system
[1] "i386, mingw32"

$status
[1] ""

$major
[1] "2"

$minor
[1] "15.0"


HXD

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

