[R] reading tables from multiple HTML pages

s1oliver Mon, 29 Aug 2011 11:07:13 -0700

Hi, I am a beginner with R and am having some problems scraping data from
HTML tables using the XML package. I have included some code below.


I am trying to loop through a series of HTML pages, each of which contains a
single table from which I want to scrape data. However, some of the pages
are blank, so htmlParse() throws an error when it reaches one of them. The
loop then stops and I get the error message below:

Error in htmlParse(url) : 
  error in creating parser for
http://www.szrd.gov.cn/viewcommondbfc.do?id=728

What would be the best way to keep the loop running so I can parse the
rest?

****************************************************

library(XML)

url_root <- "http://www.szrd.gov.cn/viewcommondbfc.do?id="

for (i in 700:750) {
        url <- paste(url_root, i, sep="")
        doc <- htmlParse(url)   # stops with an error on blank pages, e.g. id=728

        tableNodes <- getNodeSet(doc, "//table")
        tbl <- readHTMLTable(tableNodes[[3]])   # take the third <table> node
}
****************************************************
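
One possible way to keep the loop going is to wrap the per-page work in
tryCatch(), so that a page that fails to parse is skipped rather than
stopping the loop. The following is only a sketch under stated
assumptions: scrape_page() and collect_tables() are hypothetical helper
names (not part of the XML package), and the third table is assumed to be
the one of interest, as in the code above. It has not been run against the
actual site.

```r
# Hypothetical helper: parse one page and return its third table.
# Uses the same XML calls as the loop above; errors on blank pages.
scrape_page <- function(url) {
  doc <- XML::htmlParse(url)
  tableNodes <- XML::getNodeSet(doc, "//table")
  XML::readHTMLTable(tableNodes[[3]])
}

# Hypothetical helper: apply scrape_fun to each id, skipping any page
# that raises an error, and return the results as a list named by id.
collect_tables <- function(ids, url_root, scrape_fun) {
  tables <- list()
  for (i in ids) {
    url <- paste0(url_root, i)
    tbl <- tryCatch(
      scrape_fun(url),
      error = function(e) {
        # Report the failure and signal "nothing scraped" with NULL.
        message("Skipping ", url, ": ", conditionMessage(e))
        NULL
      }
    )
    if (is.null(tbl)) next
    tables[[as.character(i)]] <- tbl
  }
  tables
}

# Possible usage (requires the XML package and network access):
# url_root <- "http://www.szrd.gov.cn/viewcommondbfc.do?id="
# tables <- collect_tables(700:750, url_root, scrape_page)
```

Storing each result in a list also avoids overwriting tbl on every
iteration, and any other error inside scrape_page() (for example a page
with fewer than three tables) is skipped in the same way.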

Steve Oliver
Department of Political Science
University of California at San Diego
9500 Gilman Dr.
La Jolla, CA 92092

--
View this message in context: 
http://r.789695.n4.nabble.com/reading-tables-from-multiple-HTML-pages-tp3776605p3776605.html
Sent from the R help mailing list archive at Nabble.com.
