Dear colleagues,

Each time I use htmlParse, R crashes or hangs. The URL I'd like to parse is included below, as are the results of a series of basic commands that show what I'm experiencing. The output of sessionInfo() is at the bottom of the message.

htmlTreeParse appears to work just fine, although its result doesn't appear to contain the information I need (the URLs of the articles linked to on this search page). Regardless, I'd still like to understand why htmlParse doesn't work.

Thank you for any insight.

Yours,
Simon Kiss

library(XML)  # attached in the session, per sessionInfo() below

myurl <- c("http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1yyyy=2001&date2mm=08&date2dd=25&date2yyyy=2011")

.x <- htmlParse(myurl)

class(.x)
# returns "HTMLInternalDocument" "XMLInternalDocument"

.x
# returns

 *** caught segfault ***
address 0x1398754, cause 'memory not mapped'

Traceback:
 1: .Call("RS_XML_dumpHTMLDoc", doc, as.integer(indent), as.character(encoding), as.logical(indent), PACKAGE = "XML")
 2: saveXML(from)
 3: saveXML(from)
 4: asMethod(object)
 5: as(x, "character")
 6: cat(as(x, "character"), "\n")
 7: print.XMLInternalDocument(<pointer: 0x11656d3e0>)
 8: print(<pointer: 0x11656d3e0>)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] XML_3.4-0      RCurl_1.5-0    bitops_1.0-4.1

*********************************
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606