Hi Yihui,

It took me a moment to see the error messages, because the latest development version of the XML package hides them by default for htmlParse().
You can provide your own function via the error parameter. If you just want to see more detailed error messages on the console, you can use a function like the following:

fullInfoErrorHandler =
function(msg, code, domain, line, col, level, file)
{
  # level tells how significant the error is.
  # The values 1, 2, 3 stand for WARNING, ERROR, FATAL,
  # meaning simple warning, recoverable error and fatal/unrecoverable error
  # (0 is NONE in libxml2, i.e. no error).
  # See XML:::xmlErrorLevel
  #
  # code is an error code. See the values in XML:::xmlParserErrors,
  # e.g. XML_HTML_UNKNOWN_TAG, XML_ERR_DOCUMENT_EMPTY
  #
  # domain tells what part of the library raised this error.
  # See XML:::xmlErrorDomain

  codeMsg = switch(level, "warning", "recoverable error", "fatal error")
  cat("There was a", codeMsg, "in the", file, "at line", line, "column", col,
      "\n", msg, "\n")
}

doc = htmlParse("~/htmlErrors.html", error = fullInfoErrorHandler)

And of course you can mimic xmlErrorCumulator() to form a closure that collects the different details of each message into an object.

If you look in the error.R and xmlErrorEnums.R files within the R code of the XML package, you'll find some additional functions that provide further support for working with errors in the XML/HTML parsers.

Best,
D.

Yihui Xie wrote:
> I'm using the function htmlParse() in the XML package, and I need a
> little bit help on error handling while parsing an HTML page.
> So far I can use either the default way:
>
> # error = xmlErrorCumulator(), by default
> library(XML)
> doc = htmlParse("http://www.public.iastate.edu/~pdixon/stat500/")
> # the error message is:
> # htmlParseStartTag: invalid element name
>
> or the tryCatch() approach:
>
> # error = NULL, errors to be caught by tryCatch()
> tryCatch({
>     doc = htmlParse("http://www.public.iastate.edu/~pdixon/stat500/",
>         error = NULL)
> }, XMLError = function(e) {
>     cat("There was an error in the XML at line", e$line, "column",
>         e$col, "\n", e$message, "\n")
> })
> # verbose error message as:
> # There was an error in the XML at line 90 column 2
> # htmlParseStartTag: invalid element name
>
> I wish to get the verbose error messages without really stopping the
> parsing process; the first approach cannot return detailed error
> messages, while the second one will stop the program...
>
> Thanks!
>
> Regards,
> Yihui
> --
> Yihui Xie <xieyi...@gmail.com>
> Phone: 515-294-6609 Web: http://yihui.name
> Department of Statistics, Iowa State University
> 3211 Snedecor Hall, Ames, IA
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.