Hi Yihui
It took me a moment to see the error messages, as the latest
development version of the XML package suppresses them by default
for htmlParse().
You can provide your own handler function via the error parameter.
If you just want to see more detailed error messages on the console,
you can use a function like the following:
fullInfoErrorHandler =
function(msg, code, domain, line, col, level, file)
{
    # level tells how significant the error is.
    # The values are 1, 2, 3 for WARNING, ERROR, FATAL (0 means none),
    # meaning simple warning, recoverable error and fatal/unrecoverable
    # error.
    # See XML:::xmlErrorLevel
    #
    # code is an error code. See the values in XML:::xmlParserErrors,
    # e.g. XML_HTML_UNKNOWN_TAG, XML_ERR_DOCUMENT_EMPTY
    #
    # domain tells what part of the library raised this error.
    # See XML:::xmlErrorDomain
    codeMsg = switch(level, "warning", "recoverable error", "fatal error")
    cat("There was a", codeMsg, "in the", file, "at line", line, "column",
        col, "\n", msg, "\n")
}
doc = htmlParse("~/htmlErrors.html", error = fullInfoErrorHandler)
And of course you can mimic xmlErrorCumulator() to form a closure that
collects the different details of each message into an object. If you
look in the error.R and xmlErrorEnums.R files within the R code of the
XML package, you'll find some additional functions that give us further
support for working with errors in the XML/HTML parsers.
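A closure along those lines might look roughly like the sketch below. The name makeErrorCollector and the shape of the stored records are made up for illustration and are not part of the XML package. It also assumes that the parser signals the end of parsing by calling the handler with a zero-length msg, which is what xmlErrorCumulator() checks for.

```r
# Hypothetical helper, not part of the XML package: returns a handler plus
# an accessor for everything the handler has recorded.
makeErrorCollector = function()
{
    records = list()

    handler = function(msg, code, domain, line, col, level, file)
    {
        # Assumption: a zero-length msg marks the end of parsing.
        # Return quietly so that the parse is not aborted.
        if (length(msg) == 0)
            return(invisible(NULL))

        # Append one record per message, dropping the trailing newline
        records[[length(records) + 1L]] <<-
            list(msg = sub("\n$", "", msg), code = code, domain = domain,
                 line = line, col = col, level = level, file = file)
        invisible(NULL)
    }

    list(handler = handler, errors = function() records)
}

# Simulated calls, so the sketch runs without the XML package.
# The numeric code/domain values here are made up for illustration.
collector = makeErrorCollector()
collector$handler("htmlParseStartTag: invalid element name\n",
                  68L, 5L, 90L, 2L, 2L, "htmlErrors.html")
collector$handler(character(), 0L, 0L, 0L, 0L, 0L, "")
str(collector$errors())
```

With the package loaded, the same handler would presumably be passed as htmlParse("~/htmlErrors.html", error = collector$handler), and collector$errors() inspected afterwards. Because the handler never calls stop(), the parsed document should still be returned.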
Best,
D.
Yihui Xie wrote:
I'm using the function htmlParse() in the XML package, and I need a
little bit of help with error handling while parsing an HTML page. So
far I can use either the default way:
# error = xmlErrorCumulator(), by default
library(XML)
doc = htmlParse("http://www.public.iastate.edu/~pdixon/stat500/")
# the error message is:
# htmlParseStartTag: invalid element name
or the tryCatch() approach:
# error = NULL, errors to be caught by tryCatch()
tryCatch({
    doc = htmlParse("http://www.public.iastate.edu/~pdixon/stat500/",
                    error = NULL)
}, XMLError = function(e) {
    cat("There was an error in the XML at line", e$line, "column",
        e$col, "\n", e$message, "\n")
})
# verbose error message as:
# There was an error in the XML at line 90 column 2
# htmlParseStartTag: invalid element name
I wish to get the verbose error messages without really stopping the
parsing process; the first approach cannot return detailed error
messages, while the second one will stop the program...
Thanks!
Regards,
Yihui
--
Yihui Xie xieyi...@gmail.com
Phone: 515-294-6609 Web: http://yihui.name
Department of Statistics, Iowa State University
3211 Snedecor Hall, Ames, IA
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.