On Wed, 29 May 2013, Nick Burch wrote:
I'm not sure if we have a properly documented policy on what a parser
should do if it receives a file it can't handle. For files that are
invalid (e.g. corrupt), I believe an exception is the expected result.
For the case where the file seems valid but can't be handled by the
parser, I'm not sure.
Does anyone know if we have a policy on this, and/or where we should document
it?
I've made a start on documenting this on the wiki:
https://wiki.apache.org/tika/ErrorsAndExceptions
However, there are a few bits we still need to sort out, such as this case
(the parser thinks the file is valid, but it's in a variant of the format
the parser can't cope with), and the case of an empty file (what we should
or shouldn't output, e.g. a body tag). Hopefully someone can come up with
a good suggestion...!
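
To make the three cases concrete, here's a rough sketch in plain Java.
None of these names are Tika APIs: the toy format (a magic byte 'T', a
version byte, then text), the two exception classes and the parse()
method are all made up for illustration. The choice to return an empty
body for an empty file is just one possible answer to the open question,
not a decision.

```java
import java.io.IOException;
import java.io.InputStream;

public class ParserOutcomeSketch {

    /** Hypothetical: the input is invalid, e.g. corrupt or truncated. */
    public static class CorruptFileException extends Exception {
        public CorruptFileException(String message) { super(message); }
    }

    /** Hypothetical: the input looks valid, but this parser can't handle it. */
    public static class UnsupportedVariantException extends Exception {
        public UnsupportedVariantException(String message) { super(message); }
    }

    /**
     * Parses a made-up format: magic byte 'T', a version byte, then text.
     * Only version 1 is supported by this toy parser.
     */
    public static String parse(InputStream in)
            throws IOException, CorruptFileException, UnsupportedVariantException {
        int magic = in.read();
        if (magic == -1) {
            // Empty file: one option (an assumption, not agreed policy)
            // is to emit an empty body rather than throw.
            return "<body/>";
        }
        if (magic != 'T') {
            // Invalid file: an exception is the expected result.
            throw new CorruptFileException("bad magic byte: " + magic);
        }
        int version = in.read();
        if (version == -1) {
            throw new CorruptFileException("truncated header");
        }
        if (version != 1) {
            // Valid file, but a variant we can't cope with: a distinct
            // exception type lets callers tell this apart from corruption.
            throw new UnsupportedVariantException("unsupported version: " + version);
        }
        StringBuilder body = new StringBuilder("<body>");
        int b;
        while ((b = in.read()) != -1) {
            body.append((char) b); // toy format is ASCII only
        }
        return body.append("</body>").toString();
    }
}
```

The point of the separate exception type is only that a caller could
distinguish "broken file" from "fine file, wrong parser"; whether that
should be a subclass of an existing exception is still up for discussion.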
Nick