On Wed, 29 May 2013, Nick Burch wrote:
I'm not sure whether we have a properly documented policy on what a parser should do if it receives a file it can't handle. For files that are invalid (e.g. corrupt), I believe an exception is the expected result. For the case where the file seems valid but can't be handled by the parser, I'm not sure.

Does anyone know if we have a policy on this, and/or where we should document it?

I've made a start on documenting this on the wiki:
   https://wiki.apache.org/tika/ErrorsAndExceptions

However, there are a few bits we still need to sort out, such as this case (the parser thinks the file is valid, but it's in a format the parser can't cope with), or the case of an empty file (what we should or shouldn't output, e.g. a body tag). Hopefully someone can come up with a good suggestion!

Nick
