On Sat, 11 Sep 2010, Jukka Zitting wrote:
The reason why I originally didn't want to simply catch and ignore the potential exceptions in the TikaConfig constructor was the lack of a good error reporting mechanism. The trick of insulating the external library dependencies to separate extractor classes nicely solved that problem by delaying the exceptions to the actual parse() method calls on specific document types, which obviously would then give the end user a much better idea of what's wrong.
My thinking on exceptions while creating the parser is:

* ClassNotFound for the parser class - throw the exception, as the user has specified a parser that isn't there
* Any other ClassNotFound - warning, as this means that a dependency is missing, but that may be what the user wanted
* Any other problem - throw the exception, as there is a fault with the parser, and there's a fair chance that this is a custom parser that has broken. (The standard Tika parsers shouldn't do this!)

Nick
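
For illustration only, here is a minimal sketch of how the three cases above could be told apart when instantiating a parser by reflection. It is not the actual TikaConfig code: the class ParserLoadingSketch, the loadParser() and skip() helpers, the WARNINGS list and the three nested stand-in parser classes are all hypothetical names made up for this example. It also assumes that a missing dependency surfaces as a NoClassDefFoundError (or a ClassNotFoundException raised from the parser's constructor), which is the usual JVM behaviour but may not cover every case.

```java
import java.lang.reflect.InvocationTargetException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the exception policy; not actual Tika code. */
public class ParserLoadingSketch {

    /** Warnings recorded for parsers skipped because a dependency was missing. */
    public static final List<String> WARNINGS = new ArrayList<>();

    /**
     * Returns a parser instance, or null if the parser was skipped with a warning.
     * A ClassNotFoundException for the parser class itself propagates (case 1).
     */
    public static Object loadParser(String className) throws ClassNotFoundException {
        Class<?> parserClass;
        try {
            parserClass = Class.forName(className);
        } catch (NoClassDefFoundError e) {
            // Case 2: the parser class exists but something it links against does not.
            return skip(className, e);
        }
        try {
            return parserClass.getDeclaredConstructor().newInstance();
        } catch (NoClassDefFoundError e) {
            // Case 2 again, raised while linking during instantiation.
            return skip(className, e);
        } catch (InvocationTargetException e) {
            Throwable cause = e.getCause();
            if (cause instanceof NoClassDefFoundError
                    || cause instanceof ClassNotFoundException) {
                // Case 2, raised from inside the parser's constructor.
                return skip(className, cause);
            }
            // Case 3: the parser itself is faulty, so fail loudly.
            throw new IllegalStateException("Parser " + className + " failed to initialise", cause);
        } catch (ReflectiveOperationException e) {
            // Case 3: no usable no-argument constructor, abstract class, and so on.
            throw new IllegalStateException("Parser " + className + " could not be instantiated", e);
        }
    }

    private static Object skip(String className, Throwable missing) {
        WARNINGS.add("Skipping " + className + ": missing dependency " + missing.getMessage());
        return null;
    }

    // Hypothetical stand-ins used only to exercise the three cases.
    public static class GoodParser {
        public GoodParser() { }
    }

    public static class MissingDependencyParser {
        public MissingDependencyParser() {
            // Simulates an optional library that is not on the classpath.
            throw new NoClassDefFoundError("org/example/OptionalLibrary");
        }
    }

    public static class BrokenParser {
        public BrokenParser() {
            throw new IllegalStateException("bad configuration");
        }
    }
}
```

With this sketch, a misspelled parser class name fails straight away with ClassNotFoundException, a parser with a missing optional library is skipped and noted in WARNINGS, and a parser that blows up in its own constructor is reported as an IllegalStateException carrying the original cause.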