Normally, a parser is sure to call ContentHandler.startDocument() inside its parse(... ContentHandler ...) method. This also holds in the error case, which normally ends with a thrown exception.
We wrote and maintain an open source crawler library (leech crawler) based on Tika, in which we work with special ContentHandlers that deal with the recursive crawling issues. To notice an error during the crawl, we rely on an exception being thrown. On the other hand, when there is no error, we need to notice that an entity was crawled (to count the crawled items, etc.). For this we implemented the startDocument() method in our ContentHandler decorators.

This works like a charm, but MP4Parser contains these lines of code:

Line 146-154, parse() method:

    MovieBox moov = getOrNull(isoFile, MovieBox.class);
    if (moov == null) {
        // Bail out
        return;
    }

    XHTMLContentHandler xhtml = new XHTMLContentHandler(handler, metadata);
    xhtml.startDocument();
    ......

Here, when the MP4 file has no MovieBox (no content?!), the parse method returns at the 'Bail out' comment, and for us at least it does so silently: no exception is thrown and startDocument() is never called, so our decorators see neither an error nor a crawled entity.

I don't know whether this is also a problem in general (Tika itself has plenty of ContentHandler decorators), but from our point of view Tika signals empty content by invoking xhtml.startDocument() and xhtml.endDocument() with nothing in between. If the moov == null situation is meant to be an error, an exception should be thrown instead.

If we are right (and we hope so, because we need this ;) ), we would like to suggest this modification:

    MovieBox moov = getOrNull(isoFile, MovieBox.class);
    if (moov == null) {
        // Bail out
        handler.startDocument();
        handler.endDocument();
        return;
    }

Looking forward to your opinions!
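To make the effect on a counting decorator concrete, here is a minimal sketch that uses only the SAX classes from the JDK. It is not Tika or leech code: the class StartDocumentSketch, the CountingHandler decorator, and the two parse stand-ins are hypothetical names made up for illustration, and the hasMoov flag merely simulates whether getOrNull() found a MovieBox.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StartDocumentSketch {

    /** Hypothetical decorator: counts how often startDocument() is signalled. */
    static class CountingHandler extends DefaultHandler {
        int documents = 0;

        @Override
        public void startDocument() throws SAXException {
            documents++;
        }
    }

    /** Stand-in for the current bail-out: returns without any SAX event. */
    static void parseSilently(boolean hasMoov, ContentHandler handler) throws SAXException {
        if (!hasMoov) {
            return; // bail out; the handler never hears about this document
        }
        handler.startDocument();
        handler.endDocument();
    }

    /** Stand-in for the suggested change: an empty document is still signalled. */
    static void parseWithEmptyDocument(boolean hasMoov, ContentHandler handler) throws SAXException {
        handler.startDocument();
        // a real parser would emit content events here when hasMoov is true
        handler.endDocument();
    }

    /** Runs one of the two stand-ins on a file without a MovieBox and returns the count. */
    static int documentsSeen(boolean suggestedChange) {
        CountingHandler counter = new CountingHandler();
        try {
            if (suggestedChange) {
                parseWithEmptyDocument(false, counter);
            } else {
                parseSilently(false, counter);
            }
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return counter.documents;
    }

    public static void main(String[] args) {
        System.out.println("current bail-out: " + documentsSeen(false)); // prints 0
        System.out.println("suggested change: " + documentsSeen(true));  // prints 1
    }
}
```

With the silent return, the decorator counts nothing and sees no exception either, so the file simply vanishes from the crawl statistics. With the empty startDocument()/endDocument() pair it is counted as one crawled entity.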
Chris

--
Christian Reuschling, Dipl.-Ing.(BA)
Software Engineer
Knowledge Management Department
German Research Center for Artificial Intelligence DFKI GmbH
Trippstadter Straße 122, D-67663 Kaiserslautern, Germany
Phone: +49.631.20575-1250
mailto:reuschl...@dfki.de
http://www.dfki.uni-kl.de/~reuschling/