On Tue, 5 Apr 2011, Mattmann, Chris A (388J) wrote:
Check out our CmdLineMetExtractor class [2], and this guide [3] on some of our baked in MetExtractors. I think it would be awesome if we could support a similar interface in Tika (I'd love to push those details upstream of OODT).

I think you ought to be able to do most of that with Tika now. I don't know if you'll be able to change your XML files to follow the new Tika syntax and have Tika do everything (I think your config might have more in it than just what to run and how to get the metadata back?), but the new ExternalParser stuff ought to be more flexible for building the parsers dynamically yourself.

You might want to hold off until Jukka's done his usual magic of making my code much more elegant though :)

(I'll hopefully get a chance to do a bit more on this within a week, such as unit tests and a dedicated ffmpeg external parser which uses "ffmpeg -formats" to build the list of supported mimetypes at runtime)
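
To give a rough idea of the "ffmpeg -formats" part, here is a minimal sketch in plain Java. It is not the Tika code: the class FfmpegFormats and the method parseDemuxers are hypothetical names, and it assumes the listing layout where a separator line of dashes is followed by rows of "flags name description", with "D" in the flags meaning ffmpeg can read the format. Mapping the names to real mimetypes is left out, as that would need a lookup table of its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Tika: extracts format names from the
// text printed by "ffmpeg -formats".
class FfmpegFormats {

    // Returns the names of the formats marked as demuxable (flag "D").
    // Assumes rows of "<flags> <name> <description>" after a dashes line.
    static List<String> parseDemuxers(String output) {
        List<String> names = new ArrayList<>();
        boolean inTable = false;
        for (String line : output.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (!inTable) {
                // Skip the legend until the separator line of dashes
                inTable = trimmed.matches("-+");
                continue;
            }
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] tokens = trimmed.split("\\s+", 3);
            if (tokens.length < 2 || !tokens[0].contains("D")) {
                continue; // malformed row, or a mux-only format
            }
            // One row may list several aliases, e.g. "mov,mp4,m4a"
            for (String name : tokens[1].split(",")) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String sample = "File formats:\n"
                + " D. = Demuxing supported\n"
                + " .E = Muxing supported\n"
                + " --\n"
                + "  E 3g2             3GP2 format\n"
                + " D  aac             raw ADTS AAC\n"
                + " DE avi             AVI format\n";
        System.out.println(parseDemuxers(sample));
    }
}
```

In a real parser the text would come from running ffmpeg (for example via ProcessBuilder) rather than from a hard-coded string, and the resulting names would then be translated into the mimetypes the parser advertises.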

Nick
