Hi,

On Fri, Jul 16, 2010 at 2:43 AM, Paul Jakubik <p...@purediscovery.com> wrote:
> On Thu, Jul 15, 2010 at 6:43 AM, Jukka Zitting <jukka.zitt...@gmail.com> wrote:
>> The way I recommend is to pass a custom Parser implementation through
>> the ParseContext. This gives you detailed access to each component
>> document.
>
> I looked at the code a little further, and I don't see exactly how I can do
> this.
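Looks like you're approaching this from the wrong perspective. See the
example below (or at http://pastebin.com/ZNfCQ9bk) for a recursive
depth-first traversal that prints out the metadata of all the component
documents.

    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.tika.exception.TikaException;
    import org.apache.tika.io.TikaInputStream;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.Parser;
    import org.apache.tika.parser.ParserDecorator;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    public class RecursiveMetadataExample { // class name is illustrative

        public static void main(String[] args) throws Exception {
            Parser parser = new RecursiveMetadataParser(new AutoDetectParser());
            ParseContext context = new ParseContext();
            // Register the decorated parser in the context so that it is
            // also used for every embedded (component) document.
            context.set(Parser.class, parser);

            ContentHandler handler = new DefaultHandler();
            Metadata metadata = new Metadata();
            InputStream stream = TikaInputStream.get(new File(args[0]));
            try {
                parser.parse(stream, handler, metadata, context);
            } finally {
                stream.close();
            }
        }

        private static class RecursiveMetadataParser extends ParserDecorator {

            public RecursiveMetadataParser(Parser parser) {
                super(parser);
            }

            @Override
            public void parse(
                    InputStream stream, ContentHandler handler,
                    Metadata metadata, ParseContext context)
                    throws IOException, SAXException, TikaException {
                // Parse first, so the metadata of embedded documents is
                // printed before that of the document containing them.
                super.parse(stream, handler, metadata, context);
                System.out.println("----");
                System.out.println(metadata);
            }
        }
    }

BR,

Jukka Zitting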
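If you'd rather get the metadata of the component documents back as objects than print it, the same decorator can append each Metadata instance to a list. This is a minimal sketch, assuming a Tika version that provides ParserDecorator and TikaInputStream.get(File) as used in the example above; the class name MetadataCollectingParser is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.tika.exception.TikaException;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.ParserDecorator;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variant of RecursiveMetadataParser: instead of printing,
// it records the Metadata of every document it parses in a shared list.
public class MetadataCollectingParser extends ParserDecorator {

    private final List<Metadata> collected;

    public MetadataCollectingParser(Parser parser, List<Metadata> collected) {
        super(parser);
        this.collected = collected;
    }

    @Override
    public void parse(
            InputStream stream, ContentHandler handler,
            Metadata metadata, ParseContext context)
            throws IOException, SAXException, TikaException {
        // Embedded documents are parsed (and recorded) during this call,
        // so they are added to the list before their container.
        super.parse(stream, handler, metadata, context);
        collected.add(metadata);
    }

    public static void main(String[] args) throws Exception {
        List<Metadata> collected = new ArrayList<Metadata>();
        Parser parser =
            new MetadataCollectingParser(new AutoDetectParser(), collected);
        ParseContext context = new ParseContext();
        // As before, the decorated parser must be in the context so that
        // it is also used for embedded documents.
        context.set(Parser.class, parser);

        InputStream stream = TikaInputStream.get(new File(args[0]));
        try {
            parser.parse(stream, new DefaultHandler(), new Metadata(), context);
        } finally {
            stream.close();
        }

        for (Metadata m : collected) {
            System.out.println("----");
            System.out.println(m);
        }
    }
}
```

The container document ends up last in the list, since each entry is added only after all of its embedded documents have been parsed.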