Thank you for this example! Is there any chance it could be added to the Tika wiki?

On Fri, Jul 16, 2010 at 1:30 AM, Jukka Zitting <[email protected]> wrote:

> Hi,
>
> On Fri, Jul 16, 2010 at 2:43 AM, Paul Jakubik <[email protected]> wrote:
> > On Thu, Jul 15, 2010 at 6:43 AM, Jukka Zitting <[email protected]> wrote:
> >> The way I recommend is to pass a custom Parser implementation through
> >> the ParseContext. This gives you detailed access to each component
> >> document.
> >
> > I looked at the code a little further, and I don't see exactly how I
> > can do this.
>
> Looks like you're approaching this from the wrong perspective. See the
> example below (or at http://pastebin.com/ZNfCQ9bk) for a recursive
> depth-first traversal that prints out the metadata of all the
> component documents.
>
>     public static void main(String[] args) throws Exception {
>         Parser parser = new RecursiveMetadataParser(new AutoDetectParser());
>         ParseContext context = new ParseContext();
>         context.set(Parser.class, parser);
>
>         ContentHandler handler = new DefaultHandler();
>         Metadata metadata = new Metadata();
>
>         InputStream stream = TikaInputStream.get(new File(args[0]));
>         try {
>             parser.parse(stream, handler, metadata, context);
>         } finally {
>             stream.close();
>         }
>     }
>
>     private static class RecursiveMetadataParser extends ParserDecorator {
>
>         public RecursiveMetadataParser(Parser parser) {
>             super(parser);
>         }
>
>         @Override
>         public void parse(
>                 InputStream stream, ContentHandler handler,
>                 Metadata metadata, ParseContext context)
>                 throws IOException, SAXException, TikaException {
>             super.parse(stream, handler, metadata, context);
>
>             System.out.println("----");
>             System.out.println(metadata);
>         }
>
>     }
>
> BR,
>
> Jukka Zitting
>
