Hi Paul,

Sure. Feel free to sign up for an account (it's free and pretty simple) and then you can just copy/paste and start a wiki page on your own. We welcome your contribution!

Cheers,
Chris

On 7/16/10 8:29 AM, "Paul Jakubik" <[email protected]> wrote:

Thank you for this example! Is there any chance this example could be added to the Tika wiki?

On Fri, Jul 16, 2010 at 1:30 AM, Jukka Zitting <[email protected]> wrote:

> Hi,
>
> On Fri, Jul 16, 2010 at 2:43 AM, Paul Jakubik <[email protected]> wrote:
> > On Thu, Jul 15, 2010 at 6:43 AM, Jukka Zitting <[email protected]> wrote:
> >> The way I recommend is to pass a custom Parser implementation through
> >> the ParseContext. This gives you detailed access to each component
> >> document.
> >
> > I looked at the code a little further, and I don't see exactly how I can
> > do this.
>
> Looks like you're approaching this from the wrong perspective. See the
> example below (or at http://pastebin.com/ZNfCQ9bk) for a recursive
> depth-first traversal that prints out the metadata of all the
> component documents.
>
>     public static void main(String[] args) throws Exception {
>         Parser parser = new RecursiveMetadataParser(new AutoDetectParser());
>         ParseContext context = new ParseContext();
>         context.set(Parser.class, parser);
>
>         ContentHandler handler = new DefaultHandler();
>         Metadata metadata = new Metadata();
>
>         InputStream stream = TikaInputStream.get(new File(args[0]));
>         try {
>             parser.parse(stream, handler, metadata, context);
>         } finally {
>             stream.close();
>         }
>     }
>
>     private static class RecursiveMetadataParser extends ParserDecorator {
>
>         public RecursiveMetadataParser(Parser parser) {
>             super(parser);
>         }
>
>         @Override
>         public void parse(
>                 InputStream stream, ContentHandler handler,
>                 Metadata metadata, ParseContext context)
>                 throws IOException, SAXException, TikaException {
>             super.parse(stream, handler, metadata, context);
>
>             System.out.println("----");
>             System.out.println(metadata);
>         }
>
>     }
>
> BR,
>
> Jukka Zitting
>

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
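
For readers without Tika on the classpath, here is a rough, self-contained sketch of why Jukka's approach reaches every component document. It does not use the Tika API: `Doc`, `DocParser`, `ContainerParser`, `RecordingParser` and `traverse` are hypothetical stand-ins invented for illustration. It assumes, as the quoted example implies, that a container-aware parser looks up the parser for embedded documents in the context, so registering the decorator there routes each nested document back through it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecursiveDecoratorSketch {

    /** Hypothetical stand-in for a document that may embed other documents. */
    static final class Doc {
        final String name;
        final List<Doc> children;

        Doc(String name, Doc... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    /** Hypothetical stand-in for a parser interface (not Tika's Parser). */
    interface DocParser {
        void parse(Doc doc, Map<Class<?>, Object> context);
    }

    /** Plays the role of a container-aware parser: it asks the context
     *  which parser should handle each embedded document. */
    static final class ContainerParser implements DocParser {
        @Override
        public void parse(Doc doc, Map<Class<?>, Object> context) {
            DocParser embedded = (DocParser) context.get(DocParser.class);
            if (embedded == null) {
                return; // nothing registered: embedded documents are skipped
            }
            for (Doc child : doc.children) {
                embedded.parse(child, context);
            }
        }
    }

    /** Plays the role of the decorator: delegate first, then record,
     *  so nested documents are recorded before their container. */
    static final class RecordingParser implements DocParser {
        private final DocParser delegate;
        final List<String> visited = new ArrayList<>();

        RecordingParser(DocParser delegate) {
            this.delegate = delegate;
        }

        @Override
        public void parse(Doc doc, Map<Class<?>, Object> context) {
            delegate.parse(doc, context);
            visited.add(doc.name);
        }
    }

    /** Registers the decorator in the context and parses the root. */
    static List<String> traverse(Doc root) {
        RecordingParser parser = new RecordingParser(new ContainerParser());
        Map<Class<?>, Object> context = new HashMap<>();
        context.put(DocParser.class, parser); // the key step
        parser.parse(root, context);
        return parser.visited;
    }

    public static void main(String[] args) {
        Doc root = new Doc("archive.zip",
                new Doc("report.doc", new Doc("image.png")),
                new Doc("notes.txt"));
        // prints [image.png, report.doc, notes.txt, archive.zip]
        System.out.println(traverse(root));
    }
}
```

Because the decorator records only after delegating, the innermost documents are listed first and the outer container last, which matches the placement of the `println` calls after `super.parse(...)` in the quoted example. Leaving the decorator out of the context in this sketch means only the root is visited.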
