Hi Paul,

Sure. Feel free to sign up for an account (it's free and pretty simple) and 
then you can just copy/paste and start a wiki page on your own. We welcome your 
contribution!

Cheers,
Chris


On 7/16/10 8:29 AM, "Paul Jakubik" <[email protected]> wrote:

Thank you for this example! Is there any chance this example could be
added to the Tika wiki?

On Fri, Jul 16, 2010 at 1:30 AM, Jukka Zitting <[email protected]> wrote:

> Hi,
>
> On Fri, Jul 16, 2010 at 2:43 AM, Paul Jakubik <[email protected]> wrote:
> > On Thu, Jul 15, 2010 at 6:43 AM, Jukka Zitting <[email protected]> wrote:
> >> The way I recommend is to pass a custom Parser implementation through
> >> the ParseContext. This gives you detailed access to each component
> >> document.
> >
> > I looked at the code a little further, and I don't see exactly how I
> > can do this.
>
> Looks like you're approaching this from the wrong perspective. See the
> example below (or at http://pastebin.com/ZNfCQ9bk) for a recursive
> depth-first traversal that prints out the metadata of all the
> component documents.
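>
>    import java.io.File;
>    import java.io.IOException;
>    import java.io.InputStream;
>
>    import org.apache.tika.exception.TikaException;
>    import org.apache.tika.io.TikaInputStream;
>    import org.apache.tika.metadata.Metadata;
>    import org.apache.tika.parser.AutoDetectParser;
>    import org.apache.tika.parser.ParseContext;
>    import org.apache.tika.parser.Parser;
>    import org.apache.tika.parser.ParserDecorator;
>    import org.xml.sax.ContentHandler;
>    import org.xml.sax.SAXException;
>    import org.xml.sax.helpers.DefaultHandler;
>
>    public static void main(String[] args) throws Exception {
>        Parser parser = new RecursiveMetadataParser(new AutoDetectParser());
>        ParseContext context = new ParseContext();
>        // Register the decorator itself so that embedded documents are
>        // parsed with it as well; this is what makes the traversal recursive.
>        context.set(Parser.class, parser);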
>
>        ContentHandler handler = new DefaultHandler();
>        Metadata metadata = new Metadata();
>
>        InputStream stream = TikaInputStream.get(new File(args[0]));
>        try {
>            parser.parse(stream, handler, metadata, context);
>        } finally {
>            stream.close();
>        }
>    }
>
>    private static class RecursiveMetadataParser extends ParserDecorator {
>
>        public RecursiveMetadataParser(Parser parser) {
>            super(parser);
>        }
>
>        @Override
>        public void parse(
>                InputStream stream, ContentHandler handler,
>                Metadata metadata, ParseContext context)
>                throws IOException, SAXException, TikaException {
>            super.parse(stream, handler, metadata, context);
>
>            System.out.println("----");
>            System.out.println(metadata);
>        }
>
>    }
>
> BR,
>
> Jukka Zitting
>
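
For anyone copying this to the wiki, here is a rough, dependency-free
sketch of why the example above recurses. It does not use Tika at all:
Doc, the two-method Parser interface, ContainerParser, RecordingParser
and the Map-based context are hypothetical stand-ins I made up for
illustration, not Tika classes. The assumption (as I understand Jukka's
example) is that the container parser hands each embedded document to
whatever Parser it finds in the context, so registering the decorator
there makes every component document pass back through the decorator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RecursiveDecoratorSketch {

    /** Hypothetical stand-in for a document that may embed other documents. */
    static final class Doc {
        final String name;
        final List<Doc> children = new ArrayList<>();

        Doc(String name, Doc... embedded) {
            this.name = name;
            for (Doc d : embedded) {
                children.add(d);
            }
        }
    }

    /** Hypothetical, simplified stand-in for a parser interface. */
    interface Parser {
        void parse(Doc doc, Map<Class<?>, Object> context);
    }

    /** Stand-in for a container-aware parser: embedded documents are
     *  delegated to the Parser registered in the context, if any. */
    static final class ContainerParser implements Parser {
        @Override
        public void parse(Doc doc, Map<Class<?>, Object> context) {
            Parser embeddedParser = (Parser) context.get(Parser.class);
            if (embeddedParser == null) {
                return; // in this sketch, nothing registered means no recursion
            }
            for (Doc child : doc.children) {
                embeddedParser.parse(child, context);
            }
        }
    }

    /** Stand-in for the decorator: delegates first, then records the document,
     *  mirroring the "super.parse(...) then print" order of the example. */
    static final class RecordingParser implements Parser {
        private final Parser delegate;
        final List<String> visited = new ArrayList<>();

        RecordingParser(Parser delegate) {
            this.delegate = delegate;
        }

        @Override
        public void parse(Doc doc, Map<Class<?>, Object> context) {
            delegate.parse(doc, context);
            visited.add(doc.name);
        }
    }

    /** Wires the decorator into the context and returns the visit order. */
    static List<String> traverse(Doc root) {
        RecordingParser parser = new RecordingParser(new ContainerParser());
        Map<Class<?>, Object> context = new HashMap<>();
        context.put(Parser.class, parser); // the step that enables recursion
        parser.parse(root, context);
        return parser.visited;
    }

    public static void main(String[] args) {
        Doc root = new Doc("archive.zip",
                new Doc("report.doc", new Doc("image.png")),
                new Doc("notes.txt"));
        // prints [image.png, report.doc, notes.txt, archive.zip]
        System.out.println(traverse(root));
    }
}
```

Because the decorator records a document only after its delegate
returns, embedded documents show up before their container. If you
want the container first, record before delegating instead.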



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
