On Thu, Jul 15, 2010 at 6:43 AM, Jukka Zitting <jukka.zitt...@gmail.com> wrote:

> The way I recommend is to pass a custom Parser implementation through
> the ParseContext. This gives you detailed access to each component
> document.
>
> You noted that this approach wouldn't work for recursive metadata. Why?
>
>
I hadn't thought of passing in a custom parser as a way to get metadata. Now
that you mention it, for my needs I could clone the AutoDetectParser,
change the code to handle Metadata however I want (e.g. keep a metadata
stack, send notifications, or some other solution I haven't thought of), and
pass this new parser through the ParseContext.
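
To make the "metadata stack plus notifications" idea concrete, here is a
minimal sketch. It deliberately does not use the real Tika classes: Doc,
MetadataListener and MetadataCollectingParser are hypothetical stand-ins
invented for illustration, and the recursion over children only
approximates what would happen when a container parser calls back into the
parser supplied through the ParseContext.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a document that may contain embedded documents.
// This is NOT a Tika type.
class Doc {
    final String name;
    final Map<String, String> metadata = new LinkedHashMap<>();
    final List<Doc> children = new ArrayList<>();

    Doc(String name) {
        this.name = name;
    }

    Doc with(String key, String value) {
        metadata.put(key, value);
        return this;
    }

    Doc embed(Doc child) {
        children.add(child);
        return this;
    }
}

// Hypothetical callback: invoked once per document, outermost first.
interface MetadataListener {
    void onMetadata(String path, Map<String, String> metadata);
}

// Hypothetical parser wrapper: keeps a stack of the enclosing documents so
// that every notification carries the full nesting path of the document.
class MetadataCollectingParser {
    private final Deque<String> stack = new ArrayDeque<>();
    private final MetadataListener listener;

    MetadataCollectingParser(MetadataListener listener) {
        this.listener = listener;
    }

    void parse(Doc doc) {
        stack.addLast(doc.name);            // entering this document
        try {
            listener.onMetadata(String.join("/", stack), doc.metadata);
            for (Doc child : doc.children) {
                parse(child);               // nested documents recurse
            }
        } finally {
            stack.removeLast();             // leaving, even on failure
        }
    }
}
```

With this shape, a caller would register a listener and receive one
notification per component document, each tagged with a path such as
"outer.zip/inner.doc", instead of only seeing the metadata of the
outermost container.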

Given this solution, I'm left wondering whether capturing the metadata for
nested documents is an oddball use case that most users don't want, or a
common one that many users would like Tika to support. In other words,
should a new parser type be added to Tika's library of parsers, or should
this be left as an exercise for the users who want the metadata?

Paul
