[ https://issues.apache.org/jira/browse/TIKA-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902591#comment-14902591 ]
Tim Allison commented on TIKA-1740:
-----------------------------------

How about we store a list of <Metadata, Handler> pairs instead of Metadata objects?

The current {{getMetadata()}} would behave as it does now. We'd add {{getMetadataAndHandlers()}}, which would return the list of <Metadata, Handler> pairs; the Metadata objects in that list would not include TIKA_CONTENT. The current {{getMetadata()}} would call {{getMetadataAndHandlers()}} under the hood and add TIKA_CONTENT.

An initial concern is that this would double memory use at the time {{getMetadata()}} is called, but as I think about the way the recursion works, we're pretty much doing that now.

How does this sound?

> RecursiveParserWrapper returning ContentHandler-s
> -------------------------------------------------
>
>                 Key: TIKA-1740
>                 URL: https://issues.apache.org/jira/browse/TIKA-1740
>             Project: Tika
>          Issue Type: Wish
>          Components: core, parser
>            Reporter: Andrea
>
> I would like to build a mechanism that allows a custom object to be built
> from a parsing result. This can be done easily by working with a custom
> ContentHandler "transformer", but how can I achieve this result using a
> RecursiveParserWrapper? In this case I can only set a ContentHandlerFactory,
> and the parser will just call the toString method and store the result as
> metadata. Can you imagine some way to get the entire ContentHandler object
> for each subfile instead of the result of the toString method? Of course, a
> flag to disable the TIKA_CONTENT metadata production would also be needed.
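
For illustration only, the pair-based storage proposed in the comment above could look roughly like the sketch below. It is not Tika code: {{PairStoreSketch}}, {{MetadataAndHandler}} and {{add()}} are hypothetical names, a plain {{Map<String, String>}} stands in for Metadata, any object with a meaningful {{toString()}} stands in for the ContentHandler, and the value of the TIKA_CONTENT key is a placeholder. Whether {{getMetadata()}} should copy each Metadata object or add the content in place is also an assumption; the sketch copies, so the stored pairs stay free of TIKA_CONTENT.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposal; none of these types are real Tika classes.
final class PairStoreSketch {

    // Placeholder key for illustration; the real constant lives in Tika.
    static final String TIKA_CONTENT = "X-TIKA:content";

    // One <Metadata, Handler> pair per parsed (sub)document.
    static final class MetadataAndHandler {
        final Map<String, String> metadata; // stand-in for Tika's Metadata
        final Object handler;               // stand-in for a ContentHandler

        MetadataAndHandler(Map<String, String> metadata, Object handler) {
            this.metadata = metadata;
            this.handler = handler;
        }
    }

    private final List<MetadataAndHandler> pairs = new ArrayList<>();

    // Called once per document during the recursive parse (hypothetical hook).
    void add(Map<String, String> metadata, Object handler) {
        pairs.add(new MetadataAndHandler(metadata, handler));
    }

    // New accessor: the raw pairs, without TIKA_CONTENT in the metadata.
    List<MetadataAndHandler> getMetadataAndHandlers() {
        return Collections.unmodifiableList(pairs);
    }

    // Existing-style accessor: built from the pairs, with the handler's
    // toString() added as TIKA_CONTENT. The content exists twice while the
    // result is alive, which is the memory concern raised in the comment.
    List<Map<String, String>> getMetadata() {
        List<Map<String, String>> result = new ArrayList<>();
        for (MetadataAndHandler pair : getMetadataAndHandlers()) {
            Map<String, String> copy = new LinkedHashMap<>(pair.metadata);
            copy.put(TIKA_CONTENT, pair.handler.toString());
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) {
        PairStoreSketch store = new PairStoreSketch();
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("resourceName", "embedded.txt");
        store.add(metadata, new StringBuilder("some extracted text"));

        System.out.println(store.getMetadataAndHandlers().get(0).metadata);
        System.out.println(store.getMetadata().get(0));
    }
}
```

A caller who wants the handler objects themselves, as requested in the issue, would use {{getMetadataAndHandlers()}} and never pay for the {{toString()}} call; existing callers of {{getMetadata()}} would see the same shape of result as before.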