[ https://issues.apache.org/jira/browse/TIKA-1663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176652#comment-15176652 ]
Nick Burch commented on TIKA-1663:
----------------------------------

The other parser decorators are specified with options inside the parent parser, e.g. mime includes or excludes are decorators given as options to the main parser. In some ways, this is quite nice, as you do the main definition on the thing that'll do the work, then the decorators after.

One option, for the general case, would be to add additional decorators too, e.g. http://tika.apache.org/1.12/configuring.html#Configuring_Parsers becomes
{code}
<parser class="org.apache.tika.parser.DefaultParser">
  <mime-exclude>image/jpeg</mime-exclude>
  <mime-exclude>application/pdf</mime-exclude>
  <parser-exclude class="org.apache.tika.parser.executable.ExecutableParser"/>
  <decorator class="org.foo.bar.DecoratorWithEmojis"/>
  <decorator class="org.foo.bar.DecoratorWithHashing"/>
</parser>
{code}

For the specific case of the digester, it's a well-known thing, so we could give it custom tags. That would make things clearer, and would get round the parameter issue. One option is:
{code}
<parser class="org.apache.tika.parser.DefaultParser">
  <mime-exclude>image/jpeg</mime-exclude>
  <mime-exclude>application/pdf</mime-exclude>
  <digest>MD5,SHA256</digest>
  <parser-exclude class="org.apache.tika.parser.executable.ExecutableParser"/>
</parser>
{code}

The other, to keep it more in line with the mime includes/excludes, is:
{code}
<parser class="org.apache.tika.parser.DefaultParser">
  <mime-exclude>image/jpeg</mime-exclude>
  <mime-exclude>application/pdf</mime-exclude>
  <digest>MD5</digest>
  <digest>SHA256</digest>
  <parser-exclude class="org.apache.tika.parser.executable.ExecutableParser"/>
</parser>
{code}

What do people think?

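For comparison, the two {{<digest>}} layouts could be read by the same loader code, so the choice is mostly about readability. A rough sketch, using only the JDK DOM parser: the class {{DigestConfigSketch}} and the helper {{readDigests}} are hypothetical names for illustration, not part of Tika, and this is not how TikaConfig is actually implemented.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DigestConfigSketch {

    // Hypothetical helper (not a Tika API): collects the algorithm names from
    // every <digest> element, splitting each element's text on commas so that
    // the comma-separated and the repeated-element layouts give the same list.
    static List<String> readDigests(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("digest");
        List<String> algorithms = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            for (String name : nodes.item(i).getTextContent().split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty()) {
                    algorithms.add(trimmed);
                }
            }
        }
        return algorithms;
    }

    public static void main(String[] args) throws Exception {
        String commaSeparated =
                "<parser class=\"org.apache.tika.parser.DefaultParser\">"
              + "<digest>MD5,SHA256</digest>"
              + "</parser>";
        String repeated =
                "<parser class=\"org.apache.tika.parser.DefaultParser\">"
              + "<digest>MD5</digest>"
              + "<digest>SHA256</digest>"
              + "</parser>";

        System.out.println(readDigests(commaSeparated)); // prints [MD5, SHA256]
        System.out.println(readDigests(repeated));       // prints [MD5, SHA256]
    }
}
```

The sketch only reads the names as written. Mapping a name such as SHA256 onto whatever the digesting code expects would be a separate step.
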
> Add a DigestingParser to add MD5/SHA-X hashes as fields in Metadata
> -------------------------------------------------------------------
>
>                 Key: TIKA-1663
>                 URL: https://issues.apache.org/jira/browse/TIKA-1663
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: digesting_parser_v1.patch
>
>
> It might be useful to integrate commons' DigestUtils and allow users to
> easily add the MD5 or other supported hashes to the Metadata object.
> Anyone else find this of use?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)