[ https://issues.apache.org/jira/browse/TIKA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729573#comment-14729573 ]
Nick Burch edited comment on TIKA-1657 at 9/3/15 6:54 PM:
----------------------------------------------------------

Let's consider this config file:

{code}
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <mime-exclude>image/jpeg</mime-exclude>
      <mime-exclude>application/pdf</mime-exclude>
      <parser-exclude class="org.apache.tika.parser.executable.ExecutableParser"/>
      <parser-exclu class="org.apache.tika.parser.executable.ExecutableParser2"/>
    </parser>
    <parser class="org.apache.tika.parser.EmptyParser">
      <mime>application/pdf</mime>
      <no-mime>hello/world</no-mime>
    </parser>
  </parsers>
</properties>
{code}

With {{--dump-active-config}} you'd get back only the parts of that file Tika actually used, letting you spot what was and wasn't picked up (here the misspelt {{<parser-exclu>}} and the unsupported {{<no-mime>}} are gone), e.g.:

{code}
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <mime-exclude>image/jpeg</mime-exclude>
      <mime-exclude>application/pdf</mime-exclude>
      <parser-exclude class="org.apache.tika.parser.executable.ExecutableParser"/>
    </parser>
    <parser class="org.apache.tika.parser.EmptyParser">
      <mime>application/pdf</mime>
    </parser>
  </parsers>
</properties>
{code}

Or, with {{--dump-static-config}} you'd get the fully expanded configuration, including the detectors and each individual parser that the DefaultParser pulls in, something like:

{code}
<properties>
  <service-loader dynamic="false" />
  <translators/>
  <detectors>
    <detector class="org.apache.tika.parser.microsoft.POIFSContainerDetector"/>
    <detector class="org.apache.tika.parser.pkg.ZipContainerDetector"/>
    <detector class="org.gagravarr.tika.OggDetector"/>
    <detector class="org.apache.tika.mime.MimeTypes"/>
  </detectors>
  <parsers>
    <parser class="org.apache.tika.parser.CompositeParser">
      <mime-exclude>image/jpeg</mime-exclude>
      <mime-exclude>application/pdf</mime-exclude>
      <parser class="org.apache.tika.parser.asm.ClassParser"/>
      <parser class="org.apache.tika.parser.audio.AudioParser"/>
      <parser class="org.apache.tika.parser.audio.MidiParser"/>
      <parser class="org.apache.tika.parser.chm.ChmParser"/>
      <parser class="org.apache.tika.parser.code.SourceCodeParser"/>
      <!-- ... everything except executable ... -->
    </parser>
    <parser class="org.apache.tika.parser.EmptyParser">
      <mime>application/pdf</mime>
    </parser>
  </parsers>
</properties>
{code}

> Allow easier XML serialization of TikaConfig
> --------------------------------------------
>
>         Key: TIKA-1657
>         URL: https://issues.apache.org/jira/browse/TIKA-1657
>     Project: Tika
>  Issue Type: Improvement
>    Reporter: Tim Allison
>    Priority: Minor
>     Fix For: 1.11
>
> Attachments: TIKA-1558-blacklist-effective.xml
>
>
> In TIKA-1418, we added an example for how to dump the config file so that
> users could easily modify it. I think we should go further and make this an
> option at the tika-core level with hooks for tika-app and tika-server. I
> propose adding a main() to TikaConfig that will print to stdout the XML
> config file that Tika is currently using.
> I'd like to put this into core so that e.g. Solr's DIH users can get by
> without having to download tika-app separately.
> There's every chance that I've not accounted for issues with dynamic loading
> etc. Also, I'd be OK with only having this available in tika-app and
> tika-server if there are good reasons.
> Feedback?
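To make the {{--dump-active-config}} idea and the proposed "print the config from a main()" concrete, here is a minimal, JDK-only Java sketch. It is not Tika's implementation: the class name {{ActiveConfigDump}}, the method {{dumpActiveConfig}} and the hard-coded set of recognised element names are all hypothetical, standing in for whatever Tika's config loader actually understands. It parses a config file, recursively drops every element outside that set (so a typo such as {{<parser-exclu>}} or an unsupported tag such as {{<no-mime>}} disappears), and prints what is left to stdout.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

/** Hypothetical sketch of an "active config" dump; not Tika's real implementation. */
class ActiveConfigDump {

    // Assumption: the element names the config loader understands, hard-coded for
    // illustration only. Nothing here is read from Tika itself.
    static final Set<String> KNOWN_ELEMENTS =
            Set.of("properties", "parsers", "parser", "mime", "mime-exclude", "parser-exclude");

    /** Recursively removes every child element whose name is not in KNOWN_ELEMENTS. */
    static void prune(Element element) {
        List<Node> unknown = new ArrayList<>();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // keep text content such as "image/jpeg"
            }
            if (KNOWN_ELEMENTS.contains(child.getNodeName())) {
                prune((Element) child);
            } else {
                unknown.add(child); // e.g. <parser-exclu> or <no-mime>
            }
        }
        // Remove only after the walk so the sibling chain is not altered mid-iteration.
        for (Node node : unknown) {
            element.removeChild(node);
        }
    }

    /** Parses a config document, prunes unknown elements and serialises what is left. */
    static String dumpActiveConfig(String configXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
            prune(doc.getDocumentElement());

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalArgumentException("Could not process config XML", e);
        }
    }

    /** Prints the pruned version of the config file named by the first argument. */
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ActiveConfigDump <tika-config.xml>");
            System.exit(1);
        }
        System.out.println(dumpActiveConfig(Files.readString(Path.of(args[0]))));
    }
}
```

After compiling, running it against the config file at the top of the comment (e.g. {{java ActiveConfigDump tika-config.xml}}) should drop {{<parser-exclu>}} and {{<no-mime>}} and keep everything else. Because the whitelist is flat, it cannot tell that an element is only valid under a particular parent; a real implementation would more likely record what the loader actually consumed than guess from element names.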