Tim Allison created TIKA-2972:
---------------------------------

             Summary: Allow users to specify a ContentHandlerFactory in tika-config.xml
                 Key: TIKA-2972
                 URL: https://issues.apache.org/jira/browse/TIKA-2972
             Project: Tika
          Issue Type: Improvement
            Reporter: Tim Allison

I'd like to add a tika-eval handler that calculates text stats at the end of parsing a document, so that the user gets a unified, simpler view of the number of tokens, out-of-vocabulary tokens, etc. in the metadata rather than having to run their own post-parse process on the content.

The problem comes with integrating this into tika-app and tika-server: tika-app balloons to 134MB. I don't want to nearly double the size of tika-app just to add something that very few folks will use.

I think we've discussed this option before, but it would be handy to allow users to specify a ContentHandlerFactory, or possibly a map of ContentHandlerFactories, in tika-config.xml so that they can get custom handling in tika-app and tika-server.

The idea of a map of ContentHandlerFactories would be to have a name for each content handler factory, so that a user could call different handlers on tika-server like this:

`curl... http://localhost:9998/tika/custom/myhandler1`
`curl... http://localhost:9998/tika/custom/myhandler2`

or in tika-app:

`java -jar tika-app.jar --handlerFactory=myhandler1...`

WDYT?
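
To make the named-map idea concrete, here is a minimal sketch in plain Java. It is only an illustration under stated assumptions, not a proposed implementation: `HandlerFactoryRegistry`, `CharCountingHandler`, and the one-method `ContentHandlerFactory` interface below are hypothetical stand-ins so that the example compiles without Tika on the classpath, and nothing here reads tika-config.xml. The names `myhandler1` and `myhandler2` are taken from the examples above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.DefaultHandler;

public class HandlerFactoryRegistry {

    /** Hypothetical stand-in for a handler factory, reduced to a single method. */
    public interface ContentHandlerFactory {
        ContentHandler getNewContentHandler();
    }

    /** Toy handler that only counts the characters it receives. */
    public static class CharCountingHandler extends DefaultHandler {
        private long count;

        @Override
        public void characters(char[] ch, int start, int length) {
            count += length;
        }

        public long getCount() {
            return count;
        }
    }

    // Name -> factory, as it might be populated from a config file (assumption).
    private final Map<String, ContentHandlerFactory> factories = new LinkedHashMap<>();

    public void register(String name, ContentHandlerFactory factory) {
        factories.put(name, factory);
    }

    /** Returns a fresh handler for the given name, or fails loudly if the name is unknown. */
    public ContentHandler newHandler(String name) {
        ContentHandlerFactory factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No handler factory registered as: " + name);
        }
        return factory.getNewContentHandler();
    }

    public static void main(String[] args) throws Exception {
        HandlerFactoryRegistry registry = new HandlerFactoryRegistry();
        registry.register("myhandler1", CharCountingHandler::new);
        registry.register("myhandler2", DefaultHandler::new);

        // A server endpoint or CLI flag would pass the requested name through here.
        ContentHandler handler = registry.newHandler("myhandler1");
        char[] text = "hello tika".toCharArray();
        handler.characters(text, 0, text.length);
        System.out.println(((CharCountingHandler) handler).getCount());
    }
}
```

Handing out a new handler per lookup keeps per-document state (such as the character count here) from leaking between requests, which would matter if the same name is requested concurrently on a server.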