On Fri, 29 Sep 2017, Giuseppe Totaro wrote:
To sum up, I would like to quickly discuss the following aspects:
- As you all mentioned, the HTTP headers for configuring the
ContentHandler to be used are better suited for the dynamic cases.
Specifically, a ContentHandler can be given through an ad-hoc header, e.g.
-H "X-Content-Handler: StandardsExtractingContentHandler", which is parsed
and used at run time within tika-server.
- Nick, I believe that providing the ability to determine the
ContentHandler through a command-line option is a great idea. It would
also be better for users.
To keep headers / options short, I'd suggest that you test the value
given for a ".". If it has one, treat it as a full class name. If it
doesn't, prefix it with org.apache.tika.sax, so that just the short class
names can be used for Tika's built-in handlers.
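
As a rough sketch of that rule, in plain Java with no Tika dependency: the class name HandlerNameResolver and the method resolveClassName are made up for illustration, and this is not something tika-server does today. It only works out the class name; loading and instantiating the handler would be a separate step.

```java
// Hypothetical helper, not part of Tika: turns a header / option value
// into the class name that should be loaded.
final class HandlerNameResolver {

    // Package assumed for short names, as suggested above.
    static final String DEFAULT_PACKAGE = "org.apache.tika.sax";

    static String resolveClassName(String value) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("No content handler name given");
        }
        String name = value.trim();
        // A "." means the caller already supplied a full class name.
        if (name.contains(".")) {
            return name;
        }
        // Otherwise assume a built-in handler and add the package prefix.
        return DEFAULT_PACKAGE + "." + name;
    }

    public static void main(String[] args) {
        // prints org.apache.tika.sax.ToXMLContentHandler
        System.out.println(resolveClassName("ToXMLContentHandler"));
        // prints com.example.MyHandler (com.example.MyHandler is a placeholder)
        System.out.println(resolveClassName("com.example.MyHandler"));
    }
}
```

The resolved name could then be handed to Class.forName, with a check that the loaded class really is a ContentHandler before it is used.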
Nick