On Mon, 15 Jun 2015, Mattmann, Chris A (3980) wrote:
> Hey Nick, I guess my point is that parser context (aka config properties
> for parsers) and custom config files, e.g. x.properties loaded from the
> classpath, aren't configured from Tika app or server
Ah, good point. In my ideal world, you'd set the "all documents of this
kind" settings (eg paths) in the config, then set the "this document
only" settings (eg pdf column count, pdf inline image settings) via a
command line option to the app / request header to the server, converted
into ParseContext options[1]. That would then be largely the same as for
the pure-Java users.
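
To make the layering concrete, here is a rough sketch in plain Java of
"config defaults first, per-document overrides on top". Nothing in it is
an existing Tika API: the class name, the "X-Tika-Option-" header prefix
and the setting names are all made up for illustration, and the real
thing would populate a ParseContext rather than a Map.

```java
import java.util.HashMap;
import java.util.Map;

public class SettingsLayering {
    // Hypothetical header prefix, not something the server defines today.
    static final String HEADER_PREFIX = "X-Tika-Option-";

    /**
     * Start from the "all documents of this kind" settings read from the
     * config, then overlay any "this document only" settings that arrived
     * as prefixed request headers. Other headers are ignored.
     */
    static Map<String, String> effectiveSettings(Map<String, String> configDefaults,
                                                 Map<String, String> requestHeaders) {
        Map<String, String> merged = new HashMap<>(configDefaults);
        for (Map.Entry<String, String> header : requestHeaders.entrySet()) {
            String name = header.getKey();
            // HTTP header names are case-insensitive, so match the prefix that way.
            if (name.regionMatches(true, 0, HEADER_PREFIX, 0, HEADER_PREFIX.length())) {
                merged.put(name.substring(HEADER_PREFIX.length()), header.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("externalToolPath", "/opt/tool");      // hypothetical setting name
        config.put("extractInlineImages", "false");       // hypothetical setting name

        Map<String, String> headers = new HashMap<>();
        headers.put("Content-Type", "application/pdf");
        headers.put("X-Tika-Option-extractInlineImages", "true");

        System.out.println(effectiveSettings(config, headers));
    }
}
```

The same merge would work for command line options to the app, which is
what keeps app, server and pure-Java users on roughly the same model.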
Hopefully there aren't too many settings which are debatable as to what
they are!
Not sure how huge a tika config file this would all lead to...
I could see some value in properties files, for things that don't change
between machines but do need configuration, eg the mappings for external
parsers. Since it isn't obvious if you've missed one, I'm not sure we want
to use them heavily for customisations for paths etc.
Also, since you mention having been caught out by missing jars or missing
service files, maybe we need to put something on the wiki about how to
check if you have what you expected? (IIRC we log if a parser can't be
found or can't be loaded, so mostly it's about how to enable that)
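
As a starting point for such a wiki page, a rough sketch of a check that
needs nothing but the JDK: list every
META-INF/services/org.apache.tika.parser.Parser file visible on the
classpath and try to load each class named in it. The class and method
names (ServiceFileCheck, report, status) are invented for this example,
and it only tests that the class loads, not that the parser would work.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

public class ServiceFileCheck {

    /** One line per class listed in any service file for the given interface. */
    static List<String> report(String serviceInterface, ClassLoader loader) {
        List<String> lines = new ArrayList<>();
        try {
            Enumeration<URL> files = loader.getResources("META-INF/services/" + serviceInterface);
            while (files.hasMoreElements()) {
                URL file = files.nextElement();
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(file.openStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        // Service files allow '#' comments and blank lines.
                        int hash = line.indexOf('#');
                        String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                        if (name.isEmpty()) {
                            continue;
                        }
                        lines.add(status(name, loader) + " " + name + " (from " + file + ")");
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** "OK" if the class can be loaded, "MISSING" if it or a dependency cannot. */
    static String status(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader); // load without initialising
            return "OK";
        } catch (ClassNotFoundException | LinkageError e) {
            return "MISSING";
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ServiceFileCheck.class.getClassLoader();
        List<String> lines = report("org.apache.tika.parser.Parser", loader);
        if (lines.isEmpty()) {
            System.out.println("No parser service files found on the classpath");
        }
        lines.forEach(System.out::println);
    }
}
```

An empty report points at missing service files; MISSING entries point at
a service file that is present while the jar it refers to, or one of that
jar's dependencies, is not.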
Nick
[1] Do we have tickets for adding these in yet?