Hi
On 19/06/14 01:58, Ray Gauss wrote:
The point of a tika-parsers-all artifact would be a single dependency that 
re-aggregates everything so that downstream projects could work the same way 
they do now and not worry about missing dependencies.

Meanwhile people that just want PDF parsing could declare only the 
tika-parser-pdf dependency.

We could go the other way, focusing on exclusions, but as we add more parsers 
for different types those downstream projects will have to constantly be 
updating those exclusion lists.

What’s the disadvantage of splitting things up (in a 2.0 timeframe)?


From what I understand the concern is the proliferation of many new micro modules.

I wonder whether tika-parsers contains anything other than the specific parser implementations and some related support modules. If that is all it contains, then it is effectively 'tika-parsers-all' already.

If that is the case, then I'd settle for documenting the individual dependencies that support specific file extensions/media types.

Cheers, Sergey



On June 18, 2014 at 11:39:00 AM, Nick Burch (apa...@gagravarr.org) wrote:
On Wed, 18 Jun 2014, Ray Gauss wrote:
I think for 2.0 we should consider splitting out parsers into their own
projects for a streamlined dependency hierarchy then reassembling them
with something like a tika-parsers-all artifact.

We had another thread on that not that long ago, where someone cautioned
against breaking it up into too many pieces. We also have fairly frequent
posts on the users list from people who aren't getting any content
returned, because they've forgotten to include a dependency on
tika-parsers.

I'm not convinced that splitting tika parsers into 20-odd dependencies is
really going to help more than it hinders - more people will get confused
by missing dependencies they really wanted, and anyone with special needs
about what does/doesn't get parsed is probably going to be taking such
care that they can just exclude everything by default anyway and just pull
in what they need. I'd probably rather we just gave an example pom snippet
that shows how to exclude all except one thing, and let people with
special cases work from there.
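
For illustration, a pom snippet along those lines might look like the sketch
below for the PDF-only case. It is not taken from the Tika documentation. It
assumes Maven 3.2.1 or later (for wildcard exclusions), that PDF parsing needs
only the org.apache.pdfbox:pdfbox artifact and its own transitive dependencies,
and that tika.version and pdfbox.version are hypothetical properties defined
elsewhere in the pom.

```xml
<!-- Sketch only: ${tika.version} and ${pdfbox.version} are placeholder
     properties, not real version numbers. -->
<dependencies>
  <!-- Declared directly, because the wildcard exclusion below also
       strips tika-core from tika-parsers' transitive dependencies. -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>${tika.version}</version>
  </dependency>

  <!-- Keep the parser classes, but drop every transitive dependency. -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>${tika.version}</version>
    <exclusions>
      <exclusion>
        <groupId>*</groupId>
        <artifactId>*</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <!-- Add back only the library needed for the one format of interest. -->
  <dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>${pdfbox.version}</version>
  </dependency>
</dependencies>
```

Other formats would be re-enabled the same way, by adding the corresponding
library back as a direct dependency.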

Nick

