On 18/06/14 16:58, Sergey Beryozkin wrote:
Hi Nick
On 18/06/14 16:38, Nick Burch wrote:
On Wed, 18 Jun 2014, Ray Gauss wrote:
I think for 2.0 we should consider splitting out parsers into their
own projects for a streamlined dependency hierarchy then reassembling
them with something like a tika-parsers-all artifact.

We had another thread on that not that long ago, where someone cautioned
against breaking it up into too many pieces. We also have fairly
frequent posts on the users list from people who aren't getting any
content returned, because they've forgotten to include a dependency on
tika-parsers.

I'm not convinced that splitting tika-parsers into 20-odd dependencies
is really going to help more than it hinders - more people will get
confused by missing dependencies they really wanted, and anyone with
special needs about what does/doesn't get parsed is probably going to be
taking such care that they can just exclude everything by default anyway
and just pull in what they need. I'd probably rather we just gave an
example pom snippet that shows how to exclude all except one thing, and
let people with special cases work from there.
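For illustration, here is a minimal sketch of the kind of pom snippet described above. It assumes Maven 3.2.1 or later, which accepts `*` wildcards in exclusions. The `${tika.version}` and `${pdfbox.version}` properties are placeholders, and PDFBox is only an example of a single parser library to add back.

```xml
<dependencies>
  <!-- tika-core is declared explicitly because the wildcard exclusion below
       also removes it from the transitive dependencies of tika-parsers -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>${tika.version}</version>
  </dependency>

  <!-- tika-parsers itself, with every transitive dependency excluded
       (wildcard exclusions require Maven 3.2.1 or later) -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>${tika.version}</version>
    <exclusions>
      <exclusion>
        <groupId>*</groupId>
        <artifactId>*</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <!-- add back only the parser library you need; PDFBox is just an example,
       and its version should match the one tika-parsers was built against -->
  <dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>${pdfbox.version}</version>
  </dependency>
</dependencies>
```

In practice a single parser can depend on more than one library. Anyone using this approach therefore needs to know the full set of libraries each parser relies on.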

Can we start by adding a section to the Tika docs documenting the core
dependencies of the tika-parsers module, to make life a bit easier
for developers who do not expect the specific parser implementations
to be downloaded immediately?
And list the parser implementation dependencies too? Then we could exclude
them all from our CXF module that depends on Tika, and point users to
the section listing the well-known Tika parser implementations so they
can choose what they need.
We need this because CXF cannot ship all of the Tika parser dependencies; CXF will only offer a lightweight Tika-aware handler.

By the way, I'll be happy to help with the documentation if you let me know the details here.

Cheers, Sergey


Thanks, Sergey

Nick



--
Sergey Beryozkin

Talend Community Coders
http://coders.talend.com/

Blog: http://sberyozkin.blogspot.com
