Re: Can some of tika-parsers module dependencies be made optional ?

Nick Burch Thu, 19 Jun 2014 12:23:05 -0700

On Thu, 19 Jun 2014, Ray Gauss wrote:

The point of a tika-parsers-all artifact would be a single dependency that re-aggregates everything so that downstream projects could work the same way they do now and not worry about missing dependencies.
What’s the disadvantage for splitting things up (in a 2.0 timeframe)?

We already have users confused by the current split between tika-core and tika-parsers - see users list for example. We already have users confused by what dependencies they need with the current poms setup. Splitting is going to make that a lot worse. (POI, as a related example, sees plenty of confused users who've got mis-matched jars and problems.)

We have previously tried pushing parsers out of the tika parser jar and into other jars, eg ones maintained by external groups, but on the whole it hasn't been a great success. Keeping them in sync, dealing with different cycles, applying updates, keeping them consistent, building in a sensible length of time, all of that would be harder with a pile of modules.

If we were to split out to the level needed by some of the use cases mentioned, we'd have so many parser modules it'd be a nightmare to maintain, and would cause the problems mentioned above. (People in other threads have cautioned on these problems). If we split into just a handful of sub modules, then many of the use cases mentioned still have to do work to pick out the bits they need.

I still believe that the main use case of tika is "everything included", and especially that's the beginner's use case, so I think we should focus on keeping that easy. Peeling out just some bits feels like an advanced use case to me, so I'd rather we put the requirement for effort onto those folks, rather than onto newbies and people with the typical uses. I'd therefore much rather we provide advanced docs/help on excluding some bits, rather than pull it out into a pile of different modules.
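
As a rough sketch of what "excluding some bits" could look like in a downstream pom, using Maven's standard exclusions mechanism: the excluded artifact (edu.ucar:netcdf) and the tika.version property are illustrative assumptions, not something named in this thread, and the real candidates would need checking with mvn dependency:tree.

```xml
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers</artifactId>
  <!-- assumed property, defined elsewhere in the downstream pom -->
  <version>${tika.version}</version>
  <exclusions>
    <!-- illustrative only: drop one transitive parser library that
         this particular downstream project does not need -->
    <exclusion>
      <groupId>edu.ucar</groupId>
      <artifactId>netcdf</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Formats that rely on an excluded library would presumably no longer parse, so this only suits projects that know which formats they can do without.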


Nick
