[ 
https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717349#comment-14717349
 ] 

Yaniv Kunda commented on TIKA-1706:
-----------------------------------

The fact that o.a.tika.io contains public classes is a problem I didn't think 
about - these files are strictly meant as internal utility/support classes and 
shouldn't really be used by users.
In fact, I would say that although these are public classes, they should not 
be considered part of the public API of tika-core.
And since we don't know which commons-io-cloned classes users rely on 
(probably by accident), simply removing them is indeed a problem.

I also think that the "no-dependencies" principle is more romantic than it is 
useful: these days much of the Java ecosystem is built on external libraries, 
and footprint only matters where space is critical, such as in mobile 
applications (and even these are getting bigger and bigger).
As the vast majority of tika-core usages come transitively from tika-parsers, 
I don't think space is critical for most users.
I haven't crawled the Maven repository deeply enough to find how many 
tika-core-only usages have few or no other dependencies, but I suspect that 
number is not very high.
So the absolute worst case here - and remember that this is the extreme case of 
a library that uses tika-core and no other library - is a 30% footprint 
increase!

o.a.tika.io is a mess - it contains:
- classes from commons-io-1.4
- partial classes from commons-io-1.4
- modified classes from commons-io-1.4
- classes from commons-io-2.0 (or a later, unknown version)
- original Tika classes

It's really hard to go over all the changes - and I've shown just a few 
examples - but simply making the switch is easier, not so costly even in the 
worst case, and would let us pick up commons-io improvements (today and in 
future releases) faster than maintaining copied code would.

My suggestion is:
- bring commons-io back to tika-core
- change all usages of the copied classes to commons-io
- deprecate (do not delete) the copied classes, probably until tika-2.0
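
To make the third point concrete, here is a minimal sketch of what 
"deprecate, do not delete" could look like. It uses only JDK classes so it 
runs on its own: LibraryIo is a hypothetical stand-in for the commons-io 
class, and CopiedIo is a hypothetical stand-in for one of the copied classes 
in o.a.tika.io. Neither name exists in Tika or commons-io.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DeprecationSketch {

    /** Hypothetical stand-in for the external library class (commons-io in the proposal). */
    public static final class LibraryIo {
        public static String toString(InputStream in, Charset charset) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            // Copy the stream fully into memory, then decode once.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), charset);
        }
    }

    /**
     * Hypothetical stand-in for a copied class. It stays in place so existing
     * callers keep compiling, but it is marked deprecated and only delegates.
     */
    @Deprecated
    public static final class CopiedIo {
        @Deprecated
        public static String toString(InputStream in, String encoding) throws IOException {
            // Old signature takes an encoding name; forward to the Charset-based method.
            return LibraryIo.toString(in, Charset.forName(encoding));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "tika".getBytes(StandardCharsets.UTF_8);
        // New code calls the library class directly.
        System.out.println(LibraryIo.toString(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
        // Old callers still work, with a deprecation warning at compile time.
        System.out.println(CopiedIo.toString(new ByteArrayInputStream(data), "UTF-8"));
    }
}
```

Whether a copied class can delegate this cleanly depends on how far it has 
drifted from the commons-io original, so the modified classes would need a 
case-by-case look.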




> Bring back commons-io to tika-core
> ----------------------------------
>
>                 Key: TIKA-1706
>                 URL: https://issues.apache.org/jira/browse/TIKA-1706
>             Project: Tika
>          Issue Type: Improvement
>          Components: core
>            Reporter: Yaniv Kunda
>            Priority: Minor
>             Fix For: 1.11
>
>
> TIKA-249 inlined select commons-io classes in order to simplify the 
> dependency tree and save some space.
> I believe these arguments are weaker nowadays due to the following concerns:
> - Most of the non-core modules already use commons-io, and since tika-core is 
> usually not used by itself, commons-io is already included with it
> - Since some modules use both tika-core and commons-io, it's not clear which 
> code should be used
> - Having the inlined classes causes more maintenance and/or technical debt 
> (which in turn causes more maintenance)
> - Newer commons-io code utilizes newer platform code, e.g. using Charset 
> objects instead of encoding names, being able to use StringBuilder instead of 
> StringBuffer, and so on.
> I'll be happy to provide a patch to replace usages of the inlined classes 
> with commons-io classes if this is accepted.
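
As an illustration of the "newer platform code" point in the description 
above, here is a small sketch using only JDK classes (not the commons-io or 
Tika APIs). The class and method names are made up for the example. It 
contrasts decoding by encoding name, which forces callers to handle a checked 
exception, with decoding by Charset object, and uses StringBuilder where 
older code would have used StringBuffer.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSketch {

    // Older style: the encoding is a String, so a typo is only caught at
    // runtime and every caller must deal with UnsupportedEncodingException.
    public static String decodeByName(byte[] bytes, String encodingName)
            throws UnsupportedEncodingException {
        return new String(bytes, encodingName);
    }

    // Newer style: the Charset object is already resolved, so no checked
    // exception is needed here.
    public static String decodeByCharset(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // StringBuilder is the unsynchronized counterpart of StringBuffer and is
    // sufficient when the buffer never leaves the current thread.
    public static String join(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] bytes = "tika".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeByName(bytes, "UTF-8"));
        System.out.println(decodeByCharset(bytes, StandardCharsets.UTF_8));
        System.out.println(join(new String[] {"commons", "-", "io"}));
    }
}
```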


