[ 
https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14695256#comment-14695256
 ] 

Nick Burch commented on TIKA-1706:
----------------------------------

The latest Commons IO jar is 180 KB, while the inlined classes in Tika come to 
about 20 KB when packaged in a jar, so this would increase the minimum Tika 
install size. 

Back when most of these classes were inlined, in Tika 0.4, the Tika Core jar 
was only 129 KB. These days it comes in at just over 560 KB, so the size of the 
Commons IO jar is relatively less of an issue. However, we do currently manage 
without any required dependencies, and this would change that.

While most people do use Tika Core with Tika Parsers, not everyone does, so it 
will have an impact on those users.

We'd also need to check if there are any enhancements or fixes that have been 
made to the inlined classes, and if so, work to get them upstream before any 
changes. Would you be able to check that?

Also, do you have any cases where having all of a newer Commons IO would 
improve/simplify/fix current Tika Core code?

> Bring back commons-io to tika-core
> ----------------------------------
>
>                 Key: TIKA-1706
>                 URL: https://issues.apache.org/jira/browse/TIKA-1706
>             Project: Tika
>          Issue Type: Improvement
>          Components: core
>            Reporter: Yaniv Kunda
>            Priority: Minor
>             Fix For: 1.11
>
>
> TIKA-249 inlined select commons-io classes in order to simplify the 
> dependency tree and save some space.
> I believe these arguments are weaker nowadays due to the following concerns:
> - Most of the non-core modules already use commons-io, and since tika-core is 
> usually not used by itself, commons-io is already included with it
> - Since some modules use both tika-core and commons-io, it's not clear which 
> code should be used
> - Having the inlined classes causes more maintenance and/or technical debt 
> (which in turn causes more maintenance)
> - Newer commons-io code utilizes newer platform code, e.g. using Charset 
> objects instead of encoding names, being able to use StringBuilder instead of 
> StringBuffer, and so on.
> I'll be happy to provide a patch to replace usages of the inlined classes 
> with commons-io classes if this is accepted.
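
A minimal sketch of the Charset and StringBuilder point in the description
above. It uses only JDK classes, not Commons IO or the inlined Tika classes,
and the names CharsetSketch, readByName and readByCharset are hypothetical,
chosen purely for illustration. It contrasts passing an encoding by name with
passing a Charset object:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSketch {

    // Older style: the encoding is a plain string, so a typo is only caught at
    // runtime (as an UnsupportedEncodingException), and the result is
    // accumulated in a synchronized StringBuffer.
    static String readByName(InputStream in, String encodingName) throws IOException {
        Reader reader = new InputStreamReader(in, encodingName);
        StringBuffer out = new StringBuffer();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            out.append(buf, 0, n);
        }
        return out.toString();
    }

    // Newer style: the caller passes a Charset object, so no lookup by name
    // happens here, and an unsynchronized StringBuilder collects the result.
    static String readByCharset(InputStream in, Charset charset) throws IOException {
        Reader reader = new InputStreamReader(in, charset);
        StringBuilder out = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            out.append(buf, 0, n);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        String a = readByName(new ByteArrayInputStream(data), "UTF-8");
        String b = readByCharset(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        System.out.println(a.equals(b)); // both calls decode the same bytes
    }
}
```

Both methods decode the same bytes to the same string. The difference is that
the Charset variant lets the caller use a constant such as
StandardCharsets.UTF_8 instead of a string that is only validated at runtime.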



