Cservenak, Tamas created TIKA-1247:
--------------------------------------

             Summary: Explode monolithic parsers module into smaller ones
                 Key: TIKA-1247
                 URL: https://issues.apache.org/jira/browse/TIKA-1247
             Project: Tika
          Issue Type: Improvement
            Reporter: Cservenak, Tamas

Right now there is a single monolithic parsers module which, when used from Maven, pulls in a very large tree of transitive dependencies. I am also not certain that every use case needs all of them: a project that only uses the HTML parser, for example, probably does not need the Microsoft-related parsers. The module should be more granular.

Proposed solution: Split the parsers module into a set of smaller modules and let the build tool figure out what the user needs. For example, if a Maven user adds the "chm" parser as a dependency, Maven will resolve the "chm" > "html" > "txt" and "tika-core" dependencies by itself, and no hunting for transitive dependencies (to include or exclude them) is needed.

There is a work-in-progress PR: https://github.com/apache/tika/pull/5

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
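
To illustrate the proposed solution, a consumer's pom.xml might look roughly like the sketch below once the parsers are split. This is only an assumption about the eventual layout: the artifactId "tika-parser-chm" and the "tika.version" property are hypothetical placeholders, not names taken from the PR. Only the "org.apache.tika" groupId is the project's existing one.

```xml
<!-- Hypothetical consumer POM fragment. The artifactId below is an assumed
     name for the proposed "chm" module, not one confirmed by the PR. -->
<dependencies>
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parser-chm</artifactId>
    <!-- tika.version is a placeholder property defined elsewhere in the POM -->
    <version>${tika.version}</version>
  </dependency>
  <!-- No explicit entries for the html or txt parsers or for tika-core,
       and no <exclusions>: Maven would resolve them transitively. -->
</dependencies>
```

With a layout like this, running "mvn dependency:tree" in the consuming project would show which modules Maven actually resolved, which makes it easy to check that nothing unrelated (for example the Microsoft-related parsers) is pulled in.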