We once had a similar discussion about the tools package and maxent.
At SourceForge we used to have two separate projects, which were also
released independently and had two distributions.
At Apache we changed that because of the overhead of making a release.
The arguments back then were similar. People who just want to use maxent
now need to download the whole OpenNLP distribution and get all the stuff
they don't want. I have never heard from anyone who was annoyed by this,
and I don't think it does any harm to a maxent-only user.
But it saves us a good amount of time, because we need to make one
release fewer.
In my opinion it is more about which project structure we prefer.
Having the CLI classes on the classpath or not doesn't make a difference
for a library-only user. A tool that searches the classpath
might be slightly faster without the CLI package, but that is more of a
special case and the improvement is likely not even noticeable.
Which project structure should OpenNLP have?
And what do we want to ship to our users?
Currently most of our code lives in opennlp-tools and is separated by
Java packages, which I believe works really well, and there is no need to
cut this down further.
Maybe we should even do the opposite and also move the maxent code in there?
I am +1 on that, actually.
On the other hand, you could argue for cutting things down, and then you
might end up with a couple of different sub-projects. Another prime
candidate for moving is the coreference package, because it introduces an
extra dependency which no other component needs.
If you take this further you could end up with a sub-project per
component.
In my experience, a project which is cut into sub-projects for no
strong reason is often more complex and more difficult than it needs to be.
Jörn
On 11/23/11 12:09 PM, Aliaksandr Autayeu wrote:
While working on the CLI tools patch
OPENNLP-402 <https://issues.apache.org/jira/browse/OPENNLP-402>, the
following proposal came to mind:
I would like to propose moving the CLI tools out of the tools package into
a separate module and jar. Motivation: as was mentioned earlier, OpenNLP is a
library, therefore it would be nice to keep extra stuff in separate jars,
because many people who use the library might never need the CLI tools. But
since this is a bigger refactoring than the patch I've sent, I'd like to
discuss it first.
I see the following advantages:
* Proper modularization. Everybody gets what's needed and no extra stuff.
This leads to
* Smaller jars, fewer classes on the classpath, which comes in handy when
you use classpath search tools, like those in Spring.
As for the dependency side of this (one more dependency), I see the
following:
* The new CLI jar will not be added to the classpath anyway (it's used only
from the "opennlp" command-line scripts), so there is no extra dependency to
manage for those using the library, and those using the CLI have it handled
already by the command-line scripts. The dependencies become: CLI ->
OpenNLP Tools -> MaxEnt.
As an extra option to answer the "jar multiplication problem" for those
outside of Maven: if I'm not mistaken, it is possible to roll as many jars
as one wants into one during the build. This turns the question into one of
build options rather than project structure.
Aliaksandr