On 11/23/11 1:53 PM, Aliaksandr Autayeu wrote:
My proposal is different. The release should remain a single one. Let me
make the proposal more concrete in terms of steps:

1) Create an opennlp-cli module with its own pom. This will make the tools
package smaller and will improve the experience of library-only users: less
stuff to drag around.
2) Keep the same single release of OpenNLP.

Currently most of our code lives in opennlp-tools and is separated by Java
packages, which I believe works really well, and there is no need to cut this
down further.

Exactly! And CLI is already a separate package.

I am not convinced that the additional sub-project opennlp-cli is worth
the barely noticeable advantage of having fewer classes on the classpath.
The Java packages already give us good separation.

Maybe we should even do the opposite and also move the maxent code in
there?
I am +1 on that, actually.

It does make sense given the current tight integration. However, conceptually,
this will break modularization. Even MaxEnt is not a pure maxent anymore -
there is perceptron inside as well. Nicolas Hernandez mentioned in a recent
thread "for may be considering alternatives to the MaxEnt algorithm".
Rolling everything into one bundle will make these possible plans more
difficult. If these plans advance, this might lead to some abstraction
into interfaces and (several) implementations, which might become
optional dependencies. So I would keep the current level of modularization
with respect to maxent.

We are planning a refactoring which will rename it to ml, but that is a different
story.
Adding a new algorithm like we did with perceptron works well in this setup.
What is not possible is the addition of a new algorithm in user code.
There are various things which need to be solved for this, e.g. how to pass
down training options from the tools package? How to load the new model with
our zip package? How to load the classes which implement the algorithm?
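
To make the last question concrete: one conceivable way to load a
user-supplied algorithm is by class name via reflection. The sketch below
is only an illustration under that assumption. Trainer, DummyTrainer and
TrainerLoader are hypothetical names, not existing OpenNLP classes, the
"Iterations" key is illustrative, and the questions about passing options
from the tools package and packaging the model remain open.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical contract a user-supplied training algorithm would implement.
interface Trainer {
    String train(Map<String, String> params);
}

// Stand-in for an algorithm that lives in user code, outside the library.
class DummyTrainer implements Trainer {
    @Override
    public String train(Map<String, String> params) {
        return "dummy-model(iterations=" + params.getOrDefault("Iterations", "100") + ")";
    }
}

// Hypothetical loader: resolves the implementation from its class name.
final class TrainerLoader {
    private TrainerLoader() {}

    static Trainer load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            // asSubclass fails fast if the class does not implement Trainer
            return cls.asSubclass(Trainer.class).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load trainer: " + className, e);
        }
    }
}

class TrainerLoaderDemo {
    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("Iterations", "50");
        // In practice the class name would come from the training parameters
        Trainer trainer = TrainerLoader.load(DummyTrainer.class.getName());
        System.out.println(trainer.train(params)); // dummy-model(iterations=50)
    }
}
```

This only works if the user's class is visible to the class loader that
loads the library, which is part of the problem described above.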

Anyway, those are issues which are orthogonal to our project structure decision.


On the other side, you could argue for cutting things down; then you might end
up with a couple of different sub-projects. Another prime candidate for moving
is the coreference package, because it introduces an extra dependency which no
other component needs.

Nice point. It illustrates a similar situation. This is actually a good
argument in favor of per-component modules, but for now that would
complicate things too much. So, here I would refactor it to make the
dependency on JWNL optional, continuing along the lines of the existing
Dictionary interface and providing means to register your own implementation.
This will get rid of the dependency. After all, there are alternatives.
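
A minimal sketch of what "register your own implementation" could look
like. The names WordDictionary and DictionaryRegistry are hypothetical and
are not taken from OpenNLP's code; the point is only that the core depends
on an interface, while a WordNet-backed implementation could live in an
optional module.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface standing in for a lexical dictionary lookup.
interface WordDictionary {
    List<String> getLemmas(String word, String posTag);
}

// Hypothetical registry: callers look up an implementation by name,
// so no concrete dictionary library is required at compile time.
final class DictionaryRegistry {
    private static final Map<String, WordDictionary> REGISTRY = new ConcurrentHashMap<>();

    private DictionaryRegistry() {}

    static void register(String name, WordDictionary dictionary) {
        REGISTRY.put(name, dictionary);
    }

    static WordDictionary get(String name) {
        WordDictionary dictionary = REGISTRY.get(name);
        if (dictionary == null) {
            throw new IllegalStateException("No dictionary registered under '" + name + "'");
        }
        return dictionary;
    }
}

class DictionaryRegistryDemo {
    public static void main(String[] args) {
        // A trivial implementation that returns the lowercased word as its only lemma
        DictionaryRegistry.register("simple",
                (word, posTag) -> Collections.singletonList(word.toLowerCase()));
        System.out.println(DictionaryRegistry.get("simple").getLemmas("Dogs", "NNS")); // [dogs]
    }
}
```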


If you think this through further, you could end up with a sub-project per
component.

Although there are scenarios where this makes sense, I would avoid this for
the moment.


Maybe I am mistaken, but I really cannot see the advantage of maintaining the
cli package or other code in separate projects. Fewer classes on the classpath
(we are talking about a few tens of classes) is not a good enough reason to do
this in my opinion. Java loads classes lazily: only classes which are actually
needed are loaded into memory. The only advantage which might be there is that
classpath scanning is faster, but I doubt that this will be noticeable or
affect many users.
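
The lazy behaviour can be observed with a small experiment. The sketch
below uses made-up class names (CliOnlyTool, LibraryTool, InitLog) and
shows class initialization, which is what can be observed from within
code; actual loading can be watched with the -verbose:class JVM option.

```java
import java.util.ArrayList;
import java.util.List;

// Records which classes have run their static initializers.
final class InitLog {
    static final List<String> EVENTS = new ArrayList<>();

    private InitLog() {}
}

// Stands in for a class that only command-line users ever touch.
class CliOnlyTool {
    static { InitLog.EVENTS.add("CliOnlyTool initialized"); }

    static String run() { return "cli ran"; }
}

// Stands in for a class that library users call.
class LibraryTool {
    static { InitLog.EVENTS.add("LibraryTool initialized"); }

    static String run() { return "library ran"; }
}

class LazyLoadingDemo {
    public static void main(String[] args) {
        LibraryTool.run();
        // CliOnlyTool is on the classpath but has not been used yet
        System.out.println(InitLog.EVENTS); // [LibraryTool initialized]

        CliOnlyTool.run();
        System.out.println(InitLog.EVENTS); // [LibraryTool initialized, CliOnlyTool initialized]
    }
}
```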

Jörn
