The vote passes with the 17 +1 votes (8 binding), no -1 votes.
Binding (8):
Jukka Zitting
Bertrand Delacretaz
Grant Ingersoll
Michael McCandless
Matt Benson
Doug Cutting
Benson Margulies
Alan Cabrera
Non-Binding (9):
Tommaso Teofili
Thilo Götz
Marshall Schor
Nick Kew
Marcel Offermans
Mohammad Nour El-Din
Andreas Kuckartz
Otis Gospodnetic
Isabel Drost
Thanks for voting,
Jörn
On 11/19/10 10:48 AM, Jörn Kottmann wrote:
Hi,
lets vote on the acceptance of the OpenNLP Project for incubation
at the Apache Incubator.
The proposal is on the wiki
http://wiki.apache.org/incubator/OpenNLPProposal
and a copy is included below.
The discussion thread can be found here:
http://mail-archives.apache.org/mod_mbox/incubator-general/201011.mbox/%3c4ce4f1f4.3010...@gmail.com%3e
Please cast your votes:
[ ] +1 Accept OpenNLP for incubation
[ ] +0 Don't care
[ ] -1 Reject for the following reason:
The vote is open for at least 72 hours.
Thanks!
Jörn
= OpenNLP Proposal =
The following is a proposal for a new top-level project within the ASF.
== Abstract ==
OpenNLP is a Java machine learning toolkit for natural language
processing (NLP).
== Proposal ==
OpenNLP is a machine learning based toolkit for the processing of
natural language text. It supports the most common NLP tasks, such as
tokenization, sentence segmentation, part-of-speech tagging, named
entity extraction, chunking, parsing, and coreference resolution.
These tasks are usually required to build more advanced text
processing services.
The goal of the OpenNLP project will be to create a mature toolkit for
the abovementioned tasks. An additional goal is to provide a large
number of pre-built models for a variety of languages, as well as the
annotated text resources that those models are derived from.
== Background ==
OpenNLP was started in 2000 by Jason Baldridge and Gann Bierner while
they were graduate students in the Division of Informatics at the
University of Edinburgh. OpenNLP, broadly speaking, was meant to be a
high-level organizational unit for various open source software
packages for natural language processing; more practically, it
provided a high-level package name for various Java packages of the
form opennlp.*. The first OpenNLP software package was the Grok
natural language parsing toolkit, which was also the genesis of what
is now called the OpenNLP Toolkit. The software released on the
OpenNLP sourceforge site (started in 2000, along with Grok) was simply
a set of interfaces defined in the package opennlp.common and referred
to as the OpenNLP Java API. The actual implementations of natural
language processing components were provided in Grok, along with code
for sentence parsing with Combinatory Categorial Grammar. This code
was used heavily in both Baldridge's and Biern
er's dissertations. The first paper that used Grok, and especially the
components that would become the OpenNLP Toolkit is
[[http://comp.ling.utexas.edu/jbaldrid/papers/hockenmaier_etal_ESSLLI2000.pdf|Hockenmaier,
Bierner and Baldridge (2000)]] (later updated as the journal article
[[http://comp.ling.utexas.edu/jbaldrid/papers/HockenmaierEtal2004.pdf|Hockenmaier,
Bierner, and Baldridge (2004)]]).
In 2003, it was decided to remove the NLP infrastructure from Grok as
there was a clear separation between the basic text processing
components and the syntactic and semantic analysis components. At the
same time, Grok was rebranded as OpenCCG (openccg.sf.net). The final
release of the OpenNLP Java API was made in March 2003; the new
OpenNLP Toolkit was created from the API and the Grok text processing
components, with version 1.0 being released in April 2004. The OpenNLP
Toolkit and OpenCCG have evolved independently since then and have
mostly independent and active developer and user communities. OpenCCG
is primarily used in the academic community, while OpenNLP has
considerable use in both academia and industry. As in indication of
the academic impact of OpenNLP, a search on Google scholar (done in
March 2010) returned about 650 publications citing the package. Some
of these include the OpenNLP website and a few non-publications plus
some self-citations. Based on a scan of
these results, we estimate that about 500 actual publications have
used OpenNLP in their work, and there are an addition 50 or so
quasi-publications like surveys and instruction manuals.
The activity level of the OpenNLP project has fluctuated over that
past 10+ years, with a large uptick in the last two years especially.
Most recently, due both to the availability of new documentation and
the release of version 1.5 , there have been many more downloads and
page views for the OpenNLP project. In fact, September 2010 had the
most downloads (1,561) and project web hits (226,391) of any month
since the project's beginning in 2000, and October is keeping pacing
with that figure so far. As a result, OpenNLP has gone from being in
the 2000th to 4000th ranked project (between January and May, 2010) to
being ranked 570, 314, 181 and 439 for July, August, September, and
October respectively. Full details are available on the Sourceforge
statistics page for OpenNLP. (There are 240,000 projects hosted on
SourceForge, though this figure includes many, many projects that
never actually get started: it seems that about 7-10% of these are
stable, active projects base
d on a review done in 2007.)
== Rationale ==
OpenNLP fills a significant gap at the ASF in regards to human
language processing tools. While Lucene/Solr, UIMA and Mahout all
have some tools in this area, none of them are solely focused on tools
specifically for working with natural language like OpenNLP.
== Initial Goals ==
The initial goals of the proposed project are:
* Bring the community together at the ASF and make the development
process transparent for them
* Write user documentation about all major components
* Automated build including train and evaluate regression tests
* Produce an Incubating release
== Current Status ==
=== Meritocracy ===
Some of the initial committers are familiar with Apache's idea of
meritocracy, others aren't. We will get everybody on the same level
as part of the incubation process.
=== Community ===
OpenNLP already has a considerable user base, both in industry and
academia.
=== Core Developers ===
See the initial committer list.
=== Alignment ===
OpenNLP has tie-ins with several existing Apache projects. We have
been distributing wrappers for UIMA for some time now (two UIMA
committers also contribute to OpenNLP). We expect this collaboration
to strengthen further after our move to Apache.
Another obvious connection exists to some of the projects under the
Lucene umbrella. On the one hand, projects like Solr may benefit from
the OpenNLP analysis capabilities to create specialized search for
particular domains. On the other, OpenNLP may benefit from the
machine learning code that is being developed in Mahout, and maybe get
some people from that community to lend a hand.
== Known Risks ==
=== Orphaned products ===
The project has been around for quite a number of years already, it
has a well-established user community and a diverse set of committers.
=== Inexperience with Open Source ===
OpenNLP has been an open source project for quite some time. Many of
the developers are already familiar with both open source in general
and the ASF in particular.
=== Homogenous Developers ===
The current group of developers is very diverse, no two developers
work for the same organization.
=== Reliance on Salaried Developers ===
Most of the developers are not paid to work on OpenNLP, so there is
little reliance on salaried developers.
=== Relationships with Other Apache Products ===
NLP is often used in search and other algorithms that work with
unstructured data, thus OpenNLP is likely to be useful to the Lucene
and Solr communities. It also aligns nicely with both Mahout and UIMA.
=== A Excessive Fascination with the Apache Brand ===
We think the project aligns nicely with the goals of the ASF to
disseminate source code to the public free of charge. NLP has long
been the subject of cutting edge research, but is often lacking in
community and shared knowledge. We believe that by bringing OpenNLP
to the ASF, the Apache brand will help deliver NLP capabilities to a
much larger audience and likewise a cutting edge project like OpenNLP
can further the ASF brand by providing users with tried and true, as
well as new, natural language processing capabilities.
== Documentation ==
*http://opennlp.sourceforge.net/README.html
*http://sourceforge.net/apps/mediawiki/opennlp/index.php?title=Main_Page
== Initial Source ==
The source code is maintained in two CVS repositories on SourceForge.
OpenNLP Maxent:http://maxent.cvs.sourceforge.net/viewvc/maxent/
OpenNLP Tools and OpenNLP
UIMA:http://opennlp.cvs.sourceforge.net/viewvc/opennlp/
== Source and Intellectual Property Submission Plan ==
The OpenNLP source code is already open source under the AL 2.0.
== External Dependencies ==
||'''Library''' ||||<style="text-align: center;">'''License'''
||||<style="text-align: center;">'''Description''' ||
||JWNL ||||<style="text-align: center;">BSD ||||<style="text-align:
center;">Java Wordnet Library ||
||JUnit ||||<style="text-align: center;">CPL ||||<style="text-align:
center;">Unit Testing Framework ||
||UIMA ||||<style="text-align: center;">AL 2.0 ||||<style="text-align:
center;">Unstructured Information Management Architecture ||
== Cryptography ==
OpenNLP neither provides nor uses any cryptography.
== Required Resources ==
=== Mailing lists ===
* opennlp-dev
* opennlp-private
* opennlp-user
* opennlp-commits
=== Subversion Directory ===
https://svn.apache.org/repos/asf/incubator/opennlp
=== Issue Tracking ===
Jira: OPENNLP
=== Other Resources ===
== Initial Committers ==
||'''Name''' ||||<style="text-align: center;">'''Email'''
||||<style="text-align: center;">'''CLA''' ||
||Thilo Goetz ||||<style="text-align: center;"> twgo...@apache.org
||||<style="text-align: center;">yes ||
||Grant Ingersoll ||||<style="text-align: center;">
gsing...@apache.org ||||<style="text-align: center;">yes ||
||Jörn Kottmann ||||<style="text-align: center;"> jo...@apache.org
||||<style="text-align: center;">yes ||
||Thomas Morton ||||<style="text-align: center;"> tsmor...@gmail.com
||||<style="text-align: center;">no ||
||William Silva ||||<style="text-align: center;">
william.co...@gmail.com ||||<style="text-align: center;">yes ||
||Jason Baldridge ||||<style="text-align: center;">
jasonbaldri...@gmail.com ||||<style="text-align: center;">yes ||
||James Kosin ||||<style="text-align: center;"> james.ko...@gmail.com
||||<style="text-align: center;">yes ||
== Affiliations ==
||'''Name''' ||||<style="text-align: center;">'''Affiliation''' ||
||Thilo Goetz ||||<style="text-align: center;">IBM ||
||Grant Ingersoll ||||<style="text-align: center;">Lucid Imagination ||
||Jörn Kottmann ||||<style="text-align: center;">Infopaq International
A/S ||
||Thomas Morton ||||<style="text-align: center;">Comcast Corporation ||
||William Silva ||||<style="text-align: center;">São Paulo University ||
||Jason Baldridge ||||<style="text-align: center;">The University of
Texas at Austin ||
||James Kosin ||||<style="text-align: center;">International
Communications Group, Inc. ||
== Sponsors ==
=== Champion ===
Grant Ingersoll
=== Nominated Mentors ===
Isabel Drost
Grant Ingersoll
Benson Margulies
=== Sponsoring Entity ===
The Apache Incubator
---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org