URL: <http://savannah.nongnu.org/task/?11422>
Summary: Submission of Modular Suite of NLP Tools
Project: Savannah Administration
Submitted by: amaral
Submitted on: Tue 11 Oct 2011 11:09:02 AM GMT
Should Start On: Tue 11 Oct 2011 12:00:00 AM GMT
Should be Finished on: Fri 21 Oct 2011 12:00:00 AM GMT
Category: Project Approval
Priority: 5 - Normal
Status: None
Privacy: Public
Percent Complete: 0%
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Effort: 0.00

_______________________________________________________

Details:

A new project has been registered at Savannah.
This project account will remain inactive until a site admin approves or discards the registration.

= Registration Administration =

While this item will be useful to track the registration process, *approving or discarding the registration must be done using the specific Group Administration <https://savannah.nongnu.org/siteadmin/groupedit.php?group_id=10884> page*, accessible only to site administrators, effectively *logged as site administrators* (superuser):

* Group Administration <https://savannah.nongnu.org/siteadmin/groupedit.php?group_id=10884>

= Registration Details =

* Name: *Modular Suite of NLP Tools*
* System Name: *modnlp*
* Type: non-GNU software & documentation
* License: GNU General Public License v2 or later

----

==== Description: ====

modnlp aims to provide a modular architecture and tools for natural language processing written mainly in Java (with the occasional Perl and Bash scripts for admin tasks). In addition to modularity and flexibility, the project aims for full compatibility with OpenJDK/IcedTea.

The following modnlp modules are currently available:

* idx: an API and tools for (inverted) indexing, storage and retrieval of large amounts of text, with (XML-based) handling of meta-data.
* tc: an API and tools for text categorisation, including functionality for XML parsing, term set reduction (and basic keyword extraction), probabilistic classifier induction, two sample classification tools, and evaluation modules.
* tec-toolsv2: consists of tec-server, a corpus indexer and server for corpus access and analysis over the web, and tec-client, a corpus analysis client. Unlike the (now obsolete) version 1 of these tools, originally developed for the TEC project and written in Perl, C (server side) and Java (client), the version on this site (v2) is written entirely in Java.

==== Other Software Required: ====

1) exist-db, GPL v2, http://exist.sourceforge.net/
2) prefuse, BSD, http://prefuse.org/
3) Berkeley DB-JE, BSD, SleepyCat License, http://www.oracle.com/us/products/database/berkeley-db/je/
4) gnu-regexp, LGPL, https://savannah.gnu.org/projects/gnu-regexp
5) JUNG, BSD, http://jung.sourceforge.net/
6) Apache Commons Collections, Apache 2.0, http://projects.apache.org/projects/commons_collections.html
7) Colt, BSD, http://acs.lbl.gov/software/colt/

==== Tarball URL: ====

http://savannah.gnu.org/submissions_uploads/modnlp-0.1.0.tar.gz

_______________________________________________________

Reply to this item at:

<http://savannah.nongnu.org/task/?11422>

_______________________________________________
Message sent via/by Savannah
http://savannah.nongnu.org/