Jérôme Charron wrote:
> We think it would be a good idea to split Nutch into a new sub-project based
> on content analysis and manipulation. The components we have identified are:
> 1. MimeType Repository
> 2. Language Identifier
> 3. Content Signature (MD5Signature / TextProfileSignature / ...)
> (4. Generic Meta Data Infrastructure)
> (5. Charset Detector)
> (6. Parse Plugins Framework)
> The idea is to expose these pieces of code as a standalone lib, since we
> are convinced they could be useful in many other projects than Nutch.
This sounds like it could arguably be six new projects. Perhaps another
way to approach this is as a build process. Perhaps Nutch, like Lucene
Java, should start providing more than a single jar file. Perhaps a
release (both nightly and numbered) should consist of both a composite
Nutch tar file and also a suite of sub tar files?
That said, if you're convinced that these components form a coherent,
independently useful subset of Nutch, and that you have a sustainable
set of committers who will maintain and regularly release this, then
please submit a proposal to the Lucene PMC (pmc at lucene.a.o). The PMC
can discuss it and, eventually, vote to decide.
Doug
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers