Hi Otis,

> This thread seems to have gotten very little attention.
> Jérôme - I'm all for extracting sub-libraries that can really live on their
> own and are substantial enough to warrant "their own identity".
>
> Personally, I'm the most interested in the Language Identifier plugin becoming
> a standalone, Nutch-independent piece. Doug had suggested we move it to
> Lucene's contrib section. If you think it makes sense to have some of
> these things lumped together, that's fine, too. It looks like Language
> Identifier and Charset Detector may go well together.
>
> Is this something you want/will push for and make happen?

Just to add to this, it's something that I would push for wholeheartedly. In addition to Jérôme, I would be happy to dedicate time to this sub-project, and I feel it's quite worthy of being its own standalone library.

Just my two cents, thanks!

Cheers,
Chris

> Otis
>
> ----- Original Message ----
> From: Jérôme Charron <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Friday, April 7, 2006 4:26:54 AM
> Subject: [Proposal] New Lucene sub-project
>
> Hi all,
>
> While chatting with Chris Mattmann, it became evident to us that there
> is a need for a new sub-project within Lucene.
>
> For now, Lucene's sub-projects used in Nutch are:
> 1. Lucene-java - The basis for search technology
> 2. Hadoop - The distributed computing platform
> 3. Nutch - The search engine that relies on Lucene and Hadoop.
>
> Since Nutch contains some value-added pieces of code that focus on content
> analysis, we think it would be a good idea to split these out of Nutch into
> a new sub-project based on content analysis manipulation. The components we
> have identified are:
>
> 1. MimeType Repository
> 2. Language Identifier
> 3. Content Signature (MD5Signature / TextProfileSignature / ...)
> (4. Generic Meta Data Infrastructure)
> (5. Charset Detector)
> (6. Parse Plugins Framework)
>
> The idea is to expose these pieces of code as a standalone lib, since we
> are convinced they could be useful in many projects other than Nutch.
> The benefit will be to have the code more widely used / tested /
> contributed to.
> If this proposal is accepted, we have a candidate name for this new
> project: Tika (comes from my son ;-) )
>
> Any comment is welcome.
>
> Jérôme

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
