mapred branch
Where is the mapred branch of Nutch located now?
image search
Has anybody tried to create an image search based on Nutch?
Re: mapred branch
Anton Potehin wrote:
Where is the mapred branch of Nutch located now?

It is now developed in trunk.

P.
Content-Type inconsistency?
It seems there is an inconsistency in how Nutch handles content types:

1. The protocol-level Content-Type header is added to the content's metadata.
2. The content type is then checked/guessed while instantiating the Content object and stored in a private field (at this step, the Content object can hold two different content types).
3. The Content's private content-type field is used to find the right parser.
4. Once the Parse object is constructed, the Content is no longer used (so the guessed content type is lost).
5. The index-more plugin then indexes the raw content type, not the guessed one.
6. As a consequence, the content type displayed in more.jsp is the raw one, and queries on type also match against the raw one.

Wouldn't it be better to use the guessed content type throughout the whole process (except in cache.jsp, where the raw one should be used)?

Jérôme
--
http://motrech.free.fr/
http://www.frutch.org/
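The proposal can be illustrated with a minimal Java sketch. Everything in it is hypothetical: the simplified `Content` class, the `X-Raw-Content-Type` metadata key, the toy `guessType` method and the `typeToIndex` helper are stand-ins invented for illustration, not Nutch's actual API. The idea shown is to resolve the type once, keep the raw protocol header under a separate key (for a cache.jsp-style page), and store the guessed value under the main Content-Type key, so that parser selection and indexing both read the same type.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Nutch's real classes): resolve the content type once
 * and write the resolved value into the metadata, so every later stage
 * (parser selection, indexing, display) sees the same type.
 */
public class ContentTypeSketch {

    // Main key read by every stage; now holds the guessed type.
    static final String CONTENT_TYPE = "Content-Type";
    // Hypothetical key preserving the raw protocol header (e.g. for cache.jsp).
    static final String RAW_CONTENT_TYPE = "X-Raw-Content-Type";

    /** Simplified stand-in for Nutch's Content object. */
    static final class Content {
        final String url;
        final byte[] data;
        final Map<String, String> metadata = new HashMap<>();

        Content(String url, byte[] data, String headerType) {
            this.url = url;
            this.data = data;
            // Keep the raw header, if any, under its own key.
            if (headerType != null) {
                metadata.put(RAW_CONTENT_TYPE, headerType);
            }
            // Guess once and store the result under the main key,
            // instead of in a private field that is lost after parsing.
            metadata.put(CONTENT_TYPE, guessType(url, data, headerType));
        }

        /** Used for parser selection; same value the indexer will see. */
        String getContentType() {
            return metadata.get(CONTENT_TYPE);
        }
    }

    /** Toy guesser (hypothetical): magic bytes, then URL extension, then header. */
    static String guessType(String url, byte[] data, String headerType) {
        String head = new String(data, 0, Math.min(data.length, 5), StandardCharsets.ISO_8859_1);
        if (head.startsWith("%PDF")) {
            return "application/pdf";
        }
        if (url.endsWith(".html") || url.endsWith(".htm")) {
            return "text/html";
        }
        return headerType != null ? headerType : "application/octet-stream";
    }

    /** Stand-in for an indexing filter such as index-more: reads the main key. */
    static String typeToIndex(Content content) {
        return content.metadata.get(CONTENT_TYPE);
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4 ...".getBytes(StandardCharsets.ISO_8859_1);
        // The server mislabels a PDF as text/plain.
        Content c = new Content("http://example.com/report", pdf, "text/plain");
        System.out.println(c.getContentType());               // application/pdf
        System.out.println(typeToIndex(c));                   // application/pdf
        System.out.println(c.metadata.get(RAW_CONTENT_TYPE)); // text/plain
    }
}
```

With this arrangement, an indexing plugin only needs to read the main key to get the guessed type, while the raw header remains available for the cached view. Overwriting the main key versus adding a new "guessed" key is a design choice: adding a new key would avoid changing what existing plugins see, at the cost of each plugin having to opt in.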
Re: [Proposal] New Lucene sub-project
I found your idea very interesting. I would be interested in contributing to the Parse Plugins Framework. I have developed a similar one using Lucene; the project is called Lius.

Hi Rida,

Yes, I know Lius. It seems very interesting, and I think it would also be very interesting if we could merge our efforts into a common Lucene sub-project (although for the moment, it seems that the Tika project isn't attracting much interest...?). If you are interested, please let me know. If anyone on nutch-dev is interested in creating such a project, you are welcome.

Regards,
Jérôme
--
http://motrech.free.fr/
http://www.frutch.org/
Nightly build broken?
Hi,

It looks like the latest nightly build is broken: the jar that ships with the nightly build seems to contain some patches that are not yet in the svn sources. Is anyone able to get the latest Nutch nightly build to run?

Thanks,
Stefan