mapred branch

2006-04-10 Thread Anton Potehin
Where is the mapred branch of Nutch located now?



image search

2006-04-10 Thread Anton Potehin
Has anybody tried to create an image search based on Nutch?



Re: mapred branch

2006-04-10 Thread Piotr Kosiorowski

Anton Potehin wrote:

Where is the mapred branch of Nutch located now?



It is developed in trunk now.
P.


Content-Type inconsistency?

2006-04-10 Thread Jérôme Charron
It seems there is an inconsistency with content-type handling in Nutch:

1. The protocol level content-type header is added in content's metadata.
2. The content-type is then checked/guessed while instantiating the Content
object and stored in a private field
(at this step, the Content object can have 2 different content-types).
3. The Content's private field for content-type is used to find the right
parser.
4. Once the Parse object is constructed, the Content is no longer used (= the
guessed content-type is lost)
5. Then the index-more plugin indexes the raw content-type and not the guessed
one
6. As a consequence the content-type displayed in more.jsp is the raw one,
and the one used to query on type is the raw one too.

Wouldn't it be better to always use the guessed content-type all along the
process?
(except in cache.jsp, where the raw one should be used)
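
To make the idea concrete, here is a minimal sketch of what "carry the guessed
content-type along" could look like. It is not the actual Nutch code: the class,
the method names and the X-Guessed-Content-Type key are hypothetical, and the
guess() method is only a toy stand-in for real MIME detection. The point is that
the raw header and the guessed type are stored side by side, so parsing and
indexing can read the guessed one while cache.jsp keeps using the raw one.

```java
import java.util.HashMap;
import java.util.Map;

public class ContentTypeSketch {
    // Hypothetical metadata keys, not Nutch constants.
    static final String RAW_TYPE = "Content-Type";
    static final String GUESSED_TYPE = "X-Guessed-Content-Type";

    // Toy stand-in for real MIME detection: trusts a few URL extensions,
    // otherwise falls back to the header value without its parameters.
    static String guess(String rawType, String url) {
        String lower = url.toLowerCase();
        if (lower.endsWith(".pdf")) return "application/pdf";
        if (lower.endsWith(".html") || lower.endsWith(".htm")) return "text/html";
        if (rawType == null || rawType.trim().isEmpty()) return "application/octet-stream";
        int semi = rawType.indexOf(';');
        String base = semi >= 0 ? rawType.substring(0, semi) : rawType;
        return base.trim().toLowerCase();
    }

    // Keep the raw protocol header and store the guessed type next to it,
    // so the guessed value is not lost once the content object is discarded.
    static Map<String, String> buildMetadata(String rawType, String url) {
        Map<String, String> meta = new HashMap<>();
        if (rawType != null) meta.put(RAW_TYPE, rawType);
        meta.put(GUESSED_TYPE, guess(rawType, url));
        return meta;
    }

    // Parser selection, indexing and type queries would all read this value.
    static String typeForParsingAndIndexing(Map<String, String> meta) {
        return meta.get(GUESSED_TYPE);
    }

    // The cached page is served with the raw header when one was sent.
    static String typeForCache(Map<String, String> meta) {
        return meta.getOrDefault(RAW_TYPE, meta.get(GUESSED_TYPE));
    }

    public static void main(String[] args) {
        // A server that labels a PDF as plain text.
        Map<String, String> meta =
            buildMetadata("text/plain; charset=UTF-8", "http://example.com/report.pdf");
        System.out.println("index/parse: " + typeForParsingAndIndexing(meta));
        System.out.println("cache:       " + typeForCache(meta));
    }
}
```

With something like this, more.jsp and the type query would see the same value
that was used to choose the parser.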

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/


Re: [Proposal] New Lucene sub-project

2006-04-10 Thread Jérôme Charron
 I found your idea very interesting. I would be interested in contributing to
 the Parse Plugins Framework. I have developed a similar one using Lucene.
 The project name is Lius.

Hi Rida,

Yes, I know Lius.
It seems very interesting, and I think it would be very interesting too
if we could merge our efforts into a common Lucene sub-project
(but for the moment, it seems that the tika project doesn't attract a lot of
interest...?)

If you are interested please let me know.

If nutch-dev is interested in creating such a project, you are welcome.

Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/


nightly build broken?

2006-04-10 Thread Stefan Groschupf

Hi,

It looks like the latest nightly build is broken.
It looks like the jar that comes with the nightly build contains some
patches that are not yet in the svn sources.

Is someone able to get the latest nutch nightly to run?

Thanks.
Stefan