Hi Andrew,
you can either get one of the distributions, a nightly build, or check
out directly from SVN to get the sources.
Then I would suggest checking the targets in the ant build file; there
are targets for compiling, cleaning and testing. Use 'ant tar' to make
a release tarball that you
I think something like this has already been done (apart from the daily
changes you suggest) http://issues.apache.org/jira/browse/NUTCH-207
Rgrds, Thomas
On 5/1/06, Fankhauser, Alain [EMAIL PROTECTED] wrote:
Hello
I'm thinking about creating a throttle that lets us decide at
I'm not so sure. When crawling Apache we had trouble with this feature.
Some HTML files had an XML header and the server identified them as
text/html, but Nutch decided to treat them as XML, not HTML.
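To make the problem concrete, here is a minimal sketch of how content-based
guessing can override the type sent by the server. This is not Nutch's actual
mime-type resolver: the class MimeGuessSketch, its methods, the single
"<?xml" magic rule and the rule that a magic match wins over the header are
all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

public class MimeGuessSketch {

    // Hypothetical magic rule: content that starts with an XML declaration
    // is reported as XML. Returns null when no magic rule matches.
    public static String guessByMagic(byte[] content) {
        int len = Math.min(content.length, 64);
        String head = new String(content, 0, len, StandardCharsets.US_ASCII);
        if (head.startsWith("<?xml")) {
            return "text/xml";
        }
        return null;
    }

    // Assumed precedence: when guessing is enabled, a magic match overrides
    // the Content-Type header; otherwise the header is trusted as-is.
    public static String resolve(String headerType, byte[] content, boolean useMagic) {
        if (useMagic) {
            String guessed = guessByMagic(content);
            if (guessed != null) {
                return guessed;
            }
        }
        return headerType;
    }

    public static void main(String[] args) {
        // An HTML page with an XML header, served as text/html.
        byte[] page = "<?xml version=\"1.0\"?>\n<html><body>Hello</body></html>"
                .getBytes(StandardCharsets.US_ASCII);

        System.out.println(resolve("text/html", page, true));   // prints "text/xml"
        System.out.println(resolve("text/html", page, false));  // prints "text/html"
    }
}
```

With guessing enabled the page ends up typed as XML, which matches the
behaviour described above. Turning guessing off, or dropping the XML magic
rule, leaves the server's text/html in place.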
Yes, the current version of the mime-type resolver is a crude one.
XML, HTML, RSS and all XML based
[EMAIL PROTECTED] wrote:
As far as we understood from the MapRed documentation, all reduce tasks must be
launched after the last map task has finished, i.e. map and reduce must not run
simultaneously. But in the logs we often see records such as: map 80%, reduce 10%
and many more records where map is less than
Jérôme Charron wrote:
We had to turn off
the guessing of content types to index Apache correctly.
Instead of turning off the guessing of content types, you should only
remove the magic for xml in mime-types.xml
Perhaps that would also have worked, but with Apache, simply trusting
the
I also have .classpath and .project files for hadoop in Eclipse.
Why are these not checked in?
- alan
-----Original Message-----
From: TDLN [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 02, 2006 1:33 AM
To: nutch-dev@lucene.apache.org
Subject: Re: A Developer's getting started doc?
Hi Andrew,
Three new plugins that parse, index and query meta tags defined in the
configuration
Key: NUTCH-260
URL: http://issues.apache.org/jira/browse/NUTCH-260
Project: Nutch
Type: New
[ http://issues.apache.org/jira/browse/NUTCH-260?page=all ]
Jake Vanderdray updated NUTCH-260:
--
Attachment: nutch_customizations.tar
The attachment is a tarball of the plugin source.
Three new plugins that parse, index and query meta tags defined in
Thomas,
I would really appreciate your .classpath and .project files for
Eclipse (for Nutch-trunk). Could you send them to me? Or could you
upload them somewhere?
I don't think I am a novice in terms of Eclipse, but frankly I am too lazy
to configure all these settings manually. I do use Maven all