I am choosing to use NUTCH-444 for my RSS functionality.  Doğacan commented on 
how to do this; he wrote:
    ...if you need the functionality of NUTCH-444, I would suggest
    trying a nightly version of Nutch. Because NUTCH-444 by itself is not
    enough. You also need two patches from NUTCH-443 and probably
    NUTCH-504.

I have a couple of newbie questions about the mechanics of installing this.

Prefatory comments: I have already installed another patch (for NUTCH-505), so I 
think I already have a nightly build (I'm guessing trunk==nightly?).  These 
are the steps I followed:
$ svn co http://svn.apache.org/repos/asf/lucene/nutch/trunk nutch
$ cd nutch
$ wget https://issues.apache.org/jira/secure/attachment/12360411/NUTCH-505_draft_v2.patch
$ patch -p0 < NUTCH-505_draft_v2.patch
$ ant clean && ant
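
A minimal sketch of checking a patch before applying it, so that a patch which
does not fit the tree leaves no half-patched files behind. It assumes GNU patch
(for the --dry-run and -f options); the function name try_patch is made up for
illustration and is not part of Nutch or its build.

```shell
# Hypothetical helper: apply a patch only if a dry run succeeds.
# Assumes GNU patch. --dry-run changes no files; -f suppresses the
# interactive prompts patch would otherwise show on a mismatch.
try_patch() {
    patchfile=$1
    if patch -p0 -f --dry-run < "$patchfile" > /dev/null 2>&1; then
        # The dry run was clean, so apply the patch for real.
        patch -p0 < "$patchfile"
    else
        echo "would not apply cleanly: $patchfile" >&2
        return 1
    fi
}
```

Run from the top of the checkout, this would be used as
"try_patch NUTCH-505_draft_v2.patch" in place of the plain patch command.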

---

Now I need NUTCH-443, NUTCH-504, and NUTCH-444.  Here's my guess:

$ cd nutch

$ wget http://issues.apache.org/jira/secure/attachment/12359953/NUTCH_443_reopened_v3.patch
$ patch -p0 < NUTCH_443_reopened_v3.patch
$ wget http://issues.apache.org/jira/secure/attachment/12350644/parse-map-core-draft-v1.patch
$ patch -p0 < parse-map-core-draft-v1.patch
$ wget http://issues.apache.org/jira/secure/attachment/12350634/parse-map-core-untested.patch
$ patch -p0 < parse-map-core-untested.patch
$ wget http://issues.apache.org/jira/secure/attachment/12357183/redirect_and_index.patch
$ patch -p0 < redirect_and_index.patch
$ wget http://issues.apache.org/jira/secure/attachment/12357300/redirect_and_index_v2.patch
$ patch -p0 < redirect_and_index_v2.patch

I'm really guessing on the above ... continuing:

$ wget http://issues.apache.org/jira/secure/attachment/12360361/NUTCH-504_v2.patch
$ patch -p0 < NUTCH-504_v2.patch
$ wget http://issues.apache.org/jira/secure/attachment/12360348/parse_in_fetchers.patch
$ patch -p0 < parse_in_fetchers.patch

... that felt like less of a guess, but now:


$ wget http://issues.apache.org/jira/secure/attachment/12357192/NUTCH-444.patch
$ patch -p0 < NUTCH-444.patch
$ wget http://issues.apache.org/jira/secure/attachment/12350820/parse-feed.tar.bz2
$ tar xjvf parse-feed.tar.bz2

What do I do with this newly created parse-feed directory?
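
A small sketch for inspecting the archive before unpacking it, which shows what
top-level directory it would create and so gives a hint about where it is meant
to be extracted. It does not settle the question: Nutch's bundled plugins live
under src/plugin in the source tree, which makes that a plausible destination,
but whether this particular tarball belongs there is an assumption. The function
name top_level_entries is made up for illustration, and the sketch assumes a tar
that detects compression on its own when reading (GNU tar and bsdtar do).

```shell
# Hypothetical helper: print the distinct top-level entries of an archive
# without extracting anything.
# Assumes tar auto-detects the compression format when listing.
top_level_entries() {
    tar tf "$1" | cut -d/ -f1 | sort -u
}
```

Used as "top_level_entries parse-feed.tar.bz2", it prints one line per
top-level name in the archive.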

So then I would do:

$ ant clean && ant


Wait a minute:  do I have this whole thing wrong?  Maybe Doğacan means that the 
nightly builds ALREADY contain NUTCH-443 and NUTCH-504 so that I would do this:


$ wget http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/lastStableBuild/artifact/trunk/build/nutch-2007-06-27_06-52-44.tar.gz
$ tar xvzf nutch-2007-06-27_06-52-44.tar.gz
$ cd nutch-2007-06-27_06-52-44
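
One possible way to check whether a tree already contains a given patch, rather
than guessing: ask patch to reverse it in dry-run mode. If the reverse applies
cleanly, the change is most likely already present. This is only a sketch; it
assumes GNU patch, assumes the unpacked tree contains the source files the patch
touches, and uses a made-up function name, already_applied.

```shell
# Hypothetical helper: succeed if the patch appears to be in the tree already.
# Assumes GNU patch. -R tries the patch in reverse, --dry-run changes no
# files, and -f suppresses interactive prompts.
already_applied() {
    patch -p0 -R -f --dry-run < "$1" > /dev/null 2>&1
}
```

For example, "already_applied NUTCH_443_reopened_v3.patch && echo present" run
from the top of the unpacked nightly would print "present" only if the reverse
dry run succeeded.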

then this business:

$ wget http://issues.apache.org/jira/secure/attachment/12357192/NUTCH-444.patch
$ patch -p0 < NUTCH-444.patch
$ wget http://issues.apache.org/jira/secure/attachment/12350820/parse-feed.tar.bz2
$ tar xjvf parse-feed.tar.bz2

What do I do with this newly created parse-feed directory?

So then I would do:

$ ant clean && ant

I guess this is why "release engineer" is a job in and of itself!
Please advise.

--Kai Middleton

----- Original Message ----
From: Doğacan Güney <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Friday, June 22, 2007 1:39:12 AM
Subject: Re: Possibly use a different library to parse RSS feed for improved 
performance and compatibility

On 6/21/07, Kai_testing Middleton <[EMAIL PROTECTED]> wrote:
> I am a new nutch user and the ability to crawl RSS feeds is critical to my 
> mission.  Do I understand from this (lengthy) discussion that in order to get 
> the new RSS I need to either a) download one of the nightly builds and run 
> ant or b) download and apply a patch (NUTCH-444.patch, I gather).

Nutch 0.9 can already parse RSS feeds (via the parse-feed plugin).
However, if you need the functionality of NUTCH-444, I would suggest
trying a nightly version of Nutch. Because NUTCH-444 by itself is not
enough. You also need two patches from NUTCH-443 and probably
NUTCH-504. If you are worrying about stability, nightlies of nutch are
generally pretty stable.

-- 
Doğacan Güney




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
