On 6/28/07, Kai_testing Middleton <[EMAIL PROTECTED]> wrote:
> I am choosing to use NUTCH-444 for my RSS functionality. Doğacan commented
> on how to do this; he wrote:
>   ...if you need the functionality of NUTCH-444, I would suggest
>   trying a nightly version of Nutch. Because NUTCH-444 by itself is not
>   enough. You also need two patches from NUTCH-443 and probably
>   NUTCH-504.
>
> I have a couple newbie questions about the mechanics of installing this.
>
> Prefatory comments: I have already installed another patch (for NUTCH-505) so
> I think I already have a nightly build (I'm guessing trunk==nightly?). These
> were the steps I did:
> $ svn co http://svn.apache.org/repos/asf/lucene/nutch/trunk nutch
> $ cd nutch
> $ wget https://issues.apache.org/jira/secure/attachment/12360411/NUTCH-505_draft_v2.patch
> $ patch -p0 < NUTCH-505_draft_v2.patch
> $ ant clean && ant
>
> ---
>
> Now I need NUTCH-443, NUTCH-504 and NUTCH-444. Here's my guess:
>
> $ cd nutch
>
> $ wget http://issues.apache.org/jira/secure/attachment/12359953/NUTCH_443_reopened_v3.patch
> $ patch -p0 < NUTCH_443_reopened_v3.patch
> $ wget http://issues.apache.org/jira/secure/attachment/12350644/parse-map-core-draft-v1.patch
> $ patch -p0 < parse-map-core-draft-v1.patch
> $ wget http://issues.apache.org/jira/secure/attachment/12350634/parse-map-core-untested.patch
> $ patch -p0 < parse-map-core-untested.patch
> $ wget http://issues.apache.org/jira/secure/attachment/12357183/redirect_and_index.patch
> $ patch -p0 < redirect_and_index.patch
> $ wget http://issues.apache.org/jira/secure/attachment/12357300/redirect_and_index_v2.patch
> $ patch -p0 < redirect_and_index_v2.patch
>
> I'm really guessing on the above ... continuing:
>
> $ wget http://issues.apache.org/jira/secure/attachment/12360361/NUTCH-504_v2.patch
> $ patch -p0 < NUTCH-504_v2.patch
> $ wget http://issues.apache.org/jira/secure/attachment/12360348/parse_in_fetchers.patch
> $ patch -p0 < parse_in_fetchers.patch
>
> ... that felt like less of a guess, but now:
>
> $ wget http://issues.apache.org/jira/secure/attachment/12357192/NUTCH-444.patch
> $ patch -p0 < NUTCH-444.patch
> $ wget http://issues.apache.org/jira/secure/attachment/12350820/parse-feed.tar.bz2
> $ tar xjvf parse-feed.tar.bz2
>
> what do I do with this newly created parse-feed directory?
>
> so then I would do:
>
> $ ant clean && ant
>
> Wait a minute: do I have this whole thing wrong? Maybe Doğacan means that
> the nightly builds ALREADY contain NUTCH-443 and NUTCH-504 so that I would do
> this:
>
> $ wget http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/lastStableBuild/artifact/trunk/build/nutch-2007-06-27_06-52-44.tar.gz
> $ tar xvzf nutch-2007-06-27_06-52-44.tar.gz
> $ cd nutch-2007-06-27_06-52-44
>
> then this business:
>
> $ wget http://issues.apache.org/jira/secure/attachment/12357192/NUTCH-444.patch
> $ patch -p0 < NUTCH-444.patch
> $ wget http://issues.apache.org/jira/secure/attachment/12350820/parse-feed.tar.bz2
> $ tar xjvf parse-feed.tar.bz2
>
> what do I do with this newly created parse-feed directory?
>
> so then I would do:
>
> $ ant clean && ant
>
> I guess this is why "release engineer" is a job in and of itself!
> Please advise.

If you downloaded the nightly build of June 27th, it already contains the
feed plugin. (The plugin is called "feed", not "parse-feed"; parse-feed was
an older plugin that was never committed. In my earlier comment, I meant to
write parse-rss but wrote parse-feed.) So you don't have to apply any
patches. Just download a recent nightly build and you are good to go :).
You can also check out trunk from svn, and that will work too.

> --Kai Middleton
>
> ----- Original Message ----
> From: Doğacan Güney <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Sent: Friday, June 22, 2007 1:39:12 AM
> Subject: Re: Possibly use a different library to parse RSS feed for improved
> performance and compatibility
>
> On 6/21/07, Kai_testing Middleton <[EMAIL PROTECTED]> wrote:
> > I am a new nutch user and the ability to crawl RSS feeds is critical to my
> > mission. Do I understand from this (lengthy) discussion that in order to
> > get the new RSS I need to either a) download one of the nightly builds and
> > run ant or b) download and apply a patch (NUTCH-444.patch, I gather).
>
> Nutch 0.9 can already parse RSS feeds (via the parse-feed plugin).
> However, if you need the functionality of NUTCH-444, I would suggest
> trying a nightly version of Nutch. Because NUTCH-444 by itself is not
> enough. You also need two patches from NUTCH-443 and probably
> NUTCH-504. If you are worried about stability, nightlies of nutch are
> generally pretty stable.
>
> --
> Doğacan Güney

--
Doğacan Güney

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
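
To make the advice in the reply concrete, here is a minimal sketch for
checking whether a Nutch source tree already includes the feed plugin
before deciding to patch anything. The helper name has_feed_plugin is
hypothetical, and the src/plugin/feed location is an assumption based on
the usual Nutch source layout; it is not confirmed anywhere in this thread,
so adjust the path if your tree differs.

```shell
#!/bin/sh
# Sketch only: report whether a Nutch source tree already ships the
# "feed" plugin, in which case no NUTCH-444 patching should be needed.
#
# ASSUMPTION: plugins live under <tree>/src/plugin/<name>. This path is
# not stated in the thread; verify it against your own checkout.
# has_feed_plugin is a hypothetical helper name, not part of Nutch.
has_feed_plugin() {
    # $1: path to the Nutch source tree (trunk checkout or unpacked nightly)
    [ -d "$1/src/plugin/feed" ]
}

# Example use (commented out because it needs network access and ant).
# The svn URL and build command are the ones quoted earlier in the thread:
#
#   svn co http://svn.apache.org/repos/asf/lucene/nutch/trunk nutch
#   if has_feed_plugin nutch; then
#       (cd nutch && ant clean && ant)
#   else
#       echo "feed plugin not found; this tree predates NUTCH-444" >&2
#   fi
```

If the check succeeds on a fresh trunk checkout or a recent nightly, the
wget/patch steps from the original question can be skipped entirely.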
