Thanks. Could you please be more specific about how to set up the URL filter? Something like http://mysite.doc? And how can I get all the .doc files on mysite if a document is at a deeper path such as http://mysite/1/2/~user/a.doc?
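My current understanding of the regex URL filter, which I'd like someone to confirm, is this. Each rule is a line starting with "+" (accept) or "-" (reject) followed by a regular expression. The first rule whose expression is found in the URL decides, and a URL matching no rule is rejected. If that is right, a pattern ending in \.doc$ should match a .doc file at any depth without spelling out the path. The small Python script below only simulates that logic. It does not use Nutch itself, and the rules, the host mysite and the helper names are my own hypothetical examples. The third rule is there because if only .doc URLs were accepted, the HTML pages linking to them would be rejected, and the crawler might never discover the documents.

```python
import re

# Simulation of a Nutch-style regex URL filter (assumption: "+regex" accepts,
# "-regex" rejects, the first rule found in the URL wins, no match = reject).
# "mysite" is the placeholder host from my question, not a real site.
RULES = r"""
# Word documents anywhere under mysite, at any directory depth
+^http://([a-z0-9]*\.)*mysite/.*\.(doc|DOC)$
# skip other binary files we do not want
-\.(gif|GIF|jpg|JPG|png|PNG|zip|ZIP|exe|EXE)$
# other mysite pages must pass so links to .doc files can be followed
+^http://([a-z0-9]*\.)*mysite/
# reject everything else
-.
"""


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (accept, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return True if the first rule found in the URL is a '+' rule."""
    for accept, regex in rules:
        if regex.search(url):  # search: the pattern may match anywhere
            return accept
    return False  # no rule matched: reject


if __name__ == "__main__":
    rules = parse_rules(RULES)
    for url in [
        "http://mysite/1/2/~user/a.doc",
        "http://mysite/1/2/~user/index.html",
        "http://mysite/logo.gif",
        "http://othersite/b.doc",
    ]:
        print(url, "->", "accept" if accepts(rules, url) else "reject")
```

Run as a script, it prints "accept" for the two mysite pages (the deep .doc file and the HTML page) and "reject" for the image and for the .doc file on another host.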
Is there any reference for the Word parser? I don't know how to use it. Thank you.

On Mon, 28 Mar 2005 14:59:57 +0200, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
> Set up a URL filter for any *.doc, then install and use the Word parser;
> that is all you need to do...
>
> On 28.03.2005 at 07:12, Eric Money wrote:
>
> > Hi all,
> >
> > If I want to search a site but am only interested in the
> > files with the .doc suffix, how should I rewrite Nutch to
> > get all these files? Any comments and experiences
> > are appreciated. Thanks to all in advance.
> >
> > _______________________________________________
> > Nutch-general mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/nutch-general
>
> ---------------------------------------------------------------
> company: http://www.media-style.com
> forum: http://www.text-mining.org
> blog: http://www.find23.net
