How about conf/crawl-urlfilter.txt?

Marcin
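A minimal sketch of what such a filter file could look like. The rule syntax (`+`/`-` prefixed regexes, first match wins, evaluated top to bottom) is standard for Nutch's crawl-urlfilter.txt; the accept rule below simply reuses the regex from the question, and the domain is the questioner's placeholder:

```
# conf/crawl-urlfilter.txt — each line is '+' (accept) or '-' (reject)
# followed by a regex; the first matching rule decides a URL's fate.

# accept only URLs matching the pattern from the question
+^http://([a-z0-9]*\.)example.com/([a-zA-Z]*)-\([a-z0-9]*\)-.*-\([0-9]*-[A-Za-z0-9]*\)\.html$

# reject everything else
-.
```

Note the `-.` catch-all at the end: without it, URLs that match no rule may still be accepted, so other pages on the crawled site would leak into the results.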
On 5/4/07, simon_ece <[EMAIL PROTECTED]> wrote:
>
> hi all,
> I am new to Nutch. I would like to crawl a particular site and get the
> results matching the following pattern. I don't want to list other URLs
> from the crawled site.
>
> Site to crawl, e.g. www.example.com:
> ^http://([a-z0-9]*\.)example.com/([a-zA-Z]*)-\([a-z0-9]*\)-.*-\([0-9]*-[A-Za-z0-9]*\)\.html$
>
> I can crawl and get all the URLs from the site, but I don't know how to
> filter out the URLs and keep only the matching ones. Kindly post your
> suggestions.
>
> Thanks & Regards
> Simon
>
> --
> View this message in context:
> http://www.nabble.com/Nutch---Filtering-%28REGEX%29-tf3690583.html#a10318059
> Sent from the Nutch - User mailing list archive at Nabble.com.
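Before wiring the regex into the filter file, it can help to sanity-check it against a sample URL with Java's regex engine (the same engine Nutch's regex URL filter uses). The sample URL below is made up to fit the pattern; note that `\(` and `\)` in the question's regex match literal parentheses in the URL:

```java
import java.util.regex.Pattern;

public class FilterCheck {
    public static void main(String[] args) {
        // Regex from the question; backslashes doubled for a Java string literal.
        String re = "^http://([a-z0-9]*\\.)example.com/"
                  + "([a-zA-Z]*)-\\([a-z0-9]*\\)-.*-\\([0-9]*-[A-Za-z0-9]*\\)\\.html$";
        Pattern p = Pattern.compile(re);

        // Hypothetical URL shaped to match: host, word, (id), middle, (date-token).html
        String url = "http://www.example.com/news-(abc123)-story-(2007-may04).html";
        System.out.println(p.matcher(url).matches());  // prints "true"
    }
}
```

If the print shows `false` for a URL you expect to keep, adjust the regex here first; it is much faster than re-running a crawl to find out the filter dropped everything.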
