Hi all,

I am crawling some sites that have circular links, so I keep fetching the same pages over and over again. For example:

  site.com/dir1/dir1/file.html
  site.com/dir1/dir1/dir1/file.html
  site.com/dir1/dir1/dir1/dir1/file.html

or:

  site.com/d1/d2/file.html
  site.com/d1/d2/d1/d2/file.html
  site.com/d1/d2/d1/d2/d1/d2/file.html

and so on. The pages are later removed as duplicates, but the db keeps filling up with these links. Is there a way to delete these links from the database?

P.

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
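One way to recognize the looping URLs described above is to look for a block of one or more directory segments that repeats back to back in the path (dir1/dir1/dir1/ or d1/d2/d1/d2/d1/d2/). The sketch below is a hypothetical standalone helper, not part of Nutch: the class name `LoopingPathFilter`, the method `isLooping`, and the `minRepeats` threshold are all illustrative assumptions. It only classifies URL strings; wiring it into the crawl (so such links are never added) or using it to purge existing db entries would depend on the Nutch version and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not a Nutch API): flags URLs whose path contains a
 * block of one or more directory segments repeated back to back at least
 * minRepeats times, e.g. /dir1/dir1/dir1/ or /d1/d2/d1/d2/d1/d2/.
 */
public class LoopingPathFilter {

    /** Returns the directory segments of the URL, dropping scheme, host and file name. */
    static List<String> directorySegments(String url) {
        String s = url;
        int schemeEnd = s.indexOf("://");
        if (schemeEnd >= 0) {
            s = s.substring(schemeEnd + 3);
        }
        // Cut off any query string or fragment so it cannot add fake segments.
        int cut = s.length();
        for (char c : new char[] {'?', '#'}) {
            int i = s.indexOf(c);
            if (i >= 0 && i < cut) {
                cut = i;
            }
        }
        s = s.substring(0, cut);
        String[] parts = s.split("/", -1);
        List<String> dirs = new ArrayList<>();
        // parts[0] is the host; the last element is the file name (possibly empty).
        for (int i = 1; i < parts.length - 1; i++) {
            if (!parts[i].isEmpty()) {
                dirs.add(parts[i]);
            }
        }
        return dirs;
    }

    /** True if some block of segments repeats consecutively at least minRepeats times. */
    public static boolean isLooping(String url, int minRepeats) {
        List<String> segs = directorySegments(url);
        int n = segs.size();
        for (int len = 1; len * minRepeats <= n; len++) {
            for (int start = 0; start + len * minRepeats <= n; start++) {
                List<String> block = segs.subList(start, start + len);
                int repeats = 1;
                int next = start + len;
                // Count how many times the block occurs back to back from 'start'.
                while (next + len <= n && segs.subList(next, next + len).equals(block)) {
                    repeats++;
                    next += len;
                }
                if (repeats >= minRepeats) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] urls = {
            "site.com/dir1/dir1/file.html",
            "site.com/dir1/dir1/dir1/file.html",
            "site.com/d1/d2/file.html",
            "site.com/d1/d2/d1/d2/d1/d2/file.html",
        };
        for (String u : urls) {
            System.out.println(u + " -> " + (isLooping(u, 3) ? "drop" : "keep"));
        }
    }
}
```

With `minRepeats` set to 3, as in `main`, the first and third example URLs are kept and the second and fourth are dropped. A lower threshold of 2 would also catch site.com/dir1/dir1/file.html, at the cost of possibly rejecting sites that legitimately reuse a directory name once; the threshold is a judgment call, which is why it is left as a parameter.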
