Hi all,

I am crawling some sites that have circular links, so I keep seeing the same
pages over and over again.
For example:
site.com/dir1/dir1/file.html
site.com/dir1/dir1/dir1/file.html
site.com/dir1/dir1/dir1/dir1/file.html

or 

site.com/d1/d2/file.html
site.com/d1/d2/d1/d2/file.html
site.com/d1/d2/d1/d2/d1/d2/file.html

and so on. These pages are later removed as duplicates, but the db keeps
filling up with the links.
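A minimal sketch of how such looping URLs could be recognised, assuming a loop shows up as a run of directory segments repeated back to back (`dir1/dir1/...` or `d1/d2/d1/d2/...`). The function name `has_repeated_segments` and its `min_repeats` threshold are hypothetical, not part of Nutch. The sample URLs carry an `http://` scheme so that `urlsplit` separates the host from the path.

```python
from urllib.parse import urlsplit


def has_repeated_segments(url: str, min_repeats: int = 2) -> bool:
    """Hypothetical helper: True if some run of directory segments occurs
    back to back at least `min_repeats` times, e.g. /d1/d2/d1/d2/file.html."""
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    path = urlsplit(url).path
    parts = [p for p in path.split("/") if p]
    # Keep only directory segments: drop the trailing file name, if any.
    dirs = parts[:-1] if parts and not path.endswith("/") else parts
    n = len(dirs)
    # Try every block size and start position; a loop is a block that is
    # immediately followed by (min_repeats - 1) identical copies of itself.
    for size in range(1, n // min_repeats + 1):
        for start in range(0, n - size * min_repeats + 1):
            block = dirs[start:start + size]
            if all(dirs[start + i * size:start + (i + 1) * size] == block
                   for i in range(1, min_repeats)):
                return True
    return False
```

With the default threshold of 2, both `site.com/dir1/dir1/file.html` and `site.com/d1/d2/d1/d2/file.html` are flagged. A site that legitimately uses a path like `/a/a/` would be flagged too, so raising `min_repeats` to 3 is the more conservative choice.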

Is there a way to delete these links from the database?
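A rough sketch of pruning a list of stored links with a single backreference regex, the kind of pattern a regex-based URL filter could use. This operates only on an in-memory list of URLs and does not touch Nutch's db. The names `LOOP` and `prune_looping` are hypothetical.

```python
import re
from urllib.parse import urlsplit

# One or more "/segment" pieces, immediately repeated, then a "/" so the
# repeat ends on a segment boundary: matches /dir1/dir1/ and /d1/d2/d1/d2/.
LOOP = re.compile(r"((?:/[^/]+)+)\1/")


def prune_looping(urls):
    """Hypothetical helper: split URLs into (kept, dropped) lists,
    dropping those whose path contains a back-to-back repeated run."""
    kept, dropped = [], []
    for url in urls:
        target = dropped if LOOP.search(urlsplit(url).path) else kept
        target.append(url)
    return kept, dropped
```

Only the path is tested, so the scheme and host cannot accidentally take part in a match.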

P.


_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
