On 25.10.2011 14:21, Niels Basjes wrote:
Why not do something very simple: use the MD5 of the URL as the key you sort by.
This scales very easily and gives a highly randomized order.
Maybe not the optimal maximum distance, but certainly a very good distribution and very easy to build.
I tried it, and the problem is that sites with a lot of URLs block the queue. You can have a few sites with 5m URLs, and they take up the major portion of the queue, so small sites are not crawled.
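
To make the effect concrete, here is a minimal sketch in Python of the MD5-keyed ordering and of how one large site still fills most of the queue. The host names, the URL counts (5000 against 50) and the helper names md5_key, build_queue and host_share are made up for illustration; this is not code from any crawler.

```python
import hashlib
from collections import Counter
from urllib.parse import urlparse


def md5_key(url: str) -> str:
    """Sort key: hex MD5 digest of the URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def build_queue(urls):
    """Order URLs by their MD5 digest, which shuffles them pseudo-randomly."""
    return sorted(urls, key=md5_key)


def host_share(queue, n):
    """Count how many of the first n queue entries belong to each host."""
    return Counter(urlparse(u).hostname for u in queue[:n])


# Hypothetical data: one large site and 50 small single-page sites.
big = [f"http://big.example/page/{i}" for i in range(5000)]
small = [f"http://small{i}.example/" for i in range(50)]

queue = build_queue(big + small)
share = host_share(queue, 500)
print(share.most_common(3))
```

The order is well mixed, but each host appears in the queue in proportion to its URL count, so the first 500 entries are almost all from big.example and the small hosts wait behind them.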
