On Thu, 2006-03-09 at 21:51 +0200, Gal Nitzan wrote:
 Actually there is a property in conf: generate.max.per.host

That has proven to be problematic.

foo.domain.com
bar.domain.com
baz.domain.com
*** Repeat up to 4 Million times for some content generator sites ***

Each of these gets a different slot which effectively stalls everything
else.

Are there any objections to changing this to be one bucket per domain
instead of one per hostname?

That sounds like a good idea.

From what I remember when we did this, generating the base domain for a URL is a bit of a fuzzy problem. You have to handle things like language code suffixes, the shortened versions of .com used under some country codes (.co.jp), etc.
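To make the fuzziness concrete, here is a rough sketch of per-domain bucketing. It is not Nutch code: `base_domain`, `cap_per_domain`, and the tiny `MULTI_LABEL_SUFFIXES` set are all hypothetical names made up for illustration, and a real version would need a far more complete suffix list.

```python
# Hypothetical and deliberately incomplete: suffixes where the registrable
# domain needs three labels instead of two (e.g. example.co.jp).
MULTI_LABEL_SUFFIXES = {"co.jp", "co.uk", "com.au"}


def base_domain(host):
    """Best-effort guess at the base domain of a hostname."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) <= 2:
        return ".".join(labels)
    last_two = ".".join(labels[-2:])
    if last_two in MULTI_LABEL_SUFFIXES:
        # Keep one extra label: www.example.co.jp -> example.co.jp
        return ".".join(labels[-3:])
    return last_two


def cap_per_domain(hosts, max_per_domain):
    """Keep at most max_per_domain hosts per base domain, in input order."""
    counts = {}
    kept = []
    for host in hosts:
        key = base_domain(host)
        if counts.get(key, 0) < max_per_domain:
            counts[key] = counts.get(key, 0) + 1
            kept.append(host)
    return kept
```

With this, foo.domain.com, bar.domain.com and baz.domain.com all land in the single bucket domain.com. Any suffix missing from the set (or a raw IP address used as a host) gets bucketed wrongly, which is exactly the fuzzy part.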

Eventually we shifted to resolving domains to IP addresses. I think that approach has been discussed on this list previously, as a way to help ensure threads on different TaskTracker nodes don't hit the same server at the same time.
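The IP-based approach could look something like the sketch below. Again this is only an illustration under my own assumptions, not what we actually ran: `group_hosts_by_ip` is a hypothetical helper, and the `resolve` parameter is there so the lookup can be swapped out (it defaults to the standard library's `socket.gethostbyname`, which returns a single IPv4 address).

```python
import socket
from collections import defaultdict


def group_hosts_by_ip(hosts, resolve=socket.gethostbyname):
    """Group hostnames by the IP address they resolve to.

    Hosts that fail to resolve are collected under the key None.
    """
    groups = defaultdict(list)
    for host in hosts:
        try:
            ip = resolve(host)
        except OSError:
            # socket.gaierror is a subclass of OSError
            ip = None
        groups[ip].append(host)
    return dict(groups)
```

Each key is then one bucket for politeness limits, regardless of how many hostnames point at it. Note this ignores hosts with several A records or round-robin DNS, where the same name can resolve to different addresses over time.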

For the cases you've run into, do they resolve down to a limited number of unique IP addresses?

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"
