David Miller wrote:
From: Eric Dumazet <[EMAIL PROTECTED]>
Date: Tue, 06 Mar 2007 08:14:46 +0100

I wonder... are you sure this is unrelated to the size of rt_hash_locks / RT_HASH_LOCK_SZ?
A given entry must be covered by the same lock in both tables while a resize is in flight.
#define MIN_RTHASH_SHIFT LOG2(RT_HASH_LOCK_SZ)

Good point.
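
(For context: the per-chain lock is derived from the low bits of the bucket index, so as long as both tables have at least RT_HASH_LOCK_SZ buckets, an entry's old and new slots share the same low bits and hence the same lock. A minimal sketch of that scheme, roughly what net/ipv4/route.c's rt_hash_lock_addr() helper does:)

static spinlock_t rt_hash_locks[RT_HASH_LOCK_SZ];

/* Old slot = hash & (old_size - 1), new slot = hash & (new_size - 1);
 * if both sizes are >= RT_HASH_LOCK_SZ, both indices reduce to
 * hash & (RT_HASH_LOCK_SZ - 1), i.e. the very same lock. */
static inline spinlock_t *rt_hash_lock_addr(unsigned int slot)
{
        return &rt_hash_locks[slot & (RT_HASH_LOCK_SZ - 1)];
}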

+static struct rt_hash_bucket *rthash_alloc(unsigned int sz)
+{
+       struct rt_hash_bucket *n;
+
+       if (sz <= PAGE_SIZE)
+               n = kmalloc(sz, GFP_KERNEL);
+       else if (hashdist)
+               n = __vmalloc(sz, GFP_KERNEL, PAGE_KERNEL);
+       else
+               n = (struct rt_hash_bucket *)
+                       __get_free_pages(GFP_KERNEL, get_order(sz));
I don't feel comfortable with this.
Maybe we could try __get_free_pages() and, on failure, fall back to vmalloc(), keeping a flag so the memory can be freed correctly later. In any case, if (get_order(sz) >= MAX_ORDER) we know __get_free_pages() will fail.

We have to use vmalloc() for the hashdist case so that the pages
are spread out properly on NUMA systems.  That's exactly what the
large system hash allocator is going to do on bootup anyway.
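
(For reference, the boot-time path alluded to is alloc_large_system_hash() in mm/page_alloc.c; paraphrased from kernels of that era, not the exact source:)

        /* When hashdist is set, take the table from vmalloc space so
         * its pages interleave across NUMA nodes under the boot-time
         * memory policy. */
        if (hashdist)
                table = __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL);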

Yes, but at bootup an appropriate NUMA policy is active. (Well... we hope so, but it has broken several times in the past.)
I am not sure what kind of mm policy is in effect for scheduled work.

Anyway, I have some XX GB machines, non-NUMA, and I would love to have a 2^20-slot hash table without having to increase MAX_ORDER.
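
(For scale, assuming 4 KiB pages and 8-byte buckets, i.e. one pointer per slot:

        2^20 slots * 8 bytes = 8 MiB = 2^11 pages
        get_order(8 MiB) = 11 >= MAX_ORDER (11 by default)

so __get_free_pages() cannot satisfy the request, and short of raising MAX_ORDER only vmalloc() can.)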


Look, either both are right or both are wrong.  I'm just following
the protocol above, and you'll note the PRECISE same logic exists in
other dynamically growing hash table implementations, such as
net/xfrm/xfrm_hash.c.
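
(For reference, that allocator looked roughly like this at the time; quoted from memory, modulo the zeroing flags:)

void *xfrm_hash_alloc(unsigned int sz)
{
        void *n;

        if (sz <= PAGE_SIZE)
                n = kzalloc(sz, GFP_KERNEL);
        else if (hashdist)
                n = __vmalloc(sz, GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL);
        else
                n = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
                                             get_order(sz));
        return n;
}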



Yes, they are both wrong/dumb :)

Can we be smarter, or do we have to stay dumb? :)

enum rthash_alloc_kind {
        allocated_by_kmalloc,
        allocated_by_get_free_pages,
        allocated_by_vmalloc,
};

static struct rt_hash_bucket *rthash_alloc(unsigned int sz,
                                           enum rthash_alloc_kind *kind)
{
        struct rt_hash_bucket *n = NULL;

        if (sz <= PAGE_SIZE) {
                n = kmalloc(sz, GFP_KERNEL);
                *kind = allocated_by_kmalloc;
        } else if (!hashdist) {
                n = (struct rt_hash_bucket *)
                        __get_free_pages(GFP_KERNEL, get_order(sz));
                *kind = allocated_by_get_free_pages;
        }
        /* vmalloc() is both the fallback on allocation failure and
         * the primary path for hashdist with multi-page tables. */
        if (!n) {
                n = __vmalloc(sz, GFP_KERNEL, PAGE_KERNEL);
                *kind = allocated_by_vmalloc;
        }
        return n;
}
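
(The flag then makes the free path unambiguous; a matching sketch, with the enum and function names hypothetical, following the suggestion above:)

static void rthash_free(struct rt_hash_bucket *r, unsigned int sz,
                        enum rthash_alloc_kind kind)
{
        switch (kind) {
        case allocated_by_kmalloc:
                kfree(r);
                break;
        case allocated_by_get_free_pages:
                free_pages((unsigned long)r, get_order(sz));
                break;
        case allocated_by_vmalloc:
                vfree(r);
                break;
        }
}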
