On Mon, Jan 12, 2009 at 3:02 PM, Ian Kent <ra...@themaw.net> wrote:

> Here are a couple of patches we can use to check what is going on (my
> math may be wrong so it would be good if folks could check). They should
> apply to the git repo source.
>
> The first patch hash-report-stats.patch modifies cache_dump_cache() to
> report some statistics.
>
> The second patch hash-oaat-algo.patch makes the cache use Val's
> proposed oaat (one-at-a-time) algorithm, without changing the hash
> table size, for comparison.
>
> I used an indirect map with 35000 entries as a quick check.
>
> There are a couple of problems with this. First, the map keys are a
> bit random and so are not representative of actual maps. Second,
> since we use multiple caches in v5, the number of entries in a given
> cache may be quite different from the total number of entries.
>
> But, if my math is correct, we can get some marginally useful
> information from the experiment.
>
> The results:
>
> Current algorithm:
> cache: entries 35000 table size 77 ideal chain length 454.55
> chain length: empty 0 min 381 max 501 avg 454.55 std 22.78
>
> oaat algorithm:
> cache: entries 35000 table size 77 ideal chain length 454.55
> chain length: empty 0 min 402 max 532 avg 454.55 std 22.27
>
> This looks a bit odd; my math is probably broken somehow.
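For reference, the oaat hash under test is, in Bob Jenkins' commonly
published one-at-a-time form, something like the sketch below; the
actual patch may differ in detail.

#include <stdint.h>

/* Bob Jenkins' one-at-a-time hash, in its commonly published form.
 * A sketch for reference; the actual patch may differ in detail. */
static uint32_t oaat_hash(const char *key, uint32_t size)
{
        uint32_t hash = 0;

        while (*key) {
                hash += (unsigned char) *key++;
                hash += hash << 10;
                hash ^= hash >> 6;
        }
        hash += hash << 3;
        hash ^= hash >> 11;
        hash += hash << 15;

        return hash % size;     /* e.g. size 77 in the run above */
}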


Perhaps you could calculate the median instead of the mean?
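Since the mean chain length is pinned at entries / size for any hash
function (both runs show avg 454.55), it says nothing about the
distribution; the median would. Something like this would do it,
assuming the per-bucket chain lengths are already gathered into an
array (names here are illustrative, not the actual cache code):

#include <stdlib.h>

static int cmp_uint(const void *a, const void *b)
{
        unsigned int x = *(const unsigned int *) a;
        unsigned int y = *(const unsigned int *) b;
        return (x > y) - (x < y);
}

/* Median chain length over all buckets; sorts lens[] in place. */
static double chain_median(unsigned int *lens, unsigned int size)
{
        qsort(lens, size, sizeof(*lens), cmp_uint);
        if (size % 2)
                return lens[size / 2];
        return (lens[size / 2 - 1] + lens[size / 2]) / 2.0;
}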

> I have to conclude that while the oaat algorithm somehow looks worse,
> if the math is OK it is likely that there are more chains close to the
> average length and fewer outliers (smaller variance in chain length)
> than with the current algorithm.
>
> It would be good if we could get some real world numbers for both
> indirect and direct maps (mainly because direct maps have somewhat
> different keys: a full path compared to a directory component).
>
> I will do some more checks with the proposed large hash table size to
> see if the small table size is hiding any bias in the distribution.


On Mon, Jan 12, 2009 at 3:58 PM, Ian Kent <ra...@themaw.net> wrote:

> Here are some more stats.
>
> cache: entries 35000 table size 977 ideal chain length 35.82
> chain length: empty 627 min 0 max 284 avg 35.82 std 72.14
>
> cache: entries 35000 table size 977 ideal chain length 35.82
> chain length: empty 0 min 17 max 59 avg 35.82 std 6.05
>
>
> cache: entries 35000 table size 2039 ideal chain length 17.17
> chain length: empty 1689 min 0 max 284 avg 17.17 std 53.05
>
> cache: entries 35000 table size 2039 ideal chain length 17.17
> chain length: empty 0 min 6 max 32 avg 17.17 std 4.18
>
>
> cache: entries 35000 table size 4093 ideal chain length 8.55
> chain length: empty 3743 min 0 max 284 avg 8.55 std 38.41
>
> cache: entries 35000 table size 4093 ideal chain length 8.55
> chain length: empty 2 min 0 max 21 avg 8.55 std 2.99
>
>
> There's clearly something wrong with my min and probably my max
> calculation, but the stats still show some interesting information.


Why do you think that you've miscalculated the minimum?
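An empty bucket has chain length 0, so a min of 0 alongside a non-zero
empty count looks right to me. For cross-checking, a straightforward
accumulation over the buckets would be something like this (a sketch
with illustrative names, not the actual cache_dump_cache() code):

#include <limits.h>
#include <math.h>
#include <stdio.h>

/* Report chain length statistics for a table of 'size' buckets
 * holding 'entries' entries in total. One common pitfall is
 * initializing min to 0, which pins it there. */
void chain_stats(const unsigned int *lens, unsigned int size,
                 unsigned int entries)
{
        unsigned int i, empty = 0, min = UINT_MAX, max = 0;
        double avg = (double) entries / size;   /* "ideal" chain length */
        double var = 0.0;

        for (i = 0; i < size; i++) {
                if (lens[i] == 0)
                        empty++;
                if (lens[i] < min)
                        min = lens[i];
                if (lens[i] > max)
                        max = lens[i];
                var += ((double) lens[i] - avg) * ((double) lens[i] - avg);
        }

        printf("chain length: empty %u min %u max %u avg %.2f std %.2f\n",
               empty, min, max, avg, sqrt(var / size));
}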

> It looks like the additive algorithm is biased toward a small range of
> hash indexes, and the bias becomes more significant as the table size
> increases. So the oaat distribution is significantly better by
> comparison as the table size increases.
>
> I'm not clear on what Paul's graphs are displaying, but my impression
> is that they also show this same distribution bias toward clustering
> for the additive hash. Assuming they were done using actual map keys,
> that would indicate that the benefit is likely fairly general.


I plotted the number of elements in each bucket.

As you now see, the additive hash function doesn't scale.
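The reason falls out of the shape of the function. A purely additive
hash (an illustrative sketch below; the actual autofs code may differ)
can only produce values up to roughly 255 times the key length, so once
the table size grows past that bound the upper buckets are unreachable:

#include <stdint.h>

/* Illustrative additive hash of the general shape being replaced.
 * Summing bytes bounds the result by about 255 * strlen(key), so
 * for short map keys only a narrow band of buckets is reachable
 * once the table is large; hence the clustering in the stats above. */
static uint32_t additive_hash(const char *key, uint32_t size)
{
        uint32_t hash = 0;

        while (*key)
                hash += (unsigned char) *key++;

        return hash % size;
}

With typical map keys of a dozen or so lowercase characters the sums
land well below 2039 or 4093, which would match the growing empty
counts in your tables.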

> The only question remaining is how large we should make the table.
> A size of around 4k will lead to a lot of wasted space for many common
> configurations (including configurations that have many map entries
> spread over several individual maps), but a size of around 1k still
> gives fairly long chains for large maps.
>
> Suggestions?


Perhaps you could make it configurable?
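For example, the size could come from a configuration entry and be
rounded up to a nearby prime, since a prime modulus spreads the hash
better. This is a hypothetical sketch; neither the config knob nor
these function names exist yet:

/* Hypothetical: derive the hash table size from a configured value,
 * rounding up to the next prime. Names are invented for illustration. */
static unsigned int is_prime(unsigned int n)
{
        unsigned int d;

        if (n < 2)
                return 0;
        for (d = 2; d * d <= n; d++)
                if (n % d == 0)
                        return 0;
        return 1;
}

unsigned int mapent_hash_size(unsigned int configured)
{
        unsigned int size = configured ? configured : 1021; /* default */

        while (!is_prime(size))
                size++;
        return size;
}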

> PS. I'll need to make a couple of other changes as well, e.g. the
> null map entry cache is typically very small.