On Wed, Mar 28 2018, Herbert Xu wrote: > On Wed, Mar 28, 2018 at 08:34:19AM +1100, NeilBrown wrote: >> >> It is easy to get an -EBUSY insertion failure when .disable_count is >> enabled, and I did get that. Blindly propagating that up caused lustre >> to get terribly confused - not too surprising really. > > Right, so this failure mode is specific to your patch 6.
I disagree. My patch 6 only makes it common instead of exceedingly rare. If any table in the list other than the first has a chain with 16 elements, then trying to insert an element with a hash which matches that chain will fail with -EBUSY. This is theoretically possible already, though astronomically unlikely. So that case will never be tested for. > > I think I see the problem. As it currently stands, we never > need growing when we hit the bucket length limit. We only do > rehashes. > > With your patch, you must change it so that we actually try to > grow the table if necessary when we hit the bucket length limit. It is hard to know if it is necessary. And making the new table larger will make the error less likely, but still won't make it impossible. So callers will have to handle it - just like they currently have to handle -ENOMEM even though it is highly unlikely (and not strictly necessary). Are these errors ever actually useful? I thought I had convinced myself before that they were (to throttle attacks on the hash function), but they happen even less often than I thought. > > Otherwise it will result in the EBUSY that you're seeing. > > I laso think that we shouldn't make this a toggle. If we're going > to do disable_count then it should be unconditionally done for > everyone. > > However, I personally would prefer a percpu elem count instead of > disabling it completely. Would that be acceptable to your use-case? Maybe. Reading a percpu counter isn't cheap. Reading it whenever a hash chain reaches 16 is reasonable, but I think we would want to read it a lot more often than that. So probably store the last-sampled time (with no locking) and only sample the counter if last-sampled is more than jiffies - 10*HZ (???) In the case in lustre we also shard the LRU list so that adding to the LRU causes minimal contention. Keeping a shared counter together with the lru is trivial and summing them periodically is little burden. Maybe it makes sense to include that functionality if rhashtables so that it is there for everyone. A percpu counter uses a lot more memory than atomic_t. Given that some callers set nelem_hint to 2 or 3, it seem likely that those callers don't want to waste memory. Should we force them to? Thanks, NeilBrown
signature.asc
Description: PGP signature