-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 11/28/2011 06:53 PM, Andi Kleen wrote: > Jeff Mahoney <je...@suse.com> writes: > >> The extent_state structure is used at the core of the extent i/o >> code for managing flags, locking, etc. It requires allocations >> deep in the write code and if failures occur they are difficult >> to recover from. >> >> We avoid most of the failures by using a mempool, which can sleep >> when required, to honor the allocations. This allows future >> patches to convert most of the {set,clear,convert}_extent_bit and >> derivatives to return void. > > Is this really safe? > > iirc if there's any operation that needs multiple mempool objects > you can get into the following ABBA style deadlock: > > thread 1 thread 2 > > get object A from pool 1 get object C from pool 2 get object B from > pool 2 pool 2 full -> block get object D from pool 2 ^ pool1, i assume > pool1 full -> block > > > Now for thread 1 to make progress it needs thread 2 to free its > object first, but that needs thread 1 to free its object also > first, which is a deadlock. > > It would still work if there are other users which eventually make > progress, but if you're really unlucky either pool 1 or 2 is > complete used by threads doing a multiple object operation. So you > got a nasty rare deadlock ...
Yes, I think you're right. I think the risk is probably there and I'm not sure it can be totally mitigated. We'd be stuck if there is a string of pathological cases and both mempool are empty. The only way I can see to try to help the situation is to make the mempool fairly substantial, which is what I did here. I'd prefer to make the mempools per-fs but that would require pretty heavy modifications in order to pass around a per-fs struct. In any case, the risk isn't totally eliminated. > The only way to handle this would be to always preallocate all > objects needed for a operation in an atomic way. This was how I wanted to do it at first but I couldn't see a way to make it work in a manner that wouldn't kill performance since the calculations effectively need to be done twice and are pretty common. The other issue is that operations can be pretty complex and preallocating for all cases will either be really wasteful or really error prone. > Or never use more than one object. I need to do some analysis here to see if that is even possible. - -Jeff - -- Jeff Mahoney SUSE Labs -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.18 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQIbBAEBAgAGBQJO1CFxAAoJEB57S2MheeWyL1EP+Pd+c1hodWEA6StTnE1pG+Tl u5Rg+VBmN8AdcfdOzvyoyfZgPaCxtHzaJVX2y3PzHtY1INEMp2EoijaL1X9afKQF lJP1BmpIARARMrQrYbqvZrl+DGUeIwIhC2LBuNffYifbMAn3KdTLRhkvcGehTmQO Vik1bgmo6ZKjAIP/FLViGdB4hITITwHtT7n4bDyZ9ATgqgq5sGUWnw7f0sILeBqR NOt8QVgaCmzZIF5foUr2UsBBxjo0joiW2V7YWs4roJeCtDx+6mwyRifQFTGuN2xi 1CVTnTT+FYXkD8IB6mKNHptIDLu7q29RaikSFsgV684IYAp510OBStgBuCpBeeNc Xd77XJkf9UCYSXWs1j587MnOqaeV818OqvwxHAGZnavBHZ/+8AFOJ77GAVaP1Xlm M6gYP1TXjL0yjJEWNVtP0kBjimKUFw5ECv86DCQllRTY/5oWmAxjyJviEzVgJBiM EDtzf17APyiTSWawRZlDc6a46kG2Jm6QiNmx4MyAlSFu81Qv0EDOpC4TAwiRDZKa KLjaW10TKcC5TDwZXcOBiF/PrpvBCvJjd0N3sCFVD0Tn0cdqME9BuFUP1hI1ku+T L5VyzHsp07ndCk+H6gGWkCZzEtSSyWFG4gHkFcV7z/IF5bPpsyxcxdGXGzBGbYUh 9NdrCz5lMb49VTBBSaY= =ZXGi -----END PGP SIGNATURE----- -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html