Hi,

I did some testing, and the speedup results match yours. I was watching the kmem cache stats for delayed_node, and they seem to behave well: the number of active objects grows and shrinks as expected, from 3 up to about ~400 (creat_unlink 50000), and returns to the initial values after a few seconds. On my setup there are 13 delayed_node objects per page, which gives a good chance of quickly allocating the spare objects.
The delayed_node is 304 bytes, which wastes 208 bytes per node when allocated from the nearest kmalloc bucket of 512 bytes, so the recommendation to use a kmem_cache is justified. I was also thinking about shrinking the structure by a few bytes, down to 292 bytes; that would fit 14 objects per slab page. There are padding holes after:

* atomic_t count (4 bytes)
* bool inode_dirty (2 bytes)

You may want to reorder the fields like this:

* [no change up to inode_item]
* index_cnt
* refs
* count
* the bool fields

That still leaves 6 bytes to shave off, from 304 - 6 = 298 down to 292, which does not seem possible; the bool flags provide only 1 byte each. Under these conditions, I suggest making the bool fields int again, to reduce the instruction count for managing these flags. The compiler did not promote them to int automatically (although it is free to do so).

dave
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html