2011/11/28 Alexandre Oliva <ol...@lsd.ic.unicamp.br>:
> We're failing to create clusters with bitmaps because
> setup_cluster_no_bitmap checks that the list is empty before inserting
> the bitmap entry in the list for setup_cluster_bitmap, but the list
> field is only initialized when it is restored from the on-disk free
> space cache, or when it is written out to disk.
>
> Besides a potential race condition due to the multiple use of the list
> field, filesystem performance severely degrades over time: as we use
> up all non-bitmap free extents, the try-to-set-up-cluster dance is
> done at every metadata block allocation.  For every block group, we
> fail to set up a cluster, and after failing on them all up to twice,
> we fall back to the much slower unclustered allocation.
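(A quick aside for anyone reading along without the source handy: the pitfall described above boils down to a list_head member that is only zeroed by the allocator and never passed through INIT_LIST_HEAD(), so list_empty() never reports it as empty and the bitmap entry is silently skipped. Below is a minimal standalone sketch of that behaviour - the names free_space_entry and bitmaps are made up for the demo, it is not the actual btrfs code path.)

/*
 * Standalone illustration (not the real btrfs code) of the
 * uninitialized-list pitfall: a zero-allocated list_head does not
 * look "empty" to list_empty(), so a check of the form
 * "only queue this entry if list_empty(&entry->list)" skips it.
 */
#include <stdio.h>
#include <stdlib.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Same definition the kernel uses: empty means next points back to head. */
static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

static void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

struct free_space_entry {
	int bitmap;			/* stand-in for "this entry is a bitmap" */
	struct list_head list;
};

int main(void)
{
	struct list_head bitmaps;
	INIT_LIST_HEAD(&bitmaps);

	/* Zero-allocated, like kzalloc(): list.next == list.prev == NULL. */
	struct free_space_entry *entry = calloc(1, sizeof(*entry));
	entry->bitmap = 1;

	/*
	 * The check that goes wrong: a zeroed list_head is NOT "empty"
	 * by list_empty()'s definition, so the entry never reaches the
	 * bitmaps list and the bitmap-cluster path never sees it.
	 */
	if (entry->bitmap && list_empty(&entry->list))
		list_add(&entry->list, &bitmaps);

	printf("without INIT_LIST_HEAD: queued = %s\n",
	       list_empty(&bitmaps) ? "no" : "yes");	/* prints "no" */

	/* With the field properly initialized, the same check succeeds. */
	INIT_LIST_HEAD(&entry->list);
	if (entry->bitmap && list_empty(&entry->list))
		list_add(&entry->list, &bitmaps);

	printf("after INIT_LIST_HEAD:   queued = %s\n",
	       list_empty(&bitmaps) ? "no" : "yes");	/* prints "yes" */

	free(entry);
	return 0;
}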
This matches exactly what I've been observing in our ceph cluster. I've now installed your patches (1-11) on two servers, and the cluster setup problem seems to be gone - a big thanks for that!

However, another thing is causing me some headache: when I'm doing heavy reading in our ceph cluster, the load and wait-io on the patched servers is higher than on the unpatched ones.

Dstat from an unpatched server:

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  1   6  83   8   0   1|  22M  348k| 336k   93M|   0     0 |8445  3715
  1   5  87   7   0   1|  12M 1808k| 214k   65M|   0     0 |5461  1710
  1   3  85  10   0   0|  11M  640k| 313k   49M|   0     0 |5919  2853
  1   6  84   9   0   1|  12M  608k| 358k   69M|   0     0 |7406  3645
  1   7  78  13   0   1|  15M 5344k| 348k  105M|   0     0 |9765  4403
  1   7  80  10   0   1|  22M 1368k| 358k   89M|   0     0 |8036  3202
  1   9  72  16   0   1|  22M 2424k| 646k  137M|   0     0 | 12k  5527

Dstat from a patched server:

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  1   2  61  35   0   0|2500k 2736k| 141k   34M|   0     0 |4415  1603
  1   4  48  47   0   1|  10M 3924k| 353k   61M|   0     0 |6871  3771
  1   5  55  38   0   1|  10M 1728k| 385k   92M|   0     0 |8030  2617
  2   8  69  20   0   1|  18M 1384k| 435k  130M|   0     0 | 10k  4493
  1   5  85   8   0   1|7664k   84k| 287k   97M|   0     0 |6231  1357
  1   3  91   5   0   0|  10M  144k| 194k   44M|   0     0 |3807  1081
  1   7  66  25   0   1|  20M 1248k| 404k  101M|   0     0 |8676  3632
  0   3  38  58   0   0|8104k 2660k| 176k   40M|   0     0 |4841  2093

This seems to be coming from "btrfs-endio-1", a kernel thread that had not caught my attention on unpatched systems yet. I did some tracing on that process with ftrace, and I can see that the time is wasted in end_bio_extent_readpage(). In a single call to end_bio_extent_readpage(), the functions unlock_extent_cached(), unlock_page() and btrfs_readpage_end_io_hook() are each invoked 128 times.

Do you have any idea what's going on here? (Please note that the filesystem is still unmodified - the metadata overhead is large.)

Thanks,
Christian
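P.S. For reference, this is roughly the shape of what I think I'm seeing in the trace: one call into the completion handler walks every page attached to the I/O and runs the per-page hooks once each, so if these are 4 KiB pages, 128 of them would be a 512 KiB read per call. The sketch below is a simplified stand-in with invented names (fake_bio, my_endio, per_page_hook), not the actual end_bio_extent_readpage() code.

/*
 * Simplified stand-in (invented names, not btrfs code) for a read
 * completion handler that processes all pages of one I/O, running the
 * per-page hooks once per page -- which is how a single handler call
 * can show up as 128 calls to each hook for a 128-page read.
 */
#include <stdio.h>

#define PAGES_PER_IO 128	/* 128 * 4 KiB pages = 512 KiB per I/O */

struct page_seg { unsigned long index; };

struct fake_bio {
	int nr_pages;
	struct page_seg pages[PAGES_PER_IO];
};

static int hook_calls, unlock_calls;

/* Stand-ins for the per-page hooks seen in the trace. */
static void per_page_hook(struct page_seg *pg) { (void)pg; hook_calls++; }
static void unlock_seg(struct page_seg *pg)    { (void)pg; unlock_calls++; }

/* One completion call processes every page attached to the I/O. */
static void my_endio(struct fake_bio *bio)
{
	for (int i = 0; i < bio->nr_pages; i++) {
		per_page_hook(&bio->pages[i]);
		unlock_seg(&bio->pages[i]);
	}
}

int main(void)
{
	struct fake_bio bio = { .nr_pages = PAGES_PER_IO };

	my_endio(&bio);		/* a single call ... */

	printf("per-page hook: %d calls, unlock: %d calls\n",
	       hook_calls, unlock_calls);	/* ... 128 calls each */
	return 0;
}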