Hi,

Before this patch, function gfs2_trans_add_meta called list_add to add a
buffer to the transaction buffer list, tr_buf. Later, the before_commit
functions traversed the list from the front, and since list_add prepends,
the buffers were processed in the reverse of the order in which they were
added. For example, blocks could go out in 54321 order rather than 12345,
causing the disk heads to bounce unnecessarily.
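To illustrate the ordering difference, here is a minimal userspace
sketch: my own stand-alone mimic of the list_add()/list_add_tail()
semantics from <linux/list.h>, with an illustrative demo_bd struct in
place of gfs2_bufdata, not the kernel code itself. It adds five blocks
to a list both ways and walks the list forward:

#include <stdio.h>
#include <stddef.h>

/* Simplified doubly linked list, mimicking <linux/list.h> semantics. */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void __list_add(struct list_head *new, struct list_head *prev,
		       struct list_head *next)
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}

/* Insert just after the head: a forward walk sees newest entries first. */
static void list_add(struct list_head *new, struct list_head *head)
{
	__list_add(new, head, head->next);
}

/* Insert just before the head: a forward walk sees insertion order. */
static void list_add_tail(struct list_head *new, struct list_head *head)
{
	__list_add(new, head->prev, head);
}

/* Stand-in for a buffer descriptor; "blkno" is illustrative only. */
struct demo_bd {
	int blkno;
	struct list_head bd_list;
};

static void walk(const char *label, struct list_head *head)
{
	struct list_head *pos;

	printf("%-14s", label);
	for (pos = head->next; pos != head; pos = pos->next)
		printf(" %d", list_entry(pos, struct demo_bd, bd_list)->blkno);
	printf("\n");
}

int main(void)
{
	struct list_head head = LIST_HEAD_INIT(head);
	struct demo_bd bd[5];
	int i;

	for (i = 0; i < 5; i++)
		bd[i].blkno = i + 1;

	for (i = 0; i < 5; i++)			/* old behavior */
		list_add(&bd[i].bd_list, &head);
	walk("list_add:", &head);		/* prints: 5 4 3 2 1 */

	head.next = head.prev = &head;		/* reset the list */
	for (i = 0; i < 5; i++)			/* patched behavior */
		list_add_tail(&bd[i].bd_list, &head);
	walk("list_add_tail:", &head);		/* prints: 1 2 3 4 5 */

	return 0;
}

With list_add() the forward walk prints 5 4 3 2 1; with list_add_tail()
it prints 1 2 3 4 5, the same order in which the buffers were added.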
This makes no difference for small IO operations, but large writes
benefit greatly when they add lots of indirect blocks to a transaction
and those blocks are allocated in ascending order, as the block
allocator tries to do. This patch changes the call to list_add_tail so
the buffers are traversed (and therefore written back) in the same
order in which they were added to the transaction.

As a more extreme example, I ran a test with 10 simultaneous instances
of iozone, each writing 25GB of data, on one of our performance testing
machines. The results are:

                          Without the patch   With the patch
                          -----------------   ------------------
Children see throughput   1395068.00 kB/sec   1527073.61 kB/sec
Parent sees throughput    1348544.00 kB/sec   1485594.66 kB/sec

These numbers are artificially inflated because I was also running with
Andreas Gruenbacher's iomap-write patch set plus my rgrp sharing patch
set in both cases. Still, it shows a 9 percent performance boost for
both children and parent throughput.

Signed-off-by: Bob Peterson <rpete...@redhat.com>
---
 fs/gfs2/trans.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/gfs2/trans.c b/fs/gfs2/trans.c
index c75cacaa349b..c4c00b90935c 100644
--- a/fs/gfs2/trans.c
+++ b/fs/gfs2/trans.c
@@ -247,7 +247,7 @@ void gfs2_trans_add_meta(struct gfs2_glock *gl, struct buffer_head *bh)
 	gfs2_pin(sdp, bd->bd_bh);
 	mh->__pad0 = cpu_to_be64(0);
 	mh->mh_jid = cpu_to_be32(sdp->sd_jdesc->jd_jid);
-	list_add(&bd->bd_list, &tr->tr_buf);
+	list_add_tail(&bd->bd_list, &tr->tr_buf);
 	tr->tr_num_buf_new++;
 out_unlock:
 	gfs2_log_unlock(sdp);