Hi,
On 15/08/18 18:04, Bob Peterson wrote:
Hi,
Before this patch, function gfs2_rgrp_bh_get would only update the
rgrp info from the buffer if the GFS2_RDF_UPTODATE flag was clear.
This went back to the days when gfs2 kept rgrp version numbers, and
re-read the buffer_heads constantly, not just when needed.
The problem is, RDF_UPTODATE is a local flag, but lvbs are changed
dynamically throughout the cluster. This is a serious problem when
using the rgrplvb mount option because of scenarios like this:
1. Node A mounts the file system, sets RDF_UPTODATE for rgrp X.
2. Node B mounts the file system, sets RDF_UPTODATE for rgrp X.
3. Node A deletes a large file, freeing up lots of blocks,
so the lvb gets updated.
At this point Node B must have invalidated it's copy of the rgrp, since
Node A must have an exclusive lock. That should have cleared the
GFS2_RDF_UPTODATE flag. So why did that not happen?
4. Node B now re-reads the rgrp, but because it's marked UPTODATE,
it decides not to update its in-core copy of rgrp X.
Why is it marked up to date when rgrp_go_inval() should have cleared it
when the rgrp lock was demoted to allow Node A it's exclusive lock?
At this point, Node B will have the wrong value for rgd->rd_free,
the amount of free space in the rgrp.
But there's no good reason not to grab the most recent values from
the buffer: it only costs us a few cpu cycles to read them.
This patch removes the UPTODATE check in favor of just always
reading the rgrp values in from the buffer we just read.
Signed-off-by: Bob Peterson <rpete...@redhat.com>
---
fs/gfs2/rgrp.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
I think we should understand why the flag is not set correctly. If it is
not working correctly for this case, then why can we still trust it for
the other check in update_rgrp_lvb() ?
Steve.
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 1ad3256b9cbc..6eec634eae2d 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1177,15 +1177,14 @@ static int gfs2_rgrp_bh_get(struct gfs2_rgrpd *rgd)
}
}
- if (!(rgd->rd_flags & GFS2_RDF_UPTODATE)) {
- for (x = 0; x < length; x++)
- clear_bit(GBF_FULL, &rgd->rd_bits[x].bi_flags);
- gfs2_rgrp_in(rgd, (rgd->rd_bits[0].bi_bh)->b_data);
- rgd->rd_flags |= (GFS2_RDF_UPTODATE | GFS2_RDF_CHECK);
- rgd->rd_free_clone = rgd->rd_free;
- /* max out the rgrp allocation failure point */
- rgd->rd_extfail_pt = rgd->rd_free;
- }
+ for (x = 0; x < length; x++)
+ clear_bit(GBF_FULL, &rgd->rd_bits[x].bi_flags);
+ gfs2_rgrp_in(rgd, (rgd->rd_bits[0].bi_bh)->b_data);
+ rgd->rd_flags |= (GFS2_RDF_UPTODATE | GFS2_RDF_CHECK);
+ rgd->rd_free_clone = rgd->rd_free;
+ /* max out the rgrp allocation failure point */
+ rgd->rd_extfail_pt = rgd->rd_free;
+
if (cpu_to_be32(GFS2_MAGIC) != rgd->rd_rgl->rl_magic) {
rgd->rd_rgl->rl_unlinked = cpu_to_be32(count_unlinked(rgd));
gfs2_rgrp_ondisk2lvb(rgd->rd_rgl,