Re: [Gluster-devel] fstat problems when killing with stat prefetch turned on
Miklos,

Thanks for this patch. The team will review it and let you know whether it is a good fix.

-Amar

On Thu, May 4, 2017 at 6:59 PM, Miklós Fokin wrote:
> Sorry, missing lines from the attachment.
>
> On 05/04/2017 03:24 PM, Miklós Fokin wrote:
>
> Hello,
>
> I seem to have discovered what caused half of the problem.
> I did update the bug report with a more detailed description, but the
> short version is that the attached diff solves the issue when we get an
> fstat with a size of 0 after killing a brick (not letting the first update
> to fsync be from an arbiter).
> My question is: should I make a review about it or should further needed
> changes be investigated first?
>
> Best regards,
> Miklós
>
> On 04/26/2017 12:58 PM, Miklós Fokin wrote:
>
> Thanks for the response.
> We didn't have the options set that the first two reviews were about.
> The third was about changes to performance.readdir-ahead.
> I turned this feature off today with prefetch being turned on on my
> computer, and the bug still appeared, so I would think that the commit
> would not fix it either.
>
> Best regards,
> Miklós
>
> On 04/25/2017 01:26 PM, Raghavendra Gowdappa wrote:
>
> Recently we had worked on some patches to ensure correct stats are
> returned.
>
> https://review.gluster.org/15759
> https://review.gluster.org/15659
> https://review.gluster.org/16419
>
> Referring to these patches and bugs associated with them might give you
> some insight into the nature of the problem. The major culprit was
> interaction between readdir-ahead and stat-prefetch. So, the issue you are
> seeing might be addressed by these patches.
>
> ----- Original Message -----
> From: "Miklós Fokin"
> To: gluster-devel@gluster.org
> Sent: Tuesday, April 25, 2017 3:42:52 PM
> Subject: [Gluster-devel] fstat problems when killing with stat prefetch
> turned on
>
> Hello,
>
> I tried reproducing the problem that Mateusz Slupny was experiencing
> before (stat returning bad st_size value on self-healing) on my own
> computer with only 3 bricks (one being an arbiter) on 3.10.0.
> The result with such a small setup was that the bug appeared both on
> killing and during the self-healing process, but only rarely (once in
> hundreds of tries) and only with performance.stat-prefetch turned on.
> This might be a completely different issue as on the setup Matt was
> using, he could reproduce it with the mentioned option being off, it
> always happened but only during recovery, not after killing.
> I did submit a bug report about this:
> https://bugzilla.redhat.com/show_bug.cgi?id=1444892.
>
> The problem is as Matt wrote is that this causes data corruption if one
> is to use the returned size on writing.
> Could I get some pointers as to what parts of the gluster code I should
> be looking at to figure out what the problem might be?
>
> Thanks in advance,
> Miklós

--
Amar Tumballi (amarts)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] fstat problems when killing with stat prefetch turned on
Sorry, missing lines from the attachment.

diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
index ac834e9..d6185ca 100644
--- a/xlators/cluster/afr/src/afr-common.c
+++ b/xlators/cluster/afr/src/afr-common.c
@@ -3318,6 +3318,7 @@ afr_fsync_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
         int child_index = (long) cookie;
         int read_subvol = 0;
         call_stub_t *stub = NULL;
+        afr_private_t *private = this->private;

         local = frame->local;

@@ -3327,7 +3328,8 @@ afr_fsync_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
         LOCK (&frame->lock);
         {
                 if (op_ret == 0) {
-                        if (local->op_ret == -1) {
+                        if (local->op_ret == -1 && this->private &&
+                            !AFR_IS_ARBITER_BRICK (private, child_index)) {
                                 local->op_ret = 0;

                                 local->cont.inode_wfop.prebuf = *prebuf;
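To make the intent of the diff above easier to follow, here is a minimal, self-contained sketch of the same idea outside GlusterFS: the first successful fsync reply is recorded as the result, unless it comes from the arbiter. The types `reply_stat` and `fsync_result` and both functions are hypothetical stand-ins, not the real AFR structures, and the sketch assumes that an arbiter brick reports a file size of 0 because it holds no file data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the AFR structures; not the real GlusterFS types. */
struct reply_stat {
    uint64_t size;              /* st_size reported by one brick */
};

struct fsync_result {
    int op_ret;                 /* -1 until a usable reply has been recorded */
    struct reply_stat postbuf;  /* stat that is handed back to the caller */
};

/* Record one brick's successful fsync reply. A reply from the arbiter is
 * skipped (assumption: the arbiter stores no data and reports size 0). */
static void record_fsync_reply(struct fsync_result *res,
                               const struct reply_stat *reply,
                               bool from_arbiter)
{
    if (res->op_ret == -1 && !from_arbiter) {
        res->op_ret = 0;
        res->postbuf = *reply;
    }
}

/* Feed replies in arrival order and return the size the caller would see. */
static uint64_t size_after_replies(const struct reply_stat *replies,
                                   const bool *is_arbiter, int count)
{
    struct fsync_result res = { .op_ret = -1, .postbuf = { 0 } };

    for (int i = 0; i < count; i++)
        record_fsync_reply(&res, &replies[i], is_arbiter[i]);
    return res.postbuf.size;
}
```

With the arbiter answering first and a data brick second, `size_after_replies` returns the data brick's size; without the arbiter check the first reply wins and the size is 0. Note that in this model `op_ret` stays at -1 if only the arbiter answers, which may be worth checking during review.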
Re: [Gluster-devel] fstat problems when killing with stat prefetch turned on
Hello,

I seem to have discovered what caused half of the problem.
I updated the bug report with a more detailed description, but the short version is that the attached diff solves the case where we get an fstat with a size of 0 after killing a brick (by not letting the first update in the fsync callback come from an arbiter).
My question is: should I submit it for review, or should the further changes that are needed be investigated first?

Best regards,
Miklós

diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
index ac834e9..8ef170d 100644
--- a/xlators/cluster/afr/src/afr-common.c
+++ b/xlators/cluster/afr/src/afr-common.c
@@ -3318,6 +3318,7 @@ afr_fsync_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
         int child_index = (long) cookie;
         int read_subvol = 0;
         call_stub_t *stub = NULL;
+        afr_private_t *private = this->private;

         local = frame->local;

@@ -3327,6 +3328,9 @@ afr_fsync_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
         LOCK (&frame->lock);
         {
                 if (op_ret == 0) {
+                        if (local->op_ret == -1 && this->private &&
+                            !AFR_IS_ARBITER_BRICK (private, child_index)) {
+
                         if (local->op_ret == -1) {
                                 local->op_ret = 0;
Re: [Gluster-devel] fstat problems when killing with stat prefetch turned on
Thanks for the response.

We didn't have the options set that the first two reviews were about.
The third was about changes to performance.readdir-ahead.
I turned that feature off on my computer today while leaving stat-prefetch on, and the bug still appeared, so I don't think that commit would fix it either.

Best regards,
Miklós
Re: [Gluster-devel] fstat problems when killing with stat prefetch turned on
Recently we worked on some patches to ensure correct stats are returned.

https://review.gluster.org/15759
https://review.gluster.org/15659
https://review.gluster.org/16419

Referring to these patches and the bugs associated with them might give you some insight into the nature of the problem. The major culprit was the interaction between readdir-ahead and stat-prefetch, so the issue you are seeing might be addressed by these patches.
[Gluster-devel] fstat problems when killing with stat prefetch turned on
Hello,

I tried reproducing the problem that Mateusz Slupny was experiencing before (stat returning a bad st_size value on self-healing) on my own computer with only 3 bricks (one being an arbiter) on 3.10.0.
With such a small setup, the bug appeared both on killing a brick and during the self-healing process, but only rarely (once in hundreds of tries) and only with performance.stat-prefetch turned on.
This might be a completely different issue: on the setup Matt was using, he could reproduce it with that option off, and it always happened, but only during recovery, not after killing.
I did submit a bug report about this: https://bugzilla.redhat.com/show_bug.cgi?id=1444892.

The problem, as Matt wrote, is that this causes data corruption if one uses the returned size when writing.
Could I get some pointers as to which parts of the gluster code I should be looking at to figure out what the problem might be?

Thanks in advance,
Miklós
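To illustrate why a wrong st_size leads to data corruption when the size is used for writing, here is a small sketch in plain POSIX C, independent of GlusterFS. The helper names `append_at_reported_size` and `demo_append`, and the temporary file name pattern, are hypothetical; the sketch only assumes that the application appends by writing at the offset a stat call reported.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Append by writing at the offset that a stat call reported as the file
 * size. If reported_size is stale (for example 0), existing data is
 * overwritten instead of being extended. */
static ssize_t append_at_reported_size(int fd, off_t reported_size,
                                       const char *data, size_t len)
{
    return pwrite(fd, data, len, reported_size);
}

/* Hypothetical demo: write "AAAA" to a temporary file, "append" "BB" at
 * reported_size, then copy the file content into out (NUL-terminated).
 * Returns 0 on success, -1 on any failure. */
static int demo_append(off_t reported_size, char *out, size_t out_len)
{
    char path[] = "/tmp/stat_size_demo_XXXXXX";
    int fd;
    int rc = -1;

    if (out_len == 0)
        return -1;
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);   /* the file goes away once fd is closed */

    if (pwrite(fd, "AAAA", 4, 0) == 4 &&
        append_at_reported_size(fd, reported_size, "BB", 2) == 2) {
        ssize_t n = pread(fd, out, out_len - 1, 0);
        if (n >= 0) {
            out[n] = '\0';
            rc = 0;
        }
    }
    close(fd);
    return rc;
}
```

Calling `demo_append(4, buf, sizeof buf)` leaves the original four bytes intact and extends the file, whereas `demo_append(0, buf, sizeof buf)` overwrites the start of the file, which is the kind of corruption described above.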