Re: [Gluster-devel] fstat problems when killing with stat prefetch turned on

2017-05-04 Thread Amar Tumballi
Miklos, thanks for this patch.

The team will review this and let you know whether it is a good fix.

-Amar



On Thu, May 4, 2017 at 6:59 PM, Miklós Fokin wrote:

> Sorry, the previous attachment was missing some lines.



-- 
Amar Tumballi (amarts)
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] fstat problems when killing with stat prefetch turned on

2017-05-04 Thread Miklós Fokin

Sorry, the previous attachment was missing some lines.


diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
index ac834e9..d6185ca 100644
--- a/xlators/cluster/afr/src/afr-common.c
+++ b/xlators/cluster/afr/src/afr-common.c
@@ -3318,6 +3318,7 @@ afr_fsync_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
 	int child_index = (long) cookie;
 	int read_subvol = 0;
 	call_stub_t *stub = NULL;
+	afr_private_t *private = this->private;
 
 	local = frame->local;
 
@@ -3327,7 +3328,8 @@ afr_fsync_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
 	LOCK (&frame->lock);
 	{
 		if (op_ret == 0) {
-			if (local->op_ret == -1) {
+			if (local->op_ret == -1 && this->private &&
+			    !AFR_IS_ARBITER_BRICK (private, child_index)) {
 				local->op_ret = 0;
 
 				local->cont.inode_wfop.prebuf  = *prebuf;

Re: [Gluster-devel] fstat problems when killing with stat prefetch turned on

2017-05-04 Thread Miklós Fokin

Hello,

I seem to have found the cause of half of the problem.
I have updated the bug report with a more detailed description. The
short version is that the attached diff fixes the case where fstat
returns a size of 0 after a brick is killed (by not letting the first
successful fsync reply that updates the result come from the arbiter
brick).
My question is: should I submit it for review now, or should the other
needed changes be investigated first?


Best regards,
Miklós


diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
index ac834e9..8ef170d 100644
--- a/xlators/cluster/afr/src/afr-common.c
+++ b/xlators/cluster/afr/src/afr-common.c
@@ -3318,6 +3318,7 @@ afr_fsync_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
 	int child_index = (long) cookie;
 	int read_subvol = 0;
 	call_stub_t *stub = NULL;
+	afr_private_t *private = this->private;
 
 	local = frame->local;
 
@@ -3327,6 +3328,9 @@ afr_fsync_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
 	LOCK (&frame->lock);
 	{
 		if (op_ret == 0) {
+			if (local->op_ret == -1 && this->private &&
+			    !AFR_IS_ARBITER_BRICK (private, child_index)) {
+
 			if (local->op_ret == -1) {
 				local->op_ret = 0;
 

Re: [Gluster-devel] fstat problems when killing with stat prefetch turned on

2017-04-26 Thread Miklós Fokin

Thanks for the response.
We did not have the options enabled that the first two patches address.
The third patch concerns performance.readdir-ahead.
Today I turned that option off on my machine while keeping stat-prefetch
turned on, and the bug still appeared, so I don't think that commit would
fix it either.


Best regards,
Miklós



Re: [Gluster-devel] fstat problems when killing with stat prefetch turned on

2017-04-25 Thread Raghavendra Gowdappa
We recently worked on some patches to ensure that correct stats are returned.

https://review.gluster.org/15759
https://review.gluster.org/15659
https://review.gluster.org/16419

Looking into these patches and the bugs associated with them might give you some
insight into the nature of the problem. The major culprit was the interaction
between readdir-ahead and stat-prefetch, so the issue you are seeing might
already be addressed by these patches.


[Gluster-devel] fstat problems when killing with stat prefetch turned on

2017-04-25 Thread Miklós Fokin

Hello,

I tried to reproduce the problem Mateusz Slupny was seeing earlier
(stat returning a bad st_size value during self-heal) on my own
computer, with only 3 bricks (one of them an arbiter) on 3.10.0.
With such a small setup, the bug appeared both after killing a brick
and during self-heal, but only rarely (once in hundreds of tries) and
only with performance.stat-prefetch turned on.
This might be a completely different issue: on the setup Matt was
using, he could reproduce it with that option turned off, and it
always happened, but only during recovery, not after killing a brick.
I have submitted a bug report about this:
https://bugzilla.redhat.com/show_bug.cgi?id=1444892.


The problem, as Matt wrote, is that this causes data corruption if the
returned size is used when writing.
Could I get some pointers to the parts of the Gluster code I should
look at to figure out what the problem might be?


Thanks in advance,
Miklós
