[Gluster-devel] Race with volfile notification and stopping of brick

2017-07-03 Thread Hari Gowtham
Hi,

I came across a situation where there were a few IOs going to the subvolume
which was not available. The situation happens due to the following.

During the remove brick commit the following things happen, the brick stop,
volfile creation, and volfile change notification to client.

The order in which this happens is
1) the brick is stopped.
2) the volfile are created and then the notification go to the client.
This way there is a window between the brick stop and the clients being
notified that the brick has been stopped.

The brick is unavailable and the IO is coming to the stopped brick as the
client is unaware of the volfile change for a while. And this results in an
IO failure.

So I feel its better to do it in the following order:
1) create the volfile.
2) notify the client.
3) stop the brick.

This way the clients are notified and the IO starts going to the right
subvol and the brick is available till then and as the brick is stopped
after this the condition is resolved.

As this change is on the basic functionality, I thought of bringing it up
here to everyones notice.
If you find anything that could break because of this change, or feel if
there is a better way to handle this, Do let me know.

Thanks to Du, Atin, Kaushal and Nithya for helping me with this.

Regards,
Hari.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] geo-rep regression because of node-uuid change

2017-07-03 Thread Pranith Kumar Karampuri
Xavi,
  Now that the change has been reverted, we can resume this discussion
and decide on the exact format that considers, tier, dht, afr, ec. People
working geo-rep/dht/afr/ec had an internal discussion and we all agreed
that this proposal would be a good way forward. I think once we agree on
the format and decide on the initial encoding/decoding functions of the
xattr and this change is merged, we can send patches on afr/ec/dht and
geo-rep to take it to closure.

Could you propose the new format you have in mind that considers all of the
xlators?



On Wed, Jun 21, 2017 at 2:08 PM, Karthik Subrahmanya 
wrote:

>
>
> On Wed, Jun 21, 2017 at 1:56 PM, Xavier Hernandez 
> wrote:
>
>> That's ok. I'm currently unable to write a patch for this on ec.
>
> Sunil is working on this patch.
>
> ~Karthik
>
>> If no one can do it, I can try to do it in 6 - 7 hours...
>>
>> Xavi
>>
>>
>> On Wednesday, June 21, 2017 09:48 CEST, Pranith Kumar Karampuri <
>> pkara...@redhat.com> wrote:
>>
>>
>>
>>
>> On Wed, Jun 21, 2017 at 1:00 PM, Xavier Hernandez 
>> wrote:
>>>
>>> I'm ok with reverting node-uuid content to the previous format and
>>> create a new xattr for the new format. Currently, only rebalance will use
>>> it.
>>>
>>> Only thing to consider is what can happen if we have a half upgraded
>>> cluster where some clients have this change and some not. Can rebalance
>>> work in this situation ? if so, could there be any issue ?
>>
>>
>> I think there shouldn't be any problem, because this is in-memory xattr
>> so layers below afr/ec will only see node-uuid xattr.
>> This also gives us a chance to do whatever we want to do in future with
>> this xattr without any problems about backward compatibility.
>>
>> You can check https://review.gluster.org/#/c
>> /17576/3/xlators/cluster/afr/src/afr-inode-read.c@1507 for how karthik
>> implemented this in AFR (this got merged accidentally yesterday, but looks
>> like this is what we are settling on)
>>
>>
>>>
>>> Xavi
>>>
>>>
>>> On Wednesday, June 21, 2017 06:56 CEST, Pranith Kumar Karampuri <
>>> pkara...@redhat.com> wrote:
>>>
>>>
>>>
>>>
>>> On Wed, Jun 21, 2017 at 10:07 AM, Nithya Balachandran <
>>> nbala...@redhat.com> wrote:


 On 20 June 2017 at 20:38, Aravinda  wrote:
>
> On 06/20/2017 06:02 PM, Pranith Kumar Karampuri wrote:
>
> Xavi, Aravinda and I had a discussion on #gluster-dev and we agreed to
> go with the format Aravinda suggested for now and in future we wanted some
> more changes for dht to detect which subvolume went down came back up, at
> that time we will revisit the solution suggested by Xavi.
>
> Susanth is doing the dht changes
> Aravinda is doing geo-rep changes
>
> Done. Geo-rep patch sent for review https://review.gluster.org/17582
>
>

 The proposed changes to the node-uuid behaviour (while good) are going
 to break tiering . Tiering changes will take a little more time to be coded
 and tested.

 As this is a regression for 3.11 and a blocker for 3.11.1, I suggest we
 go back to the original node-uuid behaviour for now so as to unblock the
 release and target the proposed changes for the next 3.11 releases.

>>>
>>> Let me see if I understand the changes correctly. We are restoring the
>>> behavior of node-uuid xattr and adding a new xattr for parallel rebalance
>>> for both afr and ec, correct? Otherwise that is one more regression. If
>>> yes, we will also wait for Xavi's inputs. Jeff accidentally merged the afr
>>> patch yesterday which does these changes. If everyone is in agreement, we
>>> will leave it as is and add similar changes in ec as well. If we are not in
>>> agreement, then we will let the discussion progress :-)
>>>
>>>


 Regards,
 Nithya

> --
> Aravinda
>
>
>
> Thanks to all of you guys for the discussions!
>
> On Tue, Jun 20, 2017 at 5:05 PM, Xavier Hernandez <
> xhernan...@datalab.es> wrote:
>>
>> Hi Aravinda,
>>
>> On 20/06/17 12:42, Aravinda wrote:
>>>
>>> I think following format can be easily adopted by all components
>>>
>>> UUIDs of a subvolume are seperated by space and subvolumes are
>>> separated
>>> by comma
>>>
>>> For example, node1 and node2 are replica with U1 and U2 UUIDs
>>> respectively and
>>> node3 and node4 are replica with U3 and U4 UUIDs respectively
>>>
>>> node-uuid can return "U1 U2,U3 U4"
>>
>>
>> While this is ok for current implementation, I think this can be
>> insufficient if there are more layers of xlators that require to indicate
>> some sort of grouping. Some representation that can represent hierarchy
>> would be better. For example: "(U1 U2) (U3 U4)" (we can use spaces or 
>> comma
>> as a separator).
>>
>>>
>>>
>>> Geo-rep can split