EC also sends all zeros if the node is down.

Regards,
Sunil kumar Acharya
Senior Software Engineer
Red Hat <https://www.redhat.com>
T: +91-8067935170

On Tue, Jun 20, 2017 at 4:27 PM, Karthik Subrahmanya <ksubr...@redhat.com> wrote:
>
> On Tue, Jun 20, 2017 at 4:12 PM, Aravinda <avish...@redhat.com> wrote:
>> I think the following format can easily be adopted by all components: UUIDs of a subvolume are separated by spaces, and subvolumes are separated by commas.
>>
>> For example, if node1 and node2 are a replica pair with UUIDs U1 and U2 respectively, and node3 and node4 are a replica pair with UUIDs U3 and U4 respectively, node-uuid can return "U1 U2,U3 U4".
>>
>> Geo-rep can split by "," and then split by space and take the first UUID of each subvolume.
>> DHT can split the value by space or comma and get the list of unique UUIDs.
>>
>> Another question is about the behavior when a node is down: the existing node-uuid xattr does not return a node's UUID if that node is down.
>
> After the change [1], if a node is down we send all zeros as the UUID for that node in the list of node UUIDs.
>
> [1] https://review.gluster.org/#/c/17084/
>
> Regards,
> Karthik
>
>> What is the behavior with the proposed xattr?
>>
>> Let me know your thoughts.
>>
>> regards
>> Aravinda VK
>>
>> On 06/20/2017 03:06 PM, Aravinda wrote:
>>> Hi Xavi,
>>>
>>> On 06/20/2017 02:51 PM, Xavier Hernandez wrote:
>>>> Hi Aravinda,
>>>>
>>>> On 20/06/17 11:05, Pranith Kumar Karampuri wrote:
>>>>> Adding more people to get a consensus about this.
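A minimal Python sketch of how geo-rep and DHT could consume the comma/space format Aravinda proposes in the quoted thread ("U1 U2,U3 U4"). It assumes the all-zeros placeholder from Karthik's reply is rendered as the canonical null-UUID string, and, as a hypothetical choice, lets geo-rep skip such placeholders and treat the first remaining UUID of each subvolume as Active, mirroring the old "first-up" node-uuid behavior rather than literally taking the first field. `parse_node_uuid`, `georep_is_active` and `dht_rebalance_nodes` are illustrative names, not existing gsyncd or Gluster functions.

```python
from typing import List

# Assumption: a down node's all-zeros placeholder is the canonical null UUID string.
ZERO_UUID = "00000000-0000-0000-0000-000000000000"


def parse_node_uuid(value: str) -> List[List[str]]:
    """Split "U1 U2,U3 U4" into one UUID list per subvolume."""
    return [subvol.split() for subvol in value.split(",") if subvol.strip()]


def georep_is_active(value: str, my_uuid: str) -> bool:
    """Return True if my_uuid is the first non-zero UUID of some subvolume.

    Hypothetical rule: skipping all-zero placeholders mimics the old
    "first-up" node-uuid behavior instead of literally taking field one.
    """
    for uuids in parse_node_uuid(value):
        up = [u for u in uuids if u != ZERO_UUID]
        if up and up[0] == my_uuid:
            return True
    return False


def dht_rebalance_nodes(value: str) -> List[str]:
    """Unique UUIDs across all subvolumes, in first-seen order, down nodes dropped."""
    flat = [u for uuids in parse_node_uuid(value) for u in uuids if u != ZERO_UUID]
    return list(dict.fromkeys(flat))


if __name__ == "__main__":
    value = "U1 U2,U3 U4"
    print(georep_is_active(value, "U1"), georep_is_active(value, "U2"))
    print(dht_rebalance_nodes(ZERO_UUID + " U2,U3 U4"))
```

With a literal "take the first UUID" rule instead, a subvolume whose first node is down would get no Active worker, since no node's UUID matches the all-zeros placeholder.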
>>>>>
>>>>> On Tue, Jun 20, 2017 at 1:49 PM, Aravinda <avish...@redhat.com> wrote:
>>>>>>
>>>>>> regards
>>>>>> Aravinda VK
>>>>>>
>>>>>> On 06/20/2017 01:26 PM, Xavier Hernandez wrote:
>>>>>>> Hi Pranith,
>>>>>>>
>>>>>>> adding gluster-devel, Kotresh and Aravinda.
>>>>>>>
>>>>>>> On 20/06/17 09:45, Pranith Kumar Karampuri wrote:
>>>>>>>> On Tue, Jun 20, 2017 at 1:12 PM, Xavier Hernandez <xhernan...@datalab.es> wrote:
>>>>>>>>> On 20/06/17 09:31, Pranith Kumar Karampuri wrote:
>>>>>>>>>> The way geo-replication works is: on each machine, it does a getxattr of node-uuid and checks whether its own UUID is present in the list. If it is present, the worker considers itself active; otherwise it is considered passive. With this change we are returning all UUIDs instead of only the first-up subvolume's, so all machines think they are ACTIVE, which is apparently bad. That is the reason. Even I felt bad that we are doing this change.
>>>>>>>>>
>>>>>>>>> And what about changing the content of node-uuid to include some sort of hierarchy?
>>>>>>>>>
>>>>>>>>> For example:
>>>>>>>>>
>>>>>>>>> a single brick:
>>>>>>>>>
>>>>>>>>>     NODE(<guid>)
>>>>>>>>>
>>>>>>>>> AFR/EC:
>>>>>>>>>
>>>>>>>>>     AFR[2](NODE(<guid>), NODE(<guid>))
>>>>>>>>>     EC[3,1](NODE(<guid>), NODE(<guid>), NODE(<guid>))
>>>>>>>>>
>>>>>>>>> DHT:
>>>>>>>>>
>>>>>>>>>     DHT[2](AFR[2](NODE(<guid>), NODE(<guid>)), AFR[2](NODE(<guid>), NODE(<guid>)))
>>>>>>>>>
>>>>>>>>> This gives a lot of information that can be used to take the appropriate decisions.
>>>>>>>>
>>>>>>>> I guess that is not backward compatible. Shall I CC gluster-devel and Kotresh/Aravinda?
>>>>>>>
>>>>>>> Is the change we did backward compatible? If we only require the first field to be a GUID to support backward compatibility, we can use something like this:
>>>>>>
>>>>>> No. But the necessary change can be made to the Geo-rep code as well if the format is changed, since all these components are built and shipped together.
>>>>>>
>>>>>> Geo-rep uses node-uuid as follows:
>>>>>>
>>>>>>     list = getxattr(node-uuid)
>>>>>>     active_node_uuids = list.split(SPACE)
>>>>>>     active_node_flag = self.node_id in active_node_uuids
>>>>
>>>> How was this case solved?
>>>>
>>>> Suppose we have three servers with 2 bricks in each server. A replicated volume is created using the following command:
>>>>
>>>>     gluster volume create test replica 2 server1:/brick1 server2:/brick1 server2:/brick2 server3:/brick1 server3:/brick2 server1:/brick2
>>>>
>>>> In this case we have three replica sets:
>>>>
>>>> * server1:/brick1 server2:/brick1
>>>> * server2:/brick2 server3:/brick1
>>>> * server3:/brick2 server1:/brick2
>>>>
>>>> The old AFR implementation of node-uuid always returned the UUID of the node hosting the first brick, so in this case we get the UUIDs of all three nodes, because each of them hosts the first brick of a replica set.
>>>>
>>>> Does this mean that with this configuration all nodes are active? Is this a problem? Is there any other check to avoid this situation if it's not good?
>>>
>>> Yes, all Geo-rep workers will become Active and participate in syncing. Since the changelogs in replica bricks contain the same information, this leads to duplicate syncing and wasted network bandwidth.
>>>
>>> Node-uuid based Active worker selection has been the default configuration in Geo-rep until now. Geo-rep also has Meta Volume based synchronization of the Active worker using lock files (it can be enabled through the Geo-rep configuration; with this config, node-uuid is not used).
>>>
>>> Kotresh proposed a solution to configure which worker becomes Active. This gives the admin more control over choosing Active workers, and it will become the default configuration from 3.12:
>>> https://github.com/gluster/glusterfs/issues/244
>>>
>>> --
>>> Aravinda
>>>
>>>> Xavi
>>>>
>>>>>>>
>>>>>>> Bricks:
>>>>>>>
>>>>>>>     <guid>
>>>>>>>
>>>>>>> AFR/EC:
>>>>>>>
>>>>>>>     <guid>(<guid>, <guid>)
>>>>>>>
>>>>>>> DHT:
>>>>>>>
>>>>>>>     <guid>(<guid>(<guid>, ...), <guid>(<guid>, ...))
>>>>>>>
>>>>>>> In this case, AFR and EC would return the same <guid> they returned before the patch, but between '(' and ')' they would put the full list of GUIDs of all nodes. The first <guid> can be used by geo-replication; the list after the first <guid> can be used for rebalance.
>>>>>>>
>>>>>>> Not sure if there's any user of node-uuid above DHT.
>>>>>>>
>>>>>>> Xavi
>>>>>>>
>>>>>>>>> Xavi
>>>>>>>>>
>>>>>>>>>> On Tue, Jun 20, 2017 at 12:46 PM, Xavier Hernandez <xhernan...@datalab.es> wrote:
>>>>>>>>>>> Hi Pranith,
>>>>>>>>>>>
>>>>>>>>>>> On 20/06/17 07:53, Pranith Kumar Karampuri wrote:
>>>>>>>>>>>> Hi Xavi,
>>>>>>>>>>>> We all made the mistake of not sending a mail about changing the behavior of the node-uuid xattr so that rebalance can use multiple nodes. Because of this, all the geo-rep workers are becoming active instead of one per EC/AFR subvolume. So we are frantically trying to restore the original functionality of node-uuid and introduce a new xattr for the new behavior. Sunil will be sending out a patch for this.
>>>>>>>>>>>
>>>>>>>>>>> Wouldn't it be better to change geo-rep behavior to use the new data? I think it's better as it is now, since it gives more information to upper layers so that they can take more accurate decisions.
>>>>>>>>>>>
>>>>>>>>>>> Xavi
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Pranith
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Pranith
>>>>>>>>
>>>>>>>> --
>>>>>>>> Pranith
>>>>>
>>>>> --
>>>>> Pranith
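A minimal Python sketch of a parser for Xavi's backward-compatible format from the quoted thread (`<guid>(<guid>, <guid>)`, nested for DHT). It builds a small tree: the leaf GUIDs give the full node list that rebalance would use, and the leading GUID of each innermost group gives the legacy value that geo-rep would compare against its own UUID. The `Node` type and all helper names are hypothetical. Treating every innermost group as an AFR/EC set is an assumption: the format as written does not say which translator produced a group, so a pure-distribute DHT over plain bricks would look the same as a replica set.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    guid: str
    children: List["Node"] = field(default_factory=list)


def _skip_ws(text: str, pos: int) -> int:
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def _parse_node(text: str, pos: int) -> Tuple[Node, int]:
    """Parse '<guid>' or '<guid>(<node>, <node>, ...)' starting at pos."""
    pos = _skip_ws(text, pos)
    start = pos
    while pos < len(text) and text[pos] not in "()," and not text[pos].isspace():
        pos += 1
    if pos == start:
        raise ValueError(f"expected a GUID at offset {start}")
    node = Node(text[start:pos])
    pos = _skip_ws(text, pos)
    if pos < len(text) and text[pos] == "(":
        pos += 1
        while True:
            child, pos = _parse_node(text, pos)
            node.children.append(child)
            pos = _skip_ws(text, pos)
            if pos >= len(text):
                raise ValueError("unterminated '('")
            if text[pos] == ")":
                pos += 1
                break
            if text[pos] != ",":
                raise ValueError(f"expected ',' or ')' at offset {pos}")
            pos += 1
    return node, pos


def parse(text: str) -> Node:
    node, pos = _parse_node(text, 0)
    if text[pos:].strip():
        raise ValueError(f"trailing data at offset {pos}")
    return node


def leaf_guids(node: Node) -> List[str]:
    """All node GUIDs, left to right (the list rebalance would use)."""
    if not node.children:
        return [node.guid]
    return [g for child in node.children for g in leaf_guids(child)]


def set_leaders(node: Node) -> List[str]:
    """Leading GUID of each innermost group (assumed to be an AFR/EC set)."""
    if not node.children or all(not c.children for c in node.children):
        return [node.guid]
    return [g for child in node.children for g in set_leaders(child)]


def georep_is_active(value: str, my_uuid: str) -> bool:
    """Hypothetical geo-rep check: Active if my_uuid leads some set."""
    return my_uuid in set_leaders(parse(value))
```

For `D(U1(U1, U2), U3(U3, U4))`, where `D` is a placeholder for whatever leading GUID DHT would return (the thread leaves this open), `leaf_guids` yields all four node GUIDs and `set_leaders` yields `U1` and `U3`.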
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel