Re: [Gluster-devel] Automated split-brain resolution
Not from what I see on IRC. I get a pretty broad set of use cases. I'd say about a third of them have end-user data that could be lost, and the admin has no way of determining what's important between two copies of a split-brain.

On 08/14/2014 11:45 AM, Harshavardhana wrote:
> On Thu, Aug 14, 2014 at 11:12 AM, Joe Julian wrote:
>> Some people. Depends on use case. Dan's is pretty specific.
>
> Those are the majority of the users/customers.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Automated split-brain resolution
On Thu, Aug 14, 2014 at 11:12 AM, Joe Julian wrote:
> Some people. Depends on use case. Dan's is pretty specific.

Those are the majority of the users/customers.

--
Religious confuse piety with mere ritual, the virtuous confuse regulation with outcomes
Re: [Gluster-devel] Automated split-brain resolution
Some people. Depends on use case. Dan's is pretty specific.

On August 14, 2014 10:58:33 AM PDT, Harshavardhana wrote:
>> Not sure. We can figure this out by traversing up the softlinks for
>> directories. But for files there is no way to find the parent at the moment.
>
> This email from Dan Mons puts some perspective on what people actually expect:
> https://www.mail-archive.com/gluster-devel@gluster.org/msg00392.html
Re: [Gluster-devel] Automated split-brain resolution
> Not sure. We can figure this out by traversing up the softlinks for
> directories. But for files there is no way to find the parent at the moment.

This email from Dan Mons puts some perspective on what people actually expect:
https://www.mail-archive.com/gluster-devel@gluster.org/msg00392.html
Re: [Gluster-devel] Automated split-brain resolution
On 08/12/2014 11:29 AM, Harshavardhana wrote:
>> This is a standard problem where there are split-brains in distributed systems. For example even in git there are cases where it gives up asking users to fix the file i.e. merge conflicts. If the user doesn't want split-brains they should move to replica-3 and enable client-quorum. But if the user made a conscious decision to live with split-brain problems favouring availability/using replica-2, then split-brains do happen and it needs user intervention. All we are trying to do is to make this process a bit painless by coming up with meaningful policies.
>
> Agreed, split brains do require manual intervention, no one argues about that, but it shouldn't be quite as tedious as GlusterFS wants it to be.

We are on the same page.

> I do agree that it is way simpler than perhaps some other distributed filesystems, but at any point we ask someone to write a script to fix our internal structure - that is not a feature, it's a bug.

We are not asking them to write a script with this solution.

> We all appreciate the effort, but my wish is that we incorporate some pain points which we have seen personally over the years and fix it right while we are at it.
>
>> If the user knows his workload is append only and there are split-brains the only command he needs to execute is:
>> 'gluster volume heal split-brain bigger-file'
>> no grep, no finding file paths, nothing.
>
> Adding to this - we need to provide additional sanity checks that split brains were indeed fixed - since this looks like quite a destructive operation, are you planning a rollback at any point during this process?

The user can find out whether the split-brain is resolved by executing 'gluster volume heal info split-brain'. The file/gfid shouldn't show up there. There is no rollback until we integrate it with trash. When we integrate it with trash, the file that will be over-written/deleted will be moved to trash.

>> There were also instances where the user knows the brick he/she would like to be the source but he/she is worried that old brick which comes back up would cause split-brains so he/she had to erase the whole brick which was down and bring it back up.
>> Instead we can suggest him/her to use 'gluster volume heal split-brain source-brick ' after bringing the brick back up so that not all the contents needs to be healed.
>>
>> 1) gluster volume heal info split-brain should give output in some 'format' giving stat/pending-matrix etc for all the files in split-brain.
>> - Unfortunately we still don't have a way to provide with file paths without doing 'find' on the bricks.
>
> Critical setups require fixing split-brain with quick turnaround; no one really has the luxury of running a find on a large volume. So I still do not understand: if a 'find' can do a gfid --> inum --> path, how hard is it for the Gluster management daemon to know this, just to provide better tooling?

Not sure. We can figure this out by traversing up the softlinks for directories. But for files there is no way to find the parent at the moment.

Pranith
Re: [Gluster-devel] Automated split-brain resolution
> This is a standard problem where there are split-brains in distributed
> systems. For example even in git there are cases where it gives up asking
> users to fix the file i.e. merge conflicts. If the user doesn't want
> split-brains they should move to replica-3 and enable client-quorum. But if
> the user made a conscious decision to live with split-brain problems
> favouring availability/using replica-2, then split-brains do happen and it
> needs user intervention. All we are trying to do is to make this process a
> bit painless by coming up with meaningful policies.

Agreed, split brains do require manual intervention, no one argues about that, but it shouldn't be quite as tedious as GlusterFS wants it to be. I do agree that it is way simpler than perhaps some other distributed filesystems, but at any point we ask someone to write a script to fix our internal structure - that is not a feature, it's a bug. We all appreciate the effort, but my wish is that we incorporate some pain points which we have seen personally over the years and fix it right while we are at it.

> If the user knows his workload is append only and there are split-brains the
> only command he needs to execute is:
> 'gluster volume heal split-brain bigger-file'
> no grep, no finding file paths, nothing.

Adding to this - we need to provide additional sanity checks that split brains were indeed fixed - since this looks like quite a destructive operation, are you planning a rollback at any point during this process?

> There were also instances where the user knows the brick he/she would like
> to be the source but he/she is worried that old brick which comes back up
> would cause split-brains so he/she had to erase the whole brick which was
> down and bring it back up.
> Instead we can suggest him/her to use 'gluster volume heal
> split-brain source-brick ' after bringing the brick back up so
> that not all the contents needs to be healed.
>
> 1) gluster volume heal info split-brain should give output in some
> 'format' giving stat/pending-matrix etc for all the files in split-brain.
> - Unfortunately we still don't have a way to provide with file paths
> without doing 'find' on the bricks.

Critical setups require fixing split-brain with quick turnaround; no one really has the luxury of running a find on a large volume. So I still do not understand: if a 'find' can do a gfid --> inum --> path, how hard is it for the Gluster management daemon to know this, just to provide better tooling?

-- Harsha
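The gfid --> inum --> path lookup asked about above can be sketched as follows. This is only an illustration of the slow 'find'-style walk, not how glusterd or AFR does (or would do) it. It assumes the brick keeps a hard link for each regular file under .glusterfs/<first two hex chars>/<next two>/<gfid>; the function names and the demo brick are made up for this example.

```python
import os
import tempfile


def gfid_backend_path(brick_root, gfid):
    """Assumed location of the gfid entry: .glusterfs/<aa>/<bb>/<gfid>."""
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)


def paths_for_gfid(brick_root, gfid):
    """Return brick-relative paths sharing an inode with the gfid entry.

    Only meaningful for regular files (hard links). The whole brick is
    walked, which is exactly the cost the thread complains about.
    """
    target = os.lstat(gfid_backend_path(brick_root, gfid))
    matches = []
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Do not descend into the internal .glusterfs tree.
        if ".glusterfs" in dirnames:
            dirnames.remove(".glusterfs")
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if (st.st_dev, st.st_ino) == (target.st_dev, target.st_ino):
                matches.append(os.path.relpath(full, brick_root))
    return sorted(matches)


def demo():
    """Build a throwaway fake brick and resolve one hypothetical gfid."""
    with tempfile.TemporaryDirectory() as brick:
        gfid = "0a1b2c3d-1111-2222-3333-444455556666"  # made-up value
        os.makedirs(os.path.join(brick, "dir1"))
        data = os.path.join(brick, "dir1", "file.txt")
        with open(data, "w") as handle:
            handle.write("payload")
        backend = gfid_backend_path(brick, gfid)
        os.makedirs(os.path.dirname(backend))
        os.link(data, backend)  # the hard link the lookup relies on
        return paths_for_gfid(brick, gfid)
```

The walk is linear in the number of files on the brick, which is why it takes hours on a volume with tens of millions of files; avoiding it would need some stored parent reference, which the thread says does not exist for files at the moment.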
Re: [Gluster-devel] Automated split-brain resolution
On 08/10/2014 11:42 PM, Ravishankar N wrote:
> On 08/09/2014 01:23 AM, Joe Julian wrote:
>> Thinking about it more, I'd still rather have this functionality exposed at the client through xattrs. For 5 years I've thought about this, and the more I encounter split-brain, the more I think this is the needed approach.
>
> Joe, why do you feel resolving split-brains should be exposed to clients? Whatever approach is taken (either a gluster CLI command or an overloaded get/setfattr call), is it not better to have this done at the server side?

* It's consistent with the way other functions actually operate (rebalance, self-heal, etc.), in that they're really just clients.
* On the client it offers more possibilities for us admins to be able to fix something on the fly.
* It's an API at that point. Software could be coded to perform its own self-heal based on the rules that might apply to that particular use case.
* If multi-tenancy is ever added, it is a method by which the tenant can repair his own files.

It was late, last time, and I missed one important operation: the ability to mv one copy of the split-brain to a new filename in case you choose wrongly and need it. I've seen that with VM images. Typically, it doesn't really matter which VM image you choose (if your data's in a smart place instead of on the image). Pick either one and boot it back up. Occasionally, though, the image is irreparable. Frequently, the "other copy" is ok, so if one fails to boot, we swap to the other.

>> "getfattr -n trusted.glusterfs.stat" returns xml/json/some_madeup_datastructure with the results of stat from each brick
>> "getfattr -n trusted.glusterfs.afr" returns the afr matrix
>> "setfattr -n trusted.glusterfs.sb-pick -v "server2:/srv/brick1"
>>
>> That gives us the tools we need to choose what to do with any given split-brain. For large swaths of automated repair, we can use find.
>>
>> I suppose that last bit could still be implemented through that cli command.
>>
>> On 08/07/2014 01:35 AM, Ravishankar N wrote:
>>> Manual resolution of split-brains [1] has been a tedious task involving understanding and modifying AFR's changelog extended attributes. To simplify and to an extent automate this task, we are proposing a new CLI command with which the user can specify what the source brick/file is, and automatically heal the files in the appropriate direction.
>>>
>>> Command: gluster volume resolve-split-brain { | source-brick [] }
>>>
>>> Breaking up the command into its possible options, we have:
>>>
>>> a) gluster volume resolve-split-brain
>>> When this command is executed, AFR will consider the brick having the highest file size as the source and heal it to all other bricks (including all other sources and sinks) in that replica subvolume. If the file size is the same on all the bricks, it does *not* heal the file.
>>>
>>> b) gluster volume resolve-split-brain source-brick []
>>> When this command is executed, if is specified, AFR heals the file from the source-brick to all other bricks of that replica subvolume. For resolving multiple files, the command must be run iteratively, once per file. If is not specified, AFR heals all the files that have an entry in .glusterfs/indices/xattrop *and* are in split-brain. As before, heals happen from source-brick to all other bricks.
>>>
>>> Future work could also include extending the command to add other policies like choosing the file having the latest mtime as the source, integration with trash xlator wherein the files deleted from the sink are moved to the trash dir etc.
>>>
>>> Please give feedback on the above.
>>>
>>> Regards,
>>> Ravi
>>>
>>> [1] https://github.com/gluster/glusterfs/blob/master/doc/split-brain.md
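The file-size policy in option (a) of the proposal can be illustrated with a minimal sketch. It is not AFR code: the function name and the dict-of-sizes input are assumptions made for this example. The proposal only says that equal sizes on all bricks mean no heal; treating any tie for the largest size as "no heal" (which matters with three or more bricks) is an additional assumption here.

```python
def pick_source_by_size(sizes):
    """Pick the heal source under a 'bigger file wins' policy.

    sizes maps a brick name to the file's size in bytes on that brick.
    Returns the brick to use as the source, or None when no brick is
    strictly the largest (assumption: any tie means 'do not heal').
    """
    if not sizes:
        return None
    largest = max(sizes.values())
    candidates = [brick for brick, size in sizes.items() if size == largest]
    if len(candidates) != 1:
        # Same size on all bricks (or a tie for largest): leave the file alone.
        return None
    return candidates[0]
```

For an append-only workload the larger copy contains everything the smaller one has, which is why this policy is safe there and questionable for files that are rewritten in place.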
Re: [Gluster-devel] Automated split-brain resolution
On 08/11/2014 11:20 AM, Pranith Kumar Karampuri wrote:
> On 08/09/2014 12:18 AM, Harshavardhana wrote:
>
> Some initial thoughts on the solution based on Harsha/Joe/Emmanuel's inputs are:
> 1) gluster volume heal info split-brain should give output in some 'format' giving stat/pending-matrix etc for all the files in split-brain.
>    - Unfortunately we still don't have a way to provide with file paths without doing 'find' on the bricks.
> 2) User saves this output to a file and makes modifications to this file giving his choices of files he wants as sources.
> 3) 'gluster volume heal split-brain input-file ' will take the inputs and fixes the files.
>
> Question is, is it worthwhile to implement the two commands proposed by Ravi to begin with and implement the solution above in subsequent releases? Because these things are easier to implement and I feel they definitely address some of the pain points I have observed dealing with users.

This approach looks good to me. We still need to think through and carefully design an interactive mechanism for resolving split-brains. While we evolve that interface, we can implement commands proposed by Ravi in the near term to alleviate some user pain points.

-Vijay
Re: [Gluster-devel] Automated split-brain resolution
On 08/09/2014 01:23 AM, Joe Julian wrote:
> Thinking about it more, I'd still rather have this functionality exposed at the client through xattrs. For 5 years I've thought about this, and the more I encounter split-brain, the more I think this is the needed approach.

Joe, why do you feel resolving split-brains should be exposed to clients? Whatever approach is taken (either a gluster CLI command or an overloaded get/setfattr call), is it not better to have this done at the server side?

> "getfattr -n trusted.glusterfs.stat" returns xml/json/some_madeup_datastructure with the results of stat from each brick
> "getfattr -n trusted.glusterfs.afr" returns the afr matrix
> "setfattr -n trusted.glusterfs.sb-pick -v "server2:/srv/brick1"
>
> That gives us the tools we need to choose what to do with any given split-brain. For large swaths of automated repair, we can use find.
>
> I suppose that last bit could still be implemented through that cli command.
>
> On 08/07/2014 01:35 AM, Ravishankar N wrote:
>> Manual resolution of split-brains [1] has been a tedious task involving understanding and modifying AFR's changelog extended attributes. To simplify and to an extent automate this task, we are proposing a new CLI command with which the user can specify what the source brick/file is, and automatically heal the files in the appropriate direction.
>>
>> Command: gluster volume resolve-split-brain { | source-brick [] }
>>
>> Breaking up the command into its possible options, we have:
>>
>> a) gluster volume resolve-split-brain
>> When this command is executed, AFR will consider the brick having the highest file size as the source and heal it to all other bricks (including all other sources and sinks) in that replica subvolume. If the file size is the same on all the bricks, it does *not* heal the file.
>>
>> b) gluster volume resolve-split-brain source-brick []
>> When this command is executed, if is specified, AFR heals the file from the source-brick to all other bricks of that replica subvolume. For resolving multiple files, the command must be run iteratively, once per file. If is not specified, AFR heals all the files that have an entry in .glusterfs/indices/xattrop *and* are in split-brain. As before, heals happen from source-brick to all other bricks.
>>
>> Future work could also include extending the command to add other policies like choosing the file having the latest mtime as the source, integration with trash xlator wherein the files deleted from the sink are moved to the trash dir etc.
>>
>> Please give feedback on the above.
>>
>> Regards,
>> Ravi
>>
>> [1] https://github.com/gluster/glusterfs/blob/master/doc/split-brain.md
Re: [Gluster-devel] Automated split-brain resolution
On 08/09/2014 12:18 AM, Harshavardhana wrote:
>> While we could extend the existing heal command, we also need to provide a policy flag. Entering "y/n" for 1000 files does not make the process any easier.
>
> What I meant was not a solution, just suggestions; of course there should be improvements on that too. Look at e2fsck output when fixing corruption issues for example.
>
>> I don't follow this part completely. If `info split-brain` gives you the gfid instead of file path, you could just go to the .glusterfs/ and do a setfattr there.
>
> It isn't about just setfattr, one needs to validate which file it points to for it to make any sense. Are you saying that you know the contents of the file just by looking at a canonical gfid form?
>
>> command for each entry in the file. Also makes it easy to integrate with a GUI: Click 'get files in sb' and you have a scroll-down list of files with policies against each file. Select a file, tick the policy and click 'resolve-sb' and done!
>
> I agree with the policy style, but the inherent problem is never fixed: you are still asking someone to write scripts using "info split-brain".
>
> Here is the breakdown of how it happens today
> - grep /var/log/glusterfs/glustershd.log | awk (get gfids)
> - Run the script to see which files are really in split brain "(gfid-to-file.sh)" - Thanks Joe Julian!
>   Do this on all servers and grab output. Now this on a large enough cluster, for example a 250TB volume with 60 million files, takes 4hrs, assuming that we didn't have more split brain in between
> - Next 'gather getfattr/setfattr' output
> - Figure out which to be deleted
> - then delete.
> This whole cycle is a 2~3 day activity on bigger clusters.
>
> With your approach after having a policy
> - grep /var/log/glusterfs/glustershd.log | awk (get gfids)
> - Run the script to see which files are really in split brain "(gfid-to-file.sh)" - Thanks Joe Julian!
>   Do this on all servers and grab output. Now this on a large enough cluster, for example a 250TB volume with 60 million files, takes 4hrs, assuming that we didn't have more split brain in between.
> - Figure out which to be deleted, provide a policy based on source-brick or bigger-file. (In fact this seems like just a replacement for `rm -rf`)
>
> Now what is ideal
> - Figure out which file is to be deleted based on a policy (name your policy)
>
> A 250TB cluster is simply a POC cluster in the case of GlusterFS, not production, so you could think of scales orders of magnitude higher when there is a problem.
>
> Questions that occur here are:
> - Why does one write a script at all, when we ought to be responsible for this information and even for providing valid suggestions?

This is a standard problem where there are split-brains in distributed systems. For example even in git there are cases where it gives up, asking users to fix the file, i.e. merge conflicts. If the user doesn't want split-brains they should move to replica-3 and enable client-quorum. But if the user made a conscious decision to live with split-brain problems, favouring availability/using replica-2, then split-brains do happen and it needs user intervention. All we are trying to do is to make this process a bit painless by coming up with meaningful policies.

If the user knows his workload is append only and there are split-brains, the only command he needs to execute is:
'gluster volume heal split-brain bigger-file'
no grep, no finding file paths, nothing.

Every time I had to interact with users about fixing split-brains, I found that they need to know the internals of afr to fix the split-brain themselves, which is a tedious process, and there is still the possibility of users making mistakes while clearing the xattrs (it happened once or twice :-( ). That is the reason for implementing the second version of the command, to choose the file from the brick:
'gluster volume heal split-brain source-brick '

There were also instances where the user knows the brick he/she would like to be the source but he/she is worried that the old brick which comes back up would cause split-brains, so he/she had to erase the whole brick which was down and bring it back up. Instead we can suggest him/her to use 'gluster volume heal split-brain source-brick ' after bringing the brick back up so that not all the contents need to be healed.

The next step for this solution is to implement something similar to what Joe/Emmanuel suggested, where the stat/pending-matrix etc info is presented to the user and they need to pick the file or write the decisions to a file, and then we can resolve the split-brains.

Thanks Harsha/Joe/Emmanuel for providing inputs. Very good inputs :-).

Some initial thoughts on the solution based on Harsha/Joe/Emmanuel's inputs are:
1) gluster volume heal info split-brain should give output in some 'format' giving stat/pending-matrix etc for all the files in split-brain.
   - Unfortunately we still don't have a way to provide with file paths without doing 'find' on the bricks.
2) User saves this output to a file an
Re: [Gluster-devel] Automated split-brain resolution
On 08/09/2014 12:48 AM, Harshavardhana wrote:
>> Wait, directories *are* supposed to automatically heal from split-brain? Guess I need to file a bug report. That doesn't happen. All the metadata and gfid can be the same, but since the trusted.afr are both dirty, it'll stay split-brain forever.
>
> Conservative merge happens, but 'directories' are not cleared off their extended attributes so you might see messages in logs AFAIK.

They are not cleared off only when there are file-name split-brains, i.e. the same filename with a different gfid or a different file-type (one is a dir, another is a file, etc).

Pranith
Re: [Gluster-devel] Automated split-brain resolution
Harshavardhana wrote:
> WARNING: Found 1000 files in split brain
> ...
> File on pair 'host1:host2' is in split brain, file with latest
> time-stamp found on host1 - Fix? y
> File on pair 'host3:host5' is in split brain. file with biggest size
> found on host5 - Fix? y

Answering y/n many times may be a pain. This can be handled by executing $EDITOR on a file with all the choices, then parsing it on exit, a la crontab(8).

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
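The $EDITOR idea above could look roughly like the following. It is a minimal sketch under stated assumptions, not a gluster feature: the one-line-per-file format ("<gfid> <policy> [brick]"), the "skip" policy and all function names are invented for this example; only the policy names bigger-file and source-brick come from the thread.

```python
import os
import shlex
import subprocess
import tempfile

# 'bigger-file' and 'source-brick' are from the thread; 'skip' is an
# assumption meaning "leave this file in split-brain for now".
VALID_POLICIES = {"bigger-file", "source-brick", "skip"}


def render_choices(gfids):
    """Build the editable text: a commented header plus one line per gfid."""
    lines = [
        "# One entry per line: <gfid> <policy> [brick]",
        "# Policies: " + ", ".join(sorted(VALID_POLICIES)),
    ]
    lines += ["%s skip" % gfid for gfid in gfids]
    return "\n".join(lines) + "\n"


def parse_choices(text):
    """Parse the edited text into (gfid, policy, brick-or-None) tuples."""
    decisions = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) not in (2, 3) or fields[1] not in VALID_POLICIES:
            raise ValueError("line %d: cannot parse %r" % (lineno, raw))
        gfid, policy = fields[0], fields[1]
        brick = fields[2] if len(fields) == 3 else None
        # A brick is required for source-brick and not allowed otherwise.
        if (policy == "source-brick") != (brick is not None):
            raise ValueError("line %d: bad brick argument in %r" % (lineno, raw))
        decisions.append((gfid, policy, brick))
    return decisions


def edit_choices(gfids):
    """Open $EDITOR on the choices file and parse the result on exit."""
    editor = shlex.split(os.environ.get("EDITOR", "vi"))
    with tempfile.NamedTemporaryFile("w", suffix=".sb", delete=False) as tmp:
        tmp.write(render_choices(gfids))
        path = tmp.name
    try:
        subprocess.run(editor + [path], check=True)
        with open(path) as handle:
            return parse_choices(handle.read())
    finally:
        os.unlink(path)
```

Parsing is kept separate from launching the editor so the same file format could also be fed in non-interactively, along the lines of the 'input-file' idea raised elsewhere in the thread.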
Re: [Gluster-devel] Automated split-brain resolution
On Fri, Aug 8, 2014 at 12:53 PM, Joe Julian wrote:
> Thinking about it more, I'd still rather have this functionality exposed at
> the client through xattrs. For 5 years I've thought about this, and the more
> I encounter split-brain, the more I think this is the needed approach.
>
> "getfattr -n trusted.glusterfs.stat" returns
> xml/json/some_madeup_datastructure with the results of stat from each brick
> "getfattr -n trusted.glusterfs.afr" returns the afr matrix
> "setfattr -n trusted.glusterfs.sb-pick -v "server2:/srv/brick1"
>
> That gives us the tools we need to choose what to do with any given
> split-brain. For large swaths of automated repair, we can use find.
>
> I suppose that last bit could still be implemented through that cli command.

Even this makes sense, my overall pain point was the proposed CLI isn't solving anything worthwhile.
Re: [Gluster-devel] Automated split-brain resolution
Thinking about it more, I'd still rather have this functionality exposed at the client through xattrs. For 5 years I've thought about this, and the more I encounter split-brain, the more I think this is the needed approach.

"getfattr -n trusted.glusterfs.stat" returns xml/json/some_madeup_datastructure with the results of stat from each brick
"getfattr -n trusted.glusterfs.afr" returns the afr matrix
"setfattr -n trusted.glusterfs.sb-pick -v "server2:/srv/brick1"

That gives us the tools we need to choose what to do with any given split-brain. For large swaths of automated repair, we can use find.

I suppose that last bit could still be implemented through that cli command.

On 08/07/2014 01:35 AM, Ravishankar N wrote:
> Manual resolution of split-brains [1] has been a tedious task involving understanding and modifying AFR's changelog extended attributes. To simplify and to an extent automate this task, we are proposing a new CLI command with which the user can specify what the source brick/file is, and automatically heal the files in the appropriate direction.
>
> Command: gluster volume resolve-split-brain { | source-brick [] }
>
> Breaking up the command into its possible options, we have:
>
> a) gluster volume resolve-split-brain
> When this command is executed, AFR will consider the brick having the highest file size as the source and heal it to all other bricks (including all other sources and sinks) in that replica subvolume. If the file size is same in all the bricks, it does *not* heal the file.
>
> b) gluster volume resolve-split-brain source-brick []
> When this command is executed, if is specified, AFR heals the file from the source-brick to all other bricks of that replica subvolume. For resolving multiple files, the command must be run iteratively, once per file. If is not specified, AFR heals all the files that have an entry in .glusterfs/indices/xattrop *and* are in split-brain. As before, heals happen from source-brick to all other bricks.
>
> Future work could also include extending the command to add other policies like choosing the file having the latest mtime as the source, integration with trash xlator wherein the files deleted from the sink are moved to the trash dir etc.
>
> Please give feedback on the above.
>
> Regards,
> Ravi
>
> [1] https://github.com/gluster/glusterfs/blob/master/doc/split-brain.md
Re: [Gluster-devel] Automated split-brain resolution
> Wait, directories *are* supposed to automatically heal from split-brain?
> Guess I need to file a bug report. That doesn't happen. All the metadata and
> gfid can be the same, but since the trusted.afr are both dirty, it'll stay
> split-brain forever.

Conservative merge happens, but 'directories' are not cleared off their extended attributes so you might see messages in logs AFAIK.
Re: [Gluster-devel] Automated split-brain resolution
On 08/07/2014 03:08 AM, Niels de Vos wrote:
> On Thu, Aug 07, 2014 at 03:17:11PM +0530, Ravishankar N wrote:
>> On 08/07/2014 03:06 PM, Niels de Vos wrote:
>>> On Thu, Aug 07, 2014 at 02:05:34PM +0530, Ravishankar N wrote:
>>>> Manual resolution of split-brains [1] has been a tedious task involving understanding and modifying AFR's changelog extended attributes. To simplify and to an extent automate this task, we are proposing a new CLI command with which the user can specify what the source brick/file is, and automatically heal the files in the appropriate direction.
>>>
>>> What about automatically healing directories that are in split-brain?
>>
>> Conservative merge happens when possible while healing directories, but yes, gfid split-brain (one brick having a file with gfid-g1 and the other having a directory with the same gfid-g1) won't be resolved with this command. The reason we do not want to do it right now is that resolving such split-brains involves unlinking/rmdir'ing one of the entries and is best done when we integrate with the trash xlator.
>
> Okay, good to know. Thanks!
> Niels

Wait, directories *are* supposed to automatically heal from split-brain? Guess I need to file a bug report. That doesn't happen. All the metadata and gfid can be the same, but since the trusted.afr are both dirty, it'll stay split-brain forever.

>>> Thanks,
>>> Niels
>>>
>>>> Command: gluster volume resolve-split-brain { | source-brick [] }
>>>>
>>>> Breaking up the command into its possible options, we have:
>>>>
>>>> a) gluster volume resolve-split-brain
>>>> When this command is executed, AFR will consider the brick having the highest file size as the source and heal it to all other bricks (including all other sources and sinks) in that replica subvolume. If the file size is the same on all the bricks, it does *not* heal the file.
>>>>
>>>> b) gluster volume resolve-split-brain source-brick []
>>>> When this command is executed, if is specified, AFR heals the file from the source-brick to all other bricks of that replica subvolume. For resolving multiple files, the command must be run iteratively, once per file. If is not specified, AFR heals all the files that have an entry in .glusterfs/indices/xattrop *and* are in split-brain. As before, heals happen from source-brick to all other bricks.
>>>>
>>>> Future work could also include extending the command to add other policies like choosing the file having the latest mtime as the source, integration with trash xlator wherein the files deleted from the sink are moved to the trash dir etc.
>>>>
>>>> Please give feedback on the above.
>>>>
>>>> Regards,
>>>> Ravi
>>>>
>>>> [1] https://github.com/gluster/glusterfs/blob/master/doc/split-brain.md
Re: [Gluster-devel] Automated split-brain resolution
> While we could extend the existing heal command, we also need to provide a
> policy flag. Entering "y/n" for 1000 files does not make the process any
> easier.

What I meant was not a solution, just suggestions; of course there should be improvements on that too. Look at e2fsck output when fixing corruption issues for example.

> I don't follow this part completely. If `info split-brain` gives you the
> gfid instead of file path, you could just go to the .glusterfs/ hardlink> and do a setfattr there.

It isn't about just setfattr, one needs to validate which file it points to for it to make any sense. Are you saying that you know the contents of the file just by looking at a canonical gfid form?

> command for each entry in the file. Also makes it easy to integrate with a
> GUI: Click 'get files in sb' and you have a scroll-down list of files with
> policies against each file. Select a file, tick the policy and click
> 'resolve-sb' and done!

I agree with the policy style, but the inherent problem is never fixed: you are still asking someone to write scripts using "info split-brain".

Here is the breakdown of how it happens today
- grep /var/log/glusterfs/glustershd.log | awk (get gfids)
- Run the script to see which files are really in split brain "(gfid-to-file.sh)" - Thanks Joe Julian!
  Do this on all servers and grab output. Now this on a large enough cluster, for example a 250TB volume with 60 million files, takes 4hrs, assuming that we didn't have more split brain in between
- Next 'gather getfattr/setfattr' output
- Figure out which to be deleted
- then delete.
This whole cycle is a 2~3 day activity on bigger clusters.

With your approach after having a policy
- grep /var/log/glusterfs/glustershd.log | awk (get gfids)
- Run the script to see which files are really in split brain "(gfid-to-file.sh)" - Thanks Joe Julian!
  Do this on all servers and grab output. Now this on a large enough cluster, for example a 250TB volume with 60 million files, takes 4hrs, assuming that we didn't have more split brain in between.
- Figure out which to be deleted, provide a policy based on source-brick or bigger-file. (In fact this seems like just a replacement for `rm -rf`)

Now what is ideal
- Figure out which file is to be deleted based on a policy (name your policy)

A 250TB cluster is simply a POC cluster in the case of GlusterFS, not production, so you could think of scales orders of magnitude higher when there is a problem.

Questions that occur here are:
- Why does one write a script at all, when we ought to be responsible for this information and even for providing valid suggestions?
- If you are saying that 'info split-brain' is to print gfids, what purpose does it serve anyway? I would even get rid of that 'info split-brain' - why would anyone need to see which files are in split brain when all we are printing is a 'gfid'?
- Trust is on us when a user copies their data into GlusterFS and we are solely responsible for it. If we cannot make valid decisions about the files which we are supposed to manage, how do you expect a normal user to make better decisions than us?

Here is an example we came across - there was a suggestion I made to Pranithk, based on Avati's idea, that even a file in metadata split brain can be made readable, which is not the case today. This came out of the fact that there are some important details which we know wholly as a system and which are not available to the user himself.

Since this has been a perpetuating misery for years, I would like to see this fixed in a more convincing manner. Excuse me being blunt about it!

> So we now have the command:
> # gluster volume heal [full | info [split-brain] | split-brain
> {bigger-file | source-brick } [] ]
>
> The relevant new extension being:
> gluster volume heal split-brain {bigger-file | source-brick
> } []

This looks good.
-- Religious confuse piety with mere ritual, the virtuous confuse regulation with outcomes ___ Gluster-devel mailing list Gluster-devel@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-devel
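To make the gfid discussion above concrete, here is a minimal sketch of the forward direction only: going from a gfid to its entry under a brick's .glusterfs directory. It assumes the usual backend layout of `.glusterfs/<first two hex digits>/<next two hex digits>/<full gfid>`; the function name and the example brick path are hypothetical and not part of any Gluster tool. The reverse direction (gfid to the user-visible file path) is the expensive part described in this thread and is not attempted here.

```python
import os
import uuid


def gfid_backend_path(brick_root: str, gfid: str) -> str:
    """Return the assumed .glusterfs entry for a gfid on a single brick.

    Assumption: entries live at <brick>/.glusterfs/aa/bb/<gfid>, where
    'aa' and 'bb' are the first two pairs of hex digits of the gfid.
    """
    # Normalise to the canonical lowercase 8-4-4-4-12 form.
    # uuid.UUID raises ValueError if the string is not a valid gfid.
    canonical = str(uuid.UUID(gfid))
    return os.path.join(
        brick_root, ".glusterfs", canonical[0:2], canonical[2:4], canonical
    )


if __name__ == "__main__":
    # Hypothetical brick path and gfid, for illustration only.
    print(gfid_backend_path("/bricks/b1", "0f1e2d3c-4b5a-6978-8796-a5b4c3d2e1f0"))
```

Knowing this path lets you run getfattr/setfattr against the entry, but, as pointed out above, it does not by itself tell you which file the gfid belongs to.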
Re: [Gluster-devel] Automated split-brain resolution
On 08/08/2014 01:09 PM, Harshavardhana wrote:
> On Thu, Aug 7, 2014 at 1:35 AM, Ravishankar N wrote:
>> Manual resolution of split-brains [1] has been a tedious task involving understanding and modifying AFR's changelog extended attributes. To simplify and to an extent automate this task, we are proposing a new CLI command with which the user can specify what the source brick/file is, and automatically heal the files in the appropriate direction.
>>
>> Command: gluster volume resolve-split-brain <VOLNAME> {<bigger-file> | source-brick <brick_name> [<file>] }
>>
>> Breaking up the command into its possible options, we have:
>>
>> a) gluster volume resolve-split-brain <VOLNAME> <bigger-file>
>>
>> When this command is executed, AFR will consider the brick having the highest file size as the source and heal it to all other bricks (including all other sources and sinks) in that replica subvolume. If the file size is same in all the bricks, it does *not* heal the file.
>>
>> b) gluster volume resolve-split-brain <VOLNAME> source-brick <brick_name> [<file>]
>>
>> When this command is executed, if <file> is specified, AFR heals the file from the source-brick to all other bricks of that replica subvolume. For resolving multiple files, the command must be run iteratively, once per file.
>> If <file> is not specified, AFR heals all the files that have an entry in .glusterfs/indices/xattrop *and* are in split-brain. As before, heals happen from source-brick to all other bricks.
>>
>> Future work could also include extending the command to add other policies like choosing the file having the latest mtime as the source, integration with trash xlator wherein the files deleted from the sink are moved to the trash dir etc.
>
> I have a few queries regarding the overall design itself. Here are the caveats:
>
> - Adding a new option rather than extending an existing option 'gluster volume heal'.

This does make sense.

> - Asking the user to input the filename, which is not necessary by default since such files are already available through 'gluster volume heal info split-brain'.

As of today, `info split-brain` is not 100% accurate. It does not list entries that are in gfid split-brain (but we are not attempting to heal those using a gluster CLI for now anyway), and for the files that are in (meta)data split-brain, it lists only the last 1024 entries and sometimes contains stale entries. But this will be fixed soon with a gfapi based implementation, much like the `heal info` command (glfs-heal.c) in the 3.5 release.

> What would be ideal is the following, making it seamless and much more user friendly:
>
> Extend the existing CLI as follows: 'gluster volume heal split-brain'

Agreed.

> Healing split-brained files is more palpable and has a rather more convincing tone for a sys-admin IMHO.
>
> An example version of this extension would be:
>
> 'gluster volume heal split-brain [|]'
>
> In fact, since we already know the list of split-brained files, we can just loop through them and ask interactive questions:
>
> # gluster volume heal split-brain
> WARNING: About to start fixing split brained files on an active GlusterFS volume, do you wish to proceed? y
> WARNING: files removed would be actively backed up in '.trash' under your brick path for future recovery.
> ...
> WARNING: Found 1000 files in split brain
> ...
> File on pair 'host1:host2' is in split brain, file with latest time-stamp found on host1 - Fix? y
> File on pair 'host3:host5' is in split brain. file with biggest size found on host5 - Fix? y
> Fixed (1000 split brain files)

While we could extend the existing heal command, we also need to provide a policy flag. Entering "y/n" for 1000 files does not make the process any easier.

> # gluster volume heal split-brain
> INFO: no split brains present on this
>
> The real pain point of fixing a split brain is not taking getfattr outputs and figuring out which file is under conflict; the real pain point is doing the gfid to actual file translation when there are millions of files. Gathering this list takes more time than actually fixing the split brain, and I have personally spent countless hours doing this.

I don't follow this part completely. If `info split-brain` gives you the gfid instead of file path, you could just go to the .glusterfs/<gfid hardlink> and do a setfattr there.

> Now this list is easily available to GlusterFS, and so is its gfid to path translation. Why isn't it simple enough for us to ask the user what we think is the right choice? We certainly know which is the bigger file too.
>
> My general contention is that when we know what the right thing to do is under certain conditions, we should be making it easier. For example, directory metadata split brains: we just fix them automatically today, but that certainly wasn't the case in the past. We learnt from experience to do the right thing when it's necessary.

Sure, we have info on which the bigger file is or the one with the latest ctime, but the bigger file need not always be the source (a truncated file could be the pristine copy). So the choice has to be give
Re: [Gluster-devel] Automated split-brain resolution
On Thu, Aug 7, 2014 at 1:35 AM, Ravishankar N wrote:
>
> Manual resolution of split-brains [1] has been a tedious task involving
> understanding and modifying AFR's changelog extended attributes. To simplify
> and to an extent automate this task, we are proposing a new CLI command with
> which the user can specify what the source brick/file is, and
> automatically heal the files in the appropriate direction.
>
> Command: gluster volume resolve-split-brain <VOLNAME> {<bigger-file> |
> source-brick <brick_name> [<file>] }
>
> Breaking up the command into its possible options, we have:
>
> a) gluster volume resolve-split-brain <VOLNAME> <bigger-file>
> When this command is executed, AFR will consider the brick having the
> highest file size as the source and heal it to all other bricks (including
> all other sources and sinks) in that replica subvolume. If the file size is
> same in all the bricks, it does *not* heal the file.
>
> b) gluster volume resolve-split-brain <VOLNAME> source-brick <brick_name>
> [<file>]
>
> When this command is executed, if <file> is specified, AFR heals the file
> from the source-brick to all other bricks of that replica
> subvolume. For resolving multiple files, the command must be run
> iteratively, once per file.
> If <file> is not specified, AFR heals all the files that have an entry in
> .glusterfs/indices/xattrop *and* are in split-brain. As before, heals happen
> from source-brick to all other bricks.
>
> Future work could also include extending the command to add other policies
> like choosing the file having the latest mtime as the source, integration
> with trash xlator wherein the files deleted from the sink are moved to the
> trash dir etc.
>

I have a few queries regarding the overall design itself. Here are the caveats:

- Adding a new option rather than extending an existing option 'gluster volume heal'.
- Asking the user to input the filename, which is not necessary by default since such files are already available through 'gluster volume heal info split-brain'.

What would be ideal is the following, making it seamless and much more user friendly:

Extend the existing CLI as follows: 'gluster volume heal split-brain'

Healing split-brained files is more palpable and has a rather more convincing tone for a sys-admin IMHO.

An example version of this extension would be:

'gluster volume heal split-brain [|]'

In fact, since we already know the list of split-brained files, we can just loop through them and ask interactive questions:

# gluster volume heal split-brain
WARNING: About to start fixing split brained files on an active GlusterFS volume, do you wish to proceed? y
WARNING: files removed would be actively backed up in '.trash' under your brick path for future recovery.
...
WARNING: Found 1000 files in split brain
...
File on pair 'host1:host2' is in split brain, file with latest time-stamp found on host1 - Fix? y
File on pair 'host3:host5' is in split brain. file with biggest size found on host5 - Fix? y
Fixed (1000 split brain files)

# gluster volume heal split-brain
INFO: no split brains present on this

The real pain point of fixing a split brain is not taking getfattr outputs and figuring out which file is under conflict; the real pain point is doing the gfid to actual file translation when there are millions of files. Gathering this list takes more time than actually fixing the split brain, and I have personally spent countless hours doing this.

Now this list is easily available to GlusterFS, and so is its gfid to path translation. Why isn't it simple enough for us to ask the user what we think is the right choice? We certainly know which is the bigger file too.

My general contention is that when we know what the right thing to do is under certain conditions, we should be making it easier. For example, directory metadata split brains: we just fix them automatically today, but that certainly wasn't the case in the past. We learnt from experience to do the right thing when it's necessary.

A better UI experience would make it really 'automated', as you intend: we make the larger decisions ourselves, and users are left with simple choices so that it's not confusing.

-- 
Religious confuse piety with mere ritual, the virtuous confuse regulation with outcomes
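The interactive loop proposed above and a named policy are not mutually exclusive. The sketch below shows one way the "suggest a source per file" step could be computed in a batch, without prompting per file. Everything in it is an illustration under stated assumptions: the `Copy` record, the `POLICIES` table, the `plan` function and the policy name "latest-mtime" are hypothetical, and the rule that a tie leaves the file undecided is an assumption modelled on the proposal's "if the file size is same in all the bricks, it does not heal".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Copy:
    size: int     # file size in bytes on one brick
    mtime: float  # modification time, seconds since the epoch


def _unique_max(copies: Dict[str, Copy], key: Callable[[Copy], float]) -> Optional[str]:
    """Return the brick whose copy has the strictly largest key, else None."""
    ranked = sorted(copies, key=lambda brick: key(copies[brick]), reverse=True)
    if not ranked:
        return None
    # Assumption: a tie for first place means the policy cannot decide.
    if len(ranked) > 1 and key(copies[ranked[0]]) == key(copies[ranked[1]]):
        return None
    return ranked[0]


# Hypothetical policy table: policy name -> function choosing a source brick.
POLICIES: Dict[str, Callable[[Dict[str, Copy]], Optional[str]]] = {
    "bigger-file": lambda copies: _unique_max(copies, lambda c: c.size),
    "latest-mtime": lambda copies: _unique_max(copies, lambda c: c.mtime),
}


def plan(entries: Dict[str, Dict[str, Copy]], policy_name: str) -> Tuple[Dict[str, str], List[str]]:
    """Split files into (file -> chosen source brick) and a list left undecided."""
    policy = POLICIES[policy_name]
    chosen: Dict[str, str] = {}
    undecided: List[str] = []
    for path, copies in entries.items():
        source = policy(copies)
        if source is None:
            undecided.append(path)
        else:
            chosen[path] = source
    return chosen, undecided
```

With a plan like this, only the undecided files would need a human decision, which keeps the number of questions small even when many files are in split-brain.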
Re: [Gluster-devel] Automated split-brain resolution
On Thu, Aug 07, 2014 at 03:17:11PM +0530, Ravishankar N wrote: > On 08/07/2014 03:06 PM, Niels de Vos wrote: > >On Thu, Aug 07, 2014 at 02:05:34PM +0530, Ravishankar N wrote: > >>Manual resolution of split-brains [1] has been a tedious task > >>involving understanding and modifying AFR's changelog extended > >>attributes. To simplify and to an extent automate this task, we are > >>proposing a new CLI command with which the user can specify what > >>the source brick/file is, and automatically heal the files in the > >>appropriate direction. > >What about automatically healing directories that are in split-brain? > > Conservative merge happens when possible while healing directories, > but yes gfid split-brain (one brick having a file with gfid-g1 and > the other having a directory with the same gfid-g1) won't be > resolved with this command. The reason why are not wanting to it > right now is resolving such split-brains involves > unlinking/rmdir'ing one of the entries and is best when we integrate > with trash xlator. Okay, good to know. Thanks! Niels > >Thanks, > >Niels > > > >>Command: gluster volume resolve-split-brain { > >>| source-brick [] } > >> > >>Breaking up the command into its possible options, we have: > >> > >>a) gluster volume resolve-split-brain > >>When this command is executed, AFR will consider the brick having > >>the highest file size as the source and heal it to all other bricks > >>(including all other sources and sinks) in that replica subvolume. > >>If the file size is same in all the bricks, it does *not* heal the > >>file. > >> > >>b) gluster volume resolve-split-brain source-brick > >> [] > >> > >>When this command is executed, if is specified, AFR heals the > >>file from the source-brick to all other bricks of that > >>replica subvolume. For resolving multiple files, the command must be > >>run iteratively, once per file. 
> >>If <file> is not specified, AFR heals all the files that have an
> >>entry in .glusterfs/indices/xattrop *and* are in split-brain. As
> >>before, heals happen from source-brick to all other
> >>bricks.
> >>
> >>Future work could also include extending the command to add other
> >>policies like choosing the file having the latest mtime as the
> >>source, integration with trash xlator wherein the files deleted from
> >>the sink are moved to the trash dir etc.
> >>
> >>Please give feedback on the above.
> >>
> >>Regards,
> >>Ravi
> >>
> >>[1] https://github.com/gluster/glusterfs/blob/master/doc/split-brain.md
Re: [Gluster-devel] Automated split-brain resolution
On 08/07/2014 03:06 PM, Niels de Vos wrote:
> On Thu, Aug 07, 2014 at 02:05:34PM +0530, Ravishankar N wrote:
>> Manual resolution of split-brains [1] has been a tedious task involving understanding and modifying AFR's changelog extended attributes. To simplify and to an extent automate this task, we are proposing a new CLI command with which the user can specify what the source brick/file is, and automatically heal the files in the appropriate direction.
>
> What about automatically healing directories that are in split-brain?

Conservative merge happens when possible while healing directories, but yes, a gfid split-brain (one brick having a file with gfid-g1 and the other having a directory with the same gfid-g1) won't be resolved with this command. The reason we do not want to do it right now is that resolving such split-brains involves unlinking/rmdir'ing one of the entries, and that is best done once we integrate with the trash xlator.

> Thanks,
> Niels
>
>> Command: gluster volume resolve-split-brain <VOLNAME> {<bigger-file> | source-brick <brick_name> [<file>] }
>>
>> Breaking up the command into its possible options, we have:
>>
>> a) gluster volume resolve-split-brain <VOLNAME> <bigger-file>
>>
>> When this command is executed, AFR will consider the brick having the highest file size as the source and heal it to all other bricks (including all other sources and sinks) in that replica subvolume. If the file size is same in all the bricks, it does *not* heal the file.
>>
>> b) gluster volume resolve-split-brain <VOLNAME> source-brick <brick_name> [<file>]
>>
>> When this command is executed, if <file> is specified, AFR heals the file from the source-brick to all other bricks of that replica subvolume. For resolving multiple files, the command must be run iteratively, once per file.
>> If <file> is not specified, AFR heals all the files that have an entry in .glusterfs/indices/xattrop *and* are in split-brain. As before, heals happen from source-brick to all other bricks.
>>
>> Future work could also include extending the command to add other policies like choosing the file having the latest mtime as the source, integration with trash xlator wherein the files deleted from the sink are moved to the trash dir etc.
>>
>> Please give feedback on the above.
>>
>> Regards,
>> Ravi
>>
>> [1] https://github.com/gluster/glusterfs/blob/master/doc/split-brain.md
Re: [Gluster-devel] Automated split-brain resolution
On Thu, Aug 07, 2014 at 02:05:34PM +0530, Ravishankar N wrote: > > Manual resolution of split-brains [1] has been a tedious task > involving understanding and modifying AFR's changelog extended > attributes. To simplify and to an extent automate this task, we are > proposing a new CLI command with which the user can specify what > the source brick/file is, and automatically heal the files in the > appropriate direction. What about automatically healing directories that are in split-brain? Thanks, Niels > > Command: gluster volume resolve-split-brain { > | source-brick [] } > > Breaking up the command into its possible options, we have: > > a) gluster volume resolve-split-brain > When this command is executed, AFR will consider the brick having > the highest file size as the source and heal it to all other bricks > (including all other sources and sinks) in that replica subvolume. > If the file size is same in all the bricks, it does *not* heal the > file. > > b) gluster volume resolve-split-brain source-brick > [] > > When this command is executed, if is specified, AFR heals the > file from the source-brick to all other bricks of that > replica subvolume. For resolving multiple files, the command must be > run iteratively, once per file. > If is not specified, AFR heals all the files that have an > entry in .glusterfs/indices/xattrop *and* are in split-brain. As > before, heals happen from source-brick to all other > bricks. > > Future work could also include extending the command to add other > policies like choosing the file having the latest mtime as the > source, integration with trash xlator wherein the files deleted from > the sink are moved to the trash dir etc. > > Please give feedback on the above. 
>
> Regards,
> Ravi
>
> [1] https://github.com/gluster/glusterfs/blob/master/doc/split-brain.md
Re: [Gluster-devel] Automated split-brain resolution
On 08/07/2014 02:09 PM, Pranith Kumar Karampuri wrote:
> On 08/07/2014 02:05 PM, Ravishankar N wrote:
>> Manual resolution of split-brains [1] has been a tedious task involving understanding and modifying AFR's changelog extended attributes. To simplify and to an extent automate this task, we are proposing a new CLI command with which the user can specify what the source brick/file is, and automatically heal the files in the appropriate direction.
>>
>> Command: gluster volume resolve-split-brain <VOLNAME> {<bigger-file> | source-brick <brick_name> [<file>] }
>
> Enclosing in <> means the user will have to provide the input. And the <file> option should be there for both policies, I guess. So the command should probably be:
>
> gluster volume resolve-split-brain <VOLNAME> {bigger-file | source-brick <brick_name>} [<file>]

Yes, that makes sense. So the combinations would be:

a) gluster volume resolve-split-brain <VOLNAME> bigger-file [<file>]
b) gluster volume resolve-split-brain <VOLNAME> source-brick <brick_name> [<file>]

> Pranith
>
>> Breaking up the command into its possible options, we have:
>>
>> a) gluster volume resolve-split-brain <VOLNAME> <bigger-file>
>>
>> When this command is executed, AFR will consider the brick having the highest file size as the source and heal it to all other bricks (including all other sources and sinks) in that replica subvolume. If the file size is same in all the bricks, it does *not* heal the file.
>>
>> b) gluster volume resolve-split-brain <VOLNAME> source-brick <brick_name> [<file>]
>>
>> When this command is executed, if <file> is specified, AFR heals the file from the source-brick to all other bricks of that replica subvolume. For resolving multiple files, the command must be run iteratively, once per file.
>> If <file> is not specified, AFR heals all the files that have an entry in .glusterfs/indices/xattrop *and* are in split-brain. As before, heals happen from source-brick to all other bricks.
>>
>> Future work could also include extending the command to add other policies like choosing the file having the latest mtime as the source, integration with trash xlator wherein the files deleted from the sink are moved to the trash dir etc.
>>
>> Please give feedback on the above.
>>
>> Regards,
>> Ravi
>>
>> [1] https://github.com/gluster/glusterfs/blob/master/doc/split-brain.md
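As a sanity check of the two combinations agreed on above, here is a small sketch of how the argument tail could be validated. It is not the gluster CLI parser; `ResolveRequest` and `parse_resolve_args` are hypothetical names, and the placeholder names VOLNAME, brick_name and file are assumed spellings of the user-supplied values. The grammar it accepts is: a volume name, then either `bigger-file` or `source-brick` followed by a brick, then at most one file.

```python
from typing import List, NamedTuple, Optional

USAGE = "<VOLNAME> {bigger-file | source-brick <brick_name>} [<file>]"


class ResolveRequest(NamedTuple):
    volume: str
    policy: str                  # "bigger-file" or "source-brick"
    source_brick: Optional[str]  # set only for the source-brick policy
    file: Optional[str]          # None means "all eligible split-brain files"


def parse_resolve_args(args: List[str]) -> ResolveRequest:
    """Validate an argument list against the agreed command shape."""
    if len(args) < 2:
        raise ValueError("usage: " + USAGE)
    volume, policy, rest = args[0], args[1], args[2:]

    brick = None
    if policy == "source-brick":
        # This policy needs the brick that acts as the heal source.
        if not rest:
            raise ValueError("source-brick requires a brick name")
        brick, rest = rest[0], rest[1:]
    elif policy != "bigger-file":
        raise ValueError("unknown policy: " + policy)

    # The file is optional for both policies, but only one may be given.
    if len(rest) > 1:
        raise ValueError("at most one file may be given")
    return ResolveRequest(volume, policy, brick, rest[0] if rest else None)
```

Treating the file as a trailing optional for both policies is what makes a) and b) symmetric; with the original form, only source-brick could take a file.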
Re: [Gluster-devel] Automated split-brain resolution
On 08/07/2014 02:05 PM, Ravishankar N wrote:
> Manual resolution of split-brains [1] has been a tedious task involving understanding and modifying AFR's changelog extended attributes. To simplify and to an extent automate this task, we are proposing a new CLI command with which the user can specify what the source brick/file is, and automatically heal the files in the appropriate direction.
>
> Command: gluster volume resolve-split-brain <VOLNAME> {<bigger-file> | source-brick <brick_name> [<file>] }

Enclosing in <> means the user will have to provide the input. And the <file> option should be there for both policies, I guess. So the command should probably be:

gluster volume resolve-split-brain <VOLNAME> {bigger-file | source-brick <brick_name>} [<file>]

Pranith

> Breaking up the command into its possible options, we have:
>
> a) gluster volume resolve-split-brain <VOLNAME> <bigger-file>
>
> When this command is executed, AFR will consider the brick having the highest file size as the source and heal it to all other bricks (including all other sources and sinks) in that replica subvolume. If the file size is same in all the bricks, it does *not* heal the file.
>
> b) gluster volume resolve-split-brain <VOLNAME> source-brick <brick_name> [<file>]
>
> When this command is executed, if <file> is specified, AFR heals the file from the source-brick to all other bricks of that replica subvolume. For resolving multiple files, the command must be run iteratively, once per file.
> If <file> is not specified, AFR heals all the files that have an entry in .glusterfs/indices/xattrop *and* are in split-brain. As before, heals happen from source-brick to all other bricks.
>
> Future work could also include extending the command to add other policies like choosing the file having the latest mtime as the source, integration with trash xlator wherein the files deleted from the sink are moved to the trash dir etc.
>
> Please give feedback on the above.
>
> Regards,
> Ravi
>
> [1] https://github.com/gluster/glusterfs/blob/master/doc/split-brain.md