Re: [Gluster-users] Replicate Over VPN
Thanks guys, for your responses! I get the digest, so I'm going to cut/paste the juicier bits into one message... And a warning: if some of my comments suggest I really don't know what I'm doing, that could very well be right. I'm definitely still early on the learning curve - IT is not my real job or background.

-- Forwarded message --
From: Alex Chekholko ch...@stanford.edu
To: gluster-users@gluster.org
Date: Mon, 17 Mar 2014 11:23:15 -0700
Subject: Re: [Gluster-users] Replicate Over VPN

> On 03/13/2014 04:50 PM, Brock Nanson wrote:
>> ...
>> 2) I've seen it suggested that the write function isn't considered
>> complete until it's complete on all bricks in the volume. My write speeds
>> would seem to confirm this.
>
> Yes, the write will return when all replicas are written. AKA synchronous
> replication. Usually replication means synchronous replication.

OK, so the replication is bit by bit, real time across all the replicas - 'synchronous' meaning 'common clock', in essence.

>> Is this correct and is there any way to cache the data and allow it to
>> trickle over the link in the background?
>
> You're talking about asynchronous replication. Which GlusterFS calls
> geo-replication.

Understood... so this means one direction only in reality, at least until the nut of doing the replication in both directions can be cracked. 'Asynchronous' might be a bit of a misdirection, though, because it would suggest (to me at least) communication in *both* directions, just not based on the same clock.

>> ...
>> Geo-replication would seem to be the ideal solution, except for the fact
>> that it apparently only works in one direction (although I understand it
>> was hoped it would be upgraded in 3.4.0 to go in both directions).
>
> So if you allow replication to be delayed, and you allow writes on both
> sides, how would you deal with the same file simultaneously being written
> on both sides? Which would win in the end?
This is the big question of course, and I think the answer requires more knowledge than I have about how the replication process occurs. In my unsophisticated way, I would assume that under the hood, gluster would sound something like this whenever a new file is written to Node A:

1) Samba wants to write a file - I'm awake!
2) Hey Node B, wake up, we're about to start writing some bits synchronously. The file is called 'junk.txt'.
3) OK, we've both opened that file for writing...
4) Samba, start your transmission.
5) 'Write, write, write', in Node A/B perfect harmony.
6) Close that file and make sure the file listing is updated.

This bit-level understanding is something I don't have. At some point, the directory listing would be updated to show the new or updated file. When does that happen - before or after the file is written?

So to answer your question about which file would win if simultaneously written, I need to understand whether simply having the file opened for writing is enough to take control of it. That is, can Node A tell Node B that junk.txt is going to be written, thus preventing Node B from accepting a local write request? If this is the case, then gluster would only need to send enough information from Node A to Node B to indicate the write was coming and that the file is off limits until further notice. The write could occur as fast as possible on the local node, and dribble across the VPN to the other as fast as the link allows. So #5 above would be 'write, write, write as fast as each node reasonably can, but not necessarily in harmony'. And if communication was broken during the process, the heal function would be called upon to sort it out when communication is restored.

So are there any configuration tricks (write-behind, compression etc.) that might help me out?
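The handshake described above can be modeled in a few lines. This is a sketch, not GlusterFS internals: the delays and node names are made up, but it shows why a synchronous ack is gated by the slowest replica, while the "dribble across the VPN" variant unblocks the caller at local-disk speed:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Illustrative per-replica write costs (made-up numbers): the local
# disk is fast, the remote brick sits behind a slow VPN link.
LOCAL_DELAY = 0.01   # seconds
REMOTE_DELAY = 0.05  # seconds

def write_to_node(name, delay, data, store):
    time.sleep(delay)   # stand-in for disk + network latency
    store[name] = data

def synchronous_write(data, store):
    """'Write, write, write in perfect harmony': return only when ALL replicas ack."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(write_to_node, "node_a", LOCAL_DELAY, data, store),
            pool.submit(write_to_node, "node_b", REMOTE_DELAY, data, store),
        ]
        for f in futures:
            f.result()   # block until every replica has written

def delayed_write(data, store, pool):
    """The asked-for alternative: ack after the fast local write and
    let the remote copy dribble over in the background."""
    write_to_node("node_a", LOCAL_DELAY, data, store)
    return pool.submit(write_to_node, "node_b", REMOTE_DELAY, data, store)

store = {}
t0 = time.perf_counter()
synchronous_write("junk.txt v1", store)
sync_elapsed = time.perf_counter() - t0

background = ThreadPoolExecutor(max_workers=1)
t0 = time.perf_counter()
pending = delayed_write("junk.txt v2", store, background)
local_elapsed = time.perf_counter() - t0   # caller already unblocked here
pending.result()      # replication catches up later; heal covers failures
background.shutdown()

print(f"synchronous ack after {sync_elapsed:.3f} s")   # gated by the slow link
print(f"delayed ack after     {local_elapsed:.3f} s")  # local speed only
```

The gap between the two elapsed times is exactly the window in which the replicas disagree - which is why the delayed scheme has to lean on something like the heal process if the link drops mid-transfer.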
>> Is there a way to fool geo-replication into working in both directions,
>> recognizing my application isn't seeing serious read/write activity and
>> some reasonable amount of risk is acceptable?
>
> You're basically talking about running rsyncs in both directions. How will
> you handle any file conflicts?

Yes, I suppose in a way I am, but not based on a cron job... it would ideally be a full-time synchronization, like gluster does, but without the requirement of perfect Synchronicity (wasn't that a Police album?). Assuming my kindergarten understanding above could be applied here, file conflicts would presumably only exist if the VPN link went down, preventing the 'open the file for writing' command from being completed on both ends. If the link went down part way through a dribbling write to Node B, the healing process would presumably have a go at fixing the problem after the link is reinstated. If someone wrote to the remote copy during the outage, the typical heal issues would come into play.

> --
> Alex Chekholko ch...@stanford.edu
Re: [Gluster-users] Replicate Over VPN
On 03/13/2014 04:50 PM, Brock Nanson wrote:
> Yeah... I found the Joe Julian do's and don'ts blog post that pretty much
> says I shouldn't have started down this road - too late. But I have started
> down the road, so I'd like to make the best of it.
> (http://joejulian.name/blog/glusterfs-replication-dos-and-donts/)
> ...
> 2) I've seen it suggested that the write function isn't considered complete
> until it's complete on all bricks in the volume. My write speeds would seem
> to confirm this.

Yes, the write will return when all replicas are written. AKA synchronous replication. Usually replication means synchronous replication.

> Is this correct and is there any way to cache the data and allow it to
> trickle over the link in the background?

You're talking about asynchronous replication. Which GlusterFS calls geo-replication.

> I'm thinking about the write-behind-window-size setting, etc. It would be
> nice if something like DRBD Protocol A could be implemented, where writes
> are considered complete when the fast local one is done. I realize the
> potential for data loss if something goes wrong, but in my case the heal
> would take care of almost every scenario I can envision.
>
> Geo-replication would seem to be the ideal solution, except for the fact
> that it apparently only works in one direction (although I understand it
> was hoped it would be upgraded in 3.4.0 to go in both directions).

So if you allow replication to be delayed, and you allow writes on both sides, how would you deal with the same file simultaneously being written on both sides? Which would win in the end?

> So are there any configuration tricks (write-behind, compression etc.) that
> might help me out? Is there a way to fool geo-replication into working in
> both directions, recognizing my application isn't seeing serious read/write
> activity and some reasonable amount of risk is acceptable?

You're basically talking about running rsyncs in both directions. How will you handle any file conflicts?
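For reference, the "configuration tricks" under discussion are ordinary volume options set from the gluster CLI. A hedged sketch of what tuning might look like - the volume name `projects` is invented, the option names are from roughly the 3.4-era documentation, and the compression translator was experimental at the time, so verify each against your installed version:

```shell
# Client-side write caching: batch small writes before flushing.
gluster volume set projects performance.write-behind on
gluster volume set projects performance.write-behind-window-size 4MB
gluster volume set projects performance.flush-behind on

# Experimental on-the-wire compression (cdc translator) - may help
# on a sub-10 Mbit VPN if the data is compressible.
gluster volume set projects network.compression on
```

Worth noting: write-behind only batches and delays flushing on the client side. It does not change AFR's synchronous acknowledgement across replicas, so it can smooth bursts of small writes but won't make the remote brick behave asynchronously.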
--
Alex Chekholko ch...@stanford.edu

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Replicate Over VPN
On 17 Mar 2014, at 23:11, Alex Chekholko ch...@stanford.edu wrote:
> For your async use case, how often does the shared data change? Perhaps
> something like a plain rsync every night would be sufficient? Or a ZFS
> send/receive if that's faster than rsync?

(This should really have been in reply to Brock, but I lost his post somewhere)

There are some fairly simple solutions for this that may be workable, especially if writes are somewhat constrained. If all reads and writes by a single client go to the same back-end server, perhaps because of cookie or IP-based stickiness, they can cope with longish latency propagating to other servers, read-what-you-just-wrote will always succeed, and simultaneous writes to the same file are very unlikely. A classic use case would be user-uploaded image files for a web server cluster.

Bidirectional rsync has serious issues with deletions. Other systems worth looking at include:

csync2: http://oss.linbit.com/csync2/
Unison: http://www.cis.upenn.edu/~bcpierce/unison/
Bsync: https://github.com/dooblem/bsync

None of these do what gluster does, of course, and they may create their own issues!

Marcus
--
Marcus Bointon
Technical Director, Synchromedia Limited
Creators of http://www.smartmessages.net/
UK 1CRM solutions http://www.syniah.com/
mar...@synchromedia.co.uk | http://www.synchromedia.co.uk/
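Of the tools listed, Unison is the one built specifically for two-way reconciliation: it tracks state on both sides, so it can distinguish a deletion from a missing file (the problem bidirectional rsync trips over) and flags conflicting updates instead of silently overwriting. A hedged sketch of what a scheduled two-way sync between the offices might look like - the paths and host name are invented, and flag behaviour should be checked against the Unison manual:

```shell
# Two-way sync of the project share with the remote office over SSH.
# -batch:         don't prompt; skip genuine conflicts instead of asking
# -prefer newer:  on conflict, keep the most recently modified copy
# -times:         propagate modification times so comparisons stay stable
unison /srv/projects ssh://office-b//srv/projects \
    -batch -prefer newer -times
```

Run from cron or a loop, this approximates "delayed replication in both directions" - with the caveat Marcus gives: none of these provide a single coherent filesystem the way gluster does.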
[Gluster-users] Replicate Over VPN
Yeah... I found the Joe Julian do's and don'ts blog post that pretty much says I shouldn't have started down this road - too late. But I have started down the road, so I'd like to make the best of it. (http://joejulian.name/blog/glusterfs-replication-dos-and-donts/)

So now I'm wondering what I can do to speed things up as much as possible. First of all, I'll describe why I'm in this situation. I need to maintain the same read/write access to all files in two geographically-distant offices. Essentially, the way Autodesk sets up project files makes moving them between offices problematic, so GlusterFS with a fast connection would solve the problem. But the VPN connection is slow for the next year (under 10 Mbit/s, hopefully 100 Mbit/s eventually, which still isn't ideally fast). The nature of the file use is such that it would be surprising if a file was accessed from both offices at the same time. 99% of the time I could disconnect the bricks from each other, reconnect at the end of the day, and the heal function would do fine after hours - no split-brain problems. Except for that 1%, I could even rsync...

In testing (Samba, GlusterFS 3.4.2, Ubuntu 12.04 LTS), I'm seeing two issues: 1) browsing folders is slow; and 2) file writes are being done at approximately the speed of the VPN connection. Before I can attempt to improve the situation (if that's even possible), I'd like to know if my understandings are correct!

1) My reading suggests that every brick's file list is read when a folder is browsed by the client, meaning the latency of the link is the bottleneck. Does this actually happen? Is there a way to prevent it? If bricks are supposedly exact replicas of each other, why get the file listings from all the other bricks in the volume instead of trusting the local one? If there were actually discrepancies found, wouldn't that suggest a bigger problem with the replication?
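The suspicion in (1) can be put into rough numbers. If browsing a folder really does consult every brick per entry, listing cost scales with entry count times link round-trip time rather than with bandwidth. A back-of-the-envelope model - every figure here is illustrative, not measured:

```python
# Rough model: time to browse a folder when every entry triggers a
# round trip to the remote brick over the VPN. All numbers illustrative.

def listing_time(entries, rtt_s, lookups_per_entry=1):
    """Seconds to list a directory if each entry costs round trips."""
    return entries * lookups_per_entry * rtt_s

LAN_RTT = 0.0005   # 0.5 ms on a local network
VPN_RTT = 0.060    # 60 ms across an inter-office VPN (assumed)

entries = 500      # a mid-sized project folder

lan = listing_time(entries, LAN_RTT)
vpn = listing_time(entries, VPN_RTT)
print(f"LAN listing: {lan:.2f} s")   # 0.25 s
print(f"VPN listing: {vpn:.2f} s")   # 30.00 s
```

The point of the model: raising bandwidth from 10 to 100 Mbit/s does not change these numbers at all, because they are pure latency cost; that is why folder browsing feels slow even when the link is idle.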
2) I've seen it suggested that the write function isn't considered complete until it's complete on all bricks in the volume. My write speeds would seem to confirm this. Is this correct, and is there any way to cache the data and allow it to trickle over the link in the background? I'm thinking about the write-behind-window-size setting, etc. It would be nice if something like DRBD Protocol A could be implemented, where writes are considered complete when the fast local one is done. I realize the potential for data loss if something goes wrong, but in my case the heal would take care of almost every scenario I can envision.

Geo-replication would seem to be the ideal solution, except for the fact that it apparently only works in one direction (although I understand it was hoped it would be upgraded in 3.4.0 to go in both directions).

So are there any configuration tricks (write-behind, compression etc.) that might help me out? Is there a way to fool geo-replication into working in both directions, recognizing my application isn't seeing serious read/write activity and some reasonable amount of risk is acceptable?