[Gluster-users] encfs over glusterfs
I am considering running encfs over glusterfs so that I can include insecure boxes in the storage backend for our backups running rdiff-backup. Actually, I would prefer the crypto to be done in glusterfs, but I understand that's a little way off. Does anyone use this arrangement -- and are there any gotchas?

In case you don't know, here's how the encrypted part of encfs looks:

~/.crypt$ ls -la
total 216
drwx------+  8 andrewm andrewm  4096 2009-01-13 20:47 .
drwxr-x---+ 84 andrewm andrewm 12288 2009-01-30 16:08 ..
drwx------+  7 andrewm andrewm  4096 2009-01-30 16:22 0IBZ4H3k84Id,b5SLxcUth7P
drwxr-x---+ 43 andrewm andrewm  4096 2009-01-22 13:14 8K8DI0H50LY8sA-k0M4m8au1
-rw-r-----+  1 andrewm andrewm 71923 2009-01-30 13:00 CuTVvivzRtVXW--9aIZNTUGP
drwxr-x---+ 28 andrewm andrewm  4096 2009-01-30 16:08 EgCyclxjGnsa0EYf,rpxKpAg
-rw-r-----+  1 andrewm andrewm   239 2008-06-24 22:34 .encfs5
drwx------+  4 andrewm andrewm  4096 2008-12-14 17:49 ldubuvg2gMnrI8hDUiL1QC,m
drwxr-x---+  9 andrewm andrewm 53248 2009-01-08 11:26 lM,Bh2CjGF1qp-ubU98Boqyr
drwx------+  9 andrewm andrewm  4096 2009-01-01 12:23 nuUVB4uIqfyNF8BQwEOmB4gt
lrwxrwxrwx+  1 andrewm andrewm    49 2008-09-08 08:57 Zclx5CVF6z20FqkURS7zMhyT -> EgCyclxjGnsa0EYf,rpxKpAg/QkAtJiPaFEsGPxqwo9WAFfji

And the plaintext part of an encfs filesystem:

~/crypt$ ls -la
total 208
drwx------+  8 andrewm andrewm  4096 2009-01-13 20:47 .
drwxr-x---+ 84 andrewm andrewm 12288 2009-01-30 16:08 ..
drwxr-x---+ 43 andrewm andrewm  4096 2009-01-22 13:14 work
drwx------+  7 andrewm andrewm  4096 2009-01-30 16:22 stuff
drwx------+  9 andrewm andrewm  4096 2009-01-01 12:23 friends
lrwxrwxrwx+  1 andrewm andrewm    49 2008-09-08 08:57 me -> foo/baz
drwxr-x---+  9 andrewm andrewm 53248 2009-01-08 11:26 junk
drwx------+  4 andrewm andrewm  4096 2008-12-14 17:49 .Trash-1000
drwxr-x---+ 28 andrewm andrewm  4096 2009-01-30 16:08 morejunk
-rw-r-----+  1 andrewm andrewm 70787 2009-01-30 13:00 whatever

___
Gluster-users mailing list
Gluster-users@gluster.org
http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users
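The obfuscated names in the listing above come from encfs encrypting each filename before it touches the backing store, which is why it layers cleanly over any POSIX backend such as a glusterfs mount. A toy Python sketch of the idea (this is NOT encfs's real algorithm -- encfs uses reversible AES encryption keyed by the volume key stored in .encfs5; the scheme and names below are made up for illustration):

```python
import base64, hashlib, hmac

KEY = b"toy-volume-key"  # stands in for the encfs volume key

def encrypt_name(name: str) -> str:
    # Toy scheme: keyed digest of the name, encoded with a filesystem-safe
    # alphabet. Real encfs encrypts reversibly so names can be decrypted.
    digest = hmac.new(KEY, name.encode(), hashlib.sha256).digest()
    return base64.urlsafe_b64encode(digest)[:24].decode()

# The backend (e.g. a glusterfs brick) only ever sees ciphertext names:
print(encrypt_name("work"))
print(encrypt_name("stuff"))
```

The point for the glusterfs case: the bricks store only the scrambled names and encrypted contents, so an insecure storage box never sees plaintext.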
Re: [Gluster-users] Quota translator troubles
Patrick,

Can you try the latest codebase? Some fixes in quota have gone in since your first mail. Also, quota is best used on the server side for now. We are still working on making it work well on the client side.

Avati

On Fri, Jan 30, 2009 at 9:57 PM, Patrick Ruckstuhl patr...@tario.org wrote:

Hi Ananth,

here's the config with the quota on top:

Server (two servers with different IP addresses have this config):

### Log brick
volume log-posix
  type storage/posix                    # POSIX FS translator
  option directory /data/glusterfs/log # Export this directory
end-volume

### Add lock support
volume log-locks
  type features/locks
  subvolumes log-posix
end-volume

### Add performance translator
volume log-brick
  type performance/io-threads
  option thread-count 8
  subvolumes log-locks
end-volume

### Add network serving capability to above brick.
volume server
  type protocol/server
  option transport-type tcp
  option transport.socket.bind-address 192.168.0.4
  subvolumes log-brick
  option auth.addr.log-brick.allow 192.168.0.2,192.168.0.3,192.168.0.4 # Allow access to brick volume
end-volume

Client:

### Add client feature and attach to remote subvolume
volume log-remote-hip
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.3     # IP address of the remote brick
  option remote-subvolume log-brick  # name of the remote volume
end-volume

### Add client feature and attach to remote subvolume
volume log-remote-hop
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.4     # IP address of the remote brick
  option remote-subvolume log-brick  # name of the remote volume
end-volume

### This is a distributed volume
volume log-distribute
  type cluster/distribute
  subvolumes log-remote-hip log-remote-hop
end-volume

### Add writeback feature
volume log-writeback
  type performance/write-behind
  option block-size 512KB
  option cache-size 100MB
  option flush-behind off
  subvolumes log-distribute
end-volume

### Add quota
#volume log
  type features/quota
  option disk-usage-limit 100GB
  subvolumes log-writeback
#end-volume

This config results in the crash. (If you can't reproduce the crash with this config, I can see if I'll be able to run it with gdb.)

The config that works is:

Server (two servers with different IP addresses have this config):

### Log brick
volume log-posix
  type storage/posix                    # POSIX FS translator
  option directory /data/glusterfs/log # Export this directory
end-volume

### Add quota support
volume log-quota
  type features/quota
  option disk-usage-limit 100GB
  subvolumes log-posix
end-volume

### Add lock support
volume log-locks
  type features/locks
  subvolumes log-quota
end-volume

### Add performance translator
volume log-brick
  type performance/io-threads
  option thread-count 8
  subvolumes log-locks
end-volume

### Add network serving capability to above brick.
volume server
  type protocol/server
  option transport-type tcp
  option transport.socket.bind-address 192.168.0.4
  subvolumes log-brick
  option auth.addr.log-brick.allow 192.168.0.2,192.168.0.3,192.168.0.4 # Allow access to brick volume
end-volume

Client:

### Add client feature and attach to remote subvolume
volume log-remote-hip
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.3     # IP address of the remote brick
  option remote-subvolume log-brick  # name of the remote volume
end-volume

### Add client feature and attach to remote subvolume
volume log-remote-hop
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.4     # IP address of the remote brick
  option remote-subvolume log-brick  # name of the remote volume
end-volume

### This is a distributed volume
volume log-distribute
  type cluster/distribute
  subvolumes log-remote-hip log-remote-hop
end-volume

### Add writeback feature
volume log-writeback
  type performance/write-behind
  option block-size 512KB
  option cache-size 100MB
  option flush-behind off
  subvolumes log-distribute
end-volume

So the only difference is that the quota moved from the top on the client to the bottom on each server.
This works, but df returns the wrong amount (it reports the total available space on the backend, not the space still available under the given quota).

Regards,
Patrick

Hi Patrick,

It would be great if you could provide us with a bit more information. Could you mail us the specfiles used (both with and without quota) and also the gdb backtrace? Also, if there are any special steps we need to follow to reproduce the issue, please do let us know.

Regards,
Ananth

-----Original Message-----
From: Patrick Ruckstuhl patr...@tario.org
To: gluster-users@gluster.org
Subject: [Gluster-users] Quota
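Patrick's df observation follows from where the quota translator sits: df is answered by the statfs result of the underlying filesystem, and a quota layer would have to clamp those numbers itself to reflect the configured limit. A small Python sketch of the clamping a quota-aware statfs would need to do (illustrative only, not GlusterFS code; all names here are made up):

```python
def quota_statfs(backend_total, backend_free, limit, used):
    """Clamp backend statfs numbers to a disk-usage-limit.

    backend_total / backend_free: bytes reported by the real filesystem.
    limit: the configured disk-usage-limit in bytes.
    used: bytes already consumed under the quota.
    Hypothetical helper for illustration, not the GlusterFS API.
    """
    total = min(backend_total, limit)
    free = max(0, min(backend_free, limit - used))
    return total, free

GB = 1024 ** 3
# 2 TB backend, 100 GB quota, 30 GB already used:
total, free = quota_statfs(2048 * GB, 1900 * GB, 100 * GB, 30 * GB)
print(total // GB, free // GB)  # prints "100 70": df would then show the quota, not the backend
```

Without this clamping at whatever layer answers statfs, df falls through to the raw backend numbers, which matches the behavior Patrick describes.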
Re: [Gluster-users] AFR/replicate questions
At 04:20 AM 1/30/2009, Barnaby Gray wrote:

I'm in the process of setting up server-side AFR with 2 servers in separate data centres, separated by a WAN. Writes will be relatively few, so we can live with the performance limitations of the WAN. I noticed unexpected performance though when listing directories of around 1k files with ls -al. It looks like for this operation server1 is sending traffic to server2 in the other data centre, which for a read-only operation I wasn't expecting.

Any time a directory is accessed, gluster/replicate checks with the other server to see if the information it has is current. It does this because, if something changed on the other machine, it might not have known about it. If something has changed, it auto-heals. Since gluster doesn't cache information about the other machines in a replicate group, it has to do this every time.

tshark shows a reasonable amount of traffic that looks related to xattrs: lots of mentions of filenames and 'trusted.glusterfs.afr.metadata-pending'. I'm using the option read-subvolume local to point read operations to the volume local to either server.

That option means: once it's determined that my version of the file is the most up to date, serve it from my disk (or from my favorite server, in a client-server model), which is faster than streaming it over the network.

Have tried both with and without the performance translators client-side, to no avail. We're using 2.0.0rc1.

I don't think any performance translator can help with this particular situation. Gluster HAS to ensure that it's delivering the most up-to-date version of a file, and in order to do that, upon any file request it has to check with the other replicate servers.

Apologies if this is an obvious question - can someone spot what I'm doing wrong?
One might think: well, both servers haven't lost their connection to each other, so they should be able to assume they're in sync. But this isn't necessarily the case, because you can't know the configuration on the other end. There may be a situation where Server A decided Server B was down because of network latency, so it wrote and updated a file but didn't replicate it to Server B. When Server B goes to read that file, if it assumes that all has been well and doesn't bother checking, then it will serve the wrong version of the file.

The only way to resolve this would be to make Server A responsible for notifying Server B when it re-establishes a connection to it. While this seems logical and would improve performance for your case, it would require some sort of journaling on Server A. That would be terribly inefficient and would require an additional journal filesystem, or modifying the underlying filesystem in some way. Then there's the case of changing architecture: if you have 10 servers in your replicate group, you have to run a journal for all 10. Let's say you just shut 5 of them off forever; you'd then need a way to clear out the journals for those so that space isn't wasted. So, given that gluster wants to be non-intrusive, it checks at access time instead.

cheers,
Barnaby
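The 'trusted.glusterfs.afr.metadata-pending' xattrs Barnaby saw in tshark are how replicate decides which copy is fresh: each server keeps a pending counter per peer, incremented before a write and cleared once that peer acknowledges it. A simplified Python sketch of that decision (a toy model for illustration -- real AFR xattr accounting is more involved):

```python
def pick_fresh(pending):
    """Return the set of servers holding an up-to-date copy.

    pending[a][b] = number of operations server `a` completed that
    server `b` has not yet acknowledged (a's pending counter for b).
    A server is stale if some peer holds a nonzero pending count for it.
    Toy model of AFR's pending-xattr comparison, not GlusterFS code.
    """
    servers = set(pending)
    stale = {b for a in pending for b, n in pending[a].items() if n > 0}
    return servers - stale

# Server A wrote twice while B was unreachable: A holds 2 pending ops for B.
pending = {"A": {"B": 2}, "B": {"A": 0}}
print(pick_fresh(pending))  # prints {'A'}: reads must come from A until B heals
```

This is also why read-subvolume is only a preference: the local copy can be served only when the local server is in the fresh set, which is exactly what the per-access xattr exchange establishes.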
Re: [Gluster-users] gluster ha/replication/disaster recover(dr translator) wish list
On 01/28/09 01:36, Keith Freedman wrote:

At 09:32 AM 1/27/2009, Prabhu Ramachandran wrote:

On 01/27/09 02:21, Keith Freedman wrote:

At 10:36 AM 1/26/2009, Prabhu Ramachandran wrote:

I don't see any problems with your config, other than: if your network connection is very sporadic, then you'll often be caught waiting for timeouts, which will make things seem slower.

Network connection isn't usually a problem, but when the network does go down it could be gone for a while. In the worst case, could I temporarily disable the replicate/AFR feature?

You won't need to. It'll know that the other replica is down and will continue working without it, then will auto-heal when it comes back up.

OK, I am now all set and things work really nicely! Thanks for the answers!

So there should be no need to change your configuration, unless you hate seeing all the connection-down messages in your logfile.

Well, I can see a couple of cases where it would be expedient to change the configuration. Let's say one of the machines loses a disk and the machine won't be fixed for a week. Then, once the machine comes back up, I could use a different configuration and sync the files locally on the machine that is available, rather than do it over the slow network. I am guessing that this should work.

Anyway, I'd like to thank once again the developers for the excellent software and the support. Thank you for your patient answers!

cheers,
prabhu