Re: [Gluster-users] Is there a way to prevent file deletion by accident?
Hi John,

Why not use something like git? Could you also be clearer about the requirements - for example, do you want some command for the FS to mark new revisions (and hence, why not use something like git?), or is there some other way you want the revisions tracked? It might be helpful if you could write some high-level specs of what you want, the behaviour of the feature, etc. Maybe there are others in the community who have a use for something similar and might want to contribute too.

Regards,
Tejas.

From: John Li j...@jlisbz.com
To: Tejas N. Bhise te...@gluster.com
Cc: gluster-users@gluster.org
Sent: Sunday, August 15, 2010 6:50:27 PM
Subject: Re: [Gluster-users] Is there a way to prevent file deletion by accident?

Hi Tejas,

Thanks for your reply. Is there a quick way to bring the trash feature back, and how much effort would that take? As for the trash implementation, does it simply mark the file as deleted while keeping the data, so people can recover a deleted file easily by undeleting it, until someone actually empties the trash or compacts the database? An rm alias would be hard to maintain on the client side, so something embedded in the file system would be preferable. Ideally the system could keep revisions of the files too. Will Gluster provide this feature, or are you aware of any system that provides embedded revision control? Thanks a lot in advance for my amateur questions.

-- John

On Sat, Aug 14, 2010 at 12:47 AM, Tejas N. Bhise te...@gluster.com wrote:

Hi John,

We don't have such a feature. Would you like to elaborate on whether you need this just as an alternative to aliasing rm on the shell, or do you want to put it to use in some different scenario - like a trash folder? We do have some trash folder functionality, but it's old and not supported right now, as not many found a good use for it.

Regards,
Tejas.

----- Original Message -----
From: John Li j...@jlisbz.com
To: gluster-users@gluster.org
Sent: Saturday, August 14, 2010 10:02:38 AM
Subject: [Gluster-users] Is there a way to prevent file deletion by accident?

Hi list,

Is there any feature in the file system which can help prevent accidental file deletion? Thanks.

-- John
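[Editor's note on the client-side workaround discussed above: a trash-style rm alias might look roughly like the sketch below. This is a hypothetical ~/.bashrc snippet, not a GlusterFS feature; the mount point and trash path are assumptions.]

    trash() {
        # Move arguments into a per-user trash directory on the Gluster mount
        # instead of unlinking them.
        local trash_dir="/mnt/glusterfs/.trash/$USER"
        mkdir -p "$trash_dir" || return 1
        mv -- "$@" "$trash_dir"/
    }
    alias rm='trash'

[It also shows why the approach is fragile: every client has to carry the alias, and anything that calls /bin/rm or unlink() directly bypasses it - which is why a filesystem-level trash feature is being asked about here.]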
Re: [Gluster-users] Is there a way to prevent file deletion by accident?
Btw, are you thinking of something like Rational ClearCase MVFS?

----- Original Message -----
From: Tejas N. Bhise te...@gluster.com
To: John Li j...@jlisbz.com
Cc: gluster-users@gluster.org
Sent: Sunday, August 15, 2010 9:57:00 PM
Subject: Re: [Gluster-users] Is there a way to prevent file deletion by accident?

Hi John,

Why not use something like git? Could you also be clearer about the requirements - for example, do you want some command for the FS to mark new revisions (and hence, why not use something like git?), or is there some other way you want the revisions tracked? It might be helpful if you could write some high-level specs of what you want, the behaviour of the feature, etc. Maybe there are others in the community who have a use for something similar and might want to contribute too.

Regards,
Tejas.
Re: [Gluster-users] Is there a way to prevent file deletion by accident?
From what you explain, you probably need an FS which has almost WORM-like capability, where each open()/close() pair would create a new version. We currently don't have that kind of capability, but it certainly seems like a good idea.

From: John Li j...@jlisbz.com
To: Tejas N. Bhise te...@gluster.com
Cc: gluster-users@gluster.org
Sent: Sunday, August 15, 2010 10:33:49 PM
Subject: Re: [Gluster-users] Is there a way to prevent file deletion by accident?

Yes. Here is a brief overview. My client is trying to build a box.net type of service. Since the documents are so important to the service, and they have a team of not very skilled engineers, preventing system admins from deleting files by accident is a very specific requirement for the underlying file system/service. Git is not considered the better option because there is no way to get anything back if someone just deletes the files at the OS level.

Because the project is still at a very early stage, there are not many requirements yet, so I figure the purpose of the project may help you understand where the requirements come from. We are very open to ideas, suggestions and industry best practice as well. Hope someone can point me to some better options. Thanks a lot in advance.

-- John
Re: [Gluster-users] Is there a way to prevent file deletion by accident?
Hi John,

We don't have such a feature. Would you like to elaborate on whether you need this just as an alternative to aliasing rm on the shell, or do you want to put it to use in some different scenario - like a trash folder? We do have some trash folder functionality, but it's old and not supported right now, as not many found a good use for it.

Regards,
Tejas.

----- Original Message -----
From: John Li j...@jlisbz.com
To: gluster-users@gluster.org
Sent: Saturday, August 14, 2010 10:02:38 AM
Subject: [Gluster-users] Is there a way to prevent file deletion by accident?

Hi list,

Is there any feature in the file system which can help prevent accidental file deletion? Thanks.

-- John
Re: [Gluster-users] Howto Unify Storage from other server without replication
Hi Christeddy,

If you are just starting out with GlusterFS, I would request you to use the latest version (3.0.5), and also to use distribute rather than unify. Please use the volgen command to create your volumes:

http://www.gluster.com/community/documentation/index.php/Glusterfs-volgen_Reference_Page

Let me know how it goes.

Regards,
Tejas.

----- Original Message -----
From: Christeddy Parapat c.para...@gmail.com
To: gluster-users@gluster.org
Sent: Monday, August 9, 2010 1:11:56 PM
Subject: [Gluster-users] Howto Unify Storage from other server without replication

Hi,

I really need somebody's help here. I am trying to set up 3 servers, 2 as servers and 1 as a client, and I want to use cluster/unify. But when I try to run it, it always reports "not connected". If I comment out the cluster/unify configuration, it connects. Is there a way to make GlusterFS unify all the storage resources from the other servers into one pool of data only? Let me share my configuration here.

Server 1 configuration, [r...@fs-lb1 glusterfs]# cat glusterfsd.vol:

volume brick
  type storage/posix
  option directory /data
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option transport.socket.bind-address 192.168.0.10   # Default is to listen on all interfaces
  option transport.socket.listen-port 6996            # Default is 6996
  option client-volume-filename /etc/glusterfs/glusterfs-client.vol
  subvolumes brick
  option auth.addr.brick.allow 192.168.0.*             # Allow access to "brick" volume
end-volume

volume brick-ns
  type storage/posix                 # POSIX FS translator
  option directory /data/export-ns   # Export this directory
end-volume

volume servers
  type protocol/server
  option transport-type tcp                  # For TCP/IP transport
  option transport.socket.listen-port 6999   # Default is 6996
  subvolumes brick-ns
  option auth.addr.brick-ns.allow *          # Access to "brick-ns" volume
end-volume

Server 2 configuration, [r...@fs1 glusterfs]# cat glusterfsd.vol:

volume brick2
  type storage/posix       # POSIX FS translator
  option directory /Data   # Export this directory
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option transport.socket.bind-address 192.168.0.11   # Default is to listen on all interfaces
  option transport.socket.listen-port 6996            # Default is 6996
  subvolumes brick2
  option auth.addr.brick2.allow *                     # Allow access to "brick2" volume
end-volume

Client configuration, [r...@appman glusterfs]# cat glusterfs.vol:

volume client
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.10   # IP address of the remote brick
  option remote-subvolume brick     # name of the remote volume
end-volume

volume client2
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.11
  option remote-subvolume brick2
end-volume

volume client-ns
  type protocol/client
  option transport-type tcp                    # for TCP/IP transport
  option remote-host 192.168.0.10              # IP address of the remote brick
  option transport.socket.remote-port 6999     # default server port is 6996
  option remote-subvolume brick-ns             # name of the remote volume
end-volume

volume unify
  type cluster/unify
  # option scheduler rr
  option self-heal background   # foreground off; default is foreground
  option scheduler alu
  option alu.limits.min-free-disk 5%
  option alu.limits.max-open-files 1
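[Editor's note on the volgen suggestion above: a distribute (non-replicated) configuration over the two servers in the quoted setup could be generated with something like the sketch below. The volume name and the assumption that both exports should go into one distribute pool are illustrative; check the referenced volgen documentation page for the exact options in your version.]

    # run once, then copy the generated server volfiles to the servers and the
    # client volfile to the client (a sketch, not a verified command line)
    glusterfs-volgen --name pool 192.168.0.10:/data 192.168.0.11:/Data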
Re: [Gluster-users] Howto Unify Storage from other server without replication
Hi Christeddy,

Thank you for the kind words; happy to see GlusterFS working for you. You can download 3.0.5 from:

ftp://ftp.gluster.org/pub/gluster/glusterfs/3.0/3.0.5/

You can continue to use the existing config - just replace the binaries and restart on client and server. The unify translator is now legacy, and hence the distribute translator worked for you.

Let me know how it goes.

Regards,
Tejas.

----- Original Message -----
From: Christeddy Parapat c.para...@gmail.com
To: Tejas N. Bhise te...@gluster.com
Cc: gluster-users@gluster.org
Sent: Monday, August 9, 2010 1:31:17 PM
Subject: Re: [Gluster-users] Howto Unify Storage from other server without replication

Hi Tejas,

Glad to get a response from you. Yes, I am using version 3.0.4. Can you give me the link to download the 3.0.5 version? By the way, thank you very much for your response, Tejas. I was able to unify all the storage servers with cluster/distribute, and my storage servers now form one big pool of data. I am really happy - you are great. Once more, thank you very much, Tejas.

Cheers,
Christeddy.

On Aug 9, 2010, at 2:48 PM, Tejas N. Bhise wrote:

Hi Christeddy,

If you are just starting out with GlusterFS, I would request you to use the latest version (3.0.5), and also to use distribute rather than unify. Please use the volgen command to create your volumes:

http://www.gluster.com/community/documentation/index.php/Glusterfs-volgen_Reference_Page

Let me know how it goes.

Regards,
Tejas.
Re: [Gluster-users] New server setup does not make hard drive bootable
Hi All,

I would request you all to file bug reports if you feel there is something amiss. If it's a known issue, it will get marked as a duplicate and closed.

Regards,
Tejas Bhise.

----- Original Message -----
From: Ray Barnes tical@gmail.com
To: gluster-users@gluster.org
Sent: Thursday, August 5, 2010 10:41:41 AM
Subject: [Gluster-users] New server setup does not make hard drive bootable

Hi all. The subject says it all. After installing 3.0.4 in a new first-server setup, the system does not boot from the hard drive because the bootable flag is not set in fdisk. Hardware is an Intel DG35EC board with a Q6600 processor, 4GB of RAM and a Seagate ST3500320AS drive (500GB). I was able to get it to boot normally by simply enabling the boot flag in fdisk and nothing more, using a CentOS rescue CD. Is this a known issue? Should I be filing a bug report?

-Ray
[Gluster-users] Gluster Native NFS and Win7 NFS Client
Dear community members,

Wanted to do a quick check on whether someone in the community has used the Windows 7 native NFS client with the gluster native NFS server. If yes, then what was the experience like?

Regards,
Tejas.
Re: [Gluster-users] Split Brain?
Is this over the WAN-replicated setup, or a local setup?

----- Original Message -----
From: Count Zero cou...@gmail.com
To: Gluster General Discussion List gluster-users@gluster.org
Sent: Wednesday, August 4, 2010 8:38:02 AM
Subject: [Gluster-users] Split Brain?

I am seeing a lot of these in my cluster client's log file:

[2010-08-04 04:06:30] E [afr-self-heal-data.c:705:afr_sh_data_fix] replicate: Unable to self-heal contents of '/lib/wms-server.jar' (possible split-brain). Please delete the file from all but the preferred subvolume.

How do I recover from this without losing my files?

Thanks,
CountZ
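[Editor's note: the recovery the log message suggests - removing the stale copy from the backend export of every replica except the preferred one, then letting self-heal recreate it - might look roughly like the sketch below. The backend path and mount point are assumptions; keep a backup of the copy you remove in case you picked the wrong subvolume.]

    # on the server holding the copy you do NOT want to keep
    cp -a /data/export/lib/wms-server.jar /root/wms-server.jar.bak   # safety copy
    rm /data/export/lib/wms-server.jar

    # then, from a client, look the file up through the Gluster mount to trigger self-heal
    stat /mnt/glusterfs/lib/wms-server.jar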
Re: [Gluster-users] glusterfsd crashes when using quota translator
Hi Peter,

Please go ahead and open a bug for this; I will have someone take a look. The volume quota translator is not officially supported, though some users use it with success. We do, however, try to actively fix defects in it even though it is not officially supported. Does it crash every time you create a file after a restart?

Regards,
Tejas.

----- Original Message -----
From: Peter Mueller p...@gloud.de
To: gluster-users@gluster.org
Sent: Wednesday, July 28, 2010 8:39:44 PM
Subject: [Gluster-users] glusterfsd crashes when using quota translator

Hi,

I tried to use the quota translator to limit the storage for a specific volume. I can mount the volume and see the limit with df, but as soon as I try to create a file on the mounted volume, glusterfsd crashes.

Configuration and logs:

Version      : glusterfs 3.0.0 built on Jul 23 2010 16:45:59
git          : 2.0.1-886-g8379edd
Starting Time: 2010-07-25 19:27:56
Command line : /sbin/glusterfsd -f /etc/glusterfs/glusterfsd.vol
PID          : 27113
System name  : Linux
Nodename     : inst-209.install.de
Kernel Release : 2.6.18-194.el5
Hardware Identifier: x86_64

Given volfile:

volume v8705-posix
  type storage/posix
  option directory /data/v8705
end-volume

volume v8705-quota
  type features/quota
  option disk-usage-limit 100GB
  subvolumes v8705-posix
end-volume

volume v8705-brick
  type features/locks
  subvolumes v8705-quota
end-volume

volume v8704-posix
  type storage/posix
  option directory /data/v8704
end-volume

volume v8704-quota
  type features/quota
  option disk-usage-limit 100GB
  subvolumes v8704-posix
end-volume

volume v8704-brick
  type features/locks
  subvolumes v8704-quota
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option listen-port 6996
  option bind-address 10.0.1.122
  option auth.login.v8705-brick.allow v8705
  option auth.login.v8705.password
  option auth.login.v8704-brick.allow v8704
  option auth.login.v8704.password
  subvolumes v8705-brick v8704-brick
end-volume

Log:

[2010-07-25 19:27:56] W [xlator.c:655:validate_xlator_volume_options] server: option 'bind-address' is deprecated, preferred is 'transport.socket.bind-address', continuing with correction
[2010-07-25 19:27:56] W [xlator.c:655:validate_xlator_volume_options] server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
[2010-07-25 19:27:56] N [glusterfsd.c:1361:main] glusterfs: Successfully started
[2010-07-25 19:27:58] N [server-protocol.c:5809:mop_setvolume] server: accepted client from 10.0.1.145:1023
[2010-07-25 19:27:58] N [server-protocol.c:5809:mop_setvolume] server: accepted client from 10.0.1.145:1022
[2010-07-25 19:28:34] N [server-protocol.c:5809:mop_setvolume] server: accepted client from 10.0.1.122:1022
[2010-07-25 19:28:34] N [server-protocol.c:5809:mop_setvolume] server: accepted client from 10.0.1.122:1023

pending frames:
frame : type(1) op(UNLINK)
frame : type(1) op(UNLINK)

patchset: 2.0.1-886-g8379edd
signal received: 11
time of crash: 2010-07-25 19:28:49
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.0.0

/lib64/libc.so.6[0x3b870302d0]
/usr/local/lib/glusterfs/3.0.0/xlator/protocol/server.so(server_unlink_cbk+0xd5)[0x2ae3d57b4235]
/usr/local/lib/libglusterfs.so.0[0x2ae3d46c9604]
/usr/local/lib/glusterfs/3.0.0/xlator/features/quota.so(quota_unlink_cbk+0x95)[0x2ae3d538a955]
/usr/local/lib/glusterfs/3.0.0/xlator/storage/posix.so(posix_unlink+0x1f3)[0x2ae3d517e613]
/usr/local/lib/glusterfs/3.0.0/xlator/features/quota.so(quota_unlink_stat_cbk+0xd4)[0x2ae3d5389234]
/usr/local/lib/glusterfs/3.0.0/xlator/storage/posix.so(posix_stat+0x137)[0x2ae3d517f9b7]
/usr/local/lib/glusterfs/3.0.0/xlator/features/quota.so(quota_unlink+0xf1)[0x2ae3d538a501]
/usr/local/lib/libglusterfs.so.0(default_unlink+0xcb)[0x2ae3d46c954b]
/usr/local/lib/glusterfs/3.0.0/xlator/protocol/server.so(server_unlink_resume+0xd1)[0x2ae3d57b45a1]
/usr/local/lib/glusterfs/3.0.0/xlator/protocol/server.so(server_resolve_done+0x30)[0x2ae3d57b4e20]
/usr/local/lib/glusterfs/3.0.0/xlator/protocol/server.so(server_resolve_all+0xaf)[0x2ae3d57b575f]
/usr/local/lib/glusterfs/3.0.0/xlator/protocol/server.so(server_resolve+0x7f)[0x2ae3d57b569f]
/usr/local/lib/glusterfs/3.0.0/xlator/protocol/server.so(server_resolve_all+0xa8)[0x2ae3d57b5758]
Re: [Gluster-users] Mirror volumes with odd number of servers
ok, will request someone from my team to send it .. but you will need to remember the caveat about the inadequate testing and non-availability of quick support for the non-standard volume configs :-)

----- Original Message -----
From: James Burnash jburn...@knight.com
To: Gluster General Discussion List gluster-users@gluster.org
Sent: Wednesday, July 28, 2010 10:11:08 PM
Subject: Re: [Gluster-users] Mirror volumes with odd number of servers

Thanks Tejas. If an actual example of the glusterfs.vol showing this setup was available, that would be a valuable sanity check against what I will build.

James Burnash, Unix Engineering
T. 201-239-2248
jburn...@knight.com | www.knight.com
545 Washington Ave. | Jersey City, NJ

-----Original Message-----
From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Tejas N. Bhise
Sent: Wednesday, July 28, 2010 12:26 PM
To: Gluster General Discussion List
Subject: Re: [Gluster-users] Mirror volumes with odd number of servers

James,

You can do that, but you will have to hand-craft the volume file; volgen looks for even numbers, as you have already noticed. With hand-crafting of volume files you can even have a replicate of distributes - so, for example, 2 copies of each file, where each replicate node of the graph can have a differing number of distribute servers under it. Something like this can be done to have a replica count of two with 5 servers:

mount --- replicate
            |--- R1 (distribute)
            |      |--- R1D1
            |      |--- R1D2
            |
            |--- R2 (distribute)
                   |--- R2D1
                   |--- R2D2
                   |--- R2D3

The design of translators is so modular that they can be used in any combination. This, however, used to lead to confusion, and hence we developed volgen, which produces easy-to-use, default best-fit configurations. From the perspective of official support, we typically only support configs that volgen produces. For other configs we do fix the bugs, but how fast depends on the config and the translator seeing the bug. Let me know if you have more questions about this.

Regards,
Tejas Bhise.

----- Original Message -----
From: James Burnash jburn...@knight.com
To: Gluster General Discussion List gluster-users@gluster.org
Sent: Wednesday, July 28, 2010 6:41:02 PM
Subject: Re: [Gluster-users] Mirror volumes with odd number of servers

Carl - could you possibly provide an example of a configuration using an odd number of servers? glusterfs-volgen is unhappy when you don't give an even number. Thanks!

James Burnash, Unix Engineering

-----Original Message-----
From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Craig Carl
Sent: Wednesday, July 28, 2010 12:16 AM
To: Gluster General Discussion List
Subject: Re: [Gluster-users] Mirror volumes with odd number of servers

Brock -

It is completely possible using the Gluster File System. We haven't exposed that option via the Gluster Storage Platform GUI yet.

Craig

--
Craig Carl
Gluster, Inc.
Cell - (408) 829-9953 (California, USA)
Gtalk - craig.c...@gmail.com

From: brock brown brow...@u.washington.edu
To: gluster-users@gluster.org
Sent: Tuesday, July 27, 2010 12:51:19 PM
Subject: [Gluster-users] Mirror volumes with odd number of servers

All the documentation refers to using multiples of 2 when setting up mirrored volumes, and the Gluster Storage Platform will not allow otherwise. Is this impossible, and if so, why?

Thanks,
Brock
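[Editor's note: to make the layout sketched in the reply above concrete, a hand-crafted client volume file for two copies across five servers might look roughly like the following. Hostnames server1 through server5 and the export name "brick" are assumptions, and, as the reply itself cautions, such a non-volgen config is lightly tested.]

volume r1d1
  type protocol/client
  option transport-type tcp
  option remote-host server1
  option remote-subvolume brick
end-volume

volume r1d2
  type protocol/client
  option transport-type tcp
  option remote-host server2
  option remote-subvolume brick
end-volume

volume r2d1
  type protocol/client
  option transport-type tcp
  option remote-host server3
  option remote-subvolume brick
end-volume

volume r2d2
  type protocol/client
  option transport-type tcp
  option remote-host server4
  option remote-subvolume brick
end-volume

volume r2d3
  type protocol/client
  option transport-type tcp
  option remote-host server5
  option remote-subvolume brick
end-volume

volume r1
  type cluster/distribute
  subvolumes r1d1 r1d2
end-volume

volume r2
  type cluster/distribute
  subvolumes r2d1 r2d2 r2d3
end-volume

volume mirror
  type cluster/replicate
  subvolumes r1 r2
end-volume

[Note that the two distribute subvolumes have different total capacities (two bricks vs three), so the smaller side limits how much replicated data can be stored.]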
Re: [Gluster-users] glusterfsd crashes when using quota translator
Hi Peter,

We opened a bug for this issue: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1243

Regards,
Tejas.

----- Original Message -----
From: Tejas N. Bhise te...@gluster.com
To: Gluster General Discussion List gluster-users@gluster.org
Sent: Wednesday, July 28, 2010 9:45:40 PM
Subject: Re: [Gluster-users] glusterfsd crashes when using quota translator

Hi Peter,

Please go ahead and open a bug for this; I will have someone take a look. The volume quota translator is not officially supported, though some users use it with success. We do, however, try to actively fix defects in it even though it is not officially supported. Does it crash every time you create a file after a restart?

Regards,
Tejas.
Re: [Gluster-users] Should the gluster host access files via gluster?
Yes. All access to gluster data must be over the gluster mount.

Regards,
Tejas.

----- Original Message -----
From: Morgan O'Neal mon...@alpineinternet.com
To: gluster-users@gluster.org
Sent: Thursday, July 29, 2010 5:12:54 AM
Subject: [Gluster-users] Should the gluster host access files via gluster?

I have a gluster host server with a few clients connected via a LAN network. When I make a lot of changes to the files on the host directly, the clients sometimes can't access them. Should the host machine access the files via a gluster mount to fix this problem?

--
Morgan O'Neal
mon...@alpineinternet.com
http://twitter.com/morganoneal
Re: [Gluster-users] Performance degrade
Hi Roland,

You will have to place the files inline - the list does not take attachments. Kindly resend.

Regards,
Tejas.

----- Original Message -----
From: Roland Rabben rol...@jotta.no
To: gluster-users@gluster.org
Sent: Thursday, July 15, 2010 3:09:51 PM
Subject: [Gluster-users] Performance degrade

Hi,

I am looking for some help and advice from the community. I am experiencing a performance degradation on my GlusterFS system after upgrading from 2.0.9 to 3.0.5. My setup has 2 clients and 2 storage servers configured in a distributed/replicated setup over TCP gigabit Ethernet. My storage servers each have 36 x 1.5 TB SATA disks and a 4-core Intel i7 processor with 6 GB RAM. I am running Ubuntu 9.04 with the latest updates.

My application is mainly a write-once / read-many app with small and large files. Now it seems write performance has degraded. The GlusterFS servers are showing very high CPU usage - around 200% for the glusterfsd process. RAM usage seems to be very low, only about 148 MB. Should I be using, or not using, the performance translators I am using? I need help and advice tuning my setup for better performance and overall Gluster greatness. Attached are examples of client and server vol files.

Thanks in advance.

Best regards,
Roland Rabben
Founder/CEO, Jotta AS
Re: [Gluster-users] HA NFS
Hi Koen,

We are in fact looking at integrating this into the platform.

Regards,
Tejas.

----- Original Message -----
From: Layer7 Consultancy i...@layer7.be
To: Gluster General Discussion List gluster-users@gluster.org
Sent: Wednesday, July 14, 2010 1:06:38 PM
Subject: Re: [Gluster-users] HA NFS

Thanks Vikas, I think I have enough information now to set up a proof of concept. Perhaps this (a ucarp install by the user) is something to be mentioned on the website? Even better would be integration into Gluster Platform ;-)

Best regards,
Koen

2010/7/14 Shehjar Tikoo shehj...@gluster.com:

Layer7 Consultancy wrote:

Hello Vikas, how do the NFS clients react to such a failover? Will the I/O just temporarily stall and then proceed once connectivity is resumed, or will running transactions be aborted?

There may be some delay as the IP migrates, but overall the NFS client's retransmission behaviour will allow it to continue the transactions with no interruptions.

Best regards,
Koen

2010/7/13 Vikas Gorur vi...@gluster.com:

Your understanding is correct. Whether using the native NFS translator or re-export through unfsd, NFS clients only connect to the management IP. If you want failover for the NFS server, you'd set up a virtual IP using ucarp (http://www.ucarp.org) and the clients would only use this virtual IP.

--
Vikas Gorur
Engineer - Gluster, Inc.
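[Editor's note: for anyone wiring up the ucarp-based failover described above, a minimal sketch of the invocation on each NFS-exporting server might look like the following. The interface, addresses, password and script paths are assumptions; clients would mount only the shared virtual IP.]

    # run on both servers with the same --vhid and --pass; the elected master holds the VIP
    ucarp --interface=eth0 --srcip=192.168.0.10 --vhid=1 --pass=s3cret \
          --addr=192.168.0.100 \
          --upscript=/etc/ucarp/vip-up.sh --downscript=/etc/ucarp/vip-down.sh &

    # /etc/ucarp/vip-up.sh   (bring the virtual IP up when this node becomes master)
    #   ip addr add 192.168.0.100/24 dev eth0
    # /etc/ucarp/vip-down.sh (drop it when the node demotes)
    #   ip addr del 192.168.0.100/24 dev eth0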
Re: [Gluster-users] Gluster
Hi Nathan,

What version of glusterfs were you using? 2.x.x or 3.x.x?

Regards,
Tejas.

----- Original Message -----
From: Nathan Stratton nat...@robotics.net
To: Gluster General Discussion List gluster-users@gluster.org
Sent: Saturday, July 3, 2010 11:04:32 PM
Subject: Re: [Gluster-users] Gluster

On Sat, 3 Jul 2010, Andy Pace wrote:

Not to go off topic too much, but what was your problem with gluster and xen? I'm preparing to roll out a rather large xen-based product and use gluster to store the sparse image files for each instance. I haven't run into many issues yet, so I'm just wondering if you know something I don't :)

Bad bad bad news... The biggest issue may be fixed now, but once a xen file was opened, Gluster would only write to one node. A smaller issue is that you have to use "file" rather than tap:aio, and file is biggest and can cause corruption in xen. There was one other issue, but I don't remember what it was right now.

Nathan Stratton
CTO, BlinkMind, Inc.
nathan at robotics.net / nathan at blinkmind.com
http://www.robotics.net / http://www.blinkmind.com
Re: [Gluster-users] Running Gluster client/server on single process
Hi Bryan,

3.0.5 should be out soon. If you want to do some testing before it's officially out, you can try the latest release candidate; you don't need to patch at this stage. Let me know if you know how to get the release candidates and use them.

Regards,
Tejas.

----- Original Message -----
From: Bryan McGuire bmcgu...@newnet66.org
To: Tejas N. Bhise te...@gluster.com
Cc: gluster-users gluster-users@gluster.org
Sent: Sunday, June 20, 2010 7:46:55 PM
Subject: Re: [Gluster-users] Running Gluster client/server on single process

Tejas,

Any idea when 3.0.5 will be released? I am very anxious for these patches to be in production. On another note, I am very new to Gluster, let alone Linux. Could you, or someone else, give me some guidance (a how-to) on applying the patches? I would like to test them for now.

Bryan McGuire

On May 19, 2010, at 8:06 AM, Tejas N. Bhise wrote:

Roberto,

We recently made some code changes we think will considerably help small file performance:

selective readdirp - http://patches.gluster.com/patch/3203/
dht lookup revalidation optimization - http://patches.gluster.com/patch/3204/
updated write-behind default values - http://patches.gluster.com/patch/3223/

These are tentatively scheduled to go into 3.0.5. If it's possible for you, I would suggest you test them in a non-production environment and see if they help with the distribute config itself. Please do not use them in production; for that, wait for the release these patches go into. Do let me know if you have any questions about this.

Regards,
Tejas.

----- Original Message -----
From: Roberto Franchini ro.franch...@gmail.com
To: gluster-users gluster-users@gluster.org
Sent: Wednesday, May 19, 2010 5:29:47 PM
Subject: Re: [Gluster-users] Running Gluster client/server on single process

On Sat, May 15, 2010 at 10:06 PM, Craig Carl cr...@gluster.com wrote:

Robert - NUFA has been deprecated and doesn't apply to any recent version of Gluster. What version are you running? ('glusterfs --version')

We run 3.0.4 on Ubuntu 9.10 and 10.04 server. Is there a way to mimic NUFA behaviour? We are using gluster to store Lucene indexes. Indexes are created locally from millions of small files and then copied to the storage. I tried reading these little files from gluster but it was too slow. So maybe a NUFA-like approach, e.g. preferring the local disk for reads, could improve performance.
Let me know :)

At the moment we use dht/replicate:

#CLIENT

volume remote1
  type protocol/client
  option transport-type tcp
  option remote-host zeus
  option remote-subvolume brick
end-volume

volume remote2
  type protocol/client
  option transport-type tcp
  option remote-host hera
  option remote-subvolume brick
end-volume

volume remote3
  type protocol/client
  option transport-type tcp
  option remote-host apollo
  option remote-subvolume brick
end-volume

volume remote4
  type protocol/client
  option transport-type tcp
  option remote-host demetra
  option remote-subvolume brick
end-volume

volume remote5
  type protocol/client
  option transport-type tcp
  option remote-host ade
  option remote-subvolume brick
end-volume

volume remote6
  type protocol/client
  option transport-type tcp
  option remote-host athena
  option remote-subvolume brick
end-volume

volume replicate1
  type cluster/replicate
  subvolumes remote1 remote2
end-volume

volume replicate2
  type cluster/replicate
  subvolumes remote3 remote4
end-volume

volume replicate3
  type cluster/replicate
  subvolumes remote5 remote6
end-volume

volume distribute
  type cluster/distribute
  subvolumes replicate1 replicate2 replicate3
end-volume

volume writebehind
  type performance/write-behind
  option window-size 1MB
  subvolumes distribute
end-volume

volume quickread
  type performance/quick-read
  option cache-timeout 1        # default 1 second
  # option max-file-size 256KB  # default 64KB
  subvolumes writebehind
end-volume

### Add io-threads for parallel requisitions
volume iothreads
  type performance/io-threads
  option thread-count 16        # default is 16
  subvolumes quickread
end-volume

#SERVER

volume posix
  type storage/posix
  option directory /data/export
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume brick
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.brick.allow *
  subvolumes brick
end-volume

--
Roberto Franchini
http://www.celi.it  http://www.blogmeter.it  http://www.memesphere.it
Tel +39.011.562.71.15
jabber:ro.franch...@gmail.com skype:ro.franchini
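[Editor's note: since Bryan asks earlier in this thread how the patches might be applied for testing, a rough sketch follows. It assumes a glusterfs 3.0.x source tree, that the patchwork URLs above serve a raw diff at a /raw/ suffix (an assumption), and that the patches apply cleanly; as stated above, this is for non-production testing only.]

    # fetch and apply one of the patches against an unpacked source tree (a sketch)
    cd glusterfs-3.0.4
    wget -O selective-readdirp.patch http://patches.gluster.com/patch/3203/raw/
    patch -p1 < selective-readdirp.patch
    ./configure && make && sudo make install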
Re: [Gluster-users] health monitoring of replicated volume
We plan to add a method to check if a replicated volume is out of sync. We also plan to add a way to find out which copy is good and which is out of sync - we intend to do this both at the volume/subvolume level and at the file level. The file level is helpful for monitoring VM image file copies and their currency. There is, however, no planned date yet for when this will be implemented. I will post a note when it's done. Let me know if you have any questions.

Regards,
Tejas.

----- Original Message -----
From: Jenn Fountain jfoun...@comcast.net
To: Gluster General Discussion List gluster-users@gluster.org
Sent: Friday, June 11, 2010 6:50:37 PM
Subject: Re: [Gluster-users] health monitoring of replicated volume

I am curious about this - anyone?

-Jenn

On Jun 9, 2010, at 8:00 AM, Deyan Chepishev wrote:

Hello,

Is there any reasonable way to monitor the health of a replicated volume and sync it if it is out of sync?

Regards,
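[Editor's note: until such a feature exists, one rough way to spot out-of-sync replicas on 3.0.x is to inspect the replicate translator's change-log extended attributes directly on the backend bricks. This is a sketch under the assumptions that the export directory is /data/export and that non-zero trusted.afr.* pending counters on one copy mean the other copy still needs healing.]

    # run on each server, against the backend path of the file in question
    getfattr -d -m 'trusted.afr' -e hex /data/export/path/to/file

    # then trigger self-heal from a client by looking the file up through the mount
    stat /mnt/glusterfs/path/to/file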
Re: [Gluster-users] Fwd: Glusterfs over wan?
Patricio,

All networked filesystems - NFS, CIFS/Samba and even GlusterFS - can be used over a WAN. The problem arises with the more complex functionality, like replication across a WAN. Replication over WAN means different things to different users: some just want a good way to have a remote backup; others want a hot-cold kind of backup, so namespace issues come into the picture, though that is relatively easier to solve as only one side is active at a time. The most difficult case is to keep both sites active (and R/W), with their own full namespace and security functionality, with a designated unit of the filesystem replicated. Access from each side looks like it just pulled data from its own data cache, and the current copy of the data could (potentially) move around multiple locations across the globe.

I don't think anyone does that last kind of replication. AFS and DFS from the old days did go up to a certain point by providing R/O replication with remote snapshots and integrated multisite security and namespace, but no R/W replication.

The GlusterFS plan is to start with async WAN replication for remote backup, and slowly move toward a multisite cluster architecture with active-active replicated data copies and an integrated namespace, which we hope to bring in with a new unified (write) caching. Async WAN replication for remote backup might come as early as the end of the year, with the other functionality following some time after that.

I would like to hear more about how community users would put these different levels of WAN replication to use, how much data would change on a daily basis and hence get moved around, and what kind of latency would be acceptable to your applications. I would also like to hear whether anyone is using other solutions or edge-caching appliances to do this in other ways.

Regards,
Tejas.

----- Original Message -----
From: Patricio A. Bruna pbr...@it-linux.cl
To: gluster-users@gluster.org
Sent: Thursday, May 27, 2010 10:00:12 PM
Subject: Re: [Gluster-users] Fwd: Glusterfs over wan?

Hi, just for information, what file systems are available to use over WAN?

Patricio Bruna V.
IT Linux Ltda.
www.it-linux.cl

Hi Shayn,

The problem is not replication speed - actually that part is just fine, and it's quite fast. The problem with WAN is that when the gluster client receives a request to read a file, it first checks with all the nodes in the cluster to make sure there are no discrepancies. Only after all nodes have answered will it read the local file (if it's replicated locally). Also take into account that international links are not as reliable as local LAN-based links: if a node is suddenly inaccessible, it can slow down everything (again, because READ operations depend on all nodes in the cluster answering, synchronously). I know the gluster team is working on an async solution down the road, which would make glusterfs more suitable for WAN scenarios. My suggestion is NOT to try it until the gluster team officially announces WAN support.

On May 27, 2010, at 4:31 PM, sh...@shayn.ch wrote:

Hello! Sorry for my English, but I just don't understand why glusterfs is not meant to be used over the WAN. I think that if the files are tiny, there is no problem using this solution over WAN? But okay, if files are large (I don't know, 100MB - 200MB and more), I understand that it depends on the internet connection and that the replication can be very, very slow.
Please give me some information about this solution over WAN.

Shayn
Re: [Gluster-users] Big I/O or gluster proccess problem
If I am not mistaken, you have a single server for glusterfs, and this is mirrored to a second (non-GlusterFS) server using DRBD. If you have only a single server to export data from, why use GlusterFS? Also, we don't officially support DRBD replication with a GlusterFS backend. Maybe you can consider GlusterFS replication across the two servers?

----- Original Message -----
From: Ran smtp.tes...@gmail.com
To: Tejas N. Bhise te...@gluster.com
Sent: Thursday, May 20, 2010 6:30:21 PM
Subject: Re: [Gluster-users] Big I/O or gluster proccess problem

Tejas hi,

The 2 servers are a DRBD pair with HA, so gluster actually has 1 server that exports 1 HD (1 TB). This HD is DRBD'd to the other server, and the 1 TB HD is also RAID 1 with Linux raid (I know it's not optimal, but it's robust). In this setup, if 1 server goes down the other continues. DRBD is more robust than gluster replication, especially for VPSs etc.

I didn't check iowait, but the load of the server is about 5 while the CPUs are only 10-50% busy, so that says it all (there are IO waits). I was thinking of breaking the RAID 1, since this HD already has a full mirror with DRBD (to server 2), but I'm not sure it will resolve this problem, since with NFS it's not the same - it slows things down, but not to the point of being non-functional.

Client vol file (192.168.0.9 is the HA IP of this pair; I've also tested with a plain config, no writebehind etc.):

# file: /etc/glusterfs/glusterfs.vol

volume storage1-2
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.9
  option remote-subvolume b1
  option ping-timeout 120
  option username ..
  option password ..
end-volume

volume cluster
  type cluster/distribute
  option lookup-unhashed yes
  subvolumes storage1-2
end-volume

#volume writebehind
#  type performance/write-behind
#  option cache-size 3MB
#  subvolumes cluster
#end-volume

#volume readahead
#  type performance/read-ahead
#  option page-count 4
#  subvolumes writebehind
#end-volume

volume iothreads
  type performance/io-threads
  option thread-count 4
  subvolumes cluster
end-volume

volume io-cache
  type performance/io-cache
  option cache-size 128MB
  option page-size 256KB                 # 128KB is default
  option force-revalidate-timeout 10     # default is 1
  subvolumes iothreads
end-volume

volume writebehind
  type performance/write-behind
  option aggregate-size 512KB   # default is 0 bytes
  option flush-behind on        # default is 'off'
  subvolumes io-cache
end-volume

Server vol file:

# file: /etc/glusterfs/glusterfs-server.vol

volume posix
  type storage/posix
  option directory /data/gluster
  # option o-direct enable
  option background-unlink yes
  # option span-devices 8
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume b1
  type performance/io-threads
  option thread-count 8
  subvolumes locks
end-volume

volume server
  type protocol/server
  option transport.socket.nodelay on
  option transport-type tcp
  # option auth.addr.b1.allow *
  option auth.login.b1.allow ..
  option auth.login.gluster.password
  subvolumes b1
end-volume

2010/5/20 Tejas N. Bhise te...@gluster.com

Ran,

Can you please elaborate on "2 servers in distribute mode, each has a 1 TB brick that replicates to the other using DRBD"? Also, how many drives do you have, and what does iowait look like when you write a big file? Tell us more about the configs of your servers, and share the volume files.

Regards,
Tejas.
----- Original Message -----
From: Ran smtp.tes...@gmail.com
To: Gluster-users@gluster.org
Sent: Thursday, May 20, 2010 4:49:52 PM
Subject: [Gluster-users] Big I/O or gluster proccess problem

Hi all,

Our problem is simple but quite critical. I posted a few months ago regarding this issue and there were good responses, but no fix. What happens is that gluster gets stuck when there is a big write to it, for example:

time dd if=/dev/zero of=file bs=10240 count=100

or moving a 20gig_file.img onto the gluster mount. When that happens, the whole storage freezes for the entire process - mail, a few VPSs, a simple dir listing, etc. Our setup is quite simple at this point: 2 servers in distribute mode, each with a 1 TB brick that replicates to the other using DRBD. I've monitored everything closely during these big writes and noticed that it's not a memory, CPU or network problem. I've also checked the same setup with plain NFS and it does not happen there.

Does anyone have any idea how to fix this? If you can't write big files into gluster without making the storage non-functional, then you can't really do anything.

Please advise,
Re: [Gluster-users] Running Gluster client/server on single process
Roberto,

Since you said you are running distribute, each gluster client (mount) will get files from multiple servers (backends), so you might not save that much. Secondly, any problem in your client code may also kill your server process (if they are the same), and the other clients/mounts that use backend subvolumes on this server will suffer too. You may want to try running in the same process and see how much you gain and whether it's worth it for your kind of data usage. Let me know how you go forward with this and if you have more questions.

Regards,
Tejas.

----- Original Message -----
From: Roberto Franchini franch...@celi.it
To: gluster-users gluster-users@gluster.org
Sent: Sunday, May 16, 2010 7:12:49 PM
Subject: Re: [Gluster-users] Running Gluster client/server on single process

On 5/16/10, Tejas N. Bhise te...@gluster.com wrote:

Robert, is there any specific reason why you want to run the client and server in a single process?

Why? Well, since the boxes are client and server at the same time, we think running gluster in a single process can save system resources. We run gluster 3.0.4 on Ubuntu 9.10 (4 nodes) and 10.04 (2 nodes) server, and are going to update all the nodes to 10.04 in a few weeks. Four nodes are application servers (Tomcat) and two run batches. Every byte of RAM or CPU cycle gained improves our overall performance :) But maybe running a single process will degrade them.

Regards,
R.

--
Roberto Franchini
http://www.celi.it  http://www.blogmeter.it  http://www.memesphere.it
Tel +39.011.562.71.15
jabber:ro.franch...@gmail.com skype:ro.franchini
Re: [Gluster-users] Bizarre GlusterFS coherency issue
Hello Lei Zhang,

It would help a lot if you could reproduce this on 3.0.4, log a defect in Bugzilla, and help us with the logs and other diagnostics. I will have someone look at it.

Regards,
Tejas.

----- Original Message -----
From: Lei Zhang lzvoya...@gmail.com
To: Vikas Gorur vi...@gluster.com
Cc: gluster-users@gluster.org
Sent: Wednesday, May 12, 2010 7:16:59 AM
Subject: Re: [Gluster-users] Bizarre GlusterFS coherency issue

Thanks Vikas. I tried 3.0.4 and got the same error, with slightly different log messages. Any other ideas?
Re: [Gluster-users] Monitoring Gluster availability
Phil,

We are developing the dynamic volume features for the next release. To be able to do dynamic volume management, we are putting in some infrastructure that will provide a list of volumes and which servers/exports each volume is spread across. We would be happy to try and provide an interface for the kind of information you want. Please let us know in detail (like some sort of specification, with examples) what information you need to make your monitoring task easier.

I would like to encourage others also to pitch in with more information on how you monitor the health of your system, and whether there is any specific information your system would like presented natively from gluster. I will go through the requests and see what can be accommodated.

Regards,
Tejas.

----- Original Message -----
From: phil cryer p...@cryer.us
To: Kelvin Westlake kel...@netbasic.co.uk
Cc: gluster-users@gluster.org
Sent: Tuesday, May 11, 2010 1:33:03 AM
Subject: Re: [Gluster-users] Monitoring Gluster availability

On Fri, May 7, 2010 at 3:13 AM, Kelvin Westlake kel...@netbasic.co.uk wrote:

Hi Guys,

Can anybody recommend a way of monitoring gluster availability? I need to be made aware if a server or client crashes. Is there some port or system component that can be monitored?

Cheers,
Kelvin

I use monit [http://mmonit.com/monit/] extensively, and have written a simple config snippet to watch glusterfsd and restart it if it has failed. From /etc/monit/monitrc:

check process glusterfsd with pidfile /var/run/glusterfsd.pid
  start program = "/etc/init.d/glusterfsd start"
  stop program = "/etc/init.d/glusterfsd stop"
  if failed host 127.0.0.1 port 6996 then restart
  if loadavg(5min) greater than 10 for 8 cycles then restart
  if 5 restarts within 5 cycles then timeout

Today I was looking for a more 'gluster native' way of checking all the nodes to see if each of them in the cluster is up, but haven't gotten very far, save for pulling the hostnames out of the volfile:

grep "option remote-host" /etc/glusterfs/glusterfs.vol | uniq | cut -d" " -f7

but from there you'd need a shared ssh key setup for a script to loop through those entries and check things in the logs on all the servers... Does anyone have a way they do it?

P
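[Editor's note: a rough sketch of the ssh-based check described above might look like the script below. It assumes password-less ssh to each server and that the hostnames come out of the client volfile exactly as in the grep shown; adjust paths as needed.]

    #!/bin/bash
    # Warn about any server from the client volfile where glusterfsd is not running.
    for host in $(grep "option remote-host" /etc/glusterfs/glusterfs.vol | uniq | cut -d" " -f7); do
        if ssh "$host" pidof glusterfsd > /dev/null 2>&1; then
            echo "$host: glusterfsd running"
        else
            echo "$host: glusterfsd NOT running" >&2
        fi
    done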
[Gluster-users] Small File and ls performance ..
Dear Community Users, We have recently made some code changes in an effort to improve small file and 'ls' performance. The patches are - selective readdirp - http://patches.gluster.com/patch/3203/ dht lookup revalidation optimization - http://patches.gluster.com/patch/3204/ updated write-behind default values - http://patches.gluster.com/patch/3223/ DISCLAIMER : These patches have not made it to any supported release yet and have not been tested yet. Don't use them in production. I am providing this information only as some advance notice for those in the community who might be interested in trying out these changes and providing feedback. Once these are fully tested they will make it into an officially supported release. Regards, Tejas Bhise. - Original Message - From: Count Zero cou...@gmail.com To: gluster-users@gluster.org Sent: Wednesday, May 5, 2010 9:31:22 AM Subject: [Gluster-users] Replicate over WAN? I have some machines over WAN, in a Replicate cluster, with around 50ms ~ 60ms between them. However, I have read-subvolume specified so that it will always use the local brick instead of the WAN. I just need this in order to replicate files as easily and as quickly as possible between various systems... My question is - Why is it still painfully slow when all I do is read operations? Even just listing a directory takes ages. I can understand it for write operations, but read? And from the local sub-volume? ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
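For anyone who wants to try the three patches on a throwaway test box, the general workflow would look something like the sketch below. This is only a sketch: the raw-patch URL form and the build steps are assumptions, and per the disclaimer above none of this should go anywhere near production.

# assumes an existing glusterfs source tree; patch IDs are from the mail above
cd glusterfs-source
for id in 3203 3204 3223; do
    curl -s -o patch-$id.diff http://patches.gluster.com/patch/$id/raw/   # raw URL form is an assumption
    patch -p1 --dry-run < patch-$id.diff && patch -p1 < patch-$id.diff
done
./autogen.sh && ./configure && make && make install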
Re: [Gluster-users] Replicate over WAN?
Count Zero, Just to clarify, the three patches were not released, they will make it into a future release after proper testing. They are not developed to fix WAN issues per se, just to help with the small-file and slow 'ls' kind of problems which are seen with huge numbers of files, huge numbers of small files, or deep directories. As a side effect, they *may* fix some of your WAN issues :-). Let me know how things go. Please don't use in production. Regards, Tejas. - Original Message - From: Count Zero cou...@gmail.com To: Vikas Gorur vi...@gluster.com Cc: gluster-users@gluster.org Sent: Thursday, May 6, 2010 1:45:32 AM Subject: Re: [Gluster-users] Replicate over WAN? Replicate is not really designed for a WAN environment. A couple of things that are probably affecting you are: 1) lookup (first access to any file or directory) needs to be sent to all subvolumes to gather information to determine if self-heal is needed. There is no way to fix this without losing the ability to self-heal. 2) readdir (ls) is always sent to the first subvolume. This is necessary to ensure consistent inode numbers. Perhaps you could ensure that the first subvolume is local? (Make sure the order of subvolumes is the same on all your clients.) Wait a sec. I believe there's a possible conflict in point (2), or perhaps I misunderstood: - Ensure the first subvolume is the local one - I am assuming that in order to do this, I need to make the local subvolume the first one in the list of volumes, in the 'subvolumes' option - If I do this on every client, it means the order of subvolumes can not be the same, since on every client the local subvolume will be the first in the list. Can you confirm the above is ok? Thanks! :-) Also, I noticed this morning 3 patches were released to address the WAN issue? I will try them on a parallel test system first, and run some measurements. ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
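To make the 'read from the nearby copy' idea concrete without reordering subvolumes: the replicate translator takes a per-client read-subvolume hint, so the subvolumes line can stay identical on every client while each client points reads at its local brick. A rough sketch, with placeholder volume names:

volume mirror
  type cluster/replicate
  # keep this order identical on all clients (consistent inode numbers)
  subvolumes site-a-brick site-b-brick
  # per-client hint: serve reads from the brick on this side of the WAN
  option read-subvolume site-a-brick
end-volume

Lookups still go to both sides for self-heal purposes, so the first-access latency over the WAN remains even with the hint in place.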
Re: [Gluster-users] Newbie questions
Jon, Stripe should be used only if the workload consists of very few files, each of which is very large ( many GBs in size ). Everything else can use distribute. Regards, Tejas. - Original Message - From: Jon Tegner teg...@foi.se To: Joshua Baker-LePain jl...@duke.edu Cc: gluster-users gluster-users@gluster.org Sent: Tuesday, May 4, 2010 11:00:57 AM Subject: Re: [Gluster-users] Newbie questions Hi, I'm also a newbie, and I'm looking forward to answers to your questions. Just one question, why would distributed be preferable over striped (I'm probably the bigger newbie here)? For purpose 1, clearly I'm looking at a replicated volume. For purpose 2, I'm assuming that distributed is the way to go (rather than striped), although for Regards, /jon ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
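For reference, the distribute layout Tejas recommends for general workloads is just a cluster/distribute volume over the bricks; whole files are hashed onto one brick each, nothing is split. A sketch with placeholder names:

volume dist
  type cluster/distribute
  # each file lands whole on exactly one of these bricks
  subvolumes server1-brick server2-brick server3-brick
end-volume

Stripe, by contrast, chops each file into chunks spread across all bricks, which only pays off for a handful of very large files.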
Re: [Gluster-users] server ver 3.0.4 crashes
Hi Mickey, Please open a defect in bugzilla. Someone from the dev team will have a look at it soon. Regards, Tejas. - Original Message - From: Mickey Mazarick m...@digitaltadpole.com To: Gluster Users gluster-users@gluster.org Sent: Wednesday, April 28, 2010 8:14:53 PM Subject: [Gluster-users] server ver 3.0.4 crashes Did a straight install and the ibverbs instance will crash after a single connection attempt. Are there any bugs that would cause this behavior? All the log tells me is: pending frames: frame : type(2) op(SETVOLUME) patchset: v3.0.4 signal received: 11 time of crash: 2010-04-28 10:41:08 configuration details: argp 1 backtrace 1 dlfcn 1 fdatasync 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 3.0.4 /lib64/tls/libc.so.6[0x33c0a2e2b0] /usr/local/lib/libglusterfs.so.0(dict_unserialize+0x111)[0x2b662221d281] /usr/local/lib/glusterfs/3.0.4/xlator/protocol/server.so(mop_setvolume+0xa2)[0x2b6622f246e2] /usr/local/lib/glusterfs/3.0.4/xlator/protocol/server.so(protocol_server_interpret+0x1aa)[0x2b6622f25a0a] /usr/local/lib/glusterfs/3.0.4/xlator/protocol/server.so(protocol_server_pollin+0x8b)[0x2b6622f268cb] /usr/local/lib/glusterfs/3.0.4/xlator/protocol/server.so(notify+0x100)[0x2b6622f26aa0] /usr/local/lib/libglusterfs.so.0(xlator_notify+0x94)[0x2b661b24] /usr/local/lib/glusterfs/3.0.4/transport/ib-verbs.so[0x2aaaf048] /lib64/tls/libpthread.so.0[0x33c1906137] /lib64/tls/libc.so.6(__clone+0x73)[0x33c0ac7113] - -- ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Data
Hi Brad, Glusterfs does not proactively migrate data on addition of a node, but there is a defrag script that allows an admin to do that. Please see the defrag and scale-and-defrag scripts here - http://ftp.gluster.com/pub/gluster/glusterfs/misc/defrag/ Regards, Tejas. - Original Message - From: Brad Alexander b...@servosity.com To: gluster-users@gluster.org Sent: Saturday, April 24, 2010 11:36:10 PM Subject: [Gluster-users] Data Good Afternoon, Does gluster proactively migrate data onto new devices in order to maintain a balanced distribution of data when new bricks are added? Thanks. Brad Alexander ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
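Before and after running the scripts from that URL, a quick look at how evenly the backend export directories are filled tells you whether the rebalance did its job. A trivial check, run on each server (the export path is a placeholder for your actual backend directory):

# rough per-brick balance check; /export/brick1 is an assumed path
find /export/brick1 -type f | wc -l
du -sh /export/brick1

Newly added bricks start near zero and should climb toward the other bricks once the defrag has moved data around.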
Re: [Gluster-users] iscsi with gluster
Harish, If you mean to ask whether gluster server and clients can run in VMs on XEN, then the answer is yes. Is that what you meant ? Regards, Tejas. - Original Message - From: Patrick Irvine p...@cybersites.ca To: gluster-users@gluster.org Sent: Saturday, April 24, 2010 11:41:47 AM Subject: Re: [Gluster-users] iscsi with gluster Hi harish, I'm sorry but I have no experience with xenserver, but I have seen a lot of talk with reference to xenserver on the mailing list. You might want to check the archives. Pat On 23/04/2010 10:55 PM, harris narang wrote: Dear sir, thanks for your suggestion. I want to ask whether gluster is compatible with xenserver. If it is, how would you access it? with regards harish narang On Thu, Apr 22, 2010 at 10:33 PM, Patrick Irvine p...@cybersites.ca wrote: Hi harish, I am currently using gluster in replicate mode on top of two ISCSI targets. It has been working quite well, and I have had no issues. I see that many people say that there is no need for this, but I do have my reasons. 1. Gluster allows me to use any of my server's drive spaces to replicate to. Most other cluster file systems require block devices 2. My main storage is two QNAP SS-439, that support ISCSI and NFS. Since gluster won't work on NFS V3 I use the ISCSI This is just my solution, but it works fine Pat. On 21/04/2010 10:56 PM, harris narang wrote: Dear sir/madam, I want to use gluster with iscsi. Please suggest whether it is possible or not. with regards harish narang ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] iscsi with gluster
Hi Harish, Gluster Server aggregates local filesystems ( or directories on them ) into a single glusterfs volume. These backend local filesystems can use local disk, FC SAN, iSCSI - it does not matter, as long as the host and the backend filesystem work with it. Hope that answers your question. Regards, Tejas. - Original Message - From: harris narang harish.narang2...@gmail.com To: gluster-users@gluster.org Sent: Thursday, April 22, 2010 11:26:19 AM Subject: [Gluster-users] iscsi with gluster Dear sir/madam, I want to use gluster with iscsi. Please suggest whether it is possible or not. with regards harish narang ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
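In other words, the server side does not care what the export directory sits on. A brick whose filesystem happens to live on an iSCSI LUN is declared exactly like any other; the paths below are placeholders:

# /mnt/iscsi0 is assumed to be an ext3 filesystem created on an iSCSI LUN
# and mounted before glusterfsd starts
volume posix
  type storage/posix
  option directory /mnt/iscsi0/export
end-volume

The only operational requirement is that the iSCSI session and its mount come up before glusterfsd does at boot.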
Re: [Gluster-users] reloading client config at run-time
Not today, but with dynamic volume management, it will be possible in a future release. It will help add and remove servers, migrate data, change the configuration and replication count etc on the fly - it's part of our virtualization and cloud strategy. Regards, Tejas. - Original Message - From: D.P. piz...@gmail.com To: gluster-users@gluster.org Sent: Monday, April 19, 2010 10:48:28 PM Subject: [Gluster-users] reloading client config at run-time is it possible? ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Self heal with VM Storage
Thanks, that will help reproduce internally. Regards, Tejas. - Original Message - From: Justice London jlon...@lawinfo.com To: Tejas N. Bhise te...@gluster.com Cc: gluster-users@gluster.org Sent: Friday, April 16, 2010 9:03:50 PM Subject: RE: [Gluster-users] Self heal with VM Storage After the self heal finishes it sort of works. Usually this destroys InnoDB if you're running a database. Most often, though, it also causes some libraries and similar to not be properly read in by the VM guest which means you have to reboot it to fix for this. It should be fairly easy to reproduce... just shut down a storage brick (any configuration... it doesn't seem to matter). Make sure of course that you have a running VM guest (KVM, etc) using the gluster mount. You'll then turn off(unplug, etc.) one of the storage bricks and wait a few minutes... then re-enable it. Justice London jlon...@lawinfo.com -Original Message- From: Tejas N. Bhise [mailto:te...@gluster.com] Sent: Thursday, April 15, 2010 7:41 PM To: Justice London Cc: gluster-users@gluster.org Subject: Re: [Gluster-users] Self heal with VM Storage Justice, Thanks for the description. So, does this mean that after the self heal is over after some time, the guest starts to work fine ? We will reproduce this inhouse and get back. Regards, Tejas. - Original Message - From: Justice London jlon...@lawinfo.com To: Tejas N. Bhise te...@gluster.com Cc: gluster-users@gluster.org Sent: Friday, April 16, 2010 1:18:36 AM Subject: RE: [Gluster-users] Self heal with VM Storage Okay, but what happens on a brick shutting down and being added back to the cluster? This would be after some live data has been written to the other bricks. From what I was seeing access to the file is locked. Is this not the case? If file access is being locked it will obviously cause issues for anything trying to read/write to the guest at the time. Justice London jlon...@lawinfo.com -Original Message- From: Tejas N. Bhise [mailto:te...@gluster.com] Sent: Thursday, April 15, 2010 12:33 PM To: Justice London Cc: gluster-users@gluster.org Subject: Re: [Gluster-users] Self heal with VM Storage Justice, From posts from the community on this user list, I know that there are folks that run hundreds of VMs out of gluster. So it's probably more about the data usage than just a generic viability statement as you made in your post. Gluster does not support databases, though many people use them on gluster without much problem. Please let me know if you see some problem with unstructured file data on VMs. I would be happy to help debug that problem. Regards, Tejas. - Original Message - From: Justice London jlon...@lawinfo.com To: gluster-users@gluster.org Sent: Friday, April 16, 2010 12:52:19 AM Subject: [Gluster-users] Self heal with VM Storage I am running gluster as a storage backend for VM storage (KVM guests). If one of the bricks is taken offline (even for an instant), on bringing it back up it runs the metadata check. This causes the guest to both stop responding until the check finishes and also to ruin data that was in process (sql data for instance). I'm guessing the file is being locked while checked. Is there any way to fix for this? Without being able to fix for this, I'm not certain how viable gluster will be, or can be for VM storage. 
Justice London jlon...@lawinfo.com ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
[Gluster-users] Gluster and 10GigE - quick survey in the Community
Dear Community Users, In an effort to harden Gluster's strategy with 10GigE technology, I would like to request the following information from you - 1) Are you already using 10GigE with either the Gluster servers or clients ( or the platform ) ? 2) If not currently, are you considering using 10GigE with Gluster servers or clients ( or platform ) in the future ? 3) Which make, driver and on which OS are you using or considering using this 10GigE technology with Gluster ? 4) If you are already using this technology, would you like to share your experiences with us ? Your feedback is extremely important to us. Please write to me soon. Regards, Tejas. tejas at gluster dot com ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Self heal with VM Storage
Justice, From posts from the community on this user list, I know that there are folks that run hundreds of VMs out of gluster. So it's probably more about the data usage than just a generic viability statement as you made in your post. Gluster does not support databases, though many people use them on gluster without much problem. Please let me know if you see some problem with unstructured file data on VMs. I would be happy to help debug that problem. Regards, Tejas. - Original Message - From: Justice London jlon...@lawinfo.com To: gluster-users@gluster.org Sent: Friday, April 16, 2010 12:52:19 AM Subject: [Gluster-users] Self heal with VM Storage I am running gluster as a storage backend for VM storage (KVM guests). If one of the bricks is taken offline (even for an instant), on bringing it back up it runs the metadata check. This causes the guest to both stop responding until the check finishes and also to ruin data that was in process (sql data for instance). I'm guessing the file is being locked while checked. Is there any way to fix for this? Without being able to fix for this, I'm not certain how viable gluster will be, or can be for VM storage. Justice London jlon...@lawinfo.com ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Self heal with VM Storage
Justice, Thanks for the description. So, does this mean that after the self heal is over after some time, the guest starts to work fine ? We will reproduce this inhouse and get back. Regards, Tejas. - Original Message - From: Justice London jlon...@lawinfo.com To: Tejas N. Bhise te...@gluster.com Cc: gluster-users@gluster.org Sent: Friday, April 16, 2010 1:18:36 AM Subject: RE: [Gluster-users] Self heal with VM Storage Okay, but what happens on a brick shutting down and being added back to the cluster? This would be after some live data has been written to the other bricks. From what I was seeing access to the file is locked. Is this not the case? If file access is being locked it will obviously cause issues for anything trying to read/write to the guest at the time. Justice London jlon...@lawinfo.com -Original Message- From: Tejas N. Bhise [mailto:te...@gluster.com] Sent: Thursday, April 15, 2010 12:33 PM To: Justice London Cc: gluster-users@gluster.org Subject: Re: [Gluster-users] Self heal with VM Storage Justice, From posts from the community on this user list, I know that there are folks that run hundreds of VMs out of gluster. So it's probably more about the data usage than just a generic viability statement as you made in your post. Gluster does not support databases, though many people use them on gluster without much problem. Please let me know if you see some problem with unstructured file data on VMs. I would be happy to help debug that problem. Regards, Tejas. - Original Message - From: Justice London jlon...@lawinfo.com To: gluster-users@gluster.org Sent: Friday, April 16, 2010 12:52:19 AM Subject: [Gluster-users] Self heal with VM Storage I am running gluster as a storage backend for VM storage (KVM guests). If one of the bricks is taken offline (even for an instant), on bringing it back up it runs the metadata check. This causes the guest to both stop responding until the check finishes and also to ruin data that was in process (sql data for instance). I'm guessing the file is being locked while checked. Is there any way to fix for this? Without being able to fix for this, I'm not certain how viable gluster will be, or can be for VM storage. Justice London jlon...@lawinfo.com ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Maintenance mode for bricks
Fred, Would you like to tell us more about the use case ? Like why would you want to do this ? If we take a brick out, it would not be possible to get it back in ( with the existing data ). Regards, Tejas. - Original Message - From: Fred Stober fred.sto...@kit.edu To: gluster-users@gluster.org Sent: Wednesday, April 14, 2010 4:57:24 PM Subject: [Gluster-users] Maintenance mode for bricks Dear all, Is there an easy way to put a storage brick, which is part of a dht volume, into some kind of read-only maintenance mode, while keeping the whole dht volume in read/write state? Currently it almost works, but files are still scheduled to go to the server in maintenance mode and in this case you get an error. It should be possible to write to another brick instead. Sincerely, -- Fred-Markus Stober fred.sto...@kit.edu Karlsruhe Institute of Technology ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Memory usage high on server sides
Hi Chris, I would like your help in debugging this further. To start with, I would like to get the system information and the test information. You mentioned you are copying data from your old system to the new system. The new system has 3 servers. Problems you saw - 1) High memory usage on client where gluster volume is mounted 2) High memory usage on server 3) 2 days to copy 300 GB data Is that a correct summary of the problems you saw ? About the config, can you provide the following for both old and new systems - 1) OS and kernel level on gluster servers and clients 2) volume file from servers and clients 3) Filesystem type of backend gluster subvolumes 4) How close to full the backend subvolumes are 5) The exact copy command .. did you mount the volumes from old and new system on a single machine and did a cp or used rsync or some other method ? If something more than just a cp, please send the exact command line you used. 6) How many files/directories ( tentative ) in that 300GB data ( would help in trying to reproduce inhouse with a smaller test bed ). 7) Was there other load on the new or old system ? 8) Any other patterns you noticed. Thanks a lot for helping to debug the problem. Regards, Tejas. - Original Message - From: Chris Jin ch...@pikicentral.com To: Krzysztof Strasburger stras...@chkw386.ch.pwr.wroc.pl Cc: gluster-users gluster-users@gluster.org Sent: Thursday, April 15, 2010 7:52:35 AM Subject: Re: [Gluster-users] Memory usage high on server sides Hi Krzysztof, Thanks for your replies. And you are right, the server process should be glusterfsd. But I did mean servers. After two days copying, the two processes took almost 70% of the total memory. I am just thinking one more process will bring our servers down. $ps auxf USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 26472 2.2 29.1 718100 600260 ? Ssl Apr09 184:09 glusterfsd -f /etc/glusterfs/servers/r2/f1.vol root 26485 1.8 39.8 887744 821384 ? Ssl Apr09 157:16 glusterfsd -f /etc/glusterfs/servers/r2/f2.vol In the meantime, the client side seems OK. $ps auxf USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 19692 1.3 0.0 262148 6980 ? Ssl Apr12 61:33 /sbin/glusterfs --log-level=NORMAL --volfile=/u2/git/modules/shared/glusterfs/clients/r2/c2.vol /gfs/r2/f2 Any ideas? On Wed, 2010-04-14 at 10:16 +0200, Krzysztof Strasburger wrote: On Wed, Apr 14, 2010 at 06:33:15AM +0200, Krzysztof Strasburger wrote: On Wed, Apr 14, 2010 at 09:22:09AM +1000, Chris Jin wrote: Hi, I got one more test today. The copying has already run for 24 hours and the memory usage is about 800MB, 39.4% of the total. But there is no external IP connection error. Is this a memory leak? Seems to be, and a very persistent one. Present in glusterfs at least since version 1.3 (the oldest I used). Krzysztof I corrected the subject, as the memory usage is high on the client side (glusterfs is the client process, glusterfsd is the server and it never used that much memory on my site). I did some more tests with logging. According to my old valgrind report, huge amounts of memory were still in use at exit, and these were allocated in __inode_create and __dentry_create. So I added log points in these functions and performed the du test, ie.
mounted the glusterfs directory containing a large number of files with log level set to TRACE, ran du on it, then echo 3 > /proc/sys/vm/drop_caches, waited a while until the log file stopped growing, finally umounted and checked the (huge) logfile: prkom13:~# grep inode_create /var/log/glusterfs/root-loop-test.log |wc -l 151317 prkom13:~# grep inode_destroy /var/log/glusterfs/root-loop-test.log |wc -l 151316 prkom13:~# grep dentry_create /var/log/glusterfs/root-loop-test.log |wc -l 158688 prkom13:~# grep dentry_unset /var/log/glusterfs/root-loop-test.log |wc -l 158688 Do you see? Everything seems to be OK, a number of inodes created, 1 less destroyed (probably the root inode), same number of dentries created and destroyed. The memory should be freed (there are calls to free in inode_destroy and dentry_unset functions), but it is not. Any ideas, what is going on? Glusterfs developers - is something kept in the lists, where inodes and dentries live, and interleaved with these inodes and entries, so that no memory page can be unmapped? We should also look at the kernel - why it does not send forgets immediately, even with drop_caches=3? Krzysztof ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org
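One extra data point that makes reports like this easier to act on is a periodic record of the server process memory while the workload runs, so growth over time is visible rather than a single ps snapshot. A trivial sketch (the log path is an assumption, adjust to taste):

# record glusterfsd memory use once a minute while the copy runs
while true; do
    date
    ps -o pid,rss,vsz,cmd -C glusterfsd
    sleep 60
done >> /var/log/glusterfsd-rss.log

Attaching that log, together with the TRACE-level client and server logs, to the bug report gives the developers the growth curve alongside the inode/dentry counts.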
Re: [Gluster-users] Memory usage high on server sides
Thanks, Chris. So no caching performance translators were used - that's what I wanted to check. I sent another mail about more debug information. Can you please have a look and respond when you get some time. Thanks for all your help. Regards, Tejas. - Original Message - From: Chris Jin ch...@pikicentral.com To: Tejas N. Bhise te...@gluster.com Cc: gluster-users gluster-users@gluster.org Sent: Thursday, April 15, 2010 8:30:13 AM Subject: Re: [Gluster-users] Memory usage high on server sides Hi Tejas, Just to confirm, from the server vol file from your previous post, you are not even using any performance translators on the server side, is that correct ? Only io-threads. the vol files were generated from gluster-volgen. I just changed directories and listen ports. And the only major activity is the copying ? By now, the only major activity is the copying, yes. But later, reading will be the major activity. Thanks for the reply. Regards, Chris ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Maintenance mode for bricks
Thanks, Ian. Fred - the method Ian describes, using self heal, is a good way of doing it while maintaining gluster semantics and not playing around directly with the backend. In a future release, with dynamic volume management, one would be able to do such things with simple commands while keeping the data online all the while. Regards, Tejas. - Original Message - From: Ian Rogers ian.rog...@contactclean.com To: gluster-users@gluster.org Sent: Wednesday, April 14, 2010 8:43:19 PM Subject: Re: [Gluster-users] Maintenance mode for bricks On 14/04/2010 13:20, Fred Stober wrote: On Wednesday 14 April 2010, Tejas N. Bhise wrote: Fred, Would you like to tell us more about the use case ? Like why would you want to do this ? If we take a brick out, it would not be possible to get it back in ( with the existing data ). Ok, here is our use case: We have a small test system running on 3 file servers. cluster/distribute is used to give a flat view of the file servers. Now we have the problem that one file server is going to be replaced with a larger one. Therefore we want to put the old file server into read only mode to rsync the files to the new server. Unfortunately this will take ~2 days. During this time it would be nice to keep the glusterfs in read/write mode. If I understand it correctly, I should be able to use lookup-unhashed to reintegrate the new fileserver in the existing file system, when we switch off the old server. Cheers, Fred Could you use gluster to put the new server and old one into a cluster/replicate pair so it looks just like one server to the cluster/distribute above it? Then do rsync or let gluster copy everything across with a self heal. When the new one is up to date just disable the old one and remove the cluster/replicate. -- www.ContactClean.com Making changing email address as easy as clicking a mouse. Helping you keep in touch. ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
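A rough sketch of the client-side volfile change Ian is describing, with the server being retired paired against its replacement inside the existing distribute set; all names are placeholders, and a recursive stat (for example ls -lR on the mount) is the usual way to push self-heal through every file:

volume old-server
  type protocol/client
  option transport-type tcp
  option remote-host oldserver
  option remote-subvolume brick
end-volume

volume new-server
  type protocol/client
  option transport-type tcp
  option remote-host newserver
  option remote-subvolume brick
end-volume

# the pair appears as a single subvolume to distribute; self-heal copies
# existing files onto new-server while the volume stays read/write
volume migrate-pair
  type cluster/replicate
  subvolumes old-server new-server
end-volume

volume dist
  type cluster/distribute
  subvolumes server1-brick migrate-pair server3-brick
end-volume

Once new-server holds everything, the replicate wrapper is removed and new-server takes the old brick's slot in the distribute subvolumes line, which is exactly the hand-over Ian outlines above.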
Re: [Gluster-users] Memory usage high on server sides
Thanks, Chris. This is very good information to start with. To summarize this, when you copy lot of small files from nfs mount to glusterfs mount, the copy is slow and at the end of it you see the glusterfs servers still holding a lot of memory after the copy is done. The clients do seem to release the memory though. No caching translators are being used. Will reproduce this inhouse and work here. Will get back if more information is required. Thanks a lot for your help. Regards, Tejas. - Original Message - From: Chris Jin ch...@pikicentral.com To: Tejas N. Bhise te...@gluster.com Cc: gluster-users gluster-users@gluster.org Sent: Thursday, April 15, 2010 9:48:42 AM Subject: Re: [Gluster-users] Memory usage high on server sides Hi Tejas, Problems you saw - 1) High memory usage on client where gluster volume is mounted Memory usage for clients is 0% after copying. $ps auxf USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND root 19692 1.3 0.0 262148 6980 ?Ssl Apr12 61:33 /sbin/glusterfs --log-level=NORMAL --volfile=/u2/git/modules/shared/glusterfs/clients/r2/c2.vol /gfs/r2/f2 2) High memory usage on server Yes. $ps auxf USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND root 26472 2.2 29.1 718100 600260 ? Ssl Apr09 184:09 glusterfsd -f /etc/glusterfs/servers/r2/f1.vol root 26485 1.8 39.8 887744 821384 ? Ssl Apr09 157:16 glusterfsd -f /etc/glusterfs/servers/r2/f2.vol 3) 2 days to copy 300 GB data More than 700GB. There are two folders. The first one is copied to server 1 and server 2, and the second one is copied to server 2 and server 3. The vol files are below. About the config, can you provide the following for both old and new systems - 1) OS and kernel level on gluster servers and clients Debian Kernel 2.6.18-6-amd64 $uname -a Linux fs2 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 x86_64 GNU/Linux 2) volume file from servers and clients #Server Vol file (f1.vol) # The same settings for f2.vol and f3.vol, just different dirs and ports # f1 f3 for Server 1, f1 f2 for Server 2, f2 f3 for Server 3 volume posix1 type storage/posix option directory /gfs/r2/f1 end-volume volume locks1 type features/locks subvolumes posix1 end-volume volume brick1 type performance/io-threads option thread-count 8 subvolumes locks1 end-volume volume server-tcp type protocol/server option transport-type tcp option auth.addr.brick1.allow 192.168.0.* option transport.socket.listen-port 6991 option transport.socket.nodelay on subvolumes brick1 end-volume #Client Vol file (c1.vol) # The same settings for c2.vol and c3.vol # s2 s3 for c2, s3 s1 for c3 volume s1 type protocol/client option transport-type tcp option remote-host 192.168.0.31 option transport.socket.nodelay on option transport.remote-port 6991 option remote-subvolume brick1 end-volume volume s2 type protocol/client option transport-type tcp option remote-host 192.168.0.32 option transport.socket.nodelay on option transport.remote-port 6991 option remote-subvolume brick1 end-volume volume mirror type cluster/replicate option data-self-heal off option metadata-self-heal off option entry-self-heal off subvolumes s1 s2 end-volume volume writebehind type performance/write-behind option cache-size 100MB option flush-behind off subvolumes mirror end-volume volume iocache type performance/io-cache option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB option cache-timeout 1 subvolumes writebehind end-volume volume quickread type performance/quick-read option cache-timeout 1 option max-file-size 256Kb subvolumes 
iocache end-volume volume statprefetch type performance/stat-prefetch subvolumes quickread end-volume 3) Filesystem type of backend gluster subvolumes ext3 4) How close to full the backend subvolumes are New 2T hard disks for each server. 5) The exact copy command .. did you mount the volumes from old and new system on a single machine and did cp or used rsync or some other method ? If something more than just a cp, please send the exact command line you used. The old file system uses DRBD and NFS. The exact command is sudo cp -R -v -p -P /nfsmounts/nfs3/photo . 6) How many files/directories ( tentative ) in that 300GB data ( would help in trying to reproduce inhouse with a smaller test bed ). I cannot tell, but the file sizes are between 1KB to 200KB, average around 20KB. 7) Was there other load on the new or old system ? The old systems are still used for web servers. The new systems are on the same servers but different hard disks. 8) Any other patterns you noticed. There was one instance where a client tried to connect to a server using the external IP address. Using the distribute translator across all three mirrors will make the system twice as slow as using three mounted folders. Is this information enough? Please take a look. Regards, Chris
Re: [Gluster-users] Memory usage high on server sides
Chris, By the way, after the copy is done, how is the system responding to regular access ? In the sense, was the problem with copy also carried forward as more trouble seen with subsequent access of data over glusterfs ? Regards, Tejas. - Original Message - From: Chris Jin ch...@pikicentral.com To: Tejas N. Bhise te...@gluster.com Cc: gluster-users gluster-users@gluster.org Sent: Thursday, April 15, 2010 9:48:42 AM Subject: Re: [Gluster-users] Memory usage high on server sides Hi Tejas, Problems you saw - 1) High memory usage on client where gluster volume is mounted Memory usage for clients is 0% after copying. $ps auxf USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND root 19692 1.3 0.0 262148 6980 ?Ssl Apr12 61:33 /sbin/glusterfs --log-level=NORMAL --volfile=/u2/git/modules/shared/glusterfs/clients/r2/c2.vol /gfs/r2/f2 2) High memory usage on server Yes. $ps auxf USER PID %CPU %MEMVSZ RSS TTY STAT START TIME COMMAND root 26472 2.2 29.1 718100 600260 ? Ssl Apr09 184:09 glusterfsd -f /etc/glusterfs/servers/r2/f1.vol root 26485 1.8 39.8 887744 821384 ? Ssl Apr09 157:16 glusterfsd -f /etc/glusterfs/servers/r2/f2.vol 3) 2 days to copy 300 GB data More than 700GB. There are two folders. The first one is copied to server 1 and server 2, and the second one is copied to server 2 and server 3. The vol files are below. About the config, can you provide the following for both old and new systems - 1) OS and kernel level on gluster servers and clients Debian Kernel 2.6.18-6-amd64 $uname -a Linux fs2 2.6.18-6-amd64 #1 SMP Tue Aug 19 04:30:56 UTC 2008 x86_64 GNU/Linux 2) volume file from servers and clients #Server Vol file (f1.vol) # The same settings for f2.vol and f3.vol, just different dirs and ports # f1 f3 for Server 1, f1 f2 for Server 2, f2 f3 for Server 3 volume posix1 type storage/posix option directory /gfs/r2/f1 end-volume volume locks1 type features/locks subvolumes posix1 end-volume volume brick1 type performance/io-threads option thread-count 8 subvolumes locks1 end-volume volume server-tcp type protocol/server option transport-type tcp option auth.addr.brick1.allow 192.168.0.* option transport.socket.listen-port 6991 option transport.socket.nodelay on subvolumes brick1 end-volume #Client Vol file (c1.vol) # The same settings for c2.vol and c3.vol # s2 s3 for c2, s3 s1 for c3 volume s1 type protocol/client option transport-type tcp option remote-host 192.168.0.31 option transport.socket.nodelay on option transport.remote-port 6991 option remote-subvolume brick1 end-volume volume s2 type protocol/client option transport-type tcp option remote-host 192.168.0.32 option transport.socket.nodelay on option transport.remote-port 6991 option remote-subvolume brick1 end-volume volume mirror type cluster/replicate option data-self-heal off option metadata-self-heal off option entry-self-heal off subvolumes s1 s2 end-volume volume writebehind type performance/write-behind option cache-size 100MB option flush-behind off subvolumes mirror end-volume volume iocache type performance/io-cache option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB option cache-timeout 1 subvolumes writebehind end-volume volume quickread type performance/quick-read option cache-timeout 1 option max-file-size 256Kb subvolumes iocache end-volume volume statprefetch type performance/stat-prefetch subvolumes quickread end-volume 3) Filesystem type of backend gluster subvolumes ext3 4) How close to full the backend subvolumes are New 2T hard disks for each server. 
5) The exact copy command .. did you mount the volumes from old and new system on a single machine and did cp or used rsync or some other method ? If something more than just a cp, please send the exact command line you used. The old file system uses DRBD and NFS. The exact command is sudo cp -R -v -p -P /nfsmounts/nfs3/photo . 6) How many files/directories ( tentative ) in that 300GB data ( would help in trying to reproduce inhouse with a smaller test bed ). I cannot tell, but the file sizes are between 1KB to 200KB, average around 20KB. 7) Was there other load on the new or old system ? The old systems are still used for web servers. The new systems are on the same servers but different hard disks. 8) Any other patterns you noticed. There is once that one client tried to connect one server with external IP address. Using distribute translator across all three mirrors will make system twice slower than using three mounted folders. Is this information enough? Please take a look. Regards, Chris ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org
Re: [Gluster-users] nfs-alpha feedback
Thanks for the feedback, Chad. We will look at this and get back to you. Some more information, like gluster config and hardware/OS information will be useful. Regards, Tejas. - Original Message - From: chadr ch...@mail.aspsys.com To: Gluster-users@gluster.org Sent: Saturday, April 10, 2010 10:04:59 AM Subject: [Gluster-users] nfs-alpha feedback I ran the same dd tests from KnowYourNFSAlpha-1.pdf and performance is inconsistent and causes the server to become unresponsive. My server freezes every time when I run the following command: dd if=/dev/zero of=garb bs=256k count=64000 I would also like to mount a path like: /volume/some/random/dir # mount host:/gluster/tmp /mnt/test mount: host:/gluster/tmp failed, reason given by server: No such file or directory I can mount it up host:/volume_name and /mnt/test/tmp exists dd if=/dev/zero of=garb bs=64K count=100 100+0 records in 100+0 records out 6553600 bytes (6.6 MB) copied, 0.068906 seconds, 95.1 MB/s dd of=garb if=/dev/zero bs=64K count=100 100+0 records in 100+0 records out 6553600 bytes (6.6 MB) copied, 0.057207 seconds, 115 MB/s dd if=/dev/zero of=garb bs=64K count=1000 1000+0 records in 1000+0 records out 65536000 bytes (66 MB) copied, 0.523117 seconds, 125 MB/s dd of=garb if=/dev/zero bs=64K count=1000 1000+0 records in 1000+0 records out 65536000 bytes (66 MB) copied, 1.04666 seconds, 62.6 MB/s dd if=/dev/zero of=garb bs=64K count=1 1+0 records in 1+0 records out 65536 bytes (655 MB) copied, 10.9809 seconds, 59.7 MB/s dd of=garb if=/dev/zero bs=64K count=1 1+0 records in 1+0 records out 65536 bytes (655 MB) copied, 11.3515 seconds, 57.7 MB/s dd if=/dev/zero of=garb bs=128K count=100 100+0 records in 100+0 records out 13107200 bytes (13 MB) copied, 0.105364 seconds, 124 MB/s dd of=garb if=/dev/zero bs=128K count=100 100+0 records in 100+0 records out 13107200 bytes (13 MB) copied, 0.254225 seconds, 51.6 MB/s dd if=/dev/zero of=garb bs=128K count=1000 1000+0 records in 1000+0 records out 131072000 bytes (131 MB) copied, 60.1008 seconds, 2.2 MB/s dd of=garb if=/dev/zero bs=128K count=1000 1000+0 records in 1000+0 records out 131072000 bytes (131 MB) copied, 1.51868 seconds, 86.3 MB/s dd if=/dev/zero of=garb bs=128K count=1 1+0 records in 1+0 records out 131072 bytes (1.3 GB) copied, 18.7755 seconds, 69.8 MB/s dd of=garb if=/dev/zero bs=128K count=1 1+0 records in 1+0 records out 131072 bytes (1.3 GB) copied, 18.9837 seconds, 69.0 MB/s dd if=/dev/zero of=garb bs=256k count=64000 My server freezes. 
Here is the recent nfs log when the server froze: [2010-04-09 23:37:33] D [nfs3-helpers.c:2114:nfs3_log_rw_call] nfs-nfsv3: XID: 6f68c85f, WRITE: args: FH: hashcount 2, xlid 0, gen 5458285267163021319, ino 11856898, offset: 1129578496, count: 65536, UNSTABLE [2010-04-09 23:37:33] D [rpcsvc.c:1790:rpcsvc_request_create] rpc-service: RPC XID: 7068c85f, Ver: 2, Program: 13, ProgVers: 3, Proc: 7 [2010-04-09 23:37:33] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE [2010-04-09 23:37:33] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE [2010-04-09 23:37:33] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE [2010-04-09 23:37:33] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE [2010-04-09 23:37:33] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE [2010-04-09 23:37:33] D [nfs3-helpers.c:2114:nfs3_log_rw_call] nfs-nfsv3: XID: 7068c85f, WRITE: args: FH: hashcount 2, xlid 0, gen 5458285267163021319, ino 11856898, offset: 1129644032, count: 65536, UNSTABLE [2010-04-09 23:37:33] D [rpcsvc.c:1790:rpcsvc_request_create] rpc-service: RPC XID: 7168c85f, Ver: 2, Program: 13, ProgVers: 3, Proc: 7 [2010-04-09 23:37:33] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE [2010-04-09 23:37:33] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE [2010-04-09 23:37:33] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE [2010-04-09 23:37:33] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE [2010-04-09 23:37:33] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE [2010-04-09 23:37:33] D [nfs3-helpers.c:2114:nfs3_log_rw_call] nfs-nfsv3: XID: 7168c85f, WRITE: args: FH: hashcount 2, xlid 0, gen 5458285267163021319, ino 11856898, offset: 1129709568, count: 65536, UNSTABLE [2010-04-09 23:38:33] D [rpcsvc.c:1790:rpcsvc_request_create] rpc-service: RPC XID: 6268c85f, Ver: 2, Program: 13, ProgVers: 3, Proc: 7 Thanks, Chad Richards ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org
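For what it's worth, the mount that did work in Chad's tests is the whole-volume form; subdirectory exports apparently are not there in the alpha, so the subdirectory is reached after mounting the volume. A sketch of the NFSv3-over-TCP mount (hostname and volume name are placeholders, and the nolock option may be needed if the alpha does not provide NLM locking):

# mount the gluster NFS export by volume name, NFSv3 over TCP
mount -t nfs -o vers=3,proto=tcp,nolock host:/volume_name /mnt/test
ls /mnt/test/tmp   # subdirectories are visible once the volume is mounted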
Re: [Gluster-users] rmdir?
Hi Mark, Debug using strace and also look/post debug logs for glusterfs. Regards, Tejas. - Original Message - From: m roth m.r...@5-cent.us To: gluster-users@gluster.org Sent: Friday, April 9, 2010 9:21:59 PM Subject: [Gluster-users] rmdir? Got glusterfs up, and re-exported using unfs. All lovely, if not the fastest thing on the planet. However, my manager notes that he can't rmdir. Mounting it on my system, I created a directory, then tried to rmdir. That fails with i/o error. Trying from the head node that I'm re-exporting it from, again trying to rmdir as me (not as root), I get transport endpoint is not connected. Now, rmdir is not exactly an unusual thing to do - what's going on here? mark ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
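To make that concrete, tracing the failing call from the client while watching the glusterfs client log usually shows which layer returns the error; the mount point and log path below are assumptions, adjust them to your setup:

# trace only filesystem-related syscalls of the failing operation
strace -f -e trace=file rmdir /mnt/glusterfs/testdir
# in another terminal, follow the client log (run the client with a higher log level)
tail -f /var/log/glusterfs/glusterfs.log

Since the volume is also re-exported through unfs here, it is worth running the same rmdir directly on the glusterfs mount first, to tell whether the I/O error comes from gluster itself or from the NFS re-export layer.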
Re: [Gluster-users] gluster local vs local = gluster x4 slower
It might also be useful overall to know what you want to achieve. Its better to do sizing, performance etc if there is clarity on what is to be achieved. Once that is clear, it would be more useful to say if something is possible or not with the config you are trying and why or why not and whether even the expectations are justified or not from what is essentially a distributed networked FS. - Original Message - From: Jeremy Enos je...@ncsa.uiuc.edu To: Stephan von Krawczynski sk...@ithnet.com Cc: Tejas N. Bhise te...@gluster.com, gluster-users@gluster.org Sent: Wednesday, March 24, 2010 5:41:28 AM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: [Gluster-users] gluster local vs local = gluster x4 slower Stephan is correct- I primarily did this test to show a demonstrable overhead example that I'm trying to eliminate. It's pronounced enough that it can be seen on a single disk / single node configuration, which is good in a way (so anyone can easily repro). My distributed/clustered solution would be ideal if it were fast enough for small block i/o as well as large block- I was hoping that single node systems would achieve that, hence the single node test. Because the single node test performed poorly, I eventually reduced down to single disk to see if it could still be seen, and it clearly can be. Perhaps it's something in my configuration? I've pasted my config files below. thx- Jeremy ##glusterfsd.vol## volume posix type storage/posix option directory /export end-volume volume locks type features/locks subvolumes posix end-volume volume disk type performance/io-threads option thread-count 4 subvolumes locks end-volume volume server-ib type protocol/server option transport-type ib-verbs/server option auth.addr.disk.allow * subvolumes disk end-volume volume server-tcp type protocol/server option transport-type tcp/server option auth.addr.disk.allow * subvolumes disk end-volume ##ghome.vol## #---IB remotes-- volume ghome type protocol/client option transport-type ib-verbs/client # option transport-type tcp/client option remote-host acfs option remote-subvolume raid end-volume #Performance Options--- volume readahead type performance/read-ahead option page-count 4 # 2 is default option option force-atime-update off # default is off subvolumes ghome end-volume volume writebehind type performance/write-behind option cache-size 1MB subvolumes readahead end-volume volume cache type performance/io-cache option cache-size 1GB subvolumes writebehind end-volume ##END## On 3/23/2010 6:02 AM, Stephan von Krawczynski wrote: On Tue, 23 Mar 2010 02:59:35 -0600 (CST) Tejas N. Bhisete...@gluster.com wrote: Out of curiosity, if you want to do stuff only on one machine, why do you want to use a distributed, multi node, clustered, file system ? Because what he does is a very good way to show the overhead produced only by glusterfs and nothing else (i.e. no network involved). A pretty relevant test scenario I would say. -- Regards, Stephan Am I missing something here ? Regards, Tejas. - Original Message - From: Jeremy Enosje...@ncsa.uiuc.edu To: gluster-users@gluster.org Sent: Tuesday, March 23, 2010 2:07:06 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: [Gluster-users] gluster local vs local = gluster x4 slower This test is pretty easy to replicate anywhere- only takes 1 disk, one machine, one tarball. Untarring to local disk directly vs thru gluster is about 4.5x faster. At first I thought this may be due to a slow host (Opteron 2.4ghz). 
But it's not- same configuration, on a much faster machine (dual 3.33ghz Xeon) yields the performance below. THIS TEST WAS TO A LOCAL DISK THRU GLUSTER [r...@ac33 jenos]# time tar xzf /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz real0m41.290s user0m14.246s sys 0m2.957s THIS TEST WAS TO A LOCAL DISK (BYPASS GLUSTER) [r...@ac33 jenos]# cd /export/jenos/ [r...@ac33 jenos]# time tar xzf /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz real0m8.983s user0m6.857s sys 0m1.844s THESE ARE TEST FILE DETAILS [r...@ac33 jenos]# tar tzvf /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz |wc -l 109 [r...@ac33 jenos]# ls -l /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz -rw-r--r-- 1 jenos ac 804385203 2010-02-07 06:32 /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz [r...@ac33 jenos]# These are the relevant performance options I'm using in my .vol file: #Performance Options--- volume readahead type performance/read-ahead option page-count 4 # 2 is default
Re: [Gluster-users] Announcement: Alpha Release of Native NFS for GlusterFS
Hi Justice, We tested this ALPHA release with 4 physical machines for the servers ( NFS and GlusterFS servers ) and 7 to 11 machines as NFS clients. Some nfs client fan out testing was done with a larger number of smaller machines. The VMware testing covered using the GlusterFS NFS backend for vmdk storage. After that we ran some standard filesystem tests - connectathon, iozone, fio, bonnie++ etc inside the vmware partition. However the testing inside the partition was not as intensive as the one we did with the physical servers. More intensive testing inside the vmware partitions is planned between the ALPHA and BETA releases. We would be happy to work with you to find out where your system locked up. Did you try to debug what happened ? It would help if you can describe the 'lock up the entire machine' behaviour in detail. Did other vms running on the machine stop ? Were you able to kill the test command ? If yes, then did things stabilize after that ? Please share your config and the commands etc you ran to start the NFS translator. Regards, Tejas. - Original Message - From: Justice London jlon...@lawinfo.com To: Tejas N. Bhise te...@gluster.com Cc: gluster-users@gluster.org, gluster-de...@nongnu.org, nfs-al...@gluster.com Sent: Saturday, March 20, 2010 2:39:27 AM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: [Gluster-users] Announcement: Alpha Release of Native NFS for GlusterFS I'm sorry.. but I don't know how you guys tested this, but using a bare-bones configuration with the NFS translator and a mirror configuration between two systems (no performance translators, etc.) I can lock up the entire system after writing 160-180megs of data. Basically: dd if=/dev/full of=testfile bs=1M count=1000 is enough to lock the entire machine. This is on a CentOS 5.4 system with a xen backend (for testing). I don't know what you guys tested with, but I can't get this stable... at all. Justice London jlon...@lawinfo.com On Thu, 2010-03-18 at 10:36 -0600, Tejas N. Bhise wrote: Dear Community Users, Gluster is happy to announce the ALPHA release of the native NFS Server. The native NFS server is implemented as an NFS Translator and hence integrates very well with the NFS protocol on one side and the GlusterFS protocol on the other side. This is an important step in our strategy to extend the benefits of Gluster to other operating systems which can benefit from a better NFS based data service, while enjoying all the backend smarts that Gluster provides. The new NFS Server also strongly supports our efforts towards becoming a virtualization storage of choice. The release notes of the NFS ALPHA Release are available at - http://ftp.gluster.com/pub/gluster/glusterfs/qa-releases/nfs-alpha/GlusterFS_NFS_Alpha_Release_Notes.pdf The Release notes describe where RPMs and source code can be obtained and where bugs found in this ALPHA release can be filed. Some examples on usage are also provided. Please be aware that this is an ALPHA release and in no way should be used in production. Gluster is not responsible for any loss of data or service resulting from the use of this ALPHA NFS Release. Feel free to send feedback, comments and questions to: nfs-al...@gluster.com Regards, Tejas Bhise. ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
[Gluster-users] Announcement: Alpha Release of Native NFS for GlusterFS
Dear Community Users, Gluster is happy to announce the ALPHA release of the native NFS Server. The native NFS server is implemented as an NFS Translator and hence integrates very well with the NFS protocol on one side and the GlusterFS protocol on the other side. This is an important step in our strategy to extend the benefits of Gluster to other operating systems which can benefit from a better NFS based data service, while enjoying all the backend smarts that Gluster provides. The new NFS Server also strongly supports our efforts towards becoming a virtualization storage of choice. The release notes of the NFS ALPHA Release are available at - http://ftp.gluster.com/pub/gluster/glusterfs/qa-releases/nfs-alpha/GlusterFS_NFS_Alpha_Release_Notes.pdf The Release notes describe where RPMs and source code can be obtained and where bugs found in this ALPHA release can be filed. Some examples on usage are also provided. Please be aware that this is an ALPHA release and in no way should be used in production. Gluster is not responsible for any loss of data or service resulting from the use of this ALPHA NFS Release. Feel free to send feedback, comments and questions to: nfs-al...@gluster.com Regards, Tejas Bhise. ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Mounting gluster volume from fstab at boot - Specifying multiple server nodes
Jon, Why not just use replication, and distribute over the replicated setup ? You can stack it up like that. Regards, Tejas. - Original Message - From: Jon Swanson jswan...@valuecommerce.co.jp To: gluster-users@gluster.org Sent: Monday, March 15, 2010 12:54:55 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: [Gluster-users] Mounting gluster volume from fstab at boot - Specifying multiple server nodes Hello. glusterfs is flat out awesome. Just had a quick question about mounting a volume via fstab at boot time that did not seem to be answered here: http://www.gluster.com/community/documentation/index.php/Mounting_a_GlusterFS_Volume I'd like to be able to specify multiple nodes at boot time, so if the volume is spanned across Nodes A,B,C,D, and A goes down, the system will still automatically mount the volume from B. It seems like you could do that relatively easily referencing a volume file on the client. I would like to avoid maintaining separate volume files on each client if at all possible though. Is there a syntax for providing an fstab line for a gluster mount that will allow the gluster client to try multiple hosts in the event one is down? Thanks, jon ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
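To spell out the stacking Tejas suggests: the fstab entry points at a client volfile rather than at any single server, and inside that volfile the protocol/client volumes for nodes A-D are wrapped in cluster/replicate (with cluster/distribute on top), so losing one node does not prevent the mount. The same volfile can be used unchanged on every client. A sketch, with placeholder paths:

# /etc/fstab entry: the 'device' is a client volfile, not a server name
/etc/glusterfs/client.vol  /mnt/gluster  glusterfs  defaults  0  0

The volfile itself would look like the replicate/distribute sketches earlier in this archive; keeping one canonical copy and pushing it to all clients avoids the per-client maintenance Jon wants to avoid.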
Re: [Gluster-users] How to re-sync
Yes. Replace the binaries and restart. - Original Message - From: Stephan von Krawczynski sk...@ithnet.com To: Tejas N. Bhise te...@gluster.com Cc: Ed W li...@wildgooses.com, Gluster Users gluster-users@gluster.org Sent: Tuesday, March 9, 2010 7:33:02 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: [Gluster-users] How to re-sync We run 2.0.9 currently. Can we compile and install 3.X simply over 2.X, or do we need to re-create the replication (via self-heal), or even copy the data from scratch to an empty exported replication glusterfs? On Mon, 8 Mar 2010 23:45:05 -0600 (CST) Tejas N. Bhise te...@gluster.com wrote: Ed, Chad, Stephen, We believe we have fixed all ( known ) problems with self-heal in the latest releases and hence we would be very interested in getting diagnostics if you can reproduce the problem or see it again frequently. Please collect the logs by running the client and servers with log level TRACE and then reproducing the problem. Also collect the backend extended attributes of the file on both servers before self-heal was triggered. This command can be used to get that info: # getfattr -d -m '.*' -e hex filename Thank you for your help to debug if any new problem shows up. Feel free to ask if you have any queries about this. Regards, Tejas. ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] advice on optimal configuration
Barry, Just to clarify, the application that would cache files on glusterfs would do it across regular mount points and not copy off from the backend servers, right ? If that is the case then that is fine. Since you mentioned such a small partition my guess would be that you are using SSD on the 128 cache nodes. Is that correct ? Since you can re-generate or retrieve files from the upstream file server seamlessly, I would recommend not to use replication and instead configure a 2X cache using a distribute configuration. If there are enough files and the application is caching files that are in demand, they will spread out nicely over the 128 nodes and will give you a good load balancing effect. With replication, suppose you have two replicas, like you mentioned, the write goes to both replica servers and the read for a file will go to a preferred server. There is no load balancing per file per se. What I mean is, suppose 100 clients mount a volume that is replicated across 2 servers, if all of them access the same file in read mode, it will be read from the same server and will not be balanced across the 2 servers. This however can be fixed by using a client preferred read server - but this would have to be set on each client. Also, it will work only for a replication count of 2. It does not allow for a preference list for servers - like it would not allow for a replica count of 3, one client to give preference of s1, s2, s3, another client to give preference of s2, s3, s1 and the next one a preference of s3, s1, s2 and so on and so forth. At some point we intend to automate some of that, but since most users use a replication count of 2 only, it can be managed - except for the work required to set preferences on each client. Again, if there are lots of files being accessed, it evens out, so that becomes less of a concern again and gives a load balanced effect. So in summary, a read for the same file does not get balanced, unless each client sets a preference. However for many files being accessed it evens out and gives a load balanced effect. Since you are only going to write once, that does not hurt performance much ( a replicated write returns only after the write has happened to both replica locations ). Since you are still in the testing phase, what you can do is this - create one backend FS on each node. Create two directories in that - one called distribute and the other called something like replicavolumereplica# so you can use that to group it with a similar one on another node for replication. The backend subvolumes exported from the servers can be directories so you can setup a distribute GlusterFS volume as well as the replicated GlusterFS volumes and mount both on the clients and hence test both. At any point when you have decided to use one of them, just umount the other one, delete the directory from the backend FS and that's it. If you have SSDs like I assumed, you would actually be decreasing wear per cached data ( if there were such a term :-) ) by not using replication. Let me know if you have any questions on this. Regards, Tejas. - Original Message - From: Barry Robison barry.robi...@drdstudios.com To: gluster-users@gluster.org Sent: Wednesday, March 10, 2010 5:28:24 AM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: [Gluster-users] advice on optimal configuration Hello, I have 128 physically identical blades, with 1GbE uplink per blade, and 10GbE between chassis ( 32 blades per chassis ). Each node will have an 80GB gluster partition.
Dual-quad core intel Xeons, 24GB RAM. The goal is to use gluster as a cache for files used by render applications. All files in gluster could be re-generated or retrieved from the upstream file server. My first volume config attempt is 64 replicated volumes with partner pairs on different chassis. Is replicating a performance hit? Do reads balance between replication nodes? Would NUFA make more sense for this set-up? Here is my config, any advice appreciated. Thank you, -Barry volume c001b17-1 type protocol/client option transport-type tcp option remote-host c001b17 option transport.socket.nodelay on option transport.remote-port 6996 option remote-subvolume brick1 option ping-timeout 5 end-volume . snip . volume c004b48-1 type protocol/client option transport-type tcp option remote-host c004b48 option transport.socket.nodelay on option transport.remote-port 6996 option remote-subvolume brick1 option ping-timeout 5 end-volume volume replicate001-17 type cluster/replicate subvolumes c001b17-1 c002b17-1 end-volume . snip . volume replicate001-48 type cluster/replicate subvolumes c001b48-1 c002b48-1 end-volume volume replicate003-17 type cluster/replicate subvolumes c003b17-1 c004b17-1 end-volume . snip . volume replicate003-48 type cluster/replicate subvolumes c003b48-1 c004b48-1 end-volume volume
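For readers trying to picture the dual-purpose backend Tejas describes, here is a minimal sketch; the node names, the /data path and the volume names are illustrative and not taken from Barry's config:

  # On every cache node: one backend FS, two directories, so the same disk
  # can back a distribute test volume and a replicate test volume at once.
  mkdir -p /data/distribute /data/replica001

  # Server-side volfile fragment exporting each directory as its own brick.
  volume posix-dist
    type storage/posix
    option directory /data/distribute
  end-volume

  volume posix-repl
    type storage/posix
    option directory /data/replica001
  end-volume

On the client side, one volfile would aggregate the distribute directories of all nodes with cluster/distribute and another would pair up the replica001 directories with cluster/replicate; both can be mounted side by side, and dropping the loser later is just a umount plus removing the unused backend directory.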
Re: [Gluster-users] How to re-sync
Ed, Chad, Stephen, We believe we have fixed all ( known ) problems with self-heal in the latest releases and hence we would be very interested in getting diagnostics if you can reproduce the problem or see it again frequently. Please collect the logs by running the client and servers with log level TRACE and then reproducing the problem. Also collect the backend extended attributes of the file on both servers before self-heal was triggered. This command can be used to get that info: # getfattr -d -m '.*' -e hex filename Thank you for you help to debug if any new problem shows up. Feel free to ask if you have any queries about this. Regards, Tejas. - Original Message - From: Ed W li...@wildgooses.com To: Gluster Users gluster-users@gluster.org Sent: Monday, March 8, 2010 5:22:40 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: [Gluster-users] How to re-sync On 07/03/2010 16:02, Chad wrote: Is there a gluster developer out there working on this problem specifically? Could we add some kind of sync done command that has to be run manually and until it is the failed node is not used? The bottom line for me is that I would much rather run on a performance degraded array until a sysadmin intervenes, than loose any data. I'm only in evaluation mode at the moment, but resolving split brain is something which is terrifying me at the moment and I have been giving some thought to how it needs to be done with various solutions In the case of gluster it really does seem very important to figure out a reliable way to know when the system is fully synced again if you have had an outage. For example a not unrealistic situation if you were doing a bunch of upgrades would be: - Turn off server 1 (S1) and upgrade, server 2 (S2) deviates from S1 - Turn on server 1 and expect to sync all new changes from while we were down - key expectation here is that S1 only includes changes from S2 and never sends changes. - Some event marks sync complete so that we can turn off S2 and upgrade it The problem otherwise if you don't do the sync is that you turn off S2 and now S1 doesn't know about changes made while it's off and serves up incomplete information. Split brain can occur where a file is changed on both servers while they couldn't talk to each other and then changes must be lost... I suppose a really cool translator could be written to track changes made to an AFR group where one member is missing and then the out of sync file list would be resupplied once it was turned on again in order to speed up replication... Kind of a lot of work for a small improvement, but could be interesting to create... Perhaps some dev has some other suggestions on a procedure to follow to avoid split brain in the situation that we need to turn off all servers one by one in an AFR group? Thanks Ed W ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
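A sketch of how that diagnostic data could be gathered, assuming the stock volfile locations and an illustrative mount point and brick path:

  # Run the server and the client at TRACE log level while reproducing the problem.
  glusterfsd -f /etc/glusterfs/glusterfsd.vol --log-level=TRACE \
      --log-file=/var/log/glusterfs/server-trace.log
  glusterfs -f /etc/glusterfs/glusterfs.vol --log-level=TRACE \
      --log-file=/var/log/glusterfs/client-trace.log /mnt/glusterfs

  # Before self-heal is triggered, capture the extended attributes of the
  # affected file on the backend export of BOTH replica servers.
  getfattr -d -m '.*' -e hex /data/export/path/to/filename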
Re: [Gluster-users] How to re-sync
Chad, Stephan - thank you for your feedback. Just to clarify on what you wrote, do you mean to say that - 1) The setup is a replicate setup with the file being written to multiple nodes. 2) One of these nodes is brought down. 3) A replicated file with a copy on the node brought down is written to. 4) The other copies are updated as writes happen while this node is still down. 5) After this node is brought up, the client sometimes sees the old file on the node brought up instead of picking the file from a node that has the latest copy. If the above is correct, quick questions - 1) What versions are you using ? 2) Can you share your volume files ? Are they generated using volgen ? 3) Did you notice any patterns for the files where the wrong copy was picked ? like were they open when the node was brought down ? 4) Any other way to reproduce the problem ? 5) Any other patterns you observed when you see the problem ? 6) Would you have listings of problem file(s) from the replica nodes ? If however my understanding was not correct, then please let me know with some examples. Regards, Tejas. - Original Message - From: Chad ccolu...@hotmail.com To: Stephan von Krawczynski sk...@ithnet.com Cc: gluster-users@gluster.org Sent: Sunday, March 7, 2010 9:32:27 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: [Gluster-users] How to re-sync I actually do prefer top post. Well this overwritten behavior is what I saw as well and that is a REALLY REALLY bad thing. Which is why I asked my question in the first place. Is there a gluster developer out there working on this problem specifically? Could we add some kind of sync done command that has to be run manually and, until it is, the failed node is not used? The bottom line for me is that I would much rather run on a performance degraded array until a sysadmin intervenes, than lose any data. ^C Stephan von Krawczynski wrote: I love top-post ;-) Generally, you are right. But in real life you cannot trust in this smartness. We tried exactly this point and had to find out that the clients do not always select the correct file version (i.e. the latest) automatically. Our idea in the testcase was to bring down a node, update its kernel and revive it - just as you would like to do it in the real world for a kernel update. We found out that some files were taken from the downed node afterwards and the new contents on the other node got in fact overwritten. This does not happen generally, of course. But it does happen. We could only stop this behaviour by setting favorite-child. But that does not really help a lot, since we want to take down all nodes some other day. This is in fact one of our show-stoppers. On Sun, 7 Mar 2010 01:33:14 -0800 Liam Slusser lslus...@gmail.com wrote: Assuming you used raid1 (replicate), you DO bring up the new machine and start gluster. On one of your gluster mounts you run a ls -alR and it will resync the new node. The gluster clients are smart enough to get the files from the first node. liam On Sat, Mar 6, 2010 at 11:48 PM, Chad ccolu...@hotmail.com wrote: Ok, so assuming you have N glusterfsd servers (say 2 cause it does not really matter). Now one of the servers dies. You repair the machine and bring it back up. I think 2 things: 1. You should not start glusterfsd on boot (you need to sync the HD first) 2. When it is up how do you re-sync it? Do you rsync the underlying mount points? If it is a busy gluster cluster it will be getting new files all the time.
So how do you sync and bring it back up safely so that clients don't connect to an incomplete server? ^C ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
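For completeness, the manual resync discussed in this thread is normally triggered by walking the namespace from a client mount once the repaired node is back, so that replicate self-heals every file it touches; the mount path below is illustrative:

  # Run from a GlusterFS client mount after the repaired server rejoins.
  ls -laR /mnt/glusterfs > /dev/null
  # or, equivalently, stat every file:
  find /mnt/glusterfs -print0 | xargs -0 stat > /dev/null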
Re: [Gluster-users] glusterfs-volgen
volgen will generate the files. Its like generating RPC stubs for client and server. You can do that on any machine, actually. Then you take the server and client files and put them on server. The client will fetch specification file from the server. Alternately you can also copy the vol file to the client, without turning on fetch spec. - Original Message - From: m roth m.r...@5-cent.us To: gluster-users@gluster.org Sent: Saturday, March 6, 2010 12:26:43 AM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: [Gluster-users] glusterfs-volgen Well, my manager wants me to try to get glusterfs working from the rpms that the CentOS team has, so it's try to get 2.0.8 working. I started to follow Chad's process... and ran into the same thing that started me on this: his glusterfs-volgen has *no* --export-directory in the command. Googling, I found http://www.cs.cmu.edu/~ewalter/gluster/ which is, in fact, 2.0.8. However, there's things I still don't get. Now, I've installed the server and common on all my nodes. I've only installed common and client on my head node (this is an HPC cluster that I'm building this on). 1) where do I run the glusterfs-volgen command, on the head (client) box, or on each of the servers? 2) if the latter, why does the command offer the option of listing multiple hosts? I have, as the link I posted recommends, symlinked /etc/glusterfs with /usr/local/etc/glusterfs. I've created the directories I want as part of this on each of my nodes, all with the same name. Once I get this part, then I'll get to the client side... mark ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
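As a concrete illustration of that workflow (hostnames, export paths and generated file names are examples only; the exact file names volgen writes, and whether spec-fetch is available, vary between releases):

  # Run volgen once, on any machine - it only writes .vol files.
  glusterfs-volgen --name store1 --raid 1 \
      node01:/export/gluster node02:/export/gluster

  # Copy the generated server volfile to each server and the client volfile
  # to the head node, then start the daemons.
  glusterfsd -f /etc/glusterfs/glusterfsd.vol             # on each server
  glusterfs -f /etc/glusterfs/glusterfs.vol /mnt/gluster  # on the client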
Re: [Gluster-users] Can gluster export be nfs exported too?
Jeremy, Exporting of same (local or locally mounted ) volume/filesystem using two different protocols is possible in theory but very difficult to implement. Each exporter usually has some housekeeping information on the data it exports and in most implementations two exporter do not share this information. This is what makes it difficult, though not impossible. Look up ctdb and SAMBA and you will understand what I am trying to say. Regards, Tejas. - Original Message - From: Jeremy Enos je...@ncsa.uiuc.edu To: Raghavendra G raghaven...@gluster.com Cc: gluster-users@gluster.org Sent: Wednesday, March 3, 2010 1:16:18 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: [Gluster-users] Can gluster export be nfs exported too? Second question: Even if it's not supported, is it theoretically feasible? Jeremy On 3/2/2010 10:16 PM, Raghavendra G wrote: No, its not supported to export glusterfs backend directories using NFS. On Tue, Mar 2, 2010 at 8:43 AM, Jeremy Enos je...@ncsa.uiuc.edu mailto:je...@ncsa.uiuc.edu wrote: If I have a single system exporting gluster, can I also export that same directory via NFS w/o Gluster in the loop, or is it different somehow? I assume I definitely can't do that on any striped setup for obvious reasons. (I realize I could NFS export the gluster mounted volume, but I'm talking about the gluster export volume here) thx- Jeremy ___ Gluster-users mailing list Gluster-users@gluster.org mailto:Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users -- Raghavendra G ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] error: Transport endpoint is not connected and Stale NFS file handle
Jose, I would request you to use volgen. A suggestion on the config, why not group disks on two nodes ( simple e.g. group1 from server1 and server2, group2 from server2 and server4 ) with replicate. Then have distribute translator on top of the two replicated volumes. This would give you a single GlusterFS volume to be mounted on each of the clients. Would'nt something simple like that work for you ? I gave a representative example of 2 replicas this can be easily extended to 3 replicas also. Regards, Tejas. - Original Message - From: José Manuel Canelas jcane...@co.sapo.pt To: gluster-users@gluster.org Sent: Tuesday, March 2, 2010 9:38:32 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: [Gluster-users] error: Transport endpoint is not connected and Stale NFS file handle Hi, Since no one replies to this, i'll reply to myself :) I just realized I assumed that it is possible to replicate distributed volumes. I am wrong? In my setup bellow I was trying to make Replicated Distributed Storage, the inverse of what is described in http://www.gluster.com/community/documentation/index.php/Distributed_Replicated_Storage. Trying to draw a picture: replicated -| 3 replicas presented as one volume replica1 replica2 replica3 ---|-|---| - 4 volumes, distributed, to make up 4vols 4vols 4volseach of the 3 volumes to be replicated Is this dumb or is there a better way? thanks, José Canelas On 02/26/2010 03:55 PM, José Manuel Canelas wrote: Hello, everyone. We're setting up GlusterFS for some testing and having some trouble with the configuration. We have 4 nodes as clients and servers, 4 disks each. I'm trying to setup 3 replicas across all those 16 disks, configured at the client side, for high availability and optimal performance, in a way that makes it easy to add new disks and nodes. The best way I thought doing it was to put disks together from different nodes into 3 distributed volumes and then use each of those as a replica of the top volume. I'd like your input on this too, so if you look at the configuration and something looks wrong or dumb, it probably is, so please let me know :) Now the server config looks like this: volume posix1 type storage/posix option directory /srv/gdisk01 end-volume volume locks1 type features/locks subvolumes posix1 end-volume volume brick1 type performance/io-threads option thread-count 8 subvolumes locks1 end-volume [4 more identical bricks and...] volume server-tcp type protocol/server option transport-type tcp option auth.addr.brick1.allow * option auth.addr.brick2.allow * option auth.addr.brick3.allow * option auth.addr.brick4.allow * option transport.socket.listen-port 6996 option transport.socket.nodelay on subvolumes brick1 brick2 brick3 brick4 end-volume The client config: volume node01-1 type protocol/client option transport-type tcp option remote-host node01 option transport.socket.nodelay on option transport.remote-port 6996 option remote-subvolume brick1 end-volume [repeated for every brick, until node04-4] ### Our 3 replicas volume repstore1 type cluster/distribute subvolumes node01-1 node02-1 node03-1 node04-1 node04-4 end-volume volume repstore2 type cluster/distribute subvolumes node01-2 node02-2 node03-2 node04-2 node02-2 end-volume volume repstore3 type cluster/distribute subvolumes node01-3 node02-3 node03-3 node04-3 node03-3 end-volume volume replicate type cluster/replicate subvolumes repstore1 repstore2 repstore3 end-volume [and then the performance bits] When starting the glusterfs server, everything looks fine. 
I then mount the filesystem with node01:~# glusterfs --debug -f /etc/glusterfs/glusterfs.vol /srv/gluster-export and it does not complain and shows up as properly mounted. When accessing the content, it gives back an error, that the Transport endpoint is not connected. The log has a Stale NFS file handle warning. See bellow: [...] [2010-02-26 14:56:01] D [dht-common.c:274:dht_revalidate_cbk] repstore3: mismatching layouts for / [2010-02-26 14:56:01] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse: 9: LOOKUP() / = -1 (Stale NFS file handle) node01:~# mount /dev/cciss/c0d0p1 on / type ext3 (rw,errors=remount-ro) tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755) proc on /proc type proc (rw,noexec,nosuid,nodev) sysfs on /sys type sysfs (rw,noexec,nosuid,nodev) procbususb on /proc/bus/usb type usbfs (rw) udev on /dev type tmpfs (rw,mode=0755) tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev) devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620) fusectl on /sys/fs/fuse/connections type fusectl (rw) /dev/cciss/c0d1 on /srv/gdisk01 type ext3 (rw,errors=remount-ro)
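A client-side sketch of the replicate-then-distribute layout Tejas recommends, reusing the brick names already defined in José's client config (only the first brick of each node is shown; the aggregate volume names are illustrative):

  volume repl-a
    type cluster/replicate
    subvolumes node01-1 node02-1
  end-volume

  volume repl-b
    type cluster/replicate
    subvolumes node03-1 node04-1
  end-volume

  # distribute on top of the replicated pairs gives one mountable volume
  volume dist
    type cluster/distribute
    subvolumes repl-a repl-b
  end-volume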
Re: [Gluster-users] GlusterFS 3.0.2 small file read performance benchmark
Ed, oplocks are implemented by SAMBA and it would not be a part of GlusterFS per se till we implement a native SAMBA translator ( something that would replace the SAMBA server itself with a thin SAMBA kind of a layer on top of GlusterFS itself ). We are doing that for NFS by building an NFS translator. At some point, it would be interesting to explore, clustered SAMBA using ctdb, where two GlusterFS clients can export the same volume. ctdb itself seems to be coming up well now. Regards, Tejas. - Original Message - From: Ed W li...@wildgooses.com To: Gluster Users gluster-users@gluster.org Sent: Wednesday, March 3, 2010 12:10:47 AM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: [Gluster-users] GlusterFS 3.0.2 small file readperformance benchmark On 01/03/2010 20:44, Ed W wrote: I believe samba (and probably others) use a two way lock escalation facility to mitigate a similar problem. So you can read-lock or phrased differently, express your interest in caching some files/metadata and then if someone changes what you are watching the lock break is pushed to you to invalidate your cache. Seems NFS v4 implements something similar via delegations (not believed implemented in linux NFSv4 though...) In samba the equivalent are called op locks I guess this would be a great project for someone interested to work on - op-lock translator for gluster Ed W ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
[Gluster-users] Various applications using Gluster
Dear Community Members, We at Gluster thank you for using our product and making our efforts worth it. Over time, GlusterFS has been used to hold data for various kinds of applications and with different configurations. We would like to collect this information so that it may help reach a wider audience and increase the adoption of Gluster. There are others out there who use the same applications you do and can benefit with running those applications on Glusterfs, just like you do. Please take some time out and share this information with us. You can write directly to me. Feel free to ask me any questions about this. Please also let me know if you would be willing to write a paper or a tutorial about your experiences, application setup, configurations, performance optimization etc with your solution involving GlusterFS. Regards, Tejas Bhise. tejas at gluster dot com ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] ib-sdp
Nick, About Gluster 3.0.2 on Solaris ( not on OpenSolaris ), there are some issues which are being fixed. So please check here again in a few days about it. ib-sdp itself is supported in 3.0.2 - Raghu already replied to that question. Regards, Tejas. - Original Message - From: Raghavendra G raghaven...@gluster.com To: Nick Birkett n...@streamline-computing.com Cc: gluster-users@gluster.org Sent: Thursday, February 18, 2010 6:43:03 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: [Gluster-users] ib-sdp On Thu, Feb 18, 2010 at 2:15 PM, Nick Birkett n...@streamline-computing.com wrote: Is ib-sdp still supported on 3.0.2 (eg for use in solaris environment) ? yes, ib-sdp is supported in 3.0.2. If so do we compile for tcp socket or is there a special configure option ? give transport-type as socket and address-family as inet-sdp. option transport-type socket option address-family inet-sdp Thanks, Nick This e-mail message may contain confidential and/or privileged information. If you are not an addressee or otherwise authorized to receive this message, you should not use, copy, disclose or take any action based on this e-mail or any information contained in the message. If you have received this material in error, please advise the sender immediately by reply e-mail and delete this message. Thank you. Streamline Computing is a trading division of Concurrent Thinking Limited: Registered in England and Wales No: 03913912 Registered Address: The Innovation Centre, Warwick Technology Park, Gallows Hill, Warwick, CV34 6UW, United Kingdom ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users regards, -- Raghavendra G ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
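Putting Raghavendra's two options into a volfile, a client volume over IB-SDP might look like the fragment below; the host and subvolume names are illustrative, and some releases spell the second option transport.address-family:

  volume brick00-sdp
    type protocol/client
    option transport-type socket
    option address-family inet-sdp
    option remote-host 192.168.100.200
    option remote-subvolume brick00
  end-volume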
Re: [Gluster-users] Gluster client and HA
Hi Mike, Are you looking for HA of the NFS server or HA of GlusterFS itself ? Can you please explain your system a little more and also tell us what you want to achieve. Regards, Tejas. - Original Message - From: mike foster mfost...@gmail.com To: Gluster General Discussion List gluster-users@gluster.org Sent: Monday, February 8, 2010 11:47:24 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: [Gluster-users] Gluster client and HA I was under the impression that by configuring a system as a client connected to 4 server nodes, if one of the nodes went down the client would still be able to access the data through some kind of failover to the other nodes. However I set up a test and failed the server that was listed as the last connected server in the log file, and when I attempted to access the exported/mounted filesystem on the client I received a Stale NFS file handle error. Also here are some messages from the log file: cf02: connection to 10.50.14.32:6996 failed (No route to host) [2010-02-08 11:09:24] W [fuse-bridge.c:722:fuse_attr_cbk] glusterfs-fuse: 88: LOOKUP() / = -1 (Stale NFS file handle) Is it not possible for a client to have HA to the exported filesystem? ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
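For background on the behaviour Mike saw: if the four servers are only aggregated with cluster/distribute, each file lives on exactly one server, so losing that server makes its files unreachable; client-side failover needs cluster/replicate, along the lines of this illustrative fragment (the cf02 name and 10.50.14.32 address come from Mike's log, everything else is assumed):

  volume cf01
    type protocol/client
    option transport-type tcp
    option remote-host 10.50.14.31
    option remote-subvolume brick
  end-volume

  volume cf02
    type protocol/client
    option transport-type tcp
    option remote-host 10.50.14.32
    option remote-subvolume brick
  end-volume

  volume mirror
    type cluster/replicate
    subvolumes cf01 cf02
  end-volume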
Re: [Gluster-users] Getting md5 check of a file through libglusterfs
Hi Dongmin, Can you elaborate on what you are trying to achieve so we get a better understanding of the problem. Regards, Tejas. - Original Message - From: Dongmin Yu m...@hostway.co.kr To: gluster-users@gluster.org Sent: Saturday, January 23, 2010 1:57:09 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: [Gluster-users] Getting md5 check of a file through libglusterfs Hello, Is there an easier way to get the md5 checksum of a file through libglusterfs? Sure, we could get this value with a standard md5sum command line on the fuse mounted system. If I read all the contents and calculate the md5 checksum, it would be a very time/network consuming job. Thanks DongMin Yu HOSTWAY IDC Corp. / R&D Principal Researcher TEL. +822 2105 6037 FAX. +822 2105 6019 CELL. +8216 2086 1357 EMAIL: min...@hostwaycorp.com Website: http://www.hostway.com NOTICE: This email and any file transmitted are confidential and/or legally privileged and intended only for the person(s) directly addressed. If you are not the intended recipient, any use, copying, transmission, distribution, or other forms of dissemination is strictly prohibited. If you have received this email in error, please notify the sender immediately and permanently delete the email and files, if any. ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
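Until there is direct support in libglusterfs, one workaround sometimes suggested - valid only for plain replicate/distribute layouts where every file is stored whole on a brick, and assuming the illustrative paths shown - is to checksum the backend copy on the server itself, so the data never crosses the network:

  # On the server that holds the brick: checksum the backend copy directly.
  md5sum /data/export/path/to/file

  # Through the FUSE mount the same bytes are first read over the network.
  md5sum /mnt/glusterfs/path/to/file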
Re: [Gluster-users] open() issues on 3.0.0
Aaron, Nice piece of information !! ... Regards, Tejas. - Original Message - From: Aaron Knister aaron.knis...@gmail.com To: b...@usf.edu Cc: gluster-users@gluster.org Sent: Sunday, January 17, 2010 11:42:14 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: [Gluster-users] open() issues on 3.0.0 I ran into a similar problem when compiling code on a gluster mount with the intel compiler. The problem was intermittent and after digging into it further it appeared to only happy when compiling from source files that had very high inode numbers. There's a command run by icc that's a 32 bit executable and my guess is that somewhere internally the extremely large inode number wasn't able to be handled and caused the compilation to fail. The fix was to upgrade the compiler. I can verify that version 11.0.064 works without a hitch (at least in this regard). More Detail: I can reproduce this error with version 10.0.026 of the intel compiler. I have file test.c on a glusterfs with inode num 21479794854 (greater than 32 bits) 21479794854 -rw-r--r-- 1 aaron users 26 Jan 17 12:58 test.c Running icc fails: ... $ icc test.c Catastrophic error: could not open source file test.c compilation aborted for test.c (code 4) ... However if I copy the same file to a directory on the same fs that has a 32 bit inode number (le 4294967295) (hoping the new file will also have a 32 bit inode number): 11889296 -rw-r--r-- 1 aaron users 26 Jan 17 13:02 test.c Compilation is a success: ... $ !icc icc test.c $ echo $? 0 ... If you run icc with the -v option you'll see that a program called mcpcom gets run and produces the error. At least with version 10.0.026 it was a 32 bit program: $ file /opt/intel/cce/10.0.026/bin/mcpcom /opt/intel/cce/10.0.026/bin/mcpcom: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), for GNU/Linux 2.2.5, not stripped Anyway I hope that helps. -Aaron On Jan 17, 2010, at 11:13 AM, Brian Smith wrote: Hi, I'm new to Gluster, so please forgive any ignorance on my part. I've tried reading over everything I could find to resolve this myself. I'm having an issue building executables on our gluster volume (there are some other similar issues as well, but we'll deal with this one first). I have the presta infiniband utilities extracted into a directory in my glusterfs volume and I'm attempting to build them. One of the source files, util.c (coincidentally, the first one referenced in the make file) fails to build. I get an error message: Catastrophic error: could not open source file util.c Some further investigating reveals the command that causes the issue: icc -I/opt/priv/openmpi-1.4.1/intel-10.1.018/include -pthread -L/opt/priv/mx/lib -L/opt/priv/openmpi-1.4.1/intel-10.1.018/lib -lmpi -lopen-rte -lopen-pal -lmyriexpress -libverbs -lpsm_infinipath -lnuma -ldl -Wl,--export-dynamic -lnsl -lutil -c util.c Catastrophic error: could not open source file util.c compilation aborted for util.c (code 4) And further, we see an strace of the command: ... 
2717 6601 brk(0x957b000)= 0x957b000 2718 6601 open(util.c, O_RDONLY) = 3 2719 6601 stat64(util.c, {st_mode=S_IFREG|0600, st_size=14370, ...}) = 0 2720 6601 close(3) = 0 2721 6601 fstat64(2, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0 2722 6601 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE| MAP_ANONYMOUS, -1, 0x1000) = 0xf0bce000 2723 6601 write(2, Catastrophic error: could not open source file \util.c\\n, 56) = 56 2724 6601 write(2, \n, 1) = 1 2725 6601 exit_group(4) ... We can also see that the file is indeed present: $ ls -l util.c -rw--- 1 brs users 14370 Apr 8 2002 util.c $ stat util.c File: `util.c' Size: 14370 Blocks: 32 IO Block: 4096 regular file Device: 18h/24d Inode: 4295774314 Links: 1 Access: (0600/-rw---) Uid: (1229806/ brs) Gid: (10001/ users) Access: 2010-01-17 00:27:43.0 -0500 Modify: 2002-04-08 21:13:57.0 -0400 Change: 2010-01-17 00:27:43.0 -0500 If I extract the tar-ball in /tmp, a local ext3 fs, the compile line above works correctly. diff /work/b/brs/presta1.2/util.c /tmp/presta1.2/util.c also appears clean. Any ideas what is happening here? Is there an issue with mmap2() and glusterfs? Many thanks in advance, -Brian -- Brian Smith Senior Systems Administrator IT Research Computing, University of South Florida 4202 E. Fowler Ave. ENB308 Office Phone: +1 813 974-1467 Organization URL: http://rc.usf.edu ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org
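A quick way to spot the condition Aaron describes - inode numbers that do not fit in 32 bits - on any GlusterFS mount (the mount path is illustrative):

  # List files whose inode number exceeds the 32-bit limit.
  find /mnt/glusterfs -printf '%i %p\n' | awk '$1 > 4294967295 { print }'

  # Or check a single suspect file.
  stat --format='%i %n' /mnt/glusterfs/presta1.2/util.c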
Re: [Gluster-users] Memory problems / leak in 2.0.6?
Thanks, Raghavendra. Roland, I would be very interested to know if you plan to move to a newer version - that might solve some of your problems. If you do not plan to upgrade, then I would also like to know if there are certain features in 2.0.6 which you feel do not allow you to upgrade. In the ideal case, we would like to support a smaller set of recent versions, hence the question :-). Regards, Tejas. - Original Message - From: Raghavendra G raghaven...@gluster.com To: Roland Rabben rol...@jotta.no Cc: gluster-users gluster-users@gluster.org Sent: Friday, January 1, 2010 8:41:32 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: [Gluster-users] Memory problems / leak in 2.0.6? Hi Roland, Does doing echo 3 > /proc/sys/vm/drop_caches bring down the memory usage? regards, On Fri, Jan 1, 2010 at 2:47 PM, Roland Rabben rol...@jotta.no wrote: Hi, I am using GlusterFS 2.0.6 and I am experiencing some memory issues on the clients. The clients will grow their memory usage to several GBs over one or two days. This causes my app to run out of memory and I need to restart my app and unmount the glusterfs volumes. I have tried adjusting the io-cache to max 512MB but this does not seem to have any effect. My clients use replicate and distribute. From my glusterfs.vol file: volume dfs type cluster/distribute option lookup-unhashed no option min-free-disk 5% subvolumes repl-000-001-01 repl-000-001-02 repl-000-001-03 repl-000-001-04 repl-002-003-01 repl-002-003-02 repl-002-003-03 repl-002-003-04 end-volume # Enable write-behind to decrease write latency volume wb type performance/write-behind option flush-behind off option cache-size 64MB subvolumes dfs end-volume volume cache type performance/io-cache option cache-size 512MB subvolumes wb end-volume Is this a memory leak or is there a way to limit the memory usage on the clients? I am running this on Ubuntu 9.04. Regards -- Roland Rabben Founder CEO Jotta AS Cell: +47 90 85 85 39 Phone: +47 21 04 29 00 Email: rol...@jotta.no ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users -- Raghavendra G ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
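A small sketch for telling the two cases apart - kernel page cache (which drop_caches reclaims) versus memory held by the glusterfs client process itself (a real leak); it assumes a Linux client with the standard procps tools:

  # Sample the resident and virtual size of the glusterfs client hourly.
  while true; do
      date
      ps -o pid,rss,vsz,cmd -C glusterfs
      sleep 3600
  done >> /var/log/glusterfs-mem.log

  # If this brings usage back down, the growth was cache, not a leak.
  sync
  echo 3 > /proc/sys/vm/drop_caches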
Re: [Gluster-users] volume sizes
Thanks, Raghvendra. Anthony, Its a lazy self-heal mechanism, if you will. If one wants it all done right away, an ls -alR will access each file and hence cause the rebuild of the whole glusterfs volume which _may_, like you mentioned, be spread across disk partitions, LVM/RAID luns or even server nodes. Even after all that, only the files impacted in the volume would need to be rebuilt - although there might be some difference in overheads for different sized and configured Glusterfs volumes. It might be interesting to check - we have not done numbers on this. Let me check with the person who is more familiar with this area of code than me and he may be able to suggest some ballpark numbers till we run some real numbers. Meanwhile, if you do some tests, please share the numbers with the community. Regards, Tejas. - Original Message - From: Raghavendra G raghaven...@gluster.com To: Anthony Goddard agodd...@mbl.edu Cc: Tejas N. Bhise te...@gluster.com, gluster-users gluster-users@gluster.org Sent: Wednesday, December 30, 2009 9:10:23 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: [Gluster-users] volume sizes Hi Anthony, On Wed, Dec 30, 2009 at 6:30 PM, Anthony Goddard agodd...@mbl.edu wrote: Hi Tejas, Thanks for the advice. I will be using RAID as well as gluster replication I think.. as we'll only need to sacrifice 1 drive per raid set to add a bit of extra redundancy. The rebuild happens at the first access of a file, does this mean that the entire brick/node is rebuilt upon an initial file access? No, only the file which is accessed is rebuilt. That is the reason we recursively access all the files using 'ls -laR' on mount point. I think this is what I've seen from using gluster previously. If this is the case, it would rebuild the entire volume which could span many raid volumes or even machines, is this correct? If this is the case, then the underlying disk wouldn't have any effect at all, but if it's spanned over multiple machines and it only needs to rebuild one machine (or multiple volumes on one machine) it only needs to rebuild one volume. I don't know if that made any sense.. haha.. but if it did, any insights into whether the size of the volumes (aside from RAID rebuilds) will have a positive effect on glusters rebuild operations? Cheers, Ant. On Dec 30, 2009, at 2:56 AM, Tejas N. Bhise wrote: Anthony, Gluster can take the smaller ( 6TB ) volumes and aggregate them into a large Gluster volume ( as seen from the clients ). So that takes care of managebility on the client side of things. On the server side, once you make those smaller 6 TB volumes, you will depend on RAID to rebuild the disk behind it, so its good to have a smaller partition. Since you are using RAID and not Gluster replication, it might just make sense to have smaller RAID partitions. If instead you were using Gluster replication and resulting recovery, it would happen at first access of the file and the size of the Gluster volume or the backend native FS volume or the RAID ( or raw ) partition behind it would not be much of a consideration. Regards, Tejas. - Original Message - From: Anthony Goddard agodd...@mbl.edu To: gluster-users@gluster.org Sent: Wednesday, December 30, 2009 3:24:35 AM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: [Gluster-users] volume sizes First post! We're looking at setting up 6x 24 bay storage servers (36TB of JBOD storage per node) and running glusterFS over this cluster. 
We have RAID cards on these boxes and are trying to decide what the best size of each volume should be, for example if we present the OS's (and gluster) with six 36TB volumes, I imagine rebuilding one node would take a long time, and there may be other performance implications of this. On the other hand, if we present gluster / the OS's with 6x 6TB volumes on each node, we might have more trouble in managing a larger number of volumes. My gut tells me a lot of small (if you can call 6TB small) volumes will be lower risk and offer faster rebuilds from a failure, though I don't know what the pros and cons of these two approaches might be. Any advice would be much appreciated! Cheers, Anthony ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users -- Raghavendra G ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] gluster 3.0 read hangs
Hi Nick, Thank you for using Gluster and sending us such detailed description of the problem you are seeing. We will try a run with exactly the same switches and config as you mention and see if we can reproduce this inhouse to make debugging easier. Regards, Tejas. - Original Message - From: Nick Birkett n...@streamline-computing.com To: gluster-users@gluster.org Sent: Wednesday, December 23, 2009 3:04:43 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: [Gluster-users] gluster 3.0 read hangs I ran some benchmarks last week using 2.0.8. Single server with 8 Intel e1000e bonded mode=balance-alb All worked fine and I got some good results using 8 clients. All Gigabit. The benchmarks did 2 passes of IOZONE in network mode using 1-8 threads per client and using 1 - 8 clients. Each client used 32Gbyte files. All jobs completed successfully. This takes about 32 hours to run through all cases. Yesterday I updated to 3.0.0 (server and clients) and re-configured the server and client vol files using glusterfs-volgen (renamed some of the vol names). RedHat EL5 binary packages from Glusterfs site installed glusterfs-server-3.0.0-1.x86_64 glusterfs-common-3.0.0-1.x86_64 glusterfs-client-3.0.0-1.x86_64 All works mainly ok, except every so often the IOZONE job just stops. The network IO drops to zero. This is always happens during either a read or re-read test. It happes just as the IOZONE read test starts. It doesnt happen every time and it may run for several hours without incident. This has happened 6 times on different test cases (thread/clients). Anyone else noticed this ? Perhaps I have done something wrong ? vol files attached - I know I dont need to distribute 1 remote vol - part of larger test with multiple vols. Attached sample outputs. 4 clients 4 files per client ran fine. 4 clients 8 files per client hung at re-read on 2nd pass of IOZONE. All jobs with 5 clients and 8 clients ran to completion. Thanks, Nick This e-mail message may contain confidential and/or privileged information. If you are not an addressee or otherwise authorized to receive this message, you should not use, copy, disclose or take any action based on this e-mail or any information contained in the message. If you have received this material in error, please advise the sender immediately by reply e-mail and delete this message. Thank you. 
Streamline Computing is a trading division of Concurrent Thinking Limited: Registered in England and Wales No: 03913912 Registered Address: The Innovation Centre, Warwick Technology Park, Gallows Hill, Warwick, CV34 6UW, United Kingdom volume brick00.server-e type protocol/client option transport-type tcp option transport.socket.nodelay on option transport.remote-port 6996 option remote-host 192.168.100.200 # can be IP or hostname option remote-subvolume brick00 end-volume volume distribute type cluster/distribute subvolumes brick00.server-e end-volume volume writebehind type performance/write-behind option cache-size 4MB subvolumes distribute end-volume volume readahead type performance/read-ahead option page-count 4 subvolumes writebehind end-volume volume iocache type performance/io-cache option cache-size 1GB option cache-timeout 1 subvolumes readahead end-volume volume quickread type performance/quick-read option cache-timeout 1 option max-file-size 64kB subvolumes iocache end-volume volume statprefetch type performance/stat-prefetch subvolumes quickread end-volume #glusterfsd_keep=0 volume posix00 type storage/posix option directory /data/data00 end-volume volume locks00 type features/locks subvolumes posix00 end-volume volume brick00 type performance/io-threads option thread-count 8 subvolumes locks00 end-volume volume server type protocol/server option transport-type tcp option transport.socket.listen-port 6996 option transport.socket.nodelay on option auth.addr.brick00.allow * subvolumes brick00 end-volume == Cluster name : Delldemo Arch : x86_64 SGE job submitted: Tue Dec 22 22:21:38 GMT 2009 Number of CPUS 8 Running Parallel IOZONE on ral03 Creating files in /data2/sccomp NTHREADS=4 Total data size = 48196 MBytes Running loop 1 of 2 Iozone: Performance Test of File I/O Version $Revision: 3.326 $ Compiled for 64 bit mode. Build: linux Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins Al Slater, Scott Rhine, Mike Wisner, Ken Goss Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR, Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner, Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Erik Habbinga, Kris
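For anyone trying to reproduce the hang in-house, a representative IOZONE throughput run in network (cluster) mode looks roughly like this; the client list and the sizes are illustrative rather than Nick's exact switches:

  # clients.txt: one line per stream, "hostname  working-dir  path-to-iozone",
  # repeated once per thread on each client, e.g.
  #   ral03  /data2/sccomp  /usr/local/bin/iozone

  iozone -+m clients.txt -t 4 -s 32g -r 1024k -i 0 -i 1 -c -e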
Re: [Gluster-users] Adding / removing bricks
Hi Stas, Good to have you back !! .. Dynamic volumes and on the fly adding and removing of servers is a feature planned for 3.1 which we have tentatively planned for 2Q 2010. Our codebase is all open source, including the platform and the FS in the platform is the same as the FS-only product. I would request you try out the 3.0 product and give us valuable feedback. Regards, Tejas. - Original Message - From: Stas Oskin stas.os...@gmail.com To: gluster-users gluster-users@gluster.org Sent: Wednesday, December 23, 2009 8:19:47 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: [Gluster-users] Adding / removing bricks Hi. I looked through the email archives, and this feature is mentioned in the GlusterFS platform - any chance it will be present in the open-source version of GlusterFS 3? Regards. On Tue, Dec 22, 2009 at 4:29 PM, Stas Oskin stas.os...@gmail.com wrote: Hi. I'm looking to evaluate the GlusterFS platform once again, hopefully after many issues with 2 and 2.1 (non-working replication) have been solved. Just a question - does GlusterFS finally supports adding / removing bricks on fly? Or still global re-mount is required? Thanks. ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] How files are synchronized
Hello Yan, Welcome to the Gluster community. I would suggest trying the 3.0 product for a better replication experience. There is code in there that makes it faster. Also, rebuilding of a replica is an exceptional condition and does not happen all the time. When it does happen, it can slow access temporarily. This can be considered as something akin to rebuilding a disk, which also slows down access temporarily. Please try the 3.0 product with your use cases and let us know about your experience. We would like to work with you to make sure you can use this product for your use case as we think it should work well for you once the initial problems are sorted out. Regards, Tejas - Original Message - From: Yan Fu yanfu2...@gmail.com To: gluster-users@gluster.org Sent: Wednesday, December 23, 2009 6:10:44 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: [Gluster-users] How files are synchronized Hi, I am a new GlusterFS user. I set up a replicated folder on two computers. When both are running, I can see that the folders on both computers are synchronized with each other as I write files to it. However, when I shut down one: 1) File operations, like the ls command, take time to finish on the running computer at first. But after this the delay disappears. 2) I continue to update some existing files on the running computer. Then I bring back the other one. For small files, the changes are reflected on the restarted computer very quickly. But for big files, it gets very slow to get the changes replicated. If I also happen to update the same file on the always running computer, both commands on the big file get stuck for a long time. Both will be a problem if I use such a system as a backend to build some high-volume services because there are always many requests for the same file at the same time. Could somebody help me here? Is it possible that I have some wrong configurations? They are generated with glusterfs-volgen -name repstore1 --raid 1 jaunty2:/export/sdb1 jaunty3:/export/sdb1. I use GlusterFS version 2.0.8. Thanks, - Yan ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Recommended GlusterFS configuration for 6 node cluster
Hi Phil, It's great to know that you are using Gluster. It would be easy to make suggestions on the points you bring up if there is more information on what use your want to put the system to. Regards, Tejas. - Original Message - From: phil cryer p...@cryer.us To: gluster-users@gluster.org Sent: Thursday, December 17, 2009 11:48:54 AM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: [Gluster-users] Recommended GlusterFS configuration for 6 node cluster We're setting up 6 servers, each with 24 x 1.5TB drives, the systems will run Debian testing and Gluster 3.x. The SATA RAID card offers RAID5 and RAID6, we're wondering what the optimum setup would be for this configuration. Do we RAID5 the disks, and have GlusterFS use them that way, or do we keep them all 'raw' and have GlusterFS handle the replication (though not 2x as we would have with the RAID options)? Obviously a lot of ways to do this, just wondering what GlusterFS devs and other experienced users would recommend. Thanks P -- http://philcryer.com ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] Gluster-users Digest, Vol 20, Issue 22
Thanks, Larry, for the comprehensive information. Phil, I hope that answers a lot of your questions. Feel free to ask more, we have a great community here. Regards, Tejas. - Original Message - From: Larry Bates larry.ba...@vitalesafe.com To: gluster-users@gluster.org, p...@cryer.us Sent: Thursday, December 17, 2009 9:47:30 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: [Gluster-users] Gluster-users Digest, Vol 20, Issue 22 Phi.l, I think the real question you need to ask has to do with why we are using GlusterFS at all and what happens when something fails. Normally GlusterFS is used to provide scalability, redundancy/recovery, and performance. For many applications performance will be the least of the worries so we concentrate on scalability and redundancy/recovery. Scalability can be achieved no matter which way you configure your servers. Using distribute translator (DHT) you can unify all the servers into a single virtual storage space. The problem comes when you look at what happens when you have a machine/drive failures and need the redundancy/recovery capabilities of GlusterFS. By putting 36Tb of storage on a single server and exposing it as a single volume (using either hardware or software RAID), you will have to replicate that to a replacement server after a failure. Replicating 36Tb will take a lot of time and CPU cycles. If you keep things simple (JBOD) and use AFR to replicate drives between servers and use DHT to unify everything together, now you only have to move 1.5Tb/2Tb when a drive fails. You will also note that you get to use 100% of your disk storage this way instead of wasting 1 drive per array with RAID5 or two drives with RAID6. Normally with RAID5/6 it is also imperative that you have a hot spare per array, which means you waste an additional driver per array. To make RAID5/6 work with no single point of failure you have to do something like RAID50/60 across two controllers which gets expensive and much more difficult to manage and to grow. Implementing GlusterFS using more modest hardware makes all those issues go away. Just use GlusterFS to provide the RAID-like capabilities (via AFR and DHT). Personally I doubt that I would set up my storage the way you describe. I probably would (and have) set it up with more smaller servers. Something like three times as many 2U servers with 8x2Tb drives each (or even 6 times as many 1U servers with 4x2Tb drives each) and forget the expensive RAID SATA controllers, they aren't necessary and are just a single point of failure that you can eliminate. In addition you will enjoy significant performance improvements because you have: 1) Many parallel paths to storage (36x1U or 18x2U vs 6x5U servers). Gigabit Ethernet is fast, but still will limit bandwidth to a single machine. 2) Write performance on RAID5/6 is never going to be as fast as JBOD. 3) You should have much more memory caching available (36x8Gb = 256Gb memory or 18x8Gb memory = 128Gb vs maybe 6x16Gb = 96Gb) 4) Management of the storage is done in one place..GlusterFS. No messy RAID controller setups to document/remember. 5) You can expand in the future in a much more granular and controlled fashion. Add 2 machines (1 for replication) and you get 8Tb (using 2Tb drives) of storage. When you want to replace a machine, just set up new one, fail the old one, and let GlusterFS build the new one for you (AFR will do the heavy lifting). CPUs will get faster, hard drives will get faster and bigger in the future, so make it easy to upgrade. 
A small number of BIG machines makes it a lot harder to do upgrades as new hardware becomes available. 6) Machine failures (motherboard, power supply, etc.) will effect much less of your storage network. Having a spare 1U machine around as a hot spare doesn't cost much (maybe $1200). Having a spare 5U monster around does (probably close to $6000). IMHO 36 x 1U or 18 x 2U servers shouldn't cost any more (and maybe less) than the big boxes you are looking to buy. They are commodity items. If you go the 1U route you don't need anything but a machine, with memory and 4 hard drives (all server motherboards come with at least 4 SATA ports). By using 2Tb drives, I think you would find that the cost would be actually less. By NOT using hardware RAID you can also NOT use RAID-class hard drives which cost about $100 each more than non-RAID hard drives. Just that change alone could save you 6 x 24 = 144 x $100 = $14,400! JBOD just doesn't need RAID-class hard drives because you don't need the sophisticated firmware that the RAID-class hard drives provide. You still will want quality hard drives, but failures will have such a low impact that it is much less of a problem. By using more smaller machines you also eliminate the need for redundant power supplies (which would be a requirement in your large boxes because it would be a single point of
Re: [Gluster-users] VMware ESX
Hello Richard, Congratulations on your successful setup. Your mail text seems to have been cut off after "We have previously setup". Can you please resend so we can understand your question. Regards, Tejas. - Original Message - From: Richard Charnley richardcharn...@hotmail.co.uk To: gluster-users@gluster.org Sent: Monday, December 14, 2009 3:04:19 AM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: [Gluster-users] VMware ESX Hi, We run ESX servers in a datacentre and I have managed to successfully set up a mirrored volume which works really well (took 1 node down and files were still available). My question is: how can I run an ESX guest on gluster? We have previously setup ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] client-side cpu usage, performance issue
Hi John, Thank you for sharing information about you setup. Is the application you are using, something that we can easily setup and use for generating more diagnostics inhouse at Gluster ? Regards, Tejas. - Original Message - From: John Madden jmad...@ivytech.edu To: Anush Shetty an...@gluster.com Cc: gluster-users@gluster.org Sent: Monday, December 7, 2009 8:20:52 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: [Gluster-users] client-side cpu usage, performance issue For reading small files, you could try using Quick-read translator. http://gluster.com/community/documentation/index.php/Translators/performance/quick-read This was one of the caching options I explored but it didn't seem to help. Also, keep in mind that it isn't just reading but lots of writes too. Also we would like to know the GlusterFS version no used for this setup. Apologies. This is on 2.0.8, pre-built RPMs off the site. John -- John Madden Sr UNIX Systems Engineer Ivy Tech Community College of Indiana jmad...@ivytech.edu ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
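For the small-file, write-heavy PHP session pattern John describes, the client-side performance stack usually discussed for 2.0.x looks roughly like the fragment below; the sizes and the sessions subvolume name are illustrative, and quick-read only helps reads of files below its max-file-size:

  volume wb
    type performance/write-behind
    option cache-size 4MB
    option flush-behind on     # batch small writes; trades a little safety for latency
    subvolumes sessions        # illustrative name of the underlying volume
  end-volume

  volume qr
    type performance/quick-read
    option cache-timeout 1
    option max-file-size 64kB
    subvolumes wb
  end-volume

  volume sp
    type performance/stat-prefetch
    subvolumes qr
  end-volume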
Re: [Gluster-users] client-side cpu usage, performance issue
Thanks John. I just looked up the Horde website. Will download and set it up. Meanwhile, any more details, besides the install, that you think would help set this up in the least amount of time would be helpful, like how to crank up the incoming user rate etc. Such help will enable us to focus more on the file system rather than on setting up the application load. Thanks for helping out. Regards, Tejas. - Original Message - From: John Madden jmad...@ivytech.edu To: Tejas N. Bhise te...@gluster.com Cc: gluster-users@gluster.org, Anush Shetty an...@gluster.com Sent: Monday, December 7, 2009 11:17:15 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: [Gluster-users] client-side cpu usage, performance issue Thank you for sharing information about your setup. Is the application you are using something that we can easily set up and use for generating more diagnostics in-house at Gluster ? I suppose. It's Horde, a PHP-based groupware suite. I've got two Apache nodes load-balanced using glusterfs as their php sessions store plus shared data (temp files, misc app data), with the sessions store being the hard-hit service. When I crank up the incoming user rate (causing new sessions to be created and then read), things get wonky as explained before. John -- John Madden Sr UNIX Systems Engineer Ivy Tech Community College of Indiana jmad...@ivytech.edu ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users