Re: [Gluster-users] Is it possible to install Gluster management console using manual install process?
On Wed, Nov 9, 2011 at 10:05 AM, Jérémie Tarot silopo...@gmail.com wrote:
> Hi,
>
> 2011/11/8 Bala.FA b...@gluster.com:
>> Hi Xybrek, Gluster Management Console is not available for download now. It will be released soon.
>
> Is it the same management console that was available on the appliance? Is there somewhere to get an idea of the features of the MC? Screenshots? Last, will the MC be specific to RH/Fedora/CentOS, or will it be available for other distros?
>
> Thanks
> Jé

I have the same question, but I want to know if it will be available for Debian (or if I can build it from source). I've had the cluster running fine for a long time, but having a web-based console to check on the status of all of the disks would be killer.

Thanks
P
[Gluster-users] cannot access /mnt/glusterfs: Stale NFS file handle
I've mounted my glusterfs share as I always do:

mount -t glusterfs `hostname`:/bhl-volume /mnt/glusterfs

and I can see it in df:

# df -h | tail -n1
clustr-01:/bhl-volume  90T  51T  39T  57% /mnt/glusterfs

but I can't change into it, or access any of the files in it:

# ls -al /mnt/glusterfs
ls: cannot access /mnt/glusterfs: Stale NFS file handle

Any idea what could be causing this? It was working fine last week (in fact, I haven't remounted it in months and have had clients accessing it constantly), but we did reboot all 6 of the nodes over the weekend.

Details and version numbers:

# uname -a; glusterfs -V
Linux clustr-01 2.6.32-5-amd64 #1 SMP Tue Jun 14 09:42:28 UTC 2011 x86_64 GNU/Linux
glusterfs 3.1.2 built on Jan 16 2011 18:14:56
Repository revision: v3.1.1-64-gf2a067c
Copyright (c) 2006-2010 Gluster Inc. http://www.gluster.com
GlusterFS comes with ABSOLUTELY NO WARRANTY. You may redistribute copies of GlusterFS under the terms of the GNU Affero General Public License.

P
--
http://philcryer.com
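For anyone else who hits this: a stale handle on the FUSE mount can usually be cleared by unmounting and remounting the client. A sketch using standard tools; the volume and mount point names are the ones from this post:

# lazily unmount the dead mount point, then mount it again
umount -l /mnt/glusterfs
mount -t glusterfs `hostname`:/bhl-volume /mnt/glusterfs
# verify it is readable again
ls /mnt/glusterfs > /dev/null && echo "mount is readable again"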
Re: [Gluster-users] anyone else playing with gcollect yet?
This looks interesting, but is there any tie-in with collectd (http://collectd.org/)? I'm currently using Ganglia, but only on the cluster; I want to run collectd and have it consolidate data to the central monitoring server. Wondering if gcollect could do this too... could call it gcollectd then :)

Thanks
P

On Wed, Jul 27, 2011 at 9:40 AM, greg_sw...@aotx.uscourts.gov wrote:
> gluster-users-boun...@gluster.org wrote on 07/27/2011 07:40:13 AM:
>
> I'm messing with it and had to do a few patches to get rid of warnings/errors on my system (it threw lots of warnings because of my configured options on volumes, and there was a traceback due to a typo), but now it just returns empty with a return code of 0. So, for the sake of discussion, reasons it is returning blank:
>
> 1: the iterator only goes through number of bricks - 1 when evaluating the bricks, many times missing the local brick, because I happen to be testing on the last node in the list. Patched here: https://github.com/gregswift/Gluster/commit/a16567b5149aea2ddbec1e61d6b9a8e8e3b10e76
> 2: the hostname check doesn't work on my system because I don't use hostnames ;) yes, yes... I usually love DNS... please don't fight me on this one. Working on a patch.
>
> -greg

--
http://philcryer.com
Re: [Gluster-users] rsync for WAN replication (active/active)
On Thu, Mar 24, 2011 at 12:31 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
> Thanks for pointing that out. I think rsync also has options to sync based on time, md5 hash, and other attributes, if I am not wrong. If we can preserve time and only sync the most recent file, then I think we should be OK? What do you think? I can't think of any other option, other than looking at some other DFS systems. We definitely don't want to add the remote site as a brick, because of the latency that we have.
>
> On Thu, Mar 24, 2011 at 5:31 AM, Jonathan Barber jonathan.bar...@gmail.com wrote:
>> On 17 March 2011 17:08, Mohit Anchlia mohitanch...@gmail.com wrote:
>>> Thanks! I was going to trigger it through cron, say every 10 minutes, if rsync is not currently running. Regarding point 3), I thought of it also! I think this problem cannot be solved even when using bricks. If someone is editing 2 files at the same time, only one will win (always). The only way we can avoid this is through the application making sure that a customer accessing the file can't go to 2 sites simultaneously. But I agree this scenario is the most complicated of all.
>> This is a different issue; with gluster, locking solves it (obviously the application has to know how to handle locks). Also, and I don't know if gluster supports this, some systems support byte-range file locks, so both sites can write to the same file at the same time. The scenario I was trying to describe was a race condition between the rsync processes clobbering your files. I don't think this race condition is removed by using the --temp-dir option (although it probably decreases the window by a large amount). But if you don't run the sync process whilst the remote site is sync'ing to you, then it's not a problem.
>>> I was planning to use the --temp-dir option (not tested it). Also, I think rsync first copies the file as a temporary file and then moves it.
>> I just thought of another problem, which is that in the worst case you might require twice the amount of storage to sync your data (1x for the old data, 1x for the new data).
>>> In our case rsync will not handle deletes. If we want to delete any files, it will be done manually.

Nice thread; I've heard this come up a few times in regards to Gluster, and it relates to a project I'm working on. Basically I use a server/client setup using rsync, with inotify handling the kick-off once changes are seen. One box acts as the server and all the others are clients. This way, when clients have new or changed files, those changes are sync'd to the server; when files are removed on a client, those updates are only sync'd to the server. A separate cron job run on the clients syncs with the server, to learn about missing files it needs to delete from its own store. It's definitely a work in progress, but the more people I talk to, the more I think this is needed. I will have it running on my gluster cluster soon, to sync it with another (non-gluster) cluster in another country. If you're interested, or you have a better idea :), the project is hosted here: https://github.com/philcryer/lipsync

Thanks
P
--
http://philcryer.com
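A minimal sketch of the one-way push discussed above (rsync over ssh assumed; the host and path names are invented). --temp-dir keeps partially transferred files out of the live tree, and --update skips files that are already newer on the receiver:

# push local changes to the remote site; never delete remotely
rsync -av --update --temp-dir=/data/.rsync-tmp \
    /mnt/glusterfs/www/ remote-site:/mnt/glusterfs/www/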
Re: [Gluster-users] Debian, 3.1.1, duplicate files
On Thu, Jan 13, 2011 at 3:53 PM, Jacob Shucart ja...@gluster.com wrote:
> Phil,
> This sounds to me like an identified issue affecting Gluster directories created under older versions, related to extended attributes that were set on the directories. I believe this issue is supposed to be fixed in 3.1.2. I don't know how large your dataset is, but a way to fix it would be to:
>
> 1. Delete the Gluster volume.
> 2. On the back-end directories on your nodes, scrub the offending extended attribute with the command:
>    find /back/end/dir -exec setfattr -x trusted.gfid {} \;
> 3. Create the Gluster volume again.
> 4. Mount the volume somewhere as a GlusterFS (mount -t glusterfs) and run:
>    find /mnt/gluster -print0 | xargs --null stat
> 5. Enjoy.
>
> Jacob

Thanks for your reply. To solve this I installed 3.1.2, then re-formatted all of my drives (bricks). It might have been overkill, but I wanted to start completely fresh with 3.1.2. So far we've had no issues with the setup, and I'll be careful from now on when I update versions; hopefully there will be a path to avoid gotchas like this!

Thanks
P

> Please let me know if that helps. Thank you.
> -Jacob
>
> -----Original Message-----
> From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of phil cryer
> Sent: Thursday, January 13, 2011 9:07 AM
> To: gluster-users@gluster.org
> Subject: Re: [Gluster-users] Debian, 3.1.1, duplicate files
>
> So, I haven't heard anything back, so I just wanted to update this in case anyone else comes across it. This was an old store that we created in 3.0.4 that kept getting duplicate files. Basically we ran an update script that would use wget to download any files that were not present on the local box but were on the remote. If it hit a file we already had, it should either 1) ignore it and not download it, since it would see that we already have it, 2) overwrite that file (clobber) with a new version of that file, or 3) rewrite the file as file.1 so as not to mess with the original one (no-clobber) - but in fact it did none of these - instead we ended up with the bizarre feature of having multiple/identical files in the same directory. Meanwhile we're also using far more space than we should (~70TB instead of ~40TB or so), thanks to having directories like this:
>
> # ls -al /mnt/glusterfs//www/t/tijdschriftvoore1951nede/
> total 536436
> drwxr-xr-x    2 www-data www-data  294912 Jan 13 10:05 .
> drwx------ 1016 www-data www-data 3846144 Dec 12 11:10 ..
> -rwxr-xr-x 1 www-data www-data   1151282 Jul 12  2010 tijdschriftvoore1951nede_djvu.txt
> -rwxr-xr-x 1 www-data www-data   1151282 Jul 12  2010 tijdschriftvoore1951nede_djvu.txt
> -rwxr-xr-x 1 www-data www-data  12078834 Jul 12  2010 tijdschriftvoore1951nede_djvu.xml
> -rwxr-xr-x 1 www-data www-data  12078834 Jul 12  2010 tijdschriftvoore1951nede_djvu.xml
> -rwxr-xr-x 1 www-data www-data    271733 Jul 12  2010 tijdschriftvoore1951nede.gif
> -rwxr-xr-x 1 www-data www-data    271733 Jul 12  2010 tijdschriftvoore1951nede.gif
> -rwxr-xr-x 1 www-data www-data 257779301 Jul 12  2010 tijdschriftvoore1951nede_jp2.zip
> -rwxr-xr-x 1 www-data www-data 257779301 Jul 12  2010 tijdschriftvoore1951nede_jp2.zip
> -rwxr-xr-x 1 www-data www-data      2278 Jul 12  2010 tijdschriftvoore1951nede_marc.xml
> -rwxr-xr-x 1 www-data www-data      2278 Jul 12  2010 tijdschriftvoore1951nede_marc.xml
> -rwxr-xr-x 1 www-data www-data       720 Jul 12  2010 tijdschriftvoore1951nede_meta.mrc
> -rwxr-xr-x 1 www-data www-data       720 Jul 12  2010 tijdschriftvoore1951nede_meta.mrc
> -rwxr-xr-x 1 www-data www-data    546411 Jul 12  2010 tijdschriftvoore1951nede_names.xml
> -rwxr-xr-x 1 www-data www-data    546411 Jul 12  2010 tijdschriftvoore1951nede_names.xml
> -rwxr-xr-x 1 www-data www-data       256 Jul 12  2010 tijdschriftvoore1951nede_names.xml_meta.txt
> -rwxr-xr-x 1 www-data www-data       256 Jul 12  2010 tijdschriftvoore1951nede_names.xml_meta.txt
> -rwxr-xr-x 1 www-data www-data    257556 Jul 13  2010 tijdschriftvoore1951nede_scandata.xml
> -rwxr-xr-x 1 www-data www-data    257556 Jul 13  2010 tijdschriftvoore1951nede_scandata.xml
>
> Ouch, right? So I installed 3.1.1 - that went well; I got it onto all the drives and servers we had before, for a total capacity of 96TB again. Good, all seemed to be working. I mounted the old directories, saw the same issue with the duplicate files, and let it sit overnight to see if it would notice this and try to fix things. Then we're seeing gluster logs saying things like:
>
> == glusterfs/mnt-glusterfs.log ==
> [2011-01-13 11:46:23.2762] I [afr-common.c:662:afr_lookup_done] bhl-volume-replicate-55: entries are missing in lookup of /www/t/tijdschriftvoore1951nede.
> [2011-01-13 11:46:23.2817] I [afr-common.c:716:afr_lookup_done] bhl-volume-replicate-55: background meta-data data entry self-heal triggered. path
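Jacob's recovery steps above, gathered into one hedged sketch (the volume name, brick path, and hosts are taken from this thread but are effectively placeholders; the delete/create cycle destroys the volume configuration, so try it on scratch data first):

# 1. stop and delete the volume definition (brick data stays on disk)
gluster volume stop bhl-volume
gluster volume delete bhl-volume
# 2. on every node, scrub the stale gfid attribute from each brick
find /mnt/data01 -exec setfattr -x trusted.gfid {} \;
# 3. recreate the volume (full brick list elided here)
# gluster volume create bhl-volume replica 2 transport tcp clustr-01:/mnt/data01 clustr-02:/mnt/data01 ...
# 4. mount it and stat the whole tree so fresh gfids get assigned
mount -t glusterfs clustr-01:/bhl-volume /mnt/glusterfs
find /mnt/glusterfs -print0 | xargs --null stat > /dev/null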
[Gluster-users] Running Storage Platform to admin normal GlusterFS instances?
I'm running GlusterFS 3.1.2 on some nodes, and they're all working. Can I now run the Gluster Storage Platform on another server and administrate those existing nodes with it, or do you have to have SP on all the servers, admin and nodes alike?

Thanks
P
--
http://philcryer.com
Re: [Gluster-users] 3.1.2 Debian - client_rpc_notify failed to get the port number for remote subvolume
On Fri, Feb 4, 2011 at 12:33 PM, Anand Avati anand.av...@gmail.com wrote:
> It is very likely the brick process is failing to start. Please look at the brick log on that server (in /var/log/glusterfs/bricks/*).
> Avati

Thanks. So, if I'm looking at it right, 'bhl-volume-client-98' is really Brick98: clustr-02:/mnt/data17 - I'm figuring that from this:

[2011-02-04 13:09:28.407300] I [client.c:1590:client_rpc_notify] bhl-volume-client-98: disconnected

However, if I do a gluster volume info, I see that it's listed:

# gluster volume info | grep 98
Brick98: clustr-02:/mnt/data17

But on that server I don't see any issues with that brick starting:

# head mnt-data17.log -n50
[2011-02-03 23:29:24.235648] W [graph.c:274:gf_add_cmdline_options] bhl-volume-server: adding option 'listen-port' for volume 'bhl-volume-server' with value '24025'
[2011-02-03 23:29:24.236017] W [rpc-transport.c:566:validate_volume_options] tcp.bhl-volume-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
Given volfile:
+------------------------------+
  1: volume bhl-volume-posix
  2:     type storage/posix
  3:     option directory /mnt/data17
  4: end-volume
  5:
  6: volume bhl-volume-access-control
  7:     type features/access-control
  8:     subvolumes bhl-volume-posix
  9: end-volume
 10:
 11: volume bhl-volume-locks
 12:     type features/locks
 13:     subvolumes bhl-volume-access-control
 14: end-volume
 15:
 16: volume bhl-volume-io-threads
 17:     type performance/io-threads
 18:     subvolumes bhl-volume-locks
 19: end-volume
 20:
 21: volume /mnt/data17
 22:     type debug/io-stats
 23:     subvolumes bhl-volume-io-threads
 24: end-volume
 25:
 26: volume bhl-volume-server
 27:     type protocol/server
 28:     option transport-type tcp
 29:     option auth.addr./mnt/data17.allow *
 30:     subvolumes /mnt/data17
 31: end-volume
+------------------------------+
[2011-02-03 23:29:28.575630] I [server-handshake.c:535:server_setvolume] bhl-volume-server: accepted client from 128.128.164.219:724
[2011-02-03 23:29:28.583169] I [server-handshake.c:535:server_setvolume] bhl-volume-server: accepted client from 127.0.1.1:985
[2011-02-03 23:29:28.603357] I [server-handshake.c:535:server_setvolume] bhl-volume-server: accepted client from 128.128.164.218:726
[2011-02-03 23:29:28.605650] I [server-handshake.c:535:server_setvolume] bhl-volume-server: accepted client from 128.128.164.217:725
[2011-02-03 23:29:28.608033] I [server-handshake.c:535:server_setvolume] bhl-volume-server: accepted client from 128.128.164.215:725
[2011-02-03 23:29:31.161985] I [server-handshake.c:535:server_setvolume] bhl-volume-server: accepted client from 128.128.164.74:697
[2011-02-04 00:40:11.600314] I [server-handshake.c:535:server_setvolume] bhl-volume-server: accepted client from 128.128.164.74:805

Plus, looking at the tail of this log, it's still working; these are the latest messages (from 4 seconds before), as I'm moving some things on the cluster:

[2011-02-04 23:13:35.53685] W [server-resolve.c:565:server_resolve] bhl-volume-server: pure path resolution for /www/d/dasobstdertropen00schrrich (INODELK)
[2011-02-04 23:13:35.57107] W [server-resolve.c:565:server_resolve] bhl-volume-server: pure path resolution for /www/d/dasobstdertropen00schrrich (SETXATTR)
[2011-02-04 23:13:35.59699] W [server-resolve.c:565:server_resolve] bhl-volume-server: pure path resolution for /www/d/dasobstdertropen00schrrich (INODELK)

Thanks!
P

On Fri, Feb 4, 2011 at 10:19 AM, phil cryer p...@cryer.us wrote:
> I have glusterfs 3.1.2 running on Debian. I'm able to start the volume and now mount it via mount -t glusterfs, and I can see everything.
> I am still seeing the following error in /var/log/glusterfs/nfs.log:
>
> [2011-02-04 13:09:16.404851] E [client-handshake.c:1079:client_query_portmap_cbk] bhl-volume-client-98: failed to get the port number for remote subvolume
> [2011-02-04 13:09:16.404909] I [client.c:1590:client_rpc_notify] bhl-volume-client-98: disconnected
> [2011-02-04 13:09:20.405843] E [client-handshake.c:1079:client_query_portmap_cbk] bhl-volume-client-98: failed to get the port number for remote subvolume
> [2011-02-04 13:09:20.405938] I [client.c:1590:client_rpc_notify] bhl-volume-client-98: disconnected
> [2011-02-04 13:09:24.406634] E [client-handshake.c:1079:client_query_portmap_cbk] bhl-volume-client-98: failed to get the port number for remote subvolume
> [2011-02-04 13:09:24.406711] I [client.c:1590:client_rpc_notify] bhl-volume-client-98: disconnected
> [2011-02-04 13:09:28.407249] E [client-handshake.c:1079:client_query_portmap_cbk] bhl-volume-client-98: failed to get the port number for remote subvolume
> [2011-02-04 13:09:28.407300] I [client.c:1590:client_rpc_notify] bhl-volume-client-98: disconnected
>
> However, if I do a gluster volume info I see that it's listed:
> # gluster volume
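If the brick process is the suspect, a quick hedged check on the server itself (standard Linux tools; 24025 is the listen-port shown in the log above):

# is a glusterfsd process serving this brick?
ps ax | grep '[g]lusterfsd' | grep data17
# is anything listening on the advertised brick port?
netstat -tlnp | grep 24025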
Re: [Gluster-users] df causes hang
This wasn't my issue, but I'm still having the issue. Today I purged glusterfs 3.1.1 and installed 3.1.2 fresh from the deb. I recreated my volume, started it, and everything was going fine; I mounted the share, then ran df -h to see it, and now every few seconds my logs post this:

== /var/log/glusterfs/nfs.log ==
[2011-02-03 15:55:57.145626] E [client-handshake.c:1079:client_query_portmap_cbk] bhl-volume-client-98: failed to get the port number for remote subvolume
[2011-02-03 15:55:57.145694] I [client.c:1590:client_rpc_notify] bhl-volume-client-98: disconnected

== /var/log/glusterfs/mnt-glusterfs.log ==
[2011-02-03 15:55:57.605802] E [common-utils.c:124:gf_resolve_ip6] resolver: getaddrinfo failed (Name or service not known)
[2011-02-03 15:55:57.605834] E [name.c:251:af_inet_client_get_remote_sockaddr] glusterfs: DNS resolution failed on host /etc/glusterfs/glusterfs.vol

over and over. Any clues as to how I can fix this? This one issue has made our entire 100TB store unusable. And again, gluster volume info shows all the bricks are OK, including 98:

gluster volume info
Volume Name: bhl-volume
Type: Distributed-Replicate
Status: Started
Number of Bricks: 72 x 2 = 144
Transport-type: tcp
Bricks:
[...]
Brick92: clustr-02:/mnt/data16
Brick93: clustr-03:/mnt/data16
Brick94: clustr-04:/mnt/data16
Brick95: clustr-05:/mnt/data16
Brick96: clustr-06:/mnt/data16
Brick97: clustr-01:/mnt/data17
Brick98: clustr-02:/mnt/data17
Brick99: clustr-03:/mnt/data17
Brick100: clustr-04:/mnt/data17
Brick101: clustr-05:/mnt/data17
Brick102: clustr-06:/mnt/data17
Brick103: clustr-01:/mnt/data18
Brick104: clustr-02:/mnt/data18
Brick105: clustr-03:/mnt/data18
[...]

P

On Mon, Jan 31, 2011 at 4:26 PM, Anand Avati anand.av...@gmail.com wrote:
> Can you post your server logs? What happens if you run 'df -k' on your backend export filesystems?
> Thanks
> Avati
>
> On Mon, Jan 17, 2011 at 5:27 AM, Joe Warren-Meeks j...@encoretickets.co.uk wrote:
>> (sorry about top-posting.)
>> Just changing the timeout would only mask the problem. The real issue is that running 'df' on either node causes a hang. All other operations seem fine; files can be created and deleted as normal, with the results showing up on both. I'd like to work out why it's hanging on df, so I can fix it and get my monitoring and cron scripts running again :)
>> -- joe.
>>
>> -----Original Message-----
>> From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Daniel Maher
>> Sent: 17 January 2011 12:48
>> To: gluster-users@gluster.org
>> Subject: Re: [Gluster-users] df causes hang
>>
>> On 01/17/2011 10:47 AM, Joe Warren-Meeks wrote:
>>> Hey chaps,
>>> Anyone got any pointers as to what this might be? This is still causing a lot of problems for us whenever we attempt to do df.
>>> -- joe.
>>> -----Original Message-----
>>> However, for some reason, they've got into a bit of a state such that typing 'df -k' causes both to hang, resulting in a loss of service for 42 seconds. I see the following messages in the log files:
>>
>> 42 seconds is the default tcp timeout time for any given node - you could try tuning that down and seeing how it works for you.
>> http://www.gluster.com/community/documentation/index.php/Gluster_3.1:_Setting_Volume_Options
>>
>> --
>> Daniel Maher dma+gluster AT witbe DOT net

--
http://philcryer.com
Re: [Gluster-users] df causes hang
Avati - thanks for your reply; my comments are below.

>> [name.c:251:af_inet_client_get_remote_sockaddr] glusterfs: DNS resolution failed on host /etc/glusterfs/glusterfs.vol
>
> Please make sure you are able to resolve the hostnames as given in volume info on all of your servers, via 'dig'. The logs clearly show that host resolution seems to be failing.

Agreed; however, that doesn't seem to be the issue, because I can dig the host (they're all defined in my hosts file too, so it doesn't have to look them up), named clustr-02, and in fact there are 23 other bricks on that host that are working fine:

# gluster volume info | grep clustr-02
Brick2: clustr-02:/mnt/data01
Brick8: clustr-02:/mnt/data02
Brick14: clustr-02:/mnt/data03
Brick20: clustr-02:/mnt/data04
Brick26: clustr-02:/mnt/data05
Brick32: clustr-02:/mnt/data06
Brick38: clustr-02:/mnt/data07
Brick44: clustr-02:/mnt/data08
Brick50: clustr-02:/mnt/data09
Brick56: clustr-02:/mnt/data10
Brick62: clustr-02:/mnt/data11
Brick68: clustr-02:/mnt/data12
Brick74: clustr-02:/mnt/data13
Brick80: clustr-02:/mnt/data14
Brick86: clustr-02:/mnt/data15
Brick92: clustr-02:/mnt/data16
Brick98: clustr-02:/mnt/data17
Brick104: clustr-02:/mnt/data18
Brick110: clustr-02:/mnt/data19
Brick116: clustr-02:/mnt/data20
Brick122: clustr-02:/mnt/data21
Brick128: clustr-02:/mnt/data22
Brick134: clustr-02:/mnt/data23
Brick140: clustr-02:/mnt/data24

I logged into that host, unmounted that mount, and ran fsck.ext4 on it, but it came back clean. Also, the log says:

glusterfs: DNS resolution failed on host /etc/glusterfs/glusterfs.vol

but there is obviously no host named /etc/glusterfs/glusterfs.vol - does this point to an issue? And lastly, I don't even have a file named /etc/glusterfs/glusterfs.vol:

ls -ls /etc/glusterfs
-rw-r--r-- 1 root root  229 Jan 16 21:15 glusterd.vol
-rw-r--r-- 1 root root 1908 Jan 16 21:15 glusterfsd.vol.sample
-rw-r--r-- 1 root root 2005 Jan 16 21:15 glusterfs.vol.sample

I created all of the configs via the gluster command-line tool.

Thanks
P

On Thu, Feb 3, 2011 at 6:39 PM, Anand Avati anand.av...@gmail.com wrote:
> Please make sure you are able to resolve the hostnames as given in volume info on all of your servers, via 'dig'. The logs clearly show that host resolution seems to be failing.
> Avati
>
> On Thu, Feb 3, 2011 at 1:08 PM, phil cryer p...@cryer.us wrote:
>> This wasn't my issue, but I'm still having the issue. Today I purged glusterfs 3.1.1 and installed 3.1.2 fresh from the deb. I recreated my volume, started it, everything was going fine; I mounted the share, then ran df -h to see it, and now every few seconds my logs post this:
>>
>> == /var/log/glusterfs/nfs.log ==
>> [2011-02-03 15:55:57.145626] E [client-handshake.c:1079:client_query_portmap_cbk] bhl-volume-client-98: failed to get the port number for remote subvolume
>> [2011-02-03 15:55:57.145694] I [client.c:1590:client_rpc_notify] bhl-volume-client-98: disconnected
>>
>> == /var/log/glusterfs/mnt-glusterfs.log ==
>> [2011-02-03 15:55:57.605802] E [common-utils.c:124:gf_resolve_ip6] resolver: getaddrinfo failed (Name or service not known)
>> [2011-02-03 15:55:57.605834] E [name.c:251:af_inet_client_get_remote_sockaddr] glusterfs: DNS resolution failed on host /etc/glusterfs/glusterfs.vol
>>
>> over and over. Any clues as to how I can fix this? This one issue has made our entire 100TB store unusable. And again, gluster volume info shows all the bricks are OK, including 98:
>>
>> gluster volume info
>> Volume Name: bhl-volume
>> Type: Distributed-Replicate
>> Status: Started
>> Number of Bricks: 72 x 2 = 144
>> Transport-type: tcp
>> Bricks:
>> [...]
>> Brick92: clustr-02:/mnt/data16
>> Brick93: clustr-03:/mnt/data16
>> Brick94: clustr-04:/mnt/data16
>> Brick95: clustr-05:/mnt/data16
>> Brick96: clustr-06:/mnt/data16
>> Brick97: clustr-01:/mnt/data17
>> Brick98: clustr-02:/mnt/data17
>> Brick99: clustr-03:/mnt/data17
>> Brick100: clustr-04:/mnt/data17
>> Brick101: clustr-05:/mnt/data17
>> Brick102: clustr-06:/mnt/data17
>> Brick103: clustr-01:/mnt/data18
>> Brick104: clustr-02:/mnt/data18
>> Brick105: clustr-03:/mnt/data18
>> [...]
>>
>> P
>>
>> On Mon, Jan 31, 2011 at 4:26 PM, Anand Avati anand.av...@gmail.com wrote:
>>> Can you post your server logs? What happens if you run 'df -k' on your backend export filesystems?
>>> Thanks
>>> Avati
>>>
>>> On Mon, Jan 17, 2011 at 5:27 AM, Joe Warren-Meeks j...@encoretickets.co.uk wrote:
>>>> (sorry about top-posting.)
>>>> Just changing the timeout would only mask the problem. The real issue is that running 'df' on either node causes a hang. All other operations seem fine; files can be created and deleted as normal, with the results showing up on both. I'd like to work out why it's hanging on df, so I can fix it and get my monitoring and cron scripts running again :)
>>>> -- joe.
>>>>
>>>> -----Original Message-----
>>>> From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Daniel Maher
>>>> Sent: 17 January 2011 12:48
>>>> To: gluster-users@gluster.org
>>>> Subject: Re: [Gluster-users] df
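A quick way to act on Avati's suggestion: test name resolution for every brick host from each server. A sketch assuming the 'BrickN: host:/path' output format shown above; note that getent also honors /etc/hosts, which dig does not:

gluster volume info | awk -F'[: ]+' '/^Brick[0-9]+:/ {print $2}' | sort -u | \
while read h; do
    getent hosts "$h" > /dev/null && echo "$h resolves" || echo "$h FAILS"
done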
[Gluster-users] 3.1 Can't mount glusterfs mountpoint - Address already in use
I have a problem with Gluster 3.1.1 (Debian) that I didn't have a few weeks ago. I can run glusterd on my 6 servers, and I can see all 6 of them from the main one, but the logs keep complaining about:

== /var/log/glusterfs/nfs.log ==
[2011-01-31 14:46:49.157527] E [client-handshake.c:1067:client_query_portmap_cbk] bhl-volume-client-98: failed to get the port number for remote subvolume
[2011-01-31 14:46:53.158708] E [client-handshake.c:1067:client_query_portmap_cbk] bhl-volume-client-98: failed to get the port number for remote subvolume
[2011-01-31 14:46:57.159754] E [client-handshake.c:1067:client_query_portmap_cbk] bhl-volume-client-98: failed to get the port number for remote subvolume
[2011-01-31 14:47:01.160804] E [client-handshake.c:1067:client_query_portmap_cbk] bhl-volume-client-98: failed to get the port number for remote subvolume

Then, if I try to mount the glusterfs, I get the following errors in glusterfs.log:

== /var/log/glusterfs/mnt-glusterfs.log ==
[2011-01-31 14:46:48.720968] I [glusterd.c:275:init] management: Using /etc/glusterd as working directory
[2011-01-31 14:46:48.721514] E [socket.c:322:__socket_server_bind] socket.management: binding to failed: Address already in use
[2011-01-31 14:46:48.721535] E [socket.c:325:__socket_server_bind] socket.management: Port is already in use
[2011-01-31 14:46:48.721561] E [glusterd.c:348:init] management: creation of listener failed
[2011-01-31 14:46:48.721577] E [xlator.c:909:xlator_init] management: Initialization of volume 'management' failed, review your volfile again
[2011-01-31 14:46:48.721595] E [graph.c:331:glusterfs_graph_init] management: initializing translator failed
[2011-01-31 14:46:48.721635] I [fuse-bridge.c:3616:fini] fuse: Unmounting '/mnt/glusterfs'.
[2011-01-31 14:46:48.823199] I [glusterfsd.c:672:cleanup_and_exit] glusterfsd: shutting down

But I don't see how it's already in use; I've made sure everything was stopped/killed on the main server, but when I restart everything fresh, the above happens. So, are these two related, or how can I debug the mounting issue?

Incidentally, if I do a volume info, everything seems to be mounted and OK:

# gluster volume info bhl-volume
Volume Name: bhl-volume
Type: Distributed-Replicate
Status: Started
Number of Bricks: 72 x 2 = 144
Transport-type: tcp
Bricks:
Brick1: clustr-01:/mnt/data01
Brick2: clustr-02:/mnt/data01
Brick3: clustr-03:/mnt/data01
Brick4: clustr-04:/mnt/data01
Brick5: clustr-05:/mnt/data01
Brick6: clustr-06:/mnt/data01
Brick7: clustr-01:/mnt/data02
Brick8: clustr-02:/mnt/data02
Brick9: clustr-03:/mnt/data02
Brick10: clustr-04:/mnt/data02
Brick11: clustr-05:/mnt/data02
Brick12: clustr-06:/mnt/data02
Brick13: clustr-01:/mnt/data03
Brick14: clustr-02:/mnt/data03
Brick15: clustr-03:/mnt/data03
Brick16: clustr-04:/mnt/data03
Brick17: clustr-05:/mnt/data03
Brick18: clustr-06:/mnt/data03
Brick19: clustr-01:/mnt/data04
Brick20: clustr-02:/mnt/data04
Brick21: clustr-03:/mnt/data04
Brick22: clustr-04:/mnt/data04
Brick23: clustr-05:/mnt/data04
Brick24: clustr-06:/mnt/data04
Brick25: clustr-01:/mnt/data05
Brick26: clustr-02:/mnt/data05
Brick27: clustr-03:/mnt/data05
Brick28: clustr-04:/mnt/data05
Brick29: clustr-05:/mnt/data05
Brick30: clustr-06:/mnt/data05
Brick31: clustr-01:/mnt/data06
Brick32: clustr-02:/mnt/data06
Brick33: clustr-03:/mnt/data06
Brick34: clustr-04:/mnt/data06
Brick35: clustr-05:/mnt/data06
Brick36: clustr-06:/mnt/data06
Brick37: clustr-01:/mnt/data07
Brick38: clustr-02:/mnt/data07
Brick39: clustr-03:/mnt/data07
Brick40: clustr-04:/mnt/data07
Brick41: clustr-05:/mnt/data07
Brick42: clustr-06:/mnt/data07
Brick43: clustr-01:/mnt/data08
Brick44: clustr-02:/mnt/data08
Brick45: clustr-03:/mnt/data08
Brick46: clustr-04:/mnt/data08
Brick47: clustr-05:/mnt/data08
Brick48: clustr-06:/mnt/data08
Brick49: clustr-01:/mnt/data09
Brick50: clustr-02:/mnt/data09
Brick51: clustr-03:/mnt/data09
Brick52: clustr-04:/mnt/data09
Brick53: clustr-05:/mnt/data09
Brick54: clustr-06:/mnt/data09
Brick55: clustr-01:/mnt/data10
Brick56: clustr-02:/mnt/data10
Brick57: clustr-03:/mnt/data10
Brick58: clustr-04:/mnt/data10
Brick59: clustr-05:/mnt/data10
Brick60: clustr-06:/mnt/data10
Brick61: clustr-01:/mnt/data11
Brick62: clustr-02:/mnt/data11
Brick63: clustr-03:/mnt/data11
Brick64: clustr-04:/mnt/data11
Brick65: clustr-05:/mnt/data11
Brick66: clustr-06:/mnt/data11
Brick67: clustr-01:/mnt/data12
Brick68: clustr-02:/mnt/data12
Brick69: clustr-03:/mnt/data12
Brick70: clustr-04:/mnt/data12
Brick71: clustr-05:/mnt/data12
Brick72: clustr-06:/mnt/data12
Brick73: clustr-01:/mnt/data13
Brick74: clustr-02:/mnt/data13
Brick75: clustr-03:/mnt/data13
Brick76: clustr-04:/mnt/data13
Brick77: clustr-05:/mnt/data13
Brick78: clustr-06:/mnt/data13
Brick79: clustr-01:/mnt/data14
Brick80: clustr-02:/mnt/data14
Brick81: clustr-03:/mnt/data14
Brick82: clustr-04:/mnt/data14
Brick83: clustr-05:/mnt/data14
Brick84: clustr-06:/mnt/data14
Brick85: clustr-01:/mnt/data15
Brick86: clustr-02:/mnt/data15
Brick87:
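To see what is actually holding the socket when 'Address already in use' appears, something like this on the affected server may help (a hedged sketch; 24007 is glusterd's usual management port in 3.1, worth verifying for your build):

# what already owns the management port?
netstat -tlnp | grep 24007
# any leftover gluster processes from a previous run?
ps ax | grep '[g]luster'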
Re: [Gluster-users] Debian, 3.1.1, duplicate files
tijdschriftvoore1951nede_meta.mrc
-rwxr-xr-x 1 www-data www-data    720 Jul 12  2010 tijdschriftvoore1951nede_meta.mrc
-rwxr-xr-x 1 www-data www-data 546411 Jul 12  2010 tijdschriftvoore1951nede_names.xml
-rwxr-xr-x 1 www-data www-data 546411 Jul 12  2010 tijdschriftvoore1951nede_names.xml
-rwxr-xr-x 1 www-data www-data    256 Jul 12  2010 tijdschriftvoore1951nede_names.xml_meta.txt
-rwxr-xr-x 1 www-data www-data    256 Jul 12  2010 tijdschriftvoore1951nede_names.xml_meta.txt
-rwxr-xr-x 1 www-data www-data 257556 Jul 13  2010 tijdschriftvoore1951nede_scandata.xml
-rwxr-xr-x 1 www-data www-data 257556 Jul 13  2010 tijdschriftvoore1951nede_scandata.xml

but this allows us to do (in my opinion) scary things like this:

# ls -al /mnt/glusterfs//www/t/tijdschriftvoore1951nede/*_names.xml
-rwxr-xr-x 1 www-data www-data 546411 Jul 12  2010 /mnt/glusterfs//www/t/tijdschriftvoore1951nede/tijdschriftvoore1951nede_names.xml
-rwxr-xr-x 1 www-data www-data 546411 Jul 12  2010 /mnt/glusterfs//www/t/tijdschriftvoore1951nede/tijdschriftvoore1951nede_names.xml
# rm /mnt/glusterfs//www/t/tijdschriftvoore1951nede/tijdschriftvoore1951nede_names.xml
# ls -al /mnt/glusterfs//www/t/tijdschriftvoore1951nede/*_names.xml
-rwxr-xr-x 1 www-data www-data 546411 Jul 12  2010 /mnt/glusterfs//www/t/tijdschriftvoore1951nede/tijdschriftvoore1951nede_names.xml

Eek! So it only removed one of the files, even though they both had the same name. At this point we're going to wipe all 70TB and re-transfer, hoping it stops when it gets all the files and doesn't start writing files with the same names as before. Anyone with advice or insight into this issue? Would love to learn why it did this, and REALLY hope it doesn't do it again.

Thanks
P

On Wed, Jan 12, 2011 at 2:37 PM, phil cryer p...@cryer.us wrote:
> I'm now running gluster 3.1.1 on Debian.
> A directory that was running under 3.0.4 had duplicate files, but I've remounted things now that we're running 3.1.1, in hopes it would fix things; so far it has not:
>
> # ls -l /mnt/glusterfs/www/0/0descriptionofta581unit
> total 37992
> -rwxr-xr-x 1 www-data www-data   796343 Jun 23  2010 0descriptionofta581unit_bw.pdf
> -rwxr-xr-x 1 www-data www-data   796343 Jun 23  2010 0descriptionofta581unit_bw.pdf
> ---------T 1 root     root         1497 Jun 24  2010 0descriptionofta581unit_dc.xml
> ---------T 1 root     root         1497 Jun 24  2010 0descriptionofta581unit_dc.xml
> ---------T 1 www-data www-data   577050 Jun 24  2010 0descriptionofta581unit.djvu
> ---------T 1 www-data www-data   577050 Jun 24  2010 0descriptionofta581unit.djvu
> -rwxr-xr-x 1 www-data www-data    33272 Jun 22  2010 0descriptionofta581unit_djvu.txt
> -rwxr-xr-x 1 www-data www-data    33272 Jun 22  2010 0descriptionofta581unit_djvu.txt
> -rwxr-xr-x 1 www-data www-data     4445 Jun 23  2010 0descriptionofta581unit_files.xml
> -rwxr-xr-x 1 www-data www-data     4445 Jun 23  2010 0descriptionofta581unit_files.xml
> -rwxr-xr-x 1 www-data www-data     5011 Jun 22  2010 0descriptionofta581unit_marc.xml
> -rwxr-xr-x 1 www-data www-data     5011 Jun 22  2010 0descriptionofta581unit_marc.xml
> -rwxr-xr-x 1 www-data www-data      360 Jun 23  2010 0descriptionofta581unit_metasource.xml
> -rwxr-xr-x 1 www-data www-data      360 Jun 23  2010 0descriptionofta581unit_metasource.xml
> -rwxr-xr-x 1 www-data www-data     2848 Jun 22  2010 0descriptionofta581unit_meta.xml
> -rwxr-xr-x 1 www-data www-data     2848 Jun 22  2010 0descriptionofta581unit_meta.xml
> -rwxr-xr-x 1 www-data www-data 16916480 Jun 22  2010 0descriptionofta581unit_orig_jp2.tar
> -rwxr-xr-x 1 www-data www-data 16916480 Jun 22  2010 0descriptionofta581unit_orig_jp2.tar
> -rwxr-xr-x 1 www-data www-data  1051810 Jun 22  2010 0descriptionofta581unit.pdf
> -rwxr-xr-x 1 www-data www-data  1051810 Jun 22  2010 0descriptionofta581unit.pdf
>
> While running the latest, 3.1.1, I noticed some log entries that said:
>
> [..]
> [2011-01-12 15:24:33.325546] I [afr-common.c:613:afr_lookup_self_heal_check] bhl-volume-replicate-69: size differs for /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
> [2011-01-12 15:24:33.325558] I [afr-common.c:716:afr_lookup_done] bhl-volume-replicate-69: background meta-data data self-heal triggered. path: /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
> [2011-01-12 15:24:33.364501] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] bhl-volume-replicate-66: background meta-data data self-heal completed on /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
> [2011-01-12 15:24:33.364881] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] bhl-volume-replicate-69: background meta-data data self-heal completed on /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
>
> I assumed it was fixing that, but it didn't. Here are the full logs, including all the gluster.log work it did in this directory: http://pastebin.com/8X52Em7Y
>
> Question: how can I 'fix' this, or is the best
[Gluster-users] what does the permission T mean? -rwx-----T
I have a file that looks like this; what does the T tell me in terms of permissions and glusterfs?

-rwx-----T 1 root root 3414 Oct 22 15:27 reportr2.sh

P
--
http://philcryer.com
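For reference (an addition, not from the original thread): a capital T in the last position means the sticky bit is set while the others-execute bit is not (a lowercase t would mean both are set). GlusterFS's distribute translator also creates zero-byte, sticky-bit-only (---------T) link files on bricks, pointing lookups at the brick that really holds the file. A small demonstration with plain chmod:

touch demo.sh
chmod 0750 demo.sh   # -rwxr-x---
chmod +t demo.sh     # -rwxr-x--T : sticky set, others-execute not set
chmod o+x demo.sh    # -rwxr-x--t : lowercase t once o+x is added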
[Gluster-users] Debian, 3.1.1, duplicate files
I'm now running gluster 3.1.1 on Debian. A directory that was running under 3.0.4 had duplicate files, but I've remounted things now that we're running 3.1.1, in hopes it would fix things; so far it has not:

# ls -l /mnt/glusterfs/www/0/0descriptionofta581unit
total 37992
-rwxr-xr-x 1 www-data www-data   796343 Jun 23  2010 0descriptionofta581unit_bw.pdf
-rwxr-xr-x 1 www-data www-data   796343 Jun 23  2010 0descriptionofta581unit_bw.pdf
---------T 1 root     root         1497 Jun 24  2010 0descriptionofta581unit_dc.xml
---------T 1 root     root         1497 Jun 24  2010 0descriptionofta581unit_dc.xml
---------T 1 www-data www-data   577050 Jun 24  2010 0descriptionofta581unit.djvu
---------T 1 www-data www-data   577050 Jun 24  2010 0descriptionofta581unit.djvu
-rwxr-xr-x 1 www-data www-data    33272 Jun 22  2010 0descriptionofta581unit_djvu.txt
-rwxr-xr-x 1 www-data www-data    33272 Jun 22  2010 0descriptionofta581unit_djvu.txt
-rwxr-xr-x 1 www-data www-data     4445 Jun 23  2010 0descriptionofta581unit_files.xml
-rwxr-xr-x 1 www-data www-data     4445 Jun 23  2010 0descriptionofta581unit_files.xml
-rwxr-xr-x 1 www-data www-data     5011 Jun 22  2010 0descriptionofta581unit_marc.xml
-rwxr-xr-x 1 www-data www-data     5011 Jun 22  2010 0descriptionofta581unit_marc.xml
-rwxr-xr-x 1 www-data www-data      360 Jun 23  2010 0descriptionofta581unit_metasource.xml
-rwxr-xr-x 1 www-data www-data      360 Jun 23  2010 0descriptionofta581unit_metasource.xml
-rwxr-xr-x 1 www-data www-data     2848 Jun 22  2010 0descriptionofta581unit_meta.xml
-rwxr-xr-x 1 www-data www-data     2848 Jun 22  2010 0descriptionofta581unit_meta.xml
-rwxr-xr-x 1 www-data www-data 16916480 Jun 22  2010 0descriptionofta581unit_orig_jp2.tar
-rwxr-xr-x 1 www-data www-data 16916480 Jun 22  2010 0descriptionofta581unit_orig_jp2.tar
-rwxr-xr-x 1 www-data www-data  1051810 Jun 22  2010 0descriptionofta581unit.pdf
-rwxr-xr-x 1 www-data www-data  1051810 Jun 22  2010 0descriptionofta581unit.pdf

While running the latest, 3.1.1, I noticed some log entries that said:

[..]
[2011-01-12 15:24:33.325546] I [afr-common.c:613:afr_lookup_self_heal_check] bhl-volume-replicate-69: size differs for /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
[2011-01-12 15:24:33.325558] I [afr-common.c:716:afr_lookup_done] bhl-volume-replicate-69: background meta-data data self-heal triggered. path: /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
[2011-01-12 15:24:33.364501] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] bhl-volume-replicate-66: background meta-data data self-heal completed on /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
[2011-01-12 15:24:33.364881] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] bhl-volume-replicate-69: background meta-data data self-heal completed on /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu

I assumed it was fixing that, but it didn't. Here are the full logs, including all the gluster.log work it did in this directory: http://pastebin.com/8X52Em7Y

Question: how can I 'fix' this, or is the best bet to remove everything and start over? It's going to set us back, but I'd rather do it now than keep banging on this without any resolution.

Thanks for the help; I really like the new gluster command, very nice!

P
--
http://philcryer.com
Re: [Gluster-users] Hardware advice?
On Fri, Sep 24, 2010 at 8:30 AM, Janne Aho ja...@citynetwork.se wrote:
> Hi,
> We are looking into setting up a glusterfs cluster to store VM images (for KVM and VMware). Usually we have machines from Dell, but we haven't found any good machine to use which allows a good amount of disk space and the possibility to have at least 4 NICs (we are thinking about using 10-gigabit networking; otherwise we need to bond and use more NICs). Sure, we could buy just off-the-shelf stuff to keep the cost down, but we are looking for good hardware support (to be sure that if something breaks down, we will get spare parts). Does anyone here have suggestions on hardware that can do the following:
>
> 1. has iDRAC or similar (remote access to console)
> 2. at least 4 NICs, which can be 10 gigabit (this for redundancy)
> 3. has an architecture which is supported by gluster (in other words, no mc68k)
> 4. has space enough for a good amount of disks, or a JBOD that can be connected to the machine (please, no suggestions of Promise JBODs)
> 5. it has to be rack mounted

We currently have six servers set up like this in our Gluster cluster:
http://philcryer.com/wiki/doku.php?id=building_steam_from_a_grain_of_salt_-_redux

We have more details if you need them; so far, so good is our experience!

P

> If suggesting something other than Dell, please give some price indication on the hardware; I don't care if it's accurate or not, just so I get some understanding of whether it can fit our budget.
> Thanks in advance for your replies.
>
> --
> Janne Aho (Developer) | City Network Hosting AB - www.citynetwork.se
> Phone: +46 455 690022 | Cell: +46 733 312775
> EMail/MSN: ja...@citynetwork.se
> ICQ: 567311547 | Skype: janne_mz | AIM: janne4cn | Gadu: 16275665

--
http://philcryer.com
Re: [Gluster-users] Cannot remove directory - Directory not empty
> [r...@3d13 ~]# rm -rfv /flock/proj/tele2_holland/rnd/comp/010/v003
> rm: cannot remove directory `/flock/proj/tele2_holland/rnd/comp/010/v003': Directory not empty

When I had this issue, it was because I had modified the files outside of glusterfs - for example, when gluster was not running, I moved/modified files. I believe you have to run the scale-n-defrag.sh script that you'll find in the contrib directory of the gluster source.

P

On Mon, Sep 20, 2010 at 4:49 AM, Thomas Ericsson thomas.erics...@fido.se wrote:
> I can not delete quite a few directories on our glusterfs mounts. The error is "Directory not empty". A listing shows no files in the directory; however, if I do a listing on the brick volumes, some of them show files. Any idea how this can happen, and how to remove the directories? Would it be safe to remove the invisible files straight from the brick volume?
>
> Best regards
> Thomas
>
> From a glusterfs client:
>
> [r...@3d13 ~]# ls -lai /flock/proj/tele2_holland/rnd/comp/010/v003/
> total 0
> 38939716797 drwxrwxr-x 2 poal FidoUsers 162 Sep 16 09:15 .
> 60331700537 drwxrwxr-x 5 poal FidoUsers 536 Sep  4 01:24 ..
>
> [r...@3d13 ~]# rm -rfv /flock/proj/tele2_holland/rnd/comp/010/v003
> rm: cannot remove directory `/flock/proj/tele2_holland/rnd/comp/010/v003': Directory not empty
>
> From a glusterfsd brick:
>
> flock01 ~ # ls -lai /node04/storage/proj/tele2_holland/rnd/comp/010/v003/
> total 0
> 305414438 drwxrwxr-x 2 1038 fido_user 57 Sep 16 09:15 .
> 7541462567 drwxrwxr-x 5 1038 fido_user 61 Jul  7 09:44 ..
> 305414403 ---------T 1 root root 0 Sep  4 01:24 tele2_holland_010_comp_v003.0031.exr
>
> From another glusterfsd brick:
>
> flock04 ~ # ls -lai /node03/storage/proj/tele2_holland/rnd/comp/010/v003/
> total 0
> 4861583534 drwxrwxr-x 2 1038 500 57 Sep 16 09:15 .
> 280040615 drwxrwxr-x 5 1038 500 61 Jul  7 09:44 ..
> 4861671820 ---------T 1 root root 0 Sep  4 01:24 tele2_holland_010_comp_v003.0007.exr
>
> --
> Server and clients are version 2.0.8 with FUSE 2.7.4
>
> Server config:
>
> flock04 ~ # cat /usr/local/etc/glusterfs/glusterfs.server
> volume posix01
>   type storage/posix
>   option directory /node01/storage
> end-volume
>
> volume locks01
>   type features/locks
>   subvolumes posix01
> end-volume
>
> volume brick01
>   type performance/io-threads
>   option thread-count 2
>   subvolumes locks01
> end-volume
>
> volume posix02
>   type storage/posix
>   option directory /node02/storage
> end-volume
>
> volume locks02
>   type features/locks
>   subvolumes posix02
> end-volume
>
> volume brick02
>   type performance/io-threads
>   option thread-count 2
>   subvolumes locks02
> end-volume
>
> volume posix03
>   type storage/posix
>   option directory /node03/storage
> end-volume
>
> volume locks03
>   type features/locks
>   subvolumes posix03
> end-volume
>
> volume brick03
>   type performance/io-threads
>   option thread-count 32
>   subvolumes locks03
> end-volume
>
> volume posix04
>   type storage/posix
>   option directory /node04/storage
> end-volume
>
> volume locks04
>   type features/locks
>   subvolumes posix04
> end-volume
>
> volume brick04
>   type performance/io-threads
>   option thread-count 32
>   subvolumes locks04
> end-volume
>
> volume server
>   type protocol/server
>   option transport-type ib-verbs/server
>   option auth.addr.brick01.allow *
>   option auth.addr.brick02.allow *
>   option auth.addr.brick03.allow *
>   option auth.addr.brick04.allow *
>   subvolumes brick01 brick02 brick03 brick04
> end-volume
>
> volume tcp_server
>   type protocol/server
>   option transport-type tcp/server
>   option transport.socket.nodelay on
>   option auth.addr.brick01.allow *
>   option auth.addr.brick02.allow *
>   option auth.addr.brick03.allow *
>   option auth.addr.brick04.allow *
>   subvolumes brick01 brick02 brick03 brick04
> end-volume
>
> Client config:
>
> volume remote01
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host flock01
>   option remote-subvolume brick03
> end-volume
>
> volume remote02
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host flock01
>   option remote-subvolume brick04
> end-volume
>
> volume remote03
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host flock03
>   option remote-subvolume brick03
> end-volume
>
> volume remote04
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host flock03
>   option remote-subvolume brick04
> end-volume
>
> volume remote05
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host flock04
>   option remote-subvolume brick03
> end-volume
>
> volume remote06
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host flock04
>   option remote-subvolume brick04
> end-volume
>
> volume remote07
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host flock08
>   option remote-subvolume brick03
> end-volume
>
> volume remote08
>   type protocol/client
>   option transport-type ib-verbs/client
>   option remote-host flock08
>   option
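If you do end up cleaning a brick by hand, the zero-byte mode ---------T entries shown above are distribute link files; a hedged way to list them on a brick before touching anything (GNU find assumed, path taken from the post):

# list files whose mode is exactly 01000 (sticky bit only) and empty
find /node04/storage -type f -perm 1000 -size 0 -ls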
Re: [Gluster-users] Shared web hosting with GlusterFS and inotify
We're interested in this as well, as we will be serving our docroot from a GlusterFS share. Have you tried nginx? I have not tested this, but after your benchmarks it sounds like I need to. Your inotify script looks like it would work, but it wouldn't for me; we use GlusterFS so we can store ~70TB of data, which we can't copy to the regular filesystem. This is a big risk indeed - can you share your benchmarking method? Did you simply use ab?

Thanks for the heads up
P

On Wed, Sep 15, 2010 at 9:58 AM, Emile Heitor emile.hei...@nbs-system.com wrote:
> Hi list,
>
> For a couple of weeks we've been experimenting with a web hosting system based on GlusterFS, in order to share customers' document roots between more-than-one machine. The hardware and software involved are:
>
> - Two servers composed of 2x Intel 5650 (i.e. 2x 12 cores @ 2.6GHz), 24GB DDR3 RAM, 146GB SAS disks / RAID 1
> - Both servers running 64-bit Debian Lenny GNU/Linux with GlusterFS 3.0.5
> - The web server is Apache 2.2; the application is a huge PHP/MySQL monster.
>
> For our first naive tests, we used the glusterfs mountpoint as Apache's documentroot. In short, performance was catastrophic. A single one of these servers, without GlusterFS, is capable of handling about 170 pages per second with 100 concurrent users. The same server, with the Apache documentroot being a gluster mountpoint, drops to 5 PPS for 20 CU, and just stops responding for 40+. We tried a lot of tips (quick-read, io-threads, io-cache, thread-count, timeouts...) we read on this very mailing list, on various websites, or experimented with on our own; we never got better than 10 PPS / 20 users.
>
> So we took another approach: instead of declaring the gluster mountpoint as the documentroot, we declared the local storage; but of course, without any modification, this would lead to inconsistencies if by any chance Apache writes something (.htaccess, tmp file, log...). And so enters inotify. Using inotify-tools's inotifywait, we have this little script watching for local documentroot modifications and duplicating them to the glusterfs share. The infinite loop is avoided by an md5 comparison. Here is a very early proof of concept:
>
> #!/bin/sh
> [ $# -lt 2 ] && echo "usage: $0 source destination" && exit 1
>
> PATH=${PATH}:/bin:/sbin:/usr/bin:/usr/sbin; export PATH
>
> SRC=$1
> DST=$2
>
> cd ${SRC}
>
> # no recursion
> RSYNC='rsync -dlptgoD --delete ${srcdir} ${dstdir}/'
>
> inotifywait -mr \
>     --exclude "\..*\.sw.*" \
>     -e close_write -e create -e delete_self -e delete . | \
> while read dir action file
> do
>     srcdir=${SRC}/${dir}
>     dstdir=${DST}/${dir}
>
>     [ -d ${srcdir} ] && \
>         [ ! -z "`df -T \"${srcdir}\" | grep tmpfs`" ] && \
>         continue
>
>     # debug
>     echo ${dir} ${action} ${file}
>
>     case ${action} in
>         CLOSE_WRITE,CLOSE)
>             [ ! -f ${dstdir}/${file} ] && eval ${RSYNC} && continue
>             md5src=`md5sum "${srcdir}/${file}" | cut -d' ' -f1`
>             md5dst=`md5sum "${dstdir}/${file}" | cut -d' ' -f1`
>             [ ! "$md5src" = "$md5dst" ] && eval ${RSYNC}
>             ;;
>         CREATE,ISDIR)
>             [ ! -d ${dstdir}/${file} ] && eval ${RSYNC}
>             ;;
>         DELETE|DELETE,ISDIR)
>             eval ${RSYNC}
>             ;;
>     esac
> done
>
> As of now, a gluster mountpoint is basically unusable as an Apache DocumentRoot for us (and yes, with htaccess disabled); I'd like to have the list's point of view on this approach. Do you see any terrible glitch?
>
> Thanks in advance,
>
> --
> Emile Heitor, Responsable d'Exploitation
> ---
> www.nbs-system.com, 140 Bd Haussmann, 75008 Paris
> Tel: 01.58.56.60.80 / Fax: 01.58.56.60.81

--
http://philcryer.com
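A usage sketch for the watch script quoted above (the script file name is invented here; the arguments are the local docroot and the gluster share):

# mirror local docroot writes into the gluster share, in the background
./docroot-watch.sh /var/www/customer1 /mnt/glusterfs/customer1 &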
Re: [Gluster-users] Adding volumes - How redistribute existing data
If a drive dies and you want to repopulate its replacement with an ls -R /mnt/glusterfs, is it necessary to have those options set too, or is this specific to the scale-n-defrag.sh script?

P

On Thu, Jul 29, 2010 at 12:22 AM, Amar Tumballi a...@gluster.com wrote:
> Hi Michael,
> Sorry for the confusion about the 'scale-n-defrag.sh' script. To make sure the script does the defrag, you need to have two options set in the distribute volume:
>
> option unhashed-sticky-bit on
> option lookup-unhashed on
>
> Without these options it will not move the data files in the backend. If you don't want to bring down the current mount point to run the defrag, you can have another mount point with a changed volume file, and run the defrag over it. Let us know if you have any more questions regarding the defrag process.
> Regards,
> Amar
>
> On Wed, Jul 28, 2010 at 9:37 PM, Moore, Michael michael.mo...@lifetech.com wrote:
>> Hi,
>> I am trying to add several new backend volumes to an existing GlusterFS setup. I am running GlusterFS 3.0.4 using the distribute translator. I've tried running the scale-n-defrag.sh script to redistribute the data across the additional volumes, but after running for a significant time, nothing was significantly redistributed. What are the proper steps to redistribute the data? Do I need to clean up the links GlusterFS makes on the backends before I run scale-n-defrag? I am running GlusterFS 3.0.4 on top of CentOS 5.4. This is not running GlusterSP.

--
http://philcryer.com
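For context, a hedged sketch of where Amar's two options sit in a client volfile's distribute section (the volume and subvolume names are placeholders):

volume dht0
  type cluster/distribute
  option unhashed-sticky-bit on
  option lookup-unhashed on
  subvolumes remote01 remote02 remote03
end-volume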
[Gluster-users] Kernel panic when populating cluster
I'm populating my 6-node cluster, running glusterfs 3.0.4, by copying files into /mnt/glusterfs, the gluster-mounted filesystem. I had one machine go down with a kernel panic last week, but I wasn't able to see the error (it's a remote server), so we just restarted and went along. I was running 4 instances again today, all writing to the /mnt/gluster directory, and saw the following error in the logs. I've stopped and restarted my processes, this time running just two of them, and I'm not seeing the error. Obviously this is taking much longer to populate the cluster; could I have overloaded it by having four shell scripts copying files into the mount? What does this error mean, and is my method the proper way to populate a 6-node cluster with 50TB capacity?

Thanks
P

== /var/log/syslog ==
Jul 20 23:49:54 clustr-01 kernel: [794473.515204] INFO: task cp:6706 blocked for more than 120 seconds.
Jul 20 23:49:54 clustr-01 kernel: [794473.515235] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 23:49:54 clustr-01 kernel: [794473.515292] cp D 880143d955c0 0 6706 1 0x0004
Jul 20 23:49:54 clustr-01 kernel: [794473.515296] 88013ce55bd0 0046 88005d4afbc8 88005d4afbc4
Jul 20 23:49:54 clustr-01 kernel: [794473.515300] 000e 0096 f8a0 88005d4affd8
Jul 20 23:49:54 clustr-01 kernel: [794473.515303] 000155c0 000155c0 88023e35e2e0 88023e35e5d8
Jul 20 23:49:54 clustr-01 kernel: [794473.515306] Call Trace:
Jul 20 23:49:54 clustr-01 kernel: [794473.515315] [a0213a99] ? fuse_request_send+0x196/0x249 [fuse]
Jul 20 23:49:54 clustr-01 kernel: [794473.515319] [81064a56] ? autoremove_wake_function+0x0/0x2e
Jul 20 23:49:54 clustr-01 kernel: [794473.515324] [a0218086] ? fuse_flush+0xca/0xfe [fuse]
Jul 20 23:49:54 clustr-01 kernel: [794473.515328] [810eb90e] ? filp_close+0x37/0x62
Jul 20 23:49:54 clustr-01 kernel: [794473.515332] [8104f710] ? put_files_struct+0x64/0xc1
Jul 20 23:49:54 clustr-01 kernel: [794473.515335] [81050fb2] ? do_exit+0x225/0x6b5
Jul 20 23:49:54 clustr-01 kernel: [794473.515337] [810514b8] ? do_group_exit+0x76/0x9d
Jul 20 23:49:54 clustr-01 kernel: [794473.515341] [8105dc50] ? get_signal_to_deliver+0x310/0x33c
Jul 20 23:49:54 clustr-01 kernel: [794473.515353] [8101002f] ? do_notify_resume+0x87/0x73f
Jul 20 23:49:54 clustr-01 kernel: [794473.515357] [810cb774] ? handle_mm_fault+0x2f7/0x7a5
Jul 20 23:49:54 clustr-01 kernel: [794473.515361] [810eddd6] ? vfs_read+0xa6/0xff
Jul 20 23:49:54 clustr-01 kernel: [794473.515363] [81010e0e] ? int_signal+0x12/0x17

--
http://philcryer.com
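On the populating question: one hedged way to keep bulk loading gentler than four unbounded copy loops is to cap the parallelism explicitly (the paths are invented; assumes GNU xargs and rsync):

# copy top-level source directories into the mount, two workers at a time
ls /data/incoming | xargs -P2 -I{} \
    rsync -a /data/incoming/{}/ /mnt/glusterfs/www/{}/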
[Gluster-users] scale-n-defrag of 50TB across a 6 node cluster
Our cluster was out of balance: we had two servers running glusterfs under a RAID1 setup, and only after those two servers were full did we add the additional four to our group. Now we're running the scale-n-defrag.sh script across all 50TB of data on the six-node cluster. We continue to get closer to having all of the data balanced across the 6 nodes, though it seems to be going very slowly overall - this process has been running for more than a week now. The networking graph on this page shows that it's still working, passing data across the network:

http://whbhl01.ubio.org/ganglia/?m=load_one&r=hour&s=descending&c=Woods+Hole&h=&sh=1&hc=3&z=medium

Looking at the servers' disk usage on the command line, we see that the data is indeed being equally distributed across all 24 mounts on each node. While we can't get an update from the gluster process, we can see by physically looking at the disk usage that:

1 - done balancing
2 - done balancing
3 - done balancing
4 - beginning balancing
5 - beginning balancing
6 - about 1/2 complete balancing

This makes sense, since 4/5 were the first two servers, were the full ones, and were the most out of sync with the others. It seems like 1/2/3 and most of 6 have gotten the majority of the balancing complete. Does this sound normal? Also, would it cause the process to run longer if we started moving files around in their directories on the nodes? (We need to move the files to a shared docroot so they can be served via HTTP.) I realize now that the best way to build this cluster would have been to have the entire cluster up and running and then load the data, but since over 50TB needed to be transferred to the cluster over the Internet, we thought starting sooner and adding nodes as we grew was the best way to proceed.

Also, does anyone have configuration suggestions for serving static files for websites from glusterfs? Either in the configuration of the .vol files, or in the architecture of how the servers are laid out. I'm thinking of two ways (see the sketch after this list):

Internet -> SERVER 1 (www server with glusterfs client running), using /mnt/glusterfs/www as the docroot

- or -

Internet -> SERVER1 (www server) -> CLUSTER1 (www server with glusterfs server and client running), using /mnt/glusterfs/www as the docroot

P
--
http://philcryer.com
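For the static-file question, the web-tier half of either layout is just Apache's docroot pointing at the client mount; a minimal hedged sketch for the first layout (the vhost name is invented; AllowOverride None avoids per-request .htaccess lookups against the cluster, which an earlier hosting thread found expensive):

<VirtualHost *:80>
    ServerName www.example.org
    # serve static files straight off the glusterfs client mount
    DocumentRoot /mnt/glusterfs/www
    <Directory /mnt/glusterfs/www>
        Options -Indexes
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>
</VirtualHost>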
Re: [Gluster-users] Transport endpoint is not connected - getfattr
So I'm working on this today; maybe I can simplify my issue: in Glusterfs 3.0.4, I can create files and directories fine, and I can delete files, but not directories. I'm running the server in DEBUG and it's not saying anything. For example, I want to delete /mnt/glusterfs/www/new:

[23:12:03] [r...@clustr-01 /mnt]# mount -t glusterfs /etc/glusterfs/glusterfs.vol /mnt/glusterfs -o log-level=DEBUG
[23:12:13] [r...@clustr-01 /mnt]# ls -al /mnt/glusterfs/www/ | grep new
drwxrwxrwx 3 www-data www-data 196608 2010-06-18 23:10 new
[23:12:26] [r...@clustr-01 /mnt]# rm -rf /mnt/glusterfs/www/new/
rm: cannot remove directory `/mnt/glusterfs/www/new/__MACOSX': Transport endpoint is not connected

I'm running glusterfsd in another window in DEBUG, and it doesn't log anything when this happens. I've already deleted the files in that directory; I just can't remove the two remaining directories, new and __MACOSX. Again, I created these yesterday and haven't made any config changes between then and now, so how can I figure out why this is failing? Thanks P

On Thu, Jun 17, 2010 at 4:23 PM, phil cryer p...@cryer.us wrote:

I'm having problems removing directories; if I do a mv or a rm I'll get an error like this:

[00:57:57] [r...@clustr-01 /]# rm -rf /mnt/glusterfs/bhl/
rm: cannot remove directory `/mnt/glusterfs/bhl': Transport endpoint is not connected

EdWyse on IRC suggested I run getfattr -m on a few bricks; when I did I got various results (see below). Is this a case where I can run something like backend-cleanup.sh or backend-xattr-sanitize.sh to fix it, or is there a manual command? We're around 45TB, so I don't have anywhere to copy the files off. Thanks!

[16:30:25] [r...@clustr-04 /root/bin]# getfattr -m /mnt/data04
getfattr: Removing leading '/' from absolute path names
# file: mnt/data04
trusted.afr.clustr-04-1 trusted.afr.clustr-04-10 trusted.afr.clustr-04-11 trusted.afr.clustr-04-12 trusted.afr.clustr-04-13 trusted.afr.clustr-04-14 trusted.afr.clustr-04-15 trusted.afr.clustr-04-16 trusted.afr.clustr-04-17 trusted.afr.clustr-04-18 trusted.afr.clustr-04-19 trusted.afr.clustr-04-2 trusted.afr.clustr-04-20 trusted.afr.clustr-04-21 trusted.afr.clustr-04-22 trusted.afr.clustr-04-23 trusted.afr.clustr-04-24 trusted.afr.clustr-04-3 trusted.afr.clustr-04-4 trusted.afr.clustr-04-5 trusted.afr.clustr-04-6 trusted.afr.clustr-04-7 trusted.afr.clustr-04-8 trusted.afr.clustr-04-9 trusted.afr.clustr-05-1 trusted.afr.clustr-05-10 trusted.afr.clustr-05-11 trusted.afr.clustr-05-12 trusted.afr.clustr-05-13 trusted.afr.clustr-05-14 trusted.afr.clustr-05-15 trusted.afr.clustr-05-16 trusted.afr.clustr-05-17 trusted.afr.clustr-05-18 trusted.afr.clustr-05-19 trusted.afr.clustr-05-2 trusted.afr.clustr-05-20 trusted.afr.clustr-05-21 trusted.afr.clustr-05-22 trusted.afr.clustr-05-23 trusted.afr.clustr-05-24 trusted.afr.clustr-05-3 trusted.afr.clustr-05-4 trusted.afr.clustr-05-5 trusted.afr.clustr-05-6 trusted.afr.clustr-05-7 trusted.afr.clustr-05-8 trusted.afr.clustr-05-9 trusted.glusterfs.dht trusted.posix4.gen

--- Another server

[01:02:05] [r...@clustr-01 /]# getfattr -m /mnt/data09
getfattr: Removing leading '/' from absolute path names
# file: mnt/data09
trusted.afr.clustr-01-10 trusted.afr.clustr-01-9 trusted.glusterfs.dht trusted.glusterfs.test trusted.posix9.gen

[00:43:14] [r...@clustr-01 /]# getfattr -m /mnt/data04
getfattr: Removing leading '/' from absolute path names
# file: mnt/data04
trusted.afr.clustr-01-3 trusted.afr.clustr-01-4 trusted.glusterfs.dht trusted.glusterfs.test trusted.posix4.gen

-- http://philcryer.com
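[Editor's note: to compare those xattrs across the whole cluster without logging into each box by hand, a loop like the following works. This is only a sketch, assuming the six hostnames and 24 brick mounts per node used in this thread, plus passwordless ssh.]

#!/bin/bash
# Sketch: dump the trusted.* xattrs from every brick so mismatched
# afr/dht attributes stand out when the outputs are diffed.
for host in clustr-01 clustr-02 clustr-03 clustr-04 clustr-05 clustr-06; do
    for n in $(seq -w 1 24); do
        echo "== $host:/mnt/data$n =="
        ssh "$host" "getfattr -d -m trusted -e hex /mnt/data$n 2>/dev/null"
    done
done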
[Gluster-users] Transport endpoint is not connected - getfattr
I'm having problems removing directories; if I do a mv or a rm I'll get an error like this:

[00:57:57] [r...@clustr-01 /]# rm -rf /mnt/glusterfs/bhl/
rm: cannot remove directory `/mnt/glusterfs/bhl': Transport endpoint is not connected

EdWyse on IRC suggested I run getfattr -m on a few bricks; when I did I got various results (see below). Is this a case where I can run something like backend-cleanup.sh or backend-xattr-sanitize.sh to fix it, or is there a manual command? We're around 45TB, so I don't have anywhere to copy the files off. Thanks!

[16:30:25] [r...@clustr-04 /root/bin]# getfattr -m /mnt/data04
getfattr: Removing leading '/' from absolute path names
# file: mnt/data04
trusted.afr.clustr-04-1 trusted.afr.clustr-04-10 trusted.afr.clustr-04-11 trusted.afr.clustr-04-12 trusted.afr.clustr-04-13 trusted.afr.clustr-04-14 trusted.afr.clustr-04-15 trusted.afr.clustr-04-16 trusted.afr.clustr-04-17 trusted.afr.clustr-04-18 trusted.afr.clustr-04-19 trusted.afr.clustr-04-2 trusted.afr.clustr-04-20 trusted.afr.clustr-04-21 trusted.afr.clustr-04-22 trusted.afr.clustr-04-23 trusted.afr.clustr-04-24 trusted.afr.clustr-04-3 trusted.afr.clustr-04-4 trusted.afr.clustr-04-5 trusted.afr.clustr-04-6 trusted.afr.clustr-04-7 trusted.afr.clustr-04-8 trusted.afr.clustr-04-9 trusted.afr.clustr-05-1 trusted.afr.clustr-05-10 trusted.afr.clustr-05-11 trusted.afr.clustr-05-12 trusted.afr.clustr-05-13 trusted.afr.clustr-05-14 trusted.afr.clustr-05-15 trusted.afr.clustr-05-16 trusted.afr.clustr-05-17 trusted.afr.clustr-05-18 trusted.afr.clustr-05-19 trusted.afr.clustr-05-2 trusted.afr.clustr-05-20 trusted.afr.clustr-05-21 trusted.afr.clustr-05-22 trusted.afr.clustr-05-23 trusted.afr.clustr-05-24 trusted.afr.clustr-05-3 trusted.afr.clustr-05-4 trusted.afr.clustr-05-5 trusted.afr.clustr-05-6 trusted.afr.clustr-05-7 trusted.afr.clustr-05-8 trusted.afr.clustr-05-9 trusted.glusterfs.dht trusted.posix4.gen

--- Another server

[01:02:05] [r...@clustr-01 /]# getfattr -m /mnt/data09
getfattr: Removing leading '/' from absolute path names
# file: mnt/data09
trusted.afr.clustr-01-10 trusted.afr.clustr-01-9 trusted.glusterfs.dht trusted.glusterfs.test trusted.posix9.gen

[00:43:14] [r...@clustr-01 /]# getfattr -m /mnt/data04
getfattr: Removing leading '/' from absolute path names
# file: mnt/data04
trusted.afr.clustr-01-3 trusted.afr.clustr-01-4 trusted.glusterfs.dht trusted.glusterfs.test trusted.posix4.gen
Re: [Gluster-users] Replication between two separate clusters
"Is it possible with Gluster to have two separate clusters replicating volumes that are already mirrored in their independent cluster? So far it looks like that's what AFR is supposed to do."

This is what I'm setting up as well: we'll have individual, stand-alone clusters that are synced using a combination of rsync/ssh/lsyncd/csync2. For me this is a true DR environment, since each instance will be storing and serving content at the same time, and won't be reliant on the other; but as long as they're both up they will keep each other in sync so as to be a mirror. P

On Mon, May 24, 2010 at 10:39 AM, Jeffrey Negro jne...@billtrust.com wrote:

Hello - My company is in need of a clustered NAS solution, mostly for CIFS fileshares. We have been considering commercial solutions from Isilon and NetApp, but I have a feeling I'm not going to get the budget approval for those products. I also tend to stay away from closed hardware solutions... but I digress. We want to have a production and a DR cluster that replicate across a WAN. Is it possible with Gluster to have two separate clusters replicating volumes that are already mirrored in their independent cluster? So far it looks like that's what AFR is supposed to do. Any information or assistance that anyone can provide in clarifying my understanding of this scenario would be very helpful and much appreciated. Thank you, Jeffrey

-- http://philcryer.com
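[Editor's note: the simplest form of the cross-cluster sync described above is a scheduled one-way rsync from one cluster's mount to the other's; lsyncd or csync2 replace the schedule with event-driven syncing. A sketch, where the remote hostname and log path are examples only - and note --delete only suits a one-way DR copy, so drop it if both sides take writes.]

# crontab entry on one head node: push local changes across every 15 minutes
*/15 * * * * rsync -a --delete /mnt/glusterfs/ remote-cluster:/mnt/glusterfs/ >> /var/log/cluster-sync.log 2>&1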
Re: [Gluster-users] Input/output error when running `ls` and `cd` on directories
Lakshmipathi, Attached are the gluster vol files and the glusterfsd.log file while running under TRACE. I did the same queries as in the original email, and got the same results. Let me know if you want each action broken out in the logfile if you can't tell by the details. Thanks for your help on this. P

On Sat, May 15, 2010 at 7:20 AM, Lakshmipathi lakshmipa...@gluster.com wrote:

Hi, Can you please send us the server/client log files and server/client volume files? If you think there is not enough detail in the logs, please set the log level to TRACE instead of DEBUG and send us the logs. Cheers, Lakshmipathi.G

- Original Message - From: phil cryer p...@cryer.us To: gluster-users@gluster.org Sent: Saturday, May 15, 2010 9:44:18 AM Subject: [Gluster-users] Input/output error when running `ls` and `cd` on directories

I'm getting Input/output errors on gluster-mounted directories. First, I have a few directories I created a few weeks ago, but when I run an ls on them, their status is listed as ???:

[23:52:54] [r...@clustr06 /mnt/glusterfs]# ls -al
ls: cannot access lost+found: Input/output error
ls: cannot access bhl: Input/output error
total 1920
drwxr-xr-x 7 root root 294912 2010-05-13 19:11 .
drwxr-xr-x 27 root root 4096 2010-04-30 15:28 ..
d????????? ? ? ? ? ? bhl
drwx------ 2 root root 294912 2010-05-05 22:37 bin
drwx------ 4 root root 294912 2010-05-10 14:37 clustr-02
drwx------ 46 root root 294912 2010-05-13 19:13 clustr-04
d????????? ? ? ? ? ? lost+found

Then, I go into a directory I've been trying to populate with files to see how far along it is, and I can't see it:

[23:55:48] [r...@clustr06 /mnt/glusterfs/clustr-04]# ls grab4*
ls: cannot access grab43: Input/output error
ls: cannot access grab44: Input/output error
ls: cannot access grab45: Input/output error
grab4:
grabby.sh status
grab40:
grabby.sh status
grab41:
complete grabby.sh status
grab42:
grabby.sh status

I have glusterfsd running in debug mode, but it's not giving me any details. I've stopped and restarted glusterfsd, unmounting and remounting the glusterfs shares after that. What is happening and how can I fix it? Thanks. P

-- http://philcryer.com
[Gluster-users] Input/output error when running `ls` and `cd` on directories
I'm getting Input/output errors on gluster-mounted directories. First, I have a few directories I created a few weeks ago, but when I run an ls on them, their status is listed as ???:

[23:52:54] [r...@clustr06 /mnt/glusterfs]# ls -al
ls: cannot access lost+found: Input/output error
ls: cannot access bhl: Input/output error
total 1920
drwxr-xr-x 7 root root 294912 2010-05-13 19:11 .
drwxr-xr-x 27 root root 4096 2010-04-30 15:28 ..
d????????? ? ? ? ? ? bhl
drwx------ 2 root root 294912 2010-05-05 22:37 bin
drwx------ 4 root root 294912 2010-05-10 14:37 clustr-02
drwx------ 46 root root 294912 2010-05-13 19:13 clustr-04
d????????? ? ? ? ? ? lost+found

Then, I go into a directory I've been trying to populate with files to see how far along it is, and I can't see it:

[23:55:48] [r...@clustr06 /mnt/glusterfs/clustr-04]# ls grab4*
ls: cannot access grab43: Input/output error
ls: cannot access grab44: Input/output error
ls: cannot access grab45: Input/output error
grab4:
grabby.sh status
grab40:
grabby.sh status
grab41:
complete grabby.sh status
grab42:
grabby.sh status

I have glusterfsd running in debug mode, but it's not giving me any details. I've stopped and restarted glusterfsd, unmounting and remounting the glusterfs shares after that. What is happening and how can I fix it? Thanks. P

-- http://philcryer.com
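[Editor's note: as the reply above suggests, DEBUG on the client often stays quiet where TRACE does not. A sketch of remounting with TRACE and reproducing the failing ls, using the same -o log-level form shown in this archive; the log path is an assumption, so check where your build writes the client log.]

umount /mnt/glusterfs
mount -t glusterfs /etc/glusterfs/glusterfs.vol /mnt/glusterfs -o log-level=TRACE
# reproduce the failure while watching the client log; the path below is
# an assumption - it is often under /var/log/glusterfs/
tail -f /var/log/glusterfs/glusterfs.log &
ls -al /mnt/glusterfs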
Re: [Gluster-users] Monitoring Gluster availability
On Fri, May 7, 2010 at 3:13 AM, Kelvin Westlake kel...@netbasic.co.uk wrote:

Can anybody recommend a way of monitoring gluster availability? I need to be made aware if a server or client crashes out. Is there some port or system component that can be monitored?

I use monit [http://mmonit.com/monit/] extensively, and have written a simple config snippet to watch glusterfsd and restart it if it has failed. From /etc/monit/monitrc:

check process glusterfsd with pidfile /var/run/glusterfsd.pid
  start program = "/etc/init.d/glusterfsd start"
  stop program = "/etc/init.d/glusterfsd stop"
  if failed host 127.0.0.1 port 6996 then restart
  if loadavg(5min) greater than 10 for 8 cycles then restart
  if 5 restarts within 5 cycles then timeout

Today I was looking for a more 'gluster native' way of checking all the nodes to see if each of them in the cluster is up, but haven't gotten very far, save for pulling the hostnames out of the volfile:

grep "option remote-host" /etc/glusterfs/glusterfs.vol | uniq | cut -d' ' -f7

but from there you'd need to do a shared ssh key setup for a script to loop through those entries and check things in the logs on all the servers... Does anyone have a way they do it? P

On Fri, May 7, 2010 at 3:13 AM, Kelvin Westlake kel...@netbasic.co.uk wrote:

Hi Guys, Can anybody recommend a way of monitoring gluster availability? I need to be made aware if a server or client crashes out. Is there some port or system component that can be monitored? Cheers, Kelvin

-- http://philcryer.com
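[Editor's note: building on the grep above, a basic up/down check needs no ssh keys at all if you just probe each server's glusterfsd port from a client. A sketch using bash's /dev/tcp; port 6996 is the listen port from the monit config above, so adjust if yours differs.]

#!/bin/bash
# Sketch: pull the server list from the volfile and probe each one's
# glusterfsd port; no ssh needed for a simple up/down answer.
for host in $(grep "option remote-host" /etc/glusterfs/glusterfs.vol | awk '{print $3}' | sort -u); do
    if (echo > "/dev/tcp/$host/6996") 2>/dev/null; then
        echo "$host: up"
    else
        echo "$host: DOWN"
    fi
done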
Re: [Gluster-users] gluster-volgen - syntax for mirroring/distributing across 6 nodes
On Mon, May 3, 2010 at 1:25 AM, Lakshmipathi lakshmipa...@gluster.com wrote:

Hi Phil, 1) Yes, your glusterfs-volgen command should provide the required volume files for your setup, but you can also use this:

glusterfs-volgen --name repstore1 --raid 1 clustr-01:/mnt/data01 clustr-02:/mnt/data01 clustr-03:/mnt/data01 clustr-04:/mnt/data01 clustr-05:/mnt/data01 clustr-06:/mnt/data01 -c ~/files/

which will create the volume files under your home directory; make sure the ~/files directory already exists.

Thanks for this. I'm currently trying to debug my config: when I try to mount one server's 24 bricks, the mount only shows the capacity of ONE of the bricks, and the log error is:

[2010-05-03 11:18:35] W [posix.c:246:posix_lstat_with_gen] posix1: Access to /mnt/data01//.. (on dev 16771) is crossing device (2257)

What is this telling me? I'm looking at just dropping back to mirroring 2 servers, and replicating from there, then trying against all 6. I'll post my config in a bit. Thanks P

2) No, currently glusterfs-volgen doesn't support shorthand methods for export directories - they need to be exact strings. For more details on volgen, please check http://www.gluster.com/community/documentation/index.php/Glusterfs-volgen_Reference_Page Cheers, Lakshmipathi.G

- Original Message - From: phil cryer p...@cryer.us To: gluster-users@gluster.org Sent: Friday, April 30, 2010 11:56:44 PM Subject: [Gluster-users] gluster-volgen - syntax for mirroring/distributing across 6 nodes

NOTE: posted this to gluster-devel when I meant to post it to gluster-users

01 | 02 mirrored --|
03 | 04 mirrored --| distributed
05 | 06 mirrored --|

1) Would this command work for that?

glusterfs-volgen --name repstore1 --raid 1 clustr-01:/mnt/data01 clustr-02:/mnt/data01 --raid 1 clustr-03:/mnt/data01 clustr-04:/mnt/data01 --raid 1 clustr-05:/mnt/data01 clustr-06:/mnt/data01

So the 'repstore1' is the distributed part, and within that are 3 sets of mirrored nodes.

2) Then, since we're running 24 drives in JBOD mode, we've got mounts from /mnt/data01 - /mnt/data24. Is there a way to write this in shorthand? Because the last time I generated a config across 3 of these hosts, the command looked like this:

glusterfs-volgen --name store123 clustr-01:/mnt/data01 clustr-02:/mnt/data01 clustr-03:/mnt/data01 clustr-01:/mnt/data02 clustr-02:/mnt/data02 clustr-03:/mnt/data02 clustr-01:/mnt/data03 clustr-02:/mnt/data03 clustr-03:/mnt/data03 clustr-01:/mnt/data04 clustr-02:/mnt/data04 clustr-03:/mnt/data04 clustr-01:/mnt/data05 clustr-02:/mnt/data05 clustr-03:/mnt/data05 clustr-01:/mnt/data06 clustr-02:/mnt/data06 clustr-03:/mnt/data06 clustr-01:/mnt/data07 clustr-02:/mnt/data07 clustr-03:/mnt/data07 clustr-01:/mnt/data08 clustr-02:/mnt/data08 clustr-03:/mnt/data08 clustr-01:/mnt/data09 clustr-02:/mnt/data09 clustr-03:/mnt/data09 clustr-01:/mnt/data10 clustr-02:/mnt/data10 clustr-03:/mnt/data10 clustr-01:/mnt/data11 clustr-02:/mnt/data11 clustr-03:/mnt/data11 clustr-01:/mnt/data12 clustr-02:/mnt/data12 clustr-03:/mnt/data12 clustr-01:/mnt/data13 clustr-02:/mnt/data13 clustr-03:/mnt/data13 clustr-01:/mnt/data14 clustr-02:/mnt/data14 clustr-03:/mnt/data14 clustr-01:/mnt/data15 clustr-02:/mnt/data15 clustr-03:/mnt/data15 clustr-01:/mnt/data16 clustr-02:/mnt/data16 clustr-03:/mnt/data16 clustr-01:/mnt/data17 clustr-02:/mnt/data17 clustr-03:/mnt/data17 clustr-01:/mnt/data18 clustr-02:/mnt/data18 clustr-03:/mnt/data18 clustr-01:/mnt/data19 clustr-02:/mnt/data19 clustr-03:/mnt/data19 clustr-01:/mnt/data20 clustr-02:/mnt/data20 clustr-03:/mnt/data20 clustr-01:/mnt/data21 clustr-02:/mnt/data21 clustr-03:/mnt/data21 clustr-01:/mnt/data22 clustr-02:/mnt/data22 clustr-03:/mnt/data22 clustr-01:/mnt/data23 clustr-02:/mnt/data23 clustr-03:/mnt/data23 clustr-01:/mnt/data24 clustr-02:/mnt/data24 clustr-03:/mnt/data24

Thanks P

-- http://philcryer.com
[Gluster-users] gluster-volgen - syntax for mirroring/distributing across 6 nodes
NOTE: posted this to gluster-devel when I meant to post it to gluster-users

01 | 02 mirrored --|
03 | 04 mirrored --| distributed
05 | 06 mirrored --|

1) Would this command work for that?

glusterfs-volgen --name repstore1 --raid 1 clustr-01:/mnt/data01 clustr-02:/mnt/data01 --raid 1 clustr-03:/mnt/data01 clustr-04:/mnt/data01 --raid 1 clustr-05:/mnt/data01 clustr-06:/mnt/data01

So the 'repstore1' is the distributed part, and within that are 3 sets of mirrored nodes.

2) Then, since we're running 24 drives in JBOD mode, we've got mounts from /mnt/data01 - /mnt/data24. Is there a way to write this in shorthand? Because the last time I generated a config across 3 of these hosts, the command looked like this:

glusterfs-volgen --name store123 clustr-01:/mnt/data01 clustr-02:/mnt/data01 clustr-03:/mnt/data01 clustr-01:/mnt/data02 clustr-02:/mnt/data02 clustr-03:/mnt/data02 clustr-01:/mnt/data03 clustr-02:/mnt/data03 clustr-03:/mnt/data03 clustr-01:/mnt/data04 clustr-02:/mnt/data04 clustr-03:/mnt/data04 clustr-01:/mnt/data05 clustr-02:/mnt/data05 clustr-03:/mnt/data05 clustr-01:/mnt/data06 clustr-02:/mnt/data06 clustr-03:/mnt/data06 clustr-01:/mnt/data07 clustr-02:/mnt/data07 clustr-03:/mnt/data07 clustr-01:/mnt/data08 clustr-02:/mnt/data08 clustr-03:/mnt/data08 clustr-01:/mnt/data09 clustr-02:/mnt/data09 clustr-03:/mnt/data09 clustr-01:/mnt/data10 clustr-02:/mnt/data10 clustr-03:/mnt/data10 clustr-01:/mnt/data11 clustr-02:/mnt/data11 clustr-03:/mnt/data11 clustr-01:/mnt/data12 clustr-02:/mnt/data12 clustr-03:/mnt/data12 clustr-01:/mnt/data13 clustr-02:/mnt/data13 clustr-03:/mnt/data13 clustr-01:/mnt/data14 clustr-02:/mnt/data14 clustr-03:/mnt/data14 clustr-01:/mnt/data15 clustr-02:/mnt/data15 clustr-03:/mnt/data15 clustr-01:/mnt/data16 clustr-02:/mnt/data16 clustr-03:/mnt/data16 clustr-01:/mnt/data17 clustr-02:/mnt/data17 clustr-03:/mnt/data17 clustr-01:/mnt/data18 clustr-02:/mnt/data18 clustr-03:/mnt/data18 clustr-01:/mnt/data19 clustr-02:/mnt/data19 clustr-03:/mnt/data19 clustr-01:/mnt/data20 clustr-02:/mnt/data20 clustr-03:/mnt/data20 clustr-01:/mnt/data21 clustr-02:/mnt/data21 clustr-03:/mnt/data21 clustr-01:/mnt/data22 clustr-02:/mnt/data22 clustr-03:/mnt/data22 clustr-01:/mnt/data23 clustr-02:/mnt/data23 clustr-03:/mnt/data23 clustr-01:/mnt/data24 clustr-02:/mnt/data24 clustr-03:/mnt/data24

Thanks P

-- http://philcryer.com
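[Editor's note: since volgen wants every brick spelled out (per Lakshmipathi's answer earlier in this archive), the practical shorthand is to build the argument list in the shell. A sketch for the 3-host, 24-brick case shown above.]

#!/bin/bash
# Sketch: expand host/brick combinations instead of typing all 72 arguments.
BRICKS=""
for n in $(seq -w 1 24); do
    for host in clustr-01 clustr-02 clustr-03; do
        BRICKS="$BRICKS $host:/mnt/data$n"
    done
done
# word-splitting on $BRICKS is intentional here
glusterfs-volgen --name store123 $BRICKS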
Re: [Gluster-users] client mount fails on boot under debian lenny...
On Sat, Apr 24, 2010 at 9:55 PM, mki-gluste...@mozone.net wrote:

On Sat, Apr 24, 2010 at 07:57:32PM +0200, Smart Weblications GmbH - Florian Wiessner wrote:

On 24.04.2010 19:08, mki-gluste...@mozone.net wrote: I quote below: The fstab entry contains options noatime,_netdev already. :) Any other thoughts?

The parameter is, iirc, no_netdev. You can also add glusterfs to the function mount_all_local() in /etc/init.d/mountall.sh and update /etc/init.d/mountnfs.sh to mount glusterfs; /etc/init.d/mountnfs.sh is executed after networking is established.

Yeah, I did try modifying mountall.sh, mountnfs.sh and a couple of others that already had references to gfs/ocfs in them, and added glusterfs to the list. I even added it to /etc/network/if-up.d/mountnfs, but even with that it did the same thing and barfed until I added a sleep 3 to the script right before its mount attempt line. My fear with modifying all those system files is that the next apt-get update/upgrade will end up blowing the changes away.

"My fear with modifying all those system files is the next apt-get update/upgrade will end up blowing the changes away?"

This would be my concern as well, and it bolsters the original thought of using /etc/rc.local to handle it. Perhaps background a `sleep x; mount -a` command, or the like. P

-- http://philcryer.com
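[Editor's note: a sketch of that rc.local approach, which survives package upgrades since rc.local belongs to the admin; the sleep length is a guess that may need tuning per box.]

# /etc/rc.local -- mount any fstab glusterfs entries once networking settles
(sleep 10 && mount -a -t glusterfs) &
exit 0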
Re: [Gluster-users] Monitoring gluster with nagios - was Re: [Gluster-devel] Gluster health/status
Ian, Very nice - IMO this should be added to the Gluster wiki. P

On Tue, Apr 13, 2010 at 5:07 PM, Ian Rogers ian.rog...@contactclean.com wrote:

Answering my own question; hope these instructions are useful: http://www.sirgroane.net/2010/04/monitoring-gluster-with-nagios/ Cheers, Ian

On 09/04/2010 06:01, Ian Rogers wrote:

Gluster devs, I found the message below in the archives. glfs-health.sh is not included in the v3.0.3 sources - is there any plan to add this to the extras directory? What's its status? Ian

== snip ==

Raghavendra G Mon, 22 Feb 2010 20:20:33 -0800

Hi all, Here is some work related to health monitoring. glfs-health.sh is a shell script to check the health of glusterfs: http://git.gluster.com/?p=users/avati/glfs-health.git;a=blob_plain;f=glfs-health.sh;hb=5bf3cb50452525f545018fa5f8eed06cb2fbbe7d

Documentation can be found at http://git.gluster.com/?p=users/avati/glfs-health.git;a=blob_plain;f=README;hb=5bf3cb50452525f545018fa5f8eed06cb2fbbe7d

We welcome improvements and discussions on this.
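[Editor's note: before wiring the health script into nagios, it can be exercised by hand. The argument order here (host, port, transport, remote volume name) is a recollection of the README linked above, not a verified interface, and the values are this archive's examples - treat the whole line as an assumption to check against your copy of the script.]

# manual smoke test before turning it into a nagios check command
/usr/local/bin/glfs-health.sh clustr-01 6996 tcp brick1 && echo OK || echo CRITICAL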
Re: [Gluster-users] Random No such file or directory error in gluster client logs - FIXED
"The solution was quite simple. It turned out that it was because the server's data drive was formatted in ext4. Switched it to ext3 and the problems went away!"

Is this a known issue with Gluster 3.0.3? I've set up our cluster with ext4 on Debian, but have not had any issues like this yet (but we're not running live yet). Is this something to be concerned about? Should we change everything back to ext3? P

On Thu, Mar 18, 2010 at 8:48 AM, Lee Simpson l...@leenix.co.uk wrote:

Hello, Just thought I'd share the experience I had with a gluster client error and the solution I found after much searching and chatting with some IRC guys. I'm running a simple 2-server setup with multiple clients using cluster/replicate. Randomly, newly created files produced the following error in the gluster client logs when accessed:

W [fuse-bridge.c:858:fuse_fd_cbk] glusterfs-fuse: 59480: OPEN() /data/randomfile-here = -1 (No such file or directory)

These files are created by apache or other scripts (such as awstats on a cron). Apache is then unable to read the file, and the above message appears in the gluster logs every time you try. If I SSH into the apache server and cat the file, it displays fine, and then apache starts reading it fine. I upgraded the client and server to 3.03 and tried reducing my configs to the bare minimum without any performance volumes... but the problem persisted.

SOLUTION: The solution was quite simple. It turned out that it was because the server's data drive was formatted in ext4. Switched it to ext3 and the problems went away! Hope that helps someone else who finds this. - Lee

-- http://philcryer.com
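[Editor's note: a quick way to audit whether this bites an existing cluster is to print the filesystem type under every brick. A sketch over ssh for the six nodes discussed in this archive; df -T's type column is $2 and the mount point is $7.]

#!/bin/bash
# Sketch: print filesystem type for each brick mount on each node.
for host in clustr-01 clustr-02 clustr-03 clustr-04 clustr-05 clustr-06; do
    echo "== $host =="
    ssh "$host" "df -T /mnt/data* | awk 'NR>1 {print \$7, \$2}'"
done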
Re: [Gluster-users] Gluster-users Digest, Vol 20, Issue 22
that it is much less of a problem. By using more smaller machines you also eliminate the need for redundant power supplies (which would be a requirement in your large boxes, because each one would be a single point of failure for a large percentage of your storage system). Hope the information helps. Regards, Larry Bates

--

Message: 6
Date: Thu, 17 Dec 2009 00:18:54 -0600
From: phil cryer p...@cryer.us
Subject: [Gluster-users] Recommended GlusterFS configuration for 6 node cluster
To: gluster-users@gluster.org gluster-users@gluster.org
Message-ID: 3a3bc55a0912162218i4e3f326cr9956dd37132bf...@mail.gmail.com
Content-Type: text/plain; charset=UTF-8

We're setting up 6 servers, each with 24 x 1.5TB drives; the systems will run Debian testing and Gluster 3.x. The SATA RAID card offers RAID5 and RAID6, and we're wondering what the optimum setup would be for this configuration. Do we RAID5 the disks and have GlusterFS use them that way, or do we keep them all 'raw' and have GlusterFS handle the replication (though not 2x as we would have with the RAID options)? Obviously there are a lot of ways to do this; just wondering what GlusterFS devs and other experienced users would recommend. Thanks P

-- http://philcryer.com
Re: [Gluster-users] volume sizes
Thanks for all of the responses. As Anthony said, we're just now building this out, so we look forward to doing benchmarking to find what's right for our environment, and of course sharing it to help adoption of GlusterFS. Also, we gave a talk in November at a conference in France covering our plans and reasons for moving to Gluster; you can see the slides for the talk here: http://www.slideshare.net/phil.cryer/building-a-scalable-open-source-storage-solution-2482448 All questions and comments are welcome! I'm working now on some detailed documentation for a Gluster install using the latest Debian (Squeeze) testing branch to take advantage of ext4. The storage platform looks very promising; I hope that you can break the web UI out of it so we can install it on our 'hand rolled' boxes. Thanks again for the support! P

On Wed, Dec 30, 2009 at 12:07 PM, Tejas N. Bhise te...@gluster.com wrote:

Thanks, Raghvendra. Anthony, it's a lazy self-heal mechanism, if you will. If one wants it all done right away, an ls -alR will access each file and hence cause the rebuild of the whole glusterfs volume, which _may_, like you mentioned, be spread across disk partitions, LVM/RAID luns or even server nodes. Even after all that, only the files impacted in the volume would need to be rebuilt - although there might be some difference in overheads for different sized and configured Glusterfs volumes. It might be interesting to check - we have not done numbers on this. Let me check with the person who is more familiar with this area of code than me; he may be able to suggest some ballpark numbers till we run some real numbers. Meanwhile, if you do some tests, please share the numbers with the community. Regards, Tejas.

- Original Message - From: Raghavendra G raghaven...@gluster.com To: Anthony Goddard agodd...@mbl.edu Cc: Tejas N. Bhise te...@gluster.com, gluster-users gluster-users@gluster.org Sent: Wednesday, December 30, 2009 9:10:23 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: Re: [Gluster-users] volume sizes

Hi Anthony,

On Wed, Dec 30, 2009 at 6:30 PM, Anthony Goddard agodd...@mbl.edu wrote:

Hi Tejas, Thanks for the advice. I will be using RAID as well as gluster replication, I think, as we'll only need to sacrifice 1 drive per raid set to add a bit of extra redundancy. The rebuild happens at the first access of a file - does this mean that the entire brick/node is rebuilt upon an initial file access?

No, only the file which is accessed is rebuilt. That is the reason we recursively access all the files using 'ls -laR' on the mount point.

I think this is what I've seen from using gluster previously. If this is the case, it would rebuild the entire volume, which could span many raid volumes or even machines - is this correct? If so, then the underlying disk wouldn't have any effect at all; but if it's spanned over multiple machines and it only needs to rebuild one machine (or multiple volumes on one machine), it only needs to rebuild one volume. I don't know if that made any sense... haha... but if it did, any insights into whether the size of the volumes (aside from RAID rebuilds) will have a positive effect on gluster's rebuild operations? Cheers, Ant.

On Dec 30, 2009, at 2:56 AM, Tejas N. Bhise wrote:

Anthony, Gluster can take the smaller (6TB) volumes and aggregate them into a large Gluster volume (as seen from the clients). So that takes care of manageability on the client side of things.
On the server side, once you make those smaller 6 TB volumes, you will depend on RAID to rebuild the disk behind them, so it's good to have a smaller partition. Since you are using RAID and not Gluster replication, it might just make sense to have smaller RAID partitions. If instead you were using Gluster replication and the resulting recovery, it would happen at first access of the file, and the size of the Gluster volume, the backend native FS volume, or the RAID (or raw) partition behind it would not be much of a consideration. Regards, Tejas.

- Original Message - From: Anthony Goddard agodd...@mbl.edu To: gluster-users@gluster.org Sent: Wednesday, December 30, 2009 3:24:35 AM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: [Gluster-users] volume sizes

First post! We're looking at setting up 6x 24-bay storage servers (36TB of JBOD storage per node) and running glusterFS over this cluster. We have RAID cards on these boxes and are trying to decide what the best size of each volume should be; for example, if we present the OS's (and gluster) with six 36TB volumes, I imagine rebuilding one node would take a long time, and there may be other performance implications of this. On the other hand, if we present gluster / the OS's with 6x 6TB volumes on each node, we might have more trouble in managing a larger number of volumes. My gut tells me a lot
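[Editor's note: the lazy self-heal Tejas describes in this thread can be forced ahead of time from any client by stat()ing everything on the mount, so the rebuild cost isn't paid at first user access. Two equivalent sketches, run against the client mount.]

# walk the whole mount so every file gets accessed and healed
ls -alR /mnt/glusterfs > /dev/null
# a find-based equivalent that stats everything without printing a listing
find /mnt/glusterfs -noleaf -print0 | xargs -0 stat > /dev/null 2>&1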
Re: [Gluster-users] Recommended GlusterFS configuration for 6 node cluster
Thanks Tejas. We have scanned biodiversity texts, so our aims are to have storage capable of holding our full store of data (approx 24TB) and then, as we'll have everything in one place, to be able to serve the data up via standard HTTP calls. Later we will look at doing syncs of the data once we have other clusters up regionally and globally, for further redundancy as well as providing better presentation for other parts of the world. So overall we just need a cluster that is redundant and able to serve files relatively quickly (some of the scans are large, whereas the accompanying metadata files are small). GlusterFS gives us this ability, something we've wanted for some time, so this is amazing functionality for us to put into place. Does this give you enough to go on? If not, let me know; I appreciate any/all suggestions. P

On Thu, Dec 17, 2009 at 4:29 AM, Tejas N. Bhise te...@gluster.com wrote:

Hi Phil, It's great to know that you are using Gluster. It would be easy to make suggestions on the points you bring up if there were more information on what use you want to put the system to. Regards, Tejas.

- Original Message - From: phil cryer p...@cryer.us To: gluster-users@gluster.org Sent: Thursday, December 17, 2009 11:48:54 AM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: [Gluster-users] Recommended GlusterFS configuration for 6 node cluster

We're setting up 6 servers, each with 24 x 1.5TB drives; the systems will run Debian testing and Gluster 3.x. The SATA RAID card offers RAID5 and RAID6, and we're wondering what the optimum setup would be for this configuration. Do we RAID5 the disks and have GlusterFS use them that way, or do we keep them all 'raw' and have GlusterFS handle the replication (though not 2x as we would have with the RAID options)? Obviously there are a lot of ways to do this; just wondering what GlusterFS devs and other experienced users would recommend. Thanks P

-- http://philcryer.com