[Gluster-users] New release of Gluster?
Hi,

Are there any proposed dates for a new release of Gluster? I'm currently running 3.3, and the gluster heal info commands all segfault.

Gerald
[Gluster-users] cannot create a new volume with a brick that used to be part of a deleted volume?
Greetings,

I'm running v3.3.0 on Fedora16-x86_64. I used to have a replicated volume on two bricks. This morning I deleted it successfully:

[root@farm-ljf0 ~]# gluster volume stop gv0
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
Stopping volume gv0 has been successful
[root@farm-ljf0 ~]# gluster volume delete gv0
Deleting volume will erase all information about the volume. Do you want to continue? (y/n) y
Deleting volume gv0 has been successful
[root@farm-ljf0 ~]# gluster volume info all
No volumes present

I then attempted to create a new volume using the same bricks that used to be part of the (now) deleted volume, but it keeps failing, claiming that the brick is already part of a volume:

[root@farm-ljf1 ~]# gluster volume create gv0 rep 2 transport tcp 10.31.99.165:/mnt/sdb1 10.31.99.166:/mnt/sdb1
/mnt/sdb1 or a prefix of it is already part of a volume
[root@farm-ljf1 ~]# gluster volume info all
No volumes present

Note farm-ljf0 is 10.31.99.165 and farm-ljf1 is 10.31.99.166. I also tried restarting glusterd (and glusterfsd) hoping that might clear things up, but it had no impact.

How can /mnt/sdb1 be part of a volume when there are no volumes present? Is this a bug, or am I just missing something obvious?

thanks
[Gluster-users] glusterd vs. glusterfsd
I'm running version 3.3.0 on Fedora16-x86_64. The official(?) RPMs ship two init scripts, glusterd and glusterfsd. I've googled a bit, and I can't figure out what the purpose is for each of them. I know that I need one of them, but I can't tell which for sure. There's no man page for either, and running them with --help returns the same exact output.

Do they have separate purposes? Do I only need one, or both, running on the bricks?

thanks
Re: [Gluster-users] cannot create a new volume with a brick that used to be part of a deleted volume?
I believe gluster writes 2 entries into the top level of your gluster brick filesystems:

-rw-r--r--   2 root root   36 2012-06-22 15:58 .gl.mount.check
drw------- 258 root root 8192 2012-04-16 13:20 .glusterfs

You will have to remove these, as well as all the other fs info from the volume, to re-add the fs as another brick. Or just remake the filesystem - instantaneous with XFS, less so with ext4.

hjm

--
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--
What does it say about a society that would rather send its children to kill and die for oil than to get on a bike?
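A rough sketch of both options above, assuming the brick is an XFS filesystem on /dev/sdb1 mounted at /mnt/sdb1 as in this thread (the device name is an assumption - check with mount or df before running anything destructive):

# option 1: scrub the old brick contents by hand (run on each brick server)
service glusterd stop
rm -rf /mnt/sdb1/.glusterfs /mnt/sdb1/.gl.mount.check /mnt/sdb1/*
service glusterd start

# option 2: just remake the filesystem
umount /mnt/sdb1
mkfs.xfs -f /dev/sdb1
mount /dev/sdb1 /mnt/sdb1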
Re: [Gluster-users] cannot create a new volume with a brick that used to be part of a deleted volume?
There are xattrs on the top-level directory of the old brick volume; gluster detects them, and that is what causes this.

I personally always create my bricks on a subdir. If you do that you can simply rmdir/mkdir the directory when you want to delete a gluster volume.

You can clear the xattrs, or nuke it from orbit with mkfs on the volume device.
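A sketch of the xattr route, with the brick path from this thread; the exact attribute names present on your bricks can be confirmed with getfattr before removing anything:

# show the gluster xattrs left on the old brick root (run on each brick server)
getfattr -d -m . -e hex /mnt/sdb1
# remove the ones the volume-create check trips over
setfattr -x trusted.glusterfs.volume-id /mnt/sdb1
setfattr -x trusted.gfid /mnt/sdb1

With bricks created as a subdirectory (e.g. /mnt/sdb1/brick), none of this is needed: rm -rf plus mkdir on that subdirectory gives you a clean brick again.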
Re: [Gluster-users] cannot create a new volume with a brick that used to be part of a deleted volume?
Hi Harry,

Thanks for your reply. I tried to manually delete everything from the brick filesystems (including the hidden files/dirs), but that didn't help:

[root@farm-ljf0 ~]# ls -la /mnt/sdb1/
total 4
drwxr-xr-x  2 root root  6 Sep 18 11:22 .
drwxr-xr-x. 3 root root 17 Sep 13 09:45 ..
[root@farm-ljf0 ~]# gluster volume create gv0 rep 2 transport tcp 10.31.99.165:/mnt/sdb1 10.31.99.166:/mnt/sdb1
/mnt/sdb1 or a prefix of it is already part of a volume
#

So I unmounted, formatted fresh, remounted, and then the volume creation worked. This seems like a bug, or at the very least a shortcoming in the documentation (which never mentions any of these requirements).

Anyway, thanks for the help, it's got me back on track. Hopefully someone from the gluster team will comment on this.
Re: [Gluster-users] cannot create a new volume with a brick that used to be part of a deleted volume?
Hrmm, ok. Shouldn't 'gluster volume delete ...' be smart enough to clean this up so that I don't have to do it manually? Or alternatively, 'gluster volume create ...' should be able to figure out whether the path to a brick is really in use. As things stand now, the process is rather hacky when I have to issue the 'gluster volume delete ...' command, then manually clean up afterwards. Hopefully this is something that will be addressed in a future release?

thanks
Re: [Gluster-users] glusterd vs. glusterfsd
On Tue, Sep 18, 2012 at 11:30 AM, Kaleb Keithley kkeit...@redhat.com wrote:
> If you mean you're using RPMs from my fedorapeople.org repo, those are not official. I put them there to be helpful, that's about it.

yes, those. thanks for maintaining them, they are great!

> With those RPMs you need both init scripts, but as an admin you should only ever use the glusterd script.

so i should only set glusterd to run at boot, and ignore glusterfsd altogether?
Re: [Gluster-users] glusterd vs. glusterfsd
ok, thanks.

On Tue, Sep 18, 2012 at 11:51 AM, Kaleb Keithley kkeit...@redhat.com wrote:
> The RPM install does a `chkconfig --add ...`; they will start after a reboot without any additional steps on your part. The only thing you need to do after an install is `service glusterd start`.
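For anyone finding this in the archives, a minimal sketch of what that means in practice (SysV-style commands as shipped with these RPMs; on Fedora 16 the systemctl equivalents also work):

chkconfig --list glusterd    # should already be registered by the RPM install
service glusterd start       # glusterd in turn spawns the glusterfsd brick processes
# a glusterfsd init script exists too, but per the above you never need to run it by hand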
Re: [Gluster-users] Setting xattrs failed
I'd open a reproducible bug on bugzilla.redhat.com:
https://bugzilla.redhat.com/page.cgi?id=browse.html&tab=&product=GlusterFS&bug_status=open

On Mon, Sep 17, 2012 at 1:28 AM, Jan Krajdl s...@spamik.cz wrote:
> Bump. Does anybody have any idea? It's quite critical for me and I don't know what to try next...
>
> Thanks,
> --
> Jan Krajdl
>
> On 13.9.2012 23:54, Jan Krajdl wrote:
>> Hi,
>>
>> I have a problem with glusterfs 3.3.0. I have a 4 node cluster with several volumes. All bricks are ext4 filesystems, there is no selinux, and writing extended attributes with setfattr works fine. But in the brick logs I see messages like this:
>>
>> [2012-09-13 12:50:17.428402] E [posix.c:857:posix_mknod] 0-bacula-strip-posix: setting xattrs on /mnt/bacula/fff failed (Operation not supported)
>>
>> every time I create a file on the mounted volume. Glusterfs was upgraded from version 3.2.1. I see this error on a replicated volume which was there from version 3.2.1 and on a stripe volume which was created after the upgrade to 3.3.0, but on a one-brick volume the error doesn't appear. On the stripe volume there is some other strange behaviour, but I think it's related to this xattr issue. On a created file I can see the trusted.gfid attribute with getfattr. On the volume itself there are the trusted.gfid and trusted.glusterfs.volume-id attributes. According to the logs it seems that this problem started after the upgrade to 3.3.0; on 3.2.1 these errors weren't in the logs.
>>
>> Could you please help me with solving this problem?
>>
>> Thanks,
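For comparison when filing the bug, a quick way to capture which attributes actually made it onto the brick (paths taken from the log above; run directly on the brick server, not through the mount):

getfattr -d -m . -e hex /mnt/bacula        # brick root: expect trusted.gfid and trusted.glusterfs.volume-id
getfattr -d -m . -e hex /mnt/bacula/fff    # a file created through the client: expect at least trusted.gfid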
Re: [Gluster-users] XFS and MD RAID
On Mon, Sep 10, 2012 at 09:29:25AM +0800, Jack Wang wrote:
> Hi Brian,
>
> below patch should fix your bug.
>
> John reports:
> BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202]
> [..]
> Call Trace:
>  [8141782a] scsi_remove_target+0xda/0x1f0
>  [81421de5] sas_rphy_remove+0x55/0x60
>  [81421e01] sas_rphy_delete+0x11/0x20
>  [81421e35] sas_port_delete+0x25/0x160
>  [814549a3] mptsas_del_end_device+0x183/0x270
> ...introduced by commit 3b661a9 [SCSI] fix hot unplug vs async scan race.

I raised an Ubuntu bug which references this information and patch at
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1049013

I have now been asked:

"Can you provide some information on the status of the patch with regards to getting it merged upstream? Has it been sent upstream, what sort of feedback has it received, is it getting applied to a subsystem maintainer's tree, etc?"

Do you have any info on this?

Thanks, Brian.
[Gluster-users] NFS over gluster stops responding under write load
Greetings,

I'm running version 3.3.0 on Fedora16-x86_64. I have two bricks set up with a volume doing basic replication on an XFS formatted filesystem. I've NFS mounted the volume on a 3rd system, and invoked bonnie++ to write to the NFS mount point. After a few minutes, I noticed that bonnie++ didn't seem to be generating any more progress output, at which point I started checking assorted logs to see if anything was wrong. At that point, I saw that the client system was no longer able to write to the NFS mount point, and dmesg (and /var/log/messages) was spewing these warnings like crazy (dozens/second):

nfs: server 10.31.99.166 not responding, still trying

Those warnings started at 14:40:58 on the client system, but oddly stopped a few seconds later at 14:41:04.

Here's the full bonnie++ output (/mnt/gv0 is where the gluster file system is mounted as an NFS client):

[root@cuda-ljf0 ~]# bonnie++ -d /mnt/gv0 -u root
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...

Here's what's in the glusterfs logs at the moment:

##
# tail etc-glusterfs-glusterd.vol.log
[2012-09-18 14:54:39.026557] I [glusterd-handler.c:542:glusterd_req_ctx_create] 0-glusterd: Received op from uuid: 1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
[2012-09-18 14:54:39.029463] I [glusterd-handler.c:1417:glusterd_op_stage_send_resp] 0-glusterd: Responded to stage, ret: 0
[2012-09-18 14:54:46.993426] I [glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume gv0
[2012-09-18 14:54:46.993503] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd: Unable to get lock for uuid: e9ce949d-8521-4868-ad1b-860e0ffd8768, lock held by: 1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
[2012-09-18 14:54:46.993520] E [glusterd-handler.c:453:glusterd_op_txn_begin] 0-management: Unable to acquire local lock, ret: -1
[2012-09-18 14:55:47.175521] I [glusterd-handler.c:860:glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2012-09-18 14:55:47.181048] I [glusterd-handler.c:860:glusterd_handle_cli_get_volume] 0-glusterd: Received get vol req
[2012-09-18 14:55:49.306776] I [glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume gv0
[2012-09-18 14:55:49.306834] E [glusterd-utils.c:277:glusterd_lock] 0-glusterd: Unable to get lock for uuid: e9ce949d-8521-4868-ad1b-860e0ffd8768, lock held by: 1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
[2012-09-18 14:55:49.306844] E [glusterd-handler.c:453:glusterd_op_txn_begin] 0-management: Unable to acquire local lock, ret: -1

# tail -f cli.log
[2012-09-18 14:55:47.176824] I [cli-rpc-ops.c:479:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to get vol: 0
[2012-09-18 14:55:47.180959] I [cli-rpc-ops.c:732:gf_cli3_1_get_volume_cbk] 0-cli: Returning: 0
[2012-09-18 14:55:47.181128] I [cli-rpc-ops.c:479:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to get vol: 0
[2012-09-18 14:55:47.181167] I [cli-rpc-ops.c:732:gf_cli3_1_get_volume_cbk] 0-cli: Returning: 0
[2012-09-18 14:55:47.181214] I [input.c:46:cli_batch] 0-: Exiting with: 0
[2012-09-18 14:55:49.244795] W [rpc-transport.c:174:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to socket
[2012-09-18 14:55:49.307054] I [cli-rpc-ops.c:5905:gf_cli3_1_heal_volume_cbk] 0-cli: Received resp to heal volume
[2012-09-18 14:55:49.307274] W [dict.c:2339:dict_unserialize] (--/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xa5) [0x328ca10365] (--/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5) [0x328ca0f965] (--gluster(gf_cli3_1_heal_volume_cbk+0x1d4) [0x4225e4]))) 0-dict: buf is null!
[2012-09-18 14:55:49.307289] E [cli-rpc-ops.c:5930:gf_cli3_1_heal_volume_cbk] 0-: Unable to allocate memory
[2012-09-18 14:55:49.307314] I [input.c:46:cli_batch] 0-: Exiting with: -1
##

I'd be happy to provide more if someone requests something specific. Not sure what other information to provide at this point, but here's the basics of the gluster setup:

##
# gluster volume info all

Volume Name: gv0
Type: Replicate
Volume ID: 200046fc-1b5f-460c-b54b-96932e31ed3c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.31.99.165:/mnt/sdb1
Brick2: 10.31.99.166:/mnt/sdb1

# gluster volume heal gv0 info
operation failed
##

I just noticed that glusterfs seems to be rapidly heading towards OOM territory. The glusterfs daemon is currently consuming 90% of MEM according to top.

thanks
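For reference, the client-side setup being exercised here boils down to something like the following (a sketch; gluster's built-in NFS server only speaks NFSv3 over TCP, hence the explicit options):

mount -t nfs -o vers=3,proto=tcp,mountproto=tcp 10.31.99.166:/gv0 /mnt/gv0
bonnie++ -d /mnt/gv0 -u root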
Re: [Gluster-users] NFS over gluster stops responding under write load
On Tue, Sep 18, 2012 at 2:59 PM, Lonni J Friedman netll...@gmail.com wrote:
> I just noticed that glusterfs seems to be rapidly heading towards OOM territory. The glusterfs daemon is currently consuming 90% of MEM according to top.

I just attempted to shut down the glusterd service, and it ran off a cliff. The OOM killer kicked in and killed it. From dmesg:

#
[ 4151.733182] glusterfsd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
[ 4151.733186] glusterfsd cpuset=/ mems_allowed=0
[ 4151.733189] Pid: 2567, comm: glusterfsd
[Gluster-users] Geo replication gluster 3.3 error
Hi All,

[root@vm1 vol]# gluster volume geo-replication vol root@slave:/data/replication status
MASTER    SLAVE                           STATUS
vol       root@slave:/data/replication    faulty
[root@vm1 vol]#

I am setting up geo replication on gluster 3.3 for the first time. I used http://repos.fedorapeople.org/repos/kkeithle/glusterfs/HOWTO.UFO to set up. I followed the same steps as in the admin guide; however, every time the status shows faulty and I am getting the following logs from the ssh session. My setup is 3 virtual machines: 2 are running as server nodes and the third one is to be the geo site. I have installed geo-replication across all the nodes and kept the gsyncd file at /usr/local/libexec/glusterfs and /usr/libexec/glusterfs on all servers.

tailf /var/log/glusterfs/geo-replication/vol/ssh%3A%2F%2Froot%4010.2.3.35%3Afile%3A%2F%2F%2Fdata%2Freplication.log
[2012-09-18 15:34:12.593647] I [monitor(monitor):80:monitor] Monitor:
[2012-09-18 15:34:12.594135] I [monitor(monitor):81:monitor] Monitor: starting gsyncd worker
[2012-09-18 15:34:12.633137] I [gsyncd:354:main_i] <top>: syncing: gluster://localhost:vol -> ssh://root@slave:/data/replication
[2012-09-18 15:34:13.803124] E [syncdutils:173:log_raise_exception] <top>: connection to peer is broken
[2012-09-18 15:34:13.809307] E [resource:181:errfail] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-oSbzAm/gsycnd-ssh-%r@%h:%p root@slave /usr/local/libexec/glusterfs/gsyncd --session-owner 9e3bec63-69b6-4df2-96a7-3ff42693fc33 -N --listen --timeout 120 file:///data/replication" returned with 1, saying:
[2012-09-18 15:34:13.809423] E [resource:184:errfail] Popen: ssh> [2012-09-18 15:34:12.754773] W [rpc-transport.c:174:rpc_transport_load] 0-rpc-transport: missing 'option transport-type'. defaulting to socket
[2012-09-18 15:34:13.809514] E [resource:184:errfail] Popen: ssh> [2012-09-18 15:34:12.792495] E [socket.c:1715:socket_connect_finish] 0-glusterfs: connection to  failed (Connection refused)
[2012-09-18 15:34:13.809608] E [resource:184:errfail] Popen: ssh> [2012-09-18 15:34:13.792763] I [cli-cmd.c:145:cli_cmd_process] 0-: Exiting with: 110
[2012-09-18 15:34:13.809689] E [resource:184:errfail] Popen: ssh> gsyncd initializaion failed
[2012-09-18 15:34:13.809832] I [syncdutils:142:finalize] <top>: exiting.

I am not sure what I am missing.

Thanks
Chandan
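One thing that can produce exactly this kind of faulty status is the master invoking the wrong gsyncd path on the slave (the Popen line above shows it calling /usr/local/libexec/glusterfs/gsyncd over ssh). A sketch of pointing the session at the path your packages actually installed, assuming that is /usr/libexec/glusterfs/gsyncd on the slave:

gluster volume geo-replication vol root@slave:/data/replication config remote-gsyncd /usr/libexec/glusterfs/gsyncd
gluster volume geo-replication vol root@slave:/data/replication stop
gluster volume geo-replication vol root@slave:/data/replication start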
Re: [Gluster-users] XFS and MD RAID
2012/9/19 Brian Candler b.cand...@pobox.com:
> I have now been asked:
>
> "Can you provide some information on the status of the patch with regards to getting it merged upstream? Has it been sent upstream, what sort of feedback has it received, is it getting applied to a subsystem maintainer's tree, etc?"
>
> Do you have any info on this?

Hi Brian,

The patch has not been applied to the subsystem maintainer's tree yet; James may be busy with other stuff. You can send mail to James (james.bottom...@hansenpartnership.com) and the linux-scsi list (linux-s...@vger.kernel.org) to push this bug fix to be included in mainline.

Jack
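In case it helps, a minimal sketch of sending such a fix upstream from a kernel tree that already has the patch committed on top (the list address is linux-scsi@vger.kernel.org; get_maintainer.pl will suggest the maintainer addresses to add to --to):

# export the top-most commit as a patch file
git format-patch -1 -o outgoing/
# see who should receive it
./scripts/get_maintainer.pl outgoing/0001-*.patch
# mail it (requires git send-email to be configured for your SMTP server)
git send-email --to=linux-scsi@vger.kernel.org outgoing/0001-*.patch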