[Gluster-users] New release of Gluster?

2012-09-18 Thread Gerald Brandt
Hi,

Are there any proposed dates for a new release of Gluster?  I'm currently 
running 3.3, and the gluster heal info commands all segfault.

Gerald
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] cannot create a new volume with a brick that used to be part of a deleted volume?

2012-09-18 Thread Lonni J Friedman
Greetings,
I'm running v3.3.0 on Fedora16-x86_64.  I used to have a replicated
volume on two bricks.  This morning I deleted it successfully:

[root@farm-ljf0 ~]# gluster volume stop gv0
Stopping volume will make its data inaccessible. Do you want to
continue? (y/n) y
Stopping volume gv0 has been successful
[root@farm-ljf0 ~]# gluster volume delete gv0
Deleting volume will erase all information about the volume. Do you
want to continue? (y/n) y
Deleting volume gv0 has been successful
[root@farm-ljf0 ~]# gluster volume info all
No volumes present


I then attempted to create a new volume using the same bricks that
used to be part of the (now) deleted volume, but it keeps refusing and
failing, claiming that the brick is already part of a volume:

[root@farm-ljf1 ~]# gluster volume create gv0 rep 2 transport tcp
10.31.99.165:/mnt/sdb1 10.31.99.166:/mnt/sdb1
/mnt/sdb1 or a prefix of it is already part of a volume
[root@farm-ljf1 ~]# gluster volume info all
No volumes present


Note farm-ljf0 is 10.31.99.165 and farm-ljf1 is 10.31.99.166.  I also
tried restarting glusterd (and glusterfsd) hoping that might clear
things up, but it had no impact.

How can /mnt/sdb1 be part of a volume when there are no volumes present?
Is this a bug, or am I just missing something obvious?

thanks
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] glusterd vs. glusterfsd

2012-09-18 Thread Lonni J Friedman
I'm running version 3.3.0 on Fedora16-x86_64.  The official(?) RPMs
ship two init scripts, glusterd and glusterfsd.  I've googled a bit,
and I can't figure out what the purpose is for each of them.  I know
that I need one of them, but I can't tell which for sure.  There's no
man page for either, and running them with --help returns the same
exact output.  Do they have separate purposes?  Do I only need one or
both running on the bricks?

thanks
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] cannot create a new volume with a brick that used to be part of a deleted volume?

2012-09-18 Thread harry mangalam
I believe gluster writes 2 entries into the top level of your gluster brick 
filesystems:

-rw-r--r--   2 root root    36 2012-06-22 15:58 .gl.mount.check
drw-------  258 root root  8192 2012-04-16 13:20 .glusterfs

You will have to remove these as well as all the other fs info from the volume 
to re-add the fs as another brick.

Or just remake the filesystem - instantaneous with XFS, less so with ext4.
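
A minimal sketch of that cleanup, assuming the brick is mounted at /mnt/sdb1
(and note, as the rest of this thread shows, the top-level xattrs may also need
clearing before 'gluster volume create' accepts the path again):

# run on each brick host, with glusterd stopped so nothing recreates the entries
service glusterd stop
rm -rf /mnt/sdb1/.glusterfs /mnt/sdb1/.gl.mount.check
rm -rf /mnt/sdb1/*          # any leftover data from the old volume
service glusterd start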

hjm

On Tuesday, September 18, 2012 11:03:35 AM Lonni J Friedman wrote:
 Greetings,
 I'm running v3.3.0 on Fedora16-x86_64.  I used to have a replicated
 volume on two bricks.  This morning I deleted it successfully:
 
 [root@farm-ljf0 ~]# gluster volume stop gv0
 Stopping volume will make its data inaccessible. Do you want to
 continue? (y/n) y
 Stopping volume gv0 has been successful
 [root@farm-ljf0 ~]# gluster volume delete gv0
 Deleting volume will erase all information about the volume. Do you
 want to continue? (y/n) y
 Deleting volume gv0 has been successful
 [root@farm-ljf0 ~]# gluster volume info all
 No volumes present
 
 
 I then attempted to create a new volume using the same bricks that
 used to be part of the (now) deleted volume, but it keeps refusing and
 failing, claiming that the brick is already part of a volume:
 
 [root@farm-ljf1 ~]# gluster volume create gv0 rep 2 transport tcp
 10.31.99.165:/mnt/sdb1 10.31.99.166:/mnt/sdb1
 /mnt/sdb1 or a prefix of it is already part of a volume
 [root@farm-ljf1 ~]# gluster volume info all
 No volumes present
 
 
 Note farm-ljf0 is 10.31.99.165 and farm-ljf1 is 10.31.99.166.  I also
 tried restarting glusterd (and glusterfsd) hoping that might clear
 things up, but it had no impact.
 
 How can /mnt/sdb1 be part of a volume when there are no volumes present?
 Is this a bug, or am I just missing something obvious?
 
 thanks
 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
-- 
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
--
What does it say about a society that would rather send its 
children to kill and die for oil than to get on a bike?


___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] cannot create a new volume with a brick that used to be part of a deleted volume?

2012-09-18 Thread Kaleb Keithley

There are xattrs on the top-level directory of the old brick volume; gluster 
detects them, and that is what causes this.

I personally always create my bricks on a subdir. If you do that you can simply 
rmdir/mkdir the directory when you want to delete a gluster volume.

You can clear the xattrs or nuke it from orbit with mkfs on the volume device.
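
A sketch of clearing them by hand, assuming the brick path is /mnt/sdb1 (the
attribute names below are the ones GlusterFS 3.3 sets on a brick root; verify
with getfattr on your own bricks):

# list what gluster left behind on the brick root
getfattr -d -m . -e hex /mnt/sdb1
# remove the volume markers so 'gluster volume create' will accept the path again
setfattr -x trusted.glusterfs.volume-id /mnt/sdb1
setfattr -x trusted.gfid /mnt/sdb1
# and the bookkeeping directory
rm -rf /mnt/sdb1/.glusterfs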


- Original Message -
From: Lonni J Friedman netll...@gmail.com
To: gluster-users@gluster.org
Sent: Tuesday, September 18, 2012 2:03:35 PM
Subject: [Gluster-users] cannot create a new volume with a brick that used to 
be part of a deleted volume?

Greetings,
I'm running v3.3.0 on Fedora16-x86_64.  I used to have a replicated
volume on two bricks.  This morning I deleted it successfully:

[root@farm-ljf0 ~]# gluster volume stop gv0
Stopping volume will make its data inaccessible. Do you want to
continue? (y/n) y
Stopping volume gv0 has been successful
[root@farm-ljf0 ~]# gluster volume delete gv0
Deleting volume will erase all information about the volume. Do you
want to continue? (y/n) y
Deleting volume gv0 has been successful
[root@farm-ljf0 ~]# gluster volume info all
No volumes present


I then attempted to create a new volume using the same bricks that
used to be part of the (now) deleted volume, but it keeps refusing and
failing, claiming that the brick is already part of a volume:

[root@farm-ljf1 ~]# gluster volume create gv0 rep 2 transport tcp
10.31.99.165:/mnt/sdb1 10.31.99.166:/mnt/sdb1
/mnt/sdb1 or a prefix of it is already part of a volume
[root@farm-ljf1 ~]# gluster volume info all
No volumes present


Note farm-ljf0 is 10.31.99.165 and farm-ljf1 is 10.31.99.166.  I also
tried restarting glusterd (and glusterfsd) hoping that might clear
things up, but it had no impact.

How can /mnt/sdb1 be part of a volume when there are no volumes present?
Is this a bug, or am I just missing something obvious?

thanks
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] cannot create a new volume with a brick that used to be part of a deleted volume?

2012-09-18 Thread Lonni J Friedman
Hi Harry,
Thanks for your reply.  I tried to manually delete everything from the
brick filesystems (including the hidden files/dirs), but that didn't
help:

[root@farm-ljf0 ~]# ls -la /mnt/sdb1/
total 4
drwxr-xr-x  2 root root  6 Sep 18 11:22 .
drwxr-xr-x. 3 root root 17 Sep 13 09:45 ..
[root@farm-ljf0 ~]# gluster volume create gv0 rep 2 transport tcp
10.31.99.165:/mnt/sdb1 10.31.99.166:/mnt/sdb1
/mnt/sdb1 or a prefix of it is already part of a volume
#

So I unmounted, formatted fresh, remounted, and then the volume
creation worked.  This seems like a bug, or at the very least a
shortcoming in the documentation (which never mentions any of these
requirements).
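
For the record, the sequence that worked was roughly the following (a sketch;
the device name /dev/sdb1 and the XFS mkfs are assumptions, adjust for your
own bricks):

umount /mnt/sdb1
mkfs.xfs -f /dev/sdb1        # wipes the old data and the old xattrs
mount /dev/sdb1 /mnt/sdb1
gluster volume create gv0 rep 2 transport tcp 10.31.99.165:/mnt/sdb1 10.31.99.166:/mnt/sdb1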

Anyway, thanks for the help, it's got me back on track.  Hopefully
someone from the gluster team will comment on this.


On Tue, Sep 18, 2012 at 11:18 AM, harry mangalam harry.manga...@uci.edu wrote:
 I believe gluster writes 2 entries into the top level of your gluster brick
 filesystems:

 -rw-r--r--   2 root root    36 2012-06-22 15:58 .gl.mount.check
 drw-------  258 root root  8192 2012-04-16 13:20 .glusterfs

 You will have to remove these as well as all the other fs info from the volume
 to re-add the fs as another brick.

 Or just remake the filesystem - instantaneous with XFS, less so with ext4.

 hjm

 On Tuesday, September 18, 2012 11:03:35 AM Lonni J Friedman wrote:
 Greetings,
 I'm running v3.3.0 on Fedora16-x86_64.  I used to have a replicated
 volume on two bricks.  This morning I deleted it successfully:
 
 [root@farm-ljf0 ~]# gluster volume stop gv0
 Stopping volume will make its data inaccessible. Do you want to
 continue? (y/n) y
 Stopping volume gv0 has been successful
 [root@farm-ljf0 ~]# gluster volume delete gv0
 Deleting volume will erase all information about the volume. Do you
 want to continue? (y/n) y
 Deleting volume gv0 has been successful
 [root@farm-ljf0 ~]# gluster volume info all
 No volumes present
 

 I then attempted to create a new volume using the same bricks that
 used to be part of the (now) deleted volume, but it keeps refusing and
 failing, claiming that the brick is already part of a volume:
 
 [root@farm-ljf1 ~]# gluster volume create gv0 rep 2 transport tcp
 10.31.99.165:/mnt/sdb1 10.31.99.166:/mnt/sdb1
 /mnt/sdb1 or a prefix of it is already part of a volume
 [root@farm-ljf1 ~]# gluster volume info all
 No volumes present
 

 Note farm-ljf0 is 10.31.99.165 and farm-ljf1 is 10.31.99.166.  I also
 tried restarting glusterd (and glusterfsd) hoping that might clear
 things up, but it had no impact.

 How can /mnt/sdb1 be part of a volume when there are no volumes present?
 Is this a bug, or am I just missing something obvious?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] cannot create a new volume with a brick that used to be part of a deleted volume?

2012-09-18 Thread Lonni J Friedman
Hrmm, ok.  Shouldn't 'gluster volume delete ...' be smart enough to
clean this up so that I don't have to do it manually?  Or
alternatively, shouldn't 'gluster volume create ...' be able to figure
out whether the path to a brick is really in use?

As things stand now, the process is rather hacky when I have to issue
the 'gluster volume delete ...' command, then manually clean up
afterwards.  Hopefully this is something that will be addressed in a
future release?

thanks

On Tue, Sep 18, 2012 at 11:26 AM, Kaleb Keithley kkeit...@redhat.com wrote:

 There are xattrs on the top-level directory of the old brick volume; gluster 
 detects them, and that is what causes this.

 I personally always create my bricks on a subdir. If you do that you can 
 simply rmdir/mkdir the directory when you want to delete a gluster volume.

 You can clear the xattrs or nuke it from orbit with mkfs on the volume 
 device.


 - Original Message -
 From: Lonni J Friedman netll...@gmail.com
 To: gluster-users@gluster.org
 Sent: Tuesday, September 18, 2012 2:03:35 PM
 Subject: [Gluster-users] cannot create a new volume with a brick that used to 
 be part of a deleted volume?

 Greetings,
 I'm running v3.3.0 on Fedora16-x86_64.  I used to have a replicated
 volume on two bricks.  This morning I deleted it successfully:
 
 [root@farm-ljf0 ~]# gluster volume stop gv0
 Stopping volume will make its data inaccessible. Do you want to
 continue? (y/n) y
 Stopping volume gv0 has been successful
 [root@farm-ljf0 ~]# gluster volume delete gv0
 Deleting volume will erase all information about the volume. Do you
 want to continue? (y/n) y
 Deleting volume gv0 has been successful
 [root@farm-ljf0 ~]# gluster volume info all
 No volumes present
 

 I then attempted to create a new volume using the same bricks that
 used to be part of the (now) deleted volume, but it keeps refusing and
 failing, claiming that the brick is already part of a volume:
 
 [root@farm-ljf1 ~]# gluster volume create gv0 rep 2 transport tcp
 10.31.99.165:/mnt/sdb1 10.31.99.166:/mnt/sdb1
 /mnt/sdb1 or a prefix of it is already part of a volume
 [root@farm-ljf1 ~]# gluster volume info all
 No volumes present
 

 Note farm-ljf0 is 10.31.99.165 and farm-ljf1 is 10.31.99.166.  I also
 tried restarting glusterd (and glusterfsd) hoping that might clear
 things up, but it had no impact.

 How can /mnt/sdb1 be part of a volume when there are no volumes present?
 Is this a bug, or am I just missing something obvious?

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] glusterd vs. glusterfsd

2012-09-18 Thread Lonni J Friedman
On Tue, Sep 18, 2012 at 11:30 AM, Kaleb Keithley kkeit...@redhat.com wrote:

 If you mean you're using RPMs from my fedorapeople.org repo, those are not 
 official. I put them there to be helpful, that's about it.

yes, those.  thanks for maintaining them, they are great!


 With those RPMs you need both init scripts, but as an admin you should only 
 ever use the glusterd script.

So I should only set glusterd to run at boot, and ignore glusterfsd altogether?




 - Original Message -
 From: Lonni J Friedman netll...@gmail.com
 To: gluster-users@gluster.org
 Sent: Tuesday, September 18, 2012 2:06:29 PM
 Subject: [Gluster-users] glusterd vs. glusterfsd

 I'm running version 3.3.0 on Fedora16-x86_64.  The official(?) RPMs
 ship two init scripts, glusterd and glusterfsd.  I've googled a bit,
 and I can't figure out what the purpose is for each of them.  I know
 that I need one of them, but I can't tell which for sure.  There's no
 man page for either, and running them with --help returns the same
 exact output.  Do they have separate purposes?  Do I only need one or
 both running on the bricks?

 thanks
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] glusterd vs. glusterfsd

2012-09-18 Thread Lonni J Friedman
ok, thanks.

On Tue, Sep 18, 2012 at 11:51 AM, Kaleb Keithley kkeit...@redhat.com wrote:

 The RPM install does a `chkconfig --add ...`; they will start after a reboot 
 without any additional steps on your part.

 The only thing you need to do after an install is `service glusterd start`.
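
 So on a fresh install the whole dance is roughly (a sketch; the SysV tools
 below still work on Fedora 16, which wraps init scripts with systemd):

 chkconfig glusterd on     # redundant if the RPM already ran chkconfig --add, but harmless
 service glusterd start
 service glusterd status   # confirm the management daemon is up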

 - Original Message -
 From: Lonni J Friedman netll...@gmail.com
 To: Kaleb Keithley kkeit...@redhat.com
 Cc: gluster-users@gluster.org
 Sent: Tuesday, September 18, 2012 2:31:54 PM
 Subject: Re: [Gluster-users] glusterd vs. glusterfsd

 On Tue, Sep 18, 2012 at 11:30 AM, Kaleb Keithley kkeit...@redhat.com wrote:

 If you mean you're using RPMs from my fedorapeople.org repo, those are not 
 official. I put them there to be helpful, that's about it.

 yes, those.  thanks for maintaining them, they are great!


 With those RPMs you need both init scripts, but as an admin you should 
 only ever use the glusterd script.

 So I should only set glusterd to run at boot, and ignore glusterfsd 
 altogether?




 - Original Message -
 From: Lonni J Friedman netll...@gmail.com
 To: gluster-users@gluster.org
 Sent: Tuesday, September 18, 2012 2:06:29 PM
 Subject: [Gluster-users] glusterd vs. glusterfsd

 I'm running version 3.3.0 on Fedora16-x86_64.  The official(?) RPMs
 ship two init scripts, glusterd and glusterfsd.  I've googled a bit,
 and I can't figure out what the purpose is for each of them.  I know
 that I need one of them, but I can't tell which for sure.  There's no
 man page for either, and running them with --help returns the same
 exact output.  Do they have separate purposes?  Do I only need one or
 both running on the bricks?

 thanks
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] Setting xattrs failed

2012-09-18 Thread Bryan Whitehead
I'd open a reproducible bug on bugzilla.redhat.com
https://bugzilla.redhat.com/page.cgi?id=browse.html&tab=&product=GlusterFS&bug_status=open
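
Before filing, one quick check worth running directly on a brick (a sketch, not
from the thread; it assumes the brick behind /mnt/bacula is mounted there): make
sure the trusted.* xattr namespace that gluster uses actually works on that
filesystem, since user.* xattrs succeeding with setfattr does not guarantee it:

# as root, on the brick filesystem itself
touch /mnt/bacula/xattr-test
setfattr -n trusted.glusterfs.test -v check /mnt/bacula/xattr-test
getfattr -n trusted.glusterfs.test /mnt/bacula/xattr-test
rm /mnt/bacula/xattr-test
# and double-check the brick's mount options
grep bacula /proc/mounts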


On Mon, Sep 17, 2012 at 1:28 AM, Jan Krajdl s...@spamik.cz wrote:
 Bump. Does anybody have any idea? It's quite critical for me and I don't
 know what to try next...

 Thanks,

 --
 Jan Krajdl


 Dne 13.9.2012 23:54, Jan Krajdl napsal(a):
 Hi,

 I have a problem with glusterfs 3.3.0. I have a 4-node cluster with several
 volumes. All bricks are ext4 filesystems, there is no selinux, and writing extended
 attributes with setfattr works fine. But in the brick logs I see messages
 like this:
 [2012-09-13 12:50:17.428402] E [posix.c:857:posix_mknod]
 0-bacula-strip-posix: setting xattrs on /mnt/bacula/fff failed
 (Operation not supported)
 every time I create a file on a mounted volume. Glusterfs was upgraded
 from version 3.2.1. I can see this error on a replicated volume that has
 existed since version 3.2.1 and on a stripe volume that was created after
 the upgrade to 3.3.0, but on a one-brick volume the error doesn't appear. The
 stripe volume shows some other strange behaviour as well, but I think it's
 related to this xattr issue.

 On a created file, getfattr shows the trusted.gfid attribute set. The brick
 root itself has the trusted.gfid and trusted.glusterfs.volume-id attributes.
 According to the logs, this problem started after the upgrade to 3.3.0; on
 version 3.2.1 these errors did not appear in the logs.

 Could you please help me with solving this problem? Thanks,






 ___
 Gluster-users mailing list
 Gluster-users@gluster.org
 http://gluster.org/cgi-bin/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] XFS and MD RAID

2012-09-18 Thread Brian Candler
On Mon, Sep 10, 2012 at 09:29:25AM +0800, Jack Wang wrote:
 Hi Brian,
 
 below patch should fix your bug.
 
 John reports:
  BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202]  [..]  Call Trace:
   [8141782a] scsi_remove_target+0xda/0x1f0
   [81421de5] sas_rphy_remove+0x55/0x60
   [81421e01] sas_rphy_delete+0x11/0x20
   [81421e35] sas_port_delete+0x25/0x160
   [814549a3] mptsas_del_end_device+0x183/0x270
 
 ...introduced by commit 3b661a9 [SCSI] fix hot unplug vs async scan race.

I raised an Ubuntu bug which references this information and patch at
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1049013

I have now been asked:

Can you provide some information on the status of the patch with regards
to getting it merged upstream?  Has it been sent upstream, what sort of
feedback has it received, is it getting applied to a subsystem
maintainer's tree, etc?

Do you have any info on this?

Thanks,

Brian.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


[Gluster-users] NFS over gluster stops responding under write load

2012-09-18 Thread Lonni J Friedman
Greetings,
I'm running version 3.3.0 on Fedora16-x86_64.  I have two bricks setup
with a volume doing basic replication on an XFS formatted filesystem.
I've NFS mounted the volume on a 3rd system, and invoked bonnie++ to
write to the NFS mount point.  After a few minutes, I noticed that
bonnie++ didn't seem to be generating any more progress output, at
which point I started checking assorted logs to see if anything was
wrong.  At that point, I saw that the client system was no longer able
to write to the NFS mount point, and dmesg (and /var/log/messages) was
spewing these warnings like crazy (dozens/second):
nfs: server 10.31.99.166 not responding, still trying

Those warnings started at 14:40:58 on the client system, but oddly
stopped a few seconds later at 14:41:04.  Here's the full bonnie++
output (/mnt/gv0 is where the gluster file system is mounted as an NFS
client):

[root@cuda-ljf0 ~]# bonnie++ -d /mnt/gv0 -u root
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...


Here's what's in the glusterfs logs at the moment:
##
# tail etc-glusterfs-glusterd.vol.log
[2012-09-18 14:54:39.026557] I
[glusterd-handler.c:542:glusterd_req_ctx_create] 0-glusterd: Received
op from uuid: 1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
[2012-09-18 14:54:39.029463] I
[glusterd-handler.c:1417:glusterd_op_stage_send_resp] 0-glusterd:
Responded to stage, ret: 0
[2012-09-18 14:54:46.993426] I
[glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume]
0-management: Received heal vol req for volume gv0
[2012-09-18 14:54:46.993503] E [glusterd-utils.c:277:glusterd_lock]
0-glusterd: Unable to get lock for uuid:
e9ce949d-8521-4868-ad1b-860e0ffd8768, lock held by:
1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
[2012-09-18 14:54:46.993520] E
[glusterd-handler.c:453:glusterd_op_txn_begin] 0-management: Unable to
acquire local lock, ret: -1
[2012-09-18 14:55:47.175521] I
[glusterd-handler.c:860:glusterd_handle_cli_get_volume] 0-glusterd:
Received get vol req
[2012-09-18 14:55:47.181048] I
[glusterd-handler.c:860:glusterd_handle_cli_get_volume] 0-glusterd:
Received get vol req
[2012-09-18 14:55:49.306776] I
[glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume]
0-management: Received heal vol req for volume gv0
[2012-09-18 14:55:49.306834] E [glusterd-utils.c:277:glusterd_lock]
0-glusterd: Unable to get lock for uuid:
e9ce949d-8521-4868-ad1b-860e0ffd8768, lock held by:
1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
[2012-09-18 14:55:49.306844] E
[glusterd-handler.c:453:glusterd_op_txn_begin] 0-management: Unable to
acquire local lock, ret: -1
# tail -f cli.log
[2012-09-18 14:55:47.176824] I
[cli-rpc-ops.c:479:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to
get vol: 0
[2012-09-18 14:55:47.180959] I
[cli-rpc-ops.c:732:gf_cli3_1_get_volume_cbk] 0-cli: Returning: 0
[2012-09-18 14:55:47.181128] I
[cli-rpc-ops.c:479:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to
get vol: 0
[2012-09-18 14:55:47.181167] I
[cli-rpc-ops.c:732:gf_cli3_1_get_volume_cbk] 0-cli: Returning: 0
[2012-09-18 14:55:47.181214] I [input.c:46:cli_batch] 0-: Exiting with: 0
[2012-09-18 14:55:49.244795] W
[rpc-transport.c:174:rpc_transport_load] 0-rpc-transport: missing
'option transport-type'. defaulting to socket
[2012-09-18 14:55:49.307054] I
[cli-rpc-ops.c:5905:gf_cli3_1_heal_volume_cbk] 0-cli: Received resp to
heal volume
[2012-09-18 14:55:49.307274] W [dict.c:2339:dict_unserialize]
(--/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xa5) [0x328ca10365]
(--/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
[0x328ca0f965] (--gluster(gf_cli3_1_heal_volume_cbk+0x1d4)
[0x4225e4]))) 0-dict: buf is null!
[2012-09-18 14:55:49.307289] E
[cli-rpc-ops.c:5930:gf_cli3_1_heal_volume_cbk] 0-: Unable to allocate
memory
[2012-09-18 14:55:49.307314] I [input.c:46:cli_batch] 0-: Exiting with: -1
##

I'd be happy to provide more if someone requests something specific.

Not sure what other information to provide at this point, but here's
the basics of the gluster setup:
##
# gluster volume info all

Volume Name: gv0
Type: Replicate
Volume ID: 200046fc-1b5f-460c-b54b-96932e31ed3c
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.31.99.165:/mnt/sdb1
Brick2: 10.31.99.166:/mnt/sdb1
# gluster volume heal gv0 info
operation failed
##

I just noticed that glusterfs seems to be rapidly heading towards OOM
territory.  The glusterfs daemon is currently consuming 90% of MEM
according to top.
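
For figuring out where that memory is going, a statedump is the usual next step
(a sketch; 'gluster volume statedump' exists in 3.3, and the dump files land in
/tmp by default, one per process, on each brick host):

gluster volume statedump gv0 all
ls -l /tmp/*.dump*        # inspect the mempool / inode sections for the growing counts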

thanks
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] NFS over gluster stops responding under write load

2012-09-18 Thread Lonni J Friedman
On Tue, Sep 18, 2012 at 2:59 PM, Lonni J Friedman netll...@gmail.com wrote:
 Greetings,
 I'm running version 3.3.0 on Fedora16-x86_64.  I have two bricks setup
 with a volume doing basic replication on an XFS formatted filesystem.
 I've NFS mounted the volume on a 3rd system, and invoked bonnie++ to
 write to the NFS mount point.  After a few minutes, I noticed that
 bonnie++ didn't seem to be generating any more progress output, at
 which point I started checking assorted logs to see if anything was
 wrong.  At that point, I saw that the client system was no longer able
 to write to the NFS mount point, and dmesg (and /var/log/messages) was
 spewing these warnings like crazy (dozens/second):
 nfs: server 10.31.99.166 not responding, still trying

 Those warnings started at 14:40:58 on the client system, but oddly
 stopped a few seconds later at 14:41:04.  Here's the full bonnie++
 output (/mnt/gv0 is where the gluster file system is mounted as an NFS
 client):
 
 [root@cuda-ljf0 ~]# bonnie++ -d /mnt/gv0 -u root
 Using uid:0, gid:0.
 Writing a byte at a time...done
 Writing intelligently...done
 Rewriting...
 

 Here's what's in the glusterfs logs at the moment:
 ##
 # tail etc-glusterfs-glusterd.vol.log
 [2012-09-18 14:54:39.026557] I
 [glusterd-handler.c:542:glusterd_req_ctx_create] 0-glusterd: Received
 op from uuid: 1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
 [2012-09-18 14:54:39.029463] I
 [glusterd-handler.c:1417:glusterd_op_stage_send_resp] 0-glusterd:
 Responded to stage, ret: 0
 [2012-09-18 14:54:46.993426] I
 [glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume]
 0-management: Received heal vol req for volume gv0
 [2012-09-18 14:54:46.993503] E [glusterd-utils.c:277:glusterd_lock]
 0-glusterd: Unable to get lock for uuid:
 e9ce949d-8521-4868-ad1b-860e0ffd8768, lock held by:
 1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
 [2012-09-18 14:54:46.993520] E
 [glusterd-handler.c:453:glusterd_op_txn_begin] 0-management: Unable to
 acquire local lock, ret: -1
 [2012-09-18 14:55:47.175521] I
 [glusterd-handler.c:860:glusterd_handle_cli_get_volume] 0-glusterd:
 Received get vol req
 [2012-09-18 14:55:47.181048] I
 [glusterd-handler.c:860:glusterd_handle_cli_get_volume] 0-glusterd:
 Received get vol req
 [2012-09-18 14:55:49.306776] I
 [glusterd-volume-ops.c:492:glusterd_handle_cli_heal_volume]
 0-management: Received heal vol req for volume gv0
 [2012-09-18 14:55:49.306834] E [glusterd-utils.c:277:glusterd_lock]
 0-glusterd: Unable to get lock for uuid:
 e9ce949d-8521-4868-ad1b-860e0ffd8768, lock held by:
 1d3fb6c7-f5eb-42e9-b2bc-48bd3ed09e62
 [2012-09-18 14:55:49.306844] E
 [glusterd-handler.c:453:glusterd_op_txn_begin] 0-management: Unable to
 acquire local lock, ret: -1
 # tail -f cli.log
 [2012-09-18 14:55:47.176824] I
 [cli-rpc-ops.c:479:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to
 get vol: 0
 [2012-09-18 14:55:47.180959] I
 [cli-rpc-ops.c:732:gf_cli3_1_get_volume_cbk] 0-cli: Returning: 0
 [2012-09-18 14:55:47.181128] I
 [cli-rpc-ops.c:479:gf_cli3_1_get_volume_cbk] 0-cli: Received resp to
 get vol: 0
 [2012-09-18 14:55:47.181167] I
 [cli-rpc-ops.c:732:gf_cli3_1_get_volume_cbk] 0-cli: Returning: 0
 [2012-09-18 14:55:47.181214] I [input.c:46:cli_batch] 0-: Exiting with: 0
 [2012-09-18 14:55:49.244795] W
 [rpc-transport.c:174:rpc_transport_load] 0-rpc-transport: missing
 'option transport-type'. defaulting to socket
 [2012-09-18 14:55:49.307054] I
 [cli-rpc-ops.c:5905:gf_cli3_1_heal_volume_cbk] 0-cli: Received resp to
 heal volume
 [2012-09-18 14:55:49.307274] W [dict.c:2339:dict_unserialize]
 (--/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xa5) [0x328ca10365]
 (--/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
 [0x328ca0f965] (--gluster(gf_cli3_1_heal_volume_cbk+0x1d4)
 [0x4225e4]))) 0-dict: buf is null!
 [2012-09-18 14:55:49.307289] E
 [cli-rpc-ops.c:5930:gf_cli3_1_heal_volume_cbk] 0-: Unable to allocate
 memory
 [2012-09-18 14:55:49.307314] I [input.c:46:cli_batch] 0-: Exiting with: -1
 ##

 I'd be happy to provide more if someone requests something specific.

 Not sure what other information to provide at this point, but here's
 the basics of the gluster setup:
 ##
 # gluster volume info all

 Volume Name: gv0
 Type: Replicate
 Volume ID: 200046fc-1b5f-460c-b54b-96932e31ed3c
 Status: Started
 Number of Bricks: 1 x 2 = 2
 Transport-type: tcp
 Bricks:
 Brick1: 10.31.99.165:/mnt/sdb1
 Brick2: 10.31.99.166:/mnt/sdb1
 # gluster volume heal gv0 info
 operation failed
 ##

 I just noticed that glusterfs seems to be rapidly heading towards OOM
 territory.  The glusterfs daemon is currently consuming 90% of MEM
 according to top.

I just attempted to shut down the glusterd service, and it ran off a
cliff.  The OOM killer kicked in and killed it.  From dmesg:
#
[ 4151.733182] glusterfsd invoked oom-killer: gfp_mask=0x201da,
order=0, oom_adj=0, oom_score_adj=0
[ 4151.733186] glusterfsd cpuset=/ mems_allowed=0
[ 4151.733189] Pid: 2567, comm: glusterfsd 

[Gluster-users] Geo replication gluster 3.3 error

2012-09-18 Thread Chandan Kumar
Hi All,

[root@vm1 vol]# gluster volume geo-replication vol root@slave:/data/replication status
MASTER                SLAVE                            STATUS
vol                   root@slave:/data/replication    faulty
[root@vm1 vol]#


I am setting up geo-replication on gluster 3.3 for the first time. I
used http://repos.fedorapeople.org/repos/kkeithle/glusterfs/HOWTO.UFO to
set it up. I followed the same steps as in the admin guide; however,
every time it shows faulty and I am getting the following logs over ssh.


My setup is 3 virtual machines: 2 are running as server nodes and the third is
the geo site. I have installed geo-replication on all the nodes and kept the
gsyncd file at both /usr/local/libexec/glusterfs and /usr/libexec/glusterfs on
all servers.

tailf
/var/log/glusterfs/geo-replication/vol/ssh%3A%2F%2Froot%4010.2.3.35%3Afile%3A%2F%2F%2Fdata%2Freplication.log

[2012-09-18 15:34:12.593647] I [monitor(monitor):80:monitor] Monitor:

[2012-09-18 15:34:12.594135] I [monitor(monitor):81:monitor] Monitor:
starting gsyncd worker
[2012-09-18 15:34:12.633137] I [gsyncd:354:main_i] top: syncing:
gluster://localhost:vol -> ssh://root@slave:/data/replication


[2012-09-18 15:34:13.803124] E [syncdutils:173:log_raise_exception] top:
connection to peer is broken
[2012-09-18 15:34:13.809307] E [resource:181:errfail] Popen: command ssh
-oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
/var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S
/tmp/gsyncd-aux-ssh-oSbzAm/gsycnd-ssh-%r@%h:%p
root@slave /usr/local/libexec/glusterfs/gsyncd --session-owner
9e3bec63-69b6-4df2-96a7-3ff42693fc33 -N --listen --timeout 120
file:///data/replication returned with 1, saying:
[2012-09-18 15:34:13.809423] E [resource:184:errfail] Popen: ssh
[2012-09-18 15:34:12.754773] W [rpc-transport.c:174:rpc_transport_load]
0-rpc-transport: missing 'option transport-type'. defaulting to socket
[2012-09-18 15:34:13.809514] E [resource:184:errfail] Popen: ssh
[2012-09-18 15:34:12.792495] E [socket.c:1715:socket_connect_finish]
0-glusterfs: connection to  failed (Connection refused)
[2012-09-18 15:34:13.809608] E [resource:184:errfail] Popen: ssh
[2012-09-18 15:34:13.792763] I [cli-cmd.c:145:cli_cmd_process] 0-: Exiting
with: 110
[2012-09-18 15:34:13.809689] E [resource:184:errfail] Popen: ssh gsyncd
initializaion failed
[2012-09-18 15:34:13.809832] I [syncdutils:142:finalize] top: exiting.


I am not sure what I am missing.
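
One thing that commonly produces exactly this faulty/Popen failure is the gsyncd
path the master invokes on the slave. A sketch of checking and pinning it (the
remote-gsyncd option is part of the 3.3 geo-replication CLI; the /usr/libexec
path assumes the RPM layout on the slave):

gluster volume geo-replication vol root@slave:/data/replication config remote-gsyncd
gluster volume geo-replication vol root@slave:/data/replication config remote-gsyncd /usr/libexec/glusterfs/gsyncd
gluster volume geo-replication vol root@slave:/data/replication stop
gluster volume geo-replication vol root@slave:/data/replication start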


Thanks
Chandan
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


Re: [Gluster-users] XFS and MD RAID

2012-09-18 Thread Jack Wang
2012/9/19 Brian Candler b.cand...@pobox.com:
 On Mon, Sep 10, 2012 at 09:29:25AM +0800, Jack Wang wrote:
 Hi Brian,

 below patch should fix your bug.

 John reports:
  BUG: soft lockup - CPU#2 stuck for 23s! [kworker/u:8:2202]  [..]  Call 
 Trace:
   [8141782a] scsi_remove_target+0xda/0x1f0
   [81421de5] sas_rphy_remove+0x55/0x60
   [81421e01] sas_rphy_delete+0x11/0x20
   [81421e35] sas_port_delete+0x25/0x160
   [814549a3] mptsas_del_end_device+0x183/0x270

 ...introduced by commit 3b661a9 [SCSI] fix hot unplug vs async scan race.

 I raised an Ubuntu bug which references this information and patch at
 https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1049013

 I have now been asked:

 Can you provide some information on the status of the patch with regards
 to getting it merged upstream?  Has it been sent upstream, what sort of
 feedback has it received, is it getting applied to a subsystem
 maintainer's tree, etc?

 Do you have any info on this?

 Thanks,

 Brian.

Hi Brian,

The patch has not been applied to the subsystem maintainer's tree yet; James
may be busy with other things. You can send mail to James
(james.bottom...@hansenpartnership.com) and the linux-scsi list
(linux-s...@vger.kernel.org) to push this bug fix to be included in
mainline.

Jack
___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users