Re: [Gluster-users] Command "/etc/init.d/glusterd start" failed

2014-04-19 Thread Joe Julian
[2014-04-11 18:12:03.433371] E [rpc-transport.c:269:rpc_transport_load] 
0-rpc-transport: /usr/local/lib/glusterfs/3.4.3/rpc-transport/rdma.so: cannot 
open shared object file: No such file or directory

This simply means that rdma support wasn't installed. It always tries to load 
the rdma library. If it fails, there is no rdma support. 
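To double-check that this is the benign case, you can look at which transport modules are actually installed. The sketch below simulates the transport directory so it is safe to run anywhere; the real path is the one from the log above, and the exact directory layout is an assumption based on that log.

```shell
# Simulated rpc-transport directory; on a real install it would be
# /usr/local/lib/glusterfs/3.4.3/rpc-transport/ (the path in the log).
XPORT=$(mktemp -d)
touch "$XPORT/socket.so"   # tcp transport present
# rdma.so deliberately absent, mirroring a build without rdma support

if [ -e "$XPORT/rdma.so" ]; then
    RESULT="rdma available"
else
    RESULT="tcp only"      # the E line in the log is harmless in this case
fi
echo "$RESULT"
```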

On April 19, 2014 9:49:58 AM PDT, Chalcogen wrote:
>I have been plagued by errors of this kind every so often, mainly 
>because we are in a development phase and we reboot our servers so 
>frequently. If you start glusterd in debug mode:
>
>sh$ glusterd --debug
>
>you can easily pinpoint exactly which volume/peer data is causing the 
>initialization failure for mgmt/glusterd.
>
>In addition, from my own experience, two of the leading reasons for 
>failure are:
>a) bad peer data, left behind if glusterd is somehow killed during an 
>active peer probe operation; and
>b) if glusterd needs to update the info for a volume/brick (say, "info" 
>for volume testvol) in /var/lib/glusterd, it first renames 
>/var/lib/glusterd/vols/testvol/info to info.tmp, and then creates a new 
>info file, which is written afresh. If glusterd were to crash at this 
>point, glusterd startup would keep failing until this is resolved 
>manually. Usually, moving info.tmp back to info works for me.
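The recovery step described above can be sketched as a small shell check. It runs against a throwaway directory so it is safe to try; on a real server the state directory would be /var/lib/glusterd/vols/<volname>, "testvol" is just the name used in the text, and the key=value content written here is a made-up stand-in for the real info file.

```shell
# Throwaway stand-in for /var/lib/glusterd/vols/testvol.
VOLDIR="$(mktemp -d)/vols/testvol"
mkdir -p "$VOLDIR"
printf 'type=2\ncount=2\n' > "$VOLDIR/info.tmp"   # simulated crash leftover

# If "info" is missing or empty but "info.tmp" has content, promote the
# tmp file so glusterd can parse the volume again on the next start.
if [ ! -s "$VOLDIR/info" ] && [ -s "$VOLDIR/info.tmp" ]; then
    mv "$VOLDIR/info.tmp" "$VOLDIR/info"
fi
```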
>
>Thanks,
>Anirban
>
>On Saturday 12 April 2014 08:45 AM, 吴保川 wrote:
>> It is tcp.
>>
>> [root@server1 wbc]# gluster volume info
>>
>> Volume Name: gv_replica
>> Type: Replicate
>> Volume ID: 81014863-ee59-409b-8897-6485d411d14d
>> Status: Started
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: 192.168.1.3:/home/wbc/vdir/gv_replica
>> Brick2: 192.168.1.4:/home/wbc/vdir/gv_replica
>>
>> Volume Name: gv1
>> Type: Distribute
>> Volume ID: cfe2b8a0-284b-489d-a153-21182933f266
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: 192.168.1.4:/home/wbc/vdir/gv1
>> Brick2: 192.168.1.3:/home/wbc/vdir/gv1
>>
>> Thanks,
>> Baochuan Wu
>>
>>
>>
>> 2014-04-12 10:11 GMT+08:00 Nagaprasad Sathyanarayana 
>> <nsath...@redhat.com>:
>>
>> If you run
>>
>> # gluster volume info
>>
>> What is the value set for transport-type?
>>
>> Thanks
>> Naga
>>
>>
>>> On 12-Apr-2014, at 7:33 am, 吴保川 <wildpointe...@gmail.com> wrote:
>>> Thanks, Joe. I found that one of my machines had been assigned a
>>> wrong IP address, which led to the error.
>>> Originally, I thought the following error was the critical one:
>>> [2014-04-11 18:12:03.433371] E
>>> [rpc-transport.c:269:rpc_transport_load] 0-rpc-transport:
>>> /usr/local/lib/glusterfs/3.4.3/rpc-transport/rdma.so: cannot open
>>> shared object file: No such file or directory
>>>
>>>
>>> 2014-04-12 5:34 GMT+08:00 Joe Julian <j...@julianfamily.org>:
>>> On 04/11/2014 11:18 AM, 吴保川 wrote:
>>>
>>> [2014-04-11 18:12:05.165989] E
>>> [glusterd-store.c:2663:glusterd_resolve_all_bricks]
>>> 0-glusterd: resolve brick failed in restore
>>>
>>> I'm pretty sure that means that one of the bricks isn't
>>> resolved in your list of peers.
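One way to act on this is to compare each brick's host against the peer store. The sketch below fakes the /var/lib/glusterd layout so it can run anywhere; the hostname1= key follows the on-disk peer-file format as I remember it from 3.4, so treat the field names (and the UUID) as assumptions.

```shell
# Fake peer store; the real path is /var/lib/glusterd/peers/<peer-uuid>.
STATE=$(mktemp -d)
mkdir -p "$STATE/peers"
printf 'uuid=11111111-2222-3333-4444-555555555555\nstate=3\nhostname1=192.168.1.4\n' \
    > "$STATE/peers/11111111-2222-3333-4444-555555555555"

# Host part of a brick, as listed by "gluster volume info".
BRICK_HOST=192.168.1.4
if grep -qs "hostname1=$BRICK_HOST" "$STATE"/peers/*; then
    RESULT="peer entry found"
else
    RESULT="missing peer entry for $BRICK_HOST"
fi
echo "$RESULT"
```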
>>>
>>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org 
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>>
>
>
>
>
>

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [Gluster-users] Conflicting entries for symlinks between bricks (Trusted.gfid not consistent)

2014-04-19 Thread Joe Julian
What would really help is a clear list of steps to reproduce this issue. It 
sounds like a bug, but I can't reproduce it.

In your questions you ask, in relation to adding or removing bricks, whether you 
can continue to read and write. My understanding is that you're not actually doing 
that (gluster volume (add|remove)-brick) but rather just shutting a server down. If 
my understanding is correct, then yes: you should be able to continue normal 
operation. 
Repairing this issue is the same as healing split-brain. The easiest way is to 
use splitmount[1] to delete one of them.

[1] https://forge.gluster.org/splitmount
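If you'd rather fix it by hand than use splitmount, the usual manual route for this kind of conflict is: on the brick whose copy you decide to discard, delete both the file and its gfid link under .glusterfs, then let self-heal copy it back (e.g. via "gluster volume heal myVol"). The sketch below simulates that in a scratch directory: the brick path and file names come from the report below, the gfid value is made up, and note that for a symlink the .glusterfs entry is itself a symlink rather than a hard link.

```shell
# Scratch stand-in for the brick root (the real one below: /export/raid/myVol).
BRICK=$(mktemp -d)
GFID=6ba7b810-9dad-11d1-80b4-00c04fd430c8   # hypothetical gfid value
mkdir -p "$BRICK/images/myProject1/2.1_stale" "$BRICK/.glusterfs/6b/a7"
touch "$BRICK/images/myProject1/2.1_stale/latest_s"
ln "$BRICK/images/myProject1/2.1_stale/latest_s" "$BRICK/.glusterfs/6b/a7/$GFID"

# Remove BOTH the file and its gfid link; leaving the gfid link behind
# keeps the conflict alive. Afterwards, trigger a heal on the volume.
rm "$BRICK/images/myProject1/2.1_stale/latest_s"
rm "$BRICK/.glusterfs/6b/a7/$GFID"
```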



On April 17, 2014 4:37:32 PM PDT, "PEPONNET, Cyril (Cyril)" wrote:
>Hi gluster people !
>
>I would like some help regarding an issue we have with our early
>production glusterfs setup.
>
>Our Topology:
>
>2 Bricks in Replicate mode:
>
>[root@myBrick1 /]# cat /etc/redhat-release
>CentOS release 6.5 (Final)
>[root@myBrick1 /]# glusterfs --version
>glusterfs 3.4.2 built on Jan  3 2014 12:38:05
>Repository revision: git://git.gluster.com/glusterfs.git
>Copyright (c) 2006-2013 Red Hat, Inc. 
>
>
>[root@myBrick1 /]# gluster volume info
>
>Volume Name: myVol
>Type: Replicate
>Volume ID: 58f5d775-acb5-416d-bee6-5209f7b20363
>Status: Started
>Number of Bricks: 1 x 2 = 2
>Transport-type: tcp
>Bricks:
>Brick1: myBrick1.company.lan:/export/raid/myVol
>Brick2: myBrick2.company.lan:/export/raid/myVol
>Options Reconfigured:
>nfs.enable-ino32: on
>
>The issue:
>
>We powered down a brick (myBrick1) for hardware maintenance. When we
>powered it back up, issues started with some files (symlinks, in fact):
>auto-healing does not seem to work for all of the files…
>
>Let's take a look with one faulty symlink:
>
>Using fuse.glusterfs (sometimes it works, sometimes not):
>
>[root@myBrick2 /]mount
>...
>myBrick2.company.lan:/myVol on /images type fuse.glusterfs
>(rw,default_permissions,allow_other,max_read=131072)
>...
>
>[root@myBrick2 /]# stat /images/myProject1/2.1_stale/current
>  File: `/images/myProject1/2.1_stale/current' -> `current-59a77422'
>  Size: 16 Blocks: 0  IO Block: 131072 symbolic link
>Device: 13h/19d Inode: 11422905275486058235  Links: 1
>Access: (0777/lrwxrwxrwx)  Uid: (  499/ testlab)   Gid: (  499/
>testlab)
>Access: 2014-04-17 14:05:54.488238322 -0700
>Modify: 2014-04-16 19:46:05.033299589 -0700
>Change: 2014-04-17 14:05:54.487238322 -0700
>
>[root@myBrick2 /]# stat /images/myProject1/2.1_stale/current
>stat: cannot stat `/images/myProject1/2.1_stale/current': Input/output
>error
>
>I typed the above commands a few seconds apart.
>
>Let's try with the other brick
>
>[root@myBrick1 ~]mount
>...
>myBrick1.company.lan:/myVol on /images type fuse.glusterfs
>(rw,default_permissions,allow_other,max_read=131072)
>...
>
>[root@myBrick1 ~]# stat /images/myProject1/2.1_stale/current
>stat: cannot stat `/images/myProject1/2.1_stale/current': Input/output
>error
>
>With this one it always fails… (myBrick1 is the server we powered up
>after maintenance).
>
>Using nfs:
>
>It never works (tested with both bricks).
>
>[root@station-localdomain myProject1]# mount
>...
>myBrick1:/myVol on /images type nfs
>(rw,relatime,vers=3,rsize=8192,wsize=8192,namlen=255,hard,proto=tcp,timeo=14,retrans=2,sec=sys,mountaddr=10.0.0.57,mountvers=3,mountport=38465,mountproto=tcp,local_lock=none,addr=10.0.0.57)
>...
>
>[root@station-localdomain myProject1]# ls 2.1_stale
>ls: cannot access 2.1_stale: Input/output error
>
>In both cases here are the logs:
>
>==> /var/log/glusterfs/glustershd.log <==
>[2014-04-17 10:20:25.861003] I
>[afr-self-heal-entry.c:2253:afr_sh_entry_fix] 0-myVol-replicate-0:
>: Performing conservative
>merge
>[2014-04-17 10:20:25.895143] I
>[afr-self-heal-entry.c:2253:afr_sh_entry_fix] 0-myVol-replicate-0:
>: Performing conservative
>merge
>[2014-04-17 10:20:25.949176] I
>[afr-self-heal-entry.c:2253:afr_sh_entry_fix] 0-myVol-replicate-0:
>: Performing conservative
>merge
>[2014-04-17 10:20:25.995289] I
>[afr-self-heal-entry.c:2253:afr_sh_entry_fix] 0-myVol-replicate-0:
>: Performing conservative
>merge
>[2014-04-17 10:20:26.013995] I
>[afr-self-heal-entry.c:2253:afr_sh_entry_fix] 0-myVol-replicate-0:
>: Performing conservative
>merge
>[2014-04-17 10:20:26.050693] I
>[afr-self-heal-entry.c:2253:afr_sh_entry_fix] 0-myVol-replicate-0:
>: Performing conservative
>merge
>
>==> /var/log/glusterfs/usr-global.log <==
>[2014-04-17 10:20:38.281705] I
>[afr-self-heal-entry.c:2253:afr_sh_entry_fix] 0-myVol-replicate-0:
>/images/myProject1/2.1_stale: Performing conservative merge
>[2014-04-17 10:20:38.286986] W
>[afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0:
>/images/myProject1/2.1_stale/latest_s: gfid differs on subvolume 1
>[2014-04-17 10:20:38.287030] E
>[afr-self-heal-common.c:1433:afr_sh_common_lookup_cbk]
>0-myVol-replicate-0: Conflicting entries for
>/images/myProject1/2.1_stale/latest_s
>[2014-04-17 10:20:38.287169] W
>[afr-common.c:1505:afr_conflicting_iattrs] 0-myVol-replicate-0:
>/ima