Hi,
Can anybody help me with this?
On Thu, Nov 27, 2014 at 9:29 AM, Punit Dambiwal wrote:
> Hi Kaushal,
>
> Thanks for the detailed reply. Let me explain my setup first:
>
> 1. oVirt Engine
> 2. 4 hosts that also act as storage machines (host and Gluster combined)
> 3. Every host has 24 bricks
>
> Now, whenever a host machine reboots, it comes up but cannot rejoin the
> cluster, and it throws the following error: "Gluster command []
> failed on server.."
>
> Please see my comments inline:
>
> 1. Use the same string for doing the peer probe and for the brick address
> during volume create/add-brick. Ideally, we suggest you use properly
> resolvable FQDNs everywhere. If that is not possible, then use only IP
> addresses. Try to avoid short names.
> ---
> [root@cpu05 ~]# gluster peer status
> Number of Peers: 3
>
> Hostname: cpu03.stack.com
> Uuid: 5729b8c4-e80d-4353-b456-6f467bddbdfb
> State: Peer in Cluster (Connected)
>
> Hostname: cpu04.stack.com
> Uuid: d272b790-c4b2-4bed-ba68-793656e6d7b0
> State: Peer in Cluster (Connected)
> Other names:
> 10.10.0.8
>
> Hostname: cpu02.stack.com
> Uuid: 8d8a7041-950e-40d0-85f9-58d14340ca25
> State: Peer in Cluster (Connected)
> [root@cpu05 ~]#
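
One way to check point 1 is to compare the host part of every brick address with the names glusterd knows for its peers. The following is only a sketch: the helper names brick_hosts and peer_names are made up for illustration, and it assumes the usual "BrickN: host:/path" lines from `gluster volume info` and the "Hostname:" / "Other names:" layout shown above. IPv6 brick addresses are not handled.

```shell
#!/bin/sh
# Hypothetical helpers (not part of gluster) for comparing brick hosts
# against peer names.

# Read `gluster volume info` output on stdin and print the unique host
# part of every "BrickN: host:/path" line.
brick_hosts() {
    awk '/^Brick[0-9]+:/ { split($2, parts, ":"); print parts[1] }' | sort -u
}

# Read `gluster peer status` output on stdin and print every known peer
# name, including the entries listed under "Other names:".
peer_names() {
    awk '/^Hostname:/        { print $2; other = 0; next }
         /^Other names:/     { other = 1; next }
         /^(Uuid|State):/    { other = 0; next }
         other && NF         { print $1 }' | sort -u
}

# Possible usage on a cluster node (the file paths are arbitrary):
#   gluster volume info | brick_hosts > /tmp/brick_hosts
#   gluster peer status | peer_names  > /tmp/peer_names
#   comm -23 /tmp/brick_hosts /tmp/peer_names
```

Any brick host printed by the final `comm` is not among the peer names. Note that `gluster peer status` does not list the local node, so the local node's own name is expected to show up in that difference.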
>
> 2. During boot up, make sure to launch glusterd only after the network is
> up. This will allow the new peer identification mechanism to do its
> job correctly.
> >> I think the service itself is already doing this:
>
> [root@cpu05 ~]# cat /usr/lib/systemd/system/glusterd.service
> [Unit]
> Description=GlusterFS, a clustered file-system server
> After=network.target rpcbind.service
> Before=network-online.target
>
> [Service]
> Type=forking
> PIDFile=/var/run/glusterd.pid
> LimitNOFILE=65536
> ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid
> KillMode=process
>
> [Install]
> WantedBy=multi-user.target
> [root@cpu05 ~]#
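
Note that the unit above orders glusterd after network.target and before network-online.target, so systemd does not wait for the network to be fully up before starting it. One possible adjustment, given here only as a sketch, is to place an edited copy of the unit under /etc/systemd/system, which takes precedence over the copy in /usr/lib/systemd/system. The function name fix_unit_ordering is hypothetical. The sketch assumes the unit contains exactly the two ordering lines shown above, that a wait-online service (for example NetworkManager-wait-online.service) is enabled so that network-online.target is meaningful, and that nothing else on the host relies on glusterd starting before network-online.target.

```shell
#!/bin/sh
# Hypothetical helper: read a glusterd unit file on stdin and print a copy
# that starts glusterd only after network-online.target is reached.
# An edited full copy is used rather than a drop-in, because ordering
# dependencies such as Before= can be added but not removed by drop-ins.
fix_unit_ordering() {
    sed -e 's/^After=network\.target rpcbind\.service$/After=network-online.target rpcbind.service/' \
        -e 's/^Before=network-online\.target$/Wants=network-online.target/'
}

# Possible usage as root:
#   fix_unit_ordering < /usr/lib/systemd/system/glusterd.service \
#       > /etc/systemd/system/glusterd.service
#   systemctl daemon-reload
```

After a reboot, `systemctl show -p After glusterd.service` should list network-online.target if the override was picked up.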
>
>
> Gluster logs:
>
> [2014-11-24 09:22:22.147471] I [MSGID: 100030] [glusterfsd.c:2018:main]
> 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.6.1
> (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
> [2014-11-24 09:22:22.151565] I [glusterd.c:1214:init] 0-management:
> Maximum allowed open file descriptors set to 65536
> [2014-11-24 09:22:22.151599] I [glusterd.c:1259:init] 0-management: Using
> /var/lib/glusterd as working directory
> [2014-11-24 09:22:22.155216] W [rdma.c:4195:__gf_rdma_ctx_create]
> 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such device)
> [2014-11-24 09:22:22.155264] E [rdma.c:4483:init] 0-rdma.management:
> Failed to initialize IB Device
> [2014-11-24 09:22:22.155285] E [rpc-transport.c:333:rpc_transport_load]
> 0-rpc-transport: 'rdma' initialization failed
> [2014-11-24 09:22:22.155354] W [rpcsvc.c:1524:rpcsvc_transport_create]
> 0-rpc-service: cannot create listener, initing the transport failed
> [2014-11-24 09:22:22.156290] I
> [glusterd.c:413:glusterd_check_gsync_present] 0-glusterd: geo-replication
> module not installed in the system
> [2014-11-24 09:22:22.161318] I
> [glusterd-store.c:2043:glusterd_restore_op_version] 0-glusterd: retrieved
> op-version: 30600
> [2014-11-24 09:22:22.821800] I
> [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management:
> connect returned 0
> [2014-11-24 09:22:22.825810] I
> [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management:
> connect returned 0
> [2014-11-24 09:22:22.828705] I
> [glusterd-handler.c:3146:glusterd_friend_add_from_peerinfo] 0-management:
> connect returned 0
> [2014-11-24 09:22:22.828771] I [rpc-clnt.c:969:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> [2014-11-24 09:22:22.832670] I [rpc-clnt.c:969:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> [2014-11-24 09:22:22.835919] I [rpc-clnt.c:969:rpc_clnt_connection_init]
> 0-management: setting frame-timeout to 600
> [2014-11-24 09:22:22.840209] E
> [glusterd-store.c:4248:glusterd_resolve_all_bricks] 0-glusterd: resolve
> brick failed in restore
> [2014-11-24 09:22:22.840233] E [xlator.c:425:xlator_init] 0-management:
> Initialization of volume 'management' failed, review your volfile again
> [2014-11-24 09:22:22.840245] E [graph.c:322:glusterfs_graph_init]
> 0-management: initializing translator failed
> [2014-11-24 09:22:22.840264] E [graph.c:525:glusterfs_graph_activate]
> 0-graph: init failed
> [2014-11-24 09:22:22.840754] W [glusterfsd.c:1194:cleanup_and_exit] (-->
> 0-: received signum (0), shutting down
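
To pull only the error-level entries out of a long glusterd log, a small filter like the one below may help. It is a sketch: the function name error_entries is made up, and it assumes each entry sits on a single line in the log file itself (the lines above are wrapped by the mail client), with the severity letter as the third whitespace-separated field. The log path in the usage comment is an assumption and may differ on your installation.

```shell
#!/bin/sh
# Hypothetical helper: read a glusterd log on stdin and print only the
# entries whose severity field is "E" (error).
error_entries() {
    awk '$3 == "E"'
}

# Possible usage (log path is an assumption):
#   error_entries < /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
```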
>
> Thanks,
> Punit
>
>
>
>
> On Wed, Nov 26, 2014 at 7:14 PM, Kaushal M wrote:
>
>> Based on the logs, I can guess that glusterd is being started before
>> the network has come up and that the addresses given to bricks do not
>> directly match the addresses used during peer probe.
>>
>> The gluster_after_reboot log has the line "[2014-11-25
>> 06:46:09.972113] E [glusterd-store.c:2632:glusterd_resolve_all_bricks]
>> 0-glusterd: resolve brick failed in restore".
>>
>> Brick re