On 12/14/2011 03:06 PM, Changliang Chen wrote:
Hi, we have used GlusterFS for two years. After upgrading to 3.2.5, we found that when one node of a replica pair reboots and starts the glusterd daemon, Gluster crashes because CPU usage on the other node of the replica pair reaches 100%.

Our gluster info:

Type: Distributed-Replicate
Status: Started
Number of Bricks: 5 x 2 = 10
Transport-type: tcp
Options Reconfigured:
performance.cache-size: 3GB
performance.cache-max-file-size: 512KB
network.frame-timeout: 30
network.ping-timeout: 25
cluster.min-free-disk: 10%

Our device:

Dell R710
6 x 600 GB SAS disks
3 x 8 GB RAM

The error info:

[2011-12-14 13:24:10.483812] E [rdma.c:4813:init] 0-rdma.management: Failed to initialize IB Device
[2011-12-14 13:24:10.483828] E [rpc-transport.c:742:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2011-12-14 13:24:10.483841] W [rpcsvc.c:1288:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2011-12-14 13:24:11.967621] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-0
[2011-12-14 13:24:11.967665] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-1
[2011-12-14 13:24:11.967681] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-2
[2011-12-14 13:24:11.967695] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-3
[2011-12-14 13:24:11.967709] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-4
[2011-12-14 13:24:11.967723] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-5
[2011-12-14 13:24:11.967736] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-6
[2011-12-14 13:24:11.967750] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-7
[2011-12-14 13:24:11.967764] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-8
[2011-12-14 13:24:11.967777] E [glusterd-store.c:1820:glusterd_store_retrieve_volume] 0-: Unknown key: brick-9
[2011-12-14 13:24:12.465565] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.17:1013)
[2011-12-14 13:24:12.465623] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.8:1013)
[2011-12-14 13:24:12.465656] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.10:1013)
[2011-12-14 13:24:12.465686] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.11:1013)
[2011-12-14 13:24:12.465716] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.125:1013)
[2011-12-14 13:24:12.633288] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.65:1006)
[2011-12-14 13:24:13.138150] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.1:1013)
[2011-12-14 13:24:13.284665] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.3:1013)
[2011-12-14 13:24:15.790805] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.8:1013)
[2011-12-14 13:24:16.113430] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.125:1013)
[2011-12-14 13:24:16.259040] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.10:1013)
[2011-12-14 13:24:16.392058] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.17:1013)
[2011-12-14 13:24:16.429444] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.11:1013)
[2011-12-14 13:26:05.787680] W [glusterfsd.c:727:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x37c8ed3c2d] (-->/lib64/libpthread.so.0 [0x37c96064a7] (-->/opt/glusterfs/3.2.5/sbin/glusterd(glusterfs_sigwaiter+0x17c) [0x40477c]))) 0-: received signum (15), shutting down

--

Regards,

Cocl



_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Hi Changliang,
Could you specify which process crashed: glusterd or glusterfs? Could you also provide the stack trace from that process's log file? I don't see any stack trace in the logs you have provided.
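One rough way to check a log excerpt for this is to parse it. The sketch below is hypothetical: the `summarize` helper and its regular expressions are introduced here for illustration and are not part of GlusterFS. It assumes every entry follows the `[timestamp] LEVEL [file.c:line:function] message` layout seen in the log above. The helper counts entries per severity letter and collects any `received signum (N)` values.

```python
import re
import signal
from collections import Counter

# Assumed entry layout, taken from the log excerpt in this thread:
#   [YYYY-MM-DD HH:MM:SS.micro] LEVEL [file.c:line:function] rest-of-entry
ENTRY_RE = re.compile(r"\[([\d-]+ [\d:.]+)\] ([A-Z]) \[([^\]]+)\] (.*)")
# Matches the shutdown message written by cleanup_and_exit in the excerpt.
SIGNUM_RE = re.compile(r"received signum \((\d+)\)")


def summarize(lines):
    """Count entries per severity letter and collect any logged signal numbers."""
    levels = Counter()
    signals = []
    for line in lines:
        m = ENTRY_RE.match(line.strip())
        if m is None:
            continue  # not an entry header in the assumed layout; skip it
        levels[m.group(2)] += 1
        sig = SIGNUM_RE.search(m.group(4))
        if sig:
            signals.append(int(sig.group(1)))
    return levels, signals


if __name__ == "__main__":
    # A few entries copied from the log above.
    excerpt = [
        "[2011-12-14 13:24:10.483812] E [rdma.c:4813:init] 0-rdma.management: Failed to initialize IB Device",
        "[2011-12-14 13:24:12.465565] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (10.1.1.17:1013)",
        "[2011-12-14 13:26:05.787680] W [glusterfsd.c:727:cleanup_and_exit] (-->/lib64/libc.so.6(clone+0x6d) [0x37c8ed3c2d] (-->/lib64/libpthread.so.0 [0x37c96064a7] (-->/opt/glusterfs/3.2.5/sbin/glusterd(glusterfs_sigwaiter+0x17c) [0x40477c]))) 0-: received signum (15), shutting down",
    ]
    levels, signals = summarize(excerpt)
    print(dict(levels))
    for num in signals:
        # signal.Signals maps the number to its symbolic name on this platform.
        print(num, signal.Signals(num).name)
```

On these entries the helper finds only error and warning lines plus a single signal, 15. On Linux that is SIGTERM, which is a termination request rather than a fault signal such as SIGSEGV. That is consistent with no stack trace appearing in this excerpt, but the log of the process that actually crashed would still be needed to confirm it.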

Pranith