Re: [Gluster-devel] Netbsd build failure
Emmanuel Dreyfus wrote:
> Yes, this is again a test corrupting random system files.
> I started rebuild of nbslave7[149] from image...

Done.

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Netbsd build failure
Avra Sengupta wrote:
> >> + '/opt/qa/build.sh'
> >>   File "/usr/pkg/lib/python2.7/site.py", line 601
> >> [2015-08-19 05:45:06.N]:++ G_LOG:./tests/basic/quota-anon-fd-nfs.t: TEST: 85 ! fd_write 3 content ++
> This particular test is currently marked as a bad test and I believe Vijaikumar
> is looking into it. Could you please check whether there is any other
> failure (apart from this) that is failing the regression runs.

Yes, this is again a test corrupting random system files. I started rebuild of nbslave7[149] from image...

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
Re: [Gluster-devel] Fresh NetBSD regression failures
Avra Sengupta wrote:
> All NetBSD regressions are again failing (more like refusing to
> build), with the following error.

Random files clobbered by G_LOG?

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
Re: [Gluster-devel] Skipped files during rebalance
Dear Susant,

Do you think the patch submitted by Rafi could help with this? The nodes are on the same network in the same rack and as such should have no connectivity issues. Is it possible that the processes on nodes 104 and 106 were too "busy" and unable to accept new connections?

Any help would be appreciated,

—
Christophe

Dr Christophe Trefois, Dipl.-Ing.
Technical Specialist / Post-Doc

UNIVERSITÉ DU LUXEMBOURG
LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
Campus Belval | House of Biomedicine
6, avenue du Swing
L-4367 Belvaux
T: +352 46 66 44 6124
F: +352 46 66 44 6949
http://www.uni.lu/lcsb

This message is confidential and may contain privileged information. It is intended for the named recipient only. If you receive it in error please notify me and permanently delete the original message and any copies.

> On 21 Aug 2015, at 14:57, Susant Palai wrote:
>
> Hi,
> Mostly the rebalance failures are due to network problems.
>
> Here is the log:
>
> [2015-08-16 20:31:36.301467] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/PA 27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/003002002.flex lookup failed
> [2015-08-16 20:31:36.921405] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/PA 27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/003004005.flex lookup failed
> [2015-08-16 20:31:36.921591] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/PA 27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/006004004.flex lookup failed
> [2015-08-16 20:31:36.921770] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/PA 27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/005004007.flex lookup failed
> [2015-08-16 20:31:37.577758] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/PA 27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/007004005.flex lookup failed
> [2015-08-16 20:34:12.387425] E [socket.c:2332:socket_connect_finish] 0-live-client-4: connection to 192.168.123.106:24007 failed (Connection refused)
> [2015-08-16 20:34:12.392820] E [socket.c:2332:socket_connect_finish] 0-live-client-5: connection to 192.168.123.106:24007 failed (Connection refused)
> [2015-08-16 20:34:12.398023] E [socket.c:2332:socket_connect_finish] 0-live-client-0: connection to 192.168.123.104:24007 failed (Connection refused)
> [2015-08-16 20:34:12.402904] E [socket.c:2332:socket_connect_finish] 0-live-client-2: connection to 192.168.123.104:24007 failed (Connection refused)
> [2015-08-16 20:34:12.407464] E [socket.c:2332:socket_connect_finish] 0-live-client-3: connection to 192.168.123.106:24007 failed (Connection refused)
> [2015-08-16 20:34:12.412249] E [socket.c:2332:socket_connect_finish] 0-live-client-1: connection to 192.168.123.104:24007 failed (Connection refused)
> [2015-08-16 20:34:12.416621] E [socket.c:2332:socket_connect_finish] 0-live-client-6: connection to 192.168.123.105:24007 failed (Connection refused)
> [2015-08-16 20:34:12.420906] E [socket.c:2332:socket_connect_finish] 0-live-client-8: connection to 192.168.123.105:24007 failed (Connection refused)
> [2015-08-16 20:34:12.425066] E [socket.c:2332:socket_connect_finish] 0-live-client-7: connection to 192.168.123.105:24007 failed (Connection refused)
> [2015-08-16 20:34:17.479925] E [socket.c:2332:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
> [2015-08-16 20:36:23.788206] E [MSGID: 101075] [common-utils.c:314:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or service not known)
> [2015-08-16 20:36:23.788286] E [name.c:247:af_inet_client_get_remote_sockaddr] 0-live-client-4: DNS resolution failed on host stor106
> [2015-08-16 20:36:23.788387] E [name.c:247:af_inet_client_get_remote_sockaddr] 0-live-client-5: DNS resolution failed on host stor106
> [2015-08-16 20:36:23.788918] E [name.c:247:af_inet_client_get_remote_sockaddr] 0-live-client-0: DNS resolution failed on host stor104
> [2015-08-16 20:36:23.789233] E [name.c:247:af_inet_client_get_remote_sockaddr] 0-live-client-2: DNS resolution failed on host stor104
> [2015-08-16 20:36:23.789295] E [name.c:247:af_inet_client_get_remote_sockaddr] 0-live-client-3: DNS resolution failed on host stor106
>
> For the high mem usage part I will try to run rebalance and analyze. In the meantime it will be helpful if you can take a state dump of the rebalance process when it is using high RAM.
>
> Here are the steps to take the state dump.
>
> 1. Find your state-dump destination
[Gluster-devel] Fresh NetBSD regression failures
Hi,

All NetBSD regressions are again failing (more like refusing to build), with the following error.

[2015-08-21 10:53:51.N]:++ G_LOG:./tests/basic/meta.t: TEST: 18 Started volinfo_field patchy Status ++

Is someone aware of this issue? Right now no NetBSD regressions are running because of this.

Regards,
Avra
Re: [Gluster-devel] Skipped files during rebalance
Hi,

Mostly the rebalance failures are due to network problems.

Here is the log:

[2015-08-16 20:31:36.301467] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/PA 27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/003002002.flex lookup failed
[2015-08-16 20:31:36.921405] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/PA 27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/003004005.flex lookup failed
[2015-08-16 20:31:36.921591] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/PA 27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/006004004.flex lookup failed
[2015-08-16 20:31:36.921770] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/PA 27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/005004007.flex lookup failed
[2015-08-16 20:31:37.577758] E [MSGID: 109023] [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file failed:/hcs/hcs/OperaArchiveCol/PA 27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/007004005.flex lookup failed
[2015-08-16 20:34:12.387425] E [socket.c:2332:socket_connect_finish] 0-live-client-4: connection to 192.168.123.106:24007 failed (Connection refused)
[2015-08-16 20:34:12.392820] E [socket.c:2332:socket_connect_finish] 0-live-client-5: connection to 192.168.123.106:24007 failed (Connection refused)
[2015-08-16 20:34:12.398023] E [socket.c:2332:socket_connect_finish] 0-live-client-0: connection to 192.168.123.104:24007 failed (Connection refused)
[2015-08-16 20:34:12.402904] E [socket.c:2332:socket_connect_finish] 0-live-client-2: connection to 192.168.123.104:24007 failed (Connection refused)
[2015-08-16 20:34:12.407464] E [socket.c:2332:socket_connect_finish] 0-live-client-3: connection to 192.168.123.106:24007 failed (Connection refused)
[2015-08-16 20:34:12.412249] E [socket.c:2332:socket_connect_finish] 0-live-client-1: connection to 192.168.123.104:24007 failed (Connection refused)
[2015-08-16 20:34:12.416621] E [socket.c:2332:socket_connect_finish] 0-live-client-6: connection to 192.168.123.105:24007 failed (Connection refused)
[2015-08-16 20:34:12.420906] E [socket.c:2332:socket_connect_finish] 0-live-client-8: connection to 192.168.123.105:24007 failed (Connection refused)
[2015-08-16 20:34:12.425066] E [socket.c:2332:socket_connect_finish] 0-live-client-7: connection to 192.168.123.105:24007 failed (Connection refused)
[2015-08-16 20:34:17.479925] E [socket.c:2332:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
[2015-08-16 20:36:23.788206] E [MSGID: 101075] [common-utils.c:314:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or service not known)
[2015-08-16 20:36:23.788286] E [name.c:247:af_inet_client_get_remote_sockaddr] 0-live-client-4: DNS resolution failed on host stor106
[2015-08-16 20:36:23.788387] E [name.c:247:af_inet_client_get_remote_sockaddr] 0-live-client-5: DNS resolution failed on host stor106
[2015-08-16 20:36:23.788918] E [name.c:247:af_inet_client_get_remote_sockaddr] 0-live-client-0: DNS resolution failed on host stor104
[2015-08-16 20:36:23.789233] E [name.c:247:af_inet_client_get_remote_sockaddr] 0-live-client-2: DNS resolution failed on host stor104
[2015-08-16 20:36:23.789295] E [name.c:247:af_inet_client_get_remote_sockaddr] 0-live-client-3: DNS resolution failed on host stor106

For the high mem usage part I will try to run rebalance and analyze. In the meantime it will be helpful if you can take a state dump of the rebalance process when it is using high RAM.

Here are the steps to take the state dump.

1. Find your state-dump destination; run "gluster --print-statedumpdir". The state dump will be stored in this location.
2. When you see any of the rebalance processes on any of the servers using high memory, issue the following command: "kill -USR1 <pid>". ---> ps aux | grep rebalance should give the rebalance process pid.

The state dump should give some hint about the high mem-usage.

Thanks,
Susant

- Original Message -
From: "Susant Palai"
To: "Christophe TREFOIS"
Cc: "Gluster Devel"
Sent: Friday, 21 August, 2015 3:52:07 PM
Subject: Re: [Gluster-devel] Skipped files during rebalance

Thanks Christophe for the details. Will get back to you with the analysis.

Regards,
Susant

- Original Message -
From: "Christophe TREFOIS"
To: "Susant Palai"
Cc: "Raghavendra Gowdappa" , "Nithya Balachandran" , "Shyamsundar Ranganathan" , "Mohammed Rafi K C" , "Gluster Devel"
Sent: Friday, 21 August, 2015 12:39:05 AM
Subject: Re: [Gluster-devel] Skipped files during rebalance

Dear Susant,

The rebalance failed again and also had (in my opinion) excessive RAM usage. Please find a very detailed list below. All l
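The statedump steps above can be sketched as a small script. This is a minimal sketch assuming a standard GlusterFS install; it exits quietly when the gluster CLI is absent, and the `[r]ebalance` pattern is an idiom that keeps the pattern from matching its own process entry.

```shell
#!/bin/sh
# Sketch of the statedump procedure from this thread (assumes a storage
# node with the gluster CLI and a running rebalance process).
if ! command -v gluster >/dev/null 2>&1; then
    echo "gluster CLI not found; run this on a storage node" >&2
    exit 0
fi

# 1. Where statedumps get written:
dumpdir=$(gluster --print-statedumpdir)

# 2. Find the rebalance process pid ([r] avoids matching this pipeline):
pid=$(ps aux | awk '/[r]ebalance/ {print $2; exit}')

# 3. SIGUSR1 asks the glusterfs process to dump its state:
[ -n "$pid" ] && kill -USR1 "$pid"

# The dump file lands in $dumpdir, newest first:
ls -lt "$dumpdir" | head
```

The dump can then be inspected for allocation counts to pin down which translator is holding memory.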
Re: [Gluster-devel] Skipped files during rebalance
Thanks Christophe for the details. Will get back to you with the analysis.

Regards,
Susant

- Original Message -
From: "Christophe TREFOIS"
To: "Susant Palai"
Cc: "Raghavendra Gowdappa" , "Nithya Balachandran" , "Shyamsundar Ranganathan" , "Mohammed Rafi K C" , "Gluster Devel"
Sent: Friday, 21 August, 2015 12:39:05 AM
Subject: Re: [Gluster-devel] Skipped files during rebalance

Dear Susant,

The rebalance failed again and also had (in my opinion) excessive RAM usage. Please find a very detailed list below.

All logs: http://wikisend.com/download/651948/allstores.tar.gz

Thank you for letting me know how I could successfully complete the rebalance process. The fedora pastes are the output of top of each node at that time (more or less).

Please let me know if you need more information,

Best,

—— Start of mem info

# After reboot, before starting glusterd

[root@highlander ~]# pdsh -g live 'free -m'
stor106:            total     used     free   shared  buff/cache  available
stor106: Mem:      193249     2208   190825        9         215     190772
stor106: Swap:          0        0        0
stor105:            total     used     free   shared  buff/cache  available
stor105: Mem:      193248     2275   190738        9         234     190681
stor105: Swap:          0        0        0
stor104:            total     used     free   shared  buff/cache  available
stor104: Mem:      193249     2221   190811        9         216     190757
stor104: Swap:          0        0        0
[root@highlander ~]#

# Gluster Info

[root@stor106 glusterfs]# gluster volume info

Volume Name: live
Type: Distribute
Volume ID: 1328637d-7730-4627-8945-bbe43626d527
Status: Started
Number of Bricks: 9
Transport-type: tcp
Bricks:
Brick1: stor104:/zfs/brick0/brick
Brick2: stor104:/zfs/brick1/brick
Brick3: stor104:/zfs/brick2/brick
Brick4: stor106:/zfs/brick0/brick
Brick5: stor106:/zfs/brick1/brick
Brick6: stor106:/zfs/brick2/brick
Brick7: stor105:/zfs/brick0/brick
Brick8: stor105:/zfs/brick1/brick
Brick9: stor105:/zfs/brick2/brick
Options Reconfigured:
nfs.disable: true
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.write-behind-window-size: 4MB
performance.io-thread-count: 32
performance.client-io-threads: on
performance.cache-size: 1GB
performance.cache-refresh-timeout: 60
performance.cache-max-file-size: 4MB
cluster.data-self-heal-algorithm: full
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
cluster.min-free-disk: 1%
server.allow-insecure: on

# Starting glusterd

[root@highlander ~]# pdsh -g live 'systemctl start glusterd'
[root@highlander ~]# pdsh -g live 'free -m'
stor106:            total     used     free   shared  buff/cache  available
stor106: Mem:      193249     2290   190569        9         389     190587
stor106: Swap:          0        0        0
stor104:            total     used     free   shared  buff/cache  available
stor104: Mem:      193249     2297   190557        9         394     190571
stor104: Swap:          0        0        0
stor105:            total     used     free   shared  buff/cache  available
stor105: Mem:      193248     2286   190554        9         407     190595
stor105: Swap:          0        0        0
[root@highlander ~]# systemctl start glusterd
[root@highlander ~]# gluster volume start live
volume start: live: success
[root@highlander ~]# gluster volume status
Status of volume: live
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick stor104:/zfs/brick0/brick             49164     0          Y       5945
Brick stor104:/zfs/brick1/brick             49165     0          Y       5963
Brick stor104:/zfs/brick2/brick             49166     0          Y       5981
Brick stor106:/zfs/brick0/brick             49158     0          Y       5256
Brick stor106:/zfs/brick1/brick             49159     0          Y       5274
Brick stor106:/zfs/brick2/brick             49160     0          Y       5292
Brick stor105:/zfs/brick0/brick             49155     0          Y       5284
Brick stor105:/zfs/brick1/brick             49156     0          Y       5302
Brick stor105:/zfs/brick2/brick             49157     0          Y       5320
NFS Server on localhost                     N/A       N/A        N       N/A
NFS Server on 192.168.123.106               N/A       N/A        N       N/A
NFS Server on stor105                       N/A       N/A        N       N/A
NFS Server on 192.168.123.104               N/A       N/A        N       N/A

Task Status of Volume live
------------------------------------------------------------------------------
There are no active volume tasks

[root@highlander ~]#

# Memory
Re: [Gluster-devel] Netbsd build failure
On Fri, Aug 21, 2015 at 10:37:43AM +0530, Vijaikumar M wrote:
> We have marked test './tests/basic/quota-anon-fd-nfs.t' as bad-test, I am
> not sure about 'SyntaxError' error. I think there is some parsing error in
> the shell script, need to root cause the issue.

Isn't this another instance of G_LOG appending to the wrong file?

--
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-devel] Netbsd build failure
On Fri, Aug 21, 2015 at 10:32:33AM +0530, Raghavendra Talur wrote:
> Some assumption with FD number in shell is wrong somewhere. Putting it out
> so that anybody who has a better idea can debug faster; I will look into it
> too.

I think you are on the right track. The thing always appends log lines to files likely to be opened by the test suite.

--
Emmanuel Dreyfus
m...@netbsd.org
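To illustrate the suspected failure mode, here is a hypothetical sketch (not the actual G_LOG code): if a shell logging helper writes to a hard-coded file descriptor number, it clobbers whatever file the test suite happens to have open on that descriptor.

```shell
#!/bin/sh
# Hypothetical reproduction of the "log lines in random files" symptom.
tmp=$(mktemp)
exec 3>"$tmp"                        # test suite opens fd 3 for its own file
g_log() { echo "G_LOG: $*" >&3; }    # logger wrongly assumes fd 3 is its log
g_log "TEST: 18 Started volinfo_field patchy"
exec 3>&-
grep 'G_LOG' "$tmp"                  # the stray log line landed in the file
rm -f "$tmp"
```

If the clobbered file is something like Python's site.py, the interpreter later fails with exactly the kind of SyntaxError seen in these regression runs.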