Re: [Gluster-devel] Netbsd build failure

2015-08-21 Thread Emmanuel Dreyfus
Emmanuel Dreyfus  wrote:

> Yes, this is again a test corrupting random system files.
> I started rebuild of nbslave7[149] from image...

Done.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Netbsd build failure

2015-08-21 Thread Emmanuel Dreyfus
Avra Sengupta  wrote:

> >> + '/opt/qa/build.sh'
> >>   File "/usr/pkg/lib/python2.7/site.py", line 601
> >>     [2015-08-19 05:45:06.N]:++ G_LOG:./tests/basic/quota-anon-fd-nfs.t: TEST: 85 ! fd_write 3 content ++
> This particular test is currently marked as a bad test, and I believe
> Vijaikumar is looking into it. Could you please check whether any other
> failure (apart from this one) is failing the regression runs?

Yes, this is again a test corrupting random system files.
I started rebuild of nbslave7[149] from image...

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-devel] Fresh NetBSD regression failures

2015-08-21 Thread Emmanuel Dreyfus
Avra Sengupta  wrote:

> All NetBSD regression runs are again failing (more like refusing to
> build), with the following error.

Random files clobbered by G_LOG?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [Gluster-devel] Skipped files during rebalance

2015-08-21 Thread Christophe TREFOIS
Dear Susant,

Do you think the patch submitted by Rafi could help with this?

The nodes are on the same network in the same rack and as such should have no 
connectivity issues.

Is it possible that the processes on nodes 104 and 106 were too “busy” and 
unable to accept new connections?

Any help would be appreciated,

—
Christophe

Dr Christophe Trefois, Dipl.-Ing.  
Technical Specialist / Post-Doc

UNIVERSITÉ DU LUXEMBOURG

LUXEMBOURG CENTRE FOR SYSTEMS BIOMEDICINE
Campus Belval | House of Biomedicine  
6, avenue du Swing 
L-4367 Belvaux  
T: +352 46 66 44 6124 
F: +352 46 66 44 6949  
http://www.uni.lu/lcsb






  

> On 21 Aug 2015, at 14:57, Susant Palai  wrote:
> 
> Hi,
> Mostly the rebalance failures are due to the network problem.
> 
> Here is the log:
> 
> [2015-08-16 20:31:36.301467] E [MSGID: 109023] 
> [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file 
> failed:/hcs/hcs/OperaArchiveCol/PA 
> 27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/003002002.flex 
> lookup failed
> [2015-08-16 20:31:36.921405] E [MSGID: 109023] 
> [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file 
> failed:/hcs/hcs/OperaArchiveCol/PA 
> 27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/003004005.flex 
> lookup failed
> [2015-08-16 20:31:36.921591] E [MSGID: 109023] 
> [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file 
> failed:/hcs/hcs/OperaArchiveCol/PA 
> 27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/006004004.flex 
> lookup failed
> [2015-08-16 20:31:36.921770] E [MSGID: 109023] 
> [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file 
> failed:/hcs/hcs/OperaArchiveCol/PA 
> 27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/005004007.flex 
> lookup failed
> [2015-08-16 20:31:37.577758] E [MSGID: 109023] 
> [dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file 
> failed:/hcs/hcs/OperaArchiveCol/PA 
> 27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/007004005.flex 
> lookup failed
> [2015-08-16 20:34:12.387425] E [socket.c:2332:socket_connect_finish] 
> 0-live-client-4: connection to 192.168.123.106:24007 failed (Connection 
> refused)
> [2015-08-16 20:34:12.392820] E [socket.c:2332:socket_connect_finish] 
> 0-live-client-5: connection to 192.168.123.106:24007 failed (Connection 
> refused)
> [2015-08-16 20:34:12.398023] E [socket.c:2332:socket_connect_finish] 
> 0-live-client-0: connection to 192.168.123.104:24007 failed (Connection 
> refused)
> [2015-08-16 20:34:12.402904] E [socket.c:2332:socket_connect_finish] 
> 0-live-client-2: connection to 192.168.123.104:24007 failed (Connection 
> refused)
> [2015-08-16 20:34:12.407464] E [socket.c:2332:socket_connect_finish] 
> 0-live-client-3: connection to 192.168.123.106:24007 failed (Connection 
> refused)
> [2015-08-16 20:34:12.412249] E [socket.c:2332:socket_connect_finish] 
> 0-live-client-1: connection to 192.168.123.104:24007 failed (Connection 
> refused)
> [2015-08-16 20:34:12.416621] E [socket.c:2332:socket_connect_finish] 
> 0-live-client-6: connection to 192.168.123.105:24007 failed (Connection 
> refused)
> [2015-08-16 20:34:12.420906] E [socket.c:2332:socket_connect_finish] 
> 0-live-client-8: connection to 192.168.123.105:24007 failed (Connection 
> refused)
> [2015-08-16 20:34:12.425066] E [socket.c:2332:socket_connect_finish] 
> 0-live-client-7: connection to 192.168.123.105:24007 failed (Connection 
> refused)
> [2015-08-16 20:34:17.479925] E [socket.c:2332:socket_connect_finish] 
> 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
> [2015-08-16 20:36:23.788206] E [MSGID: 101075] 
> [common-utils.c:314:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or 
> service not known)
> [2015-08-16 20:36:23.788286] E 
> [name.c:247:af_inet_client_get_remote_sockaddr] 0-live-client-4: DNS 
> resolution failed on host stor106
> [2015-08-16 20:36:23.788387] E 
> [name.c:247:af_inet_client_get_remote_sockaddr] 0-live-client-5: DNS 
> resolution failed on host stor106
> [2015-08-16 20:36:23.788918] E 
> [name.c:247:af_inet_client_get_remote_sockaddr] 0-live-client-0: DNS 
> resolution failed on host stor104
> [2015-08-16 20:36:23.789233] E 
> [name.c:247:af_inet_client_get_remote_sockaddr] 0-live-client-2: DNS 
> resolution failed on host stor104
> [2015-08-16 20:36:23.789295] E 
> [name.c:247:af_inet_client_get_remote_sockaddr] 0-live-client-3: DNS 
> resolution failed on host stor106
> 
> 
> For the high mem usage part I will try to run rebalance and analyze. In the 
> meantime it will be helpful if you can take a state dump of the rebalance 
> process when it is using high RAM.
> 
> Here are the steps to take the state dump.
> 
> 1. Find your state-dump destination

[Gluster-devel] Fresh NetBSD regression failures

2015-08-21 Thread Avra Sengupta

Hi,

All NetBSD regression runs are again failing (more like refusing to 
build), with the following error.


[2015-08-21 10:53:51.N]:++ G_LOG:./tests/basic/meta.t: TEST: 18 
Started volinfo_field patchy Status ++

Is anyone aware of this issue? Right now no NetBSD regressions are 
running because of this.


Regards,
Avra



Re: [Gluster-devel] Skipped files during rebalance

2015-08-21 Thread Susant Palai
Hi,
The rebalance failures are mostly due to network problems.

Here is the log:

[2015-08-16 20:31:36.301467] E [MSGID: 109023] 
[dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file 
failed:/hcs/hcs/OperaArchiveCol/PA 
27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/003002002.flex 
lookup failed
[2015-08-16 20:31:36.921405] E [MSGID: 109023] 
[dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file 
failed:/hcs/hcs/OperaArchiveCol/PA 
27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/003004005.flex 
lookup failed
[2015-08-16 20:31:36.921591] E [MSGID: 109023] 
[dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file 
failed:/hcs/hcs/OperaArchiveCol/PA 
27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/006004004.flex 
lookup failed
[2015-08-16 20:31:36.921770] E [MSGID: 109023] 
[dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file 
failed:/hcs/hcs/OperaArchiveCol/PA 
27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/005004007.flex 
lookup failed
[2015-08-16 20:31:37.577758] E [MSGID: 109023] 
[dht-rebalance.c:1965:gf_defrag_get_entry] 0-live-dht: Migrate file 
failed:/hcs/hcs/OperaArchiveCol/PA 
27112012_ATCC_Fibroblasts_Chem/Meas_10(2012-11-27_20-15-48)/007004005.flex 
lookup failed
[2015-08-16 20:34:12.387425] E [socket.c:2332:socket_connect_finish] 
0-live-client-4: connection to 192.168.123.106:24007 failed (Connection refused)
[2015-08-16 20:34:12.392820] E [socket.c:2332:socket_connect_finish] 
0-live-client-5: connection to 192.168.123.106:24007 failed (Connection refused)
[2015-08-16 20:34:12.398023] E [socket.c:2332:socket_connect_finish] 
0-live-client-0: connection to 192.168.123.104:24007 failed (Connection refused)
[2015-08-16 20:34:12.402904] E [socket.c:2332:socket_connect_finish] 
0-live-client-2: connection to 192.168.123.104:24007 failed (Connection refused)
[2015-08-16 20:34:12.407464] E [socket.c:2332:socket_connect_finish] 
0-live-client-3: connection to 192.168.123.106:24007 failed (Connection refused)
[2015-08-16 20:34:12.412249] E [socket.c:2332:socket_connect_finish] 
0-live-client-1: connection to 192.168.123.104:24007 failed (Connection refused)
[2015-08-16 20:34:12.416621] E [socket.c:2332:socket_connect_finish] 
0-live-client-6: connection to 192.168.123.105:24007 failed (Connection refused)
[2015-08-16 20:34:12.420906] E [socket.c:2332:socket_connect_finish] 
0-live-client-8: connection to 192.168.123.105:24007 failed (Connection refused)
[2015-08-16 20:34:12.425066] E [socket.c:2332:socket_connect_finish] 
0-live-client-7: connection to 192.168.123.105:24007 failed (Connection refused)
[2015-08-16 20:34:17.479925] E [socket.c:2332:socket_connect_finish] 
0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
[2015-08-16 20:36:23.788206] E [MSGID: 101075] 
[common-utils.c:314:gf_resolve_ip6] 0-resolver: getaddrinfo failed (Name or 
service not known)
[2015-08-16 20:36:23.788286] E [name.c:247:af_inet_client_get_remote_sockaddr] 
0-live-client-4: DNS resolution failed on host stor106
[2015-08-16 20:36:23.788387] E [name.c:247:af_inet_client_get_remote_sockaddr] 
0-live-client-5: DNS resolution failed on host stor106
[2015-08-16 20:36:23.788918] E [name.c:247:af_inet_client_get_remote_sockaddr] 
0-live-client-0: DNS resolution failed on host stor104
[2015-08-16 20:36:23.789233] E [name.c:247:af_inet_client_get_remote_sockaddr] 
0-live-client-2: DNS resolution failed on host stor104
[2015-08-16 20:36:23.789295] E [name.c:247:af_inet_client_get_remote_sockaddr] 
0-live-client-3: DNS resolution failed on host stor106
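
To see at a glance which kind of error dominates a log like the one above, the error lines can be counted per reporting function. This is only a rough sketch: the helper name summarize_errors is made up for illustration, it assumes the "[file.c:line:function]" tag format seen in the excerpt, and it relies on "grep -o" and "sed -E", which GNU and BSD tools provide but POSIX does not require.

```shell
#!/bin/sh
# Sketch: count log entries per reporting function.
# Reads a GlusterFS log on stdin; works on wrapped lines too, because it only
# looks for the "[file.c:line:function]" tag wherever it appears.
# summarize_errors is a hypothetical helper name, not part of GlusterFS.
summarize_errors() {
    grep -oE '\[[A-Za-z0-9_.-]+\.c:[0-9]+:[A-Za-z0-9_]+\]' |
        sed -E 's/^\[[^:]+:[0-9]+:([^]]+)\]$/\1/' |
        sort | uniq -c | sort -rn
}
```

Usage would be something like "summarize_errors < rebalance.log", where rebalance.log stands for whatever file holds the rebalance log on your system.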


For the high mem usage part I will try to run rebalance and analyze. In the 
meantime it will be helpful if you can take a state dump of the rebalance 
process when it is using high RAM.

Here are the steps to take the state dump.

1. Find your state-dump destination: run "gluster --print-statedumpdir". The 
state dump will be stored in this location.

2. When you see a rebalance process on any of the servers using a lot of 
memory, issue the following command:
   "kill -USR1 <pid>"
   ("ps aux | grep rebalance" should give the rebalance process pid.)

The state dump should give some hint about the high mem-usage.
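
The two steps above could be scripted roughly as follows. This is only a sketch: the helper names rebalance_pids and dump_rebalance_state are made up for illustration, and it assumes a rebalance process can be recognised by the word "rebalance" in its command line, as the ps hint suggests.

```shell
#!/bin/sh
# Sketch: request a state dump from every rebalance process on this server.
# Assumption: rebalance processes show "rebalance" in their command line.

# Read `ps aux` output on stdin and print the PID (column 2) of each line
# that mentions rebalance, skipping the grep/awk helpers themselves.
rebalance_pids() {
    awk '/rebalance/ && !/awk/ && !/grep/ { print $2 }'
}

# Step 1: ask gluster where state dumps are written.
# Step 2: send SIGUSR1 to each rebalance process to trigger the dump.
dump_rebalance_state() {
    dumpdir=$(gluster --print-statedumpdir) || return 1
    for pid in $(ps aux | rebalance_pids); do
        kill -USR1 "$pid" &&
            echo "state dump requested for pid $pid (look in $dumpdir)"
    done
}
```

Run dump_rebalance_state as root on the server where the memory usage is high; nothing is signalled until that function is called.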

Thanks,
Susant

- Original Message -
From: "Susant Palai" 
To: "Christophe TREFOIS" 
Cc: "Gluster Devel" 
Sent: Friday, 21 August, 2015 3:52:07 PM
Subject: Re: [Gluster-devel] Skipped files during rebalance

Thanks Christophe for the details. Will get back to you with the analysis.

Regards,
Susant

- Original Message -
From: "Christophe TREFOIS" 
To: "Susant Palai" 
Cc: "Raghavendra Gowdappa" , "Nithya Balachandran" 
, "Shyamsundar Ranganathan" , 
"Mohammed Rafi K C" , "Gluster Devel" 

Sent: Friday, 21 August, 2015 12:39:05 AM
Subject: Re: [Gluster-devel] Skipped files during rebalance

Dear Susant,

The rebalance failed again and also had (in my opinion) excessive RAM usage.

Please find a very detailed list below.

All l

Re: [Gluster-devel] Skipped files during rebalance

2015-08-21 Thread Susant Palai
Thanks Christophe for the details. Will get back to you with the analysis.

Regards,
Susant

- Original Message -
From: "Christophe TREFOIS" 
To: "Susant Palai" 
Cc: "Raghavendra Gowdappa" , "Nithya Balachandran" 
, "Shyamsundar Ranganathan" , 
"Mohammed Rafi K C" , "Gluster Devel" 

Sent: Friday, 21 August, 2015 12:39:05 AM
Subject: Re: [Gluster-devel] Skipped files during rebalance

Dear Susant,

The rebalance failed again and also had (in my opinion) excessive RAM usage.

Please find a very detailed list below.

All logs:

http://wikisend.com/download/651948/allstores.tar.gz

Thank you for letting me know how I could successfully complete the rebalance 
process.
The fedora pastes are the output of top of each node at that time (more or 
less).

Please let me know if you need more information,

Best,

—— Start of mem info

# After reboot, before starting glusterd

[root@highlander ~]# pdsh -g live 'free -m'
stor106:               total        used        free      shared  buff/cache   available
stor106: Mem:         193249        2208      190825           9         215      190772
stor106: Swap:             0           0           0
stor105:               total        used        free      shared  buff/cache   available
stor105: Mem:         193248        2275      190738           9         234      190681
stor105: Swap:             0           0           0
stor104:               total        used        free      shared  buff/cache   available
stor104: Mem:         193249        2221      190811           9         216      190757
stor104: Swap:             0           0           0
[root@highlander ~]#

# Gluster Info

[root@stor106 glusterfs]# gluster volume info

Volume Name: live
Type: Distribute
Volume ID: 1328637d-7730-4627-8945-bbe43626d527
Status: Started
Number of Bricks: 9
Transport-type: tcp
Bricks:
Brick1: stor104:/zfs/brick0/brick
Brick2: stor104:/zfs/brick1/brick
Brick3: stor104:/zfs/brick2/brick
Brick4: stor106:/zfs/brick0/brick
Brick5: stor106:/zfs/brick1/brick
Brick6: stor106:/zfs/brick2/brick
Brick7: stor105:/zfs/brick0/brick
Brick8: stor105:/zfs/brick1/brick
Brick9: stor105:/zfs/brick2/brick
Options Reconfigured:
nfs.disable: true
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.write-behind-window-size: 4MB
performance.io-thread-count: 32
performance.client-io-threads: on
performance.cache-size: 1GB
performance.cache-refresh-timeout: 60
performance.cache-max-file-size: 4MB
cluster.data-self-heal-algorithm: full
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
cluster.min-free-disk: 1%
server.allow-insecure: on

# Starting glusterd

[root@highlander ~]# pdsh -g live 'systemctl start glusterd'
[root@highlander ~]# pdsh -g live 'free -m'
stor106:               total        used        free      shared  buff/cache   available
stor106: Mem:         193249        2290      190569           9         389      190587
stor106: Swap:             0           0           0
stor104:               total        used        free      shared  buff/cache   available
stor104: Mem:         193249        2297      190557           9         394      190571
stor104: Swap:             0           0           0
stor105:               total        used        free      shared  buff/cache   available
stor105: Mem:         193248        2286      190554           9         407      190595
stor105: Swap:             0           0           0

[root@highlander ~]# systemctl start glusterd
[root@highlander ~]# gluster volume start live
volume start: live: success
[root@highlander ~]# gluster volume status
Status of volume: live
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick stor104:/zfs/brick0/brick             49164     0          Y       5945
Brick stor104:/zfs/brick1/brick             49165     0          Y       5963
Brick stor104:/zfs/brick2/brick             49166     0          Y       5981
Brick stor106:/zfs/brick0/brick             49158     0          Y       5256
Brick stor106:/zfs/brick1/brick             49159     0          Y       5274
Brick stor106:/zfs/brick2/brick             49160     0          Y       5292
Brick stor105:/zfs/brick0/brick             49155     0          Y       5284
Brick stor105:/zfs/brick1/brick             49156     0          Y       5302
Brick stor105:/zfs/brick2/brick             49157     0          Y       5320
NFS Server on localhost                     N/A       N/A        N       N/A
NFS Server on 192.168.123.106               N/A       N/A        N       N/A
NFS Server on stor105                       N/A       N/A        N       N/A
NFS Server on 192.168.123.104               N/A       N/A        N       N/A

Task Status of Volume live
------------------------------------------------------------------------------
There are no active volume tasks

[root@highlander ~]#

# Memory

Re: [Gluster-devel] Netbsd build failure

2015-08-21 Thread Emmanuel Dreyfus
On Fri, Aug 21, 2015 at 10:37:43AM +0530, Vijaikumar M wrote:
> We have marked test './tests/basic/quota-anon-fd-nfs.t' as bad-test, I am
> not sure about 'SyntaxError' error. I think there is some parsing error in
> the shell script, need to root cause the issue.

Isn't this another instance of G_LOG appending to the wrong file?

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: [Gluster-devel] Netbsd build failure

2015-08-21 Thread Emmanuel Dreyfus
On Fri, Aug 21, 2015 at 10:32:33AM +0530, Raghavendra Talur wrote:
> Some assumption about the FD number in the shell is wrong somewhere. Putting it out
> so that anybody who has a better idea can debug faster; I will look into it
> too.

I think you are on the right track. The thing always appends log lines to
files likely to be opened by the test suite.
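
For anyone following along, here is a minimal sketch of the kind of failure being suspected. It is an illustration under assumptions, not the test framework's actual code: log_line, workdir and the file names are hypothetical, and fd 3 is an arbitrary choice. It shows how a logger that writes to a hard-coded file descriptor appends to whatever file currently owns that descriptor.

```shell
#!/bin/sh
# Hypothetical illustration only: this is NOT the real G_LOG code.
# A logger that always writes to fd 3, whatever fd 3 points at.
log_line() { echo "G_LOG: $*" >&3; }

workdir=$(mktemp -d)

exec 3>>"$workdir/test.log"     # fd 3 starts out as the intended log file
log_line "TEST 1"

exec 3>>"$workdir/site.py"      # later, fd 3 is reopened on an unrelated file
log_line "TEST 2"               # the log line is now appended to site.py

exec 3>&-                       # close fd 3 again
intended=$(cat "$workdir/test.log")
clobbered=$(cat "$workdir/site.py")
rm -rf "$workdir"

echo "test.log: $intended"
echo "site.py:  $clobbered"
```

If the real logger behaves like this, any file the test suite opens on the same descriptor would collect stray log lines.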

-- 
Emmanuel Dreyfus
m...@netbsd.org