Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t
Comments inline. - Original Message - > From: "Sachin Pandit" > To: "Kotresh Hiremath Ravishankar" > Cc: "Gluster Devel" > Sent: Thursday, July 2, 2015 12:21:44 PM > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t > > - Original Message - > > From: "Vijaikumar M" > > To: "Kotresh Hiremath Ravishankar" , "Gluster Devel" > > > > Cc: "Sachin Pandit" > > Sent: Thursday, July 2, 2015 12:01:03 PM > > Subject: Re: Regression Failure: ./tests/basic/quota.t > > > > We look into this issue > > > > Thanks, > > Vijay > > > > On Thursday 02 July 2015 11:46 AM, Kotresh Hiremath Ravishankar wrote: > > > Hi, > > > > > > I see quota.t regression failure for the following. The changes are > > > related > > > to > > > example programs in libgfchangelog. > > > > > > http://build.gluster.org/job/rackspace-regression-2GB-triggered/11785/consoleFull > > > > > > Could someone from quota team, take a look at it. > > Hi, > > I had a quick look at this. It looks like the following test case failed > > TEST $CLI volume add-brick $V0 $H0:$B0/brick{3,4} > EXPECT_WITHIN $REBALANCE_TIMEOUT "0" rebalance_completed > > > I looked at the logs too, and found out the following errors > > patchy-rebalance.log:[2015-07-01 09:27:23.040756] E [MSGID: 109026] > [dht-rebalance.c:2689:gf_defrag_start_crawl] 0-patchy-dht: fix layout on / > failed > build-install-etc-glusterfs-glusterd.vol.log:[2015-07-01 09:27:23.040998] E > [MSGID: 106224] > [glusterd-rebalance.c:960:glusterd_defrag_event_notify_handle] 0-management: > Failed to update status > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 14:34:47.557887] > E [rpc-clnt.c:362:saved_frames_unwind] (--> > /build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x240)[0x7fc882d04d5a] > (--> > /build/install/lib/libgfrpc.so.0(saved_frames_unwind+0x212)[0x7fc882ace086] > (--> > /build/install/lib/libgfrpc.so.0(saved_frames_destroy+0x1f)[0x7fc882ace183] > (--> > /build/install/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x11e)[0x7fc882ace615] > (--> /build/install/lib/libgfrpc.so.0(rpc_clnt_notify+0x147)[0x7fc882acf00f] > ) 0-StartMigrationDuringRebalanceTest-client-0: forced unwinding frame > type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-06-19 14:34:47.554862 > (xid=0xc) > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 14:34:47.561191] > E [MSGID: 114031] [client-rpc-fops.c:1623:client3_3_inodelk_cbk] > 0-StartMigrationDuringRebalanceTest-client-0: remote operation failed: > Transport endpoint is not connected [Transport endpoint is not connected] > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 14:34:47.561417] > E [socket.c:2332:socket_connect_finish] > 0-StartMigrationDuringRebalanceTest-client-0: connection to > 23.253.62.104:24007 failed (Connection refused) > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 14:34:47.561707] > E [dht-common.c:2643:dht_find_local_subvol_cbk] > 0-StartMigrationDuringRebalanceTest-dht: getxattr err (Transport endpoint is > not connected) for dir > Seems like a network partition. Rebalance fails if there it receives ENOTCONN on it's child. > > Any help regarding this or more information on this would be much > appreciated. > > Thanks, > Sachin Pandit. > > > > > > > > Thanks and Regards, > > > Kotresh H R > > > > > > > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t
Comments inline. Thanks and Regards, Kotresh H R - Original Message - > From: "Susant Palai" > To: "Sachin Pandit" > Cc: "Kotresh Hiremath Ravishankar" , "Gluster Devel" > > Sent: Thursday, July 2, 2015 12:35:08 PM > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t > > Comments inline. > > - Original Message - > > From: "Sachin Pandit" > > To: "Kotresh Hiremath Ravishankar" > > Cc: "Gluster Devel" > > Sent: Thursday, July 2, 2015 12:21:44 PM > > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t > > > > - Original Message - > > > From: "Vijaikumar M" > > > To: "Kotresh Hiremath Ravishankar" , "Gluster Devel" > > > > > > Cc: "Sachin Pandit" > > > Sent: Thursd, TOTAL CHANGELOGS: 106 [2015-07-02 07:01:06.883504] E [gf-history-changelog.c:877:gf_history_changelog] 0-gfchangelog: wrong result for start: 1435818ay, July 2, 2015 12:01:03 PM > > > Subject: Re: Regression Failure: ./tests/basic/quota.t > > > > > > We look into this issue > > > > > > Thanks, > > > Vijay > > > > > > On Thursday 02 July 2015 11:46 AM, Kotresh Hiremath Ravishankar wrote: > > > > Hi, > > > > > > > > I see quota.t regression failure for the following. The changes are > > > > related > > > > to > > > > example programs in libgfchangelog. > > > > > > > > http://build.gluster.org/job/rackspace-regression-2GB-triggered/11785/consoleFull > > > > > > > > Could someone from quota team, take a look at it. > > > > Hi, > > > > I had a quick look at this. It looks like the following test case failed > > > > TEST $CLI volume add-brick $V0 $H0:$B0/brick{3,4} > > EXPECT_WITHIN $REBALANCE_TIMEOUT "0" rebalance_completed > > > > > > I looked at the logs too, and found out the following errors > > > > patchy-rebalance.log:[2015-07-01 09:27:23.040756] E [MSGID: 109026] > > [dht-rebalance.c:2689:gf_defrag_start_crawl] 0-patchy-dht: fix layout on / > > failed > > build-install-etc-glusterfs-glusterd.vol.log:[2015-07-01 09:27:23.040998] E > > [MSGID: 106224] > > [glusterd-rebalance.c:960:glusterd_defrag_event_notify_handle] > > 0-management: > > Failed to update status > > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 > > 14:34:47.557887] > > E [rpc-clnt.c:362:saved_frames_unwind] (--> > > /build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x240)[0x7fc882d04d5a] > > (--> > > /build/install/lib/libgfrpc.so.0(saved_frames_unwind+0x212)[0x7fc882ace086] > > (--> > > /build/install/lib/libgfrpc.so.0(saved_frames_destroy+0x1f)[0x7fc882ace183] > > (--> > > /build/install/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x11e)[0x7fc882ace615] > > (--> > > /build/install/lib/libgfrpc.so.0(rpc_clnt_notify+0x147)[0x7fc882acf00f] > > ) 0-StartMigrationDuringRebalanceTest-client-0: forced unwinding frame > > type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-06-19 14:34:47.554862 > > (xid=0xc) > > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 > > 14:34:47.561191] > > E [MSGID: 114031] [client-rpc-fops.c:1623:client3_3_inodelk_cbk] > > 0-StartMigrationDuringRebalanceTest-client-0: remote operation failed: > > Transport endpoint is not connected [Transport endpoint is not connected] > > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 > > 14:34:47.561417] > > E [socket.c:2332:socket_connect_finish] > > 0-StartMigrationDuringRebalanceTest-client-0: connection to > > 23.253.62.104:24007 failed (Connection refused) > > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 > > 14:34:47.561707] > > E [dht-common.c:2643:dht_find_local_subvol_cbk] > > 
0-StartMigrationDuringRebalanceTest-dht: getxattr err (Transport endpoint > > is > > not connected) for dir > > > Seems like a network partition. Rebalance fails if there it receives ENOTCONN > on it's child. Is this intended to happen on regression machines? > > > > > Any help regarding this or more information on this would be much > > appreciated. > > > > Thanks, > > Sachin Pandit. > > > > > > > > > > > > Thanks and Regards, > > > > Kotresh H R > > > > > > > > > > > > ___ > > Gluster-devel mailing list > > Gluster-devel@gluster.org > > http://www.gluster.org/mailman/listinfo/gluster-devel > > > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Unable to send patches to review.gluster.org
It is working fine for me now. If someone hasn't checked yet, please try again. - Original Message - > From: "Anoop C S" > To: gluster-devel@gluster.org > Cc: "Anuradha Talur" > Sent: Thursday, July 2, 2015 10:41:35 AM > Subject: Re: [Gluster-devel] Unable to send patches to review.gluster.org > > Same here. git pull from r.g.o failed with the following error. > > Permission denied (publickey). > fatal: Could not read from remote repository. > > Please make sure you have the correct access rights > and the repository exists. > > --Anoop C S. > > On 07/02/2015 09:53 AM, Anuradha Talur wrote: > > Hi, > > > > I'm unable to send patches to r.g.o, also not able to login. I'm > > getting the following errors respectively: 1) Permission denied > > (publickey). fatal: Could not read from remote repository. > > > > Please make sure you have the correct access rights and the > > repository exists. > > > > 2) Internal server error or forbidden access. > > > > Is anyone else facing the same issue? > > > -- Thanks, Anuradha. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t
- Original Message - > From: "Kotresh Hiremath Ravishankar" > To: "Susant Palai" > Cc: "Gluster Devel" > Sent: Thursday, July 2, 2015 1:03:18 PM > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t > > Comments inline. > > Thanks and Regards, > Kotresh H R > > - Original Message - > > From: "Susant Palai" > > To: "Sachin Pandit" > > Cc: "Kotresh Hiremath Ravishankar" , "Gluster Devel" > > > > Sent: Thursday, July 2, 2015 12:35:08 PM > > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t > > > > Comments inline. > > > > - Original Message - > > > From: "Sachin Pandit" > > > To: "Kotresh Hiremath Ravishankar" > > > Cc: "Gluster Devel" > > > Sent: Thursday, July 2, 2015 12:21:44 PM > > > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t > > > > > > - Original Message - > > > > From: "Vijaikumar M" > > > > To: "Kotresh Hiremath Ravishankar" , "Gluster > > > > Devel" > > > > > > > > Cc: "Sachin Pandit" > > > > Sent: Thursd, TOTAL CHANGELOGS: 106 > [2015-07-02 07:01:06.883504] E > [gf-history-changelog.c:877:gf_history_changelog] 0-gfchangelog: wrong > result for start: 1435818ay, July 2, 2015 12:01:03 PM > > > > Subject: Re: Regression Failure: ./tests/basic/quota.t > > > > > > > > We look into this issue > > > > > > > > Thanks, > > > > Vijay > > > > > > > > On Thursday 02 July 2015 11:46 AM, Kotresh Hiremath Ravishankar wrote: > > > > > Hi, > > > > > > > > > > I see quota.t regression failure for the following. The changes are > > > > > related > > > > > to > > > > > example programs in libgfchangelog. > > > > > > > > > > http://build.gluster.org/job/rackspace-regression-2GB-triggered/11785/consoleFull > > > > > > > > > > Could someone from quota team, take a look at it. > > > > > > Hi, > > > > > > I had a quick look at this. It looks like the following test case failed > > > > > > TEST $CLI volume add-brick $V0 $H0:$B0/brick{3,4} > > > EXPECT_WITHIN $REBALANCE_TIMEOUT "0" rebalance_completed > > > Looks like the same "port in use" issue. 
From the d-backends-brick3.log: [2015-07-01 09:27:17.821430] E [socket.c:818:__socket_server_bind] 0-tcp.patchy-server: binding to failed: Address already in use [2015-07-01 09:27:17.821441] E [socket.c:821:__socket_server_bind] 0-tcp.patchy-server: Port is already in use [2015-07-01 09:27:17.821452] W [rpcsvc.c:1599:rpcsvc_transport_create] 0-rpc-service: listening on transport failed [2015-07-01 09:27:17.821462] W [MSGID: 115045] [server.c:996:init] 0-patchy-server: creation of listener failed [2015-07-01 09:27:17.821475] E [MSGID: 101019] [xlator.c:423:xlator_init] 0-patchy-server: Initialization of volume 'patchy-server' failed, review your volfile again [2015-07-01 09:27:17.821485] E [MSGID: 101066] [graph.c:323:glusterfs_graph_init] 0-patchy-server: initializing translator failed [2015-07-01 09:27:17.821495] E [MSGID: 101176] [graph.c:669:glusterfs_graph_activate] 0-graph: init failed [2015-07-01 09:27:17.821891] W [glusterfsd.c:1214:cleanup_and_exit] (--> 0-: received signum (0), shutting down > > > > > > I looked at the logs too, and found out the following errors > > > > > > patchy-rebalance.log:[2015-07-01 09:27:23.040756] E [MSGID: 109026] > > > [dht-rebalance.c:2689:gf_defrag_start_crawl] 0-patchy-dht: fix layout on > > > / > > > failed > > > build-install-etc-glusterfs-glusterd.vol.log:[2015-07-01 09:27:23.040998] > > > E > > > [MSGID: 106224] > > > [glusterd-rebalance.c:960:glusterd_defrag_event_notify_handle] > > > 0-management: > > > Failed to update status > > > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 > > > 14:34:47.557887] > > > E [rpc-clnt.c:362:saved_frames_unwind] (--> > > > /build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x240)[0x7fc882d04d5a] > > > (--> > > > /build/install/lib/libgfrpc.so.0(saved_frames_unwind+0x212)[0x7fc882ace086] > > > (--> > > > /build/install/lib/libgfrpc.so.0(saved_frames_destroy+0x1f)[0x7fc882ace183] > > > (--> > > > /build/install/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x11e)[0x7fc882ace615] > > > (--> > > > /build/install/lib/libgfrpc.so.0(rpc_clnt_notify+0x147)[0x7fc882acf00f] > > > ) 0-StartMigrationDuringRebalanceTest-client-0: forced unwinding > > > frame > > > type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-06-19 14:34:47.554862 > > > (xid=0xc) > > > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 > > > 14:34:47.561191] > > > E [MSGID: 114031] [client-rpc-fops.c:1623:client3_3_inodelk_cbk] > > > 0-StartMigrationDuringRebalanceTest-client-0: remote operation failed: > > > Transport endpoint is not connected [Transport endpoint is not connected] > > > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 > > > 14:34:47.561417] > > > E [socket.c:2332:socket_connect_finish] > > > 0-StartMigrationDuringRebalanceTest-client-0: connection to > > > 23.253.62.104:24007 failed (Connection refused) > > > StartMigrationDuringRebalanceTest-
[Gluster-devel] glusterfs-3.6.4beta2 released
Hi, glusterfs-3.6.4beta2 has been released and the packages for RHEL/Fedora/CentOS can be found here: http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.6.4beta2/ Requesting people running 3.6.x to please try it out and let us know if there are any issues. This release fixes the bugs listed below, reported since 3.6.4beta1 was made available. Thanks to all who submitted patches and reviewed the changes.

1230242 - `ls' on a directory which has files with mismatching gfid's does not list anything
1230259 - Honour afr self-heal volume set options from clients
1122290 - Issues reported by Cppcheck static analysis tool
1227670 - wait for sometime before accessing the activated snapshot
1225745 - [AFR-V2] - afr_final_errno() should treat op_ret > 0 also as success
1223891 - readdirp return 64bits inodes even if enable-ino32 is set
1206429 - Maintaining local transaction peer list in op-sm framework
1217419 - DHT:Quota:- brick process crashed after deleting .glusterfs from backend
1225072 - OpenSSL multi-threading changes break build in RHEL5 (3.6.4beta1)
1215419 - Autogenerated files delivered in tarball
1224624 - cli: Excessive logging
1217423 - glusterfsd crashed after directory was removed from the mount point, while self-heal and rebalance were running on the volume

Regards, Raghavendra Bhat ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
On Thu, Jul 2, 2015 at 10:52 AM, Krishnan Parthasarathi wrote: > > > > > > > A port assigned by Glusterd for a brick is found to be in use already > by > > > the brick. Any changes in Glusterd recently which can cause this? > > > > > > Or is it a test infra problem? > > This issue is likely to be caused by http://review.gluster.org/11039 > This patch changes the port allocation that happens for rpc_clnt based > connections. Previously, ports allocated where < 1024. With this change, > these connections, typically mount process, gluster-nfs server processes > etc could end up using ports that bricks are being assigned to. > > IIUC, the intention of the patch was to make server processes lenient to > inbound messages from ports > 1024. If we don't require to use ports > 1024 > we could leave the port allocation for rpc_clnt connections as before. > Alternately, we could reserve the range of ports starting from 49152 for > bricks > by setting net.ipv4.ip_local_reserved_ports using sysctl(8). This is > specific to Linux. > I'm not aware of how this could be done in NetBSD for instance though. > It seems this is exactly whats happening. I have a question, I get the following data from netstat and grep tcp0 0 f6be17c0fbf5:1023 f6be17c0fbf5:24007 ESTABLISHED 31516/glusterfsd tcp0 0 f6be17c0fbf5:49152 f6be17c0fbf5:490 ESTABLISHED 31516/glusterfsd unix 3 [ ] STREAM CONNECTED 988353 31516/glusterfsd /var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket Here 31516 is the brick pid. Looking at the data, line 2 is very clear, it shows connection between brick and glusterfs client. unix socket on line 3 is also clear, it is the unix socket connection that glusterd and brick process use for communication. I am not able to understand line 1; which part of brick process established a tcp connection with glusterd using port 1023? Note: this data is from a build which does not have the above mentioned patch. -- *Raghavendra Talur * ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t
hi Joseph, Could you take a look at http://build.gluster.org/job/rackspace-regression-2GB-triggered/11842/consoleFull Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t
I'll check on this. - Original Message - > From: "Pranith Kumar Karampuri" > To: "Gluster Devel" , "Joseph Fernandes" > > Sent: Thursday, July 2, 2015 5:40:34 AM > Subject: [Gluster-devel] Failure in > tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t > > hi Joseph, > Could you take a look at > http://build.gluster.org/job/rackspace-regression-2GB-triggered/11842/consoleFull > > Pranith > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t
Thanks Dan!. Pranith On 07/02/2015 06:14 PM, Dan Lambright wrote: I'll check on this. - Original Message - From: "Pranith Kumar Karampuri" To: "Gluster Devel" , "Joseph Fernandes" Sent: Thursday, July 2, 2015 5:40:34 AM Subject: [Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t hi Joseph, Could you take a look at http://build.gluster.org/job/rackspace-regression-2GB-triggered/11842/consoleFull Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t
On Thu, Jul 2, 2015 at 1:26 PM, Nithya Balachandran wrote: > > > - Original Message - > > From: "Kotresh Hiremath Ravishankar" > > To: "Susant Palai" > > Cc: "Gluster Devel" > > Sent: Thursday, July 2, 2015 1:03:18 PM > > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t > > > > Comments inline. > > > > Thanks and Regards, > > Kotresh H R > > > > - Original Message - > > > From: "Susant Palai" > > > To: "Sachin Pandit" > > > Cc: "Kotresh Hiremath Ravishankar" , "Gluster > Devel" > > > > > > Sent: Thursday, July 2, 2015 12:35:08 PM > > > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t > > > > > > Comments inline. > > > > > > - Original Message - > > > > From: "Sachin Pandit" > > > > To: "Kotresh Hiremath Ravishankar" > > > > Cc: "Gluster Devel" > > > > Sent: Thursday, July 2, 2015 12:21:44 PM > > > > Subject: Re: [Gluster-devel] Regression Failure: > ./tests/basic/quota.t > > > > > > > > - Original Message - > > > > > From: "Vijaikumar M" > > > > > To: "Kotresh Hiremath Ravishankar" , "Gluster > > > > > Devel" > > > > > > > > > > Cc: "Sachin Pandit" > > > > > Sent: Thursd, TOTAL CHANGELOGS: 106 > > [2015-07-02 07:01:06.883504] E > > [gf-history-changelog.c:877:gf_history_changelog] 0-gfchangelog: wrong > > result for start: 1435818ay, July 2, 2015 12:01:03 PM > > > > > Subject: Re: Regression Failure: ./tests/basic/quota.t > > > > > > > > > > We look into this issue > > > > > > > > > > Thanks, > > > > > Vijay > > > > > > > > > > On Thursday 02 July 2015 11:46 AM, Kotresh Hiremath Ravishankar > wrote: > > > > > > Hi, > > > > > > > > > > > > I see quota.t regression failure for the following. The changes > are > > > > > > related > > > > > > to > > > > > > example programs in libgfchangelog. > > > > > > > > > > > > > http://build.gluster.org/job/rackspace-regression-2GB-triggered/11785/consoleFull > > > > > > > > > > > > Could someone from quota team, take a look at it. > > > > > > > > Hi, > > > > > > > > I had a quick look at this. It looks like the following test case > failed > > > > > > > > TEST $CLI volume add-brick $V0 $H0:$B0/brick{3,4} > > > > EXPECT_WITHIN $REBALANCE_TIMEOUT "0" rebalance_completed > > > > > > > > Looks like the same "port in use" issue. From the d-backends-brick3.log: > > > [2015-07-01 09:27:17.821430] E [socket.c:818:__socket_server_bind] > 0-tcp.patchy-server: binding to failed: Address already in use > [2015-07-01 09:27:17.821441] E [socket.c:821:__socket_server_bind] > 0-tcp.patchy-server: Port is already in use > [2015-07-01 09:27:17.821452] W [rpcsvc.c:1599:rpcsvc_transport_create] > 0-rpc-service: listening on transport failed > [2015-07-01 09:27:17.821462] W [MSGID: 115045] [server.c:996:init] > 0-patchy-server: creation of listener failed > [2015-07-01 09:27:17.821475] E [MSGID: 101019] [xlator.c:423:xlator_init] > 0-patchy-server: Initialization of volume 'patchy-server' failed, review > your volfile again > [2015-07-01 09:27:17.821485] E [MSGID: 101066] > [graph.c:323:glusterfs_graph_init] 0-patchy-server: initializing translator > failed > [2015-07-01 09:27:17.821495] E [MSGID: 101176] > [graph.c:669:glusterfs_graph_activate] 0-graph: init failed > [2015-07-01 09:27:17.821891] W [glusterfsd.c:1214:cleanup_and_exit] (--> > 0-: received signum (0), shutting down > > The patch which exposed this bug is being reverted till the underlying bug is also fixed. 
You can monitor revert patches here master: http://review.gluster.org/11507 3.7 branch: http://review.gluster.org/11508 Please rebase your patches after the above patches are merged to ensure that you patches pass regression. > > > > > > > > > I looked at the logs too, and found out the following errors > > > > > > > > patchy-rebalance.log:[2015-07-01 09:27:23.040756] E [MSGID: 109026] > > > > [dht-rebalance.c:2689:gf_defrag_start_crawl] 0-patchy-dht: fix > layout on > > > > / > > > > failed > > > > build-install-etc-glusterfs-glusterd.vol.log:[2015-07-01 > 09:27:23.040998] > > > > E > > > > [MSGID: 106224] > > > > [glusterd-rebalance.c:960:glusterd_defrag_event_notify_handle] > > > > 0-management: > > > > Failed to update status > > > > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 > > > > 14:34:47.557887] > > > > E [rpc-clnt.c:362:saved_frames_unwind] (--> > > > > > /build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x240)[0x7fc882d04d5a] > > > > (--> > > > > > /build/install/lib/libgfrpc.so.0(saved_frames_unwind+0x212)[0x7fc882ace086] > > > > (--> > > > > > /build/install/lib/libgfrpc.so.0(saved_frames_destroy+0x1f)[0x7fc882ace183] > > > > (--> > > > > > /build/install/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x11e)[0x7fc882ace615] > > > > (--> > > > > > /build/install/lib/libgfrpc.so.0(rpc_clnt_notify+0x147)[0x7fc882acf00f] > > > > ) 0-StartMigrationDuringRebalanceTest-client-0: forced unwinding > > > > frame > > > > type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-06-19 > 14:34:47.554862 > > > > (
Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
On Thu, Jul 2, 2015 at 4:40 PM, Raghavendra Talur < raghavendra.ta...@gmail.com> wrote: > > > On Thu, Jul 2, 2015 at 10:52 AM, Krishnan Parthasarathi < > kpart...@redhat.com> wrote: > >> >> > > >> > > A port assigned by Glusterd for a brick is found to be in use already >> by >> > > the brick. Any changes in Glusterd recently which can cause this? >> > > >> > > Or is it a test infra problem? >> >> This issue is likely to be caused by http://review.gluster.org/11039 >> This patch changes the port allocation that happens for rpc_clnt based >> connections. Previously, ports allocated where < 1024. With this change, >> these connections, typically mount process, gluster-nfs server processes >> etc could end up using ports that bricks are being assigned to. >> >> IIUC, the intention of the patch was to make server processes lenient to >> inbound messages from ports > 1024. If we don't require to use ports > >> 1024 >> we could leave the port allocation for rpc_clnt connections as before. >> Alternately, we could reserve the range of ports starting from 49152 for >> bricks >> by setting net.ipv4.ip_local_reserved_ports using sysctl(8). This is >> specific to Linux. >> I'm not aware of how this could be done in NetBSD for instance though. >> > > > It seems this is exactly whats happening. > > I have a question, I get the following data from netstat and grep > > tcp0 0 f6be17c0fbf5:1023 f6be17c0fbf5:24007 > ESTABLISHED 31516/glusterfsd > tcp0 0 f6be17c0fbf5:49152 f6be17c0fbf5:490 > ESTABLISHED 31516/glusterfsd > unix 3 [ ] STREAM CONNECTED 988353 > 31516/glusterfsd > /var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket > > Here 31516 is the brick pid. > > Looking at the data, line 2 is very clear, it shows connection between > brick and glusterfs client. > unix socket on line 3 is also clear, it is the unix socket connection that > glusterd and brick process use for communication. > > I am not able to understand line 1; which part of brick process > established a tcp connection with glusterd using port 1023? > Note: this data is from a build which does not have the above mentioned > patch. > The patch which exposed this bug is being reverted till the underlying bug is also fixed. You can monitor revert patches here master: http://review.gluster.org/11507 3.7 branch: http://review.gluster.org/11508 Please rebase your patches after the above patches are merged to ensure that you patches pass regression. > > -- > *Raghavendra Talur * > > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Mount hangs because of connection delays
hi, When the glusterfs mount process is coming up, all cluster xlators wait for at least one event from all their children before propagating the status upwards. Sometimes the client xlator takes up to 2 minutes to propagate this event (https://bugzilla.redhat.com/show_bug.cgi?id=1054694#c0). Due to this, xavi implemented a timer in ec notify where we treat a child as down if it doesn't come up in 10 seconds. A similar patch went up for review @http://review.gluster.org/#/c/3 for afr. Kritika raised an interesting point in the review: all cluster xlators would need to have this logic for the mount to not hang, and the correct place to fix it would be the client xlator itself, i.e. add the timer logic in the client xlator. That seems like a better approach. I just want to take inputs from everyone before we go ahead in that direction, i.e. on PARENT_UP the client xlator will start a timer, and if no rpc notification is received within that timeout it treats itself as down. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
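To make the proposal above concrete, here is a rough, hypothetical sketch of what the PARENT_UP timer in protocol/client could look like. It is not an actual patch: "connection_timer" is an assumed new field in clnt_conf_t, the 10-second value and the function names are invented for illustration, and gf_timer_call_after()/default_notify() are used the same way the existing ec notify timer uses them.

/* Hypothetical sketch only -- fragment inside protocol/client, not a patch. */
static void
client_child_up_timeout_cbk (void *data)
{
        xlator_t *this = data;

        /* No CONNECT/DISCONNECT rpc notification arrived within the timeout,
         * so tell the parent cluster xlators this child is down instead of
         * letting the mount wait for up to 2 minutes. */
        default_notify (this, GF_EVENT_CHILD_DOWN, NULL);
}

int
client_handle_parent_up (xlator_t *this)
{
        clnt_conf_t     *conf  = this->private;
        struct timespec  delay = {10, 0};     /* assumed 10 second timeout */

        /* Arm the timer when PARENT_UP is received; the rpc notify path
         * cancels it as soon as a real connection event shows up. */
        conf->connection_timer = gf_timer_call_after (this->ctx, delay,
                                                      client_child_up_timeout_cbk,
                                                      this);
        return 0;
}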
Re: [Gluster-devel] Problems when using different hostnames in a bricks and a peer
Hi Atin, You are right!!! I was using the version 3.5 in production. And when I've checked the Gluster source code, I checked the wrong commit (not the latest commit in the master branch). Currently, you've already implemented my the proposed solution. It was done at the function gd_peerinfo_find_from_addrinfo, file xlators/mgmt/glusterd/src/glusterd-peer-utils.c. Thanks for your tip! And sorry for any inconvenience. -- *Rarylson Freitas* On Thu, Jul 2, 2015 at 2:01 AM, Atin Mukherjee wrote: > Which gluster version are you using? Better peer identification feature > (available 3.6 onwards) should tackle this problem IMO. > > ~Atin > > On 07/02/2015 10:05 AM, Rarylson Freitas wrote: > > Hi, > > > > Recently, my company needed to change our hostnames used in the Gluster > > Pool. > > > > In a first moment, we have two Gluster Nodes called storage1 and > storage2. > > Our volumes used two bricks: storage1:/MYVOLYME and storage2:/MYVOLUME. > We > > put the storage1 and storage2 IPs in the /etc/hosts file of our nodes and > > in our client servers. > > > > After some time, more client servers started to using Gluster and we > > discovered that using hostnames without domain (using /etc/hosts) in all > > client servers is a pain in the a$$ :(. So, we decided to change them to > > something like storage1.mydomain.com and storage2.mydomain.com. > > > > Remember that, at this point, we had already some volumes (with bricks): > > > > $ gluster volume info MYVOL > > [...] > > Brick1: storage1:/MYDIR > > Brick1: storage2:/MYDIR > > > > For simplicity, let's consider that we had two Gluster Nodes, each one > with > > the following entries in /etc/hosts: > > > > 10.10.10.1 storage1 > > 10.10.10.2 storage2 > > > > To implement the hostname changes, we've changed the etc hosts file to: > > > > 10.10.10.1 storage1 storage1.mydomain.com > > 10.10.10.2 storage2 storage2.mydomain.com > > > > And we've run in storage1: > > > > $ gluster peer probe storage2.mydomain.com > > peer probe: success > > > > Everything works well during some time, but the glusterd starts to fail > > after any reboot: > > > > $ service glusterfs-server status > > glusterfs-server start/running, process 14714 > > $ service glusterfs-server restart > > glusterfs-server stop/waiting > > glusterfs-server start/running, process 14860 > > $ service glusterfs-server status > > glusterfs-server stop/waiting > > > > To start the service again, it was necessary to rollback the hostname1 > > config to storage2 in /var/lib/glusterd/peers/OUR_UUID. > > > > After some try and error, we discovered that if we change the order of > the > > entries in /etc/hosts and repeat the process, everything worked. > > > > It is, from: > > > > 10.10.10.1 storage1 storage1.mydomain.com > > 10.10.10.2 storage2 storage2.mydomain.com > > > > To: > > > > 10.10.10.1 storage1.mydomain.com storage1 > > 10.10.10.2 storage2.mydomain.com storage2 > > > > And run: > > > > gluster peer probe storage2.mydomain.com > > service glusterfs-server restart > > > > So we've checked the Glusterd debug log and checked the GlusterFS source > > code and discovered that the big secret was the function > > glusterd_friend_find_by_hostname, in the file > > xlators/mgmt/glusterd/src/glusterd-utils.c. 
This function is called for > > each brick that isn't a local brick and does the following things: > > > >- It checks if the brick hostname is equal to some peer hostname; > >- If it's, this peer is our wanted friend; > >- If not, it gets the brick IP (resolves the hostname using the > function > >getaddrinfo) and checks if the brick IP is equal to the peer hostname; > > - It is, we could run gluster peer probe 10.10.10.2. Once the brick > > IP (storage2 resolves to 10.10.10.2) would have equal to the peer > > "hostname" (10.10.10.2); > >- If it's, this peer is our wanted friend; > >- If not, gets the reverse of the brick IP (using the function > >getnameinfo) and checks if the brick reverse is equal to the peer > >hostname; > > - This is why changing the order of the entries in /etc/hosts > worked > > as an workaround for us; > >- If not, returns and error (and Glusterd will fail). > > > > However, we think that comparing the brick IP (resolving the brick > > hostname) and the peer IP (resolving the peer hostname) would be a > simpler > > and more comprehensive solution. Once both brick and peer will have > > difference hostnames, but the same IP, it would work. > > > > The solution could be: > > > >- It checks if the brick hostname is equal to some peer hostname; > >- If it's, this peer is our wanted friend; > >- If not, it gets both the brick IP (resolves the hostname using the > >function getaddrinfo) and the peer IP (resolves the peer hostname) > and, > >for each IP pair, check if a brick IP is equal to a peer IP; > >- If it's, this peer is our wanted friend; > >- If not, returns and error (and Gluste
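For reference, the address comparison described in the quoted proposal could look roughly like the sketch below. This is only an illustration of the idea, not the actual gd_peerinfo_find_from_addrinfo() implementation; the function name hosts_resolve_to_same_address() is made up for this example.

#include <stdbool.h>
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>

/* Resolve both hostnames and report a match if any resolved address
 * appears in both lists. */
static bool
hosts_resolve_to_same_address (const char *brick_host, const char *peer_host)
{
        struct addrinfo  hints = { .ai_family = AF_UNSPEC,
                                   .ai_socktype = SOCK_STREAM };
        struct addrinfo *brick_res = NULL, *peer_res = NULL, *b, *p;
        bool             match = false;

        if (getaddrinfo (brick_host, NULL, &hints, &brick_res) != 0)
                return false;
        if (getaddrinfo (peer_host, NULL, &hints, &peer_res) != 0) {
                freeaddrinfo (brick_res);
                return false;
        }

        /* Compare every (brick, peer) address pair. */
        for (b = brick_res; b && !match; b = b->ai_next)
                for (p = peer_res; p && !match; p = p->ai_next)
                        if (b->ai_addrlen == p->ai_addrlen &&
                            memcmp (b->ai_addr, p->ai_addr, b->ai_addrlen) == 0)
                                match = true;

        freeaddrinfo (brick_res);
        freeaddrinfo (peer_res);
        return match;
}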
Re: [Gluster-devel] Mount hangs because of connection delays
I agree that a generic solution for all cluster xlators would be good. Only question I have is whether parallel notifications are specially handled somewhere. For example, if client xlator sends EC_CHILD_DOWN after a timeout, it's possible that an immediate EC_CHILD_UP is sent if the brick is connected. In this case, the cluster xlator could receive both notifications in any order (we have multi-threading), which is dangerous if EC_CHILD_DOWN is processed after EC_CHILD_UP. I've seen that protocol/client doesn't send one notification until the previous one has been completed. However this assumes that there won't be any xlator that delays the notification (i.e. sends it in background at another moment). Is that a requirement to process notifications ? otherwise the concurrent notifications problem could appear even if protocol/client serializes them. Xavi On 07/02/2015 03:34 PM, Pranith Kumar Karampuri wrote: hi, When glusterfs mount process is coming up all cluster xlators wait for at least one event from all the children before propagating the status upwards. Sometimes client xlator takes upto 2 minutes to propogate this event(https://bugzilla.redhat.com/show_bug.cgi?id=1054694#c0) Due to this xavi implemented timer in ec notify where we treat a child as down if it doesn't come up in 10 seconds. Similar patch went up for review @http://review.gluster.org/#/c/3 for afr. Kritika raised an interesting point in the review that all cluster xlators need to have this logic for the mount to not hang, and the correct place to fix it would be client xlator itself. i.e. add the timer logic in client xlator. Which seems like a better approach. I just want to take inputs from everyone before we go ahead in that direction. i.e. on PARENT_UP in client xlator it will start a timer and if no rpc notification is received in that timeout it treats the client xlator as down. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
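One way the client xlator could avoid handing parents an out-of-order CHILD_DOWN/CHILD_UP pair is to disarm the timer under a lock before either path notifies. Below is a hypothetical sketch only, reusing the made-up connection_timer field from the earlier example and assuming the clnt_conf_t mutex can be used for this; it is not a proposal for the actual patch.

/* Sketch: whichever path clears connection_timer first "wins"; the other
 * becomes a no-op, so the timer can never emit CHILD_DOWN after a real
 * CHILD_UP has already been sent. */
static void
client_connect_timeout_cbk (void *data)
{
        xlator_t    *this      = data;
        clnt_conf_t *conf      = this->private;
        gf_boolean_t fire_down = _gf_false;

        pthread_mutex_lock (&conf->lock);
        {
                if (conf->connection_timer) {        /* not yet cancelled */
                        conf->connection_timer = NULL;
                        fire_down = _gf_true;
                }
        }
        pthread_mutex_unlock (&conf->lock);

        if (fire_down)
                default_notify (this, GF_EVENT_CHILD_DOWN, NULL);
}

static void
client_rpc_connect_event (xlator_t *this)
{
        clnt_conf_t *conf  = this->private;
        gf_timer_t  *timer = NULL;

        pthread_mutex_lock (&conf->lock);
        {
                timer = conf->connection_timer;
                conf->connection_timer = NULL;       /* disarm before notifying */
        }
        pthread_mutex_unlock (&conf->lock);

        if (timer)
                gf_timer_call_cancel (this->ctx, timer);

        default_notify (this, GF_EVENT_CHILD_UP, NULL);
}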
Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
This is caused because when bind-insecure is turned on (which is the default now), it may happen that brick is not able to bind to port assigned by Glusterd for example 49192-49195... It seems to occur because the rpc_clnt connections are binding to ports in the same range. so brick fails to bind to a port which is already used by someone else. This bug already exist before http://review.gluster.org/#/c/11039/ when use rdma, i.e. even previously rdma binds to port >= 1024 if it cannot find a free port < 1024, even when bind insecure was turned off (ref to commit '0e3fd04e'). Since we don't have tests related to rdma we did not discover this issue previously. http://review.gluster.org/#/c/11039/ discovers the bug we encountered, however now the bug can be fixed by http://review.gluster.org/#/c/11512/ by making rpc_clnt to get port numbers from 65535 in a descending order, as a result port clash is minimized, also it fixes issues in rdma too Thanks to Raghavendra Talur for help in discovering the real cause Regards, Prasanna Kalever - Original Message - From: "Raghavendra Talur" To: "Krishnan Parthasarathi" Cc: "Gluster Devel" Sent: Thursday, July 2, 2015 6:45:17 PM Subject: Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t On Thu, Jul 2, 2015 at 4:40 PM, Raghavendra Talur < raghavendra.ta...@gmail.com > wrote: On Thu, Jul 2, 2015 at 10:52 AM, Krishnan Parthasarathi < kpart...@redhat.com > wrote: > > > > A port assigned by Glusterd for a brick is found to be in use already by > > the brick. Any changes in Glusterd recently which can cause this? > > > > Or is it a test infra problem? This issue is likely to be caused by http://review.gluster.org/11039 This patch changes the port allocation that happens for rpc_clnt based connections. Previously, ports allocated where < 1024. With this change, these connections, typically mount process, gluster-nfs server processes etc could end up using ports that bricks are being assigned to. IIUC, the intention of the patch was to make server processes lenient to inbound messages from ports > 1024. If we don't require to use ports > 1024 we could leave the port allocation for rpc_clnt connections as before. Alternately, we could reserve the range of ports starting from 49152 for bricks by setting net.ipv4.ip_local_reserved_ports using sysctl(8). This is specific to Linux. I'm not aware of how this could be done in NetBSD for instance though. It seems this is exactly whats happening. I have a question, I get the following data from netstat and grep tcp 0 0 f6be17c0fbf5:1023 f6be17c0fbf5:24007 ESTABLISHED 31516/glusterfsd tcp 0 0 f6be17c0fbf5:49152 f6be17c0fbf5:490 ESTABLISHED 31516/glusterfsd unix 3 [ ] STREAM CONNECTED 988353 31516/glusterfsd /var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket Here 31516 is the brick pid. Looking at the data, line 2 is very clear, it shows connection between brick and glusterfs client. unix socket on line 3 is also clear, it is the unix socket connection that glusterd and brick process use for communication. I am not able to understand line 1; which part of brick process established a tcp connection with glusterd using port 1023? Note: this data is from a build which does not have the above mentioned patch. The patch which exposed this bug is being reverted till the underlying bug is also fixed. 
You can monitor revert patches here master: http://review.gluster.org/11507 3.7 branch: http://review.gluster.org/11508 Please rebase your patches after the above patches are merged to ensure that you patches pass regression. -- Raghavendra Talur ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
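For context, the descending allocation described above can be sketched as follows. This is only an illustration of the approach, not the code in http://review.gluster.org/#/c/11512/, and the function name is invented.

#include <errno.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Try to bind a client socket starting from the top of the port range so
 * that rpc_clnt connections stay away from the 49152+ ports glusterd
 * assigns to bricks; stop at the privileged-port boundary. */
static int
client_bind_descending (int sockfd, struct sockaddr_in *addr)
{
        int port;

        for (port = 65535; port >= 1024; port--) {
                addr->sin_port = htons (port);
                if (bind (sockfd, (struct sockaddr *) addr, sizeof (*addr)) == 0)
                        return port;           /* got a free high port */
                if (errno != EADDRINUSE)
                        break;                 /* some other error; give up */
        }
        return -1;
}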
Re: [Gluster-devel] Problems when using different hostnames in a bricks and a peer
Not at all a problem. I am here to help Rarylson :) -Atin Sent from one plus one On Jul 2, 2015 7:23 PM, "Rarylson Freitas" wrote: > Hi Atin, > > You are right!!! I was using the version 3.5 in production. And when I've > checked the Gluster source code, I checked the wrong commit (not the latest > commit in the master branch). > > Currently, you've already implemented my the proposed solution. It was > done at the function gd_peerinfo_find_from_addrinfo, file > xlators/mgmt/glusterd/src/glusterd-peer-utils.c. > > Thanks for your tip! And sorry for any inconvenience. > > -- > *Rarylson Freitas* > > On Thu, Jul 2, 2015 at 2:01 AM, Atin Mukherjee > wrote: > >> Which gluster version are you using? Better peer identification feature >> (available 3.6 onwards) should tackle this problem IMO. >> >> ~Atin >> >> On 07/02/2015 10:05 AM, Rarylson Freitas wrote: >> > Hi, >> > >> > Recently, my company needed to change our hostnames used in the Gluster >> > Pool. >> > >> > In a first moment, we have two Gluster Nodes called storage1 and >> storage2. >> > Our volumes used two bricks: storage1:/MYVOLYME and storage2:/MYVOLUME. >> We >> > put the storage1 and storage2 IPs in the /etc/hosts file of our nodes >> and >> > in our client servers. >> > >> > After some time, more client servers started to using Gluster and we >> > discovered that using hostnames without domain (using /etc/hosts) in all >> > client servers is a pain in the a$$ :(. So, we decided to change them to >> > something like storage1.mydomain.com and storage2.mydomain.com. >> > >> > Remember that, at this point, we had already some volumes (with bricks): >> > >> > $ gluster volume info MYVOL >> > [...] >> > Brick1: storage1:/MYDIR >> > Brick1: storage2:/MYDIR >> > >> > For simplicity, let's consider that we had two Gluster Nodes, each one >> with >> > the following entries in /etc/hosts: >> > >> > 10.10.10.1 storage1 >> > 10.10.10.2 storage2 >> > >> > To implement the hostname changes, we've changed the etc hosts file to: >> > >> > 10.10.10.1 storage1 storage1.mydomain.com >> > 10.10.10.2 storage2 storage2.mydomain.com >> > >> > And we've run in storage1: >> > >> > $ gluster peer probe storage2.mydomain.com >> > peer probe: success >> > >> > Everything works well during some time, but the glusterd starts to fail >> > after any reboot: >> > >> > $ service glusterfs-server status >> > glusterfs-server start/running, process 14714 >> > $ service glusterfs-server restart >> > glusterfs-server stop/waiting >> > glusterfs-server start/running, process 14860 >> > $ service glusterfs-server status >> > glusterfs-server stop/waiting >> > >> > To start the service again, it was necessary to rollback the hostname1 >> > config to storage2 in /var/lib/glusterd/peers/OUR_UUID. >> > >> > After some try and error, we discovered that if we change the order of >> the >> > entries in /etc/hosts and repeat the process, everything worked. >> > >> > It is, from: >> > >> > 10.10.10.1 storage1 storage1.mydomain.com >> > 10.10.10.2 storage2 storage2.mydomain.com >> > >> > To: >> > >> > 10.10.10.1 storage1.mydomain.com storage1 >> > 10.10.10.2 storage2.mydomain.com storage2 >> > >> > And run: >> > >> > gluster peer probe storage2.mydomain.com >> > service glusterfs-server restart >> > >> > So we've checked the Glusterd debug log and checked the GlusterFS source >> > code and discovered that the big secret was the function >> > glusterd_friend_find_by_hostname, in the file >> > xlators/mgmt/glusterd/src/glusterd-utils.c. 
This function is called for >> > each brick that isn't a local brick and does the following things: >> > >> >- It checks if the brick hostname is equal to some peer hostname; >> >- If it's, this peer is our wanted friend; >> >- If not, it gets the brick IP (resolves the hostname using the >> function >> >getaddrinfo) and checks if the brick IP is equal to the peer >> hostname; >> > - It is, we could run gluster peer probe 10.10.10.2. Once the >> brick >> > IP (storage2 resolves to 10.10.10.2) would have equal to the peer >> > "hostname" (10.10.10.2); >> >- If it's, this peer is our wanted friend; >> >- If not, gets the reverse of the brick IP (using the function >> >getnameinfo) and checks if the brick reverse is equal to the peer >> >hostname; >> > - This is why changing the order of the entries in /etc/hosts >> worked >> > as an workaround for us; >> >- If not, returns and error (and Glusterd will fail). >> > >> > However, we think that comparing the brick IP (resolving the brick >> > hostname) and the peer IP (resolving the peer hostname) would be a >> simpler >> > and more comprehensive solution. Once both brick and peer will have >> > difference hostnames, but the same IP, it would work. >> > >> > The solution could be: >> > >> >- It checks if the brick hostname is equal to some peer hostname; >> >- If it's, this peer is our wanted friend; >> >- If not, it gets both the
Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
Thanks Prasanna for the patches :) -Atin Sent from one plus one On Jul 2, 2015 9:19 PM, "Prasanna Kalever" wrote: > > This is caused because when bind-insecure is turned on (which is the > default now), it may happen > that brick is not able to bind to port assigned by Glusterd for example > 49192-49195... > It seems to occur because the rpc_clnt connections are binding to ports in > the same range. > so brick fails to bind to a port which is already used by someone else. > > This bug already exist before http://review.gluster.org/#/c/11039/ when > use rdma, i.e. even > previously rdma binds to port >= 1024 if it cannot find a free port < 1024, > even when bind insecure was turned off (ref to commit '0e3fd04e'). > Since we don't have tests related to rdma we did not discover this issue > previously. > > http://review.gluster.org/#/c/11039/ discovers the bug we encountered, > however now the bug can be fixed by > http://review.gluster.org/#/c/11512/ by making rpc_clnt to get port > numbers from 65535 in a descending > order, as a result port clash is minimized, also it fixes issues in rdma > too > > Thanks to Raghavendra Talur for help in discovering the real cause > > > Regards, > Prasanna Kalever > > > > - Original Message - > From: "Raghavendra Talur" > To: "Krishnan Parthasarathi" > Cc: "Gluster Devel" > Sent: Thursday, July 2, 2015 6:45:17 PM > Subject: Re: [Gluster-devel] spurious failures > tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t > > > > On Thu, Jul 2, 2015 at 4:40 PM, Raghavendra Talur < > raghavendra.ta...@gmail.com > wrote: > > > > > > On Thu, Jul 2, 2015 at 10:52 AM, Krishnan Parthasarathi < > kpart...@redhat.com > wrote: > > > > > > > > > A port assigned by Glusterd for a brick is found to be in use already > by > > > the brick. Any changes in Glusterd recently which can cause this? > > > > > > Or is it a test infra problem? > > This issue is likely to be caused by http://review.gluster.org/11039 > This patch changes the port allocation that happens for rpc_clnt based > connections. Previously, ports allocated where < 1024. With this change, > these connections, typically mount process, gluster-nfs server processes > etc could end up using ports that bricks are being assigned to. > > IIUC, the intention of the patch was to make server processes lenient to > inbound messages from ports > 1024. If we don't require to use ports > 1024 > we could leave the port allocation for rpc_clnt connections as before. > Alternately, we could reserve the range of ports starting from 49152 for > bricks > by setting net.ipv4.ip_local_reserved_ports using sysctl(8). This is > specific to Linux. > I'm not aware of how this could be done in NetBSD for instance though. > > > It seems this is exactly whats happening. > > I have a question, I get the following data from netstat and grep > > tcp 0 0 f6be17c0fbf5:1023 f6be17c0fbf5:24007 ESTABLISHED 31516/glusterfsd > tcp 0 0 f6be17c0fbf5:49152 f6be17c0fbf5:490 ESTABLISHED 31516/glusterfsd > unix 3 [ ] STREAM CONNECTED 988353 31516/glusterfsd > /var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket > > Here 31516 is the brick pid. > > Looking at the data, line 2 is very clear, it shows connection between > brick and glusterfs client. > unix socket on line 3 is also clear, it is the unix socket connection that > glusterd and brick process use for communication. > > I am not able to understand line 1; which part of brick process > established a tcp connection with glusterd using port 1023? 
> Note: this data is from a build which does not have the above mentioned > patch. > > > The patch which exposed this bug is being reverted till the underlying bug > is also fixed. > You can monitor revert patches here > master: http://review.gluster.org/11507 > 3.7 branch: http://review.gluster.org/11508 > > Please rebase your patches after the above patches are merged to ensure > that you patches pass regression. > > > > > > -- > Raghavendra Talur > > > > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t
Hi All, This is the same issue as the previous tiering regression failure. Volume brick not able to start brick because port is busy [2015-07-02 10:20:20.601372] [run.c:190:runner_log] (--> /build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x240)[0x7f05e080bc32] (--> /build/install/lib/libglusterfs.so.0(runner_log+0x192)[0x7f05e08754ce] (--> /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_volume_start_glusterfs+0xae7)[0x7f05d5c935d7] (--> /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_brick_start+0x151)[0x7f05d5c9d4e3] (--> /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_op_perform_add_bricks+0x8fe)[0x7f05d5d10661] ) 0-: Starting GlusterFS: /build/install/sbin/glusterfsd -s slave33.cloud.gluster.org --volfile-id patchy.slave33.cloud.gluster.org.d-backends-patchy5 -p /var/lib/glusterd/vols/patchy/run/slave33.cloud.gluster.org-d-backends-patchy5.pid -S /var/run/gluster/ca5f5a89aa3a24f0a54852590ab82ad5.socket --brick-name /d/backends/patchy5 -l /var/log/glusterfs/bricks/d-backends-patchy5.log --xlator-option *-posix.glusterd-uuid=da011de8-9103-4cf2-9f4b-03707d0019d0 --brick-port 49167 --xlator-option patchy-server.listen-port=49167 [2015-07-02 10:20:20.624297] I [MSGID: 106144] [glusterd-pmap.c:269:pmap_registry_remove] 0-pmap: removing brick (null) on port 49167 [2015-07-02 10:20:20.625315] E [MSGID: 106005] [glusterd-utils.c:4448:glusterd_brick_start] 0-management: Unable to start brick slave33.cloud.gluster.org:/d/backends/patchy5 [2015-07-02 10:20:20.625354] E [MSGID: 106074] [glusterd-brick-ops.c:2096:glusterd_op_add_brick] 0-glusterd: Unable to add bricks [2015-07-02 10:20:20.625368] E [MSGID: 106123] [glusterd-syncop.c:1416:gd_commit_op_phase] 0-management: Commit of operation 'Volume Add brick' failed on localhost Brick Log: [2015-07-02 10:20:20.608547] I [MSGID: 100030] [glusterfsd.c:2296:main] 0-/build/install/sbin/glusterfsd: Started running /build/install/sbin/glusterfsd version 3.8dev (args: /build/install/sbin/glusterfsd -s slave33.cloud.gluster.org --volfile-id patchy.slave33.cloud.gluster.org.d-backends-patchy5 -p /var/lib/glusterd/vols/patchy/run/slave33.cloud.gluster.org-d-backends-patchy5.pid -S /var/run/gluster/ca5f5a89aa3a24f0a54852590ab82ad5.socket --brick-name /d/backends/patchy5 -l /var/log/glusterfs/bricks/d-backends-patchy5.log --xlator-option *-posix.glusterd-uuid=da011de8-9103-4cf2-9f4b-03707d0019d0 --brick-port 49167 --xlator-option patchy-server.listen-port=49167) [2015-07-02 10:20:20.617113] I [MSGID: 101190] [event-epoll.c:627:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1 [2015-07-02 10:20:20.623097] I [MSGID: 101173] [graph.c:268:gf_add_cmdline_options] 0-patchy-server: adding option 'listen-port' for volume 'patchy-server' with value '49167' [2015-07-02 10:20:20.623135] I [MSGID: 101173] [graph.c:268:gf_add_cmdline_options] 0-patchy-posix: adding option 'glusterd-uuid' for volume 'patchy-posix' with value 'da011de8-9103-4cf2-9f4b-03707d0019d0' [2015-07-02 10:20:20.623358] I [MSGID: 115034] [server.c:392:_check_for_auth_option] 0-/d/backends/patchy5: skip format check for non-addr auth option auth.login./d/backends/patchy5.allow [2015-07-02 10:20:20.623374] I [MSGID: 115034] [server.c:392:_check_for_auth_option] 0-/d/backends/patchy5: skip format check for non-addr auth option auth.login.96bcb872-559b-4f19-84ad-a735dc6068f6.password [2015-07-02 10:20:20.623568] I [rpcsvc.c:2210:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 
64 [2015-07-02 10:20:20.623633] W [MSGID: 101002] [options.c:952:xl_opt_validate] 0-patchy-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction [2015-07-02 10:20:20.623707] E [socket.c:818:__socket_server_bind] 0-tcp.patchy-server: binding to failed: Address already in use [2015-07-02 10:20:20.623720] E [socket.c:821:__socket_server_bind] 0-tcp.patchy-server: Port is already in use [2015-07-02 10:20:20.623746] W [rpcsvc.c:1599:rpcsvc_transport_create] 0-rpc-service: listening on transport failed [2015-07-02 10:20:20.623758] W [MSGID: 115045] [server.c:998:init] 0-patchy-server: creation of listener failed [2015-07-02 10:20:20.623772] E [MSGID: 101019] [xlator.c:423:xlator_init] 0-patchy-server: Initialization of volume 'patchy-server' failed, review your volfile again [2015-07-02 10:20:20.623783] E [MSGID: 101066] [graph.c:323:glusterfs_graph_init] 0-patchy-server: initializing translator failed [2015-07-02 10:20:20.623792] E [MSGID: 101176] [graph.c:669:glusterfs_graph_activate] 0-graph: init failed [2015-07-02 10:20:20.624203] W [glusterfsd.c:1214:cleanup_and_exit] (--> 0-: received signum (0), shutting down Regards, Joe - Original Message - From: "Pranith Kumar Karampuri" To: "Dan Lambright" Cc: "Gluster Devel" , "Joseph Fernandes" Sent: Thursday, July 2, 2015 6:16:44 P
Re: [Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t
Joe, Please refer to Prasanna's mail. He has uploaded a patch to solve it. -Atin Sent from one plus one On Jul 2, 2015 9:42 PM, "Joseph Fernandes" wrote: > Hi All, > > This is the same issue as the previous tiering regression failure. > > Volume brick not able to start brick because port is busy > > [2015-07-02 10:20:20.601372] [run.c:190:runner_log] (--> > /build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x240)[0x7f05e080bc32] > (--> /build/install/lib/libglusterfs.so.0(runner_log+0x192)[0x7f05e08754ce] > (--> > /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_volume_start_glusterfs+0xae7)[0x7f05d5c935d7] > (--> > /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_brick_start+0x151)[0x7f05d5c9d4e3] > (--> > /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_op_perform_add_bricks+0x8fe)[0x7f05d5d10661] > ) 0-: Starting GlusterFS: /build/install/sbin/glusterfsd -s > slave33.cloud.gluster.org --volfile-id > patchy.slave33.cloud.gluster.org.d-backends-patchy5 -p > /var/lib/glusterd/vols/patchy/run/slave33.cloud.gluster.org-d-backends-patchy5.pid > -S /var/run/gluster/ca5f5a89aa3a24f0a54852590ab82ad5.socket --brick-name > /d/backends/patchy5 -l /var/log/glusterfs/bricks/d-backends-patchy5.log > --xlator-option *-posix.glusterd-uuid=da011de8-9103-4cf2-9f4b-03707d0019d0 > --brick-port 49167 --xlator-option patchy-server.listen-port=49167 > [2015-07-02 10:20:20.624297] I [MSGID: 106144] > [glusterd-pmap.c:269:pmap_registry_remove] 0-pmap: removing brick (null) on > port 49167 > [2015-07-02 10:20:20.625315] E [MSGID: 106005] > [glusterd-utils.c:4448:glusterd_brick_start] 0-management: Unable to start > brick slave33.cloud.gluster.org:/d/backends/patchy5 > [2015-07-02 10:20:20.625354] E [MSGID: 106074] > [glusterd-brick-ops.c:2096:glusterd_op_add_brick] 0-glusterd: Unable to add > bricks > [2015-07-02 10:20:20.625368] E [MSGID: 106123] > [glusterd-syncop.c:1416:gd_commit_op_phase] 0-management: Commit of > operation 'Volume Add brick' failed on localhost > > > Brick Log: > > [2015-07-02 10:20:20.608547] I [MSGID: 100030] [glusterfsd.c:2296:main] > 0-/build/install/sbin/glusterfsd: Started running > /build/install/sbin/glusterfsd version 3.8dev (args: > /build/install/sbin/glusterfsd -s slave33.cloud.gluster.org --volfile-id > patchy.slave33.cloud.gluster.org.d-backends-patchy5 -p > /var/lib/glusterd/vols/patchy/run/slave33.cloud.gluster.org-d-backends-patchy5.pid > -S /var/run/gluster/ca5f5a89aa3a24f0a54852590ab82ad5.socket --brick-name > /d/backends/patchy5 -l /var/log/glusterfs/bricks/d-backends-patchy5.log > --xlator-option *-posix.glusterd-uuid=da011de8-9103-4cf2-9f4b-03707d0019d0 > --brick-port 49167 --xlator-option patchy-server.listen-port=49167) > [2015-07-02 10:20:20.617113] I [MSGID: 101190] > [event-epoll.c:627:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [2015-07-02 10:20:20.623097] I [MSGID: 101173] > [graph.c:268:gf_add_cmdline_options] 0-patchy-server: adding option > 'listen-port' for volume 'patchy-server' with value '49167' > [2015-07-02 10:20:20.623135] I [MSGID: 101173] > [graph.c:268:gf_add_cmdline_options] 0-patchy-posix: adding option > 'glusterd-uuid' for volume 'patchy-posix' with value > 'da011de8-9103-4cf2-9f4b-03707d0019d0' > [2015-07-02 10:20:20.623358] I [MSGID: 115034] > [server.c:392:_check_for_auth_option] 0-/d/backends/patchy5: skip format > check for non-addr auth option auth.login./d/backends/patchy5.allow > [2015-07-02 10:20:20.623374] I [MSGID: 115034] > 
[server.c:392:_check_for_auth_option] 0-/d/backends/patchy5: skip format > check for non-addr auth option > auth.login.96bcb872-559b-4f19-84ad-a735dc6068f6.password > [2015-07-02 10:20:20.623568] I > [rpcsvc.c:2210:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured > rpc.outstanding-rpc-limit with value 64 > [2015-07-02 10:20:20.623633] W [MSGID: 101002] > [options.c:952:xl_opt_validate] 0-patchy-server: option 'listen-port' is > deprecated, preferred is 'transport.socket.listen-port', continuing with > correction > [2015-07-02 10:20:20.623707] E [socket.c:818:__socket_server_bind] > 0-tcp.patchy-server: binding to failed: Address already in use > [2015-07-02 10:20:20.623720] E [socket.c:821:__socket_server_bind] > 0-tcp.patchy-server: Port is already in use > [2015-07-02 10:20:20.623746] W [rpcsvc.c:1599:rpcsvc_transport_create] > 0-rpc-service: listening on transport failed > [2015-07-02 10:20:20.623758] W [MSGID: 115045] [server.c:998:init] > 0-patchy-server: creation of listener failed > [2015-07-02 10:20:20.623772] E [MSGID: 101019] [xlator.c:423:xlator_init] > 0-patchy-server: Initialization of volume 'patchy-server' failed, review > your volfile again > [2015-07-02 10:20:20.623783] E [MSGID: 101066] > [graph.c:323:glusterfs_graph_init] 0-patchy-server: initializing translator > failed > [2015-07-02 10:20:20.623792] E [MSGID: 101176] > [graph.c:669:glusterfs_graph_activate] 0-graph: init failed > [2015-07-02 10:20:20.
Re: [Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t
Yep.. Thanks Guys. - Original Message - From: "Atin Mukherjee" To: "Joseph Fernandes" Cc: kpart...@redhat.com, "Atin Mukherjee" , "Gluster Devel" Sent: Thursday, July 2, 2015 9:45:01 PM Subject: Re: [Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t Joe, Please refer to Prasanna's mail. He has uploaded a patch to solve it. -Atin Sent from one plus one On Jul 2, 2015 9:42 PM, "Joseph Fernandes" wrote: > Hi All, > > This is the same issue as the previous tiering regression failure. > > Volume brick not able to start brick because port is busy > > [2015-07-02 10:20:20.601372] [run.c:190:runner_log] (--> > /build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x240)[0x7f05e080bc32] > (--> /build/install/lib/libglusterfs.so.0(runner_log+0x192)[0x7f05e08754ce] > (--> > /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_volume_start_glusterfs+0xae7)[0x7f05d5c935d7] > (--> > /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_brick_start+0x151)[0x7f05d5c9d4e3] > (--> > /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_op_perform_add_bricks+0x8fe)[0x7f05d5d10661] > ) 0-: Starting GlusterFS: /build/install/sbin/glusterfsd -s > slave33.cloud.gluster.org --volfile-id > patchy.slave33.cloud.gluster.org.d-backends-patchy5 -p > /var/lib/glusterd/vols/patchy/run/slave33.cloud.gluster.org-d-backends-patchy5.pid > -S /var/run/gluster/ca5f5a89aa3a24f0a54852590ab82ad5.socket --brick-name > /d/backends/patchy5 -l /var/log/glusterfs/bricks/d-backends-patchy5.log > --xlator-option *-posix.glusterd-uuid=da011de8-9103-4cf2-9f4b-03707d0019d0 > --brick-port 49167 --xlator-option patchy-server.listen-port=49167 > [2015-07-02 10:20:20.624297] I [MSGID: 106144] > [glusterd-pmap.c:269:pmap_registry_remove] 0-pmap: removing brick (null) on > port 49167 > [2015-07-02 10:20:20.625315] E [MSGID: 106005] > [glusterd-utils.c:4448:glusterd_brick_start] 0-management: Unable to start > brick slave33.cloud.gluster.org:/d/backends/patchy5 > [2015-07-02 10:20:20.625354] E [MSGID: 106074] > [glusterd-brick-ops.c:2096:glusterd_op_add_brick] 0-glusterd: Unable to add > bricks > [2015-07-02 10:20:20.625368] E [MSGID: 106123] > [glusterd-syncop.c:1416:gd_commit_op_phase] 0-management: Commit of > operation 'Volume Add brick' failed on localhost > > > Brick Log: > > [2015-07-02 10:20:20.608547] I [MSGID: 100030] [glusterfsd.c:2296:main] > 0-/build/install/sbin/glusterfsd: Started running > /build/install/sbin/glusterfsd version 3.8dev (args: > /build/install/sbin/glusterfsd -s slave33.cloud.gluster.org --volfile-id > patchy.slave33.cloud.gluster.org.d-backends-patchy5 -p > /var/lib/glusterd/vols/patchy/run/slave33.cloud.gluster.org-d-backends-patchy5.pid > -S /var/run/gluster/ca5f5a89aa3a24f0a54852590ab82ad5.socket --brick-name > /d/backends/patchy5 -l /var/log/glusterfs/bricks/d-backends-patchy5.log > --xlator-option *-posix.glusterd-uuid=da011de8-9103-4cf2-9f4b-03707d0019d0 > --brick-port 49167 --xlator-option patchy-server.listen-port=49167) > [2015-07-02 10:20:20.617113] I [MSGID: 101190] > [event-epoll.c:627:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [2015-07-02 10:20:20.623097] I [MSGID: 101173] > [graph.c:268:gf_add_cmdline_options] 0-patchy-server: adding option > 'listen-port' for volume 'patchy-server' with value '49167' > [2015-07-02 10:20:20.623135] I [MSGID: 101173] > [graph.c:268:gf_add_cmdline_options] 0-patchy-posix: adding option > 'glusterd-uuid' for volume 'patchy-posix' with value > 
'da011de8-9103-4cf2-9f4b-03707d0019d0' > [2015-07-02 10:20:20.623358] I [MSGID: 115034] > [server.c:392:_check_for_auth_option] 0-/d/backends/patchy5: skip format > check for non-addr auth option auth.login./d/backends/patchy5.allow > [2015-07-02 10:20:20.623374] I [MSGID: 115034] > [server.c:392:_check_for_auth_option] 0-/d/backends/patchy5: skip format > check for non-addr auth option > auth.login.96bcb872-559b-4f19-84ad-a735dc6068f6.password > [2015-07-02 10:20:20.623568] I > [rpcsvc.c:2210:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured > rpc.outstanding-rpc-limit with value 64 > [2015-07-02 10:20:20.623633] W [MSGID: 101002] > [options.c:952:xl_opt_validate] 0-patchy-server: option 'listen-port' is > deprecated, preferred is 'transport.socket.listen-port', continuing with > correction > [2015-07-02 10:20:20.623707] E [socket.c:818:__socket_server_bind] > 0-tcp.patchy-server: binding to failed: Address already in use > [2015-07-02 10:20:20.623720] E [socket.c:821:__socket_server_bind] > 0-tcp.patchy-server: Port is already in use > [2015-07-02 10:20:20.623746] W [rpcsvc.c:1599:rpcsvc_transport_create] > 0-rpc-service: listening on transport failed > [2015-07-02 10:20:20.623758] W [MSGID: 115045] [server.c:998:init] > 0-patchy-server: creation of listener failed > [2015-07-02 10:20:20.623772] E [MSGID: 101019] [xlator.c:423:xlator_init] > 0-patchy-server: Initialization of volume 'patchy-serv
Re: [Gluster-devel] Mount hangs because of connection delays
Pranith, I understand the bug and a more generic layer solution would be desirable and apt, rather than repeating things at each xlator. However, I am always confused about notifications and their processing, so cannot state with conviction that this is fine and will work elegantly. Will leave others to chime in with the same. Shyam On 07/02/2015 09:34 AM, Pranith Kumar Karampuri wrote: hi, When glusterfs mount process is coming up all cluster xlators wait for at least one event from all the children before propagating the status upwards. Sometimes client xlator takes up to 2 minutes to propagate this event (https://bugzilla.redhat.com/show_bug.cgi?id=1054694#c0) Due to this Xavi implemented a timer in ec notify where we treat a child as down if it doesn't come up in 10 seconds. Similar patch went up for review @http://review.gluster.org/#/c/3 for afr. Kritika raised an interesting point in the review that all cluster xlators need to have this logic for the mount to not hang, and the correct place to fix it would be the client xlator itself, i.e. add the timer logic in client xlator, which seems like a better approach. I just want to take inputs from everyone before we go ahead in that direction, i.e. on PARENT_UP in client xlator it will start a timer and if no rpc notification is received in that timeout it treats the client xlator as down. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Mount hangs because of connection delays
On 07/02/2015 07:04 PM, Pranith Kumar Karampuri wrote: hi, When glusterfs mount process is coming up all cluster xlators wait for at least one event from all the children before propagating the status upwards. Sometimes client xlator takes up to 2 minutes to propagate this event (https://bugzilla.redhat.com/show_bug.cgi?id=1054694#c0) Due to this Xavi implemented a timer in ec notify where we treat a child as down if it doesn't come up in 10 seconds. Similar patch went up for review @http://review.gluster.org/#/c/3 for afr. Kritika raised an interesting point in the review that all cluster xlators need to have this logic for the mount to not hang, and the correct place to fix it would be the client xlator itself, i.e. add the timer logic in client xlator, which seems like a better approach. I think it makes sense to handle the change only in relevant cluster xlators like AFR/EC because of the notion of high availability associated with them. In my limited understanding, protocol-client is the originator (?) of the child up/down events. While it looks okay to allow cluster xlators to take certain decisions because the 'originator' did not respond within a specific time, altering the originator itself without giving a chance to the upper xlators to make choices seems incorrect to me. Perhaps I'm wrong, but setting an unconditional 10 second timer on protocol/client seems to defeat the purpose of having a configurable `network.ping-timeout` volume set option. Just my two cents. :) I just want to take inputs from everyone before we go ahead in that direction, i.e. on PARENT_UP in client xlator it will start a timer and if no rpc notification is received in that timeout it treats the client xlator as down. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
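As an illustration of the timer idea being debated above, here is a small standalone C sketch, not GlusterFS code and not the actual patch under review: on "parent up" it waits a bounded time for the first connection notification and otherwise gives up and reports the child as down instead of hanging. All names, the 2-second timeout, and the printf stand-ins for CHILD_UP/CHILD_DOWN propagation are assumptions made purely for illustration.

    /* Standalone sketch of a bounded wait for the first connection event.
     * Compile with: cc -o parent_up_timer parent_up_timer.c -lpthread */
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static bool connected = false;

    /* Hook that the rpc notify path would call when the brick connects. */
    static void rpc_connected(void)
    {
            pthread_mutex_lock(&lock);
            connected = true;
            pthread_cond_signal(&cond);
            pthread_mutex_unlock(&lock);
    }

    /* Would run on PARENT_UP: wait up to timeout_sec for the first rpc
     * notification, then decide what to propagate upwards. */
    static void parent_up(int timeout_sec)
    {
            struct timespec deadline;
            clock_gettime(CLOCK_REALTIME, &deadline);
            deadline.tv_sec += timeout_sec;

            pthread_mutex_lock(&lock);
            while (!connected) {
                    /* non-zero return (typically ETIMEDOUT): stop waiting */
                    if (pthread_cond_timedwait(&cond, &lock, &deadline) != 0)
                            break;
            }
            printf(connected ? "propagate CHILD_UP\n"
                             : "propagate CHILD_DOWN\n");
            pthread_mutex_unlock(&lock);
    }

    int main(void)
    {
            (void) rpc_connected; /* hook for the notify path; unused in this demo */
            /* Nothing ever connects here, so after 2 seconds the sketch
             * reports CHILD_DOWN instead of blocking the mount forever. */
            parent_up(2);
            return 0;
    }

Whether this wait lives in AFR/EC or in protocol/client, and how it interacts with network.ping-timeout, is exactly the open question in this thread; the sketch only shows the mechanics of "timer on parent-up, give up if no notification arrives".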
Re: [Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t
I've reverted [1] which brought the change allow-insecure to be on by default. The patch seems to have issues which will be addressed and merged later. The revert can be found at [2]. [1] http://review.gluster.org/11274 [2] http://review.gluster.org/11507 Please let me know if the regressions are still failing. regards, Raghavendra. - Original Message - > From: "Joseph Fernandes" > To: "Atin Mukherjee" > Cc: "Gluster Devel" > Sent: Thursday, July 2, 2015 9:49:16 PM > Subject: Re: [Gluster-devel] Failure in > tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t > > Yep.. Thanks Guys. > > - Original Message - > From: "Atin Mukherjee" > To: "Joseph Fernandes" > Cc: kpart...@redhat.com, "Atin Mukherjee" , "Gluster > Devel" > Sent: Thursday, July 2, 2015 9:45:01 PM > Subject: Re: [Gluster-devel] Failure in > tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t > > Joe, > > Please refer to Prasanna's mail. He has uploaded a patch to solve it. > > -Atin > Sent from one plus one > On Jul 2, 2015 9:42 PM, "Joseph Fernandes" wrote: > > > Hi All, > > > > This is the same issue as the previous tiering regression failure. > > > > Volume brick not able to start brick because port is busy > > > > [2015-07-02 10:20:20.601372] [run.c:190:runner_log] (--> > > /build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x240)[0x7f05e080bc32] > > (--> /build/install/lib/libglusterfs.so.0(runner_log+0x192)[0x7f05e08754ce] > > (--> > > /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_volume_start_glusterfs+0xae7)[0x7f05d5c935d7] > > (--> > > /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_brick_start+0x151)[0x7f05d5c9d4e3] > > (--> > > /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_op_perform_add_bricks+0x8fe)[0x7f05d5d10661] > > ) 0-: Starting GlusterFS: /build/install/sbin/glusterfsd -s > > slave33.cloud.gluster.org --volfile-id > > patchy.slave33.cloud.gluster.org.d-backends-patchy5 -p > > /var/lib/glusterd/vols/patchy/run/slave33.cloud.gluster.org-d-backends-patchy5.pid > > -S /var/run/gluster/ca5f5a89aa3a24f0a54852590ab82ad5.socket --brick-name > > /d/backends/patchy5 -l /var/log/glusterfs/bricks/d-backends-patchy5.log > > --xlator-option *-posix.glusterd-uuid=da011de8-9103-4cf2-9f4b-03707d0019d0 > > --brick-port 49167 --xlator-option patchy-server.listen-port=49167 > > [2015-07-02 10:20:20.624297] I [MSGID: 106144] > > [glusterd-pmap.c:269:pmap_registry_remove] 0-pmap: removing brick (null) on > > port 49167 > > [2015-07-02 10:20:20.625315] E [MSGID: 106005] > > [glusterd-utils.c:4448:glusterd_brick_start] 0-management: Unable to start > > brick slave33.cloud.gluster.org:/d/backends/patchy5 > > [2015-07-02 10:20:20.625354] E [MSGID: 106074] > > [glusterd-brick-ops.c:2096:glusterd_op_add_brick] 0-glusterd: Unable to add > > bricks > > [2015-07-02 10:20:20.625368] E [MSGID: 106123] > > [glusterd-syncop.c:1416:gd_commit_op_phase] 0-management: Commit of > > operation 'Volume Add brick' failed on localhost > > > > > > Brick Log: > > > > [2015-07-02 10:20:20.608547] I [MSGID: 100030] [glusterfsd.c:2296:main] > > 0-/build/install/sbin/glusterfsd: Started running > > /build/install/sbin/glusterfsd version 3.8dev (args: > > /build/install/sbin/glusterfsd -s slave33.cloud.gluster.org --volfile-id > > patchy.slave33.cloud.gluster.org.d-backends-patchy5 -p > > /var/lib/glusterd/vols/patchy/run/slave33.cloud.gluster.org-d-backends-patchy5.pid > > -S /var/run/gluster/ca5f5a89aa3a24f0a54852590ab82ad5.socket --brick-name > > 
/d/backends/patchy5 -l /var/log/glusterfs/bricks/d-backends-patchy5.log > > --xlator-option *-posix.glusterd-uuid=da011de8-9103-4cf2-9f4b-03707d0019d0 > > --brick-port 49167 --xlator-option patchy-server.listen-port=49167) > > [2015-07-02 10:20:20.617113] I [MSGID: 101190] > > [event-epoll.c:627:event_dispatch_epoll_worker] 0-epoll: Started thread > > with index 1 > > [2015-07-02 10:20:20.623097] I [MSGID: 101173] > > [graph.c:268:gf_add_cmdline_options] 0-patchy-server: adding option > > 'listen-port' for volume 'patchy-server' with value '49167' > > [2015-07-02 10:20:20.623135] I [MSGID: 101173] > > [graph.c:268:gf_add_cmdline_options] 0-patchy-posix: adding option > > 'glusterd-uuid' for volume 'patchy-posix' with value > > 'da011de8-9103-4cf2-9f4b-03707d0019d0' > > [2015-07-02 10:20:20.623358] I [MSGID: 115034] > > [server.c:392:_check_for_auth_option] 0-/d/backends/patchy5: skip format > > check for non-addr auth option auth.login./d/backends/patchy5.allow > > [2015-07-02 10:20:20.623374] I [MSGID: 115034] > > [server.c:392:_check_for_auth_option] 0-/d/backends/patchy5: skip format > > check for non-addr auth option > > auth.login.96bcb872-559b-4f19-84ad-a735dc6068f6.password > > [2015-07-02 10:20:20.623568] I > > [rpcsvc.c:2210:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured > > rpc.outstanding-rpc-limit with value 64 > > [2015-07-02 10:20:20.623633] W [MSGID: 101002] >
Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
I've reverted [1] which brought the change allow-insecure to be on by default. The patch seems to have issues which will be addressed and merged later. The revert can be found at [2]. [1] http://review.gluster.org/11274 [2] http://review.gluster.org/11507 Please let me know if the regressions are still failing. regards, Raghavendra. - Original Message - > From: "Atin Mukherjee" > To: "Prasanna Kalever" > Cc: "Gluster Devel" > Sent: Thursday, July 2, 2015 9:41:33 PM > Subject: Re: [Gluster-devel] spurious failures > tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t > > > > Thanks Prasanna for the patches :) > > -Atin > Sent from one plus one > On Jul 2, 2015 9:19 PM, "Prasanna Kalever" < pkale...@redhat.com > wrote: > > > > This is caused because when bind-insecure is turned on (which is the default > now), it may happen > that brick is not able to bind to port assigned by Glusterd for example > 49192-49195... > It seems to occur because the rpc_clnt connections are binding to ports in > the same range. > so brick fails to bind to a port which is already used by someone else. > > This bug already exist before http://review.gluster.org/#/c/11039/ when use > rdma, i.e. even > previously rdma binds to port >= 1024 if it cannot find a free port < 1024, > even when bind insecure was turned off (ref to commit '0e3fd04e'). > Since we don't have tests related to rdma we did not discover this issue > previously. > > http://review.gluster.org/#/c/11039/ discovers the bug we encountered, > however now the bug can be fixed by > http://review.gluster.org/#/c/11512/ by making rpc_clnt to get port numbers > from 65535 in a descending > order, as a result port clash is minimized, also it fixes issues in rdma too > > Thanks to Raghavendra Talur for help in discovering the real cause > > > Regards, > Prasanna Kalever > > > > - Original Message - > From: "Raghavendra Talur" < raghavendra.ta...@gmail.com > > To: "Krishnan Parthasarathi" < kpart...@redhat.com > > Cc: "Gluster Devel" < gluster-devel@gluster.org > > Sent: Thursday, July 2, 2015 6:45:17 PM > Subject: Re: [Gluster-devel] spurious failures > tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t > > > > On Thu, Jul 2, 2015 at 4:40 PM, Raghavendra Talur < > raghavendra.ta...@gmail.com > wrote: > > > > > > On Thu, Jul 2, 2015 at 10:52 AM, Krishnan Parthasarathi < kpart...@redhat.com > > wrote: > > > > > > > > > A port assigned by Glusterd for a brick is found to be in use already by > > > the brick. Any changes in Glusterd recently which can cause this? > > > > > > Or is it a test infra problem? > > This issue is likely to be caused by http://review.gluster.org/11039 > This patch changes the port allocation that happens for rpc_clnt based > connections. Previously, ports allocated where < 1024. With this change, > these connections, typically mount process, gluster-nfs server processes > etc could end up using ports that bricks are being assigned to. > > IIUC, the intention of the patch was to make server processes lenient to > inbound messages from ports > 1024. If we don't require to use ports > 1024 > we could leave the port allocation for rpc_clnt connections as before. > Alternately, we could reserve the range of ports starting from 49152 for > bricks > by setting net.ipv4.ip_local_reserved_ports using sysctl(8). This is specific > to Linux. > I'm not aware of how this could be done in NetBSD for instance though. > > > It seems this is exactly whats happening. 
> > I have a question, I get the following data from netstat and grep > > tcp 0 0 f6be17c0fbf5:1023 f6be17c0fbf5:24007 ESTABLISHED 31516/glusterfsd > tcp 0 0 f6be17c0fbf5:49152 f6be17c0fbf5:490 ESTABLISHED 31516/glusterfsd > unix 3 [ ] STREAM CONNECTED 988353 31516/glusterfsd > /var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket > > Here 31516 is the brick pid. > > Looking at the data, line 2 is very clear, it shows connection between brick > and glusterfs client. > unix socket on line 3 is also clear, it is the unix socket connection that > glusterd and brick process use for communication. > > I am not able to understand line 1; which part of brick process established a > tcp connection with glusterd using port 1023? > Note: this data is from a build which does not have the above mentioned > patch. > > > The patch which exposed this bug is being reverted till the underlying bug is > also fixed. > You can monitor revert patches here > master: http://review.gluster.org/11507 > 3.7 branch: http://review.gluster.org/11508 > > Please rebase your patches after the above patches are merged to ensure that > you patches pass regression. > > > > > > -- > Raghavendra Talur > > > > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel > ___ > Gluster-devel mailing list > Glu
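To make the "allocate client ports from 65535 downwards" idea concrete, here is a rough standalone C sketch of a descending bind loop. It is an assumption-based illustration of the approach described for the fix, not the actual rpc_clnt change; the function name and the 49200 floor are made up for the example.

    /* Standalone sketch: bind a client socket by walking ports downward
     * from 65535 so it stays clear of the 49152+ range that glusterd
     * hands out to bricks. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int bind_descending(int sock, int from_port, int down_to)
    {
            struct sockaddr_in addr;

            memset(&addr, 0, sizeof(addr));
            addr.sin_family = AF_INET;
            addr.sin_addr.s_addr = htonl(INADDR_ANY);

            for (int port = from_port; port >= down_to; port--) {
                    addr.sin_port = htons(port);
                    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) == 0)
                            return port;        /* found a free port */
            }
            return -1;                           /* range exhausted */
    }

    int main(void)
    {
            int sock = socket(AF_INET, SOCK_STREAM, 0);
            if (sock < 0)
                    return 1;

            int port = bind_descending(sock, 65535, 49200);
            printf("bound to port %d\n", port);
            close(sock);
            return 0;
    }

The design point is simply that walking downwards minimizes, but does not eliminate, clashes with the brick port range; the reserved-ports approach discussed later in this digest is the way to make it deterministic.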
Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t
Thanks Raghavendra Talur for root causing the issue. I've reverted the patch you pointed out. - Original Message - > From: "Raghavendra Talur" > To: "Nithya Balachandran" > Cc: "Gluster Devel" > Sent: Thursday, July 2, 2015 6:44:22 PM > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t > > > > On Thu, Jul 2, 2015 at 1:26 PM, Nithya Balachandran < nbala...@redhat.com > > wrote: > > > > > - Original Message - > > From: "Kotresh Hiremath Ravishankar" < khire...@redhat.com > > > To: "Susant Palai" < spa...@redhat.com > > > Cc: "Gluster Devel" < gluster-devel@gluster.org > > > Sent: Thursday, July 2, 2015 1:03:18 PM > > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t > > > > Comments inline. > > TOTAL CHANGELOGS: 106 > > [2015-07-02 07:01:06.883504] E > > [gf-history-changelog.c:877:gf_history_changelog] 0-gfchangelog: wrong > > result for start: 1435818 > > > > Thanks and Regards, > > Kotresh H R > > > > - Original Message - > > > From: "Susant Palai" < spa...@redhat.com > > > > To: "Sachin Pandit" < span...@redhat.com > > > > Cc: "Kotresh Hiremath Ravishankar" < khire...@redhat.com >, "Gluster > > > Devel" > > > < gluster-devel@gluster.org > > > > Sent: Thursday, July 2, 2015 12:35:08 PM > > > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t > > > > > > Comments inline. > > > > > > - Original Message - > > > > From: "Sachin Pandit" < span...@redhat.com > > > > > To: "Kotresh Hiremath Ravishankar" < khire...@redhat.com > > > > > Cc: "Gluster Devel" < gluster-devel@gluster.org > > > > > Sent: Thursday, July 2, 2015 12:21:44 PM > > > > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t > > > > > > > > - Original Message - > > > > > From: "Vijaikumar M" < vmall...@redhat.com > > > > > > To: "Kotresh Hiremath Ravishankar" < khire...@redhat.com >, "Gluster > > > > > Devel" > > > > > < gluster-devel@gluster.org > > > > > > Cc: "Sachin Pandit" < span...@redhat.com > > > > > > Sent: Thursday, July 2, 2015 12:01:03 PM > > > > > Subject: Re: Regression Failure: ./tests/basic/quota.t > > > > > > > > > > We look into this issue > > > > > > > > > > Thanks, > > > > > Vijay > > > > > > > > > > On Thursday 02 July 2015 11:46 AM, Kotresh Hiremath Ravishankar > > > > > wrote: > > > > > > Hi, > > > > > > > > > > > > I see quota.t regression failure for the following. The changes are > > > > > > related > > > > > > to > > > > > > example programs in libgfchangelog. > > > > > > > > > > > > http://build.gluster.org/job/rackspace-regression-2GB-triggered/11785/consoleFull > > > > > > > > > > > > Could someone from quota team, take a look at it. > > > > > > > > Hi, > > > > > > > > I had a quick look at this. It looks like the following test case > > > > failed > > > > > > > > TEST $CLI volume add-brick $V0 $H0:$B0/brick{3,4} > > > > EXPECT_WITHIN $REBALANCE_TIMEOUT "0" rebalance_completed > > > > > > > > Looks like the same "port in use" issue.
From the d-backends-brick3.log: > > > [2015-07-01 09:27:17.821430] E [socket.c:818:__socket_server_bind] > 0-tcp.patchy-server: binding to failed: Address already in use > [2015-07-01 09:27:17.821441] E [socket.c:821:__socket_server_bind] > 0-tcp.patchy-server: Port is already in use > [2015-07-01 09:27:17.821452] W [rpcsvc.c:1599:rpcsvc_transport_create] > 0-rpc-service: listening on transport failed > [2015-07-01 09:27:17.821462] W [MSGID: 115045] [server.c:996:init] > 0-patchy-server: creation of listener failed > [2015-07-01 09:27:17.821475] E [MSGID: 101019] [xlator.c:423:xlator_init] > 0-patchy-server: Initialization of volume 'patchy-server' failed, review > your volfile again > [2015-07-01 09:27:17.821485] E [MSGID: 101066] > [graph.c:323:glusterfs_graph_init] 0-patchy-server: initializing translator > failed > [2015-07-01 09:27:17.821495] E [MSGID: 101176] > [graph.c:669:glusterfs_graph_activate] 0-graph: init failed > [2015-07-01 09:27:17.821891] W [glusterfsd.c:1214:cleanup_and_exit] (--> 0-: > received signum (0), shutting down > > > The patch which exposed this bug is being reverted till the underlying bug is > also fixed. > You can monitor revert patches here > master: http://review.gluster.org/11507 > 3.7 branch: http://review.gluster.org/11508 > > Please rebase your patches after the above patches are merged to ensure that > you patches pass regression. > > > > > > > > > > > > > I looked at the logs too, and found out the following errors > > > > > > > > patchy-rebalance.log:[2015-07-01 09:27:23.040756] E [MSGID: 109026] > > > > [dht-rebalance.c:2689:gf_defrag_start_crawl] 0-patchy-dht: fix layout > > > > on > > > > / > > > > failed > > > > build-install-etc-glusterfs-glusterd.vol.log:[2015-07-01 > > > > 09:27:23.040998] > > > > E > > > > [MSGID: 106224] > > > > [glusterd-rebalance.c:960:glusterd_defrag_event_notify_handle] > > > > 0-management: > > > > Failed to update status > > > > StartMigrationDuringRebalanceT
Re: [Gluster-devel] Gluster and GCC 5.1
> Or perhaps we could just get everyone to stop using 'inline' I agree that it would be a good thing to reduce/modify our use of 'inline' significantly. Any advantage gained from avoiding normal function-call entry/exit has to be weighed against cache pollution from having the same code repeated over and over each place the function is invoked. Careful use of 'extern inline' can be good for performance and/or readability once in a great while, but IMO we should avoid 'inline' except in cases where the benefits are *proven*. On a similar note, can we please please please get people to stop abusing macros so much? I get that they're sometimes useful to get around C's lack of generics or other features, but many of our long complicated macros have no such justification. These pollute the cache just like 'inline' does, plus they make code harder to debug or edit. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
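To make the macro-versus-inline point concrete, here is a tiny self-contained example (mine, not taken from the GlusterFS tree): a function-like macro silently evaluates its argument twice, while a static inline function keeps the call-site convenience yet behaves like a normal, debuggable function. It is only an illustration of the general trade-off, not of any specific GlusterFS macro.

    /* Illustration only: double evaluation in a macro vs. a static inline. */
    #include <stdio.h>

    #define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

    static inline int max_inline(int a, int b)
    {
            return a > b ? a : b;
    }

    int main(void)
    {
            int i = 0, j = 0;

            int m1 = MAX_MACRO(i++, -1);  /* i++ evaluated twice: i ends at 2 */
            int m2 = max_inline(j++, -1); /* j++ evaluated once: j ends at 1  */

            printf("macro: i=%d m1=%d, inline: j=%d m2=%d\n", i, m1, j, m2);
            return 0;
    }

The GCC 5 angle in this thread is that a plain 'inline' (neither 'static' nor 'extern') gets C99 inline semantics under the newer default standard, which can leave undefined symbols at link time; making the definitions 'static inline' or 'extern inline', as the patch above does, sidesteps that.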
Re: [Gluster-devel] Gluster and GCC 5.1
Agree with Jeff. But now the question is: should we have this patch go through, or leave it to float in time and space? :) http://review.gluster.org/#/c/11214/ The patch just makes the existing inline functions static or extern appropriately without causing any harm to the existing code but removes the risk of undefined symbols for normal inline functions in gcc 5 and above! - Original Message - From: "Jeff Darcy" To: "Kaleb S. KEITHLEY" Cc: "gluster-infra" , "Gluster Devel" Sent: Friday, July 3, 2015 6:05:14 AM Subject: Re: [Gluster-devel] Gluster and GCC 5.1 > Or perhaps we could just get everyone to stop using 'inline' I agree that it would be a good thing to reduce/modify our use of 'inline' significantly. Any advantage gained from avoiding normal function-call entry/exit has to be weighed against cache pollution from having the same code repeated over and over each place the function is invoked. Careful use of 'extern inline' can be good for performance and/or readability once in a great while, but IMO we should avoid 'inline' except in cases where the benefits are *proven*. On a similar note, can we please please please get people to stop abusing macros so much? I get that they're sometimes useful to get around C's lack of generics or other features, but many of our long complicated macros have no such justification. These pollute the cache just like 'inline' does, plus they make code harder to debug or edit. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Gluster and GCC 5.1
On Jul 2, 2015, at 8:35 PM, Jeff Darcy wrote: >> Or perhaps we could just get everyone to stop using 'inline' > > I agree that it would be a good thing to reduce/modify our use of > 'inline' significantly. Any advantage gained from avoiding normal > function-all entry/exit has to be weighed against cache pollution from > having the same code repeated over and over each place the function is > invoked. Careful use of 'extern inline' can be good for performance > and/or readability once in a great while, but IMO we should avoid > 'inline' except in cases where the benefits are *proven*. > > On a similar note, can we please please please get people to stop > abusing macros so much? I get that they're sometimes useful to get > around C's lack of generics or other features, but many of our long > complicated macros have no such justification. These pollute the cache > just like 'inline' does, plus they make code harder to debug or edit. And in the past, if not now, are contributing factors to small file performance issues. -peter > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
> It seems this is exactly whats happening. > > I have a question, I get the following data from netstat and grep > > tcp0 0 f6be17c0fbf5:1023 f6be17c0fbf5:24007 > ESTABLISHED 31516/glusterfsd > tcp0 0 f6be17c0fbf5:49152 f6be17c0fbf5:490 > ESTABLISHED 31516/glusterfsd > unix 3 [ ] STREAM CONNECTED 988353 31516/glusterfsd > /var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket > > Here 31516 is the brick pid. > > Looking at the data, line 2 is very clear, it shows connection between > brick and glusterfs client. > unix socket on line 3 is also clear, it is the unix socket connection that > glusterd and brick process use for communication. > > I am not able to understand line 1; which part of brick process established > a tcp connection with glusterd using port 1023? This is the rpc connection that any glusterfs(d) process opens to glusterd to fetch the volfile on receiving a notification from glusterd. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
- Original Message - > > This is caused because when bind-insecure is turned on (which is the default > now), it may happen > that brick is not able to bind to port assigned by Glusterd for example > 49192-49195... > It seems to occur because the rpc_clnt connections are binding to ports in > the same range. > so brick fails to bind to a port which is already used by someone else. > > This bug already exist before http://review.gluster.org/#/c/11039/ when use > rdma, i.e. even > previously rdma binds to port >= 1024 if it cannot find a free port < 1024, > even when bind insecure was turned off (ref to commit '0e3fd04e'). > Since we don't have tests related to rdma we did not discover this issue > previously. > > http://review.gluster.org/#/c/11039/ discovers the bug we encountered, > however now the bug can be fixed by > http://review.gluster.org/#/c/11512/ by making rpc_clnt to get port numbers > from 65535 in a descending > order, as a result port clash is minimized, also it fixes issues in rdma too This approach could still surprise the storage-admin when glusterfs(d) processes bind to ports in the range where brick ports are being assigned. We should make this predictable by reserving brick ports using net.ipv4.ip_local_reserved_ports. Initially reserve 50 ports starting at 49152. Subsequently, we could reserve ports on demand, say 50 more ports, when we exhaust the previously reserved range. net.ipv4.ip_local_reserved_ports doesn't interfere with explicit port allocation behaviour, i.e. if the socket uses a port other than zero. With this option we don't have to manage port assignment at a process level. Thoughts? ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
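For completeness, a rough standalone C sketch of what reserving the brick-port range could look like on Linux follows. It is only an illustration of the proposal above, under the assumptions that the process runs with root privileges and on a kernel new enough to accept "start-end" ranges in this tunable (older kernels want individual comma-separated ports); the helper name and the 49152/50 values mirror the suggestion in the mail. Note that writing the file this way overwrites any existing reservation list rather than appending to it, so a real implementation would read, merge and write back.

    /* Standalone sketch: reserve a brick-port range so that ephemeral and
     * rpc_clnt sockets never get handed ports from it. */
    #include <stdio.h>

    static int reserve_ports(int start, int count)
    {
            FILE *fp = fopen("/proc/sys/net/ipv4/ip_local_reserved_ports", "w");
            if (!fp)
                    return -1;

            /* Overwrites the current reservation list with one range. */
            int rc = fprintf(fp, "%d-%d\n", start, start + count - 1);
            fclose(fp);
            return rc < 0 ? -1 : 0;
    }

    int main(void)
    {
            /* Reserve 50 ports starting at 49152, as suggested above. */
            if (reserve_ports(49152, 50) != 0)
                    perror("reserve_ports");
            return 0;
    }

Reserved ports only affect automatic (port 0) allocation, so bricks that bind explicitly to 49152+ keep working while other processes can no longer land on those ports by accident, which is the predictability being asked for here.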
Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
On 07/03/2015 11:58 AM, Krishnan Parthasarathi wrote: > > > - Original Message - >> >> This is caused because when bind-insecure is turned on (which is the default >> now), it may happen >> that brick is not able to bind to port assigned by Glusterd for example >> 49192-49195... >> It seems to occur because the rpc_clnt connections are binding to ports in >> the same range. >> so brick fails to bind to a port which is already used by someone else. >> >> This bug already exist before http://review.gluster.org/#/c/11039/ when use >> rdma, i.e. even >> previously rdma binds to port >= 1024 if it cannot find a free port < 1024, >> even when bind insecure was turned off (ref to commit '0e3fd04e'). >> Since we don't have tests related to rdma we did not discover this issue >> previously. >> >> http://review.gluster.org/#/c/11039/ discovers the bug we encountered, >> however now the bug can be fixed by >> http://review.gluster.org/#/c/11512/ by making rpc_clnt to get port numbers >> from 65535 in a descending >> order, as a result port clash is minimized, also it fixes issues in rdma too > > This approach could still surprise the storage-admin when glusterfs(d) > processes > bind to ports in the range where brick ports are being assigned. We should > make this > predictable by reserving brick ports setting net.ipv4.ip_local_reserved_ports. > Initially reserve 50 ports starting at 49152. Subsequently, we could reserve > ports on demand, > say 50 more ports, when we exhaust previously reserved range. > net.ipv4.ip_local_reserved_ports > doesn't interfere with explicit port allocation behaviour. i.e if the socket > uses > a port other than zero. With this option we don't have to manage ports > assignment at a process > level. Thoughts? If the reallocation can be done on demand, I do think this is a better approach to tackle this problem. > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel > -- ~Atin ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel