Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t

2015-07-02 Thread Susant Palai
Comments inline.

- Original Message -
> From: "Sachin Pandit" 
> To: "Kotresh Hiremath Ravishankar" 
> Cc: "Gluster Devel" 
> Sent: Thursday, July 2, 2015 12:21:44 PM
> Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t
> 
> - Original Message -
> > From: "Vijaikumar M" 
> > To: "Kotresh Hiremath Ravishankar" , "Gluster Devel"
> > 
> > Cc: "Sachin Pandit" 
> > Sent: Thursday, July 2, 2015 12:01:03 PM
> > Subject: Re: Regression Failure: ./tests/basic/quota.t
> > 
> > We look into this issue
> > 
> > Thanks,
> > Vijay
> > 
> > On Thursday 02 July 2015 11:46 AM, Kotresh Hiremath Ravishankar wrote:
> > > Hi,
> > >
> > > I see quota.t regression failure for the following. The changes are
> > > related
> > > to
> > > example programs in libgfchangelog.
> > >
> > > http://build.gluster.org/job/rackspace-regression-2GB-triggered/11785/consoleFull
> > >
> > > Could someone from quota team, take a look at it.
> 
> Hi,
> 
> I had a quick look at this. It looks like the following test case failed
> 
> TEST $CLI volume add-brick $V0 $H0:$B0/brick{3,4}
> EXPECT_WITHIN $REBALANCE_TIMEOUT "0" rebalance_completed
> 
> 
> I looked at the logs too, and found out the following errors
> 
> patchy-rebalance.log:[2015-07-01 09:27:23.040756] E [MSGID: 109026]
> [dht-rebalance.c:2689:gf_defrag_start_crawl] 0-patchy-dht: fix layout on /
> failed
> build-install-etc-glusterfs-glusterd.vol.log:[2015-07-01 09:27:23.040998] E
> [MSGID: 106224]
> [glusterd-rebalance.c:960:glusterd_defrag_event_notify_handle] 0-management:
> Failed to update status
> StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 14:34:47.557887]
> E [rpc-clnt.c:362:saved_frames_unwind] (-->
> /build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x240)[0x7fc882d04d5a]
> (-->
> /build/install/lib/libgfrpc.so.0(saved_frames_unwind+0x212)[0x7fc882ace086]
> (-->
> /build/install/lib/libgfrpc.so.0(saved_frames_destroy+0x1f)[0x7fc882ace183]
> (-->
> /build/install/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x11e)[0x7fc882ace615]
> (--> /build/install/lib/libgfrpc.so.0(rpc_clnt_notify+0x147)[0x7fc882acf00f]
> ) 0-StartMigrationDuringRebalanceTest-client-0: forced unwinding frame
> type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-06-19 14:34:47.554862
> (xid=0xc)
> StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 14:34:47.561191]
> E [MSGID: 114031] [client-rpc-fops.c:1623:client3_3_inodelk_cbk]
> 0-StartMigrationDuringRebalanceTest-client-0: remote operation failed:
> Transport endpoint is not connected [Transport endpoint is not connected]
> StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 14:34:47.561417]
> E [socket.c:2332:socket_connect_finish]
> 0-StartMigrationDuringRebalanceTest-client-0: connection to
> 23.253.62.104:24007 failed (Connection refused)
> StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19 14:34:47.561707]
> E [dht-common.c:2643:dht_find_local_subvol_cbk]
> 0-StartMigrationDuringRebalanceTest-dht: getxattr err (Transport endpoint is
> not connected) for dir
> 
Seems like a network partition. Rebalance fails if it receives ENOTCONN
on its child.
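
A quick way to triage this kind of failure (volume name and log file taken
from the errors above; the commands are only illustrative):

$ gluster volume rebalance patchy status
$ grep ' E \[' /var/log/glusterfs/patchy-rebalance.log | head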

> 
> Any help regarding this or more information on this would be much
> appreciated.
> 
> Thanks,
> Sachin Pandit.
> 
> 
> > >
> > > Thanks and Regards,
> > > Kotresh H R
> > >
> > 
> > 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t

2015-07-02 Thread Kotresh Hiremath Ravishankar
Comments inline.

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "Susant Palai" 
> To: "Sachin Pandit" 
> Cc: "Kotresh Hiremath Ravishankar" , "Gluster Devel" 
> 
> Sent: Thursday, July 2, 2015 12:35:08 PM
> Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t
> 
> Comments inline.
> 
> - Original Message -
> > From: "Sachin Pandit" 
> > To: "Kotresh Hiremath Ravishankar" 
> > Cc: "Gluster Devel" 
> > Sent: Thursday, July 2, 2015 12:21:44 PM
> > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t
> > 
> > - Original Message -
> > > From: "Vijaikumar M" 
> > > To: "Kotresh Hiremath Ravishankar" , "Gluster Devel"
> > > 
> > > Cc: "Sachin Pandit" 
TOTAL CHANGELOGS: 106
[2015-07-02 07:01:06.883504] E
[gf-history-changelog.c:877:gf_history_changelog] 0-gfchangelog: wrong result
for start: 1435818
> > > Sent: Thursday, July 2, 2015 12:01:03 PM
> > > Subject: Re: Regression Failure: ./tests/basic/quota.t
> > > 
> > > We look into this issue
> > > 
> > > Thanks,
> > > Vijay
> > > 
> > > On Thursday 02 July 2015 11:46 AM, Kotresh Hiremath Ravishankar wrote:
> > > > Hi,
> > > >
> > > > I see quota.t regression failure for the following. The changes are
> > > > related
> > > > to
> > > > example programs in libgfchangelog.
> > > >
> > > > http://build.gluster.org/job/rackspace-regression-2GB-triggered/11785/consoleFull
> > > >
> > > > Could someone from quota team, take a look at it.
> > 
> > Hi,
> > 
> > I had a quick look at this. It looks like the following test case failed
> > 
> > TEST $CLI volume add-brick $V0 $H0:$B0/brick{3,4}
> > EXPECT_WITHIN $REBALANCE_TIMEOUT "0" rebalance_completed
> > 
> > 
> > I looked at the logs too, and found out the following errors
> > 
> > patchy-rebalance.log:[2015-07-01 09:27:23.040756] E [MSGID: 109026]
> > [dht-rebalance.c:2689:gf_defrag_start_crawl] 0-patchy-dht: fix layout on /
> > failed
> > build-install-etc-glusterfs-glusterd.vol.log:[2015-07-01 09:27:23.040998] E
> > [MSGID: 106224]
> > [glusterd-rebalance.c:960:glusterd_defrag_event_notify_handle]
> > 0-management:
> > Failed to update status
> > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19
> > 14:34:47.557887]
> > E [rpc-clnt.c:362:saved_frames_unwind] (-->
> > /build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x240)[0x7fc882d04d5a]
> > (-->
> > /build/install/lib/libgfrpc.so.0(saved_frames_unwind+0x212)[0x7fc882ace086]
> > (-->
> > /build/install/lib/libgfrpc.so.0(saved_frames_destroy+0x1f)[0x7fc882ace183]
> > (-->
> > /build/install/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x11e)[0x7fc882ace615]
> > (-->
> > /build/install/lib/libgfrpc.so.0(rpc_clnt_notify+0x147)[0x7fc882acf00f]
> > ) 0-StartMigrationDuringRebalanceTest-client-0: forced unwinding frame
> > type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-06-19 14:34:47.554862
> > (xid=0xc)
> > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19
> > 14:34:47.561191]
> > E [MSGID: 114031] [client-rpc-fops.c:1623:client3_3_inodelk_cbk]
> > 0-StartMigrationDuringRebalanceTest-client-0: remote operation failed:
> > Transport endpoint is not connected [Transport endpoint is not connected]
> > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19
> > 14:34:47.561417]
> > E [socket.c:2332:socket_connect_finish]
> > 0-StartMigrationDuringRebalanceTest-client-0: connection to
> > 23.253.62.104:24007 failed (Connection refused)
> > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19
> > 14:34:47.561707]
> > E [dht-common.c:2643:dht_find_local_subvol_cbk]
> > 0-StartMigrationDuringRebalanceTest-dht: getxattr err (Transport endpoint
> > is
> > not connected) for dir
> > 
> Seems like a network partition. Rebalance fails if it receives ENOTCONN
> on its child.

Is this intended to happen on regression machines?
> 
> > 
> > Any help regarding this or more information on this would be much
> > appreciated.
> > 
> > Thanks,
> > Sachin Pandit.
> > 
> > 
> > > >
> > > > Thanks and Regards,
> > > > Kotresh H R
> > > >
> > > 
> > > 
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> > 
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Unable to send patches to review.gluster.org

2015-07-02 Thread Anuradha Talur
Working fine for me now. In case someone hasn't checked, try using it.
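
For anyone who wants to verify access before pushing again, the usual check is
against the Gerrit SSH port (29418 is the standard Gerrit SSH port; USERNAME is
a placeholder for your Gerrit username, and the check itself is only
illustrative):

$ ssh -p 29418 USERNAME@review.gluster.org gerrit version

If that prints the Gerrit version, the key and account are fine.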

- Original Message -
> From: "Anoop C S" 
> To: gluster-devel@gluster.org
> Cc: "Anuradha Talur" 
> Sent: Thursday, July 2, 2015 10:41:35 AM
> Subject: Re: [Gluster-devel] Unable to send patches to review.gluster.org
> 
> Same here. git pull from r.g.o failed with the following error.
> 
> Permission denied (publickey).
> fatal: Could not read from remote repository.
> 
> Please make sure you have the correct access rights
> and the repository exists.
> 
> --Anoop C S.
> 
> On 07/02/2015 09:53 AM, Anuradha Talur wrote:
> > Hi,
> > 
> > I'm unable to send patches to r.g.o, also not able to login. I'm
> > getting the following errors respectively: 1) Permission denied
> > (publickey). fatal: Could not read from remote repository.
> > 
> > Please make sure you have the correct access rights and the
> > repository exists.
> > 
> > 2) Internal server error or forbidden access.
> > 
> > Is anyone else facing the same issue?
> > 
> 

-- 
Thanks,
Anuradha.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t

2015-07-02 Thread Nithya Balachandran


- Original Message -
> From: "Kotresh Hiremath Ravishankar" 
> To: "Susant Palai" 
> Cc: "Gluster Devel" 
> Sent: Thursday, July 2, 2015 1:03:18 PM
> Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t
> 
> Comments inline.
> 
> Thanks and Regards,
> Kotresh H R
> 
> - Original Message -
> > From: "Susant Palai" 
> > To: "Sachin Pandit" 
> > Cc: "Kotresh Hiremath Ravishankar" , "Gluster Devel"
> > 
> > Sent: Thursday, July 2, 2015 12:35:08 PM
> > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t
> > 
> > Comments inline.
> > 
> > - Original Message -
> > > From: "Sachin Pandit" 
> > > To: "Kotresh Hiremath Ravishankar" 
> > > Cc: "Gluster Devel" 
> > > Sent: Thursday, July 2, 2015 12:21:44 PM
> > > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t
> > > 
> > > - Original Message -
> > > > From: "Vijaikumar M" 
> > > > To: "Kotresh Hiremath Ravishankar" , "Gluster
> > > > Devel"
> > > > 
> > > > Cc: "Sachin Pandit" 
> TOTAL CHANGELOGS: 106
> [2015-07-02 07:01:06.883504] E
> [gf-history-changelog.c:877:gf_history_changelog] 0-gfchangelog: wrong
> result for start: 1435818
> > > > Sent: Thursday, July 2, 2015 12:01:03 PM
> > > > Subject: Re: Regression Failure: ./tests/basic/quota.t
> > > > 
> > > > We look into this issue
> > > > 
> > > > Thanks,
> > > > Vijay
> > > > 
> > > > On Thursday 02 July 2015 11:46 AM, Kotresh Hiremath Ravishankar wrote:
> > > > > Hi,
> > > > >
> > > > > I see quota.t regression failure for the following. The changes are
> > > > > related
> > > > > to
> > > > > example programs in libgfchangelog.
> > > > >
> > > > > http://build.gluster.org/job/rackspace-regression-2GB-triggered/11785/consoleFull
> > > > >
> > > > > Could someone from quota team, take a look at it.
> > > 
> > > Hi,
> > > 
> > > I had a quick look at this. It looks like the following test case failed
> > > 
> > > TEST $CLI volume add-brick $V0 $H0:$B0/brick{3,4}
> > > EXPECT_WITHIN $REBALANCE_TIMEOUT "0" rebalance_completed
> > > 



Looks like the same "port in use" issue. From the d-backends-brick3.log:


[2015-07-01 09:27:17.821430] E [socket.c:818:__socket_server_bind] 
0-tcp.patchy-server: binding to  failed: Address already in use
[2015-07-01 09:27:17.821441] E [socket.c:821:__socket_server_bind] 
0-tcp.patchy-server: Port is already in use
[2015-07-01 09:27:17.821452] W [rpcsvc.c:1599:rpcsvc_transport_create] 
0-rpc-service: listening on transport failed
[2015-07-01 09:27:17.821462] W [MSGID: 115045] [server.c:996:init] 
0-patchy-server: creation of listener failed
[2015-07-01 09:27:17.821475] E [MSGID: 101019] [xlator.c:423:xlator_init] 
0-patchy-server: Initialization of volume 'patchy-server' failed, review your 
volfile again
[2015-07-01 09:27:17.821485] E [MSGID: 101066] 
[graph.c:323:glusterfs_graph_init] 0-patchy-server: initializing translator 
failed
[2015-07-01 09:27:17.821495] E [MSGID: 101176] 
[graph.c:669:glusterfs_graph_activate] 0-graph: init failed
[2015-07-01 09:27:17.821891] W [glusterfsd.c:1214:cleanup_and_exit] (--> 0-: 
received signum (0), shutting down
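
If this reproduces, it is worth checking on the slave which process already
holds the brick port before glusterd hands it out, along these lines (the
4915x range is the usual brick port range, used here only for illustration):

$ netstat -tanp | grep ':491'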


> > > 
> > > I looked at the logs too, and found out the following errors
> > > 
> > > patchy-rebalance.log:[2015-07-01 09:27:23.040756] E [MSGID: 109026]
> > > [dht-rebalance.c:2689:gf_defrag_start_crawl] 0-patchy-dht: fix layout on
> > > /
> > > failed
> > > build-install-etc-glusterfs-glusterd.vol.log:[2015-07-01 09:27:23.040998]
> > > E
> > > [MSGID: 106224]
> > > [glusterd-rebalance.c:960:glusterd_defrag_event_notify_handle]
> > > 0-management:
> > > Failed to update status
> > > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19
> > > 14:34:47.557887]
> > > E [rpc-clnt.c:362:saved_frames_unwind] (-->
> > > /build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x240)[0x7fc882d04d5a]
> > > (-->
> > > /build/install/lib/libgfrpc.so.0(saved_frames_unwind+0x212)[0x7fc882ace086]
> > > (-->
> > > /build/install/lib/libgfrpc.so.0(saved_frames_destroy+0x1f)[0x7fc882ace183]
> > > (-->
> > > /build/install/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x11e)[0x7fc882ace615]
> > > (-->
> > > /build/install/lib/libgfrpc.so.0(rpc_clnt_notify+0x147)[0x7fc882acf00f]
> > > ) 0-StartMigrationDuringRebalanceTest-client-0: forced unwinding
> > > frame
> > > type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-06-19 14:34:47.554862
> > > (xid=0xc)
> > > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19
> > > 14:34:47.561191]
> > > E [MSGID: 114031] [client-rpc-fops.c:1623:client3_3_inodelk_cbk]
> > > 0-StartMigrationDuringRebalanceTest-client-0: remote operation failed:
> > > Transport endpoint is not connected [Transport endpoint is not connected]
> > > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19
> > > 14:34:47.561417]
> > > E [socket.c:2332:socket_connect_finish]
> > > 0-StartMigrationDuringRebalanceTest-client-0: connection to
> > > 23.253.62.104:24007 failed (Connection refused)
> > > StartMigrationDuringRebalanceTest-

[Gluster-devel] glusterfs-3.6.4beta2 released

2015-07-02 Thread Raghavendra Bhat

Hi,

glusterfs-3.6.4beta2 has been released and the packages for
RHEL/Fedora/CentOS can be found here:

http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.6.4beta2/

Requesting people running 3.6.x to please try it out and let us know if 
there are any issues.


This release is expected to fix the bugs listed below, reported since
3.6.4beta1 was made available. Thanks to all who submitted patches and
reviewed the changes.


1230242 - `ls' on a directory which has files with mismatching gfid's 
does not list anything

1230259 -  Honour afr self-heal volume set options from clients
1122290 - Issues reported by Cppcheck static analysis tool
1227670 - wait for sometime before accessing the activated snapshot
1225745 - [AFR-V2] - afr_final_errno() should treat op_ret > 0 also as 
success

1223891 - readdirp return 64bits inodes even if enable-ino32 is set
1206429 - Maintaining local transaction peer list in op-sm framework
1217419 - DHT:Quota:- brick process crashed after deleting .glusterfs 
from backend

1225072 - OpenSSL multi-threading changes break build in RHEL5 (3.6.4beta1)
1215419 - Autogenerated files delivered in tarball
1224624 - cli: Excessive logging
1217423 - glusterfsd crashed after directory was removed from the mount 
point, while self-heal and rebalance  were running on 
the volume



Regards,
Raghavendra Bhat
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-02 Thread Raghavendra Talur
On Thu, Jul 2, 2015 at 10:52 AM, Krishnan Parthasarathi  wrote:

>
> > >
> > > A port assigned by Glusterd for a brick is found to be in use already
> by
> > > the brick. Any changes in Glusterd recently which can cause this?
> > >
> > > Or is it a test infra problem?
>
> This issue is likely to be caused by http://review.gluster.org/11039
> This patch changes the port allocation that happens for rpc_clnt based
> connections. Previously, ports allocated were < 1024. With this change,
> these connections, typically mount process, gluster-nfs server processes
> etc could end up using ports that bricks are being assigned to.
>
> IIUC, the intention of the patch was to make server processes lenient to
> inbound messages from ports > 1024. If we don't require to use ports > 1024
> we could leave the port allocation for rpc_clnt connections as before.
> Alternately, we could reserve the range of ports starting from 49152 for
> bricks
> by setting net.ipv4.ip_local_reserved_ports using sysctl(8). This is
> specific to Linux.
> I'm not aware of how this could be done in NetBSD for instance though.
>
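
For reference, the reservation suggested above would look roughly like this on
Linux (a sketch only; the exact range to reserve depends on how many bricks
the node hosts):

$ sysctl -w net.ipv4.ip_local_reserved_ports=49152-49664
$ echo 'net.ipv4.ip_local_reserved_ports = 49152-49664' >> /etc/sysctl.conf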


It seems this is exactly what's happening.

I have a question, I get the following data from netstat and grep

tcp0  0 f6be17c0fbf5:1023   f6be17c0fbf5:24007
 ESTABLISHED 31516/glusterfsd
tcp0  0 f6be17c0fbf5:49152  f6be17c0fbf5:490
 ESTABLISHED 31516/glusterfsd
unix  3  [ ] STREAM CONNECTED 988353   31516/glusterfsd
/var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket

Here 31516 is the brick pid.

Looking at the data, line 2 is very clear, it shows connection between
brick and glusterfs client.
unix socket on line 3 is also clear, it is the unix socket connection that
glusterd and brick process use for communication.

I am not able to understand line 1: which part of the brick process established
a TCP connection with glusterd using port 1023?
Note: this data is from a build which does not have the above mentioned
patch.
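
One way to dig into that from the brick side is to list the TCP sockets owned
by the brick pid and match the local port (pid reused from the netstat output
above; the exact command is only a suggestion):

$ lsof -a -nP -p 31516 -i TCP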

-- 
*Raghavendra Talur *
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t

2015-07-02 Thread Pranith Kumar Karampuri

hi Joseph,
   Could you take a look at 
http://build.gluster.org/job/rackspace-regression-2GB-triggered/11842/consoleFull


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t

2015-07-02 Thread Dan Lambright
I'll check on this.

- Original Message -
> From: "Pranith Kumar Karampuri" 
> To: "Gluster Devel" , "Joseph Fernandes" 
> 
> Sent: Thursday, July 2, 2015 5:40:34 AM
> Subject: [Gluster-devel] Failure in   
> tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t
> 
> hi Joseph,
> Could you take a look at
> http://build.gluster.org/job/rackspace-regression-2GB-triggered/11842/consoleFull
> 
> Pranith
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t

2015-07-02 Thread Pranith Kumar Karampuri

Thanks Dan!.

Pranith

On 07/02/2015 06:14 PM, Dan Lambright wrote:

I'll check on this.

- Original Message -

From: "Pranith Kumar Karampuri" 
To: "Gluster Devel" , "Joseph Fernandes" 

Sent: Thursday, July 2, 2015 5:40:34 AM
Subject: [Gluster-devel] Failure in 
tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t

hi Joseph,
 Could you take a look at
http://build.gluster.org/job/rackspace-regression-2GB-triggered/11842/consoleFull

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t

2015-07-02 Thread Raghavendra Talur
On Thu, Jul 2, 2015 at 1:26 PM, Nithya Balachandran 
wrote:

>
>
> - Original Message -
> > From: "Kotresh Hiremath Ravishankar" 
> > To: "Susant Palai" 
> > Cc: "Gluster Devel" 
> > Sent: Thursday, July 2, 2015 1:03:18 PM
> > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t
> >
> > Comments inline.
> >
> > Thanks and Regards,
> > Kotresh H R
> >
> > - Original Message -
> > > From: "Susant Palai" 
> > > To: "Sachin Pandit" 
> > > Cc: "Kotresh Hiremath Ravishankar" , "Gluster
> Devel"
> > > 
> > > Sent: Thursday, July 2, 2015 12:35:08 PM
> > > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t
> > >
> > > Comments inline.
> > >
> > > - Original Message -
> > > > From: "Sachin Pandit" 
> > > > To: "Kotresh Hiremath Ravishankar" 
> > > > Cc: "Gluster Devel" 
> > > > Sent: Thursday, July 2, 2015 12:21:44 PM
> > > > Subject: Re: [Gluster-devel] Regression Failure:
> ./tests/basic/quota.t
> > > >
> > > > - Original Message -
> > > > > From: "Vijaikumar M" 
> > > > > To: "Kotresh Hiremath Ravishankar" , "Gluster
> > > > > Devel"
> > > > > 
> > > > > Cc: "Sachin Pandit" 
> > TOTAL CHANGELOGS: 106
> > [2015-07-02 07:01:06.883504] E
> > [gf-history-changelog.c:877:gf_history_changelog] 0-gfchangelog: wrong
> > result for start: 1435818
> > > > > Sent: Thursday, July 2, 2015 12:01:03 PM
> > > > > Subject: Re: Regression Failure: ./tests/basic/quota.t
> > > > >
> > > > > We look into this issue
> > > > >
> > > > > Thanks,
> > > > > Vijay
> > > > >
> > > > > On Thursday 02 July 2015 11:46 AM, Kotresh Hiremath Ravishankar
> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I see quota.t regression failure for the following. The changes
> are
> > > > > > related
> > > > > > to
> > > > > > example programs in libgfchangelog.
> > > > > >
> > > > > >
> http://build.gluster.org/job/rackspace-regression-2GB-triggered/11785/consoleFull
> > > > > >
> > > > > > Could someone from quota team, take a look at it.
> > > >
> > > > Hi,
> > > >
> > > > I had a quick look at this. It looks like the following test case
> failed
> > > >
> > > > TEST $CLI volume add-brick $V0 $H0:$B0/brick{3,4}
> > > > EXPECT_WITHIN $REBALANCE_TIMEOUT "0" rebalance_completed
> > > >
>
>
>
> Looks like the same "port in use" issue. From the d-backends-brick3.log:
>
>
> [2015-07-01 09:27:17.821430] E [socket.c:818:__socket_server_bind]
> 0-tcp.patchy-server: binding to  failed: Address already in use
> [2015-07-01 09:27:17.821441] E [socket.c:821:__socket_server_bind]
> 0-tcp.patchy-server: Port is already in use
> [2015-07-01 09:27:17.821452] W [rpcsvc.c:1599:rpcsvc_transport_create]
> 0-rpc-service: listening on transport failed
> [2015-07-01 09:27:17.821462] W [MSGID: 115045] [server.c:996:init]
> 0-patchy-server: creation of listener failed
> [2015-07-01 09:27:17.821475] E [MSGID: 101019] [xlator.c:423:xlator_init]
> 0-patchy-server: Initialization of volume 'patchy-server' failed, review
> your volfile again
> [2015-07-01 09:27:17.821485] E [MSGID: 101066]
> [graph.c:323:glusterfs_graph_init] 0-patchy-server: initializing translator
> failed
> [2015-07-01 09:27:17.821495] E [MSGID: 101176]
> [graph.c:669:glusterfs_graph_activate] 0-graph: init failed
> [2015-07-01 09:27:17.821891] W [glusterfsd.c:1214:cleanup_and_exit] (-->
> 0-: received signum (0), shutting down
>
>
The patch which exposed this bug is being reverted till the underlying bug
is also fixed.
You can monitor revert patches here
master: http://review.gluster.org/11507
3.7 branch: http://review.gluster.org/11508

Please rebase your patches after the above patches are merged to ensure
that your patches pass regression.
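
The usual sequence for that, assuming origin points at the upstream repo and
the rfc.sh helper in the source tree is used to resubmit:

$ git fetch origin
$ git rebase origin/master
$ ./rfc.sh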



>
> > > >
> > > > I looked at the logs too, and found out the following errors
> > > >
> > > > patchy-rebalance.log:[2015-07-01 09:27:23.040756] E [MSGID: 109026]
> > > > [dht-rebalance.c:2689:gf_defrag_start_crawl] 0-patchy-dht: fix
> layout on
> > > > /
> > > > failed
> > > > build-install-etc-glusterfs-glusterd.vol.log:[2015-07-01
> 09:27:23.040998]
> > > > E
> > > > [MSGID: 106224]
> > > > [glusterd-rebalance.c:960:glusterd_defrag_event_notify_handle]
> > > > 0-management:
> > > > Failed to update status
> > > > StartMigrationDuringRebalanceTest-rebalance.log:[2015-06-19
> > > > 14:34:47.557887]
> > > > E [rpc-clnt.c:362:saved_frames_unwind] (-->
> > > >
> /build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x240)[0x7fc882d04d5a]
> > > > (-->
> > > >
> /build/install/lib/libgfrpc.so.0(saved_frames_unwind+0x212)[0x7fc882ace086]
> > > > (-->
> > > >
> /build/install/lib/libgfrpc.so.0(saved_frames_destroy+0x1f)[0x7fc882ace183]
> > > > (-->
> > > >
> /build/install/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x11e)[0x7fc882ace615]
> > > > (-->
> > > >
> /build/install/lib/libgfrpc.so.0(rpc_clnt_notify+0x147)[0x7fc882acf00f]
> > > > ) 0-StartMigrationDuringRebalanceTest-client-0: forced unwinding
> > > > frame
> > > > type(GlusterFS 3.3) op(LOOKUP(27)) called at 2015-06-19
> 14:34:47.554862
> > > > (

Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-02 Thread Raghavendra Talur
On Thu, Jul 2, 2015 at 4:40 PM, Raghavendra Talur <
raghavendra.ta...@gmail.com> wrote:

>
>
> On Thu, Jul 2, 2015 at 10:52 AM, Krishnan Parthasarathi <
> kpart...@redhat.com> wrote:
>
>>
>> > >
>> > > A port assigned by Glusterd for a brick is found to be in use already
>> by
>> > > the brick. Any changes in Glusterd recently which can cause this?
>> > >
>> > > Or is it a test infra problem?
>>
>> This issue is likely to be caused by http://review.gluster.org/11039
>> This patch changes the port allocation that happens for rpc_clnt based
>> connections. Previously, ports allocated were < 1024. With this change,
>> these connections, typically mount process, gluster-nfs server processes
>> etc could end up using ports that bricks are being assigned to.
>>
>> IIUC, the intention of the patch was to make server processes lenient to
>> inbound messages from ports > 1024. If we don't require to use ports >
>> 1024
>> we could leave the port allocation for rpc_clnt connections as before.
>> Alternately, we could reserve the range of ports starting from 49152 for
>> bricks
>> by setting net.ipv4.ip_local_reserved_ports using sysctl(8). This is
>> specific to Linux.
>> I'm not aware of how this could be done in NetBSD for instance though.
>>
>
>
> It seems this is exactly what's happening.
>
> I have a question, I get the following data from netstat and grep
>
> tcp0  0 f6be17c0fbf5:1023   f6be17c0fbf5:24007
>  ESTABLISHED 31516/glusterfsd
> tcp0  0 f6be17c0fbf5:49152  f6be17c0fbf5:490
>  ESTABLISHED 31516/glusterfsd
> unix  3  [ ] STREAM CONNECTED 988353
> 31516/glusterfsd
> /var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket
>
> Here 31516 is the brick pid.
>
> Looking at the data, line 2 is very clear, it shows connection between
> brick and glusterfs client.
> unix socket on line 3 is also clear, it is the unix socket connection that
> glusterd and brick process use for communication.
>
> I am not able to understand line 1; which part of brick process
> established a tcp connection with glusterd using port 1023?
> Note: this data is from a build which does not have the above mentioned
> patch.
>


The patch which exposed this bug is being reverted till the underlying bug
is also fixed.
You can monitor revert patches here
master: http://review.gluster.org/11507
3.7 branch: http://review.gluster.org/11508

Please rebase your patches after the above patches are merged to ensure
that your patches pass regression.



>
> --
> *Raghavendra Talur *
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Mount hangs because of connection delays

2015-07-02 Thread Pranith Kumar Karampuri

hi,
     When the glusterfs mount process is coming up, all cluster xlators wait
for at least one event from each of their children before propagating the
status upwards. Sometimes the client xlator takes up to 2 minutes to
propagate this
event (https://bugzilla.redhat.com/show_bug.cgi?id=1054694#c0). Because of
this, Xavi implemented a timer in ec's notify where we treat a child as down
if it doesn't come up in 10 seconds. A similar patch went up for review
at http://review.gluster.org/#/c/3 for afr. Kritika raised an
interesting point in the review: all cluster xlators would need to have
this logic for the mount not to hang, so the correct place to fix it
would be the client xlator itself, i.e. add the timer logic in the client
xlator. That seems like a better approach. I just want to take inputs
from everyone before we go ahead in that direction.
That is, on PARENT_UP the client xlator will start a timer, and if no rpc
notification is received within that timeout it treats the client xlator as
down.


Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Problems when using different hostnames in a bricks and a peer

2015-07-02 Thread Rarylson Freitas
Hi Atin,

You are right!!! I was using version 3.5 in production, and when I checked
the Gluster source code, I checked the wrong commit (not the latest commit
on the master branch).

You've already implemented the solution I proposed. It was done
at the function gd_peerinfo_find_from_addrinfo, file
xlators/mgmt/glusterd/src/glusterd-peer-utils.c.

Thanks for your tip! And sorry for any inconvenience.

--
*Rarylson Freitas*

On Thu, Jul 2, 2015 at 2:01 AM, Atin Mukherjee  wrote:

> Which gluster version are you using? Better peer identification feature
> (available 3.6 onwards) should tackle this problem IMO.
>
> ~Atin
>
> On 07/02/2015 10:05 AM, Rarylson Freitas wrote:
> > Hi,
> >
> > Recently, my company needed to change our hostnames used in the Gluster
> > Pool.
> >
> > In a first moment, we have two Gluster Nodes called storage1 and
> storage2.
> > Our volumes used two bricks: storage1:/MYVOLYME and storage2:/MYVOLUME.
> We
> > put the storage1 and storage2 IPs in the /etc/hosts file of our nodes and
> > in our client servers.
> >
> > After some time, more client servers started using Gluster and we
> > discovered that using hostnames without domain (using /etc/hosts) in all
> > client servers is a pain in the a$$ :(. So, we decided to change them to
> > something like storage1.mydomain.com and storage2.mydomain.com.
> >
> > Remember that, at this point, we had already some volumes (with bricks):
> >
> > $ gluster volume info MYVOL
> > [...]
> > Brick1: storage1:/MYDIR
> > Brick1: storage2:/MYDIR
> >
> > For simplicity, let's consider that we had two Gluster Nodes, each one
> with
> > the following entries in /etc/hosts:
> >
> > 10.10.10.1  storage1
> > 10.10.10.2  storage2
> >
> > To implement the hostname changes, we've changed the etc hosts file to:
> >
> > 10.10.10.1  storage1 storage1.mydomain.com
> > 10.10.10.2  storage2 storage2.mydomain.com
> >
> > And we've run in storage1:
> >
> > $ gluster peer probe storage2.mydomain.com
> > peer probe: success
> >
> > Everything worked well for some time, but glusterd started to fail
> > after any reboot:
> >
> > $ service glusterfs-server status
> > glusterfs-server start/running, process 14714
> > $ service glusterfs-server restart
> > glusterfs-server stop/waiting
> > glusterfs-server start/running, process 14860
> > $ service glusterfs-server status
> > glusterfs-server stop/waiting
> >
> > To start the service again, it was necessary to rollback the hostname1
> > config to storage2 in /var/lib/glusterd/peers/OUR_UUID.
> >
> > After some trial and error, we discovered that if we change the order of
> the
> > entries in /etc/hosts and repeat the process, everything worked.
> >
> > It is, from:
> >
> > 10.10.10.1  storage1 storage1.mydomain.com
> > 10.10.10.2  storage2 storage2.mydomain.com
> >
> > To:
> >
> > 10.10.10.1  storage1.mydomain.com storage1
> > 10.10.10.2  storage2.mydomain.com storage2
> >
> > And run:
> >
> > gluster peer probe storage2.mydomain.com
> > service glusterfs-server restart
> >
> > So we've checked the Glusterd debug log and checked the GlusterFS source
> > code and discovered that the big secret was the function
> > glusterd_friend_find_by_hostname, in the file
> > xlators/mgmt/glusterd/src/glusterd-utils.c. This function is called for
> > each brick that isn't a local brick and does the following things:
> >
> >- It checks if the brick hostname is equal to some peer hostname;
> >- If it's, this peer is our wanted friend;
> >- If not, it gets the brick IP (resolves the hostname using the
> function
> >getaddrinfo) and checks if the brick IP is equal to the peer hostname;
> >   - That is, we could have run gluster peer probe 10.10.10.2, since the brick
> >   IP (storage2 resolves to 10.10.10.2) would then have been equal to the peer
> >   "hostname" (10.10.10.2);
> >- If it's, this peer is our wanted friend;
> >- If not, gets the reverse of the brick IP (using the function
> >getnameinfo) and checks if the brick reverse is equal to the peer
> >hostname;
> >   - This is why changing the order of the entries in /etc/hosts
> worked
> >   as an workaround for us;
> >- If not, returns an error (and Glusterd will fail).
> >
> > However, we think that comparing the brick IP (resolving the brick
> > hostname) and the peer IP (resolving the peer hostname) would be a
> simpler
> > and more comprehensive solution. Since both brick and peer will have
> > different hostnames, but the same IP, it would work.
> >
> > The solution could be:
> >
> >- It checks if the brick hostname is equal to some peer hostname;
> >- If it's, this peer is our wanted friend;
> >- If not, it gets both the brick IP (resolves the hostname using the
> >function getaddrinfo) and the peer IP (resolves the peer hostname)
> and,
> >for each IP pair, check if a brick IP is equal to a peer IP;
> >- If it's, this peer is our wanted friend;
> >- If not, returns and error (and Gluste
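
A quick way to sanity-check the comparison proposed above on a node is to
resolve both names through getaddrinfo (getent does exactly that) and compare
the addresses; the hostnames below are the ones from the example:

$ getent ahosts storage2 | awk '{print $1}' | sort -u
$ getent ahosts storage2.mydomain.com | awk '{print $1}' | sort -u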

Re: [Gluster-devel] Mount hangs because of connection delays

2015-07-02 Thread Xavier Hernandez

I agree that a generic solution for all cluster xlators would be good.

Only question I have is whether parallel notifications are specially 
handled somewhere.


For example, if client xlator sends EC_CHILD_DOWN after a timeout, it's 
possible that an immediate EC_CHILD_UP is sent if the brick is 
connected. In this case, the cluster xlator could receive both 
notifications in any order (we have multi-threading), which is dangerous 
if EC_CHILD_DOWN is processed after EC_CHILD_UP.


I've seen that protocol/client doesn't send one notification until the 
previous one has been completed. However this assumes that there won't 
be any xlator that delays the notification (i.e. sends it in the background
at another moment). Is that a requirement for processing notifications?
Otherwise the concurrent notifications problem could appear even if
protocol/client serializes them.


Xavi

On 07/02/2015 03:34 PM, Pranith Kumar Karampuri wrote:

hi,
 When the glusterfs mount process is coming up, all cluster xlators wait
for at least one event from each of their children before propagating the
status upwards. Sometimes the client xlator takes up to 2 minutes to
propagate this
event (https://bugzilla.redhat.com/show_bug.cgi?id=1054694#c0). Because of
this, Xavi implemented a timer in ec's notify where we treat a child as down
if it doesn't come up in 10 seconds. A similar patch went up for review
at http://review.gluster.org/#/c/3 for afr. Kritika raised an
interesting point in the review: all cluster xlators would need to have
this logic for the mount not to hang, so the correct place to fix it
would be the client xlator itself, i.e. add the timer logic in the client
xlator. That seems like a better approach. I just want to take inputs
from everyone before we go ahead in that direction.
That is, on PARENT_UP the client xlator will start a timer, and if no rpc
notification is received within that timeout it treats the client xlator as
down.

Pranith

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-02 Thread Prasanna Kalever

This is caused because when bind-insecure is turned on (which is the default
now), it may happen
that a brick is not able to bind to the port assigned to it by Glusterd, for
example 49192-49195...
It seems to occur because the rpc_clnt connections are binding to ports in the
same range,
so the brick fails to bind to a port which is already in use by someone else.

This bug already existed before http://review.gluster.org/#/c/11039/ when using
rdma, i.e. even
previously rdma would bind to a port >= 1024 if it could not find a free port < 1024,
even when bind-insecure was turned off (ref: commit '0e3fd04e').
Since we don't have tests related to rdma, we did not discover this issue
earlier.

http://review.gluster.org/#/c/11039/ exposed the bug we encountered; it can
now be fixed by
http://review.gluster.org/#/c/11512/, which makes rpc_clnt pick port numbers
starting from 65535 in descending
order. As a result, port clashes are minimized, and it also fixes the rdma issues.
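
To see whether any client-side binds actually landed in the brick range on a
given machine, it is enough to compare the ports glusterd assigned to the
bricks with what is bound locally (volume name from the regression tests; the
commands themselves are only illustrative):

$ gluster volume status patchy | grep -i brick
$ ss -tnp | grep glusterfsd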

Thanks to Raghavendra Talur for help in discovering the real cause


Regards,
Prasanna Kalever



- Original Message -
From: "Raghavendra Talur" 
To: "Krishnan Parthasarathi" 
Cc: "Gluster Devel" 
Sent: Thursday, July 2, 2015 6:45:17 PM
Subject: Re: [Gluster-devel] spurious failures  
tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t



On Thu, Jul 2, 2015 at 4:40 PM, Raghavendra Talur < raghavendra.ta...@gmail.com 
> wrote: 





On Thu, Jul 2, 2015 at 10:52 AM, Krishnan Parthasarathi < kpart...@redhat.com > 
wrote: 



> > 
> > A port assigned by Glusterd for a brick is found to be in use already by 
> > the brick. Any changes in Glusterd recently which can cause this? 
> > 
> > Or is it a test infra problem? 

This issue is likely to be caused by http://review.gluster.org/11039 
This patch changes the port allocation that happens for rpc_clnt based 
connections. Previously, ports allocated were < 1024. With this change,
these connections, typically mount process, gluster-nfs server processes 
etc could end up using ports that bricks are being assigned to. 

IIUC, the intention of the patch was to make server processes lenient to 
inbound messages from ports > 1024. If we don't require to use ports > 1024 
we could leave the port allocation for rpc_clnt connections as before. 
Alternately, we could reserve the range of ports starting from 49152 for bricks 
by setting net.ipv4.ip_local_reserved_ports using sysctl(8). This is specific 
to Linux. 
I'm not aware of how this could be done in NetBSD for instance though. 


It seems this is exactly what's happening.

I have a question, I get the following data from netstat and grep 

tcp 0 0 f6be17c0fbf5:1023 f6be17c0fbf5:24007 ESTABLISHED 31516/glusterfsd 
tcp 0 0 f6be17c0fbf5:49152 f6be17c0fbf5:490 ESTABLISHED 31516/glusterfsd 
unix 3 [ ] STREAM CONNECTED 988353 31516/glusterfsd 
/var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket 

Here 31516 is the brick pid. 

Looking at the data, line 2 is very clear, it shows connection between brick 
and glusterfs client. 
unix socket on line 3 is also clear, it is the unix socket connection that 
glusterd and brick process use for communication. 

I am not able to understand line 1; which part of brick process established a 
tcp connection with glusterd using port 1023? 
Note: this data is from a build which does not have the above mentioned patch. 


The patch which exposed this bug is being reverted till the underlying bug is 
also fixed. 
You can monitor revert patches here 
master: http://review.gluster.org/11507 
3.7 branch: http://review.gluster.org/11508 

Please rebase your patches after the above patches are merged to ensure that 
your patches pass regression.





-- 
Raghavendra Talur 




___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Problems when using different hostnames in a bricks and a peer

2015-07-02 Thread Atin Mukherjee
Not at all a problem. I am here to help Rarylson :)

-Atin
Sent from one plus one
On Jul 2, 2015 7:23 PM, "Rarylson Freitas"  wrote:

> Hi Atin,
>
> You are right!!! I was using the version 3.5 in production. And when I've
> checked the Gluster source code, I checked the wrong commit (not the latest
> commit in the master branch).
>
> Currently, you've already implemented my the proposed solution. It was
> done at the function gd_peerinfo_find_from_addrinfo, file
> xlators/mgmt/glusterd/src/glusterd-peer-utils.c.
>
> Thanks for your tip! And sorry for any inconvenience.
>
> --
> *Rarylson Freitas*
>
> On Thu, Jul 2, 2015 at 2:01 AM, Atin Mukherjee 
> wrote:
>
>> Which gluster version are you using? Better peer identification feature
>> (available 3.6 onwards) should tackle this problem IMO.
>>
>> ~Atin
>>
>> On 07/02/2015 10:05 AM, Rarylson Freitas wrote:
>> > Hi,
>> >
>> > Recently, my company needed to change our hostnames used in the Gluster
>> > Pool.
>> >
>> > In a first moment, we have two Gluster Nodes called storage1 and
>> storage2.
>> > Our volumes used two bricks: storage1:/MYVOLYME and storage2:/MYVOLUME.
>> We
>> > put the storage1 and storage2 IPs in the /etc/hosts file of our nodes
>> and
>> > in our client servers.
>> >
>> > After some time, more client servers started using Gluster and we
>> > discovered that using hostnames without domain (using /etc/hosts) in all
>> > client servers is a pain in the a$$ :(. So, we decided to change them to
>> > something like storage1.mydomain.com and storage2.mydomain.com.
>> >
>> > Remember that, at this point, we had already some volumes (with bricks):
>> >
>> > $ gluster volume info MYVOL
>> > [...]
>> > Brick1: storage1:/MYDIR
>> > Brick1: storage2:/MYDIR
>> >
>> > For simplicity, let's consider that we had two Gluster Nodes, each one
>> with
>> > the following entries in /etc/hosts:
>> >
>> > 10.10.10.1  storage1
>> > 10.10.10.2  storage2
>> >
>> > To implement the hostname changes, we've changed the etc hosts file to:
>> >
>> > 10.10.10.1  storage1 storage1.mydomain.com
>> > 10.10.10.2  storage2 storage2.mydomain.com
>> >
>> > And we've run in storage1:
>> >
>> > $ gluster peer probe storage2.mydomain.com
>> > peer probe: success
>> >
>> > Everything works well during some time, but the glusterd starts to fail
>> > after any reboot:
>> >
>> > $ service glusterfs-server status
>> > glusterfs-server start/running, process 14714
>> > $ service glusterfs-server restart
>> > glusterfs-server stop/waiting
>> > glusterfs-server start/running, process 14860
>> > $ service glusterfs-server status
>> > glusterfs-server stop/waiting
>> >
>> > To start the service again, it was necessary to rollback the hostname1
>> > config to storage2 in /var/lib/glusterd/peers/OUR_UUID.
>> >
>> > After some trial and error, we discovered that if we change the order of
>> the
>> > entries in /etc/hosts and repeat the process, everything worked.
>> >
>> > It is, from:
>> >
>> > 10.10.10.1  storage1 storage1.mydomain.com
>> > 10.10.10.2  storage2 storage2.mydomain.com
>> >
>> > To:
>> >
>> > 10.10.10.1  storage1.mydomain.com storage1
>> > 10.10.10.2  storage2.mydomain.com storage2
>> >
>> > And run:
>> >
>> > gluster peer probe storage2.mydomain.com
>> > service glusterfs-server restart
>> >
>> > So we've checked the Glusterd debug log and checked the GlusterFS source
>> > code and discovered that the big secret was the function
>> > glusterd_friend_find_by_hostname, in the file
>> > xlators/mgmt/glusterd/src/glusterd-utils.c. This function is called for
>> > each brick that isn't a local brick and does the following things:
>> >
>> >- It checks if the brick hostname is equal to some peer hostname;
>> >- If it's, this peer is our wanted friend;
>> >- If not, it gets the brick IP (resolves the hostname using the
>> function
>> >getaddrinfo) and checks if the brick IP is equal to the peer
>> hostname;
>> >   - That is, we could have run gluster peer probe 10.10.10.2, since the
>> >   brick
>> >   IP (storage2 resolves to 10.10.10.2) would then have been equal to the peer
>> >   "hostname" (10.10.10.2);
>> >- If it's, this peer is our wanted friend;
>> >- If not, gets the reverse of the brick IP (using the function
>> >getnameinfo) and checks if the brick reverse is equal to the peer
>> >hostname;
>> >   - This is why changing the order of the entries in /etc/hosts
>> worked
>> >   as an workaround for us;
>> >- If not, returns an error (and Glusterd will fail).
>> >
>> > However, we think that comparing the brick IP (resolving the brick
>> > hostname) and the peer IP (resolving the peer hostname) would be a
>> simpler
>> > and more comprehensive solution. Since both brick and peer will have
>> > different hostnames, but the same IP, it would work.
>> >
>> > The solution could be:
>> >
>> >- It checks if the brick hostname is equal to some peer hostname;
>> >- If it's, this peer is our wanted friend;
>> >- If not, it gets both the 

Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-02 Thread Atin Mukherjee
Thanks Prasanna for the patches :)

-Atin
Sent from one plus one
On Jul 2, 2015 9:19 PM, "Prasanna Kalever"  wrote:

>
> This is caused because when bind-insecure is turned on (which is the
> default now), it may happen
> that brick is not able to bind to port assigned by Glusterd for example
> 49192-49195...
> It seems to occur because the rpc_clnt connections are binding to ports in
> the same range.
> so brick fails to bind to a port which is already used by someone else.
>
> This bug already exist before http://review.gluster.org/#/c/11039/ when
> use rdma, i.e. even
> previously rdma binds to port >= 1024 if it cannot find a free port < 1024,
> even when bind insecure was turned off (ref to commit '0e3fd04e').
> Since we don't have tests related to rdma we did not discover this issue
> previously.
>
> http://review.gluster.org/#/c/11039/ discovers the bug we encountered,
> however now the bug can be fixed by
> http://review.gluster.org/#/c/11512/ by making rpc_clnt to get port
> numbers from 65535 in a descending
> order, as a result port clash is minimized, also it fixes issues in rdma
> too
>
> Thanks to Raghavendra Talur for help in discovering the real cause
>
>
> Regards,
> Prasanna Kalever
>
>
>
> - Original Message -
> From: "Raghavendra Talur" 
> To: "Krishnan Parthasarathi" 
> Cc: "Gluster Devel" 
> Sent: Thursday, July 2, 2015 6:45:17 PM
> Subject: Re: [Gluster-devel] spurious failures
> tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
>
>
>
> On Thu, Jul 2, 2015 at 4:40 PM, Raghavendra Talur <
> raghavendra.ta...@gmail.com > wrote:
>
>
>
>
>
> On Thu, Jul 2, 2015 at 10:52 AM, Krishnan Parthasarathi <
> kpart...@redhat.com > wrote:
>
>
>
> > >
> > > A port assigned by Glusterd for a brick is found to be in use already
> by
> > > the brick. Any changes in Glusterd recently which can cause this?
> > >
> > > Or is it a test infra problem?
>
> This issue is likely to be caused by http://review.gluster.org/11039
> This patch changes the port allocation that happens for rpc_clnt based
> connections. Previously, ports allocated were < 1024. With this change,
> these connections, typically mount process, gluster-nfs server processes
> etc could end up using ports that bricks are being assigned to.
>
> IIUC, the intention of the patch was to make server processes lenient to
> inbound messages from ports > 1024. If we don't require to use ports > 1024
> we could leave the port allocation for rpc_clnt connections as before.
> Alternately, we could reserve the range of ports starting from 49152 for
> bricks
> by setting net.ipv4.ip_local_reserved_ports using sysctl(8). This is
> specific to Linux.
> I'm not aware of how this could be done in NetBSD for instance though.
>
>
> It seems this is exactly what's happening.
>
> I have a question, I get the following data from netstat and grep
>
> tcp 0 0 f6be17c0fbf5:1023 f6be17c0fbf5:24007 ESTABLISHED 31516/glusterfsd
> tcp 0 0 f6be17c0fbf5:49152 f6be17c0fbf5:490 ESTABLISHED 31516/glusterfsd
> unix 3 [ ] STREAM CONNECTED 988353 31516/glusterfsd
> /var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket
>
> Here 31516 is the brick pid.
>
> Looking at the data, line 2 is very clear, it shows connection between
> brick and glusterfs client.
> unix socket on line 3 is also clear, it is the unix socket connection that
> glusterd and brick process use for communication.
>
> I am not able to understand line 1; which part of brick process
> established a tcp connection with glusterd using port 1023?
> Note: this data is from a build which does not have the above mentioned
> patch.
>
>
> The patch which exposed this bug is being reverted till the underlying bug
> is also fixed.
> You can monitor revert patches here
> master: http://review.gluster.org/11507
> 3.7 branch: http://review.gluster.org/11508
>
> Please rebase your patches after the above patches are merged to ensure
> that your patches pass regression.
>
>
>
>
>
> --
> Raghavendra Talur
>
>
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t

2015-07-02 Thread Joseph Fernandes
Hi All,

This is the same issue as the previous tiering regression failure.

The volume brick is not able to start because its port is busy:

[2015-07-02 10:20:20.601372]  [run.c:190:runner_log] (--> 
/build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x240)[0x7f05e080bc32] 
(--> /build/install/lib/libglusterfs.so.0(runner_log+0x192)[0x7f05e08754ce] 
(--> 
/build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_volume_start_glusterfs+0xae7)[0x7f05d5c935d7]
 (--> 
/build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_brick_start+0x151)[0x7f05d5c9d4e3]
 (--> 
/build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_op_perform_add_bricks+0x8fe)[0x7f05d5d10661]
 ) 0-: Starting GlusterFS: /build/install/sbin/glusterfsd -s 
slave33.cloud.gluster.org --volfile-id 
patchy.slave33.cloud.gluster.org.d-backends-patchy5 -p 
/var/lib/glusterd/vols/patchy/run/slave33.cloud.gluster.org-d-backends-patchy5.pid
 -S /var/run/gluster/ca5f5a89aa3a24f0a54852590ab82ad5.socket --brick-name 
/d/backends/patchy5 -l /var/log/glusterfs/bricks/d-backends-patchy5.log 
--xlator-option *-posix.glusterd-uuid=da011de8-9103-4cf2-9f4b-03707d0019d0 
--brick-port 49167 --xlator-option patchy-server.listen-port=49167
[2015-07-02 10:20:20.624297] I [MSGID: 106144] 
[glusterd-pmap.c:269:pmap_registry_remove] 0-pmap: removing brick (null) on 
port 49167
[2015-07-02 10:20:20.625315] E [MSGID: 106005] 
[glusterd-utils.c:4448:glusterd_brick_start] 0-management: Unable to start 
brick slave33.cloud.gluster.org:/d/backends/patchy5
[2015-07-02 10:20:20.625354] E [MSGID: 106074] 
[glusterd-brick-ops.c:2096:glusterd_op_add_brick] 0-glusterd: Unable to add 
bricks
[2015-07-02 10:20:20.625368] E [MSGID: 106123] 
[glusterd-syncop.c:1416:gd_commit_op_phase] 0-management: Commit of operation 
'Volume Add brick' failed on localhost 


Brick Log:

[2015-07-02 10:20:20.608547] I [MSGID: 100030] [glusterfsd.c:2296:main] 
0-/build/install/sbin/glusterfsd: Started running 
/build/install/sbin/glusterfsd version 3.8dev (args: 
/build/install/sbin/glusterfsd -s slave33.cloud.gluster.org --volfile-id 
patchy.slave33.cloud.gluster.org.d-backends-patchy5 -p 
/var/lib/glusterd/vols/patchy/run/slave33.cloud.gluster.org-d-backends-patchy5.pid
 -S /var/run/gluster/ca5f5a89aa3a24f0a54852590ab82ad5.socket --brick-name 
/d/backends/patchy5 -l /var/log/glusterfs/bricks/d-backends-patchy5.log 
--xlator-option *-posix.glusterd-uuid=da011de8-9103-4cf2-9f4b-03707d0019d0 
--brick-port 49167 --xlator-option patchy-server.listen-port=49167)
[2015-07-02 10:20:20.617113] I [MSGID: 101190] 
[event-epoll.c:627:event_dispatch_epoll_worker] 0-epoll: Started thread with 
index 1
[2015-07-02 10:20:20.623097] I [MSGID: 101173] 
[graph.c:268:gf_add_cmdline_options] 0-patchy-server: adding option 
'listen-port' for volume 'patchy-server' with value '49167'
[2015-07-02 10:20:20.623135] I [MSGID: 101173] 
[graph.c:268:gf_add_cmdline_options] 0-patchy-posix: adding option 
'glusterd-uuid' for volume 'patchy-posix' with value 
'da011de8-9103-4cf2-9f4b-03707d0019d0'
[2015-07-02 10:20:20.623358] I [MSGID: 115034] 
[server.c:392:_check_for_auth_option] 0-/d/backends/patchy5: skip format check 
for non-addr auth option auth.login./d/backends/patchy5.allow
[2015-07-02 10:20:20.623374] I [MSGID: 115034] 
[server.c:392:_check_for_auth_option] 0-/d/backends/patchy5: skip format check 
for non-addr auth option 
auth.login.96bcb872-559b-4f19-84ad-a735dc6068f6.password
[2015-07-02 10:20:20.623568] I [rpcsvc.c:2210:rpcsvc_set_outstanding_rpc_limit] 
0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2015-07-02 10:20:20.623633] W [MSGID: 101002] [options.c:952:xl_opt_validate] 
0-patchy-server: option 'listen-port' is deprecated, preferred is 
'transport.socket.listen-port', continuing with correction
[2015-07-02 10:20:20.623707] E [socket.c:818:__socket_server_bind] 
0-tcp.patchy-server: binding to  failed: Address already in use
[2015-07-02 10:20:20.623720] E [socket.c:821:__socket_server_bind] 
0-tcp.patchy-server: Port is already in use
[2015-07-02 10:20:20.623746] W [rpcsvc.c:1599:rpcsvc_transport_create] 
0-rpc-service: listening on transport failed
[2015-07-02 10:20:20.623758] W [MSGID: 115045] [server.c:998:init] 
0-patchy-server: creation of listener failed
[2015-07-02 10:20:20.623772] E [MSGID: 101019] [xlator.c:423:xlator_init] 
0-patchy-server: Initialization of volume 'patchy-server' failed, review your 
volfile again
[2015-07-02 10:20:20.623783] E [MSGID: 101066] 
[graph.c:323:glusterfs_graph_init] 0-patchy-server: initializing translator 
failed
[2015-07-02 10:20:20.623792] E [MSGID: 101176] 
[graph.c:669:glusterfs_graph_activate] 0-graph: init failed
[2015-07-02 10:20:20.624203] W [glusterfsd.c:1214:cleanup_and_exit] (--> 0-: 
received signum (0), shutting down


Regards,
Joe


- Original Message -
From: "Pranith Kumar Karampuri" 
To: "Dan Lambright" 
Cc: "Gluster Devel" , "Joseph Fernandes" 

Sent: Thursday, July 2, 2015 6:16:44 P

Re: [Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t

2015-07-02 Thread Atin Mukherjee
Joe,

Please refer to Prasanna's mail. He has uploaded a patch to solve it.

-Atin
Sent from one plus one
On Jul 2, 2015 9:42 PM, "Joseph Fernandes"  wrote:

> Hi All,
>
> This is the same issue as the previous tiering regression failure.
>
> Volume brick not able to start brick because port is busy
>
> [2015-07-02 10:20:20.601372]  [run.c:190:runner_log] (-->
> /build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x240)[0x7f05e080bc32]
> (--> /build/install/lib/libglusterfs.so.0(runner_log+0x192)[0x7f05e08754ce]
> (-->
> /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_volume_start_glusterfs+0xae7)[0x7f05d5c935d7]
> (-->
> /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_brick_start+0x151)[0x7f05d5c9d4e3]
> (-->
> /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_op_perform_add_bricks+0x8fe)[0x7f05d5d10661]
> ) 0-: Starting GlusterFS: /build/install/sbin/glusterfsd -s
> slave33.cloud.gluster.org --volfile-id
> patchy.slave33.cloud.gluster.org.d-backends-patchy5 -p
> /var/lib/glusterd/vols/patchy/run/slave33.cloud.gluster.org-d-backends-patchy5.pid
> -S /var/run/gluster/ca5f5a89aa3a24f0a54852590ab82ad5.socket --brick-name
> /d/backends/patchy5 -l /var/log/glusterfs/bricks/d-backends-patchy5.log
> --xlator-option *-posix.glusterd-uuid=da011de8-9103-4cf2-9f4b-03707d0019d0
> --brick-port 49167 --xlator-option patchy-server.listen-port=49167
> [2015-07-02 10:20:20.624297] I [MSGID: 106144]
> [glusterd-pmap.c:269:pmap_registry_remove] 0-pmap: removing brick (null) on
> port 49167
> [2015-07-02 10:20:20.625315] E [MSGID: 106005]
> [glusterd-utils.c:4448:glusterd_brick_start] 0-management: Unable to start
> brick slave33.cloud.gluster.org:/d/backends/patchy5
> [2015-07-02 10:20:20.625354] E [MSGID: 106074]
> [glusterd-brick-ops.c:2096:glusterd_op_add_brick] 0-glusterd: Unable to add
> bricks
> [2015-07-02 10:20:20.625368] E [MSGID: 106123]
> [glusterd-syncop.c:1416:gd_commit_op_phase] 0-management: Commit of
> operation 'Volume Add brick' failed on localhost
>
>
> Brick Log:
>
> [2015-07-02 10:20:20.608547] I [MSGID: 100030] [glusterfsd.c:2296:main]
> 0-/build/install/sbin/glusterfsd: Started running
> /build/install/sbin/glusterfsd version 3.8dev (args:
> /build/install/sbin/glusterfsd -s slave33.cloud.gluster.org --volfile-id
> patchy.slave33.cloud.gluster.org.d-backends-patchy5 -p
> /var/lib/glusterd/vols/patchy/run/slave33.cloud.gluster.org-d-backends-patchy5.pid
> -S /var/run/gluster/ca5f5a89aa3a24f0a54852590ab82ad5.socket --brick-name
> /d/backends/patchy5 -l /var/log/glusterfs/bricks/d-backends-patchy5.log
> --xlator-option *-posix.glusterd-uuid=da011de8-9103-4cf2-9f4b-03707d0019d0
> --brick-port 49167 --xlator-option patchy-server.listen-port=49167)
> [2015-07-02 10:20:20.617113] I [MSGID: 101190]
> [event-epoll.c:627:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2015-07-02 10:20:20.623097] I [MSGID: 101173]
> [graph.c:268:gf_add_cmdline_options] 0-patchy-server: adding option
> 'listen-port' for volume 'patchy-server' with value '49167'
> [2015-07-02 10:20:20.623135] I [MSGID: 101173]
> [graph.c:268:gf_add_cmdline_options] 0-patchy-posix: adding option
> 'glusterd-uuid' for volume 'patchy-posix' with value
> 'da011de8-9103-4cf2-9f4b-03707d0019d0'
> [2015-07-02 10:20:20.623358] I [MSGID: 115034]
> [server.c:392:_check_for_auth_option] 0-/d/backends/patchy5: skip format
> check for non-addr auth option auth.login./d/backends/patchy5.allow
> [2015-07-02 10:20:20.623374] I [MSGID: 115034]
> [server.c:392:_check_for_auth_option] 0-/d/backends/patchy5: skip format
> check for non-addr auth option
> auth.login.96bcb872-559b-4f19-84ad-a735dc6068f6.password
> [2015-07-02 10:20:20.623568] I
> [rpcsvc.c:2210:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured
> rpc.outstanding-rpc-limit with value 64
> [2015-07-02 10:20:20.623633] W [MSGID: 101002]
> [options.c:952:xl_opt_validate] 0-patchy-server: option 'listen-port' is
> deprecated, preferred is 'transport.socket.listen-port', continuing with
> correction
> [2015-07-02 10:20:20.623707] E [socket.c:818:__socket_server_bind]
> 0-tcp.patchy-server: binding to  failed: Address already in use
> [2015-07-02 10:20:20.623720] E [socket.c:821:__socket_server_bind]
> 0-tcp.patchy-server: Port is already in use
> [2015-07-02 10:20:20.623746] W [rpcsvc.c:1599:rpcsvc_transport_create]
> 0-rpc-service: listening on transport failed
> [2015-07-02 10:20:20.623758] W [MSGID: 115045] [server.c:998:init]
> 0-patchy-server: creation of listener failed
> [2015-07-02 10:20:20.623772] E [MSGID: 101019] [xlator.c:423:xlator_init]
> 0-patchy-server: Initialization of volume 'patchy-server' failed, review
> your volfile again
> [2015-07-02 10:20:20.623783] E [MSGID: 101066]
> [graph.c:323:glusterfs_graph_init] 0-patchy-server: initializing translator
> failed
> [2015-07-02 10:20:20.623792] E [MSGID: 101176]
> [graph.c:669:glusterfs_graph_activate] 0-graph: init failed
> [2015-07-02 10:20:20.

Re: [Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t

2015-07-02 Thread Joseph Fernandes
Yep.. Thanks Guys.

- Original Message -
From: "Atin Mukherjee" 
To: "Joseph Fernandes" 
Cc: kpart...@redhat.com, "Atin Mukherjee" , "Gluster 
Devel" 
Sent: Thursday, July 2, 2015 9:45:01 PM
Subject: Re: [Gluster-devel] Failure in 
tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t

Joe,

Please refer to Prasanna's mail. He has uploaded a patch to solve it.

-Atin
Sent from one plus one
On Jul 2, 2015 9:42 PM, "Joseph Fernandes"  wrote:

> Hi All,
>
> This is the same issue as the previous tiering regression failure.
>
> Volume brick is not able to start because the port is busy
>
> [2015-07-02 10:20:20.601372]  [run.c:190:runner_log] (-->
> /build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x240)[0x7f05e080bc32]
> (--> /build/install/lib/libglusterfs.so.0(runner_log+0x192)[0x7f05e08754ce]
> (-->
> /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_volume_start_glusterfs+0xae7)[0x7f05d5c935d7]
> (-->
> /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_brick_start+0x151)[0x7f05d5c9d4e3]
> (-->
> /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_op_perform_add_bricks+0x8fe)[0x7f05d5d10661]
> ) 0-: Starting GlusterFS: /build/install/sbin/glusterfsd -s
> slave33.cloud.gluster.org --volfile-id
> patchy.slave33.cloud.gluster.org.d-backends-patchy5 -p
> /var/lib/glusterd/vols/patchy/run/slave33.cloud.gluster.org-d-backends-patchy5.pid
> -S /var/run/gluster/ca5f5a89aa3a24f0a54852590ab82ad5.socket --brick-name
> /d/backends/patchy5 -l /var/log/glusterfs/bricks/d-backends-patchy5.log
> --xlator-option *-posix.glusterd-uuid=da011de8-9103-4cf2-9f4b-03707d0019d0
> --brick-port 49167 --xlator-option patchy-server.listen-port=49167
> [2015-07-02 10:20:20.624297] I [MSGID: 106144]
> [glusterd-pmap.c:269:pmap_registry_remove] 0-pmap: removing brick (null) on
> port 49167
> [2015-07-02 10:20:20.625315] E [MSGID: 106005]
> [glusterd-utils.c:4448:glusterd_brick_start] 0-management: Unable to start
> brick slave33.cloud.gluster.org:/d/backends/patchy5
> [2015-07-02 10:20:20.625354] E [MSGID: 106074]
> [glusterd-brick-ops.c:2096:glusterd_op_add_brick] 0-glusterd: Unable to add
> bricks
> [2015-07-02 10:20:20.625368] E [MSGID: 106123]
> [glusterd-syncop.c:1416:gd_commit_op_phase] 0-management: Commit of
> operation 'Volume Add brick' failed on localhost
>
>
> Brick Log:
>
> [2015-07-02 10:20:20.608547] I [MSGID: 100030] [glusterfsd.c:2296:main]
> 0-/build/install/sbin/glusterfsd: Started running
> /build/install/sbin/glusterfsd version 3.8dev (args:
> /build/install/sbin/glusterfsd -s slave33.cloud.gluster.org --volfile-id
> patchy.slave33.cloud.gluster.org.d-backends-patchy5 -p
> /var/lib/glusterd/vols/patchy/run/slave33.cloud.gluster.org-d-backends-patchy5.pid
> -S /var/run/gluster/ca5f5a89aa3a24f0a54852590ab82ad5.socket --brick-name
> /d/backends/patchy5 -l /var/log/glusterfs/bricks/d-backends-patchy5.log
> --xlator-option *-posix.glusterd-uuid=da011de8-9103-4cf2-9f4b-03707d0019d0
> --brick-port 49167 --xlator-option patchy-server.listen-port=49167)
> [2015-07-02 10:20:20.617113] I [MSGID: 101190]
> [event-epoll.c:627:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1
> [2015-07-02 10:20:20.623097] I [MSGID: 101173]
> [graph.c:268:gf_add_cmdline_options] 0-patchy-server: adding option
> 'listen-port' for volume 'patchy-server' with value '49167'
> [2015-07-02 10:20:20.623135] I [MSGID: 101173]
> [graph.c:268:gf_add_cmdline_options] 0-patchy-posix: adding option
> 'glusterd-uuid' for volume 'patchy-posix' with value
> 'da011de8-9103-4cf2-9f4b-03707d0019d0'
> [2015-07-02 10:20:20.623358] I [MSGID: 115034]
> [server.c:392:_check_for_auth_option] 0-/d/backends/patchy5: skip format
> check for non-addr auth option auth.login./d/backends/patchy5.allow
> [2015-07-02 10:20:20.623374] I [MSGID: 115034]
> [server.c:392:_check_for_auth_option] 0-/d/backends/patchy5: skip format
> check for non-addr auth option
> auth.login.96bcb872-559b-4f19-84ad-a735dc6068f6.password
> [2015-07-02 10:20:20.623568] I
> [rpcsvc.c:2210:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured
> rpc.outstanding-rpc-limit with value 64
> [2015-07-02 10:20:20.623633] W [MSGID: 101002]
> [options.c:952:xl_opt_validate] 0-patchy-server: option 'listen-port' is
> deprecated, preferred is 'transport.socket.listen-port', continuing with
> correction
> [2015-07-02 10:20:20.623707] E [socket.c:818:__socket_server_bind]
> 0-tcp.patchy-server: binding to  failed: Address already in use
> [2015-07-02 10:20:20.623720] E [socket.c:821:__socket_server_bind]
> 0-tcp.patchy-server: Port is already in use
> [2015-07-02 10:20:20.623746] W [rpcsvc.c:1599:rpcsvc_transport_create]
> 0-rpc-service: listening on transport failed
> [2015-07-02 10:20:20.623758] W [MSGID: 115045] [server.c:998:init]
> 0-patchy-server: creation of listener failed
> [2015-07-02 10:20:20.623772] E [MSGID: 101019] [xlator.c:423:xlator_init]
> 0-patchy-server: Initialization of volume 'patchy-serv

Re: [Gluster-devel] Mount hangs because of connection delays

2015-07-02 Thread Shyam

Pranith,

I understand the bug, and a more generic solution at a common layer would be 
desirable and apt, rather than repeating the same logic in each xlator.


However, I am always a bit unsure about notifications and their processing, so 
I cannot state with conviction that this approach is fine and will work 
elegantly. I will leave it to others to chime in on that.


Shyam

On 07/02/2015 09:34 AM, Pranith Kumar Karampuri wrote:

hi,
      When the glusterfs mount process is coming up, all cluster xlators wait
for at least one event from each of their children before propagating the
status upwards. Sometimes the client xlator takes up to 2 minutes to
propagate this event
(https://bugzilla.redhat.com/show_bug.cgi?id=1054694#c0). Because of this,
Xavi implemented a timer in ec's notify where we treat a child as down
if it doesn't come up within 10 seconds. A similar patch went up for review
at http://review.gluster.org/#/c/3 for afr. Kritika raised an
interesting point in the review: all cluster xlators would need to have
this logic for the mount not to hang, so the correct place to fix it
would be the client xlator itself, i.e. add the timer logic in the client
xlator, which seems like a better approach. I just want to take inputs
from everyone before we go ahead in that direction:
on PARENT_UP the client xlator would start a timer, and if no rpc
notification is received within that timeout it would treat itself (the
client xlator) as down.

Pranith
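
A minimal sketch of the PARENT_UP timer idea above, using plain pthreads
rather than the actual afr/ec/protocol-client code (all names below are
hypothetical):

#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            child_up;   /* set by the (simulated) rpc notify path */
} child_state_t;

/* Called on PARENT_UP: wait up to timeout_sec for the child to report
 * CHILD_UP; if the timer expires first, treat the child as down so the
 * mount does not hang indefinitely. */
static void
wait_for_child(child_state_t *cs, int timeout_sec)
{
    struct timespec ts;

    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += timeout_sec;

    pthread_mutex_lock(&cs->lock);
    while (!cs->child_up) {
        if (pthread_cond_timedwait(&cs->cond, &cs->lock, &ts) == ETIMEDOUT)
            break;
    }
    if (cs->child_up)
        printf("propagating CHILD_UP\n");
    else
        printf("timeout: treating the child as down, propagating CHILD_DOWN\n");
    pthread_mutex_unlock(&cs->lock);
}

/* Simulated rpc notify path: marks the child up and wakes the waiter. */
static void
notify_child_up(child_state_t *cs)
{
    pthread_mutex_lock(&cs->lock);
    cs->child_up = true;
    pthread_cond_signal(&cs->cond);
    pthread_mutex_unlock(&cs->lock);
}

int
main(void)
{
    child_state_t cs = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
                         false };

    /* notify_child_up() is never called here, so the 10-second timer fires
     * and the child is treated as down instead of the mount waiting forever. */
    wait_for_child(&cs, 10);
    (void)notify_child_up;
    return 0;
}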

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Mount hangs because of connection delays

2015-07-02 Thread Ravishankar N



On 07/02/2015 07:04 PM, Pranith Kumar Karampuri wrote:

hi,
When the glusterfs mount process is coming up, all cluster xlators wait 
for at least one event from each of their children before propagating the 
status upwards. Sometimes the client xlator takes up to 2 minutes to 
propagate this event 
(https://bugzilla.redhat.com/show_bug.cgi?id=1054694#c0). Because of 
this, Xavi implemented a timer in ec's notify where we treat a child as 
down if it doesn't come up within 10 seconds. A similar patch went up for 
review at http://review.gluster.org/#/c/3 for afr. Kritika raised an 
interesting point in the review: all cluster xlators would need to have 
this logic for the mount not to hang, so the correct place to fix it 
would be the client xlator itself, i.e. add the timer logic in the client 
xlator, which seems like a better approach.


I think it makes sense to handle the change only in relevant cluster 
xlators like AFR/EC because of the notion of high availability 
associated with them. In my limited understanding, protocol-client is 
the originator (?) of the child up/down events. While it looks okay to 
allow cluster xlators to take certain decisions because the 'originator' 
did not respond within a specific time, altering the originator itself 
without giving a chance to the upper xlators to make choices seems 
incorrect to me.  Perhaps I'm wrong, but setting an unconditional 10 
second timer on protocol/client seems to defeat the purpose of having a 
configurable `network.ping-timeout` volume set option.


Just my two cents. :)


I just want to take inputs from everyone before we go ahead in that 
direction:
on PARENT_UP the client xlator would start a timer, and if no rpc 
notification is received within that timeout it would treat itself (the 
client xlator) as down.


Pranith


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t

2015-07-02 Thread Raghavendra Gowdappa
I've reverted [1], which made allow-insecure on by default. The patch seems to 
have issues, which will be addressed, and it will be merged again later. The 
revert can be found at [2].

[1] http://review.gluster.org/11274
[2] http://review.gluster.org/11507

Please let me know if the regressions are still failing.

regards,
Raghavendra.

- Original Message -
> From: "Joseph Fernandes" 
> To: "Atin Mukherjee" 
> Cc: "Gluster Devel" 
> Sent: Thursday, July 2, 2015 9:49:16 PM
> Subject: Re: [Gluster-devel] Failure in 
> tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t
> 
> Yep.. Thanks Guys.
> 
> - Original Message -
> From: "Atin Mukherjee" 
> To: "Joseph Fernandes" 
> Cc: kpart...@redhat.com, "Atin Mukherjee" , "Gluster
> Devel" 
> Sent: Thursday, July 2, 2015 9:45:01 PM
> Subject: Re: [Gluster-devel] Failure in
> tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t
> 
> Joe,
> 
> Please refer to Prasanna's mail. He has uploaded a patch to solve it.
> 
> -Atin
> Sent from one plus one
> On Jul 2, 2015 9:42 PM, "Joseph Fernandes"  wrote:
> 
> > Hi All,
> >
> > This is the same issue as the previous tiering regression failure.
> >
> > Volume brick is not able to start because the port is busy
> >
> > [2015-07-02 10:20:20.601372]  [run.c:190:runner_log] (-->
> > /build/install/lib/libglusterfs.so.0(_gf_log_callingfn+0x240)[0x7f05e080bc32]
> > (--> /build/install/lib/libglusterfs.so.0(runner_log+0x192)[0x7f05e08754ce]
> > (-->
> > /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_volume_start_glusterfs+0xae7)[0x7f05d5c935d7]
> > (-->
> > /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_brick_start+0x151)[0x7f05d5c9d4e3]
> > (-->
> > /build/install/lib/glusterfs/3.8dev/xlator/mgmt/glusterd.so(glusterd_op_perform_add_bricks+0x8fe)[0x7f05d5d10661]
> > ) 0-: Starting GlusterFS: /build/install/sbin/glusterfsd -s
> > slave33.cloud.gluster.org --volfile-id
> > patchy.slave33.cloud.gluster.org.d-backends-patchy5 -p
> > /var/lib/glusterd/vols/patchy/run/slave33.cloud.gluster.org-d-backends-patchy5.pid
> > -S /var/run/gluster/ca5f5a89aa3a24f0a54852590ab82ad5.socket --brick-name
> > /d/backends/patchy5 -l /var/log/glusterfs/bricks/d-backends-patchy5.log
> > --xlator-option *-posix.glusterd-uuid=da011de8-9103-4cf2-9f4b-03707d0019d0
> > --brick-port 49167 --xlator-option patchy-server.listen-port=49167
> > [2015-07-02 10:20:20.624297] I [MSGID: 106144]
> > [glusterd-pmap.c:269:pmap_registry_remove] 0-pmap: removing brick (null) on
> > port 49167
> > [2015-07-02 10:20:20.625315] E [MSGID: 106005]
> > [glusterd-utils.c:4448:glusterd_brick_start] 0-management: Unable to start
> > brick slave33.cloud.gluster.org:/d/backends/patchy5
> > [2015-07-02 10:20:20.625354] E [MSGID: 106074]
> > [glusterd-brick-ops.c:2096:glusterd_op_add_brick] 0-glusterd: Unable to add
> > bricks
> > [2015-07-02 10:20:20.625368] E [MSGID: 106123]
> > [glusterd-syncop.c:1416:gd_commit_op_phase] 0-management: Commit of
> > operation 'Volume Add brick' failed on localhost
> >
> >
> > Brick Log:
> >
> > [2015-07-02 10:20:20.608547] I [MSGID: 100030] [glusterfsd.c:2296:main]
> > 0-/build/install/sbin/glusterfsd: Started running
> > /build/install/sbin/glusterfsd version 3.8dev (args:
> > /build/install/sbin/glusterfsd -s slave33.cloud.gluster.org --volfile-id
> > patchy.slave33.cloud.gluster.org.d-backends-patchy5 -p
> > /var/lib/glusterd/vols/patchy/run/slave33.cloud.gluster.org-d-backends-patchy5.pid
> > -S /var/run/gluster/ca5f5a89aa3a24f0a54852590ab82ad5.socket --brick-name
> > /d/backends/patchy5 -l /var/log/glusterfs/bricks/d-backends-patchy5.log
> > --xlator-option *-posix.glusterd-uuid=da011de8-9103-4cf2-9f4b-03707d0019d0
> > --brick-port 49167 --xlator-option patchy-server.listen-port=49167)
> > [2015-07-02 10:20:20.617113] I [MSGID: 101190]
> > [event-epoll.c:627:event_dispatch_epoll_worker] 0-epoll: Started thread
> > with index 1
> > [2015-07-02 10:20:20.623097] I [MSGID: 101173]
> > [graph.c:268:gf_add_cmdline_options] 0-patchy-server: adding option
> > 'listen-port' for volume 'patchy-server' with value '49167'
> > [2015-07-02 10:20:20.623135] I [MSGID: 101173]
> > [graph.c:268:gf_add_cmdline_options] 0-patchy-posix: adding option
> > 'glusterd-uuid' for volume 'patchy-posix' with value
> > 'da011de8-9103-4cf2-9f4b-03707d0019d0'
> > [2015-07-02 10:20:20.623358] I [MSGID: 115034]
> > [server.c:392:_check_for_auth_option] 0-/d/backends/patchy5: skip format
> > check for non-addr auth option auth.login./d/backends/patchy5.allow
> > [2015-07-02 10:20:20.623374] I [MSGID: 115034]
> > [server.c:392:_check_for_auth_option] 0-/d/backends/patchy5: skip format
> > check for non-addr auth option
> > auth.login.96bcb872-559b-4f19-84ad-a735dc6068f6.password
> > [2015-07-02 10:20:20.623568] I
> > [rpcsvc.c:2210:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured
> > rpc.outstanding-rpc-limit with value 64
> > [2015-07-02 10:20:20.623633] W [MSGID: 101002]
>

Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-02 Thread Raghavendra Gowdappa
I've reverted [1], which made allow-insecure on by default. The patch seems to 
have issues, which will be addressed, and it will be merged again later. The 
revert can be found at [2].

[1] http://review.gluster.org/11274
[2] http://review.gluster.org/11507

Please let me know if the regressions are still failing.

regards,
Raghavendra.


- Original Message -
> From: "Atin Mukherjee" 
> To: "Prasanna Kalever" 
> Cc: "Gluster Devel" 
> Sent: Thursday, July 2, 2015 9:41:33 PM
> Subject: Re: [Gluster-devel] spurious failures
> tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
> 
> 
> 
> Thanks Prasanna for the patches :)
> 
> -Atin
> Sent from one plus one
> On Jul 2, 2015 9:19 PM, "Prasanna Kalever" < pkale...@redhat.com > wrote:
> 
> 
> 
This is caused because bind-insecure is turned on (which is the default now):
it may happen that a brick is not able to bind to the port assigned to it by
Glusterd, for example 49192-49195...
It seems to occur because the rpc_clnt connections are binding to ports in
the same range, so the brick fails to bind to a port which is already in use
by someone else.

This bug already existed before http://review.gluster.org/#/c/11039/ when rdma
is used, i.e. even previously rdma would bind to a port >= 1024 if it could not
find a free port < 1024, even when bind-insecure was turned off (ref to commit
'0e3fd04e'). Since we don't have tests related to rdma, we did not discover
this issue earlier.

http://review.gluster.org/#/c/11039/ exposes the bug we encountered; it can now
be fixed by http://review.gluster.org/#/c/11512/, which makes rpc_clnt pick
port numbers from 65535 in descending order. As a result port clashes are
minimized, and it fixes the rdma issue too.
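
A minimal sketch of the descending port selection described above, assuming
nothing about the actual rpc_clnt code (function names are illustrative only):

#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to bind sock to the highest free port, counting down from 65535, so
 * that client-side connections stay away from the low end of the range where
 * brick listeners are placed. Returns the bound port, or -1 on failure. */
static int
bind_port_descending(int sock, in_addr_t addr)
{
    struct sockaddr_in sin;
    int port;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = addr;

    for (port = 65535; port > 1024; port--) {
        sin.sin_port = htons(port);
        if (bind(sock, (struct sockaddr *)&sin, sizeof(sin)) == 0)
            return port;
        if (errno != EADDRINUSE && errno != EACCES)
            return -1;      /* unexpected error, give up */
    }
    return -1;
}

int
main(void)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    int port = bind_port_descending(sock, htonl(INADDR_LOOPBACK));

    printf("bound to port %d\n", port);
    close(sock);
    return 0;
}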
> 
> Thanks to Raghavendra Talur for help in discovering the real cause
> 
> 
> Regards,
> Prasanna Kalever
> 
> 
> 
> - Original Message -
> From: "Raghavendra Talur" < raghavendra.ta...@gmail.com >
> To: "Krishnan Parthasarathi" < kpart...@redhat.com >
> Cc: "Gluster Devel" < gluster-devel@gluster.org >
> Sent: Thursday, July 2, 2015 6:45:17 PM
> Subject: Re: [Gluster-devel] spurious failures
> tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
> 
> 
> 
> On Thu, Jul 2, 2015 at 4:40 PM, Raghavendra Talur <
> raghavendra.ta...@gmail.com > wrote:
> 
> 
> 
> 
> 
> On Thu, Jul 2, 2015 at 10:52 AM, Krishnan Parthasarathi < kpart...@redhat.com
> > wrote:
> 
> 
> 
> > > 
> > > A port assigned by Glusterd to a brick is found to be already in use when
> > > the brick tries to bind to it. Any recent changes in Glusterd which could cause this?
> > > 
> > > Or is it a test infra problem?
> 
> This issue is likely to be caused by http://review.gluster.org/11039
> This patch changes the port allocation that happens for rpc_clnt based
> connections. Previously, ports allocated were < 1024. With this change,
> these connections, typically mount process, gluster-nfs server processes
> etc could end up using ports that bricks are being assigned to.
> 
> IIUC, the intention of the patch was to make server processes lenient to
> inbound messages from ports > 1024. If we don't need to use ports > 1024
> we could leave the port allocation for rpc_clnt connections as before.
> Alternately, we could reserve the range of ports starting from 49152 for
> bricks
> by setting net.ipv4.ip_local_reserved_ports using sysctl(8). This is specific
> to Linux.
> I'm not aware of how this could be done in NetBSD for instance though.
> 
> 
> It seems this is exactly what's happening.
> 
> I have a question, I get the following data from netstat and grep
> 
> tcp 0 0 f6be17c0fbf5:1023 f6be17c0fbf5:24007 ESTABLISHED 31516/glusterfsd
> tcp 0 0 f6be17c0fbf5:49152 f6be17c0fbf5:490 ESTABLISHED 31516/glusterfsd
> unix 3 [ ] STREAM CONNECTED 988353 31516/glusterfsd
> /var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket
> 
> Here 31516 is the brick pid.
> 
> Looking at the data, line 2 is very clear: it shows the connection between the
> brick and a glusterfs client.
> The unix socket on line 3 is also clear: it is the unix socket connection that
> glusterd and the brick process use for communication.
> 
> I am not able to understand line 1; which part of the brick process established a
> TCP connection with glusterd using port 1023?
> Note: this data is from a build which does not have the above mentioned
> patch.
> 
> 
> The patch which exposed this bug is being reverted till the underlying bug is
> also fixed.
> You can monitor revert patches here
> master: http://review.gluster.org/11507
> 3.7 branch: http://review.gluster.org/11508
> 
> Please rebase your patches after the above patches are merged to ensure that
> your patches pass regression.
> 
> 
> 
> 
> 
> --
> Raghavendra Talur
> 
> 
> 
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> ___
> Gluster-devel mailing list
> Glu

Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t

2015-07-02 Thread Raghavendra Gowdappa
Thanks to Raghavendra Talur for root-causing the issue. I've reverted the patch 
you pointed out.

- Original Message -
> From: "Raghavendra Talur" 
> To: "Nithya Balachandran" 
> Cc: "Gluster Devel" 
> Sent: Thursday, July 2, 2015 6:44:22 PM
> Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t
> 
> 
> 
> On Thu, Jul 2, 2015 at 1:26 PM, Nithya Balachandran < nbala...@redhat.com >
> wrote:
> 
> 
> 
> 
> - Original Message -
> > From: "Kotresh Hiremath Ravishankar" < khire...@redhat.com >
> > To: "Susant Palai" < spa...@redhat.com >
> > Cc: "Gluster Devel" < gluster-devel@gluster.org >
> > Sent: Thursday, July 2, 2015 1:03:18 PM
> > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t
> > 
> > Comments inline.
> > 
> > Thanks and Regards,
> > Kotresh H R
> > 
> > - Original Message -
> > > From: "Susant Palai" < spa...@redhat.com >
> > > To: "Sachin Pandit" < span...@redhat.com >
> > > Cc: "Kotresh Hiremath Ravishankar" < khire...@redhat.com >, "Gluster
> > > Devel"
> > > < gluster-devel@gluster.org >
> > > Sent: Thursday, July 2, 2015 12:35:08 PM
> > > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t
> > > 
> > > Comments inline.
> > > 
> > > - Original Message -
> > > > From: "Sachin Pandit" < span...@redhat.com >
> > > > To: "Kotresh Hiremath Ravishankar" < khire...@redhat.com >
> > > > Cc: "Gluster Devel" < gluster-devel@gluster.org >
> > > > Sent: Thursday, July 2, 2015 12:21:44 PM
> > > > Subject: Re: [Gluster-devel] Regression Failure: ./tests/basic/quota.t
> > > > 
> > > > - Original Message -
> > > > > From: "Vijaikumar M" < vmall...@redhat.com >
> > > > > To: "Kotresh Hiremath Ravishankar" < khire...@redhat.com >, "Gluster
> > > > > Devel"
> > > > > < gluster-devel@gluster.org >
> > > > > Cc: "Sachin Pandit" < span...@redhat.com >
> > > > > Sent: Thursday, July 2, 2015 12:01:03 PM
> > > > > Subject: Re: Regression Failure: ./tests/basic/quota.t
> > > > > 
> > > > > We look into this issue
> > > > > 
> > > > > Thanks,
> > > > > Vijay
> > > > > 
> > > > > On Thursday 02 July 2015 11:46 AM, Kotresh Hiremath Ravishankar
> > > > > wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > I see quota.t regression failure for the following. The changes are
> > > > > > related
> > > > > > to
> > > > > > example programs in libgfchangelog.
> > > > > > 
> > > > > > http://build.gluster.org/job/rackspace-regression-2GB-triggered/11785/consoleFull
> > > > > > 
> > > > > > Could someone from quota team, take a look at it.
> > > > 
> > > > Hi,
> > > > 
> > > > I had a quick look at this. It looks like the following test case
> > > > failed
> > > > 
> > > > TEST $CLI volume add-brick $V0 $H0:$B0/brick{3,4}
> > > > EXPECT_WITHIN $REBALANCE_TIMEOUT "0" rebalance_completed
> > > > 
> 
> 
> 
> Looks like the same "port in use" issue. From the d-backends-brick3.log:
> 
> 
> [2015-07-01 09:27:17.821430] E [socket.c:818:__socket_server_bind]
> 0-tcp.patchy-server: binding to failed: Address already in use
> [2015-07-01 09:27:17.821441] E [socket.c:821:__socket_server_bind]
> 0-tcp.patchy-server: Port is already in use
> [2015-07-01 09:27:17.821452] W [rpcsvc.c:1599:rpcsvc_transport_create]
> 0-rpc-service: listening on transport failed
> [2015-07-01 09:27:17.821462] W [MSGID: 115045] [server.c:996:init]
> 0-patchy-server: creation of listener failed
> [2015-07-01 09:27:17.821475] E [MSGID: 101019] [xlator.c:423:xlator_init]
> 0-patchy-server: Initialization of volume 'patchy-server' failed, review
> your volfile again
> [2015-07-01 09:27:17.821485] E [MSGID: 101066]
> [graph.c:323:glusterfs_graph_init] 0-patchy-server: initializing translator
> failed
> [2015-07-01 09:27:17.821495] E [MSGID: 101176]
> [graph.c:669:glusterfs_graph_activate] 0-graph: init failed
> [2015-07-01 09:27:17.821891] W [glusterfsd.c:1214:cleanup_and_exit] (--> 0-:
> received signum (0), shutting down
> 
> 
> The patch which exposed this bug is being reverted till the underlying bug is
> also fixed.
> You can monitor revert patches here
> master: http://review.gluster.org/11507
> 3.7 branch: http://review.gluster.org/11508
> 
> Please rebase your patches after the above patches are merged to ensure that
> your patches pass regression.
> 
> 
> 
> 
> 
> > > > 
> > > > I looked at the logs too, and found out the following errors
> > > > 
> > > > patchy-rebalance.log:[2015-07-01 09:27:23.040756] E [MSGID: 109026]
> > > > [dht-rebalance.c:2689:gf_defrag_start_crawl] 0-patchy-dht: fix layout
> > > > on
> > > > /
> > > > failed
> > > > build-install-etc-glusterfs-glusterd.vol.log:[2015-07-01
> > > > 09:27:23.040998]
> > > > E
> > > > [MSGID: 106224]
> > > > [glusterd-rebalance.c:960:glusterd_defrag_event_notify_handle]
> > > > 0-management:
> > > > Failed to update status
> > > > StartMigrationDuringRebalanceT

Re: [Gluster-devel] Gluster and GCC 5.1

2015-07-02 Thread Jeff Darcy
> Or perhaps we could just get everyone to stop using 'inline'

I agree that it would be a good thing to reduce/modify our use of
'inline' significantly.  Any advantage gained from avoiding normal
function-call entry/exit has to be weighed against cache pollution from
having the same code repeated over and over each place the function is
invoked.  Careful use of 'extern inline' can be good for performance
and/or readability once in a great while, but IMO we should avoid
'inline' except in cases where the benefits are *proven*.

On a similar note, can we please please please get people to stop
abusing macros so much?  I get that they're sometimes useful to get
around C's lack of generics or other features, but many of our long
complicated macros have no such justification.  These pollute the cache
just like 'inline' does, plus they make code harder to debug or edit.
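
As a hypothetical illustration of the macro point (not code from the gluster
tree): the multi-statement macro below is expanded, and its code duplicated,
at every call site and cannot be stepped into in a debugger, while the
equivalent function exists once and can be:

#include <stdio.h>

/* Macro version: duplicated at every call site, invisible to a debugger,
 * and every argument must be parenthesised defensively. */
#define LOG_AND_COUNT_BAD(counter, msg)                         \
    do {                                                        \
        (counter)++;                                            \
        fprintf(stderr, "error #%d: %s\n", (counter), (msg));   \
    } while (0)

/* Function version: one copy of the code, a real symbol, easy to step into. */
static void
log_and_count(int *counter, const char *msg)
{
    (*counter)++;
    fprintf(stderr, "error #%d: %s\n", *counter, msg);
}

int
main(void)
{
    int errors = 0;

    LOG_AND_COUNT_BAD(errors, "macro version");
    log_and_count(&errors, "function version");
    return 0;
}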
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster and GCC 5.1

2015-07-02 Thread Joseph Fernandes
Agree with Jeff.
But now the question is: should we let this patch go through, or leave it to 
float in time and space? :)
http://review.gluster.org/#/c/11214/

The patch just makes the existing inline functions static or extern as 
appropriate, without causing any harm to the existing code, and it removes the 
risk of undefined symbols for plain inline functions in gcc 5 and above!
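
To illustrate the symbol issue (a hypothetical example, not code from the
patch): under the C99/gnu11 inline semantics that gcc 5 defaults to, a plain
'inline' definition emits no out-of-line copy, so a call the compiler chooses
not to inline (e.g. at -O0) becomes an undefined reference at link time.
'static inline' sidesteps this by giving each translation unit its own copy;
the alternative is to force one out-of-line definition in a single .c file.

#include <stdio.h>

/* Risky with gcc >= 5: a plain 'inline' definition emits no out-of-line
 * copy, so a call that is not actually inlined (for example at -O0) becomes
 * an undefined reference at link time. */
inline int
clamp_plain(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Safe: 'static inline' gives this translation unit its own copy whenever
 * the compiler decides not to inline the call. */
static inline int
clamp_static(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

int
main(void)
{
    /* Calling clamp_plain() here and building at -O0 with gcc 5 would fail
     * to link ("undefined reference to clamp_plain"); clamp_static() links
     * fine either way. */
    printf("%d\n", clamp_static(42, 0, 10));
    return 0;
}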

- Original Message -
From: "Jeff Darcy" 
To: "Kaleb S. KEITHLEY" 
Cc: "gluster-infra" , "Gluster Devel" 

Sent: Friday, July 3, 2015 6:05:14 AM
Subject: Re: [Gluster-devel] Gluster and GCC 5.1

> Or perhaps we could just get everyone to stop using 'inline'

I agree that it would be a good thing to reduce/modify our use of
'inline' significantly.  Any advantage gained from avoiding normal
function-call entry/exit has to be weighed against cache pollution from
having the same code repeated over and over each place the function is
invoked.  Careful use of 'extern inline' can be good for performance
and/or readability once in a great while, but IMO we should avoid
'inline' except in cases where the benefits are *proven*.

On a similar note, can we please please please get people to stop
abusing macros so much?  I get that they're sometimes useful to get
around C's lack of generics or other features, but many of our long
complicated macros have no such justification.  These pollute the cache
just like 'inline' does, plus they make code harder to debug or edit.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Gluster and GCC 5.1

2015-07-02 Thread Peter Portante
On Jul 2, 2015, at 8:35 PM, Jeff Darcy  wrote:

>> Or perhaps we could just get everyone to stop using 'inline'
>
> I agree that it would be a good thing to reduce/modify our use of
> 'inline' significantly.  Any advantage gained from avoiding normal
> function-call entry/exit has to be weighed against cache pollution from
> having the same code repeated over and over each place the function is
> invoked.  Careful use of 'extern inline' can be good for performance
> and/or readability once in a great while, but IMO we should avoid
> 'inline' except in cases where the benefits are *proven*.
>
> On a similar note, can we please please please get people to stop
> abusing macros so much?  I get that they're sometimes useful to get
> around C's lack of generics or other features, but many of our long
> complicated macros have no such justification.  These pollute the cache
> just like 'inline' does, plus they make code harder to debug or edit.

And in the past, if not now, they have been contributing factors to small-file
performance issues.

-peter

> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-02 Thread Krishnan Parthasarathi
 
> It seems this is exactly what's happening.
> 
> I have a question, I get the following data from netstat and grep
> 
> tcp0  0 f6be17c0fbf5:1023   f6be17c0fbf5:24007
>  ESTABLISHED 31516/glusterfsd
> tcp0  0 f6be17c0fbf5:49152  f6be17c0fbf5:490
>  ESTABLISHED 31516/glusterfsd
> unix  3  [ ] STREAM CONNECTED 988353   31516/glusterfsd
> /var/run/gluster/4878d6e905c5f6032140a00cc584df8a.socket
> 
> Here 31516 is the brick pid.
> 
> Looking at the data, line 2 is very clear: it shows the connection between
> the brick and a glusterfs client.
> The unix socket on line 3 is also clear: it is the unix socket connection that
> glusterd and the brick process use for communication.
> 
> I am not able to understand line 1; which part of the brick process established
> a TCP connection with glusterd using port 1023?

This is the rpc connection that any glusterfs(d) process makes to glusterd to
fetch the volfile on receiving a notification from glusterd.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-02 Thread Krishnan Parthasarathi


- Original Message -
> 
> This is caused because bind-insecure is turned on (which is the default now):
> it may happen that a brick is not able to bind to the port assigned to it by
> Glusterd, for example 49192-49195...
> It seems to occur because the rpc_clnt connections are binding to ports in
> the same range, so the brick fails to bind to a port which is already in use
> by someone else.
> 
> This bug already existed before http://review.gluster.org/#/c/11039/ when rdma
> is used, i.e. even previously rdma would bind to a port >= 1024 if it could not
> find a free port < 1024, even when bind-insecure was turned off (ref to commit
> '0e3fd04e'). Since we don't have tests related to rdma, we did not discover
> this issue earlier.
> 
> http://review.gluster.org/#/c/11039/ exposes the bug we encountered; it can now
> be fixed by http://review.gluster.org/#/c/11512/, which makes rpc_clnt pick
> port numbers from 65535 in descending order. As a result port clashes are
> minimized, and it fixes the rdma issue too.

This approach could still surprise the storage admin when glusterfs(d) 
processes bind to ports in the range where brick ports are being assigned. We 
should make this predictable by reserving the brick ports via 
net.ipv4.ip_local_reserved_ports: initially reserve 50 ports starting at 49152, 
and subsequently reserve more ports on demand, say 50 at a time, when we 
exhaust the previously reserved range. net.ipv4.ip_local_reserved_ports does 
not interfere with explicit port allocation, i.e. when the socket binds to a 
port other than zero. With this option we don't have to manage port assignment 
at a per-process level. Thoughts?
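
A small sketch of the reservation step, equivalent to
`sysctl -w net.ipv4.ip_local_reserved_ports=49152-49201` (Linux-specific,
needs root; the range is the 50 ports suggested above):

#include <stdio.h>

int
main(void)
{
    FILE *fp = fopen("/proc/sys/net/ipv4/ip_local_reserved_ports", "w");

    if (!fp) {
        perror("ip_local_reserved_ports");
        return 1;
    }
    /* The kernel skips reserved ports when choosing an ephemeral port, but
     * an explicit bind() to one of them still succeeds, so brick listeners
     * keep working while rpc_clnt-style connections stay out of the range. */
    fprintf(fp, "49152-49201\n");
    fclose(fp);
    return 0;
}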

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t

2015-07-02 Thread Atin Mukherjee


On 07/03/2015 11:58 AM, Krishnan Parthasarathi wrote:
> 
> 
> - Original Message -
>>
>> This is caused because bind-insecure is turned on (which is the default now):
>> it may happen that a brick is not able to bind to the port assigned to it by
>> Glusterd, for example 49192-49195...
>> It seems to occur because the rpc_clnt connections are binding to ports in
>> the same range, so the brick fails to bind to a port which is already in use
>> by someone else.
>>
>> This bug already existed before http://review.gluster.org/#/c/11039/ when rdma
>> is used, i.e. even previously rdma would bind to a port >= 1024 if it could not
>> find a free port < 1024, even when bind-insecure was turned off (ref to commit
>> '0e3fd04e'). Since we don't have tests related to rdma, we did not discover
>> this issue earlier.
>>
>> http://review.gluster.org/#/c/11039/ exposes the bug we encountered; it can now
>> be fixed by http://review.gluster.org/#/c/11512/, which makes rpc_clnt pick
>> port numbers from 65535 in descending order. As a result port clashes are
>> minimized, and it fixes the rdma issue too.
> 
> This approach could still surprise the storage admin when glusterfs(d) 
> processes bind to ports in the range where brick ports are being assigned. We 
> should make this predictable by reserving the brick ports via 
> net.ipv4.ip_local_reserved_ports: initially reserve 50 ports starting at 49152, 
> and subsequently reserve more ports on demand, say 50 at a time, when we 
> exhaust the previously reserved range. net.ipv4.ip_local_reserved_ports does 
> not interfere with explicit port allocation, i.e. when the socket binds to a 
> port other than zero. With this option we don't have to manage port assignment 
> at a per-process level. Thoughts?
If the reservation can be done on demand, I do think this is a better
approach to tackle this problem.
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
> 

-- 
~Atin
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel