Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server #Personal#

Pranith Kumar Karampuri Fri, 23 Jan 2015 00:28:38 -0800


On 01/23/2015 01:54 PM, A Ghoshal wrote:

Thanks a lot, Pranith.
We'll set this option on our test servers and keep the setup under observation.

How did you get the bind-insecure option working?
I guess I will send a patch to make it a 'volume set' option.

Pranith

Thanks,
Anirban



From: Pranith Kumar Karampuri <pkara...@redhat.com>
To: A Ghoshal <a.ghos...@tcs.com>
Cc: gluster-users@gluster.org, Niels de Vos <nde...@redhat.com>
Date: 01/23/2015 01:28 PM
Subject: Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server #Personal#
------------------------------------------------------------------------




On 01/22/2015 02:07 PM, A Ghoshal wrote:
Hi Pranith,
Yes, the very same (chalcogen_eg_oxygen@yahoo.com). Justin Clift sent me a mail a while back telling me that it is better if we all use our business email addresses, so I made myself a new profile.

Glusterfs complains about /proc/sys/net/ipv4/ip_local_reserved_ports because we use a really old Linux kernel (2.6.34) wherein this feature is not present. We plan to upgrade our Linux every so often, but each time we are dissuaded from it by some compatibility issue or the other. So, we get this log every time - on both good volumes and bad ones. What bothered me was this (on serv1):

Basically, to make the connections to the servers (i.e. the bricks), clients need to choose secure ports, i.e. ports less than 1024. Since this file is not present, it is not binding to any port, as per the code I just checked. There is an option called client-bind-insecure which bypasses this check. I feel that is one way (probably the only way) to get around this.
You have to set both the "volume set server.allow-insecure on" option and the bind-insecure option.
CC ndevos, who seems to have helped someone set the bind-insecure option correctly here (http://irclog.perlgeek.de/gluster/2014-04-09/text)
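
As a rough illustration of the secure-port selection described above, here is a minimal Python sketch. It is not the GlusterFS code: the function names are hypothetical, the downward walk from port 1023 is an assumption of this sketch, and the only thing taken from the kernel is the file's format (a comma-separated list of ports and ranges). It shows why a missing ip_local_reserved_ports file has to be handled as a separate case from an empty one.

```python
from typing import Iterator, Optional, Set

RESERVED_PORTS_FILE = "/proc/sys/net/ipv4/ip_local_reserved_ports"


def parse_reserved_ports(text: str) -> Set[int]:
    """Parse a comma-separated list of ports and ranges, e.g. '1,2-4,10-10'."""
    reserved: Set[int] = set()
    for chunk in text.strip().split(","):
        chunk = chunk.strip()
        if not chunk:
            continue  # an empty file means nothing is reserved
        low, _, high = chunk.partition("-")
        reserved.update(range(int(low), int(high or low) + 1))
    return reserved


def read_reserved_ports(path: str = RESERVED_PORTS_FILE) -> Optional[Set[int]]:
    """Return the reserved ports, or None if the kernel does not expose the file."""
    try:
        with open(path) as handle:
            return parse_reserved_ports(handle.read())
    except FileNotFoundError:
        return None  # e.g. an old kernel without this sysctl


def candidate_ports(reserved: Set[int], ceiling: int = 1023, floor: int = 1) -> Iterator[int]:
    """Yield privileged ports from ceiling down to floor, skipping reserved ones."""
    for port in range(ceiling, floor - 1, -1):
        if port not in reserved:
            yield port
```

A caller would treat a None result from read_reserved_ports() as "nothing known to be reserved" (an empty set) rather than giving up on binding altogether; that is a design choice of this sketch, not a description of what glusterfs 3.4.2 does.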
Pranith
[2015-01-20 09:37:49.151744] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96
[2015-01-20 09:37:49.151780] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x39620x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (replicated_vol-client-0)
[2015-01-20 09:37:49.151810] T [rpc-clnt.c:1302:rpc_clnt_record] 0-replicated_vol-client-1: Auth Info: pid: 7599, uid: 0, gid: 0, owner: 0000000000000000
[2015-01-20 09:37:49.151824] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96
[2015-01-20 09:37:49.151889] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x39563x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (replicated_vol-client-1)
[2015-01-20 09:37:49.152239] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-replicated_vol-client-1: received rpc message (RPC XID: 0x39563x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (replicated_vol-client-1)
[2015-01-20 09:37:49.152484] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-replicated_vol-client-0: received rpc message (RPC XID: 0x39620x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (replicated_vol-client-0)
When I write on the good server (serv1), we see that an RPC request is sent to both client-0 and client-1, while when I write on the bad server (serv0), the RPC request is sent only to client-0, so it is no wonder that the writes are not synced over to serv1. Somehow I could not make the daemon on serv0 understand that there are two up-children and not just one.
One additional detail - since we are using a kernel that's too old, we do not have the (Anand Avati's?) FUSE readdirplus patches, either. I've noticed that the fixes in the readdirplus version of glusterfs aren't always guaranteed to be present on the non-readdirplus version of the patches. I'd filed a bug around one such anomaly a while back, but never got around to writing a patch for it (sorry!). Here it is: https://bugzilla.redhat.com/show_bug.cgi?id=1062287
I don't think this has anything to do with readdirplus.

Maybe something on similar lines here?

Thanks,
Anirban
P.s. Please ignore the #Personal# in the subject line - we need to do that to push mails to the public domain past the email filter safely.
From: Pranith Kumar Karampuri <pkara...@redhat.com>
To: A Ghoshal <a.ghos...@tcs.com>, gluster-users@gluster.org
Date: 01/22/2015 12:09 AM
Subject: Re: [Gluster-users] In a replica 2 server, file-updates on one server missing on the other server
------------------------------------------------------------------------



hi,
  Responses inline.

PS: You are chalkogen_oxygen?

Pranith
On 01/20/2015 05:34 PM, A Ghoshal wrote:
Hello,

I am using the following replicated volume:

root@serv0:~> gluster v info replicated_vol

Volume Name: replicated_vol
Type: Replicate
Volume ID: 26d111e3-7e4c-479e-9355-91635ab7f1c2
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: serv0:/mnt/bricks/replicated_vol/brick
Brick2: serv1:/mnt/bricks/replicated_vol/brick
Options Reconfigured:
diagnostics.client-log-level: INFO
network.ping-timeout: 10
nfs.enable-ino32: on
cluster.self-heal-daemon: on
nfs.disable: off
replicated_vol is mounted at /mnt/replicated_vol on both serv0 and serv1. If I do the following on serv0:
root@serv0:~> echo "cranberries" > /mnt/replicated_vol/testfile
root@serv0:~> echo "tangerines" >> /mnt/replicated_vol/testfile
And then I check the state of the replicas in the bricks, I find that
root@serv0:~> cat /mnt/bricks/replicated_vol/brick/testfile
cranberries
tangerines
root@serv0:~>

root@serv1:~> cat /mnt/bricks/replicated_vol/brick/testfile
root@serv1:~>
As may be seen, the replica on serv1 is blank when I write into testfile from serv0 (even though the file is created on both bricks). Interestingly, if I write something to the file at serv1, then the two replicas become identical.
root@serv1:~> echo "artichokes" >> /mnt/replicated_vol/testfile

root@serv1:~> cat /mnt/bricks/replicated_vol/brick/testfile
cranberries
tangerines
artichokes
root@serv1:~>

root@serv0:~> cat /mnt/bricks/replicated_vol/brick/testfile
cranberries
tangerines
artichokes
root@serv0:~>
So, I dabbled in the logs a little bit, after upping the diagnostic level, and this is what I saw:

When I write on serv0 (bad case):
[2015-01-20 09:21:52.197704] T [fuse-bridge.c:546:fuse_lookup_resume] 0-glusterfs-fuse: 53027: LOOKUP /testfl(f0a76987-8a42-47a2-b027-a823254b736b)
[2015-01-20 09:21:52.197959] D [afr-common.c:131:afr_lookup_xattr_req_prepare] 0-replicated_vol-replicate-0: /testfl: failed to get the gfid from dict
[2015-01-20 09:21:52.198006] T [rpc-clnt.c:1302:rpc_clnt_record] 0-replicated_vol-client-0: Auth Info: pid: 28151, uid: 0, gid: 0, owner: 0000000000000000
[2015-01-20 09:21:52.198024] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96
[2015-01-20 09:21:52.198108] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x78163x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (replicated_vol-client-0)
[2015-01-20 09:21:52.198565] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-replicated_vol-client-0: received rpc message (RPC XID: 0x78163x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (replicated_vol-client-0)
[2015-01-20 09:21:52.198640] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-replicated_vol-replicate-0: pending_matrix: [ 0 3 ]
[2015-01-20 09:21:52.198669] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-replicated_vol-replicate-0: pending_matrix: [ 0 0 ]
[2015-01-20 09:21:52.198681] D [afr-self-heal-common.c:887:afr_mark_sources] 0-replicated_vol-replicate-0: Number of sources: 1
[2015-01-20 09:21:52.198694] D [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type] 0-replicated_vol-replicate-0: returning read_child: 0
[2015-01-20 09:21:52.198705] D [afr-common.c:1380:afr_lookup_select_read_child] 0-replicated_vol-replicate-0: Source selected as 0 for /testfl
[2015-01-20 09:21:52.198720] D [afr-common.c:1117:afr_lookup_build_response_params] 0-replicated_vol-replicate-0: Building lookup response from 0
[2015-01-20 09:21:52.198732] D [afr-common.c:1732:afr_lookup_perform_self_heal] 0-replicated_vol-replicate-0: Only 1 child up - do not attempt to detect self heal

When I write on serv1 (good case):
[2015-01-20 09:37:49.151506] T [fuse-bridge.c:546:fuse_lookup_resume] 0-glusterfs-fuse: 31212: LOOKUP /testfl(f0a76987-8a42-47a2-b027-a823254b736b)
[2015-01-20 09:37:49.151683] D [afr-common.c:131:afr_lookup_xattr_req_prepare] 0-replicated_vol-replicate-0: /testfl: failed to get the gfid from dict
[2015-01-20 09:37:49.151726] T [rpc-clnt.c:1302:rpc_clnt_record] 0-replicated_vol-client-0: Auth Info: pid: 7599, uid: 0, gid: 0, owner: 0000000000000000
[2015-01-20 09:37:49.151744] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96
[2015-01-20 09:37:49.151780] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x39620x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (replicated_vol-client-0)
[2015-01-20 09:37:49.151810] T [rpc-clnt.c:1302:rpc_clnt_record] 0-replicated_vol-client-1: Auth Info: pid: 7599, uid: 0, gid: 0, owner: 0000000000000000
[2015-01-20 09:37:49.151824] T [rpc-clnt.c:1182:rpc_clnt_record_build_header] 0-rpc-clnt: Request fraglen 456, payload: 360, rpc hdr: 96
[2015-01-20 09:37:49.151889] T [rpc-clnt.c:1499:rpc_clnt_submit] 0-rpc-clnt: submitted request (XID: 0x39563x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (replicated_vol-client-1)
[2015-01-20 09:37:49.152239] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-replicated_vol-client-1: received rpc message (RPC XID: 0x39563x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (replicated_vol-client-1)
[2015-01-20 09:37:49.152484] T [rpc-clnt.c:669:rpc_clnt_reply_init] 0-replicated_vol-client-0: received rpc message (RPC XID: 0x39620x Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) from rpc-transport (replicated_vol-client-0)
[2015-01-20 09:37:49.152582] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-replicated_vol-replicate-0: pending_matrix: [ 0 3 ]
[2015-01-20 09:37:49.152596] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-replicated_vol-replicate-0: pending_matrix: [ 0 0 ]
[2015-01-20 09:37:49.152621] D [afr-self-heal-common.c:887:afr_mark_sources] 0-replicated_vol-replicate-0: Number of sources: 1
[2015-01-20 09:37:49.152633] D [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type] 0-replicated_vol-replicate-0: returning read_child: 0
[2015-01-20 09:37:49.152644] D [afr-common.c:1380:afr_lookup_select_read_child] 0-replicated_vol-replicate-0: Source selected as 0 for /testfl
[2015-01-20 09:37:49.152657] D [afr-common.c:1117:afr_lookup_build_response_params] 0-replicated_vol-replicate-0: Building lookup response from 0
We see that when I write on serv1, the RPC request is sent to both replicated_vol-client-0 and replicated_vol-client-1, while when I write on serv0, the request is sent only to replicated_vol-client-0, and the FUSE client is unaware of the presence of client-1 in the latter case.
I checked a bit more in the logs. When I turned on trace logging, I found many instances of these logs on serv0 but NOT on serv1:
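
The comparison above (which client translators actually receive a request) can be automated over a whole trace log. The following is a minimal Python sketch that assumes only the "submitted request (...) to rpc-transport (...)" message format seen in the excerpts above; the function name is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g. "submitted request (XID: ...) to rpc-transport (replicated_vol-client-0)"
SUBMIT_RE = re.compile(r"submitted request \(.*?\) to rpc-transport \(([^)]+)\)")


def count_submissions(lines: Iterable[str]) -> Counter:
    """Count how many RPC requests were submitted to each rpc-transport."""
    counts: Counter = Counter()
    for line in lines:
        # findall copes with several log entries fused onto one line
        for target in SUBMIT_RE.findall(line):
            counts[target] += 1
    return counts
```

Fed the client log from a healthy mount, both replicated_vol-client-0 and replicated_vol-client-1 should show up with similar counts; on a mount in the bad state described here, client-1 would be missing or far behind.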
[2015-01-20 09:21:15.520784] T [fuse-bridge.c:681:fuse_attr_cbk] 0-glusterfs-fuse: 53011: LOOKUP() / => 1
[2015-01-20 09:21:17.683088] T [rpc-clnt.c:422:rpc_clnt_reconnect] 0-replicated_vol-client-1: attempting reconnect
[2015-01-20 09:21:17.683159] D [name.c:155:client_fill_address_family] 0-replicated_vol-client-1: address-family not specified, guessing it to be inet from (remote-host: serv1)
[2015-01-20 09:21:17.683178] T [name.c:225:af_inet_client_get_remote_sockaddr] 0-replicated_vol-client-1: option remote-port missing in volume replicated_vol-client-1. Defaulting to 24007
[2015-01-20 09:21:17.683191] T [common-utils.c:188:gf_resolve_ip6] 0-resolver: flushing DNS cache
[2015-01-20 09:21:17.683202] T [common-utils.c:195:gf_resolve_ip6] 0-resolver: DNS cache not present, freshly probing hostname: serv1
[2015-01-20 09:21:17.683814] D [common-utils.c:237:gf_resolve_ip6] 0-resolver: returning ip-192.168.24.81 (port-24007) for hostname: serv1 and port: 24007
[2015-01-20 09:21:17.684139] D [common-utils.c:257:gf_resolve_ip6] 0-resolver: next DNS query will return: ip-192.168.24.81 port-24007
[2015-01-20 09:21:17.684164] T [socket.c:731:__socket_nodelay] 0-replicated_vol-client-1: NODELAY enabled for socket 10
[2015-01-20 09:21:17.684177] T [socket.c:790:__socket_keepalive] 0-replicated_vol-client-1: Keep-alive enabled for socket 10, interval 2, idle: 20
[2015-01-20 09:21:17.684236] W [common-utils.c:2247:gf_get_reserved_ports] 0-glusterfs: could not open the file /proc/sys/net/ipv4/ip_local_reserved_ports for getting reserved ports info (No such file or directory)
[2015-01-20 09:21:17.684253] W [common-utils.c:2280:gf_process_reserved_ports] 0-glusterfs: Not able to get reserved ports, hence there is a possibility that glusterfs may consume reserved port

Logs above suggest that the mount process couldn't assign a reserved port because it couldn't find the file /proc/sys/net/ipv4/ip_local_reserved_ports
I guess reboot of the machine fixed it. Wonder why it was not found in the first place.
Pranith.
[2015-01-20 09:21:17.684660] D [socket.c:605:__socket_shutdown] 0-replicated_vol-client-1: shutdown() returned -1. Transport endpoint is not connected
[2015-01-20 09:21:17.684699] T [rpc-clnt.c:519:rpc_clnt_connection_cleanup] 0-replicated_vol-client-1: cleaning up state in transport object 0x68a630
[2015-01-20 09:21:17.684731] D [socket.c:486:__socket_rwv] 0-replicated_vol-client-1: EOF on socket
[2015-01-20 09:21:17.684750] W [socket.c:514:__socket_rwv] 0-replicated_vol-client-1: readv failed (No data available)
[2015-01-20 09:21:17.684766] D [socket.c:1962:__socket_proto_state_machine] 0-replicated_vol-client-1: reading from socket failed. Error (No data available), peer (192.168.24.81:49198)
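
To check whether a client translator is stuck in the reconnect loop shown above, one could tally the relevant messages per client. This is a minimal Python sketch that assumes only the two message texts seen in these excerpts ("attempting reconnect" and "readv failed"); the function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

# Matches e.g. "0-replicated_vol-client-1: attempting reconnect"
EVENT_RE = re.compile(r"0-(\S+-client-\d+): (attempting reconnect|readv failed)")


def connection_events(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Tally reconnect attempts and failed socket reads per client translator."""
    events = defaultdict(lambda: {"attempting reconnect": 0, "readv failed": 0})
    for line in lines:
        for client, event in EVENT_RE.findall(line):
            events[client][event] += 1
    return dict(events)
```

A client that keeps accumulating reconnect attempts paired with failed reads, as replicated_vol-client-1 does on serv0 here, is one that never completes its connection to the brick.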
I could not find a 'remote-port' option in /var/lib/glusterd on either peer. Could somebody tell me where this configuration is looked up from? Also, sometime later, I rebooted serv0 and that seemed to solve the problem. However, stop+start of replicated_vol and restart of /etc/init.d/glusterd did NOT solve the problem.

Ignore that log. If no port is given in that volfile, it picks 24007 as the port, which is the default port where glusterd 'listens'.

Any help on this matter will be greatly appreciated as I need to provide robustness assurances for our setup.
Thanks a lot,
Anirban

P.s. Additional details:
glusterfs version: 3.4.2
Linux kernel version: 2.6.34




_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

