Re: [Gluster-users] [Gluster-devel] docs.gluster.org

2017-09-01 Thread Amye Scavarda
On Fri, Sep 1, 2017 at 9:42 AM, Michael Scherer wrote:
> On Friday, 1 September 2017 at 14:02 +0100, Michael Scherer wrote:
>> On Wednesday, 30 August 2017 at 12:11 +0530, Nigel Babu wrote:
>> > Hello,
>> >
>> > To reduce confusion, we've set up docs.gluster.org pointing to
>> > gluster.readthedocs.org. Both URLs will continue to work for the
>> > foreseeable
>> > future.
>> >
>> > Please update any references that you control to point to
>> > docs.gluster.org. At
>> > some point in the distant future, we will switch to hosting
>> > docs.gluster.org on
>> > our own servers.
>> >
>> > RTD will set up a canonical link to docs.gluster.org[1]. Over time,
>> > this will
>> > update the results on search engines to docs.gluster.org.
>> > This
>> > change
>> > will reduce confusion we've had with copies of our docs hosted on
>> > RTD.
>> >
>> > [1]: https://docs.readthedocs.io/en/latest/canonical.html
>>
>> So, it seems the TLS certificate is wrong; should we correct the link
>> to be http for now?
>
> So I opened a few PRs/reviews:
> https://github.com/gluster/glusterdocs/pull/259
>
> https://review.gluster.org/#/c/18182/
>
> https://github.com/gluster/glusterweb/pull/148
>
>
> --
> Michael Scherer
> Sysadmin, Community Infrastructure and Platform, OSAS
>
>
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel

Fair warning: glusterweb is now deprecated, as discussed in the
Community Meeting on 30 Aug. However, github.com/gluster/glusterweb
will continue to be used as a bug tracker.

The change requested here for glusterweb will instead be made on the
new main site that combines www and blog. I'll be putting out a call
for volunteers who would like to be involved with the website before
the next community meeting.



-- 
Amye Scavarda | a...@redhat.com | Gluster Community Lead
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] ganesha error ?

2017-09-01 Thread Renaud Fortier
Hi,
I have seen these errors 3 times since I started testing gluster with
nfs-ganesha. The clients are PHP apps, and when this happens the clients get
strange PHP session errors. Below, the first error only happened once, but the
other errors happen every time a client tries to create a new session file. To
make the PHP apps work again, I had to restart the client. Do you have an idea
of what's happening here?

31/08/2017 17:02:55 : epoch 59a450e6 : GLUSTER-NODE3 : 
ganesha.nfsd-2067[work-161] nfs4_State_Del :STATE :CRIT :hashtable get latch 
failed: 2

31/08/2017 17:04:00 : epoch 59a450e6 : GLUSTER-NODE3 : 
ganesha.nfsd-2067[work-31] create_nfs4_owner :NFS4 LOCK :CRIT :Related 
{STATE_OPEN_OWNER_NFSV4 0x7f0274475840: clientid={0x7f036836b0f0 
ClientID={Epoch=0x59a450e6 Counter=0x0008} CONFIRMED Client={0x7f0278001940 
name=(24:Linux NFSv4.2 devel-web3) refcount=4} t_delta=0 reservations=1 
refcount=466} owner=(24:0x6f70656e2069643a0027a95056901336) 
confirmed=0 seqid=0 refcount=2} doesn't match for {STATE_LOCK_OWNER_NFSV4 
0x7f0328479b10: clientid={0x7f036836b0f0 ClientID={Epoch=0x59a450e6 
Counter=0x0008} CONFIRMED Client={0x7f0278001940 name=(24:Linux NFSv4.2 
devel-web3) refcount=4} t_delta=0 reservations=1 refcount=466} 
owner=(20:0x6c6f636b2069643a0027) confirmed=0 seqid=0 
related_owner={STATE_OPEN_OWNER_NFSV4 0x7f03204cb2d0: clientid={0x7f036836b0f0 
ClientID={Epoch=0x59a450e6 Counter=0x0008} CONFIRMED Client={0x7f0278001940 
name=(24:Linux NFSv4.2 devel-web3) refcount=4} t_delta=0 reservations=1 
refcount=466} owner=(24:0x6f70656e2069643a0027a90f9b20feca) 
confirmed=0 seqid=0 refcount=3} refcount=5}

31/08/2017 17:04:00 : epoch 59a450e6 : GLUSTER-NODE3 : 
ganesha.nfsd-2067[work-31] nfs4_op_lock :NFS4 LOCK :EVENT :LOCK failed to 
create new lock owner Lock: obj=0x7f036c3a8268, fileid=11757051714723246668, 
type=READ , start=0x0, end=0x, owner={STATE_OPEN_OWNER_NFSV4 
0x7f0274475840: clientid={0x7f036836b0f0 ClientID={Epoch=0x59a450e6 
Counter=0x0008} CONFIRMED Client={0x7f0278001940 name=(24:Linux NFSv4.2 
devel-web3) refcount=4} t_delta=0 reservations=1 refcount=466} 
owner=(24:0x6f70656e2069643a0027a95056901336) confirmed=0 
seqid=0 refcount=2}

31/08/2017 17:04:00 : epoch 59a450e6 : GLUSTER-NODE3 : 
ganesha.nfsd-2067[work-71] create_nfs4_owner :NFS4 LOCK :CRIT :Related 
{STATE_OPEN_OWNER_NFSV4 0x7f0274475840: clientid={0x7f036836b0f0 
ClientID={Epoch=0x59a450e6 Counter=0x0008} CONFIRMED Client={0x7f0278001940 
name=(24:Linux NFSv4.2 devel-web3) refcount=4} t_delta=0 reservations=1 
refcount=466} owner=(24:0x6f70656e2069643a0027a95056901336) 
confirmed=0 seqid=0 refcount=2} doesn't match for {STATE_LOCK_OWNER_NFSV4 
0x7f0328479b10: clientid={0x7f036836b0f0 ClientID={Epoch=0x59a450e6 
Counter=0x0008} CONFIRMED Client={0x7f0278001940 name=(24:Linux NFSv4.2 
devel-web3) refcount=4} t_delta=0 reservations=1 refcount=466} 
owner=(20:0x6c6f636b2069643a0027) confirmed=0 seqid=0 
related_owner={STATE_OPEN_OWNER_NFSV4 0x7f03204cb2d0: clientid={0x7f036836b0f0 
ClientID={Epoch=0x59a450e6 Counter=0x0008} CONFIRMED Client={0x7f0278001940 
name=(24:Linux NFSv4.2 devel-web3) refcount=4} t_delta=0 reservations=1 
refcount=466} owner=(24:0x6f70656e2069643a0027a90f9b20feca) 
confirmed=0 seqid=0 refcount=3} refcount=5}

31/08/2017 17:04:00 : epoch 59a450e6 : GLUSTER-NODE3 : 
ganesha.nfsd-2067[work-71] nfs4_op_lock :NFS4 LOCK :EVENT :LOCK failed to 
create new lock owner Lock: obj=0x7f036c3a8268, fileid=11757051714723246668, 
type=WRITE, start=0x0, end=0x, owner={STATE_OPEN_OWNER_NFSV4 
0x7f0274475840: clientid={0x7f036836b0f0 ClientID={Epoch=0x59a450e6 
Counter=0x0008} CONFIRMED Client={0x7f0278001940 name=(24:Linux NFSv4.2 
devel-web3) refcount=4} t_delta=0 reservations=1 refcount=466} 
owner=(24:0x6f70656e2069643a0027a95056901336) confirmed=0 
seqid=0 refcount=2}
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] high number of pending calls

2017-09-01 Thread Ingard Mevåg
Hi

We're seeing a (high?) number of pending calls on two of our glusterfs 3.10
clusters.
We have not tried to tune anything except changing server.event-threads to 2.

gluster volume status callpool | grep Pending results in various numbers,
but more often than not a fair few of the bricks have 200-400 pending calls.

Is there a way I can debug this further?

The servers are 8x dual 8-core with 64G memory, running a distributed
cluster with 4 bricks on each.
The four glusterfsd processes are averaging 100% CPU each and the load
average is around 25-30.
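
As a sketch of what I could try next - assuming the built-in io-stats
profiling is appropriate here (VOLNAME is a placeholder for our volume
name):

  # enable per-brick fop/latency statistics
  gluster volume profile VOLNAME start
  # let it run under load for a while, then inspect
  gluster volume profile VOLNAME info
  # possibly raise the event threads beyond the 2 we set
  gluster volume set VOLNAME server.event-threads 4
  gluster volume set VOLNAME client.event-threads 4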

kind regards
ingard
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] docs.gluster.org

2017-09-01 Thread Michael Scherer
On Friday, 1 September 2017 at 14:02 +0100, Michael Scherer wrote:
> On Wednesday, 30 August 2017 at 12:11 +0530, Nigel Babu wrote:
> > Hello,
> > 
> > To reduce confusion, we've set up docs.gluster.org pointing to
> > gluster.readthedocs.org. Both URLs will continue to work for the
> > foreseeable
> > future.
> > 
> > Please update any references that you control to point to
> > docs.gluster.org. At
> > some point in the distant future, we will switch to hosting
> > docs.gluster.org on
> > our own servers.
> > 
> > RTD will set up a canonical link to docs.gluster.org[1]. Over time,
> > this will
> > update the results on search engines to docs.gluster.org.
> > This
> > change
> > will reduce confusion we've had with copies of our docs hosted on
> > RTD.
> > 
> > [1]: https://docs.readthedocs.io/en/latest/canonical.html
> 
> So, it seems the TLS certificate is wrong; should we correct the link
> to be http for now?

So I opened a few PRs/reviews:
https://github.com/gluster/glusterdocs/pull/259

https://review.gluster.org/#/c/18182/

https://github.com/gluster/glusterweb/pull/148
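
To check that the canonical tag actually shows up once RTD applies it,
something like this should do (the /en/latest/ path is my assumption):

  curl -s https://gluster.readthedocs.org/en/latest/ | grep -i 'rel="canonical"'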


-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS



signature.asc
Description: This is a digitally signed message part
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] docs.gluster.org

2017-09-01 Thread Michael Scherer
On Wednesday, 30 August 2017 at 12:11 +0530, Nigel Babu wrote:
> Hello,
> 
> To reduce confusion, we've set up docs.gluster.org pointing to
> gluster.readthedocs.org. Both URLs will continue to work for the
> foreseeable
> future.
> 
> Please update any references that you control to point to
> docs.gluster.org. At
> some point in the distant future, we will switch to hosting
> docs.gluster.org on
> our own servers.
> 
> RTD will set up a canonical link to docs.gluster.org[1]. Over time,
> this will
> update the results on search engines to docs.gluster.org. This
> change
> will reduce confusion we've had with copies of our docs hosted on
> RTD.
> 
> [1]: https://docs.readthedocs.io/en/latest/canonical.html

So, it seems the TLS certificate is wrong; should we correct the link
to be http for now?
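
One quick way to see which certificate is actually being served (just a
check, not a fix):

  openssl s_client -connect docs.gluster.org:443 -servername docs.gluster.org \
    < /dev/null 2> /dev/null | openssl x509 -noout -subject -issuer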


-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS



signature.asc
Description: This is a digitally signed message part
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] peer rejected but connected

2017-09-01 Thread lejeczek
I've also noticed - in case it's abnormal and might help -
that when the rejected peer gets probed, it is present only on the
probing peer (though as rejected).
The remaining two peers do not show the rejected peer in the status,
and the rejected peer shows only the peer it was probed from.


On 01/09/17 07:30, Gaurav Yadav wrote:

Logs from the newly added node helped me in the RCA of the issue.

The info file on node 10.5.6.17 contains an additional
property "tier-enabled" which is not present in the info file
on the other 3 nodes. Hence,
when a gluster peer probe call is made, the cksum is compared
in order to maintain consistency across the cluster. In this
case the two files differ, leading to different
cksums, causing the state "State: Peer Rejected (Connected)".


This inconsistency arose due to the upgrade you did.

Workaround:
1. Go to node 10.5.6.17
2. Open the info file at
"/var/lib/glusterd/vols/<volname>/info" and remove
"tier-enabled=0".

3. Restart glusterd services
4. Peer probe again.

Thanks
Gaurav

On Thu, Aug 31, 2017 at 3:37 PM, lejeczek wrote:


attached the lot as per your request.

Would be really great if you can find the root cause
of this and suggest a resolution. Fingers crossed.
thanks, L.

On 31/08/17 05:34, Gaurav Yadav wrote:

Could you please send the entire content of
the "/var/lib/glusterd/" directory of the 4th node
which is being peer probed, along with
command-history and glusterd.logs.

Thanks
Gaurav

On Wed, Aug 30, 2017 at 7:10 PM, lejeczek wrote:



    On 30/08/17 07:18, Gaurav Yadav wrote:


        Could you please send me the "info" file which is
        placed in "/var/lib/glusterd/vols/<volname>"
        directory from all the nodes along with
        glusterd.logs and command-history.

        Thanks
        Gaurav


Re: [Gluster-users] GFID attr is missing after adding large amounts of data

2017-09-01 Thread Christoph Schäbel
My answers inline.
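
For reference, the presence of the GFID attr on a brick entry can be
checked roughly like this (run as root; the file path is an example):

  getfattr -d -m . -e hex /mnt/brick1/gv0/path/to/file
  # a healthy entry should list trusted.gfid among the xattrs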

> On 01.09.2017 at 04:19, Ben Turner wrote:
> 
> I re-added gluster-users to get some more eye on this.
> 
> - Original Message -
>> From: "Christoph Schäbel" 
>> To: "Ben Turner" 
>> Sent: Wednesday, August 30, 2017 8:18:31 AM
>> Subject: Re: [Gluster-users] GFID attr is missing after adding large 
>> amounts of data
>> 
>> Hello Ben,
>> 
>> thank you for offering your help.
>> 
>> Here are outputs from all the gluster commands I could think of.
>> Note that we had to remove the terabytes of data to keep the system
>> operational, because it is a live system.
>> 
>> # gluster volume status
>> 
>> Status of volume: gv0
>> Gluster process TCP Port  RDMA Port  Online  Pid
>> --
>> Brick 10.191.206.15:/mnt/brick1/gv0 49154 0  Y   2675
>> Brick 10.191.198.15:/mnt/brick1/gv0 49154 0  Y   2679
>> Self-heal Daemon on localhost   N/A   N/AY
>> 12309
>> Self-heal Daemon on 10.191.206.15   N/A   N/AY   2670
>> 
>> Task Status of Volume gv0
>> --
>> There are no active volume tasks
> 
> OK so your bricks are all online, you have two nodes with 1 brick per node.

Yes

> 
>> 
>> # gluster volume info
>> 
>> Volume Name: gv0
>> Type: Replicate
>> Volume ID: 5e47d0b8-b348-45bb-9a2a-800f301df95b
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.191.206.15:/mnt/brick1/gv0
>> Brick2: 10.191.198.15:/mnt/brick1/gv0
>> Options Reconfigured:
>> transport.address-family: inet
>> performance.readdir-ahead: on
>> nfs.disable: on
> 
> You are using a replicate volume with 2 copies of your data, it looks like 
> you are using the defaults as I don't see any tuning.

The only thing we tuned is network.ping-timeout; we set this to 10 seconds
(if that is not the default anyway).
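
For reference, that was set along these lines (gv0 as above; the "volume
get" check is from memory and may need verifying on 3.8):

  gluster volume set gv0 network.ping-timeout 10
  gluster volume get gv0 network.ping-timeout   # verify the active value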

> 
>> 
>> # gluster peer status
>> 
>> Number of Peers: 1
>> 
>> Hostname: 10.191.206.15
>> Uuid: 030a879d-da93-4a48-8c69-1c552d3399d2
>> State: Peer in Cluster (Connected)
>> 
>> 
>> # gluster --version
>> 
>> glusterfs 3.8.11 built on Apr 11 2017 09:50:39
>> Repository revision: git://git.gluster.com/glusterfs.git
>> Copyright (c) 2006-2011 Gluster Inc. 
>> GlusterFS comes with ABSOLUTELY NO WARRANTY.
>> You may redistribute copies of GlusterFS under the terms of the GNU General
>> Public License.
> 
> You are running Gluster 3.8 which is the latest upstream release marked 
> stable.
> 
>> 
>> # df -h
>> 
>> Filesystem   Size  Used Avail Use% Mounted on
>> /dev/mapper/vg00-root 75G  5.7G   69G   8% /
>> devtmpfs 1.9G 0  1.9G   0% /dev
>> tmpfs1.9G 0  1.9G   0% /dev/shm
>> tmpfs1.9G   17M  1.9G   1% /run
>> tmpfs1.9G 0  1.9G   0% /sys/fs/cgroup
>> /dev/sda1477M  151M  297M  34% /boot
>> /dev/mapper/vg10-brick1  8.0T  700M  8.0T   1% /mnt/brick1
>> localhost:/gv0   8.0T  768M  8.0T   1% /mnt/glusterfs_client
>> tmpfs380M 0  380M   0% /run/user/0
>> 
> 
> Your brick is:
> 
> /dev/mapper/vg10-brick1  8.0T  700M  8.0T   1% /mnt/brick1
> 
> The block device is 8TB.  Can you tell me more about your brick?  Is it a 
> single disk or a RAID?  If it's a RAID, can you tell me about the disks?  I am 
> interested in:
> 
> -Size of disks
> -RAID type
> -Stripe size
> -RAID controller

Not sure about the disks, because it comes from a large storage system (not the 
cheap NAS kind, but the really expensive rack kind) which is then used by 
VMware to present a single volume to my virtual machine. I am pretty sure that 
on the storage system there is some kind of RAID going on, but I am not sure 
whether that has an effect on the "virtual" disk that is presented to my VM. To 
the VM the disk does not look like a RAID, as far as I can tell.

# lvdisplay
  --- Logical volume ---
  LV Path                /dev/vg10/brick1
  LV Name                brick1
  VG Name                vg10
  LV UUID                OEvHEG-m5zc-2MQ1-3gNd-o2gh-q405-YWG02j
  LV Write Access        read/write
  LV Creation host, time localhost, 2017-01-26 09:44:08 +
  LV Status              available
  # open                 1
  LV Size                8.00 TiB
  Current LE             2096890
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:1

  --- Logical volume ---
  LV Path                /dev/vg00/root
  LV Name                root
  VG Name                vg00
  LV UUID                3uyF7l-Xhfa-6frx-qjsP-Iy0u-JdbQ-Me03AS
  LV Write Access

Re: [Gluster-users] peer rejected but connected

2017-09-01 Thread lejeczek

Hi, still tricky.

Whether I do or do not remove "tier-enabled=0" on the rejected
peer, when I try to restart the glusterd service there, the restart fails:


glusterd version 3.10.5 (args: /usr/sbin/glusterd -p 
/var/run/glusterd.pid --log-level INFO)
[2017-09-01 07:41:08.251314] I [MSGID: 106478] 
[glusterd.c:1449:init] 0-management: Maximum allowed open 
file descriptors set to 65536
[2017-09-01 07:41:08.251400] I [MSGID: 106479] 
[glusterd.c:1496:init] 0-management: Using /var/lib/glusterd 
as working directory
[2017-09-01 07:41:08.275000] W [MSGID: 103071] 
[rdma.c:4590:__gf_rdma_ctx_create] 0-rpc-transport/rdma: 
rdma_cm event channel creation failed [No such device]
[2017-09-01 07:41:08.275071] W [MSGID: 103055] 
[rdma.c:4897:init] 0-rdma.management: Failed to initialize 
IB Device
[2017-09-01 07:41:08.275096] W 
[rpc-transport.c:350:rpc_transport_load] 0-rpc-transport: 
'rdma' initialization failed
[2017-09-01 07:41:08.275307] W 
[rpcsvc.c:1661:rpcsvc_create_listener] 0-rpc-service: cannot 
create listener, initing the transport failed
[2017-09-01 07:41:08.275343] E [MSGID: 106243] 
[glusterd.c:1720:init] 0-management: creation of 1 listeners 
failed, continuing with succeeded transport
[2017-09-01 07:41:13.941020] I [MSGID: 106513] 
[glusterd-store.c:2197:glusterd_restore_op_version] 
0-glusterd: retrieved op-version: 30712
[2017-09-01 07:41:14.109192] I [MSGID: 106498] 
[glusterd-handler.c:3669:glusterd_friend_add_from_peerinfo] 
0-management: connect returned 0
[2017-09-01 07:41:14.109364] W [MSGID: 106062] 
[glusterd-handler.c:3466:glusterd_transport_inet_options_build] 
0-glusterd: Failed to get tcp-user-timeout
[2017-09-01 07:41:14.109481] I 
[rpc-clnt.c:1059:rpc_clnt_connection_init] 0-management: 
setting frame-timeout to 600
[2017-09-01 07:41:14.134691] E [MSGID: 106187] 
[glusterd-store.c:4559:glusterd_resolve_all_bricks] 
0-glusterd: resolve brick failed in restore
[2017-09-01 07:41:14.134769] E [MSGID: 101019] 
[xlator.c:503:xlator_init] 0-management: Initialization of 
volume 'management' failed, review your volfile again
[2017-09-01 07:41:14.134790] E [MSGID: 101066] 
[graph.c:325:glusterfs_graph_init] 0-management: 
initializing translator failed
[2017-09-01 07:41:14.134804] E [MSGID: 101176] 
[graph.c:681:glusterfs_graph_activate] 0-graph: init failed
[2017-09-01 07:41:14.135723] W 
[glusterfsd.c:1332:cleanup_and_exit] 
(-->/usr/sbin/glusterd(glusterfs_volumes_init+0xfd) 
[0x55f22fab3abd] 
-->/usr/sbin/glusterd(glusterfs_process_volfp+0x1b1) 
[0x55f22fab3961] 
-->/usr/sbin/glusterd(cleanup_and_exit+0x6b) 
[0x55f22fab2e4b] ) 0-: received signum (1), shutting down


I have to wipe /var/lib/glusterd clean on the rejected
peer (10.5.6.17) and then I can restart it, but... when I
probe it anew, "tier-enabled=0" lands in the "info"
file for each vol on 10.5.6.17 and... vicious circle?
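
The log above shows glusterd retrieving op-version 30712; maybe a
mismatch between nodes explains why one of them writes the tier-enabled
key? A way to compare, assuming the "volume get all" syntax is available
on 3.10:

  gluster volume get all cluster.op-version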




On 01/09/17 07:30, Gaurav Yadav wrote:

Logs from the newly added node helped me in the RCA of the issue.

The info file on node 10.5.6.17 contains an additional
property "tier-enabled" which is not present in the info file
on the other 3 nodes. Hence,
when a gluster peer probe call is made, the cksum is compared
in order to maintain consistency across the cluster. In this
case the two files differ, leading to different
cksums, causing the state "State: Peer Rejected (Connected)".


This inconsistency arose due to the upgrade you did.

Workaround:
1. Go to node 10.5.6.17
2. Open the info file at
"/var/lib/glusterd/vols/<volname>/info" and remove
"tier-enabled=0".

3. Restart glusterd services
4. Peer probe again.

Thanks
Gaurav

On Thu, Aug 31, 2017 at 3:37 PM, lejeczek wrote:


attached the lot as per your request.

Would be really great if you can find the root cause
of this and suggest a resolution. Fingers crossed.
thanks, L.

On 31/08/17 05:34, Gaurav Yadav wrote:

Could you please send the entire content of
the "/var/lib/glusterd/" directory of the 4th node
which is being peer probed, along with
command-history and glusterd.logs.

Thanks
Gaurav

On Wed, Aug 30, 2017 at 7:10 PM, lejeczek wrote:



    On 30/08/17 07:18, Gaurav Yadav wrote:


        Could you please send me the "info" file which is
        placed in "/var/lib/glusterd/vols/<volname>"
        directory from all the nodes along with
        glusterd.logs and command-history.

        Thanks
        Gaurav


Re: [Gluster-users] Glusterd proccess hangs on reboot

2017-09-01 Thread Serkan Çoban
Hi,
You can find pstack sampes here:
https://www.dropbox.com/s/6gw8b6tng8puiox/pstack_with_debuginfo.zip?dl=0

Here is the first one:
Thread 8 (Thread 0x7f92879ae700 (LWP 78909)):
#0  0x003d99c0f00d in nanosleep () from /lib64/libpthread.so.0
#1  0x00310fe37d57 in gf_timer_proc () from /usr/lib64/libglusterfs.so.0
#2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f9286fad700 (LWP 78910)):
#0  0x003d99c0f585 in sigwait () from /lib64/libpthread.so.0
#1  0x0040643b in glusterfs_sigwaiter ()
#2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f92865ac700 (LWP 78911)):
#0  0x003d998acc4d in nanosleep () from /lib64/libc.so.6
#1  0x003d998acac0 in sleep () from /lib64/libc.so.6
#2  0x00310fe528fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
#3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f9285bab700 (LWP 78912)):
#0  0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00310fe64afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2  0x00310fe729f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f92851aa700 (LWP 78913)):
#0  0x003d99c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00310fe64afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
#2  0x00310fe729f0 in syncenv_processor () from /usr/lib64/libglusterfs.so.0
#3  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f9282ecc700 (LWP 78915)):
#0  0x003d99c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x7f928450099b in hooks_worker () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#2  0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#3  0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f92824cb700 (LWP 78916)):
#0  0x003d9992867a in __strcmp_sse42 () from /lib64/libc.so.6
#1  0x00310fe2244a in dict_lookup_common () from
/usr/lib64/libglusterfs.so.0
#2  0x00310fe2433d in dict_set_lk () from /usr/lib64/libglusterfs.so.0
#3  0x00310fe245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
#4  0x00310fe2524c in dict_set_str () from /usr/lib64/libglusterfs.so.0
#5  0x7f928453a8c4 in gd_add_brick_snap_details_to_dict () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#6  0x7f928447b0df in glusterd_add_volume_to_dict () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#7  0x7f928447b47c in glusterd_add_volumes_to_export_dict () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#8  0x7f9284491edf in glusterd_rpc_friend_add () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#9  0x7f92844528f7 in glusterd_ac_friend_add () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#10 0x7f9284450bb9 in glusterd_friend_sm () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#11 0x7f92844ac89a in __glusterd_mgmt_hndsk_version_ack_cbk ()
from /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#12 0x7f92844923ee in glusterd_big_locked_cbk () from
/usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
#13 0x00311020fad5 in rpc_clnt_handle_reply () from /usr/lib64/libgfrpc.so.0
#14 0x003110210c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
#15 0x00311020bd68 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#16 0x7f9283492ccd in socket_event_poll_in () from
/usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
#17 0x7f9283493ffe in socket_event_handler () from
/usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
#18 0x00310fe87806 in event_dispatch_epoll_worker () from
/usr/lib64/libglusterfs.so.0
#19 0x003d99c07aa1 in start_thread () from /lib64/libpthread.so.0
#20 0x003d998e8bbd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f928e4a4740 (LWP 78908)):
#0  0x003d99c082fd in pthread_join () from /lib64/libpthread.so.0
#1  0x00310fe872d5 in event_dispatch_epoll () from
/usr/lib64/libglusterfs.so.0
#2  0x00409020 in main ()
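
For the record, the samples were captured roughly like this (assuming
the glusterfs-debuginfo package matches the installed glusterfs
version):

  yum install glusterfs-debuginfo
  pstack $(pidof glusterd) > glusterd_pstack.txt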

On Fri, Sep 1, 2017 at 6:17 AM, Milind Changire wrote:
> Serkan,
> I have gone through other mails in the mail thread as well but responding to
> this one specifically.
>
> Is this a source install or an RPM install ?
> If this is an RPM install, could you please install the glusterfs-debuginfo
> RPM and retry to capture the gdb backtrace.
>
> If this is a source install, then you'll need to configure the build with
> 

Re: [Gluster-users] peer rejected but connected

2017-09-01 Thread Gaurav Yadav
Logs from the newly added node helped me in the RCA of the issue.

The info file on node 10.5.6.17 contains an additional property
"tier-enabled" which is not present in the info file on the other 3 nodes.
Hence, when a gluster peer probe call is made, the cksum is compared in
order to maintain consistency across the cluster. In this
case the two files differ, leading to different cksums, causing the state
"State: Peer Rejected (Connected)".

This inconsistency arose due to the upgrade you did.
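
You can also see the mismatch directly - a quick sketch, where
<volname> is a placeholder and HEALTHY_NODE is any node whose info file
is correct:

  # compare the stored volume cksum on each node
  cat /var/lib/glusterd/vols/<volname>/cksum
  # show which info-file lines differ from a healthy node
  diff <(ssh HEALTHY_NODE cat /var/lib/glusterd/vols/<volname>/info) \
       /var/lib/glusterd/vols/<volname>/info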

Workaround (sketched as shell commands below):
1. Go to node 10.5.6.17
2. Open the info file at "/var/lib/glusterd/vols/<volname>/info" and remove
"tier-enabled=0".
3. Restart glusterd services
4. Peer probe again.
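
As a shell sketch of the above (assuming a systemd host; <volname> is a
placeholder, and the probe runs from a node already in the cluster):

  # on 10.5.6.17, for each volume:
  sed -i '/^tier-enabled=0$/d' /var/lib/glusterd/vols/<volname>/info
  systemctl restart glusterd
  # then, from an existing cluster node:
  gluster peer probe 10.5.6.17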

Thanks
Gaurav

On Thu, Aug 31, 2017 at 3:37 PM, lejeczek wrote:

> attached the lot as per your request.
>
> Would be really great if you can find the root cause of this and suggest
> a resolution. Fingers crossed.
> thanks, L.
>
> On 31/08/17 05:34, Gaurav Yadav wrote:
>
>> Could you please send the entire content of the "/var/lib/glusterd/" directory of
>> the 4th node which is being peer probed, along with command-history and
>> glusterd.logs.
>>
>> Thanks
>> Gaurav
>>
>> On Wed, Aug 30, 2017 at 7:10 PM, lejeczek wrote:
>>
>>
>>
>> On 30/08/17 07:18, Gaurav Yadav wrote:
>>
>>
>> Could you please send me the "info" file which is
>> placed in "/var/lib/glusterd/vols/<volname>"
>> directory from all the nodes along with
>> glusterd.logs and command-history.
>>
>> Thanks
>> Gaurav
>>
>> On Tue, Aug 29, 2017 at 7:13 PM, lejeczek wrote:
>>
>> hi fellas,
>> same old, same old -
>> in the log of the probing peer I see:
>> ...
>> [2017-08-29 13:36:16.882196] I [MSGID: 106493]
>>
>> [glusterd-handler.c:3020:__glusterd_handle_probe_query]
>> 0-glusterd: Responded to priv.xx.xx.priv.xx.xx.x,
>> op_ret: 0, op_errno: 0, ret: 0
>> [2017-08-29 13:36:16.904961] I [MSGID: 106490]
>>
>> [glusterd-handler.c:2606:__glusterd_handle_incoming_friend_req]
>> 0-glusterd: Received probe from uuid:
>> 2a17edb4-ae68-4b67-916e-e38a2087ca28
>> [2017-08-29 13:36:16.906477] E [MSGID: 106010]
>>
>> [glusterd-utils.c:3034:glusterd_compare_friend_volume]
>> 0-management: Version of Cksums CO-DATA
>> differ. local
>> cksum = 4088157353, remote cksum = 2870780063
>> on peer
>> 10.5.6.17
>> [2017-08-29 13:36:16.907187] I [MSGID: 106493]
>>
>> [glusterd-handler.c:3866:glusterd_xfer_friend_add_resp]
>> 0-glusterd: Responded to 10.5.6.17 (0), ret:
>> 0, op_ret: -1
>> ...
>>
>> Why would adding a new peer make the cluster
>> jump to checking
>> checksums on a vol on that newly added peer?
>>
>>
>> really. I mean, no brick even exists on the newly added
>> peer, it's just been probed, so why this?:
>>
>> [2017-08-30 13:17:51.949430] E [MSGID: 106010]
>> [glusterd-utils.c:3034:glusterd_compare_friend_volume]
>> 0-management: Version of Cksums CO-DATA differ. local
>> cksum = 4088157353, remote cksum = 2870780063 on peer
>> 10.5.6.17
>>
>> 10.5.6.17 is a candidate I'm probing from a working
>> cluster.
>> Why does gluster want checksums, and why would the
>> checksums be different?
>> Would anybody know what is going on there?
>>
>>
>> Is that why the peer gets rejected?
>> The peer I'm hoping to add was a member of the
>> cluster in the past, but I did the "usual" wipe of
>> /var/lib/gluster on the candidate peer.
>>
>> A hint or solution would be great to hear.
>> L.
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users