Re: [Gluster-users] peer rejected but connected

2017-08-31 Thread Gaurav Yadav
Logs from the newly added node helped me do a root-cause analysis (RCA) of the issue.

The info file on node 10.5.6.17 contains an additional property,
"tier-enabled", which is not present in the info file on the other 3 nodes.
When a gluster peer probe call is made, a cksum is compared in order to
maintain consistency across the cluster. In this
case the two files differ, leading to different cksums and leaving the peer
in "State: Peer Rejected (Connected)".

This inconsistency arose due to the upgrade you did.

Workaround:
1. Go to node 10.5.6.17.
2. Open the info file at "/var/lib/glusterd/vols/<volname>/info" and remove
"tier-enabled=0".
3. Restart the glusterd service.
4. Peer probe again.
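
A minimal sketch of the above, assuming the volume is named CO-DATA as in the
logs quoted below (adjust the volume name and service manager to your setup):

# on node 10.5.6.17
sed -i '/^tier-enabled=0$/d' /var/lib/glusterd/vols/CO-DATA/info
service glusterd restart   # or: systemctl restart glusterd
# then, from a node in the working cluster:
gluster peer probe 10.5.6.17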

Thanks
Gaurav

On Thu, Aug 31, 2017 at 3:37 PM, lejeczek  wrote:

> attached the lot as per your request.
>
> Would be really great if you can find the root cause of this and suggest
> a resolution. Fingers crossed.
> thanks, L.
>
> On 31/08/17 05:34, Gaurav Yadav wrote:
>
>> Could you please send the entire content of the "/var/lib/glusterd/" directory of
>> the 4th node which is being peer probed, along with command-history and
>> glusterd.logs.
>>
>> Thanks
>> Gaurav
>>
>> On Wed, Aug 30, 2017 at 7:10 PM, lejeczek <pelj...@yahoo.co.uk> wrote:
>>
>>
>>
>> On 30/08/17 07:18, Gaurav Yadav wrote:
>>
>>
>> Could you please send me the "info" file which is
>> placed in the "/var/lib/glusterd/vols/<volname>/"
>> directory from all the nodes, along with
>> glusterd.logs and command-history.
>>
>> Thanks
>> Gaurav
>>
>> On Tue, Aug 29, 2017 at 7:13 PM, lejeczek
>> <pelj...@yahoo.co.uk> wrote:
>>
>> hi fellas,
>> same old same
>> in the log of the probing peer I see:
>> ...
>> [2017-08-29 13:36:16.882196] I [MSGID: 106493]
>>
>> [glusterd-handler.c:3020:__glusterd_handle_probe_query]
>> 0-glusterd: Responded to priv.xx.xx.priv.xx.xx.x,
>> op_ret: 0, op_errno: 0, ret: 0
>> [2017-08-29 13:36:16.904961] I [MSGID: 106490]
>>
>> [glusterd-handler.c:2606:__glusterd_handle_incoming_friend_req]
>> 0-glusterd: Received probe from uuid:
>> 2a17edb4-ae68-4b67-916e-e38a2087ca28
>> [2017-08-29 13:36:16.906477] E [MSGID: 106010]
>>
>> [glusterd-utils.c:3034:glusterd_compare_friend_volume]
>> 0-management: Version of Cksums CO-DATA
>> differ. local
>> cksum = 4088157353, remote cksum = 2870780063
>> on peer
>> 10.5.6.17
>> [2017-08-29 13:36:16.907187] I [MSGID: 106493]
>>
>> [glusterd-handler.c:3866:glusterd_xfer_friend_add_resp]
>> 0-glusterd: Responded to 10.5.6.17 (0), ret:
>> 0, op_ret: -1
>> ...
>>
>> Why would adding a new peer make the cluster
>> jump to checking
>> checksums on a vol on that newly added peer?
>>
>>
>> Really. I mean, no brick even exists on the newly added
>> peer; it's just been probed. Why this?:
>>
>> [2017-08-30 13:17:51.949430] E [MSGID: 106010]
>> [glusterd-utils.c:3034:glusterd_compare_friend_volume]
>> 0-management: Version of Cksums CO-DATA differ. local
>> cksum = 4088157353, remote cksum = 2870780063 on peer
>> 10.5.6.17
>>
>> 10.5.6.17 is a candidate I'm probing from a working
>> cluster.
>> Why does gluster want checksums, and why would the
>> checksums be different?
>> Would anybody know what is going on there?
>>
>>
>> Is that why the peer gets rejected?
>> The peer I'm hoping to add was a member of the
>> cluster in the past, but I did the "usual" wipe of
>> /var/lib/gluster on the candidate peer.
>>
>> A hint or a solution would be great to hear.
>> L.

Re: [Gluster-users] Glusterd process hangs on reboot

2017-08-31 Thread Milind Changire
Serkan,
I have gone through the other mails in the thread as well, but am responding
to this one specifically.

Is this a source install or an RPM install?
If this is an RPM install, could you please install the glusterfs-debuginfo
RPM and retry capturing the gdb backtrace.

If this is a source install, then you'll need to configure the build with
--enable-debug, reinstall, and retry capturing the gdb backtrace.

Having the debuginfo package or a debug build helps to resolve the function
names and/or line numbers.
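
For example, a rough sketch of both paths (the exact package and build
commands are assumptions that depend on your distribution):

# RPM install: add debug symbols, then capture the backtrace again
yum install glusterfs-debuginfo
gdb -p $(pidof glusterd) -batch -ex 'thread apply all bt' > glusterd_bt.txt

# source install: rebuild with debugging enabled
./configure --enable-debug && make && make install
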
--
Milind



On Thu, Aug 24, 2017 at 11:19 AM, Serkan Çoban 
wrote:

> Here you can find 10 stack trace samples from glusterd. I wait 10
> seconds between each trace.
> https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>
> Content of the first stack trace is here:
>
> Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
> #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
> #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
> #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
> #1  0x0040643b in glusterfs_sigwaiter ()
> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
> #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
> #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
> #2  0x00303f8528fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2  0x00303f8729f0 in syncenv_processor () from
> /usr/lib64/libglusterfs.so.0
> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2  0x00303f8729f0 in syncenv_processor () from
> /usr/lib64/libglusterfs.so.0
> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
> #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x7f7a898a099b in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
> #0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
> #1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
> #2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
> #3  0x00303f8245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
> #4  0x00303f82524c in dict_set_str () from /usr/lib64/libglusterfs.so.0
> #5  0x7f7a898da7fd in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #6  0x7f7a8981b0df in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #7  0x7f7a8981b47c in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #8  0x7f7a89831edf in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #9  0x7f7a897f28f7 in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #10 0x7f7a897f0bb9 in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #11 0x7f7a8984c89a in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #12 0x7f7a898323ee in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #13 0x00303f40fad5 in rpc_clnt_handle_reply () from
> /usr/lib64/libgfrpc.so.0
> #14 0x00303f410c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
> #15 0x00303f40bd68 in rpc_transport_notify () from
> /usr/lib64/libgfrpc.so.0
> #16 0x7f7a88a6fccd in ?? () from
> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #17 0x7f7a88a70ffe in ?? () from
> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #18 0x00303f887806 in ?? () from /usr/lib64/libglusterfs.so.0
> #19 0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #20 0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 1 (Thread 0x7f7a93844740 (LWP 43068)):
> #0  0x003aa5c082fd in pthread_join () from /lib64/libpthread.so.0
> #1  0x00303f8872d5 in ?? () from /usr/lib64/libglusterfs.so.0
> #2  0x00409020 in main ()

Re: [Gluster-users] GFID attr is missing after adding large amounts of data

2017-08-31 Thread Ben Turner
I re-added gluster-users to get some more eyes on this.

- Original Message -
> From: "Christoph Schäbel" 
> To: "Ben Turner" 
> Sent: Wednesday, August 30, 2017 8:18:31 AM
> Subject: Re: [Gluster-users] GFID attr is missing after adding large amounts 
> of  data
> 
> Hello Ben,
> 
> thank you for offering your help.
> 
> Here are outputs from all the gluster commands I could think of.
> Note that we had to remove the terabytes of data to keep the system
> operational, because it is a live system.
> 
> # gluster volume status
> 
> Status of volume: gv0
> Gluster process                        TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------
> Brick 10.191.206.15:/mnt/brick1/gv0    49154     0          Y       2675
> Brick 10.191.198.15:/mnt/brick1/gv0    49154     0          Y       2679
> Self-heal Daemon on localhost          N/A       N/A        Y       12309
> Self-heal Daemon on 10.191.206.15      N/A       N/A        Y       2670
> 
> Task Status of Volume gv0
> ------------------------------------------------------------------------
> There are no active volume tasks

OK so your bricks are all online, you have two nodes with 1 brick per node.

> 
> # gluster volume info
> 
> Volume Name: gv0
> Type: Replicate
> Volume ID: 5e47d0b8-b348-45bb-9a2a-800f301df95b
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: 10.191.206.15:/mnt/brick1/gv0
> Brick2: 10.191.198.15:/mnt/brick1/gv0
> Options Reconfigured:
> transport.address-family: inet
> performance.readdir-ahead: on
> nfs.disable: on

You are using a replicate volume with 2 copies of your data; it looks like you 
are using the defaults, as I don't see any tuning.

> 
> # gluster peer status
> 
> Number of Peers: 1
> 
> Hostname: 10.191.206.15
> Uuid: 030a879d-da93-4a48-8c69-1c552d3399d2
> State: Peer in Cluster (Connected)
> 
> 
> # gluster --version
> 
> glusterfs 3.8.11 built on Apr 11 2017 09:50:39
> Repository revision: git://git.gluster.com/glusterfs.git
> Copyright (c) 2006-2011 Gluster Inc. 
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> You may redistribute copies of GlusterFS under the terms of the GNU General
> Public License.

You are running Gluster 3.8 which is the latest upstream release marked stable.

> 
> # df -h
> 
> Filesystem   Size  Used Avail Use% Mounted on
> /dev/mapper/vg00-root 75G  5.7G   69G   8% /
> devtmpfs 1.9G 0  1.9G   0% /dev
> tmpfs1.9G 0  1.9G   0% /dev/shm
> tmpfs1.9G   17M  1.9G   1% /run
> tmpfs1.9G 0  1.9G   0% /sys/fs/cgroup
> /dev/sda1477M  151M  297M  34% /boot
> /dev/mapper/vg10-brick1  8.0T  700M  8.0T   1% /mnt/brick1
> localhost:/gv0   8.0T  768M  8.0T   1% /mnt/glusterfs_client
> tmpfs380M 0  380M   0% /run/user/0
>

Your brick is:

 /dev/mapper/vg10-brick1  8.0T  700M  8.0T   1% /mnt/brick1

The block device is 8TB.  Can you tell me more about your brick?  Is it a 
single disk or a RAID?  If it's a RAID, can you tell me about the disks?  I am 
interested in:

-Size of disks
-RAID type
-Stripe size
-RAID controller

I also see:

 localhost:/gv0   8.0T  768M  8.0T   1% /mnt/glusterfs_client

So you are mounting your volume on the local node. Is this the mount where you 
are writing data to?
 
> 
> 
> The setup of the servers is done via shell script on CentOS 7 containing the
> following commands:
> 
> yum install -y centos-release-gluster
> yum install -y glusterfs-server
> 
> mkdir /mnt/brick1
> ssm create -s 999G -n brick1 --fstype xfs -p vg10 /dev/sdb /mnt/brick1

I haven't used system-storage-manager before; do you know if it takes care of 
properly tuning your storage stack (if you have a RAID, that is)?  If you don't 
have a RAID it's probably not that big of a deal; if you do have a RAID, we should 
make sure everything is aware of your stripe size and tune appropriately.
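
For example, a hedged sketch of aligning XFS to a RAID stripe at mkfs time
(su/sw and the device below are placeholders; su must match the controller's
stripe unit and sw the number of data disks):

# hypothetical 10-data-disk RAID with a 256k stripe unit
mkfs.xfs -f -d su=256k,sw=10 /dev/vgXX/lvXX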

> 
> echo "/dev/mapper/vg10-brick1   /mnt/brick1 xfs defaults1   2" >>
> /etc/fstab
> mount -a && mount
> mkdir /mnt/brick1/gv0
> 
> gluster peer probe OTHER_SERVER_IP
> 
> gluster pool list
> gluster volume create gv0 replica 2 OWN_SERVER_IP:/mnt/brick1/gv0
> OTHER_SERVER_IP:/mnt/brick1/gv0
> gluster volume start gv0
> gluster volume info gv0
> gluster volume set gv0 network.ping-timeout "10"
> gluster volume info gv0
> 
> # mount as client for archiving cronjob, is already in fstab
> mount -a
> 
> # mount via fuse-client
> mkdir -p /mnt/glusterfs_client
> echo "localhost:/gv0  /mnt/glusterfs_client   glusterfs   
> defaults,_netdev0   0" >>
> /etc/fstab
> mount -a
> 
> 
> We untar multiple files (around 1300 tar files), each around 2.7 GB in size.
> The tar files are not compressed.
> We untar the files with a shell script containing the following:
> 
> #! /bin/bash
>  for f 

Re: [Gluster-users] Manually delete .glusterfs/changelogs directory ?

2017-08-31 Thread mabi
Hi Everton,

Thanks for your tip regarding the "reset-sync-time". I understand now that I 
should have used this additional parameter in order to get rid of the CHANGELOG 
files.

I will now manually delete them from all bricks. Also I have noticed the 
following 3 geo-replication related volume parameters are still set on my 
volume:

changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on

I will also remove them manually.
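
For reference, a sketch of clearing those options via the CLI instead,
assuming the volume is named myvolume as in my original post:

gluster volume reset myvolume changelog.changelog
gluster volume reset myvolume geo-replication.ignore-pid-check
gluster volume reset myvolume geo-replication.indexing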

Best,
M.

>  Original Message 
> Subject: Re: [Gluster-users] Manually delete .glusterfs/changelogs directory ?
> Local Time: August 31, 2017 8:56 AM
> UTC Time: August 31, 2017 6:56 AM
> From: broglia...@gmail.com
> To: mabi 
> Gluster Users 
>
> Hi Mabi,
> If you will not use that geo-replication volume session again, I believe it 
> is safe to manually delete the files in the brick directory using rm -rf.
>
> However, the gluster documentation specifies that if the session is to be 
> permanently deleted, this is the command to use:
> gluster volume geo-replication gv1 snode1::gv2 delete reset-sync-time
>
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Geo%20Replication/#deleting-the-session
>
> Regards,
> Everton Brogliatto
>
> On Thu, Aug 31, 2017 at 12:15 AM, mabi  wrote:
>
>> Hi, has anyone any advice to give about my question below? Thanks!
>>
>>>  Original Message 
>>> Subject: Manually delete .glusterfs/changelogs directory ?
>>> Local Time: August 16, 2017 5:59 PM
>>> UTC Time: August 16, 2017 3:59 PM
>>> From: m...@protonmail.ch
>>> To: Gluster Users 
>>>
>>> Hello,
>>>
>>> I just deleted (permanently) my geo-replication session using the following 
>>> command:
>>>
>>> gluster volume geo-replication myvolume gfs1geo.domain.tld::myvolume-geo 
>>> delete
>>>
>>> and noticed that the .glusterfs/changelogs on my volume still exists. Is it 
>>> safe to delete the whole directory myself with "rm -rf 
>>> .glusterfs/changelogs" ? As far as I understand the CHANGELOG.* files are 
>>> only needed for geo-replication, correct?
>>>
>>> Finally shouldn't the geo-replication delete command I used above delete 
>>> these files automatically for me?
>>>
>>> Regards,
>>> Mabi
>>

Re: [Gluster-users] Glusterd process hangs on reboot

2017-08-31 Thread Atin Mukherjee
On Thu, Aug 24, 2017 at 11:19 AM, Serkan Çoban 
wrote:

> Here you can find 10 stack trace samples from glusterd. I wait 10
> seconds between each trace.
> https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
>
> Content of the first stack trace is here:
>
> Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
> #0  0x003aa5c0f00d in nanosleep () from /lib64/libpthread.so.0
> #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
> #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
> #1  0x0040643b in glusterfs_sigwaiter ()
> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
> #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
> #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
> #2  0x00303f8528fb in pool_sweeper () from /usr/lib64/libglusterfs.so.0
> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2  0x00303f8729f0 in syncenv_processor () from
> /usr/lib64/libglusterfs.so.0
> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 4 (Thread 0x7f7a8a54a700 (LWP 43073)):
> #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x00303f864afc in syncenv_task () from /usr/lib64/libglusterfs.so.0
> #2  0x00303f8729f0 in syncenv_processor () from
> /usr/lib64/libglusterfs.so.0
> #3  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 3 (Thread 0x7f7a886ac700 (LWP 43075)):
> #0  0x003aa5c0b68c in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x7f7a898a099b in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #2  0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 2 (Thread 0x7f7a87cab700 (LWP 43076)):
> #0  0x003aa5928692 in __strcmp_sse42 () from /lib64/libc.so.6
> #1  0x00303f82244a in ?? () from /usr/lib64/libglusterfs.so.0
> #2  0x00303f82433d in ?? () from /usr/lib64/libglusterfs.so.0
> #3  0x00303f8245f5 in dict_set () from /usr/lib64/libglusterfs.so.0
> #4  0x00303f82524c in dict_set_str () from /usr/lib64/libglusterfs.so.0
> #5  0x7f7a898da7fd in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #6  0x7f7a8981b0df in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #7  0x7f7a8981b47c in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #8  0x7f7a89831edf in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #9  0x7f7a897f28f7 in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #10 0x7f7a897f0bb9 in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #11 0x7f7a8984c89a in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #12 0x7f7a898323ee in ?? () from
> /usr/lib64/glusterfs/3.10.5/xlator/mgmt/glusterd.so
> #13 0x00303f40fad5 in rpc_clnt_handle_reply () from
> /usr/lib64/libgfrpc.so.0
> #14 0x00303f410c85 in rpc_clnt_notify () from /usr/lib64/libgfrpc.so.0
> #15 0x00303f40bd68 in rpc_transport_notify () from
> /usr/lib64/libgfrpc.so.0
> #16 0x7f7a88a6fccd in ?? () from
> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #17 0x7f7a88a70ffe in ?? () from
> /usr/lib64/glusterfs/3.10.5/rpc-transport/socket.so
> #18 0x00303f887806 in ?? () from /usr/lib64/libglusterfs.so.0
> #19 0x003aa5c07aa1 in start_thread () from /lib64/libpthread.so.0
> #20 0x003aa58e8bbd in clone () from /lib64/libc.so.6
> Thread 1 (Thread 0x7f7a93844740 (LWP 43068)):
> #0  0x003aa5c082fd in pthread_join () from /lib64/libpthread.so.0
> #1  0x00303f8872d5 in ?? () from /usr/lib64/libglusterfs.so.0
> #2  0x00409020 in main ()
>

FWIW, we need to figure out the function handlers corresponding to the
addresses dumped in thread 2, which would help us figure out where the
glusterd process is stuck. I remember Milind has been working on a script
to convert these addresses to function names.

@Milind - can you please help here in getting the function names resolved
from these addresses? Sharing the script with Serkan and letting it run on
the setup would probably be ideal.
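
For reference, one hedged way to resolve such addresses on the affected
machine (assuming glusterfs-debuginfo is installed there) is gdb's
info symbol against the live process; the address below is frame #1 of
thread 2 above:

gdb -p $(pidof glusterd)
(gdb) info symbol 0x00303f82244a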


> On Wed, Aug 23, 2017 at 8:46

Re: [Gluster-users] Glusterd process hangs on reboot

2017-08-31 Thread Serkan Çoban
Hi Gaurav,

Any progress on the issue?

On Tue, Aug 29, 2017 at 1:57 PM, Serkan Çoban  wrote:
> glusterd returned to normal, here are the logs:
> https://www.dropbox.com/s/41jx2zn3uizvr53/80servers_glusterd_normal_status.zip?dl=0
>
>
> On Tue, Aug 29, 2017 at 1:47 PM, Serkan Çoban  wrote:
>> Here are the logs after stopping all three volumes and restarting
>> glusterd on all nodes. I waited 70 minutes after the glusterd restart but
>> it is still consuming 100% CPU.
>> https://www.dropbox.com/s/pzl0f198v03twx3/80servers_after_glusterd_restart.zip?dl=0
>>
>>
>> On Tue, Aug 29, 2017 at 12:37 PM, Gaurav Yadav  wrote:
>>>
>>> I believe the logs you have shared consist of a volume create followed
>>> by a volume start.
>>> However, you have mentioned that when a node from the 80-server cluster gets
>>> rebooted, the glusterd process hangs.
>>>
>>> Could you please provide the logs which led glusterd to hang for all the
>>> cases, along with the glusterd process utilization.
>>>
>>>
>>> Thanks
>>> Gaurav
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Aug 29, 2017 at 2:44 PM, Serkan Çoban  wrote:

 Here are the requested logs:

 https://www.dropbox.com/s/vt187h0gtu5doip/gluster_logs_20_40_80_servers.zip?dl=0


 On Tue, Aug 29, 2017 at 7:48 AM, Gaurav Yadav  wrote:
 > Till now I haven't found anything significant.
 >
 > Can you send me gluster logs along with command-history-logs for these
 > scenarios:
 >  Scenario1 : 20 servers
 >  Scenario2 : 40 servers
 >  Scenario3:  80 Servers
 >
 >
 > Thanks
 > Gaurav
 >
 >
 >
 > On Mon, Aug 28, 2017 at 11:22 AM, Serkan Çoban 
 > wrote:
 >>
 >> Hi Gaurav,
 >> Any progress on the problem?
 >>
 >> On Thursday, August 24, 2017, Serkan Çoban 
 >> wrote:
 >>>
 >>> Thank you Gaurav,
 >>> Here are more findings:
 >>> The problem does not happen using only 20 servers, each with 68 bricks
 >>> (peer probe only 20 servers).
 >>> If we use 40 servers with a single volume, the glusterd 100% CPU state
 >>> continues for 5 minutes and then it returns to normal.
 >>> With 80 servers we have no working state yet...
 >>>
 >>> On Thu, Aug 24, 2017 at 1:33 PM, Gaurav Yadav 
 >>> wrote:
 >>> >
 >>> > I am working on it and will share my findings as soon as possible.
 >>> >
 >>> >
 >>> > Thanks
 >>> > Gaurav
 >>> >
 >>> > On Thu, Aug 24, 2017 at 3:58 PM, Serkan Çoban
 >>> > 
 >>> > wrote:
 >>> >>
 >>> >> Restarting glusterd causes the same thing. I tried with 3.12.rc0,
 >>> >> 3.10.5, 3.8.15, and 3.7.20; all show the same behavior.
 >>> >> My OS is CentOS 6.9; I tried with CentOS 6.8 and the problem remains...
 >>> >> The only way to a healthy state is to destroy the gluster config/RPMs,
 >>> >> reinstall,
 >>> >> and recreate volumes.
 >>> >>
 >>> >> On Thu, Aug 24, 2017 at 8:49 AM, Serkan Çoban
 >>> >> 
 >>> >> wrote:
 >>> >> > Here you can find 10 stack trace samples from glusterd. I wait 10
 >>> >> > seconds between each trace.
 >>> >> >
 >>> >> > https://www.dropbox.com/s/9f36goq5xn3p1yt/glusterd_pstack.zip?dl=0
 >>> >> >
 >>> >> > Content of the first stack trace is here:
 >>> >> >
 >>> >> > Thread 8 (Thread 0x7f7a8cd4e700 (LWP 43069)):
 >>> >> > #0  0x003aa5c0f00d in nanosleep () from
 >>> >> > /lib64/libpthread.so.0
 >>> >> > #1  0x00303f837d57 in ?? () from /usr/lib64/libglusterfs.so.0
 >>> >> > #2  0x003aa5c07aa1 in start_thread () from
 >>> >> > /lib64/libpthread.so.0
 >>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
 >>> >> > Thread 7 (Thread 0x7f7a8c34d700 (LWP 43070)):
 >>> >> > #0  0x003aa5c0f585 in sigwait () from /lib64/libpthread.so.0
 >>> >> > #1  0x0040643b in glusterfs_sigwaiter ()
 >>> >> > #2  0x003aa5c07aa1 in start_thread () from
 >>> >> > /lib64/libpthread.so.0
 >>> >> > #3  0x003aa58e8bbd in clone () from /lib64/libc.so.6
 >>> >> > Thread 6 (Thread 0x7f7a8b94c700 (LWP 43071)):
 >>> >> > #0  0x003aa58acc4d in nanosleep () from /lib64/libc.so.6
 >>> >> > #1  0x003aa58acac0 in sleep () from /lib64/libc.so.6
 >>> >> > #2  0x00303f8528fb in pool_sweeper () from
 >>> >> > /usr/lib64/libglusterfs.so.0
 >>> >> > #3  0x003aa5c07aa1 in start_thread () from
 >>> >> > /lib64/libpthread.so.0
 >>> >> > #4  0x003aa58e8bbd in clone () from /lib64/libc.so.6
 >>> >> > Thread 5 (Thread 0x7f7a8af4b700 (LWP 43072)):
 >>> >> > #0  0x003aa5c0ba5e in pthread_cond_timedwait@@GLIBC_2.3.2 ()
 >>> >> > from
 >>> >> > /lib64/libpthread.so.0
 >>> >> > #1  0x00303f864afc in syncenv_task () from
 >>> >> > /usr/lib64/libglusterfs.so.0
 >>> >> > #2  0x00303f8729f0 in syncenv_processor () from
 >>> >> > /usr/lib64/libglusterfs.so.0
 >>> >> > #3  0x003aa5c07aa1 in start_thread () from
 >>> >>

Re: [Gluster-users] Gluster status fails

2017-08-31 Thread Atin Mukherjee
Thank you for the acknowledgement.

On Thu, Aug 31, 2017 at 8:30 PM, mohammad kashif 
wrote:

> Hi Atin
>
> Thanks, I was not running any script or gluster command. But now the gluster
> status command has started working. CPU usage also came down and, looking at
> the ganglia graph, CPU usage is strongly correlated with network activity.
>
> It may be that the status command failed due to high load. I will keep an
> eye on it and see whether it happens again.
>
> Cheers
>
> Kashif
>
> On Thu, Aug 31, 2017 at 2:40 AM, Atin Mukherjee 
> wrote:
>
>>
>> On Wed, 30 Aug 2017 at 20:55, mohammad kashif 
>> wrote:
>>
>>> Hi
>>>
>>> I am running a 400TB, five-node, purely distributed gluster setup. I am
>>> troubleshooting an issue where file creation sometimes fails. I found
>>> that volume status is not working:
>>>
>>> gluster volume status
>>> Another transaction is in progress for atlasglust. Please try again
>>> after sometime.
>>>
>>> When I tried from other node then it seems two nodes have Locking issue
>>>
>>> gluster volume status
>>> Locking failed on pplxgluster01... Please check log file for details.
>>> Locking failed on pplxgluster04... Please check log file for details.
>>>
>>
>> This suggests that there are concurrent gluster CLI operations being
>> performed on the same volume. Are you monitoring the cluster through nagios,
>> or do you have a script on all the nodes which checks the volume's health at
>> a periodic interval? Please note glusterd will process one CLI operation on
>> a volume at a time; all other transactions on the same volume will
>> fail.
>>
>>
>>
>>>
>>> Also noticed that the glusterfsd process is using around 1000% CPU.
>>> It is a decent server with 16 cores and 64GB RAM.
>>>
>>> Gluster version is 3.11.2-1
>>>
>>> Can you please suggest how to troubleshoot further?
>>>
>>> Thanks
>>>
>>> Kashif
>>>
>>
>> --
>> - Atin (atinm)
>>
>
>

[Gluster-users] single brick logging errors endlessly

2017-08-31 Thread Neil Caldwell
Hey gluster experts,

We have a 20-physical-server, replica-2, 40-brick cluster; the
first brick is showing errors such as those in the paste linked below. It's
around a 1PB system which is nearly full.

https://paste.ee/p/Dqdde

This seems to be a file-name-too-long error, as the link path repeats
../folder/ around 30+ times. Any suggestions
as to why this has occurred and what we can do to prevent it from
happening?

It seems to be different folders at the end.

Also, our .glusterfs folders seem to have many broken links, is this
expected?
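
(For reference, one hedged way to list broken symlinks under a brick's
.glusterfs with GNU find; the brick path is a placeholder:)

find /path/to/brick/.glusterfs -xtype l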

Many thanks, Neil

[Gluster-users] Unable to use Heketi setup to install Gluster for Kubernetes

2017-08-31 Thread Gaurav Chhabra
Hi,


I have the following setup in place:

1 node: RancherOS having Rancher application for Kubernetes setup
2 nodes  : RancherOS having Rancher agent
1 node   : CentOS 7 workstation with kubectl installed and the folder
cloned/downloaded from https://github.com/gluster/gluster-kubernetes, using
which I run the Heketi setup (gk-deploy -g)

I also have rancher-glusterfs-server container running with the following
configuration:
--
[root@node-1 rancher]# cat gluster-server.sh
#!/bin/bash

sudo docker run --name=gluster-server -d \
--env 'SERVICE_NAME=gluster' \
--restart always \
--env 'GLUSTER_DATA=/srv/docker/gitlab' \
--publish :22 \
webcenter/rancher-glusterfs-server
--

In /etc/heketi/heketi.json, following is the only modified portion:
--
"executor": "ssh",

"_sshexec_comment": "SSH username and private key file information",
"sshexec": {
  "keyfile": "/var/lib/heketi/.ssh/id_rsa",
  "user": "root",
  "port": "22",
  "fstab": "/etc/fstab"
},
--

Status before running gk-deploy:

[root@workstation deploy]# kubectl get nodes,pods,services,deployments
NAME                                     STATUS    AGE   VERSION
no/node-1.c.kubernetes-174104.internal   Ready     2d    v1.7.2-rancher1
no/node-2.c.kubernetes-174104.internal   Ready     2d    v1.7.2-rancher1
no/node-3.c.kubernetes-174104.internal   Ready     2d    v1.7.2-rancher1

NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
svc/kubernetes   10.43.0.1    <none>        443/TCP   2d


Now when I run 'gk-deploy -g', in the Rancher console I see the following
error:
Readiness probe failed: Failed to get D-Bus connection: Operation not
permitted

From the attached gk-deploy_log I see that it failed at:
Waiting for GlusterFS pods to start ... pods not found.

In the kube-templates/glusterfs-daemonset.yaml file, I see this for the
readinessProbe section:
--
readinessProbe:
  timeoutSeconds: 3
  initialDelaySeconds: 40
  exec:
command:
- "/bin/bash"
- "-c"
- systemctl status glusterd.service
  periodSeconds: 25
  successThreshold: 1
  failureThreshold: 15
--


Status after running gk-deploy:

[root@workstation deploy]# kubectl get nodes,pods,deployments,services
NAME                                     STATUS    AGE   VERSION
no/node-1.c.kubernetes-174104.internal   Ready     2d    v1.7.2-rancher1
no/node-2.c.kubernetes-174104.internal   Ready     2d    v1.7.2-rancher1
no/node-3.c.kubernetes-174104.internal   Ready     2d    v1.7.2-rancher1

NAME                 READY     STATUS    RESTARTS   AGE
po/glusterfs-0s440   0/1       Running   0          1m
po/glusterfs-j7dgr   0/1       Running   0          1m
po/glusterfs-p6jl3   0/1       Running   0          1m

NAME             CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
svc/kubernetes   10.43.0.1    <none>        443/TCP   2d


Also, from a prerequisites perspective, I saw this mentioned:

The following kernel modules must be loaded:
 * dm_snapshot
 * dm_mirror
 * dm_thin_pool

Where exactly is this to be checked? On all Gluster server nodes? How can I
check whether the modules are there?
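
(For reference, a hedged way to check and load them on a typical Linux host
would be:)

lsmod | grep -E 'dm_snapshot|dm_mirror|dm_thin_pool'
modprobe -a dm_snapshot dm_mirror dm_thin_pool   # load any that are missing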

I have attached topology.json and gk-deploy log for reference.

Does this issue have anything to do with the host OS (RancherOS) that I am
using for the Gluster nodes? Any idea how I can fix this? Any help will really
be appreciated.


Thanks.
[root@workstation deploy]# ./gk-deploy -g
Welcome to the deployment tool for GlusterFS on Kubernetes and OpenShift.

Before getting started, this script has some requirements of the execution
environment and of the container platform that you should verify.

The client machine that will run this script must have:
 * Administrative access to an existing Kubernetes or OpenShift cluster
 * Access to a python interpreter 'python'

Each of the nodes that will host GlusterFS must also have appropriate firewall
rules for the required GlusterFS ports:
 *   - sshd (if running GlusterFS in a pod)
 * 24007 - GlusterFS Management
 * 24008 - GlusterFS RDMA
 * 49152 to 49251 - Each brick for every volume on the host requires its own
   port. For every new brick, one new port will be used starting at 49152. We
   recommend a default range of 49152-49251 on each host, though you can adjust
   this to fit your needs.

The following kernel modules must be loaded:
 * dm_snapshot
 * dm_mirror
 * dm_thin_pool

For systems with SELinux, the following settings need to be considered:
 * virt_sandbox_use_fusefs should be enabled on each

Re: [Gluster-users] error msg in the glustershd.log

2017-08-31 Thread Ashish Pandey

BTW, I think it should be in 3.10.1 also. 
We have backported it to 3.10.1 too. 
If possible, upgrade to 3.11.0 and see whether you are still seeing these messages. 

- Original Message -

From: "Ashish Pandey"  
To: "Amudhan P"  
Cc: "Gluster Users"  
Sent: Thursday, August 31, 2017 1:12:02 PM 
Subject: Re: [Gluster-users] error msg in the glustershd.log 


Based on this BZ https://bugzilla.redhat.com/show_bug.cgi?id=1414287 
it has been fixed in glusterfs-3.11.0 

--- 
Ashish 


- Original Message -

From: "Amudhan P"  
To: "Ashish Pandey"  
Cc: "Gluster Users"  
Sent: Thursday, August 31, 2017 1:07:16 PM 
Subject: Re: [Gluster-users] error msg in the glustershd.log 

Ashish, in which version is this issue fixed? 

On Tue, Aug 29, 2017 at 6:38 PM, Amudhan P < amudha...@gmail.com > wrote: 



I am using 3.10.1; from which version is this update available? 


On Tue, Aug 29, 2017 at 5:03 PM, Ashish Pandey < aspan...@redhat.com > wrote: 




Whenever we do some fop on a file on an EC volume, we also check the xattrs to 
see if the file is healthy or not. If not, we trigger heal. 
lookup is the fop for which we don't take the inodelk lock, so it is possible 
that the xattrs we get for a lookup fop differ across bricks. 
This difference is not reliable, but we still trigger heal, and that is 
why you are seeing these messages. 

We have fixed it in the latest release: https://review.gluster.org/16468 
Now, we check whether the fop actually needs heal or not. 
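
For reference, a hedged way to inspect the per-brick EC metadata xattrs
involved in these checks (assuming the standard trusted.ec.* names; the file
path is a placeholder):

getfattr -d -m trusted.ec -e hex /path/to/brick/some/file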

 
Ashish 



From: "Amudhan P" < amudha...@gmail.com > 
To: "Gluster Users" < gluster-users@gluster.org > 
Sent: Tuesday, August 29, 2017 4:47:10 PM 
Subject: [Gluster-users] error msg in the glustershd.log 


Hi , 

I need some clarification on the below error msg in the glustershd.log file. 
What is this msg? Why is it showing up? I am currently using glusterfs 3.10.1. 

Whenever I start a write process to the volume (volume mounted through FUSE), 
I see this kind of error, and the glustershd process consumes some percentage 
of CPU until the write process completes. 

[2017-08-28 10:01:13.030710] W [MSGID: 122006] 
[ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-109: Failed to combine 
iatt (inode: 11548094941524765708-11548094941524765708, links: 1-1, uid: 0-0, 
gid: 0-0, rdev: 0-0, size: 1769963520-1769947136, mode: 100755-100755) 
[2017-08-28 10:01:13.030752] N [MSGID: 122029] 
[ec-generic.c:684:ec_combine_lookup] 0-glustervol-disperse-109: Mismatching 
iatt in answers of 'GF_FOP_LOOKUP' 
[2017-08-28 10:01:13.031127] W [MSGID: 122006] 
[ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-109: Failed to combine 
iatt (inode: 11548094941524765708-11548094941524765708, links: 1-1, uid: 0-0, 
gid: 0-0, rdev: 0-0, size: 1769947136-1769963520, mode: 100755-100755) 
The message "N [MSGID: 122029] [ec-generic.c:684:ec_combine_lookup] 
0-glustervol-disperse-109: Mismatching iatt in answers of 'GF_FOP_LOOKUP'" 
repeated 3 times between [2017-08-28 10:01:13.030752] and [2017-08-28 
10:01:13.031215] 
[2017-08-28 10:01:13.032033] W [MSGID: 122006] 
[ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-109: Failed to combine 
iatt (inode: 11548094941524765708-11548094941524765708, links: 1-1, uid: 0-0, 
gid: 0-0, rdev: 0-0, size: 1769996288-1769979904, mode: 100755-100755) 
[2017-08-28 10:01:13.032425] N [MSGID: 122029] 
[ec-generic.c:684:ec_combine_lookup] 0-glustervol-disperse-109: Mismatching 
iatt in answers of 'GF_FOP_LOOKUP' 
[2017-08-28 10:01:13.032746] W [MSGID: 122006] 
[ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-109: Failed to combine 
iatt (inode: 11548094941524765708-11548094941524765708, links: 1-1, uid: 0-0, 
gid: 0-0, rdev: 0-0, size: 1769996288-1769979904, mode: 100755-100755) 
[2017-08-28 10:01:13.032797] N [MSGID: 122029] 
[ec-generic.c:684:ec_combine_lookup] 0-glustervol-disperse-109: Mismatching 
iatt in answers of 'GF_FOP_LOOKUP' 
[2017-08-28 10:01:13.032983] W [MSGID: 122006] 
[ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-109: Failed to combine 
iatt (inode: 11548094941524765708-11548094941524765708, links: 1-1, uid: 0-0, 
gid: 0-0, rdev: 0-0, size: 1769996288-1769979904, mode: 100755-100755) 
[2017-08-28 10:01:13.033047] N [MSGID: 122029] 
[ec-generic.c:684:ec_combine_lookup] 0-glustervol-disperse-109: Mismatching 
iatt in answers of 'GF_FOP_LOOKUP' 
[2017-08-28 10:01:13.033099] W [MSGID: 122006] 
[ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-109: Failed to combine 
iatt (inode: 11548094941524765708-11548094941524765708, links: 1-1, uid: 0-0, 
gid: 0-0, rdev: 0-0, size: 1769979904-1769996288, mode: 100755-100755) 
[2017-08-28 10:01:13.033176] N [MSGID: 122029] 
[ec-generic.c:684:ec_combine_lookup] 0-glustervol-disperse-109: Mismatching 
iatt in answers of 'GF_FOP_LOOKUP' 


[2017-08-29 11:02:37.054929] W [MSGID: 122006] 
[ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-116: Failed to combine 
iatt (inode: 11207221728611356412-11207221728611356412, links: 1-1, uid: 0-0, 
gid: 0-0, rdev: 0-0, siz

Re: [Gluster-users] error msg in the glustershd.log

2017-08-31 Thread Ashish Pandey

Based on this BZ https://bugzilla.redhat.com/show_bug.cgi?id=1414287 
it has been fixed in glusterfs-3.11.0 

--- 
Ashish 


- Original Message -

From: "Amudhan P"  
To: "Ashish Pandey"  
Cc: "Gluster Users"  
Sent: Thursday, August 31, 2017 1:07:16 PM 
Subject: Re: [Gluster-users] error msg in the glustershd.log 

Ashish, in which version is this issue fixed? 

On Tue, Aug 29, 2017 at 6:38 PM, Amudhan P < amudha...@gmail.com > wrote: 



I am using 3.10.1; from which version is this update available? 


On Tue, Aug 29, 2017 at 5:03 PM, Ashish Pandey < aspan...@redhat.com > wrote: 




Whenever we do some fop on a file on an EC volume, we also check the xattrs to 
see if the file is healthy or not. If not, we trigger heal. 
lookup is the fop for which we don't take the inodelk lock, so it is possible 
that the xattrs we get for a lookup fop differ across bricks. 
This difference is not reliable, but we still trigger heal, and that is 
why you are seeing these messages. 

We have fixed it in the latest release: https://review.gluster.org/16468 
Now, we check whether the fop actually needs heal or not. 

 
Ashish 



From: "Amudhan P" < amudha...@gmail.com > 
To: "Gluster Users" < gluster-users@gluster.org > 
Sent: Tuesday, August 29, 2017 4:47:10 PM 
Subject: [Gluster-users] error msg in the glustershd.log 


Hi , 

I need some clarification on the below error msg in the glustershd.log file. 
What is this msg? Why is it showing up? I am currently using glusterfs 3.10.1. 

Whenever I start a write process to the volume (volume mounted through FUSE), 
I see this kind of error, and the glustershd process consumes some percentage 
of CPU until the write process completes. 

[2017-08-28 10:01:13.030710] W [MSGID: 122006] 
[ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-109: Failed to combine 
iatt (inode: 11548094941524765708-11548094941524765708, links: 1-1, uid: 0-0, 
gid: 0-0, rdev: 0-0, size: 1769963520-1769947136, mode: 100755-100755) 
[2017-08-28 10:01:13.030752] N [MSGID: 122029] 
[ec-generic.c:684:ec_combine_lookup] 0-glustervol-disperse-109: Mismatching 
iatt in answers of 'GF_FOP_LOOKUP' 
[2017-08-28 10:01:13.031127] W [MSGID: 122006] 
[ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-109: Failed to combine 
iatt (inode: 11548094941524765708-11548094941524765708, links: 1-1, uid: 0-0, 
gid: 0-0, rdev: 0-0, size: 1769947136-1769963520, mode: 100755-100755) 
The message "N [MSGID: 122029] [ec-generic.c:684:ec_combine_lookup] 
0-glustervol-disperse-109: Mismatching iatt in answers of 'GF_FOP_LOOKUP'" 
repeated 3 times between [2017-08-28 10:01:13.030752] and [2017-08-28 
10:01:13.031215] 
[2017-08-28 10:01:13.032033] W [MSGID: 122006] 
[ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-109: Failed to combine 
iatt (inode: 11548094941524765708-11548094941524765708, links: 1-1, uid: 0-0, 
gid: 0-0, rdev: 0-0, size: 1769996288-1769979904, mode: 100755-100755) 
[2017-08-28 10:01:13.032425] N [MSGID: 122029] 
[ec-generic.c:684:ec_combine_lookup] 0-glustervol-disperse-109: Mismatching 
iatt in answers of 'GF_FOP_LOOKUP' 
[2017-08-28 10:01:13.032746] W [MSGID: 122006] 
[ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-109: Failed to combine 
iatt (inode: 11548094941524765708-11548094941524765708, links: 1-1, uid: 0-0, 
gid: 0-0, rdev: 0-0, size: 1769996288-1769979904, mode: 100755-100755) 
[2017-08-28 10:01:13.032797] N [MSGID: 122029] 
[ec-generic.c:684:ec_combine_lookup] 0-glustervol-disperse-109: Mismatching 
iatt in answers of 'GF_FOP_LOOKUP' 
[2017-08-28 10:01:13.032983] W [MSGID: 122006] 
[ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-109: Failed to combine 
iatt (inode: 11548094941524765708-11548094941524765708, links: 1-1, uid: 0-0, 
gid: 0-0, rdev: 0-0, size: 1769996288-1769979904, mode: 100755-100755) 
[2017-08-28 10:01:13.033047] N [MSGID: 122029] 
[ec-generic.c:684:ec_combine_lookup] 0-glustervol-disperse-109: Mismatching 
iatt in answers of 'GF_FOP_LOOKUP' 
[2017-08-28 10:01:13.033099] W [MSGID: 122006] 
[ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-109: Failed to combine 
iatt (inode: 11548094941524765708-11548094941524765708, links: 1-1, uid: 0-0, 
gid: 0-0, rdev: 0-0, size: 1769979904-1769996288, mode: 100755-100755) 
[2017-08-28 10:01:13.033176] N [MSGID: 122029] 
[ec-generic.c:684:ec_combine_lookup] 0-glustervol-disperse-109: Mismatching 
iatt in answers of 'GF_FOP_LOOKUP' 


[2017-08-29 11:02:37.054929] W [MSGID: 122006] 
[ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-116: Failed to combine 
iatt (inode: 11207221728611356412-11207221728611356412, links: 1-1, uid: 0-0, 
gid: 0-0, rdev: 0-0, size: 314900480-314867712, mode: 100755-100755) 
[2017-08-29 11:02:37.054981] N [MSGID: 122029] 
[ec-generic.c:684:ec_combine_lookup] 0-glustervol-disperse-116: Mismatching 
iatt in answers of 'GF_FOP_LOOKUP' 
[2017-08-29 11:02:37.055014] W [MSGID: 122006] 
[ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-116: Failed to combine 
iatt (inode: 112072217286

Re: [Gluster-users] error msg in the glustershd.log

2017-08-31 Thread Amudhan P
Ashish, in which version is this issue fixed?

On Tue, Aug 29, 2017 at 6:38 PM, Amudhan P  wrote:

> I am using 3.10.1; from which version is this update available?
>
>
> On Tue, Aug 29, 2017 at 5:03 PM, Ashish Pandey 
> wrote:
>
>>
>> Whenever we do some fop on a file on an EC volume, we also check the xattrs
>> to see if the file is healthy or not. If not, we trigger heal.
>> lookup is the fop for which we don't take the inodelk lock, so it is possible
>> that the xattrs we get for a lookup fop differ across bricks.
>> This difference is not reliable, but we still trigger heal, and that
>> is why you are seeing these messages.
>>
>> We have fixed it in the latest release: https://review.gluster.org/16468
>> Now, we check whether the fop actually needs heal or not.
>>
>> 
>> Ashish
>>
>>
>>
>> *From: *"Amudhan P" 
>> *To: *"Gluster Users" 
>> *Sent: *Tuesday, August 29, 2017 4:47:10 PM
>> *Subject: *[Gluster-users] error msg in the glustershd.log
>>
>>
>> Hi ,
>>
>> I need some clarification on the below error msg in the glustershd.log file.
>> What is this msg? Why is it showing up? I am currently using glusterfs 3.10.1.
>>
>> Whenever I start a write process to the volume (volume mounted through FUSE),
>> I see this kind of error, and the glustershd process consumes some percentage
>> of CPU until the write process completes.
>>
>> [2017-08-28 10:01:13.030710] W [MSGID: 122006]
>> [ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-109: Failed to
>> combine iatt (inode: 11548094941524765708-11548094941524765708, links:
>> 1-1, uid: 0-0, gid: 0-0, rdev: 0-0, size: 1769963520-1769947136, mode:
>> 100755-100755)
>> [2017-08-28 10:01:13.030752] N [MSGID: 122029]
>> [ec-generic.c:684:ec_combine_lookup] 0-glustervol-disperse-109:
>> Mismatching iatt in answers of 'GF_FOP_LOOKUP'
>> [2017-08-28 10:01:13.031127] W [MSGID: 122006]
>> [ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-109: Failed to
>> combine iatt (inode: 11548094941524765708-11548094941524765708, links:
>> 1-1, uid: 0-0, gid: 0-0, rdev: 0-0, size: 1769947136-1769963520, mode:
>> 100755-100755)
>> The message "N [MSGID: 122029] [ec-generic.c:684:ec_combine_lookup]
>> 0-glustervol-disperse-109: Mismatching iatt in answers of 'GF_FOP_LOOKUP'"
>> repeated 3 times between [2017-08-28 10:01:13.030752] and [2017-08-28
>> 10:01:13.031215]
>> [2017-08-28 10:01:13.032033] W [MSGID: 122006]
>> [ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-109: Failed to
>> combine iatt (inode: 11548094941524765708-11548094941524765708, links:
>> 1-1, uid: 0-0, gid: 0-0, rdev: 0-0, size: 1769996288-1769979904, mode:
>> 100755-100755)
>> [2017-08-28 10:01:13.032425] N [MSGID: 122029]
>> [ec-generic.c:684:ec_combine_lookup] 0-glustervol-disperse-109:
>> Mismatching iatt in answers of 'GF_FOP_LOOKUP'
>> [2017-08-28 10:01:13.032746] W [MSGID: 122006]
>> [ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-109: Failed to
>> combine iatt (inode: 11548094941524765708-11548094941524765708, links:
>> 1-1, uid: 0-0, gid: 0-0, rdev: 0-0, size: 1769996288-1769979904, mode:
>> 100755-100755)
>> [2017-08-28 10:01:13.032797] N [MSGID: 122029]
>> [ec-generic.c:684:ec_combine_lookup] 0-glustervol-disperse-109:
>> Mismatching iatt in answers of 'GF_FOP_LOOKUP'
>> [2017-08-28 10:01:13.032983] W [MSGID: 122006]
>> [ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-109: Failed to
>> combine iatt (inode: 11548094941524765708-11548094941524765708, links:
>> 1-1, uid: 0-0, gid: 0-0, rdev: 0-0, size: 1769996288-1769979904, mode:
>> 100755-100755)
>> [2017-08-28 10:01:13.033047] N [MSGID: 122029]
>> [ec-generic.c:684:ec_combine_lookup] 0-glustervol-disperse-109:
>> Mismatching iatt in answers of 'GF_FOP_LOOKUP'
>> [2017-08-28 10:01:13.033099] W [MSGID: 122006]
>> [ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-109: Failed to
>> combine iatt (inode: 11548094941524765708-11548094941524765708, links:
>> 1-1, uid: 0-0, gid: 0-0, rdev: 0-0, size: 1769979904-1769996288, mode:
>> 100755-100755)
>> [2017-08-28 10:01:13.033176] N [MSGID: 122029]
>> [ec-generic.c:684:ec_combine_lookup] 0-glustervol-disperse-109:
>> Mismatching iatt in answers of 'GF_FOP_LOOKUP'
>>
>>
>> [2017-08-29 11:02:37.054929] W [MSGID: 122006]
>> [ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-116: Failed to
>> combine iatt (inode: 11207221728611356412-11207221728611356412, links:
>> 1-1, uid: 0-0, gid: 0-0, rdev: 0-0, size: 314900480-314867712, mode:
>> 100755-100755)
>> [2017-08-29 11:02:37.054981] N [MSGID: 122029]
>> [ec-generic.c:684:ec_combine_lookup] 0-glustervol-disperse-116:
>> Mismatching iatt in answers of 'GF_FOP_LOOKUP'
>> [2017-08-29 11:02:37.055014] W [MSGID: 122006]
>> [ec-combine.c:191:ec_iatt_combine] 0-glustervol-disperse-116: Failed to
>> combine iatt (inode: 11207221728611356412-11207221728611356412, links:
>> 1-1, uid: 0-0, gid: 0-0, rdev: 0-0, size: 314900480-314867712, mode:
>> 100755-100755)
>> [2017-08-29 11:02:37.055072] N [MSGID: 122029]
>> [ec-generic.c:684:ec_combine_lookup] 0-gl

Re: [Gluster-users] Manually delete .glusterfs/changelogs directory ?

2017-08-31 Thread Everton Brogliatto
The Red Hat documentation has a good process for cleaning an unusable
brick:

5.4.4. Cleaning An Unusable Brick
If the file system associated with the brick cannot be reformatted, and the
brick directory cannot be removed,
perform the following steps:
1 Delete all previously existing data in the brick, including the
.glusterfs subdirectory.

2 Run
 # setfattr -x trusted.glusterfs.volume-id brick
  and
 # setfattr -x trusted.gfid brick
  to remove the attributes from the root of the brick.

3 Run
 # getfattr -d -m . brick
  to examine the attributes set on the volume. Take note of the attributes.

4 Run
 # setfattr -x attribute brick
  to remove the attributes relating to the glusterFS file system.
  The trusted.glusterfs.dht attribute for a distributed volume is one such
example of attributes that need to be removed.


https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/pdf/administration_guide/Red_Hat_Gluster_Storage-3.2-Administration_Guide-en-US.pdf#%23Formatting_and_Mounting_Bricks-Reusing_a_brick
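
Put together, a hedged sketch of that sequence (the brick path is
hypothetical; adapt it to your layout):

BRICK=/mnt/brick1/gv0   # hypothetical brick root
rm -rf "$BRICK"/* "$BRICK"/.glusterfs
setfattr -x trusted.glusterfs.volume-id "$BRICK"
setfattr -x trusted.gfid "$BRICK"
getfattr -d -m . "$BRICK"                    # note the remaining attributes
setfattr -x trusted.glusterfs.dht "$BRICK"   # e.g. remove the DHT attribute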

Regards,
Everton Brogliatto

On Thu, Aug 31, 2017 at 2:56 PM, Everton Brogliatto 
wrote:

> Hi Mabi,
>
> If you will not use that geo-replication volume session again, I believe
> it is safe to manually delete the files in the brick directory using rm -rf.
>
> However, the gluster documentation specifies that if the session is to be
> permanently deleted, this is the command to use:
> gluster volume geo-replication gv1 snode1::gv2 delete reset-sync-time
>
> https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Geo%20Replication/#deleting-the-session
>
> Regards,
> Everton Brogliatto
>
>
>
>
>
>
> On Thu, Aug 31, 2017 at 12:15 AM, mabi  wrote:
>
>> Hi, has anyone any advice to give about my question below? Thanks!
>>
>>
>>
>>  Original Message 
>> Subject: Manually delete .glusterfs/changelogs directory ?
>> Local Time: August 16, 2017 5:59 PM
>> UTC Time: August 16, 2017 3:59 PM
>> From: m...@protonmail.ch
>> To: Gluster Users 
>>
>> Hello,
>>
>> I just deleted (permanently) my geo-replication session using the
>> following command:
>>
>> gluster volume geo-replication myvolume gfs1geo.domain.tld::myvolume-geo
>> delete
>>
>>
>>
>> and noticed that the .glusterfs/changelogs on my volume still exists. Is
>> it safe to delete the whole directory myself with "rm -rf
>> .glusterfs/changelogs" ? As far as I understand the CHANGELOG.* files are
>> only needed for geo-replication, correct?
>>
>> Finally shouldn't the geo-replication delete command I used above delete
>> these files automatically for me?
>>
>> Regards,
>> Mabi
>>
>>
>>