Re: [Gluster-users] [Gluster-devel] Network Block device (NBD) on top of glusterfs

2019-03-21 Thread Prasanna Kalever
On Thu, Mar 21, 2019 at 6:31 PM Xiubo Li  wrote:

> On 2019/3/21 18:09, Prasanna Kalever wrote:
>
>
>
> On Thu, Mar 21, 2019 at 9:00 AM Xiubo Li  wrote:
>
>> All,
>>
>> I am one of the contributors to the gluster-block
>> [1] project, and I also
>> contribute to the linux kernel and the open-iscsi
>> project. [2]
>>
>> NBD has been around for some time, but recently the linux kernel’s Network
>> Block Device (NBD) driver was enhanced to work with more devices, and the
>> option to integrate with netlink was added. So, I recently tried to provide a
>> glusterfs client based NBD driver. Please refer to github issue
>> #633 [3], and the good news
>> is that I have working code, with the most basic things, at the nbd-runner
>> project [4].
>>
>> While this email is about announcing the project and asking for more
>> collaboration, I would also like to discuss the placement of the
>> project itself. Currently the nbd-runner project is expected to be shared by
>> our friends at the Ceph project too, to provide an NBD driver for Ceph. I have
>> personally worked closely with some of them while contributing to the
>> open-iSCSI project, and we would like to take this project to great success.
>>
>> Now a few questions:
>>
>>    1. Can I continue to use http://github.com/gluster/nbd-runner as the home
>>    for this project, even if it's shared by other filesystem projects?
>>
>>
>>- I personally am fine with this.
>>
>>
>>    2. Should there be a separate organization for this repo?
>>
>>
>>    - While it may make sense in future, for now, I am not planning to
>>    start anything new.
>>
>> It would be great to reach some consensus on this soon, as nbd-runner is
>> a new repository. If there are no concerns, I will continue to contribute
>> to the existing repository.
>>
>
> Thanks Xiubo Li, for finally sending this email out. Since this email is
> out on gluster mailing list, I would like to take a stand from gluster
> community point of view *only* and share my views.
>
> My honest answer is "If we want to maintain this within gluster org, then
> 80% of the effort is common/duplicate of what we did all these days with
> gluster-block",
>
> The great idea came from Mike Christie days ago, and the nbd-runner
> project's framework was initially modeled on tcmu-runner. This is why I
> named this project nbd-runner; it will work for all the other
> distributed storages, such as Gluster/Ceph/Azure, as discussed with Mike
> before.
>
> nbd-runner (NBD proto) and tcmu-runner (iSCSI proto) are almost the same:
> both handle the lower-level IO (READ/WRITE/...), not the management
> layer that ceph-iscsi-gateway and gluster-block currently provide.
>
> Currently, since I have only implemented the Gluster handler, and am also
> using RPC the way glusterfs and gluster-block do, most of the other code (about 70%) in
> nbd-runner is for the NBD proto. That code is very different from the
> tcmu-runner/glusterfs/gluster-block projects, and there are many new
> features in the NBD module that are not yet supported, so it will diverge
> further in future.
>
> The framework coding has been done, and the nbd-runner project is already
> stable and already works well for me.
>
> like:
> * rpc/socket code
> * cli/daemon parser/helper logics
> * gfapi util functions
> * logger framework
> * inotify & dyn-config threads
>
> Yeah, these features originally came from the tcmu-runner project, which
> Mike and I coded two years ago. nbd-runner has now copied them from
> tcmu-runner.
>

I don't think tcmu-runner has any of:

-> cli/daemon approach routines
-> rpc low-level clnt/svc routines
-> gfapi level file create/delete util functions
-> Json parser support
-> socket bound/listener related functionalities
-> autoMake build frame-work, and
-> many other maintenance files

I actually can go into detail and furnish a long list of the references made
here, and you cannot deny the fact, but it's **all okay** to take references
from other, similar projects. My intention was not to point out the copying
made here, but rather to say that we are just wasting our effort rewriting,
copy-pasting, maintaining and fixing the same functionality/framework.

Again, all I'm trying to say is: if at all you want to maintain an nbd client
as part of gluster.org, why not use gluster-block itself, which is well
tested and stable enough?

Apart from all the examples I have mentioned in my previous thread, there
are other great advantages from the user's perspective as well, such as:

* The top layers, such as heketi, consuming gluster's block storage really
don't have to care whether the backend provider is tcmu-runner,
nbd-runner, qemu-tcmu, kernel loopback, fileIO or something else ...
They simply call gluster-block and get a block device out of it.

* We can reuse the existing gluster-block REST API interface too.


** Believe me, over the years I 

Re: [Gluster-users] "rpc_clnt_ping_timer_expired" errors

2019-03-21 Thread Mauro Tridici
Do you think that I made some mistake during the “top -bHd 3 > top_bHd3.txt”
command execution? (I executed the top command and interrupted it after some
seconds, maybe 10 seconds.)
Or do you mean that there is something wrong with the gluster services?

Thank you,
Mauro

> On 21 Mar 2019, at 11:48, Raghavendra Gowdappa  wrote:
> 
> 
> 
> On Thu, Mar 21, 2019 at 4:10 PM Mauro Tridici  > wrote:
> Hi Raghavendra,
> 
> the number of errors reduced, but during last days I received some error 
> notifications from Nagios server similar to the following one:
> 
> * Nagios *
> 
> Notification Type: PROBLEM
> 
> Service: Brick - /gluster/mnt5/brick
> Host: s04
> Address: s04-stg
> State: CRITICAL
> 
> Date/Time: Mon Mar 18 19:56:36 CET 2019
> 
> Additional Info:
> 
> CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
> 
> The error was related only to s04 gluster server.
> 
> So, following your suggestions,  I executed, on s04 node, the top command.
> In attachment, you can find the related output.
> 
> The top output doesn't contain cmd/thread names. Was there anything wrong?
> 
> 
> Thank you very much for your help.
> Regards,
> Mauro
> 
> 
> 
>> On 14 Mar 2019, at 13:31, Raghavendra Gowdappa > > wrote:
>> 
>> Thanks Mauro.
>> 
>> On Thu, Mar 14, 2019 at 3:38 PM Mauro Tridici > > wrote:
>> Hi Raghavendra,
>> 
>> I just changed the client option value to 8.
>> I will check the volume behaviour during the next hours.
>> 
>> The GlusterFS version is 3.12.14.
>> 
>> I will provide you the logs as soon as the activity load will be high.
>> Thank you,
>> Mauro
>> 
>>> On 14 Mar 2019, at 04:57, Raghavendra Gowdappa >> > wrote:
>>> 
>>> 
>>> 
>>> On Wed, Mar 13, 2019 at 3:55 PM Mauro Tridici >> > wrote:
>>> Hi Raghavendra,
>>> 
>>> Yes, server.event-thread has been changed from 4 to 8.
>>> 
>>> Was the client.event-thread value also changed to 8? If not, I would like to
>>> know the results after including this tuning too. Also, if possible, can you
>>> get the output of the following command from the problematic clients and bricks
>>> (while the load tends to be high and ping-timer-expiry is
>>> seen)?
>>> 
>>> # top -bHd 3
>>>  
>>> This will help us know the CPU utilization of the event-threads.
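As a rough illustration of what that output gives us, the sketch below sums per-thread %CPU from `top -bH` batch output. The sample lines and thread names are made up for illustration (glusterfs worker threads often carry a `glfs_` prefix), and the exact column layout can vary between procps versions.

```python
# Sketch: summarize per-thread CPU usage from "top -bHd 3" batch output.
# The sample below is fabricated for illustration only.
from collections import defaultdict

sample = """\
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 1234 root      20   0  600000  50000  10000 S 45.0  0.1   1:23.45 glfs_epoll000
 1235 root      20   0  600000  50000  10000 R 52.0  0.1   1:25.10 glfs_epoll001
 1236 root      20   0  600000  50000  10000 S  3.0  0.1   0:05.00 glfs_iotwr0
"""

def cpu_by_thread(text):
    """Return {thread name: accumulated %CPU} from one top -bH frame."""
    usage = defaultdict(float)
    lines = text.splitlines()
    header = lines[0].split()
    cpu_i = header.index("%CPU")
    cmd_i = header.index("COMMAND")
    for line in lines[1:]:
        parts = line.split()
        if len(parts) <= cmd_i:
            continue  # skip blank or truncated lines
        usage[parts[cmd_i]] += float(parts[cpu_i])
    return dict(usage)

print(cpu_by_thread(sample))
```

Busy event-threads (near 100% each) would suggest raising `client.event-threads`/`server.event-threads`, which is exactly what this thread is tuning.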
>>> 
>>> And I forgot to ask, what version of Glusterfs are you using?
>>> 
>>> During the last days, I noticed that the error events are still here, although
>>> they have been considerably reduced.
>>>
>>> So, I used grep against the log files in order to give you a
>>> global view of the warning, error and critical events that appeared today
>>> at 06:xx (I hope it may be useful).
>>> I collected the info from the s06 gluster server, but the behaviour is
>>> almost the same on the other gluster servers.
>>> 
>>> ERRORS:  
>>> CWD: /var/log/glusterfs 
>>> COMMAND: grep " E " *.log |grep "2019-03-13 06:"
>>> 
>>> (I can see a lot of this kind of message in the same period but I'm 
>>> notifying you only one record for each type of error)
>>> 
>>> glusterd.log:[2019-03-13 06:12:35.982863] E [MSGID: 101042] 
>>> [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of 
>>> /var/run/gluster/tier2_quota_list/
>>> 
>>> glustershd.log:[2019-03-13 06:14:28.666562] E 
>>> [rpc-clnt.c:350:saved_frames_unwind] (--> 
>>> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a71ddcebb] (--> 
>>> /lib64/libgfr
>>> pc.so.0(saved_frames_unwind+0x1de)[0x7f4a71ba1d9e] (--> 
>>> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4a71ba1ebe] (--> 
>>> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup
>>> +0x90)[0x7f4a71ba3640] (--> 
>>> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f4a71ba4130] ) 
>>> 0-tier2-client-55: forced unwinding frame type(GlusterFS 3.3) 
>>> op(INODELK(29)) 
>>> called at 2019-03-13 06:14:14.858441 (xid=0x17fddb50) 
>>> 
>>> glustershd.log:[2019-03-13 06:17:48.883825] E 
>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to 
>>> 192.168.0.55:49158 failed (Connection timed out); disconnecting socket
>>> glustershd.log:[2019-03-13 06:19:58.931798] E 
>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to 
>>> 192.168.0.55:49158 failed (Connection timed out); disconnecting socket
>>> glustershd.log:[2019-03-13 06:22:08.979829] E 
>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to 
>>> 192.168.0.55:49158 failed (Connection timed out); disconnecting socket
>>> glustershd.log:[2019-03-13 06:22:36.226847] E [MSGID: 114031] 
>>> [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote 
>>> operation failed [Transport endpoint 
>>> is not connected]
>>> glustershd.log:[2019-03-13 06:22:36.306669] E [MSGID: 114031] 
>>> [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 

Re: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain

2019-03-21 Thread Karthik Subrahmanya
Hey Milos,

I see that the gfid got healed for those directories from the getfattr output,
and the glfsheal log also has messages corresponding to deleting the
entries on one brick as part of healing, which then got recreated on the
brick with the correct gfid. Can you run the "gluster volume heal <VOLNAME>"
& "gluster volume heal <VOLNAME> info" commands and paste the output here?
If you still see entries pending heal, give the latest glustershd.log files
from both the nodes along with the getfattr output of the files which are
listed in the heal info output.

Regards,
Karthik
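For readers following along, a minimal sketch of how the "heal info" output quoted later in this thread can be summarized per brick; the `<gfid:...>` entry lines are placeholders (the real output lists gfids or paths), and a fully healed volume would report 0 entries on every brick.

```python
# Sketch: count pending heal entries per brick from
# "gluster volume heal <VOLNAME> info" output.
# Sample adapted from this thread; <gfid:...> lines are placeholders.
import re

sample = """\
Brick storage3:/data/data-cluster
<gfid:...>
<gfid:...>
/dms/final_archive - Possibly undergoing heal

Status: Connected
Number of entries: 3

Brick storage4:/data/data-cluster
<gfid:...>
/dms/final_archive - Possibly undergoing heal

Status: Connected
Number of entries: 2
"""

def entries_per_brick(text):
    """Return {brick: pending entry count} parsed from heal info output."""
    result = {}
    brick = None
    for line in text.splitlines():
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        m = re.match(r"Number of entries: (\d+)", line)
        if m and brick:
            result[brick] = int(m.group(1))
    return result

print(entries_per_brick(sample))
```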

On Thu, Mar 21, 2019 at 6:03 PM Milos Cuculovic  wrote:

> Sure:
>
> brick1:
> 
> 
> sudo getfattr -d -m . -e hex
> /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59
> getfattr: Removing leading '/' from absolute path names
> # file:
> data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59
> trusted.gfid=0xe358ff34504241d387efe1e76eb28bb0
> trusted.glusterfs.dht=0x0001
> trusted.glusterfs.dht.mds=0x
>
> sudo getfattr -d -m . -e hex
> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617
> getfattr: Removing leading '/' from absolute path names
> # file:
> data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617
> trusted.gfid=0x33585a577c4a4c55b39b9abc07eacff1
> trusted.glusterfs.dht=0x0001
> trusted.glusterfs.dht.mds=0x
>
> sudo getfattr -d -m . -e hex
> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf
> getfattr: Removing leading '/' from absolute path names
> # file:
> data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf
> trusted.gfid=0x99cdfe0773d74d7ab29592ce40cf47f3
> trusted.glusterfs.dht=0x0001
> trusted.glusterfs.dht.mds=0x
>
> sudo getfattr -d -m . -e hex
> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b
> getfattr: Removing leading '/' from absolute path names
> # file:
> data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b
> trusted.gfid=0x8a96a80ddf8a4163ac00bb12369389df
> trusted.glusterfs.dht=0x0001
> trusted.glusterfs.dht.mds=0x
>
> sudo getfattr -d -m . -e hex
> /data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b
> getfattr: Removing leading '/' from absolute path names
> # file:
> data/data-cluster/dms/final_archive/e5934699809a3b6dcfc5945f408b978b
> trusted.gfid=0x04d087a96cc6421ca7bc356fc20d7753
> trusted.glusterfs.dht=0x0001
> trusted.glusterfs.dht.mds=0x
>
> sudo getfattr -d -m . -e hex
> /data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e
> getfattr: Removing leading '/' from absolute path names
> # file:
> data/data-cluster/dms/final_archive/e7cdc94f60d390812a5f9754885e119e
> trusted.gfid=0xdee76d06204d4bd9b7652eb7a79be1a5
> trusted.glusterfs.dht=0x0001
> trusted.glusterfs.dht.mds=0x
> 
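For readers decoding these attributes: `trusted.gfid` is a 16-byte UUID identifying the inode cluster-wide, and each `trusted.afr.<client>` value packs three big-endian 32-bit counters (pending data, metadata and entry operations against that brick). The afr hex values in this thread appear truncated by the mail archive, so the sketch below uses a full-length illustrative value; the gfid is the real one from the first entry above.

```python
# Sketch: decode GlusterFS replication xattrs as shown by
# "getfattr -d -m . -e hex <path>".
import struct
import uuid

def decode_afr(hex_value):
    """trusted.afr.<client>: three network-byte-order uint32 counters."""
    raw = bytes.fromhex(hex_value.removeprefix("0x"))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}

def decode_gfid(hex_value):
    """trusted.gfid: a 16-byte UUID."""
    return str(uuid.UUID(hex=hex_value.removeprefix("0x")))

# Illustrative afr value (not from this thread): 16 pending entry ops.
print(decode_afr("0x000000000000000000000010"))
# Real gfid from the first directory listed above.
print(decode_gfid("0xe358ff34504241d387efe1e76eb28bb0"))
```

A non-zero entry counter on a directory, as in this thread, means entry self-heal is pending against the peer brick.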
> sudo stat
> /data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59
>   File:
> '/data/data-cluster/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59'
>   Size: 33Blocks: 0  IO Block: 4096   directory
> Device: 807h/2055d Inode: 40809094709  Links: 3
> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
> Access: 2019-03-20 11:06:26.994047597 +0100
> Modify: 2019-03-20 11:28:28.294689870 +0100
> Change: 2019-03-21 13:01:03.077654239 +0100
>  Birth: -
>
> sudo stat
> /data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617
>   File:
> '/data/data-cluster/dms/final_archive/5543982fab4b56060aa09f667a8ae617'
>   Size: 33Blocks: 0  IO Block: 4096   directory
> Device: 807h/2055d Inode: 49399908865  Links: 3
> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
> Access: 2019-03-20 11:07:20.342140927 +0100
> Modify: 2019-03-20 11:28:28.318690015 +0100
> Change: 2019-03-21 13:01:03.133654344 +0100
>  Birth: -
>
> sudo stat
> /data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf
>   File:
> '/data/data-cluster/dms/final_archive/a8b7f31775eebc8d1867e7f9de7b6eaf'
>   Size: 33Blocks: 0  IO Block: 4096   directory
> Device: 807h/2055d Inode: 53706303549  Links: 3
> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
> Access: 2019-03-20 11:06:55.414097315 +0100
> Modify: 2019-03-20 11:28:28.362690281 +0100
> Change: 2019-03-21 13:01:03.141654359 +0100
>  Birth: -
>
> sudo stat
> /data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b
>   File:
> '/data/data-cluster/dms/final_archive/c1d3f3c2d7ae90e891e671e2f20d5d4b'
>   Size: 33Blocks: 0  IO Block: 4096   directory
> Device: 807h/2055d Inode: 57990935591  Links: 3
> 

Re: [Gluster-users] [Gluster-devel] Network Block device (NBD) on top of glusterfs

2019-03-21 Thread Xiubo Li

On 2019/3/21 18:09, Prasanna Kalever wrote:



On Thu, Mar 21, 2019 at 9:00 AM Xiubo Li wrote:


All,

I am one of the contributors to the gluster-block
[1] project, and I also
contribute to the linux kernel and the open-iscsi
[2] project.

NBD has been around for some time, but recently the linux kernel’s
Network Block Device (NBD) driver was enhanced to work with more
devices, and the option to integrate with netlink was added.
So, I recently tried to provide a glusterfs client based NBD driver.
Please refer to github issue #633
[3], and the good
news is that I have working code, with the most basic things, at the
nbd-runner project [4].

While this email is about announcing the project and asking for
more collaboration, I would also like to discuss the
placement of the project itself. Currently the nbd-runner project is
expected to be shared by our friends at the Ceph project too, to
provide an NBD driver for Ceph. I have personally worked closely with
some of them while contributing to the open-iSCSI project, and we
would like to take this project to great success.

Now a few questions:

 1. Can I continue to use http://github.com/gluster/nbd-runner as
the home for this project, even if it's shared by other filesystem
projects?

  * I personally am fine with this.

 2. Should there be a separate organization for this repo?

  * While it may make sense in future, for now, I am not planning
to start anything new.

It would be great if we have some consensus on this soon as
nbd-runner is a new repository. If there are no concerns, I will
continue to contribute to the existing repository.


Thanks Xiubo Li, for finally sending this email out. Since this email 
is out on gluster mailing list, I would like to take a stand from 
gluster community point of view *only* and share my views.


My honest answer is "If we want to maintain this within gluster org, 
then 80% of the effort is common/duplicate of what we did all these 
days with gluster-block",


The great idea came from Mike Christie days ago, and the nbd-runner
project's framework was initially modeled on tcmu-runner. This is why
I named this project nbd-runner; it will work for all the other
distributed storages, such as Gluster/Ceph/Azure, as discussed with Mike
before.


nbd-runner (NBD proto) and tcmu-runner (iSCSI proto) are almost the same:
both handle the lower-level IO (READ/WRITE/...), not the
management layer that ceph-iscsi-gateway and gluster-block currently provide.


Currently, since I have only implemented the Gluster handler, and am also
using RPC the way glusterfs and gluster-block do, most of the other code
(about 70%) in nbd-runner is for the NBD proto. That code is very different
from the tcmu-runner/glusterfs/gluster-block projects, and there are many
new features in the NBD module that are not yet supported, so it will
diverge further in future.


The framework coding has been done, and the nbd-runner project is already
stable and already works well for me.




like:
* rpc/socket code
* cli/daemon parser/helper logics
* gfapi util functions
* logger framework
* inotify & dyn-config threads


Yeah, these features originally came from the tcmu-runner project, which
Mike and I coded two years ago. nbd-runner has now copied them from
tcmu-runner.


I very much appreciate your great ideas here, Prasanna, and I hope
nbd-runner can be used more generically and successfully in future.


BRs

Xiubo Li



* configure/Makefile/specfiles
* docs, AboutGluster, etc.

The repository gluster-block is actually a home for all the block
related stuff within gluster, and it's designed to accommodate alike
functionalities; if I were you, I would have simply copied nbd-runner.c
into https://github.com/gluster/gluster-block/tree/master/daemon/ just
like ceph plays it here
https://github.com/ceph/ceph/blob/master/src/tools/rbd_nbd/rbd-nbd.cc
and be done.


Advantages of keeping the nbd client within gluster-block:
-> No worry about the maintenance code burden
-> No worry about monitoring a new component
-> Shipping packages to fedora/centos/rhel is already handled
-> This helps improve and stabilize the current gluster-block framework
-> We can build a common CI
-> We can reuse a common test framework, etc.

If you have an impression that gluster-block is for management, then I 
would really want to correct you at this point.


Some of my near future plans for gluster-block:
* Allow exporting blocks with FUSE access via fileIO backstore to 
improve large-file workloads, draft: 
https://github.com/gluster/gluster-block/pull/58

* Accommodate kernel loopback handling for local only applications
* The same way we can accommodate nbd app/client, and IMHO this effort 

Re: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain

2019-03-21 Thread Karthik Subrahmanya
Can you give me the stat & getfattr output of all those 6 entries from both
the bricks, and the glfsheal-<VOLNAME>.log file from the node where you run
this command?
Meanwhile, can you also try running this with the source-brick option?

On Thu, Mar 21, 2019 at 5:22 PM Milos Cuculovic  wrote:

> Thank you Karthik,
>
> I have run this for all files (see example below) and it says the file is
> not in split-brain:
>
> sudo gluster volume heal storage2 split-brain latest-mtime
> /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59
> Healing /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 failed: File
> not in split-brain.
> Volume heal failed.
>
>
> - Kindest regards,
>
> Milos Cuculovic
> IT Manager
>
> ---
> MDPI AG
> Postfach, CH-4020 Basel, Switzerland
> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
> Tel. +41 61 683 77 35
> Fax +41 61 302 89 18
> Email: cuculo...@mdpi.com 
> Skype: milos.cuculovic.mdpi
>
> Disclaimer: The information and files contained in this message
> are confidential and intended solely for the use of the individual or
> entity to whom they are addressed. If you have received this message in
> error, please notify me and delete this message from your system. You may
> not copy this message in its entirety or in part, or disclose its contents
> to anyone.
>
> On 21 Mar 2019, at 12:36, Karthik Subrahmanya  wrote:
>
> Hi Milos,
>
> Thanks for the logs and the getfattr output.
> From the logs I can see that there are 6 entries under the
> directory "/data/data-cluster/dms/final_archive" named
> 41be9ff5ec05c4b1c989c6053e709e59
> 5543982fab4b56060aa09f667a8ae617
> a8b7f31775eebc8d1867e7f9de7b6eaf
> c1d3f3c2d7ae90e891e671e2f20d5d4b
> e5934699809a3b6dcfc5945f408b978b
> e7cdc94f60d390812a5f9754885e119e
> which have a gfid mismatch, so the heal is failing on this directory.
>
> You can use the CLI option to resolve these files from gfid mismatch. You
> can use any of the 3 methods available:
> 1. bigger-file
> gluster volume heal <VOLNAME> split-brain bigger-file <FILE>
>
> 2. latest-mtime
> gluster volume heal <VOLNAME> split-brain latest-mtime <FILE>
>
> 3. source-brick
> gluster volume heal <VOLNAME> split-brain source-brick
> <HOSTNAME:BRICKNAME> <FILE>
>
> where <FILE> must be the absolute path w.r.t. the volume, starting with '/'.
> If all those entries are directories then go for either
> latest-mtime/source-brick option.
> After you resolve all these gfid-mismatches, run the "gluster volume heal
> <VOLNAME>" command. Then check the heal info and let me know the result.
>
> Regards,
> Karthik
>
> On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic 
> wrote:
>
>> Sure, thank you for following up.
>>
>> About the commands, here is what I see:
>>
>> brick1:
>> —
>> sudo gluster volume heal storage2 info
>> Brick storage3:/data/data-cluster
>> 
>> 
>> /dms/final_archive - Possibly undergoing heal
>>
>> Status: Connected
>> Number of entries: 3
>>
>> Brick storage4:/data/data-cluster
>> 
>> /dms/final_archive - Possibly undergoing heal
>>
>> Status: Connected
>> Number of entries: 2
>> —
>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive
>> getfattr: Removing leading '/' from absolute path names
>> # file: data/data-cluster/dms/final_archive
>> trusted.afr.dirty=0x
>> trusted.afr.storage2-client-1=0x0010
>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87
>> trusted.glusterfs.dht=0x0001
>> trusted.glusterfs.dht.mds=0x
>> —
>> stat /data/data-cluster/dms/final_archive
>>   File: '/data/data-cluster/dms/final_archive'
>>   Size: 3497984   Blocks: 8768   IO Block: 4096   directory
>> Device: 807h/2055d Inode: 26427748396  Links: 72123
>> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
>> Access: 2018-10-09 04:22:40.514629044 +0200
>> Modify: 2019-03-21 11:55:37.382278863 +0100
>> Change: 2019-03-21 11:55:37.382278863 +0100
>>  Birth: -
>> —
>> —
>>
>> brick2:
>> —
>> sudo gluster volume heal storage2 info
>> Brick storage3:/data/data-cluster
>> 
>> 
>> /dms/final_archive - Possibly undergoing heal
>>
>> Status: Connected
>> Number of entries: 3
>>
>> Brick storage4:/data/data-cluster
>> 
>> /dms/final_archive - Possibly undergoing heal
>>
>> Status: Connected
>> Number of entries: 2
>> —
>> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive
>> getfattr: Removing leading '/' from absolute path names
>> # file: data/data-cluster/dms/final_archive
>> trusted.afr.dirty=0x
>> trusted.afr.storage2-client-0=0x0001
>> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87
>> trusted.glusterfs.dht=0x0001
>> trusted.glusterfs.dht.mds=0x
>> —
>> stat /data/data-cluster/dms/final_archive
>>   

Re: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain

2019-03-21 Thread Milos Cuculovic
Thank you Karthik,

I have run this for all files (see example below) and it says the file is not 
in split-brain:

sudo gluster volume heal storage2 split-brain latest-mtime  
/dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59
Healing /dms/final_archive/41be9ff5ec05c4b1c989c6053e709e59 failed: File not in 
split-brain.
Volume heal failed.


- Kindest regards,

Milos Cuculovic
IT Manager

---
MDPI AG
Postfach, CH-4020 Basel, Switzerland
Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
Tel. +41 61 683 77 35
Fax +41 61 302 89 18
Email: cuculo...@mdpi.com
Skype: milos.cuculovic.mdpi

Disclaimer: The information and files contained in this message are 
confidential and intended solely for the use of the individual or entity to 
whom they are addressed. If you have received this message in error, please 
notify me and delete this message from your system. You may not copy this 
message in its entirety or in part, or disclose its contents to anyone.

> On 21 Mar 2019, at 12:36, Karthik Subrahmanya  wrote:
> 
> Hi Milos,
> 
> Thanks for the logs and the getfattr output.
> From the logs I can see that there are 6 entries under the directory 
> "/data/data-cluster/dms/final_archive" named
> 41be9ff5ec05c4b1c989c6053e709e59
> 5543982fab4b56060aa09f667a8ae617
> a8b7f31775eebc8d1867e7f9de7b6eaf
> c1d3f3c2d7ae90e891e671e2f20d5d4b
> e5934699809a3b6dcfc5945f408b978b
> e7cdc94f60d390812a5f9754885e119e
> which have a gfid mismatch, so the heal is failing on this directory.
> 
> You can use the CLI option to resolve these files from gfid mismatch. You can 
> use any of the 3 methods available:
> 1. bigger-file
> gluster volume heal <VOLNAME> split-brain bigger-file <FILE>
> 
> 2. latest-mtime
> gluster volume heal <VOLNAME> split-brain latest-mtime <FILE>
> 
> 3. source-brick
> gluster volume heal <VOLNAME> split-brain source-brick
> <HOSTNAME:BRICKNAME> <FILE>
> 
> where <FILE> must be the absolute path w.r.t. the volume, starting with '/'.
> If all those entries are directories then go for either 
> latest-mtime/source-brick option.
> After you resolve all these gfid-mismatches, run the "gluster volume heal
> <VOLNAME>" command. Then check the heal info and let me know the result.
> 
> Regards,
> Karthik
> 
> On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic  > wrote:
> Sure, thank you for following up.
> 
> About the commands, here is what I see:
> 
> brick1:
> —
> sudo gluster volume heal storage2 info
> Brick storage3:/data/data-cluster
>  
>  
> /dms/final_archive - Possibly undergoing heal
> 
> Status: Connected
> Number of entries: 3
> 
> Brick storage4:/data/data-cluster
>  
> /dms/final_archive - Possibly undergoing heal
> 
> Status: Connected
> Number of entries: 2
> —
> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive
> getfattr: Removing leading '/' from absolute path names
> # file: data/data-cluster/dms/final_archive
> trusted.afr.dirty=0x
> trusted.afr.storage2-client-1=0x0010
> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87
> trusted.glusterfs.dht=0x0001
> trusted.glusterfs.dht.mds=0x
> —
> stat /data/data-cluster/dms/final_archive
>   File: '/data/data-cluster/dms/final_archive'
>   Size: 3497984   Blocks: 8768   IO Block: 4096   directory
> Device: 807h/2055d  Inode: 26427748396  Links: 72123
> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
> Access: 2018-10-09 04:22:40.514629044 +0200
> Modify: 2019-03-21 11:55:37.382278863 +0100
> Change: 2019-03-21 11:55:37.382278863 +0100
>  Birth: -
> —
> —
> 
> brick2:
> —
> sudo gluster volume heal storage2 info
> Brick storage3:/data/data-cluster
>  
>  
> /dms/final_archive - Possibly undergoing heal
> 
> Status: Connected
> Number of entries: 3
> 
> Brick storage4:/data/data-cluster
>  
> /dms/final_archive - Possibly undergoing heal
> 
> Status: Connected
> Number of entries: 2
> —
> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive
> getfattr: Removing leading '/' from absolute path names
> # file: data/data-cluster/dms/final_archive
> trusted.afr.dirty=0x
> trusted.afr.storage2-client-0=0x0001
> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87
> trusted.glusterfs.dht=0x0001
> trusted.glusterfs.dht.mds=0x
> —
> stat /data/data-cluster/dms/final_archive
>   File: '/data/data-cluster/dms/final_archive'
>   Size: 3497984   Blocks: 8760   IO Block: 4096   directory
> Device: 807h/2055d  Inode: 13563551265  Links: 72124
> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
> Access: 2018-10-09 04:22:40.514629044 +0200
> Modify: 2019-03-21 11:55:46.382565124 +0100
> Change: 

Re: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain

2019-03-21 Thread Karthik Subrahmanya
Hi Milos,

Thanks for the logs and the getfattr output.
From the logs I can see that there are 6 entries under the
directory "/data/data-cluster/dms/final_archive" named
41be9ff5ec05c4b1c989c6053e709e59
5543982fab4b56060aa09f667a8ae617
a8b7f31775eebc8d1867e7f9de7b6eaf
c1d3f3c2d7ae90e891e671e2f20d5d4b
e5934699809a3b6dcfc5945f408b978b
e7cdc94f60d390812a5f9754885e119e
which have a gfid mismatch, so the heal is failing on this directory.

You can use the CLI option to resolve these files from gfid mismatch. You
can use any of the 3 methods available:
1. bigger-file
gluster volume heal <VOLNAME> split-brain bigger-file <FILE>

2. latest-mtime
gluster volume heal <VOLNAME> split-brain latest-mtime <FILE>

3. source-brick
gluster volume heal <VOLNAME> split-brain source-brick
<HOSTNAME:BRICKNAME> <FILE>

where <FILE> must be the absolute path w.r.t. the volume, starting with '/'.
If all those entries are directories then go for either
latest-mtime/source-brick option.
After you resolve all these gfid-mismatches, run the "gluster volume heal
<VOLNAME>" command. Then check the heal info and let me know the result.

Regards,
Karthik

On Thu, Mar 21, 2019 at 4:27 PM Milos Cuculovic  wrote:

> Sure, thank you for following up.
>
> About the commands, here is what I see:
>
> brick1:
> —
> sudo gluster volume heal storage2 info
> Brick storage3:/data/data-cluster
> 
> 
> /dms/final_archive - Possibly undergoing heal
>
> Status: Connected
> Number of entries: 3
>
> Brick storage4:/data/data-cluster
> 
> /dms/final_archive - Possibly undergoing heal
>
> Status: Connected
> Number of entries: 2
> —
> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive
> getfattr: Removing leading '/' from absolute path names
> # file: data/data-cluster/dms/final_archive
> trusted.afr.dirty=0x
> trusted.afr.storage2-client-1=0x0010
> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87
> trusted.glusterfs.dht=0x0001
> trusted.glusterfs.dht.mds=0x
> —
> stat /data/data-cluster/dms/final_archive
>   File: '/data/data-cluster/dms/final_archive'
>   Size: 3497984   Blocks: 8768   IO Block: 4096   directory
> Device: 807h/2055d Inode: 26427748396  Links: 72123
> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
> Access: 2018-10-09 04:22:40.514629044 +0200
> Modify: 2019-03-21 11:55:37.382278863 +0100
> Change: 2019-03-21 11:55:37.382278863 +0100
>  Birth: -
> —
> —
>
> brick2:
> —
> sudo gluster volume heal storage2 info
> Brick storage3:/data/data-cluster
> 
> 
> /dms/final_archive - Possibly undergoing heal
>
> Status: Connected
> Number of entries: 3
>
> Brick storage4:/data/data-cluster
> 
> /dms/final_archive - Possibly undergoing heal
>
> Status: Connected
> Number of entries: 2
> —
> sudo getfattr -d -m . -e hex /data/data-cluster/dms/final_archive
> getfattr: Removing leading '/' from absolute path names
> # file: data/data-cluster/dms/final_archive
> trusted.afr.dirty=0x
> trusted.afr.storage2-client-0=0x0001
> trusted.gfid=0x16c6a1e2b3fe4851972b998980097a87
> trusted.glusterfs.dht=0x0001
> trusted.glusterfs.dht.mds=0x
> —
> stat /data/data-cluster/dms/final_archive
>   File: '/data/data-cluster/dms/final_archive'
>   Size: 3497984   Blocks: 8760   IO Block: 4096   directory
> Device: 807h/2055d Inode: 13563551265  Links: 72124
> Access: (0755/drwxr-xr-x)  Uid: (   33/www-data)   Gid: (   33/www-data)
> Access: 2018-10-09 04:22:40.514629044 +0200
> Modify: 2019-03-21 11:55:46.382565124 +0100
> Change: 2019-03-21 11:55:46.382565124 +0100
>  Birth: -
> —
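[Editor's note] The trusted.afr.* values in the two getfattr dumps above are truncated in this archive; a full AFR changelog value is 24 hex digits, i.e. three big-endian 32-bit counters for pending data, metadata and entry operations. A small sketch for decoding such a value locally (the sample value below is illustrative, not taken from this volume):

```shell
# Decode a trusted.afr.* changelog value into its three 32-bit counters:
# pending data, metadata and entry operations (in that order).
decode_afr() {
    v=${1#0x}
    printf 'data=%d metadata=%d entry=%d\n' \
        "0x$(echo "$v" | cut -c1-8)" \
        "0x$(echo "$v" | cut -c9-16)" \
        "0x$(echo "$v" | cut -c17-24)"
}

# Example: one pending data operation, nothing else.
decode_afr 0x000000010000000000000000
```

A non-zero counter on one brick pointing at the other brick's client-N is what the self-heal daemon uses to pick a heal source; both sides blaming each other is the split-brain case discussed later in this thread.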
>
> Hope this helps.
>
> - Kindest regards,
>
> Milos Cuculovic
> IT Manager
>
> ---
> MDPI AG
> Postfach, CH-4020 Basel, Switzerland
> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
> Tel. +41 61 683 77 35
> Fax +41 61 302 89 18
> Email: cuculo...@mdpi.com 
> Skype: milos.cuculovic.mdpi
>
> Disclaimer: The information and files contained in this message
> are confidential and intended solely for the use of the individual or
> entity to whom they are addressed. If you have received this message in
> error, please notify me and delete this message from your system. You may
> not copy this message in its entirety or in part, or disclose its contents
> to anyone.
>
> On 21 Mar 2019, at 11:43, Karthik Subrahmanya  wrote:
>
> Can you attach the "glustershd.log"  file which will be present under
> "/var/log/glusterfs/" from both the nodes and the "stat" & "getfattr -d -m
> . -e hex " output of all the entries listed in the heal
> info output from both the bricks?
>
> On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic 
> wrote:
>
>> Thanks Karthik!
>>
>> 

[Gluster-users] Announcing Gluster release 5.5

2019-03-21 Thread Shyam Ranganathan
The Gluster community is pleased to announce the release of Gluster
5.5 (packages available at [1]).

Release notes for the release can be found at [2].

Major changes, features and limitations addressed in this release:

- Release 5.4 introduced an incompatible change that prevented rolling
upgrades, and hence was never announced to the lists. As a result, we are
skipping a release version and going from 5.3 to 5.5, which does not have
the problem.

Thanks,
Gluster community

[1] Packages for 5.5:
https://download.gluster.org/pub/gluster/glusterfs/5/5.5/

[2] Release notes for 5.5:
https://docs.gluster.org/en/latest/release-notes/5.5/
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] "rpc_clnt_ping_timer_expired" errors

2019-03-21 Thread Raghavendra Gowdappa
On Thu, Mar 21, 2019 at 4:10 PM Mauro Tridici  wrote:

> Hi Raghavendra,
>
> the number of errors has reduced, but during the last days I received some
> error notifications from the Nagios server similar to the following one:
>
> *** Nagios ***
> Notification Type: PROBLEM
> Service: Brick - /gluster/mnt5/brick
> Host: s04
> Address: s04-stg
> State: CRITICAL
> Date/Time: Mon Mar 18 19:56:36 CET 2019
> Additional Info: CHECK_NRPE STATE CRITICAL: Socket timeout after 10 seconds.
>
> The error was related only to s04 gluster server.
>
> So, following your suggestions,  I executed, on s04 node, the top command.
> In attachment, you can find the related output.
>

The top output doesn't contain cmd/thread names. Was anything wrong while capturing it?


> Thank you very much for your help.
> Regards,
> Mauro
>
>
>
> On 14 Mar 2019, at 13:31, Raghavendra Gowdappa 
> wrote:
>
> Thanks Mauro.
>
> On Thu, Mar 14, 2019 at 3:38 PM Mauro Tridici 
> wrote:
>
>> Hi Raghavendra,
>>
>> I just changed the client option value to 8.
>> I will check the volume behaviour during the next hours.
>>
>> The GlusterFS version is 3.12.14.
>>
>> I will provide you the logs as soon as the activity load is high.
>> Thank you,
>> Mauro
>>
>> On 14 Mar 2019, at 04:57, Raghavendra Gowdappa 
>> wrote:
>>
>>
>>
>> On Wed, Mar 13, 2019 at 3:55 PM Mauro Tridici 
>> wrote:
>>
>>> Hi Raghavendra,
>>>
>>> Yes, server.event-thread has been changed from 4 to 8.
>>>
>>
>> Was the client.event-thread value also changed to 8? If not, I would like
>> to know the results of including this tuning too. Also, if possible, can
>> you get the output of the following command from the problematic clients
>> and bricks (during the period when load tends to be high and
>> ping-timer-expiry is seen)?
>>
>> # top -bHd 3
>>
>> This will help us know the CPU utilization of the event-threads.
>>
>> And I forgot to ask, what version of Glusterfs are you using?
>>
>>> During the last days, I noticed that the error events are still there,
>>> although they have been considerably reduced.
>>>
>>> So, I used grep against the log files in order to give you a global view
>>> of the warning, error and critical events that appeared today at 06:xx
>>> (it may be useful, I hope).
>>> I collected the info from the s06 gluster server, but the behaviour is
>>> almost the same on the other gluster servers.
>>>
>>> ERRORS:
>>> CWD: /var/log/glusterfs
>>> COMMAND: grep " E " *.log | grep "2019-03-13 06:"
>>>
>>> (I can see a lot of this kind of message in the same period, but I'm
>>> including only one record for each type of error.)
>>> glusterd.log:[2019-03-13 06:12:35.982863] E [MSGID: 101042]
>>> [compat.c:569:gf_umount_lazy] 0-management: Lazy unmount of
>>> /var/run/gluster/tier2_quota_list/
>>>
>>> glustershd.log:[2019-03-13 06:14:28.666562] E
>>> [rpc-clnt.c:350:saved_frames_unwind] (-->
>>> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f4a71ddcebb] (-->
>>> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f4a71ba1d9e] (-->
>>> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4a71ba1ebe] (-->
>>> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f4a71ba3640] (-->
>>> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f4a71ba4130] )
>>> 0-tier2-client-55: forced unwinding frame type(GlusterFS 3.3)
>>> op(INODELK(29)) called at 2019-03-13 06:14:14.858441 (xid=0x17fddb50)
>>>
>>> glustershd.log:[2019-03-13 06:17:48.883825] E
>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to
>>> 192.168.0.55:49158 failed (Connection timed out); disconnecting socket
>>> glustershd.log:[2019-03-13 06:19:58.931798] E
>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to
>>> 192.168.0.55:49158 failed (Connection timed out); disconnecting socket
>>> glustershd.log:[2019-03-13 06:22:08.979829] E
>>> [socket.c:2376:socket_connect_finish] 0-tier2-client-55: connection to
>>> 192.168.0.55:49158 failed (Connection timed out); disconnecting socket
>>> glustershd.log:[2019-03-13 06:22:36.226847] E [MSGID: 114031]
>>> [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote
>>> operation failed [Transport endpoint is not connected]
>>> glustershd.log:[2019-03-13 06:22:36.306669] E [MSGID: 114031]
>>> [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote
>>> operation failed [Transport endpoint is not connected]
>>> glustershd.log:[2019-03-13 06:22:36.385257] E [MSGID: 114031]
>>> [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 0-tier2-client-55: remote
>>> operation failed [Transport endpoint is not connected]
>>>
>>> WARNINGS:
>>> CWD: /var/log/glusterfs
>>> COMMAND: grep " W " *.log | grep "2019-03-13 06:"
>>>
>>> (I can see a lot of this kind of message in the same period, but I'm
>>> including only one record for each type of warning.)
>>> glustershd.log:[2019-03-13 06:14:28.666772] W [MSGID: 114031]
>>> [client-rpc-fops.c:1080:client3_3_getxattr_cbk] 
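[Editor's note] Raghavendra asked for `top -bHd 3` precisely because the earlier capture lacked per-thread names. When re-capturing, a quick local sanity check that thread names and per-thread CPU are visible can be done with ps; a sketch (the `-L`/`-o` flags are GNU procps options and may differ on non-Linux systems):

```shell
# List the busiest threads with their names; LWP is the thread (light-weight
# process) id. This approximates what `top -bHd 3` collects per sample.
ps -eLo pid,lwp,comm,pcpu --sort=-pcpu | head -n 10
```

If the COMMAND column shows names like glfs_epoll000, glfs_epoll001, etc. pegged near 100%, that points at saturated event-threads, which is what the event-thread tuning in this thread targets.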

Re: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain

2019-03-21 Thread Karthik Subrahmanya
Can you attach the "glustershd.log"  file which will be present under
"/var/log/glusterfs/" from both the nodes and the "stat" & "getfattr -d -m
. -e hex " output of all the entries listed in the heal
info output from both the bricks?

On Thu, Mar 21, 2019 at 3:54 PM Milos Cuculovic  wrote:

> Thanks Karthik!
>
> I was trying to find some resolution methods from [2] but unfortunately
> none worked (I can explain what I tried if needed).
>
> I guess the volume you are talking about is of type replica-2 (1x2).
>
> That’s correct; I'm aware of the arbiter solution but still haven't taken
> the time to implement it.
>
> From the info results I posted, how can I tell which situation I am in? No
> files are mentioned in split brain, only directories. One brick has 3
> entries and the other has two.
>
> sudo gluster volume heal storage2 info
> [sudo] password for sshadmin:
> Brick storage3:/data/data-cluster
> 
> 
> /dms/final_archive - Possibly undergoing heal
>
> Status: Connected
> Number of entries: 3
>
> Brick storage4:/data/data-cluster
> 
> /dms/final_archive - Possibly undergoing heal
>
> Status: Connected
> Number of entries: 2
>
> - Kindest regards,
>
> Milos Cuculovic
> IT Manager
>
> ---
> MDPI AG
> Postfach, CH-4020 Basel, Switzerland
> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
> Tel. +41 61 683 77 35
> Fax +41 61 302 89 18
> Email: cuculo...@mdpi.com 
> Skype: milos.cuculovic.mdpi
>
>
> On 21 Mar 2019, at 10:27, Karthik Subrahmanya  wrote:
>
> Hi,
>
> Note: I guess the volume you are talking about is of type replica-2 (1x2).
> Usually replica 2 volumes are prone to split-brain. If you can consider
> converting them to arbiter or replica-3, they will handle most of the cases
> which can lead to split-brains. For more information see [1].
>
> Resolving the split-brain: [2] talks about how to interpret the heal info
> output and different ways to resolve them using the CLI/manually/using the
> favorite-child-policy.
> If you are having entry split brain, and is a gfid split-brain (file/dir
> having different gfids on the replica bricks) then you can use the CLI
> option to resolve them. If a directory is in gfid split-brain in a
> distributed-replicate volume and you are using the source-brick option
> please make sure you use the brick of this subvolume, which has the same
> gfid as that of the other distribute subvolume(s) where you have the
> correct gfid, as the source.
> If you are having a type mismatch then follow the steps in [3] to resolve
> the split-brain.
>
> [1]
> https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/
> [2]
> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
> [3]
> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain
>
> HTH,
> Karthik
>
> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic 
> wrote:
>
>> I was now able to catch the split brain log:
>>
>> sudo gluster volume heal storage2 info
>> Brick storage3:/data/data-cluster
>> 
>> 
>> /dms/final_archive - Is in split-brain
>>
>> Status: Connected
>> Number of entries: 3
>>
>> Brick storage4:/data/data-cluster
>> 
>> /dms/final_archive - Is in split-brain
>>
>> Status: Connected
>> Number of entries: 2
>>
>> Milos
>>
>> On 21 Mar 2019, at 09:07, Milos Cuculovic  wrote:
>>
>> For the last 24h, after upgrading one of the servers from 4.0 to 4.1.7,
>> the heal shows this:
>>
>> sudo gluster volume heal storage2 info
>> Brick storage3:/data/data-cluster
>> 
>> 
>> /dms/final_archive - Possibly undergoing heal
>>
>> Status: Connected
>> Number of entries: 3
>>
>> Brick storage4:/data/data-cluster
>> 
>> /dms/final_archive - Possibly undergoing heal
>>
>> Status: Connected
>> Number of entries: 2
>>
>> The same files stay there. From time to time the status of
>> /dms/final_archive is in split brain, as the following command shows:
>>
>> sudo gluster volume heal storage2 info split-brain
>> Brick storage3:/data/data-cluster
>> /dms/final_archive
>> Status: Connected
>> Number of entries in split-brain: 1
>>
>> Brick storage4:/data/data-cluster
>> /dms/final_archive
>> Status: Connected
>> Number of entries in split-brain: 1
>>
>> How can I tell which file is in split brain? The files in
>> /dms/final_archive are not very important, so it is fine to remove (or
>> ideally resolve the split brain on) the ones that differ.
>>
>> I can only see the directory and GFID. Any idea how to resolve this
>> situation? I would like to continue with the upgrade on the 2nd server,
>> and for that the heal needs to finish with 0 entries in sudo gluster
>> volume heal storage2 info
>>

Re: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain

2019-03-21 Thread Milos Cuculovic
Thanks Karthik!

I was trying to find some resolution methods from [2] but unfortunately none 
worked (I can explain what I tried if needed).

> I guess the volume you are talking about is of type replica-2 (1x2).
That’s correct; I'm aware of the arbiter solution but still haven't taken the
time to implement it.

From the info results I posted, how can I tell which situation I am in? No files
are mentioned in split brain, only directories. One brick has 3 entries and the
other has two.

sudo gluster volume heal storage2 info
[sudo] password for sshadmin: 
Brick storage3:/data/data-cluster
 
 
/dms/final_archive - Possibly undergoing heal

Status: Connected
Number of entries: 3

Brick storage4:/data/data-cluster
 
/dms/final_archive - Possibly undergoing heal

Status: Connected
Number of entries: 2
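[Editor's note] Since the upgrade is blocked until heal info reports zero entries, a small parser over saved `gluster volume heal <vol> info` output (like the paste above) makes the per-brick counts easy to watch. A sketch, fed from a here-document rather than calling gluster directly; on a live system you would pipe `sudo gluster volume heal storage2 info` into the same awk one-liner:

```shell
# Print "brick: entry-count" for each brick section of heal info output.
awk '/^Brick /{brick=$2} /^Number of entries:/{print brick": "$NF}' <<'EOF'
Brick storage3:/data/data-cluster
/dms/final_archive - Possibly undergoing heal
Status: Connected
Number of entries: 3

Brick storage4:/data/data-cluster
/dms/final_archive - Possibly undergoing heal
Status: Connected
Number of entries: 2
EOF
# prints:
#   storage3:/data/data-cluster: 3
#   storage4:/data/data-cluster: 2
```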

- Kindest regards,

Milos Cuculovic
IT Manager

---
MDPI AG
Postfach, CH-4020 Basel, Switzerland
Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
Tel. +41 61 683 77 35
Fax +41 61 302 89 18
Email: cuculo...@mdpi.com
Skype: milos.cuculovic.mdpi


> On 21 Mar 2019, at 10:27, Karthik Subrahmanya  wrote:
> 
> Hi,
> 
> Note: I guess the volume you are talking about is of type replica-2 (1x2). 
> Usually replica 2 volumes are prone to split-brain. If you can consider 
> converting them to arbiter or replica-3, they will handle most of the cases 
> which can lead to split-brains. For more information see [1].
> 
> Resolving the split-brain: [2] talks about how to interpret the heal info 
> output and different ways to resolve them using the CLI/manually/using the 
> favorite-child-policy.
> If you are having entry split brain, and is a gfid split-brain (file/dir 
> having different gfids on the replica bricks) then you can use the CLI option 
> to resolve them. If a directory is in gfid split-brain in a 
> distributed-replicate volume and you are using the source-brick option please 
> make sure you use the brick of this subvolume, which has the same gfid as 
> that of the other distribute subvolume(s) where you have the correct gfid, as 
> the source.
> If you are having a type mismatch then follow the steps in [3] to resolve the 
> split-brain.
> 
> [1] 
> https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/
>  
> 
> [2] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/ 
> 
> [3] 
> https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain
>  
> 
> 
> HTH,
> Karthik
> 
> On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic  > wrote:
> I was now able to catch the split brain log:
> 
> sudo gluster volume heal storage2 info
> Brick storage3:/data/data-cluster
>  
>  
> /dms/final_archive - Is in split-brain
> 
> Status: Connected
> Number of entries: 3
> 
> Brick storage4:/data/data-cluster
>  
> /dms/final_archive - Is in split-brain
> 
> Status: Connected
> Number of entries: 2
> 
> Milos
> 
>> On 21 Mar 2019, at 09:07, Milos Cuculovic > > wrote:
>> 
>> For the last 24h, after upgrading one of the servers from 4.0 to 4.1.7,
>> the heal shows this:
>> 
>> sudo gluster volume heal storage2 info
>> Brick storage3:/data/data-cluster
>>  
>>  
>> /dms/final_archive - Possibly undergoing heal
>> 
>> Status: Connected
>> Number of entries: 3
>> 
>> Brick storage4:/data/data-cluster
>>  
>> /dms/final_archive - Possibly undergoing heal
>> 
>> Status: Connected
>> Number of entries: 2
>> 
>> The same files stay there. From time to time the status of
>> /dms/final_archive is in split brain, as the following command shows:
>> 
>> sudo gluster volume heal storage2 info split-brain
>> Brick storage3:/data/data-cluster
>> /dms/final_archive
>> Status: Connected
>> Number of entries in split-brain: 1
>> 
>> Brick storage4:/data/data-cluster
>> /dms/final_archive
>> Status: Connected
>> Number of entries in split-brain: 1
>> 
>> How can I tell which file is in split brain? The files in
>> /dms/final_archive are not very important, so it is fine to remove (or
>> ideally resolve the split brain on) the ones that differ.
>>
>> I can only see the directory and GFID. Any idea how to resolve this
>> situation? I would like to continue with the upgrade on the 2nd server,
>> and for that the heal needs to finish with 0 entries in sudo gluster
>> volume heal storage2 info
>> 

Re: [Gluster-users] [Gluster-devel] Network Block device (NBD) on top of glusterfs

2019-03-21 Thread Prasanna Kalever
On Thu, Mar 21, 2019 at 9:00 AM Xiubo Li  wrote:

> All,
>
> I am one of the contributors to the gluster-block
> [1] project, and I also
> contribute to the Linux kernel and the open-iscsi
> project [2].
>
> NBD has been around for some time, but recently the Linux kernel’s Network
> Block Device (NBD) driver was enhanced to work with more devices, and the
> option to integrate with netlink was added. So, I recently tried to provide
> a glusterfs client based NBD driver. Please refer to github issue #633
> [3], and the good news is I
> have working code, with the most basic things done, at the nbd-runner project
> [4].
>
> While this email is about announcing the project and asking for more
> collaboration, I would also like to discuss the placement of the
> project itself. Currently the nbd-runner project is expected to be shared by
> our friends at the Ceph project too, to provide an NBD driver for Ceph. I have
> personally worked closely with some of them while contributing to the
> open-iSCSI project, and we would like to take this project to great success.
>
> Now few questions:
>
>1. Can I continue to use http://github.com/gluster/nbd-runner as the home
>for this project, even if it's shared by other filesystem projects?
>
>
>- I personally am fine with this.
>
>
>1. Should there be a separate organization for this repo?
>
>
>- While it may make sense in the future, for now I am not planning to
>start anything new.
>
> It would be great if we have some consensus on this soon as nbd-runner is
> a new repository. If there are no concerns, I will continue to contribute
> to the existing repository.
>

Thanks Xiubo Li, for finally sending this email out. Since this email is
out on gluster mailing list, I would like to take a stand from gluster
community point of view *only* and share my views.

My honest answer is: "If we want to maintain this within the gluster org, then
80% of the effort is common/duplicated with what we have already done with
gluster-block",

like:
* rpc/socket code
* cli/daemon parser/helper logics
* gfapi util functions
* logger framework
* inotify & dyn-config threads
* configure/Makefile/specfiles
* docsAboutGluster, etc.

The gluster-block repository is actually the home for all the block-related
stuff within gluster, and it's designed to accommodate such functionality;
if I were you I would have simply copied nbd-runner.c into
https://github.com/gluster/gluster-block/tree/master/daemon/ just like ceph
does here
https://github.com/ceph/ceph/blob/master/src/tools/rbd_nbd/rbd-nbd.cc and
be done.

Advantages of keeping the nbd client within gluster-block:
-> No worry about the code maintenance burden
-> No worry about monitoring a new component
-> Shipping packages to fedora/centos/rhel is already handled
-> It helps improve and stabilize the current gluster-block framework
-> We can build a common CI
-> We can reuse the common test framework, etc.

If you have an impression that gluster-block is for management, then I
would really want to correct you at this point.

Some of my near future plans for gluster-block:
* Allow exporting blocks with FUSE access via fileIO backstore to improve
large-file workloads, draft:
https://github.com/gluster/gluster-block/pull/58
* Accommodate kernel loopback handling for local-only applications
* The same way we can accommodate an nbd app/client, and IMHO this effort
shouldn't take more than 1 or 2 days to get merged into gluster-block and be
ready for a release.


Hope that clarifies it.


Best Regards,
--
Prasanna


> Regards,
> Xiubo Li (@lxbsz)
>
> [1] - https://github.com/gluster/gluster-block
> [2] - https://github.com/open-iscsi
> [3] - https://github.com/gluster/glusterfs/issues/633
> [4] - https://github.com/gluster/nbd-runner
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain

2019-03-21 Thread Karthik Subrahmanya
Hi,

Note: I guess the volume you are talking about is of type replica-2 (1x2).
Usually replica 2 volumes are prone to split-brain. If you can consider
converting them to arbiter or replica-3, they will handle most of the cases
which can lead to split-brains. For more information see [1].

Resolving the split-brain: [2] talks about how to interpret the heal info
output and different ways to resolve them using the CLI/manually/using the
favorite-child-policy.
If you are having an entry split brain, and it is a gfid split-brain (file/dir
having different gfids on the replica bricks), then you can use the CLI
option to resolve it. If a directory is in gfid split-brain in a
distributed-replicate volume and you are using the source-brick option
please make sure you use the brick of this subvolume, which has the same
gfid as that of the other distribute subvolume(s) where you have the
correct gfid, as the source.
If you are having a type mismatch then follow the steps in [3] to resolve
the split-brain.

[1]
https://docs.gluster.org/en/v3/Administrator%20Guide/Split%20brain%20and%20ways%20to%20deal%20with%20it/
[2] https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/
[3]
https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#dir-split-brain
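[Editor's note] For the plain split-brain cases described in [2], the CLI resolution typically looks like the following sketch, written here against this thread's volume and path. This is a hedged configuration fragment, not a prescription: which route and which source copy are correct depends on the data, so verify against [2] before running anything, and pick only one route.

```shell
# Route 1: let AFR auto-resolve split-brains by a policy (here: newest mtime),
# then trigger a heal.
gluster volume set storage2 cluster.favorite-child-policy mtime
gluster volume heal storage2

# Route 2: resolve a single entry explicitly, keeping the copy with the
# latest modification time.
gluster volume heal storage2 split-brain latest-mtime /dms/final_archive

# Route 3: resolve using one brick's copy as the source of truth
# (here assuming storage3 holds the good copy).
gluster volume heal storage2 split-brain source-brick \
    storage3:/data/data-cluster /dms/final_archive
```

Note that the gfid-mismatch and type-mismatch cases from [2]/[3] need the manual backend procedure instead; the CLI routes above cover the data/metadata split-brain cases.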

HTH,
Karthik

On Thu, Mar 21, 2019 at 1:45 PM Milos Cuculovic  wrote:

> I was now able to catch the split brain log:
>
> sudo gluster volume heal storage2 info
> Brick storage3:/data/data-cluster
> 
> 
> /dms/final_archive - Is in split-brain
>
> Status: Connected
> Number of entries: 3
>
> Brick storage4:/data/data-cluster
> 
> /dms/final_archive - Is in split-brain
>
> Status: Connected
> Number of entries: 2
>
> Milos
>
> On 21 Mar 2019, at 09:07, Milos Cuculovic  wrote:
>
> For the last 24h, after upgrading one of the servers from 4.0 to 4.1.7, the
> heal shows this:
>
> sudo gluster volume heal storage2 info
> Brick storage3:/data/data-cluster
> 
> 
> /dms/final_archive - Possibly undergoing heal
>
> Status: Connected
> Number of entries: 3
>
> Brick storage4:/data/data-cluster
> 
> /dms/final_archive - Possibly undergoing heal
>
> Status: Connected
> Number of entries: 2
>
> The same files stay there. From time to time the status of
> /dms/final_archive is in split brain, as the following command shows:
>
> sudo gluster volume heal storage2 info split-brain
> Brick storage3:/data/data-cluster
> /dms/final_archive
> Status: Connected
> Number of entries in split-brain: 1
>
> Brick storage4:/data/data-cluster
> /dms/final_archive
> Status: Connected
> Number of entries in split-brain: 1
>
> How can I tell which file is in split brain? The files in
> /dms/final_archive are not very important, so it is fine to remove (or
> ideally resolve the split brain on) the ones that differ.
>
> I can only see the directory and GFID. Any idea how to resolve this
> situation? I would like to continue with the upgrade on the 2nd server,
> and for that the heal needs to finish with 0 entries in sudo gluster
> volume heal storage2 info
>
> Thank you in advance, Milos.
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Geo-replication status always on 'Created'

2019-03-21 Thread Maurya M
hi Sunny,
 I did use the [1] link for the setup, and encountered this error during
ssh-copy-id (so I set up passwordless ssh by manually copying the
private/public keys to all the nodes, both master & slave):

[root@k8s-agentpool1-24779565-1 ~]# ssh-copy-id geou...@xxx.xx.xxx.x
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed:
"/root/.ssh/id_rsa.pub"
The authenticity of host ' xxx.xx.xxx.x   ( xxx.xx.xxx.x  )' can't be
established.
ECDSA key fingerprint is SHA256:B2rNaocIcPjRga13oTnopbJ5KjI/7l5fMANXc+KhA9s.
ECDSA key fingerprint is
MD5:1b:70:f9:7a:bf:35:33:47:0c:f2:c1:cd:21:e2:d3:75.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to
filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are
prompted now it is to install the new keys
Permission denied (publickey).
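[Editor's note] "Permission denied (publickey)" at this point usually means the slave only accepts key authentication and no existing key authorizes the login, so ssh-copy-id cannot bootstrap itself; the manual copy Maurya describes has to achieve the same end state. What ssh-copy-id would have done on the remote side, simulated locally (the key string and paths below are placeholders, not from this setup):

```shell
# Local simulation of ssh-copy-id's effect on the remote account: append the
# public key to ~/.ssh/authorized_keys with permissions sshd will accept.
tmp=$(mktemp -d)
pubkey="ssh-rsa AAAAB3...placeholder... root@k8s-agentpool1-24779565-1"
mkdir -p "$tmp/.ssh" && chmod 700 "$tmp/.ssh"        # sshd rejects lax dirs
printf '%s\n' "$pubkey" >> "$tmp/.ssh/authorized_keys"
chmod 600 "$tmp/.ssh/authorized_keys"                 # and lax key files
wc -l < "$tmp/.ssh/authorized_keys"
```

When doing this by hand on the real slave, the same ownership and 700/600 permissions matter: a world-readable authorized_keys or home directory makes sshd silently ignore the key, which produces exactly this "Permission denied (publickey)" symptom.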

To start afresh, what all needs to be torn down/deleted? Do we have a script
for it? Which pem keys do I need to delete, and from where?

thanks,
Maurya

On Thu, Mar 21, 2019 at 2:12 PM Sunny Kumar  wrote:

> Hey, you can start afresh; I think you are not following the proper setup
> steps.
>
> Please follow these steps [1] to create the geo-rep session; you can
> delete the old one and do a fresh start. Alternatively, you can use
> this tool [2] to set up geo-rep.
>
>
> [1].
> https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/
> [2]. http://aravindavk.in/blog/gluster-georep-tools/
>
>
> /Sunny
>
> On Thu, Mar 21, 2019 at 11:28 AM Maurya M  wrote:
> >
> > Hi Sunil,
> >  I did run the on the slave node :
> >  /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser
> vol_041afbc53746053368a1840607636e97 vol_a5aee81a873c043c99a938adcb5b5781
> > getting this message "/home/azureuser/common_secret.pem.pub not present.
> Please run geo-replication command on master with push-pem option to
> generate the file"
> >
> > So went back and created the session again, no change, so manually
> copied the common_secret.pem.pub to /home/azureuser/ but still the
> set_geo_rep_pem_keys.sh is looking the pem file in different name :
> COMMON_SECRET_PEM_PUB=${master_vol}_${slave_vol}_common_secret.pem.pub ,
> change the name of pem , ran the command again :
> >
> >  /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser
> vol_041afbc53746053368a1840607636e97 vol_a5aee81a873c043c99a938adcb5b5781
> > Successfully copied file.
> > Command executed successfully.
> >
> >
> > - went back and created the session , start the geo-replication , still
> seeing the  same error in logs. Any ideas ?
> >
> > thanks,
> > Maurya
> >
> >
> >
> > On Wed, Mar 20, 2019 at 11:07 PM Sunny Kumar 
> wrote:
> >>
> >> Hi Maurya,
> >>
> >> I guess you missed last trick to distribute keys in slave node. I see
> >> this is non-root geo-rep setup so please try this:
> >>
> >>
> >> Run the following command as root in any one of Slave node.
> >>
> >> /usr/local/libexec/glusterfs/set_geo_rep_pem_keys.sh  
> >>  
> >>
> >> - Sunny
> >>
> >> On Wed, Mar 20, 2019 at 10:47 PM Maurya M  wrote:
> >> >
> >> > Hi all,
> >> >  Have setup a 3 master nodes - 3 slave nodes (gluster 4.1) for
> geo-replication, but once have the geo-replication configure the status is
> always on "Created',
> >> > even after have force start the session.
> >> >
> >> > On close inspect of the logs on the master node seeing this error:
> >> >
> >> > "E [syncdutils(monitor):801:errlog] Popen: command returned error
>  cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i
> /var/lib/glusterd/geo-replication/secret.pem -p 22 azureu...@x...xxx.
> gluster --xml --remote-host=localhost volume info
> vol_a5ae34341a873c043c99a938adcb5b5781  error=255"
> >> >
> >> > Any ideas what is issue?
> >> >
> >> > thanks,
> >> > Maurya
> >> >
> >> > ___
> >> > Gluster-users mailing list
> >> > Gluster-users@gluster.org
> >> > https://lists.gluster.org/mailman/listinfo/gluster-users
>

Re: [Gluster-users] Geo-replication status always on 'Created'

2019-03-21 Thread Sunny Kumar
Hey, you can start afresh; I think you are not following the proper setup steps.

Please follow these steps [1] to create the geo-rep session; you can
delete the old one and do a fresh start. Alternatively, you can use
this tool [2] to set up geo-rep.


[1]. https://docs.gluster.org/en/latest/Administrator%20Guide/Geo%20Replication/
[2]. http://aravindavk.in/blog/gluster-georep-tools/


/Sunny

On Thu, Mar 21, 2019 at 11:28 AM Maurya M  wrote:
>
> Hi Sunil,
>  I did run this on the slave node:
>  /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser 
> vol_041afbc53746053368a1840607636e97 vol_a5aee81a873c043c99a938adcb5b5781
> and got this message: "/home/azureuser/common_secret.pem.pub not present. 
> Please run geo-replication command on master with push-pem option to generate 
> the file"
>
> So I went back and created the session again, with no change; I then manually
> copied common_secret.pem.pub to /home/azureuser/, but set_geo_rep_pem_keys.sh
> looks for the pem file under a different name:
> COMMON_SECRET_PEM_PUB=${master_vol}_${slave_vol}_common_secret.pem.pub. I
> renamed the pem file and ran the command again:
>
>  /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh azureuser 
> vol_041afbc53746053368a1840607636e97 vol_a5aee81a873c043c99a938adcb5b5781
> Successfully copied file.
> Command executed successfully.
>
>
> - I then went back, created the session, and started the geo-replication, but
> am still seeing the same error in the logs. Any ideas?
>
> thanks,
> Maurya
>
>
>
> On Wed, Mar 20, 2019 at 11:07 PM Sunny Kumar  wrote:
>>
>> Hi Maurya,
>>
>> I guess you missed last trick to distribute keys in slave node. I see
>> this is non-root geo-rep setup so please try this:
>>
>>
>> Run the following command as root in any one of Slave node.
>>
>> /usr/local/libexec/glusterfs/set_geo_rep_pem_keys.sh  
>>  
>>
>> - Sunny
>>
>> On Wed, Mar 20, 2019 at 10:47 PM Maurya M  wrote:
>> >
>> > Hi all,
>> >  I have set up 3 master nodes and 3 slave nodes (gluster 4.1) for 
>> > geo-replication, but once geo-replication is configured the status is 
>> > always 'Created',
>> > even after force-starting the session.
>> >
>> > On close inspect of the logs on the master node seeing this error:
>> >
>> > "E [syncdutils(monitor):801:errlog] Popen: command returned error   
>> > cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i 
>> > /var/lib/glusterd/geo-replication/secret.pem -p 22 
>> > azureu...@x...xxx. gluster --xml --remote-host=localhost volume 
>> > info vol_a5ae34341a873c043c99a938adcb5b5781  error=255"
>> >
>> > Any ideas what is issue?
>> >
>> > thanks,
>> > Maurya
>> >
>> > ___
>> > Gluster-users mailing list
>> > Gluster-users@gluster.org
>> > https://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Heal flapping between Possibly undergoing heal and In split brain

2019-03-21 Thread Milos Cuculovic
For the last 24h, after upgrading one of the servers from 4.0 to 4.1.7, the heal
shows this:

sudo gluster volume heal storage2 info
Brick storage3:/data/data-cluster
 
 
/dms/final_archive - Possibly undergoing heal

Status: Connected
Number of entries: 3

Brick storage4:/data/data-cluster
 
/dms/final_archive - Possibly undergoing heal

Status: Connected
Number of entries: 2

The same files stay there. From time to time the status of /dms/final_archive 
changes to split brain, as the following command shows:

sudo gluster volume heal storage2 info split-brain
Brick storage3:/data/data-cluster
/dms/final_archive
Status: Connected
Number of entries in split-brain: 1

Brick storage4:/data/data-cluster
/dms/final_archive
Status: Connected
Number of entries in split-brain: 1

How can I tell which file is in split brain? The files in /dms/final_archive are 
not very important; it is fine to remove (or ideally resolve the split brain on) 
the ones that differ.

I can only see the directory and GFID. Any idea on how to resolve this 
situation? I would like to continue with the upgrade on the 2nd server, and for 
this the heal needs to finish with 0 entries in sudo gluster volume heal 
storage2 info.
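For what it's worth, gluster can resolve split-brain entries from the CLI; a sketch per the upstream split-brain documentation, where <FILE> is either the path as seen from the volume root or a gfid:<gfid-string> form (the latest-mtime policy needs a reasonably recent release):

```shell
# Show the entries in split-brain, then pick a resolution policy per entry:
gluster volume heal storage2 info split-brain

# Keep the bigger copy, the copy with the newest mtime, or one brick's
# copy wholesale:
gluster volume heal storage2 split-brain bigger-file <FILE>
gluster volume heal storage2 split-brain latest-mtime <FILE>
gluster volume heal storage2 split-brain source-brick \
    storage3:/data/data-cluster <FILE>
```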

Thank you in advance, Milos.


Re: [Gluster-users] Heal flapping between Possibly undergoing heal and In split brain

2019-03-21 Thread Milos Cuculovic
I was now able to catch the split brain log:

sudo gluster volume heal storage2 info
Brick storage3:/data/data-cluster
 
 
/dms/final_archive - Is in split-brain

Status: Connected
Number of entries: 3

Brick storage4:/data/data-cluster
 
/dms/final_archive - Is in split-brain

Status: Connected
Number of entries: 2

Milos

> On 21 Mar 2019, at 09:07, Milos Cuculovic  wrote:
> 
> For the past 24h, since upgrading one of the servers from 4.0 to 4.1.7, the 
> heal shows this:
> 
> sudo gluster volume heal storage2 info
> Brick storage3:/data/data-cluster
>  
>  
> /dms/final_archive - Possibly undergoing heal
> 
> Status: Connected
> Number of entries: 3
> 
> Brick storage4:/data/data-cluster
>  
> /dms/final_archive - Possibly undergoing heal
> 
> Status: Connected
> Number of entries: 2
> 
> The same files stay there. From time to time the status of /dms/final_archive 
> changes to split brain, as the following command shows:
> 
> sudo gluster volume heal storage2 info split-brain
> Brick storage3:/data/data-cluster
> /dms/final_archive
> Status: Connected
> Number of entries in split-brain: 1
> 
> Brick storage4:/data/data-cluster
> /dms/final_archive
> Status: Connected
> Number of entries in split-brain: 1
> 
> How can I tell which file is in split brain? The files in /dms/final_archive 
> are not very important; it is fine to remove (or ideally resolve the split 
> brain on) the ones that differ.
> 
> I can only see the directory and GFID. Any idea on how to resolve this 
> situation? I would like to continue with the upgrade on the 2nd server, and 
> for this the heal needs to finish with 0 entries in sudo gluster volume heal 
> storage2 info.
> 
> Thank you in advance, Milos.


Re: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP

2019-03-21 Thread Hu Bert
Good morning,

looks like on 2 clients there was an automatic cleanup:

[2019-03-21 05:04:52.857127] I [fuse-bridge.c:5144:fuse_thread_proc]
0-fuse: initating unmount of /data/repository/shared/public
[2019-03-21 05:04:52.857507] W [glusterfsd.c:1500:cleanup_and_exit]
(-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4) [0x7fa062cf64a4]
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+
0xfd) [0x56223e5b291d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x54)
[0x56223e5b2774] ) 0-: received signum (15), shutting down
[2019-03-21 05:04:52.857532] I [fuse-bridge.c:5914:fini] 0-fuse:
Unmounting '/data/repository/shared/public'.
[2019-03-21 05:04:52.857547] I [fuse-bridge.c:5919:fini] 0-fuse:
Closing fuse connection to '/data/repository/shared/public'.

On the 3rd client I unmounted both volumes, killed the 4 processes and
mounted the volumes again. Now there are no more "dict is NULL" messages.
Fine :-)
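For the record, the leftover pre-upgrade mounts can be spotted from the process list: in this setup only the old 5.3 mounts carry the --lru-limit=0 flag. This is a heuristic based on the ps output quoted below, not an official gluster check:

```shell
# Fuse client processes still carrying the old --lru-limit=0 flag are the
# pre-upgrade mounts; the last field of each line is the mountpoint.
ps aux | grep '[g]lusterfs' | grep -- '--process-name fuse' \
    | grep -- '--lru-limit=0'

# Unmount and remount each affected mountpoint so the new client binary
# is picked up (paths from this thread):
umount /data/repository/shared/private
mount /data/repository/shared/private
```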

Best regards,
Hubert

On Wed, 20 Mar 2019 at 09:39, Hu Bert wrote:
>
> Hi,
>
> I updated our live systems (debian stretch) from 5.3 -> 5.5 this
> morning; the update went fine so far :-)
>
> However, on 3 (of 9) clients, the log entries still appear. The
> upgrade steps for all clients were identical:
>
> - install 5.5 (via apt upgrade)
> - umount volumes
> - mount volumes
>
> Interestingly the log entries still refer to version 5.3:
>
> [2019-03-20 08:38:31.880132] W [dict.c:761:dict_ref]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/5.3/xlator/performance/quick-read.so(+0x6df4)
> [0x7f35f214ddf4]
> -->/usr/lib/x86_64-linux-gnu/glusterfs/5.3/xlator/performance/io-cache.so(+0xa39d)
> [0x7f35f235f39d]
> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_ref+0x58)
> [0x7f35f9403a38] ) 11-dict: dict is NULL [Invalid argument]
>
> First I thought there could be old processes running/hanging on these
> 3 clients, but I see that there are 4 processes (for 2 volumes)
> running on all clients:
>
> root 11234  0.0  0.2 1858720 580964 ?  Ssl  Mar11   7:23
> /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0
> --lru-limit=0 --process-name fuse --volfile-server=gluster1
> --volfile-id=/persistent /data/repository/shared/private
> root 11323  0.6  2.5 10061536 6788940 ?Ssl  Mar11  77:42
> /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0
> --lru-limit=0 --process-name fuse --volfile-server=gluster1
> --volfile-id=/workdata /data/repository/shared/public
> root 11789  0.0  0.0 874116 11076 ?Ssl  07:32   0:00
> /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0
> --process-name fuse --volfile-server=gluster1 --volfile-id=/persistent
> /data/repository/shared/private
> root 11881  0.0  0.0 874116 10992 ?Ssl  07:32   0:00
> /usr/sbin/glusterfs --attribute-timeout=0 --entry-timeout=0
> --process-name fuse --volfile-server=gluster1 --volfile-id=/workdata
> /data/repository/shared/public
>
> The first 2 processes are for the "old" mount (with lru-limit=0), the
> last 2 processes are for the "new" mount. But only 3 clients still
> have these entries. Systems are running fine, no problems so far.
> Maybe a wrong order of the update steps? If I look at
> https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/ -
> then it would be better to: unmount - upgrade - mount?
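The per-client order suggested by the upgrade guide, as a sketch (mountpoints are the ones from this thread; the package command matches the "apt upgrade" step mentioned earlier and varies by distribution):

```shell
# 1. Unmount the gluster volumes so no old client process stays behind.
umount /data/repository/shared/private
umount /data/repository/shared/public

# 2. Upgrade the client packages (Debian example, as in the steps above).
apt upgrade

# 3. Mount again; the new client binary now serves the fuse mounts.
mount /data/repository/shared/private
mount /data/repository/shared/public
```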
>
>
> Best regards,
> Hubert
>
> On Tue, 19 Mar 2019 at 15:53, Artem Russakovskii wrote:
> >
> > The flood is indeed fixed for us on 5.5. However, the crashes are not.
> >
> > Sincerely,
> > Artem
> >
> > --
> > Founder, Android Police, APK Mirror, Illogical Robot LLC
> > beerpla.net | +ArtemRussakovskii | @ArtemR
> >
> >
> > On Mon, Mar 18, 2019 at 5:41 AM Hu Bert  wrote:
> >>
> >> Hi Amar,
> >>
> >> if you refer to this bug:
> >> https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test
> >> setup i haven't seen those entries, while copying & deleting a few GBs
> >> of data. For a final statement we have to wait until i updated our
> >> live gluster servers - could take place on tuesday or wednesday.
> >>
> >> Maybe other users can do an update to 5.4 as well and report back here.
> >>
> >>
> >> Hubert
> >>
> >>
> >>
> >> On Mon, 18 Mar 2019 at 11:36, Amar Tumballi Suryanarayan wrote:
> >> >
> >> > Hi Hu Bert,
> >> >
> >> > Appreciate the feedback. Also, are the other burning issues related to 
> >> > logs fixed now?
> >> >
> >> > -Amar
> >> >
> >> > On Mon, Mar 18, 2019 at 3:54 PM Hu Bert  wrote:
> >> >>
> >> >> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2
> >> >> volumes done. In 'gluster peer status' the peers stay connected during
> >> >> the upgrade, no 'peer rejected' messages. No cksum mismatches in the
> >> >> logs. Looks good :-)
> >> >>
> >> >> On Mon, 18 Mar 2019 at 09:54, Hu Bert wrote:
> >> >> >
> >> >> > Good morning :-)
> >> >> >
> >> >> > for debian the packages are there:
> >> >> > https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/
> >> >> >
> >> >> > I'll do an upgrade of a test