Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-01-31 Thread Amar Tumballi Suryanarayan
Hi Artem,

Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (i.e., as a clone
of other bugs where recent discussions happened) and marked it as a
blocker for the glusterfs-5.4 release.

We already have fixes for the log flooding (https://review.gluster.org/22128)
and are in the process of identifying and fixing the issue seen with the crash.

Can you please tell us whether the crashes happened as soon as you upgraded,
or was there a particular pattern you observed before the crash?

-Amar


On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii 
wrote:

> Within 24 hours after updating from rock-solid 4.1 to 5.3, I already got a
> crash that others have mentioned in
> https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to unmount,
> kill gluster, and remount:
>
>
> [2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7fcccafcd329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
> [2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7fcccafcd329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
> [2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7fcccafcd329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
> [2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref]
> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
> [0x7fcccafcd329]
> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
> [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
> [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
> The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk]
> 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-3"
> repeated 5 times between [2019-01-31 09:37:54.751905] and [2019-01-31
> 09:38:03.958061]
> The message "E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
> handler" repeated 72 times between [2019-01-31 09:37:53.746741] and
> [2019-01-31 09:38:04.696993]
> pending frames:
> frame : type(1) op(READ)
> frame : type(1) op(OPEN)
> frame : type(0) op(0)
> patchset: git://git.gluster.org/glusterfs.git
> signal received: 6
> time of crash:
> 2019-01-31 09:38:04
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 5.3
> /usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]
> /usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]
> /lib64/libc.so.6(+0x36160)[0x7fccd622d160]
> /lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]
> /lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]
> /lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]
> /lib64/libc.so.6(+0x2e772)[0x7fccd6225772]
> /lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]
>
> /usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]
>
> /usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]
> /usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]
> /usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]
> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]
> /usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]
> /usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]
> /lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]
> /lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]
> -
>
> Do the pending patches fix the crash or only the repeated warnings? I'm
> running glusterfs on OpenSUSE 15.0, installed via
> http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/,
> and I'm not sure how to make it produce a core dump.
>
> If it's not fixed by the patches above, has anyone already opened a ticket
> for the crashes that I can join and monitor? This is going to create a
> massive problem for us since production systems are crashing.
>
> Thanks.
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police, APK Mirror, Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii | @ArtemR
> 
>
>
> On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa 
> wrote:
>
>>
>>
>> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii 
>> wrote:
>>
>>> Also, not sure if related or not, but I got a ton of these "Failed to
>>> dispatch handler" in 

Re: [Gluster-users] Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/us

2019-01-31 Thread Artem Russakovskii
Within 24 hours after updating from rock-solid 4.1 to 5.3, I already got a
crash that others have mentioned in
https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to unmount,
kill gluster, and remount:


[2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref]
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
[0x7fcccafcd329]
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
[0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
[0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref]
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
[0x7fcccafcd329]
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
[0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
[0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref]
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
[0x7fcccafcd329]
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
[0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
[0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref]
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
[0x7fcccafcd329]
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
[0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
[0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk]
2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-3"
repeated 5 times between [2019-01-31 09:37:54.751905] and [2019-01-31
09:38:03.958061]
The message "E [MSGID: 101191]
[event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
handler" repeated 72 times between [2019-01-31 09:37:53.746741] and
[2019-01-31 09:38:04.696993]
pending frames:
frame : type(1) op(READ)
frame : type(1) op(OPEN)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-01-31 09:38:04
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]
/lib64/libc.so.6(+0x36160)[0x7fccd622d160]
/lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]
/lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]
/lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]
/lib64/libc.so.6(+0x2e772)[0x7fccd6225772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]
/lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]
/lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]
-

Do the pending patches fix the crash or only the repeated warnings? I'm
running glusterfs on OpenSUSE 15.0, installed via
http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/,
and I'm not sure how to make it produce a core dump.

If it's not fixed by the patches above, has anyone already opened a ticket
for the crashes that I can join and monitor? This is going to create a
massive problem for us since production systems are crashing.

Thanks.

Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii | @ArtemR



On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa 
wrote:

>
>
> On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii 
> wrote:
>
>> Also, not sure if related or not, but I got a ton of these "Failed to
>> dispatch handler" in my logs as well. Many people have been commenting
>> about this issue here https://bugzilla.redhat.com/show_bug.cgi?id=1651246
>> .
>>
>
> https://review.gluster.org/#/c/glusterfs/+/22046/ addresses this.
>
>
>> ==> mnt-SITE_data1.log <==
>>> [2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref]
>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>> [0x7fd966fcd329]
>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>> [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>> [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
>>> ==> mnt-SITE_data3.log <==
>>> The message "E [MSGID: 101191]
>>> [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch
>>> 

[Gluster-users] Gluster Monthly Newsletter, January 2019

2019-01-31 Thread Amye Scavarda
Gluster Monthly Newsletter, January 2019

Gluster Community Survey - open from February 1st through February 28! Give
us your feedback and we'll send you a never-before-seen Gluster-branded item!
https://www.gluster.org/gluster-community-survey-february-2019/

See you at FOSDEM! We have a jam-packed Software Defined Storage day on
Sunday, Feb 3rd (with a few sessions on the previous day):
https://fosdem.org/2019/schedule/track/software_defined_storage/
We also have a shared stand with Ceph, come find us!

Gluster 6 - We’re in planning for our Gluster 6 release, currently
scheduled for Feb-March 2019. More details on the mailing lists at
https://lists.gluster.org/pipermail/gluster-devel/2018-November/055672.html

Contributors
Top Contributing Companies:  Red Hat, Comcast, DataLab, Gentoo Linux,
Facebook, BioDec, Samsung, Etersoft

Top Contributors in January: Amar Tumballi, Kinglong Mee, Sunny Kumar,
Susant Palai, Ravishankar N


Noteworthy Threads:
[Gluster-users] GCS 0.5 release -
https://lists.gluster.org/pipermail/gluster-users/2019-January/035597.html
[Gluster-users] Announcing Gluster release 5.3 and 4.1.7 -
https://lists.gluster.org/pipermail/gluster-users/2019-January/035656.html
[Gluster-users] Improvements to Gluster upstream documentation -
https://lists.gluster.org/pipermail/gluster-users/2019-January/035741.html
[Gluster-devel] Tests for the GCS stack using the k8s framework
https://lists.gluster.org/pipermail/gluster-devel/2019-January/055765.html
[Gluster-devel] Gluster Maintainer's meeting: 7th Jan, 2019 - Agenda
https://lists.gluster.org/pipermail/gluster-devel/2019-January/055767.html
[Gluster-devel] Implementing multiplexing for self heal client.
https://lists.gluster.org/pipermail/gluster-devel/2019-January/055768.html
[Gluster-devel] Regression health for release-5.next and release-6
https://lists.gluster.org/pipermail/gluster-devel/2019-January/055775.html
[Gluster-devel] FUSE directory filehandle
https://lists.gluster.org/pipermail/gluster-devel/2019-January/055776.html
[Gluster-devel] Latency analysis of GlusterFS' network layer for pgbench
https://lists.gluster.org/pipermail/gluster-devel/2019-January/055782.html
[Gluster-devel] Release 6: Kick off!
https://lists.gluster.org/pipermail/gluster-devel/2019-January/055793.html
[Gluster-devel] Maintainer's meeting: Jan 21st, 2019
https://lists.gluster.org/pipermail/gluster-devel/2019-January/055798.html
[Gluster-devel] Infer results - Glusterfs
https://lists.gluster.org/pipermail/gluster-devel/2019-January/055814.html

Events:

FOSDEM, Feb 2-3 2019 in Brussels, Belgium - https://fosdem.org/2019/
Vault: February 25–26, 2019 - https://www.usenix.org/conference/vault19/


-- 
Amye Scavarda | a...@redhat.com | Gluster Community Lead
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Files losing permissions in GlusterFS 3.12

2019-01-31 Thread Gudrun Mareike Amedick
Hi Nithya,

That's what I'm getting from file3:

getfattr -d -m. -e hex $file3
# file: $file3
trusted.ec.config=0x080602000200
trusted.ec.dirty=0x
trusted.ec.size=0x006c8aba
trusted.ec.version=0x000f0019
trusted.gfid=0x47d6124290e844e2b733740134a657ce
trusted.gfid2path.60d8a15c6ccaf15b=0x36363732366635372d396533652d343337372d616637382d6366353061636434306265322f616c676f732e63707974686f6e2d33356d2d7838365f36342d6c696e75782d676e752e736f
trusted.glusterfs.quota.66726f57-9e3e-4377-af78-cf50acd40be2.contri.3=0x001b2401
trusted.pgfid.66726f57-9e3e-4377-af78-cf50acd40be2=0x0001

So, no dht attribute. I think.

That's what I found in the rebalance logs. rebalance.log.3 was another 
rebalance that, to our knowledge, finished without problems. I included the
results from both rebalances, just in case. There is no mention of this file in 
the logs of the other servers.


root@gluster06:/var/log/glusterfs# zgrep $file3 $VOLUME-rebalance.log*
$VOLUME-rebalance.log.1:[2019-01-12 07:17:06.243620] I [MSGID: 109045] [dht-common.c:2456:dht_lookup_cbk] 0-$VOLUME-dht: linkfile not having link subvol for $file3
$VOLUME-rebalance.log.1:[2019-01-12 07:17:06.275213] I [MSGID: 109069] [dht-common.c:1410:dht_lookup_unlink_of_false_linkto_cbk] 0-$VOLUME-dht: lookup_unlink returned with op_ret -> 0 and op-errno -> 0 for $file3
$VOLUME-rebalance.log.1:[2019-01-12 07:17:06.307754] I [dht-rebalance.c:1570:dht_migrate_file] 0-$VOLUME-dht: $file3: attempting to move from $VOLUME-readdir-ahead-6 to $VOLUME-readdir-ahead-8
$VOLUME-rebalance.log.1:[2019-01-12 07:17:06.341451] I [dht-rebalance.c:1570:dht_migrate_file] 0-$VOLUME-dht: $file3: attempting to move from $VOLUME-readdir-ahead-6 to $VOLUME-readdir-ahead-8
$VOLUME-rebalance.log.1:[2019-01-12 07:17:06.488473] I [MSGID: 109022] [dht-rebalance.c:2218:dht_migrate_file] 0-$VOLUME-dht: completed migration of $file3 from subvolume $VOLUME-readdir-ahead-6 to $VOLUME-readdir-ahead-8
$VOLUME-rebalance.log.1:[2019-01-12 07:17:06.494803] W [MSGID: 109023] [dht-rebalance.c:2094:dht_migrate_file] 0-$VOLUME-dht: Migrate file failed:$file3: failed to get xattr from $VOLUME-readdir-ahead-6 [No such file or directory]
$VOLUME-rebalance.log.1:[2019-01-12 07:17:06.499016] W [dht-rebalance.c:2159:dht_migrate_file] 0-$VOLUME-dht: $file3: failed to perform removexattr on $VOLUME-readdir-ahead-8 (No data available)
$VOLUME-rebalance.log.1:[2019-01-12 07:17:06.499776] W [MSGID: 109023] [dht-rebalance.c:2179:dht_migrate_file] 0-$VOLUME-dht: $file3: failed to do a stat on $VOLUME-readdir-ahead-6 [No such file or directory]
$VOLUME-rebalance.log.1:[2019-01-12 07:17:06.500900] I [MSGID: 109022] [dht-rebalance.c:2218:dht_migrate_file] 0-$VOLUME-dht: completed migration of $file3 from subvolume $VOLUME-readdir-ahead-6 to $VOLUME-readdir-ahead-8

$VOLUME-rebalance.log.3.gz:[2018-12-10 23:18:43.145616] I [dht-rebalance.c:1570:dht_migrate_file] 0-$VOLUME-dht: $file3: attempting to move from $VOLUME-disperse-6 to $VOLUME-disperse-8
$VOLUME-rebalance.log.3.gz:[2018-12-10 23:18:43.150303] W [MSGID: 109023] [dht-rebalance.c:1013:__dht_check_free_space] 0-$VOLUME-dht: data movement of file {blocks:13896 name:($file3)} would result in dst node ($VOLUME-disperse-8:23116260576) having lower disk space than the source node ($VOLUME-disperse-6:23521698592). Skipping file.
$VOLUME-rebalance.log.3.gz:[2018-12-10 23:18:43.153051] I [MSGID: 109126] [dht-rebalance.c:2812:gf_defrag_migrate_single_file] 0-$VOLUME-dht: File migration skipped for $file3.

Kind regards,

Gudrun


On Thursday, 31.01.2019, at 14:46 +0530, Nithya Balachandran wrote:
> 
> 
> On Wed, 30 Jan 2019 at 19:12, Gudrun Mareike Amedick 
>  wrote:
> > Hi,
> > 
> > a bit of additional info inline. On Monday, 28.01.2019, at 10:23 +0100, Frank Ruehlemann wrote:
> > > On Monday, 28.01.2019, at 09:50 +0530, Nithya Balachandran wrote:
> > > > 
> > > > On Fri, 25 Jan 2019 at 20:51, Gudrun Mareike Amedick <
> > > > g.amed...@uni-luebeck.de> wrote:
> > > > 
> > > > > 
> > > > > Hi all,
> > > > > 
> > > > > we have a problem with a distributed dispersed volume (GlusterFS 
> > > > > 3.12). We
> > > > > have files that lost their permissions or gained sticky bits. The 
> > > > > files
> > > > > themselves seem to be okay.
> > > > > 
> > > > > It looks like this:
> > > > > 
> > > > > # ls -lah $file1
> > > > > -- 1 www-data www-data 45M Jan 12 07:01 $file1
> > > > > 
> > > > > # ls -lah $file2
> > > > > -rw-rwS--T 1 $user $group 11K Jan  9 11:48 $file2
> > > > > 
> > > > > # ls -lah $file3
> > > > > -T 1 $user $group 6.8M Jan 12 08:17 $file3
> > > > > 
> > > > > These are linkto files (internal dht files) and should not be visible 
> > > > > on
> > > > the mount point. Are they consistently visible like this or do they 
> > > > revert
> > > > to the proper permissions after some time?
> > > They didn't heal yet, even after more than 4 weeks. 

[Gluster-users] glusterfs 4.1.6 improving folder listing

2019-01-31 Thread Amudhan P
Hi,

What is the option to improve folder listing speed in glusterfs 4.1.6 with
distributed-disperse volume?

regards
Amudhan
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] chrome / chromium crash on gluster

2019-01-31 Thread Dr. Michael J. Chudobiak

On 1/31/19 2:21 AM, Amar Tumballi Suryanarayan wrote:
Interesting, I run F29 for all development, and didn't see anything like 
this.


Please share 'gluster volume info'. And also logs from mount process.




[root@gluster1 ~]# gluster volume info

Volume Name: volume1
Type: Replicate
Volume ID: 91ef5aed-94be-44ff-a19d-c41682808159
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: gluster1:/gluster/brick1/data
Brick2: gluster2:/gluster/brick2/data
Options Reconfigured:
nfs.disable: on
server.allow-insecure: on
cluster.favorite-child-policy: mtime



And a client mount log is below - although the log contains megabytes of:

The message "E [MSGID: 101191] 
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to 
dispatch handler" repeated 20178 times between [2019-01-31 
13:44:14.962950] and [2019-01-31 13:46:00.013310]


and

[2019-01-31 13:46:07.470163] W [dict.c:761:dict_ref] 
(-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7c45) 
[0x7fb0e0b49c45] 
-->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaba1) 
[0x7fb0e0b5cba1] -->/lib64/libglusterfs.so.0(dict_ref+0x60) 
[0x7fb0f2457c40] ) 0-dict: dict is NULL [Invalid argument]


so I've just shown the start of the log. I guess that's related to 
https://bugzilla.redhat.com/show_bug.cgi?id=1651246.


- Mike




Mount log:

[2019-01-31 13:44:00.775353] I [MSGID: 100030] [glusterfsd.c:2715:main] 
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 
(args: /usr/sbin/glusterfs --process-name fuse --volfile-server=gluster1 
--volfile-server=gluster2 --volfile-id=/volume1 /fileserver2)
[2019-01-31 13:44:00.817140] I [MSGID: 101190] 
[event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread 
with index 1
[2019-01-31 13:44:00.926491] I [MSGID: 101190] 
[event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread 
with index 2
[2019-01-31 13:44:00.928102] I [MSGID: 114020] [client.c:2354:notify] 
0-volume1-client-0: parent translators are ready, attempting connect on 
transport
[2019-01-31 13:44:00.931063] I [MSGID: 114020] [client.c:2354:notify] 
0-volume1-client-1: parent translators are ready, attempting connect on 
transport
[2019-01-31 13:44:00.932144] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 
0-volume1-client-0: changing port to 49152 (from 0)

Final graph:
+--+
  1: volume volume1-client-0
  2: type protocol/client
  3: option ping-timeout 42
  4: option remote-host gluster1
  5: option remote-subvolume /gluster/brick1/data
  6: option transport-type socket
  7: option transport.tcp-user-timeout 0
  8: option transport.socket.keepalive-time 20
  9: option transport.socket.keepalive-interval 2
 10: option transport.socket.keepalive-count 9
 11: option send-gids true
 12: end-volume
 13:
 14: volume volume1-client-1
 15: type protocol/client
 16: option ping-timeout 42
 17: option remote-host gluster2
 18: option remote-subvolume /gluster/brick2/data
 19: option transport-type socket
 20: option transport.tcp-user-timeout 0
 21: option transport.socket.keepalive-time 20
 22: option transport.socket.keepalive-interval 2
 23: option transport.socket.keepalive-count 9
 24: option send-gids true
 25: end-volume
 26:
 27: volume volume1-replicate-0
 28: type cluster/replicate
 29: option afr-pending-xattr volume1-client-0,volume1-client-1
 30: option favorite-child-policy mtime
 31: option use-compound-fops off
 32: subvolumes volume1-client-0 volume1-client-1
 33: end-volume
 34:
 35: volume volume1-dht
 36: type cluster/distribute
[2019-01-31 13:44:00.932495] E [MSGID: 101191] 
[event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to 
dispatch handler

 37: option lock-migration off
 38: option force-migration off
 39: subvolumes volume1-replicate-0
 40: end-volume
 41:
 42: volume volume1-write-behind
 43: type performance/write-behind
 44: subvolumes volume1-dht
 45: end-volume
 46:
 47: volume volume1-read-ahead
 48: type performance/read-ahead
 49: subvolumes volume1-write-behind
 50: end-volume
 51:
 52: volume volume1-readdir-ahead
 53: type performance/readdir-ahead
 54: option parallel-readdir off
 55: option rda-request-size 131072
 56: option rda-cache-limit 10MB
 57: subvolumes volume1-read-ahead
 58: end-volume
 59:
 60: volume volume1-io-cache
 61: type performance/io-cache
 62: subvolumes volume1-readdir-ahead
 63: end-volume
 64:
 65: volume volume1-quick-read
 66: type performance/quick-read
 67: subvolumes volume1-io-cache
 68: end-volume
 69:
 70: volume volume1-open-behind
 71: type performance/open-behind
 72: subvolumes volume1-quick-read
 73: end-volume
 74:
 75: volume volume1-md-cache
 76: type performance/md-cache
 77: subvolumes volume1-open-behind
 

Re: [Gluster-users] Files losing permissions in GlusterFS 3.12

2019-01-31 Thread Nithya Balachandran
On Wed, 30 Jan 2019 at 19:12, Gudrun Mareike Amedick <
g.amed...@uni-luebeck.de> wrote:

> Hi,
>
> a bit of additional info inline. On Monday, 28.01.2019, at 10:23 +0100, Frank Ruehlemann wrote:
> > On Monday, 28.01.2019, at 09:50 +0530, Nithya Balachandran wrote:
> > >
> > > On Fri, 25 Jan 2019 at 20:51, Gudrun Mareike Amedick <
> > > g.amed...@uni-luebeck.de> wrote:
> > >
> > > >
> > > > Hi all,
> > > >
> > > > we have a problem with a distributed dispersed volume (GlusterFS
> 3.12). We
> > > > have files that lost their permissions or gained sticky bits. The
> files
> > > > themselves seem to be okay.
> > > >
> > > > It looks like this:
> > > >
> > > > # ls -lah $file1
> > > > -- 1 www-data www-data 45M Jan 12 07:01 $file1
> > > >
> > > > # ls -lah $file2
> > > > -rw-rwS--T 1 $user $group 11K Jan  9 11:48 $file2
> > > >
> > > > # ls -lah $file3
> > > > -T 1 $user $group 6.8M Jan 12 08:17 $file3
> > > >
> > > > These are linkto files (internal dht files) and should not be
> visible on
> > > the mount point. Are they consistently visible like this or do they
> revert
> > > to the proper permissions after some time?
> > They didn't heal yet, even after more than 4 weeks. Therefore we decided
> > to recommend our users to fix their files by setting the correct
> > permissions again, which worked without problems. But for analysis
> > reasons we still have some broken files nobody touched yet.
> >
> > We know these linkto files but they were never visible to clients. We
> > did these ls-commands on a client, not on a brick.
>
> They have linkfile permissions but on brick side, it looks like this:
>
> root@gluster06:~# ls -lah /$brick/$file3
> -T 2 $user $group 1.7M Jan 12 08:17 /$brick/$file3
>
> That seems to be too big for a linkfile. Also, there is no file it could
> link to. There's no other file with that name at that path on any other
> subvolume.
>

This sounds like the rebalance failed to transition the file from a linkto
to a data file once the migration was complete. Please check the rebalance
logs on all nodes for any messages that refer to this file.
If you still see any such files, please check their xattrs directly on
the brick. You should see one called trusted.glusterfs.dht.linkto. Let me
know if that is missing.
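
For example, a check along these lines, run directly on the brick path (a sketch only, reusing the /$brick/$file3 placeholders from the earlier messages), should show whether that xattr is present:

getfattr -d -m . -e hex /$brick/$file3 | grep trusted.glusterfs.dht.linkto

If it prints nothing, the linkto xattr is missing on that copy.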

Regards,
Nithya

>
>
> >
> > >
> > > >
> > > > This is not what the permissions are supposed to look. They were 644
> or
> > > > 660 before. And they definitely had no sticky bits.
> > > > The permissions on the bricks match what I see on client side. So I
> think
> > > > the original permissions are lost without a chance to recover them,
> right?
> > > >
> > > >
> > > > With some files with weird looking permissions (but not with all of
> them),
> > > > I can do this:
> > > > # ls -lah $path/$file4
> > > > -rw-r--r-- 1 $user $group 6.0G Oct 11 09:34 $path/$file4
> > > > ls -lah $path | grep $file4
> > > > -rw-r-Sr-T  1 $user$group 6.0G Oct 11 09:34 $file4
> > >
> > > >
> > > > So, the permissions I see depend on how I'm querying them. The
> permissions
> > > > on brick side agree with the ladder result, stat sees the former.
> I'm not
> > > > sure how that works.
> > > >
> > > The S and T bits indicate that a file is being migrated. The difference
> > > seems to be because of the way lookup versus readdirp handle this  -
> this
> > > looks like a bug. Lookup will strip out the internal permissions set. I
> > > don't think readdirp does. This is happening because a rebalance is in
> > > progress.
> > There is no active rebalance. At least in "gluster volume rebalance
> > $VOLUME status" is none visible.
> >
> > And in the rebalance log file of this volume is the last line:
> > "[2019-01-11 02:14:50.101944] W … received signum (15), shutting down"
> >
> > >
> > > >
> > > > We know for at least a part of those files that they were okay at
> December
> > > > 19th. We got the first reports of weird-looking permissions at
> January
> > > > 12th. Between that, there was a rebalance running (January 7th to
> January
> > > > 11th). During that rebalance, a node was offline for a longer period
> of time
> > > > due to hardware issues. The output of "gluster volume heal $VOLUME
> info"
> > > > shows no files though.
> > > >
> > > > For all files with broken permissions we found so far, the following
> lines
> > > > are in the rebalance log:
> > > >
> > > > [2019-01-07 09:31:11.004802] I [MSGID: 109045]
> > > > [dht-common.c:2456:dht_lookup_cbk] 0-$VOLUME-dht: linkfile not
> having link
> > > > subvol for $file5
> > > > [2019-01-07 09:31:11.262273] I [MSGID: 109069]
> > > > [dht-common.c:1410:dht_lookup_unlink_of_false_linkto_cbk]
> 0-$VOLUME-dht:
> > > > lookup_unlink returned with
> > > > op_ret -> 0 and op-errno -> 0 for $file5
> > > > [2019-01-07 09:31:11.266014] I
> [dht-rebalance.c:1570:dht_migrate_file]
> > > > 0-$VOLUME-dht: $file5: attempting to move from
> $VOLUME-readdir-ahead-0 to
> > > > $VOLUME-readdir-ahead-5
> > > > [2019-01-07 09:31:11.278120] I
> 

Re: [Gluster-users] Default Port Range for Bricks

2019-01-31 Thread David Spisla
Thank you for the clarification.

On Thu, Jan 31, 2019 at 04:22, Atin Mukherjee <amukh...@redhat.com> wrote:

>
>
> On Tue, Jan 29, 2019 at 8:52 PM David Spisla  wrote:
>
>> Hello Gluster Community,
>>
>> in glusterd.vol are parameters to define the port range for the bricks.
>> They are commented out per default:
>>
>> # option base-port 49152
>> # option max-port  65535
>> I assume that glusterd is not using this range if the parameters are 
>> commented out.
>>
> The current commented-out config of base-port and max-port that you see
> defined in glusterd.vol is the same default that is defined in the glusterd
> codebase as well. The intention of introducing these options in the config
> was to ensure that users who want more granular control over the port range
> can achieve it by defining the range in this file.
>
> However, from glusterfs-6 onwards we have fixed bug 1659857, which changes
> the default max port to 60999.
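
As a sketch of that more granular control (assuming the stock management volume definition in glusterd.vol; the exact path, e.g. /etc/glusterfs/glusterd.vol, varies by distribution), the two options can be uncommented and adjusted, followed by a glusterd restart:

volume management
    type mgmt/glusterd
    ...
    option base-port 49152
    option max-port  60999
end-volume

Since glusterd reads this file at startup, the new range would only apply to bricks started after the restart.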
>
>
>> But what range instead? Is there a way to find this out?
>>
>> Regards
>>
>> David Spisla
>>
>>
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users