Re: [Gluster-users] Unable to Mount on Non-Server Machines

2017-03-10 Thread Andrew Kester
I did find that downgrading the clients to 3.9.1 resolved it.  Could it
be an issue in 3.10.0?
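
For anyone needing the same workaround, a minimal sketch of pinning the clients
at 3.9.1 on Debian 8 (package names follow stock Debian packaging; the exact
version/revision string is an assumption, so check apt-cache policy first):

# confirm which 3.9.1 build is available from the configured repositories
apt-get update
apt-cache policy glusterfs-client
# downgrade the client-side packages (revision "-1" is an assumption)
apt-get install glusterfs-client=3.9.1-1 glusterfs-common=3.9.1-1
# keep apt from pulling 3.10.0 back in on the next upgrade
apt-mark hold glusterfs-client glusterfs-common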

Thanks,

Andrew Kester
The Storehouse
https://sthse.co

On 03/10/2017 11:03 AM, Andrew Kester wrote:
> I have a Gluster volume that I'm unable to mount from some clients.  The 
> client processes on the servers are able to mount the volume without issue, but 
> other clients are not.  All nodes are running Debian 8 and Gluster 
> 3.10.0.
> 
> I see a couple of lines in the log that say "Using Program GlusterFS 3.3". This 
> is a pretty old volume definition, but I've updated the operating version 
> (op-version) as it has been upgraded, setting it to 31000 with this most recent 
> update.  Could that be the issue?
> 
> Any help or guidance is appreciated.  Thanks!
> 
> glusterfs --version:
> glusterfs 3.10.0
> Repository revision: git://git.gluster.org/glusterfs.git
> Copyright (c) 2006-2016 Red Hat, Inc. 
> GlusterFS comes with ABSOLUTELY NO WARRANTY.
> It is licensed to you under your choice of the GNU Lesser
> General Public License, version 3 or any later version (LGPLv3
> or later), or the GNU General Public License, version 2 (GPLv2),
> in all cases as published by the Free Software Foundation.
> 
> Volume Info:
> Volume Name: gv0
> Type: Replicate
> Volume ID: 
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: pegasus.sthse.co:/mnt/brick
> Brick2: atlantia.sthse.co:/mnt/brick
> Options Reconfigured:
> server.allow-insecure: on
> transport.address-family: inet
> nfs.disable: on
> server.root-squash: off
> auth.allow: all
> features.scrub-freq: weekly
> features.scrub-throttle: normal
> features.scrub: Active
> features.bitrot: on
> performance.io-thread-count: 16
> 
> Log entries on clients:
> [2017-03-10 16:43:13.659555] I [rpc-clnt.c:1964:rpc_clnt_reconfig] 
> 0-gv0-client-2: changing port to 49153 (from 0)
> [2017-03-10 16:43:13.659672] I [MSGID: 114057] 
> [client-handshake.c:1451:select_server_supported_programs] 0-gv0-client-1: 
> Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2017-03-10 16:43:13.660189] I [MSGID: 114057] 
> [client-handshake.c:1451:select_server_supported_programs] 0-gv0-client-2: 
> Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2017-03-10 16:43:13.660192] W [MSGID: 114043] 
> [client-handshake.c:1105:client_setvolume_cbk] 0-gv0-client-1: failed to set 
> the volume [Permission denied]
> [2017-03-10 16:43:13.660249] W [MSGID: 114007] 
> [client-handshake.c:1134:client_setvolume_cbk] 0-gv0-client-1: failed to get 
> 'process-uuid' from reply dict [Invalid argument]
> [2017-03-10 16:43:13.660266] E [MSGID: 114044] 
> [client-handshake.c:1140:client_setvolume_cbk] 0-gv0-client-1: SETVOLUME on 
> remote-host failed [Permission denied]
> [2017-03-10 16:43:13.660282] I [MSGID: 114049] 
> [client-handshake.c:1243:client_setvolume_cbk] 0-gv0-client-1: sending 
> AUTH_FAILED event
> [2017-03-10 16:43:13.660321] E [fuse-bridge.c:5322:notify] 0-fuse: Server 
> authenication failed. Shutting down.
> [2017-03-10 16:43:13.660345] I [fuse-bridge.c:5802:fini] 0-fuse: Unmounting 
> '/gv0'.
> [2017-03-10 16:43:13.660814] W [MSGID: 114043] 
> [client-handshake.c:1105:client_setvolume_cbk] 0-gv0-client-2: failed to set 
> the volume [Permission denied]
> [2017-03-10 16:43:13.660850] W [MSGID: 114007] 
> [client-handshake.c:1134:client_setvolume_cbk] 0-gv0-client-2: failed to get 
> 'process-uuid' from reply dict [Invalid argument]
> [2017-03-10 16:43:13.660869] E [MSGID: 114044] 
> [client-handshake.c:1140:client_setvolume_cbk] 0-gv0-client-2: SETVOLUME on 
> remote-host failed [Permission denied]
> [2017-03-10 16:43:13.660884] I [MSGID: 114049] 
> [client-handshake.c:1243:client_setvolume_cbk] 0-gv0-client-2: sending 
> AUTH_FAILED event
> [2017-03-10 16:43:13.673681] E [fuse-bridge.c:5322:notify] 0-fuse: Server 
> authenication failed. Shutting down.
> [2017-03-10 16:43:13.673997] W [glusterfsd.c:1329:cleanup_and_exit] 
> (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x8064) [0x7f81c866e064] 
> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x56191158cdb5] 
> -->/usr/sbin/glusterfs(cleanup_and_exit+0x57) [0x56191158cc27] ) 0-: received 
> signum (15), shutting down
> 
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Issue with duplicated files in gluster 3.10

2017-03-10 Thread Ravishankar N

On 03/10/2017 10:32 PM, Luca Gervasi wrote:

> Hi,
> I'm Andrea's colleague. I'd like to add that we have no trusted.afr
> xattr on the root folder

Just to confirm, this would be 'includes2013', right?

> where those files are located and every file seems to be clean on each
> brick.
> You can find another example file's xattr here:
> https://nopaste.me/view/3c2014ac
> Here is a listing: https://nopaste.me/view/eb4430a2
> This behavior makes the directory which contains those files
> undeletable (we had to clean them up at the brick level, removing all the
> hard links too).
> This issue is visible on FUSE-mounted volumes, while it is not
> noticeable when mounted over NFS through Ganesha.


Could you provide the complete output of `gluster volume info`? I want 
to find out which bricks constitute a replica pair.


Also, could you change diagnostics.client-log-level to DEBUG 
temporarily, do an `ls ` on 
the fuse mount, and share the corresponding mount log?
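
For reference, a minimal sketch of the commands involved (VOLNAME stands in for
the affected volume; the mount path and log file name follow stock packaging
and may differ on your systems):

gluster volume info VOLNAME
# raise client-side logging, reproduce the listing, then revert
gluster volume set VOLNAME diagnostics.client-log-level DEBUG
ls -l /path/to/fuse-mount/includes2013
gluster volume set VOLNAME diagnostics.client-log-level ERROR
# the matching client log is usually under /var/log/glusterfs/,
# named after the mount point, e.g. /var/log/glusterfs/path-to-fuse-mount.log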

Thanks,
Ravi


Thanks a lot.

Luca Gervasi



On Fri, 10 Mar 2017 at 17:41, Andrea Fogazzi wrote:


Hi community,

we ran an extensive issue on our installation of gluster 3.10,
which we did upgraded from 3.8.8 (it's a distribute+replicate, 5
nodes, 3 bricks in replica 2+1 quorum); recently we noticed a
frequent issue where files get duplicated on the some of the
directories; this is visible on the fuse mount points (RW), but
not on the NFS/Ganesha (RO) mount points.


A sample of an ll output:


---------T 1 48 web_rw 0 Mar 10 11:57 paginazione.shtml
-rw-rw-r-- 1 48 web_rw   272 Feb 18 22:00 paginazione.shtml

As you can see, the file is listed twice, but only one of the two
is good (the name is identical, we verified that no
spurious/hidden characters are present in the name); the issue
maybe is related on how we uploaded the files on the file system,
via incremental rsync on the fuse mount.

Do anyone have suggestion on how it can happen, how to solve
existing duplication or how to prevent to happen anymore.

Thanks in advance.
Best regards,
andrea

Options Reconfigured:
performance.cache-invalidation: true
cluster.favorite-child-policy: mtime
features.cache-invalidation: 1
network.inode-lru-limit: 9
performance.cache-size: 1024MB
storage.linux-aio: on
nfs.outstanding-rpc-limit: 64
storage.build-pgfid: on
cluster.server-quorum-type: server
cluster.self-heal-daemon: enable
performance.nfs.io-cache: on
performance.client-io-threads: on
performance.nfs.stat-prefetch: on
performance.nfs.io-threads: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
performance.md-cache-timeout: 1
performance.io-thread-count: 16
performance.high-prio-threads: 32
performance.normal-prio-threads: 32
performance.low-prio-threads: 32
performance.least-prio-threads: 1
nfs.acl: off
nfs.rpc-auth-unix: off
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
cluster.lookup-unhashed: auto
performance.nfs.quick-read: on
performance.nfs.read-ahead: on
cluster.quorum-type: auto
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.lookup-optimize: on
cluster.readdir-optimize: on
performance.read-ahead: off
performance.write-behind-window-size: 1MB
client.event-threads: 4
server.event-threads: 16
cluster.granular-entry-heal: enable
performance.parallel-readdir: on
cluster.server-quorum-ratio: 51



Andrea Fogazzi



___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users



___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] glusterfsd crashing

2017-03-10 Thread Vijay Bellur
On Fri, Mar 10, 2017 at 12:50 PM, Sergei Gerasenko 
wrote:

> I see why it's not saving the cores: the package isn't signed with the
> right signature. I will modify the abrd configs to change that behavior and
> wait for the next crash.
>

Ok, thanks. Please let us know when you get hold of the next core.

-Vijay


>
> On Fri, Mar 10, 2017 at 11:23 AM, Vijay Bellur  wrote:
>
>>
>>
>> On Fri, Mar 10, 2017 at 11:17 AM, Sergei Gerasenko 
>> wrote:
>>
>>> Hi,
>>>
>>> I'm running gluster 3.7.12. It's an 8-node distributed, replicated
>>> cluster (replica 2). It's had been working fine for a long time when all of
>>> a sudden I started seeing bricks going offline. Researching further I found
>>> messages like this:
>>>
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: pending frames:
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: frame : type(0)
>>> op(5)
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: patchset: git://
>>> git.gluster.com/glusterfs.git
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: signal received:
>>> 6
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: time of crash:
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: 2017-03-10
>>> 05:02:12
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: configuration
>>> details:
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: argp 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: backtrace 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: dlfcn 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: libpthread 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: llistxattr 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: setfsid 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: spinlock 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: epoll.h 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: xattr.h 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: st_atim.tv_nsec 1
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: package-string:
>>> glusterfs 3.7.12
>>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: -
>>>
>>> I initially thought it was related to quota support (based on some
>>> googling), so I turned off quota and also disabled NFS support to simplify
>>> the debugging. Every time after the crash, I restarted gluster and the
>>> bricks would go online for several hours only to crash again later. There
>>> are lots of messages like this preceding the crash:
>>>
>>> ...
>>> [2017-03-10 04:40:46.002225] E [MSGID: 113091]
>>> [posix.c:178:posix_lookup] 0-ftp_volume-posix: null gfid for path (null)
>>> [2017-03-10 04:40:46.002278] E [MSGID: 113018]
>>> [posix.c:196:posix_lookup] 0-ftp_volume-posix: lstat on null failed
>>> [Invalid argument]
>>> The message "E [MSGID: 113091] [posix.c:178:posix_lookup]
>>> 0-ftp_volume-posix: null gfid for path (null)" repeated 3 times between
>>> [2017-03-10 04:40:46.002225] and [2017-03-10 04:40:46.005699]
>>> The message "E [MSGID: 113018] [posix.c:196:posix_lookup]
>>> 0-ftp_volume-posix: lstat on null failed [Invalid argument]" repeated 3
>>> times between [2017-03-10 04:40:46.002278] and [2017-03-10 04:40:46.005701]
>>> [2017-03-10 04:50:47.002170] E [MSGID: 113091]
>>> [posix.c:178:posix_lookup] 0-ftp_volume-posix: null gfid for path (null)
>>> [2017-03-10 04:50:47.002219] E [MSGID: 113018]
>>> [posix.c:196:posix_lookup] 0-ftp_volume-posix: lstat on null failed
>>> [Invalid argument]
>>> The message "E [MSGID: 113091] [posix.c:178:posix_lookup]
>>> 0-ftp_volume-posix: null gfid for path (null)" repeated 3 times between
>>> [2017-03-10 04:50:47.002170] and [2017-03-10 04:50:47.005623]
>>> The message "E [MSGID: 113018] [posix.c:196:posix_lookup]
>>> 0-ftp_volume-posix: lstat on null failed [Invalid argument]" repeated 3
>>> times between [2017-03-10 04:50:47.002219] and [2017-03-10 04:50:47.005625]
>>> [2017-03-10 05:00:48.002246] E [MSGID: 113091]
>>> [posix.c:178:posix_lookup] 0-ftp_volume-posix: null gfid for path (null)
>>> [2017-03-10 05:00:48.002314] E [MSGID: 113018]
>>> [posix.c:196:posix_lookup] 0-ftp_volume-posix: lstat on null failed
>>> [Invalid argument]
>>> The message "E [MSGID: 113091] [posix.c:178:posix_lookup]
>>> 0-ftp_volume-posix: null gfid for path (null)" repeated 3 times between
>>> [2017-03-10 05:00:48.002246] and [2017-03-10 05:00:48.005828]
>>> The message "E [MSGID: 113018] [posix.c:196:posix_lookup]
>>> 0-ftp_volume-posix: lstat on null failed [Invalid argument]" repeated 3
>>> times between [2017-03-10 05:00:48.002314] and [2017-03-10 05:00:48.005830]
>>>
>>> One important detail I noticed yesterday is that one of the nodes was
>>> running gluster version 3.7.13! I'm not sure what did the upgrade. So I
>>> downgraded to 3.7.12 and restarted gluster. The crash above happened
>>> several hours later. But again, the 

Re: [Gluster-users] glusterfsd crashing

2017-03-10 Thread Sergei Gerasenko
I see why it's not saving the cores: the package isn't signed with the
right signature. I will modify the abrt configs to change that behavior and
wait for the next crash.
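
For reference, a hedged sketch of the abrt settings that usually control this
(file name per stock abrt packaging; treat the exact keys as assumptions and
verify against your local configuration):

# /etc/abrt/abrt-action-save-package-data.conf
OpenGPGCheck = no          # keep dumps from packages not signed by a trusted key
ProcessUnpackaged = yes    # optional: also keep dumps from unpackaged binaries
# restart the daemon so the change takes effect
systemctl restart abrtd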

On Fri, Mar 10, 2017 at 11:23 AM, Vijay Bellur  wrote:

>
>
> On Fri, Mar 10, 2017 at 11:17 AM, Sergei Gerasenko 
> wrote:
>
>> Hi,
>>
>> I'm running gluster 3.7.12. It's an 8-node distributed, replicated
>> cluster (replica 2). It's had been working fine for a long time when all of
>> a sudden I started seeing bricks going offline. Researching further I found
>> messages like this:
>>
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: pending frames:
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: frame : type(0)
>> op(5)
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: patchset: git://
>> git.gluster.com/glusterfs.git
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: signal received: 6
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: time of crash:
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: 2017-03-10
>> 05:02:12
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: configuration
>> details:
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: argp 1
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: backtrace 1
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: dlfcn 1
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: libpthread 1
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: llistxattr 1
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: setfsid 1
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: spinlock 1
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: epoll.h 1
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: xattr.h 1
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: st_atim.tv_nsec 1
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: package-string:
>> glusterfs 3.7.12
>> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: -
>>
>> I initially thought it was related to quota support (based on some
>> googling), so I turned off quota and also disabled NFS support to simplify
>> the debugging. Every time after the crash, I restarted gluster and the
>> bricks would go online for several hours only to crash again later. There
>> are lots of messages like this preceding the crash:
>>
>> ...
>> [2017-03-10 04:40:46.002225] E [MSGID: 113091] [posix.c:178:posix_lookup]
>> 0-ftp_volume-posix: null gfid for path (null)
>> [2017-03-10 04:40:46.002278] E [MSGID: 113018] [posix.c:196:posix_lookup]
>> 0-ftp_volume-posix: lstat on null failed [Invalid argument]
>> The message "E [MSGID: 113091] [posix.c:178:posix_lookup]
>> 0-ftp_volume-posix: null gfid for path (null)" repeated 3 times between
>> [2017-03-10 04:40:46.002225] and [2017-03-10 04:40:46.005699]
>> The message "E [MSGID: 113018] [posix.c:196:posix_lookup]
>> 0-ftp_volume-posix: lstat on null failed [Invalid argument]" repeated 3
>> times between [2017-03-10 04:40:46.002278] and [2017-03-10 04:40:46.005701]
>> [2017-03-10 04:50:47.002170] E [MSGID: 113091] [posix.c:178:posix_lookup]
>> 0-ftp_volume-posix: null gfid for path (null)
>> [2017-03-10 04:50:47.002219] E [MSGID: 113018] [posix.c:196:posix_lookup]
>> 0-ftp_volume-posix: lstat on null failed [Invalid argument]
>> The message "E [MSGID: 113091] [posix.c:178:posix_lookup]
>> 0-ftp_volume-posix: null gfid for path (null)" repeated 3 times between
>> [2017-03-10 04:50:47.002170] and [2017-03-10 04:50:47.005623]
>> The message "E [MSGID: 113018] [posix.c:196:posix_lookup]
>> 0-ftp_volume-posix: lstat on null failed [Invalid argument]" repeated 3
>> times between [2017-03-10 04:50:47.002219] and [2017-03-10 04:50:47.005625]
>> [2017-03-10 05:00:48.002246] E [MSGID: 113091] [posix.c:178:posix_lookup]
>> 0-ftp_volume-posix: null gfid for path (null)
>> [2017-03-10 05:00:48.002314] E [MSGID: 113018] [posix.c:196:posix_lookup]
>> 0-ftp_volume-posix: lstat on null failed [Invalid argument]
>> The message "E [MSGID: 113091] [posix.c:178:posix_lookup]
>> 0-ftp_volume-posix: null gfid for path (null)" repeated 3 times between
>> [2017-03-10 05:00:48.002246] and [2017-03-10 05:00:48.005828]
>> The message "E [MSGID: 113018] [posix.c:196:posix_lookup]
>> 0-ftp_volume-posix: lstat on null failed [Invalid argument]" repeated 3
>> times between [2017-03-10 05:00:48.002314] and [2017-03-10 05:00:48.005830]
>>
>> One important detail I noticed yesterday is that one of the nodes was
>> running gluster version 3.7.13! I'm not sure what did the upgrade. So I
>> downgraded to 3.7.12 and restarted gluster. The crash above happened
>> several hours later. But again, the crashes had been happening before the
>> downgrade -- possibly because of the version mismatch on one of the nodes.
>>
>> Anybody have any ideas?
>>
>>
>
> Do you have the core files from the crashes? If so, can you please provide
> a gdb backtrace from one of the core 

Re: [Gluster-users] glusterfsd crashing

2017-03-10 Thread Vijay Bellur
On Fri, Mar 10, 2017 at 11:17 AM, Sergei Gerasenko 
wrote:

> Hi,
>
> I'm running gluster 3.7.12. It's an 8-node distributed, replicated cluster
> (replica 2). It's had been working fine for a long time when all of a
> sudden I started seeing bricks going offline. Researching further I found
> messages like this:
>
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: pending frames:
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: frame : type(0)
> op(5)
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: patchset: git://
> git.gluster.com/glusterfs.git
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: signal received: 6
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: time of crash:
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: 2017-03-10 05:02:12
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: configuration
> details:
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: argp 1
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: backtrace 1
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: dlfcn 1
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: libpthread 1
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: llistxattr 1
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: setfsid 1
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: spinlock 1
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: epoll.h 1
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: xattr.h 1
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: st_atim.tv_nsec 1
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: package-string:
> glusterfs 3.7.12
> Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: -
>
> I initially thought it was related to quota support (based on some
> googling), so I turned off quota and also disabled NFS support to simplify
> the debugging. Every time after the crash, I restarted gluster and the
> bricks would go online for several hours only to crash again later. There
> are lots of messages like this preceding the crash:
>
> ...
> [2017-03-10 04:40:46.002225] E [MSGID: 113091] [posix.c:178:posix_lookup]
> 0-ftp_volume-posix: null gfid for path (null)
> [2017-03-10 04:40:46.002278] E [MSGID: 113018] [posix.c:196:posix_lookup]
> 0-ftp_volume-posix: lstat on null failed [Invalid argument]
> The message "E [MSGID: 113091] [posix.c:178:posix_lookup]
> 0-ftp_volume-posix: null gfid for path (null)" repeated 3 times between
> [2017-03-10 04:40:46.002225] and [2017-03-10 04:40:46.005699]
> The message "E [MSGID: 113018] [posix.c:196:posix_lookup]
> 0-ftp_volume-posix: lstat on null failed [Invalid argument]" repeated 3
> times between [2017-03-10 04:40:46.002278] and [2017-03-10 04:40:46.005701]
> [2017-03-10 04:50:47.002170] E [MSGID: 113091] [posix.c:178:posix_lookup]
> 0-ftp_volume-posix: null gfid for path (null)
> [2017-03-10 04:50:47.002219] E [MSGID: 113018] [posix.c:196:posix_lookup]
> 0-ftp_volume-posix: lstat on null failed [Invalid argument]
> The message "E [MSGID: 113091] [posix.c:178:posix_lookup]
> 0-ftp_volume-posix: null gfid for path (null)" repeated 3 times between
> [2017-03-10 04:50:47.002170] and [2017-03-10 04:50:47.005623]
> The message "E [MSGID: 113018] [posix.c:196:posix_lookup]
> 0-ftp_volume-posix: lstat on null failed [Invalid argument]" repeated 3
> times between [2017-03-10 04:50:47.002219] and [2017-03-10 04:50:47.005625]
> [2017-03-10 05:00:48.002246] E [MSGID: 113091] [posix.c:178:posix_lookup]
> 0-ftp_volume-posix: null gfid for path (null)
> [2017-03-10 05:00:48.002314] E [MSGID: 113018] [posix.c:196:posix_lookup]
> 0-ftp_volume-posix: lstat on null failed [Invalid argument]
> The message "E [MSGID: 113091] [posix.c:178:posix_lookup]
> 0-ftp_volume-posix: null gfid for path (null)" repeated 3 times between
> [2017-03-10 05:00:48.002246] and [2017-03-10 05:00:48.005828]
> The message "E [MSGID: 113018] [posix.c:196:posix_lookup]
> 0-ftp_volume-posix: lstat on null failed [Invalid argument]" repeated 3
> times between [2017-03-10 05:00:48.002314] and [2017-03-10 05:00:48.005830]
>
> One important detail I noticed yesterday is that one of the nodes was
> running gluster version 3.7.13! I'm not sure what did the upgrade. So I
> downgraded to 3.7.12 and restarted gluster. The crash above happened
> several hours later. But again, the crashes had been happening before the
> downgrade -- possibly because of the version mismatch on one of the nodes.
>
> Anybody have any ideas?
>
>

Do you have the core files from the crashes? If so, can you please provide
a gdb backtrace from one of the core files?
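
For reference, a minimal sketch of collecting such a backtrace (the paths are
examples; installing the matching glusterfs debuginfo package first makes the
trace far more useful):

gdb /usr/sbin/glusterfsd /path/to/core
(gdb) set pagination off
(gdb) thread apply all bt full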

Thanks,
Vijay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Unable to Mount on Non-Server Machines

2017-03-10 Thread Andrew Kester
I have a Gluster volume that I'm unable to mount from some clients.  The client 
processes on the servers are able to mount the volume without issue, but other 
clients are not.  All nodes are running Debian 8 and Gluster 3.10.0.

I see a couple of lines in the log that say "Using Program GlusterFS 3.3". This 
is a pretty old volume definition, but I've updated the operating version 
(op-version) as it has been upgraded, setting it to 31000 with this most recent 
update.  Could that be the issue?
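
For anyone comparing notes, a hedged sketch of the checks involved (the volume
name gv0 is taken from the output below; the glusterd.vol option is an
assumption about related configuration, not a confirmed fix):

# confirm the cluster operating version on every server
grep operating-version /var/lib/glusterd/glusterd.info
# bump it explicitly if needed (31000 corresponds to 3.10.0)
gluster volume set all cluster.op-version 31000
# gv0 already has server.allow-insecure on; clients connecting from
# unprivileged ports may additionally need the following line in
# /etc/glusterfs/glusterd.vol on every server, followed by a glusterd restart:
#   option rpc-auth-allow-insecure on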

Any help or guidance is appreciated.  Thanks!

glusterfs --version:
glusterfs 3.10.0
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. 
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

Volume Info:
Volume Name: gv0
Type: Replicate
Volume ID: 
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: pegasus.sthse.co:/mnt/brick
Brick2: atlantia.sthse.co:/mnt/brick
Options Reconfigured:
server.allow-insecure: on
transport.address-family: inet
nfs.disable: on
server.root-squash: off
auth.allow: all
features.scrub-freq: weekly
features.scrub-throttle: normal
features.scrub: Active
features.bitrot: on
performance.io-thread-count: 16

Log entries on clients:
[2017-03-10 16:43:13.659555] I [rpc-clnt.c:1964:rpc_clnt_reconfig] 
0-gv0-client-2: changing port to 49153 (from 0)
[2017-03-10 16:43:13.659672] I [MSGID: 114057] 
[client-handshake.c:1451:select_server_supported_programs] 0-gv0-client-1: 
Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-03-10 16:43:13.660189] I [MSGID: 114057] 
[client-handshake.c:1451:select_server_supported_programs] 0-gv0-client-2: 
Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-03-10 16:43:13.660192] W [MSGID: 114043] 
[client-handshake.c:1105:client_setvolume_cbk] 0-gv0-client-1: failed to set 
the volume [Permission denied]
[2017-03-10 16:43:13.660249] W [MSGID: 114007] 
[client-handshake.c:1134:client_setvolume_cbk] 0-gv0-client-1: failed to get 
'process-uuid' from reply dict [Invalid argument]
[2017-03-10 16:43:13.660266] E [MSGID: 114044] 
[client-handshake.c:1140:client_setvolume_cbk] 0-gv0-client-1: SETVOLUME on 
remote-host failed [Permission denied]
[2017-03-10 16:43:13.660282] I [MSGID: 114049] 
[client-handshake.c:1243:client_setvolume_cbk] 0-gv0-client-1: sending 
AUTH_FAILED event
[2017-03-10 16:43:13.660321] E [fuse-bridge.c:5322:notify] 0-fuse: Server 
authenication failed. Shutting down.
[2017-03-10 16:43:13.660345] I [fuse-bridge.c:5802:fini] 0-fuse: Unmounting 
'/gv0'.
[2017-03-10 16:43:13.660814] W [MSGID: 114043] 
[client-handshake.c:1105:client_setvolume_cbk] 0-gv0-client-2: failed to set 
the volume [Permission denied]
[2017-03-10 16:43:13.660850] W [MSGID: 114007] 
[client-handshake.c:1134:client_setvolume_cbk] 0-gv0-client-2: failed to get 
'process-uuid' from reply dict [Invalid argument]
[2017-03-10 16:43:13.660869] E [MSGID: 114044] 
[client-handshake.c:1140:client_setvolume_cbk] 0-gv0-client-2: SETVOLUME on 
remote-host failed [Permission denied]
[2017-03-10 16:43:13.660884] I [MSGID: 114049] 
[client-handshake.c:1243:client_setvolume_cbk] 0-gv0-client-2: sending 
AUTH_FAILED event
[2017-03-10 16:43:13.673681] E [fuse-bridge.c:5322:notify] 0-fuse: Server 
authenication failed. Shutting down.
[2017-03-10 16:43:13.673997] W [glusterfsd.c:1329:cleanup_and_exit] 
(-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x8064) [0x7f81c866e064] 
-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x56191158cdb5] 
-->/usr/sbin/glusterfs(cleanup_and_exit+0x57) [0x56191158cc27] ) 0-: received 
signum (15), shutting down
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Issue with duplicated files in gluster 3.10

2017-03-10 Thread Luca Gervasi
Hi,
I'm Andrea's colleague. I'd like to add that we have no trusted.afr xattr on
the root folder where those files are located, and every file seems to be
clean on each brick.
You can find another example file's xattr here:
https://nopaste.me/view/3c2014ac
Here is a listing: https://nopaste.me/view/eb4430a2
This behavior makes the directory which contains those files undeletable
(we had to clean them up at the brick level, removing all the hard links too).
This issue is visible on FUSE-mounted volumes, while it is not noticeable
when mounted over NFS through Ganesha.

Thanks a lot.

Luca Gervasi



On Fri, 10 Mar 2017 at 17:41 Andrea Fogazzi  wrote:

> Hi community,
>
> we ran an extensive issue on our installation of gluster 3.10, which we
> did upgraded from 3.8.8 (it's a distribute+replicate, 5 nodes, 3 bricks in
> replica 2+1 quorum); recently we noticed a frequent issue where files get
> duplicated on the some of the directories; this is visible on the fuse
> mount points (RW), but not on the NFS/Ganesha (RO) mount points.
>
>
> A sample of an ll output:
>
>
> ---------T 1 48 web_rw 0 Mar 10 11:57 paginazione.shtml
> -rw-rw-r-- 1 48 web_rw   272 Feb 18 22:00 paginazione.shtml
>
> As you can see, the file is listed twice, but only one of the two is good
> (the name is identical, we verified that no spurious/hidden characters are
> present in the name); the issue maybe is related on how we uploaded the
> files on the file system, via incremental rsync on the fuse mount.
>
> Do anyone have suggestion on how it can happen, how to solve existing
> duplication or how to prevent to happen anymore.
>
> Thanks in advance.
> Best regards,
> andrea
>
> Options Reconfigured:
> performance.cache-invalidation: true
> cluster.favorite-child-policy: mtime
> features.cache-invalidation: 1
> network.inode-lru-limit: 9
> performance.cache-size: 1024MB
> storage.linux-aio: on
> nfs.outstanding-rpc-limit: 64
> storage.build-pgfid: on
> cluster.server-quorum-type: server
> cluster.self-heal-daemon: enable
> performance.nfs.io-cache: on
> performance.client-io-threads: on
> performance.nfs.stat-prefetch: on
> performance.nfs.io-threads: on
> diagnostics.latency-measurement: on
> diagnostics.count-fop-hits: on
> performance.md-cache-timeout: 1
> performance.io-thread-count: 16
> performance.high-prio-threads: 32
> performance.normal-prio-threads: 32
> performance.low-prio-threads: 32
> performance.least-prio-threads: 1
> nfs.acl: off
> nfs.rpc-auth-unix: off
> diagnostics.client-log-level: ERROR
> diagnostics.brick-log-level: ERROR
> cluster.lookup-unhashed: auto
> performance.nfs.quick-read: on
> performance.nfs.read-ahead: on
> cluster.quorum-type: auto
> cluster.locking-scheme: granular
> cluster.data-self-heal-algorithm: full
> transport.address-family: inet
> performance.readdir-ahead: on
> nfs.disable: on
> cluster.lookup-optimize: on
> cluster.readdir-optimize: on
> performance.read-ahead: off
> performance.write-behind-window-size: 1MB
> client.event-threads: 4
> server.event-threads: 16
> cluster.granular-entry-heal: enable
> performance.parallel-readdir: on
> cluster.server-quorum-ratio: 51
>
>
>
> Andrea Fogazzi
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Issue with duplicated files in gluster 3.10

2017-03-10 Thread Andrea Fogazzi
Hi community,

we have run into an extensive issue on our installation of gluster 3.10, which 
we upgraded from 3.8.8 (it's a distribute+replicate volume, 5 nodes, 3 bricks in 
replica 2+1 with quorum); recently we noticed a frequent issue where files get 
duplicated in some of the directories; this is visible on the fuse mount points 
(RW), but not on the NFS/Ganesha (RO) mount points.


A sample of an ll output:


---------T 1 48 web_rw 0 Mar 10 11:57 paginazione.shtml
-rw-rw-r-- 1 48 web_rw   272 Feb 18 22:00 paginazione.shtml

As you can see, the file is listed twice, but only one of the two is good (the 
name is identical; we verified that no spurious/hidden characters are present 
in the name); the issue may be related to how we uploaded the files onto the 
file system, via incremental rsync on the fuse mount.

Does anyone have a suggestion on how this can happen, how to resolve the 
existing duplication, or how to prevent it from happening again?
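
A hedged way to check whether the zero-byte entry is a stale DHT link file
(the brick path is an example; run it on each server that holds a copy):

ls -l /bricks/brick1/path/to/paginazione.shtml
getfattr -d -m . -e hex /bricks/brick1/path/to/paginazione.shtml
# a DHT link file normally shows mode ---------T, size 0, and a
# trusted.glusterfs.dht.linkto xattr naming the subvolume that holds the data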

Thanks in advance.
Best regards,
andrea

Options Reconfigured:
performance.cache-invalidation: true
cluster.favorite-child-policy: mtime
features.cache-invalidation: 1
network.inode-lru-limit: 9
performance.cache-size: 1024MB
storage.linux-aio: on
nfs.outstanding-rpc-limit: 64
storage.build-pgfid: on
cluster.server-quorum-type: server
cluster.self-heal-daemon: enable
performance.nfs.io-cache: on
performance.client-io-threads: on
performance.nfs.stat-prefetch: on
performance.nfs.io-threads: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
performance.md-cache-timeout: 1
performance.io-thread-count: 16
performance.high-prio-threads: 32
performance.normal-prio-threads: 32
performance.low-prio-threads: 32
performance.least-prio-threads: 1
nfs.acl: off
nfs.rpc-auth-unix: off
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
cluster.lookup-unhashed: auto
performance.nfs.quick-read: on
performance.nfs.read-ahead: on
cluster.quorum-type: auto
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
cluster.lookup-optimize: on
cluster.readdir-optimize: on
performance.read-ahead: off
performance.write-behind-window-size: 1MB
client.event-threads: 4
server.event-threads: 16
cluster.granular-entry-heal: enable
performance.parallel-readdir: on
cluster.server-quorum-ratio: 51




Andrea Fogazzi

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] glusterfsd crashing

2017-03-10 Thread Sergei Gerasenko
Hi,

I'm running gluster 3.7.12. It's an 8-node distributed, replicated cluster
(replica 2). It had been working fine for a long time when all of a
sudden I started seeing bricks going offline. Researching further I found
messages like this:

Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: pending frames:
Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: frame : type(0)
op(5)
Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: patchset: git://
git.gluster.com/glusterfs.git
Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: signal received: 6
Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: time of crash:
Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: 2017-03-10 05:02:12
Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: configuration
details:
Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: argp 1
Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: backtrace 1
Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: dlfcn 1
Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: libpthread 1
Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: llistxattr 1
Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: setfsid 1
Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: spinlock 1
Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: epoll.h 1
Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: xattr.h 1
Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: st_atim.tv_nsec 1
Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: package-string:
glusterfs 3.7.12
Mar 10 00:02:12 HOSTNAME data-ftp_gluster_brick[23769]: -

I initially thought it was related to quota support (based on some
googling), so I turned off quota and also disabled NFS support to simplify
the debugging. Every time after the crash, I restarted gluster and the
bricks would go online for several hours only to crash again later. There
are lots of messages like this preceding the crash:

...
[2017-03-10 04:40:46.002225] E [MSGID: 113091] [posix.c:178:posix_lookup]
0-ftp_volume-posix: null gfid for path (null)
[2017-03-10 04:40:46.002278] E [MSGID: 113018] [posix.c:196:posix_lookup]
0-ftp_volume-posix: lstat on null failed [Invalid argument]
The message "E [MSGID: 113091] [posix.c:178:posix_lookup]
0-ftp_volume-posix: null gfid for path (null)" repeated 3 times between
[2017-03-10 04:40:46.002225] and [2017-03-10 04:40:46.005699]
The message "E [MSGID: 113018] [posix.c:196:posix_lookup]
0-ftp_volume-posix: lstat on null failed [Invalid argument]" repeated 3
times between [2017-03-10 04:40:46.002278] and [2017-03-10 04:40:46.005701]
[2017-03-10 04:50:47.002170] E [MSGID: 113091] [posix.c:178:posix_lookup]
0-ftp_volume-posix: null gfid for path (null)
[2017-03-10 04:50:47.002219] E [MSGID: 113018] [posix.c:196:posix_lookup]
0-ftp_volume-posix: lstat on null failed [Invalid argument]
The message "E [MSGID: 113091] [posix.c:178:posix_lookup]
0-ftp_volume-posix: null gfid for path (null)" repeated 3 times between
[2017-03-10 04:50:47.002170] and [2017-03-10 04:50:47.005623]
The message "E [MSGID: 113018] [posix.c:196:posix_lookup]
0-ftp_volume-posix: lstat on null failed [Invalid argument]" repeated 3
times between [2017-03-10 04:50:47.002219] and [2017-03-10 04:50:47.005625]
[2017-03-10 05:00:48.002246] E [MSGID: 113091] [posix.c:178:posix_lookup]
0-ftp_volume-posix: null gfid for path (null)
[2017-03-10 05:00:48.002314] E [MSGID: 113018] [posix.c:196:posix_lookup]
0-ftp_volume-posix: lstat on null failed [Invalid argument]
The message "E [MSGID: 113091] [posix.c:178:posix_lookup]
0-ftp_volume-posix: null gfid for path (null)" repeated 3 times between
[2017-03-10 05:00:48.002246] and [2017-03-10 05:00:48.005828]
The message "E [MSGID: 113018] [posix.c:196:posix_lookup]
0-ftp_volume-posix: lstat on null failed [Invalid argument]" repeated 3
times between [2017-03-10 05:00:48.002314] and [2017-03-10 05:00:48.005830]

One important detail I noticed yesterday is that one of the nodes was
running gluster version 3.7.13! I'm not sure how that upgrade happened. So I
downgraded it to 3.7.12 and restarted gluster. The crash above happened
several hours later. But again, the crashes had been happening before the
downgrade -- possibly because of the version mismatch on one of the nodes.

Anybody have any ideas?

Thanks!
  Sergei
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] gluster delete disk

2017-03-10 Thread p...@email.cz

Hello,
how can I delete a sharded disk which was broken during snapshot
deletion (the node crashed)?
I deleted the main VM, but the disk still persists and the gluster volume
has some file inconsistencies.

How can I find the shard files attached to that disk and delete them all?
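
For what it's worth, a hedged sketch of locating the shards of one image,
assuming the base file (or a record of its GFID) still exists on at least one
brick; the paths and the GFID below are examples, so double-check everything
before deleting anything:

# 1. read the GFID of the image file directly on a brick
getfattr -n trusted.gfid -e hex /bricks/brick1/images/broken-disk.img
# 2. shards live in the hidden .shard directory at each brick root,
#    named <gfid>.1, <gfid>.2, ... with the GFID written in UUID form
ls /bricks/brick1/.shard/ | grep 5d4e3c2b-1a09-4f8e-9b7c-6d5e4f3a2b1c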

regards
Paf1
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Announcing release 3.11 : Scope, schedule and feature tracking

2017-03-10 Thread Shyam

On 02/28/2017 10:17 AM, Shyam wrote:

Hi,

With release 3.10 shipped [1], it is time to set the dates for release
3.11 (and subsequently 4.0).

This mail has the following sections, so please read or revisit as needed,
  - 3.11 focus areas
  - Release owners


Focusing on the above 2 sections, in this mail.


*3.11 focus areas:*
As maintainers of gluster, we want to harden testing around the various
gluster features in this release. Towards this the focus area for this
release are,

1) Testing improvements in Gluster
  - Primary focus would be to get automated test cases to determine
release health, rather than repeating a manual exercise every 3 months
  - Further, we would also attempt to focus on maturing Glusto[7] for
this, and other needs (as much as possible)


We are yet to see any testing improvement suggestions for this release; we 
request the community's help with the same.


Nigel sent a good mail stating what it means to have a good build that 
can be released, here [8]. Please add to that, or propose your thoughts.


Further, during 3.9 there was an effort by Pranith and Aravinda to collect 
per-component health checks that need to be done for a release; if component 
maintainers agree, these can be targeted for automation as well. See [9] 
(@Pranith, @Aravinda: the etherpad where we collected this information is 
defunct; if you have the information somewhere, please post a new link for 
the same).


Also, we would like github issues for these testing improvements, so 
that we can track them for the 3.11 and further release, to ensure we 
reach this goal at some point in the future.



- We will still retain features that slipped 3.10 and hence were moved
to 3.11 (see [4] for the list).


Are there other features that are being targeted for 3.11? If so please 
post the same to github and also send a mail to devel and users, stating 
what they are.


We would like to hear sooner rather than later about what *may* get into 3.11


*Release owners:*
  - Primary: Shyam 

Assisted by: Kaushal 

Kaushal has volunteered to assist me with this release going forward; thanks, 
Kaushal. So that is now two people you can post your queries about the release 
to (in addition to the lists).




Shyam

[1] 3.10 release announcement:
http://lists.gluster.org/pipermail/gluster-devel/2017-February/052188.html

[2] Gluster release schedule:
https://www.gluster.org/community/release-schedule/

[3] Mail regarding facebook patches:
http://lists.gluster.org/pipermail/gluster-devel/2016-December/051784.html

[4] Release scope: https://github.com/gluster/glusterfs/projects/1

[5] glusterfs github issues: https://github.com/gluster/glusterfs/issues

[6] github issues for features and major fixes:
https://hackmd.io/s/BkgH8sdtg#

[7] Glusto tests: https://github.com/gluster/glusto-tests


[8] Good build thread: 
http://lists.gluster.org/pipermail/gluster-devel/2017-March/052245.html


[9] Release checklist of tests collected during 3.9: 
https://public.pad.fsfe.org/p/gluster-component-release-checklist
NOTE: The above link is defunct now, hopefully we still have the data 
and we can post it again elsewhere


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Sharding?

2017-03-10 Thread Cedric Lemarchand
> On 10 Mar 2017, at 12:05, Krutika Dhananjay  wrote:
> 
> On Fri, Mar 10, 2017 at 4:09 PM, Cedric Lemarchand  > wrote:
> 
> > On 10 Mar 2017, at 10:33, Alessandro Briosi  > > wrote:
> >
> > Il 10/03/2017 10:28, Kevin Lemonnier ha scritto:
> >>> I haven't done any test yet, but I was under the impression that
> >>> sharding feature isn't so stable/mature yet.
> >>> In the remote of my mind I remember reading something about a
> >>> bug/situation which caused data corruption.
> >>> Can someone confirm that sharding is stable enough to be used in
> >>> production and won't cause any data loss?
> >> There were a few bugs yeah. I can tell you that in 3.7.15 (and I assume
> >> later versions) it works well as long as you don't try to add new bricks
> >> to your volumes (we use it in production for HA virtual machine disks).
> >> Apparently that bug was fixed recently, so latest versions should be
> >> pretty stable yeah.
> >
> > I'm using 3.8.9, so I suppose all known bugs have been fixed there (also 
> > the one with adding briks)
> >
> > I'll then proceed with some tests before going to production.
> 
> I am still asking myself how such bug could happen on a clustered storage 
> software, where adding bricks is a base feature for scalable solution, like 
> Gluster. Or maybe is it that STM releases are really under tested compared to 
> LTM ones ? Could we states that STM release are really not made for 
> production, or at least really risky ?
> 
> Not entirely true. The same bug existed in LTM release too.
> 
> I did try reproducing the bug on my setup as soon as Lindsay, Kevin and 
> others started reporting about it, but it was never reproducible on my setup.
> Absence of proper logging in libgfapi upon failures only made it harder to 
> debug, even when the users successfully recreated the issue and shared
> their logs. It was only after Satheesaran recreated it successfully with FUSE 
> mount that the real debugging could begin, when fuse-bridge translator
> logged the exact error code for failure.

Indeed, an unreproducible bug is pretty hard to fix … thanks for the feedback. 
What would be the best way to find out about critical bugs in the different 
Gluster releases? Maybe browsing https://review.gluster.org/ or 
https://bugzilla.redhat.com; any advice?

Cheers

Cédric

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Sharding?

2017-03-10 Thread Gandalf Corvotempesta
2017-03-10 11:39 GMT+01:00 Cedric Lemarchand :
> I am still asking myself how such bug could happen on a clustered storage 
> software, where adding bricks is a base feature for scalable solution, like 
> Gluster. Or maybe is it that STM releases are really under tested compared to 
> LTM ones ? Could we states that STM release are really not made for 
> production, or at least really risky ?

This is the same thing I reported some months ago.
I think it's probably the worst issue in gluster: tons of critical
bugs in critical features (which are also the basic features of a
storage system) that lead to data loss, with fixes still waiting to be merged.

These kinds of bugs *MUST* be addressed, fixed and released *ASAP*, not
months and months later, with fixes still waiting for a review.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Sharding?

2017-03-10 Thread Krutika Dhananjay
On Fri, Mar 10, 2017 at 4:09 PM, Cedric Lemarchand 
wrote:

>
> > On 10 Mar 2017, at 10:33, Alessandro Briosi  wrote:
> >
> > Il 10/03/2017 10:28, Kevin Lemonnier ha scritto:
> >>> I haven't done any test yet, but I was under the impression that
> >>> sharding feature isn't so stable/mature yet.
> >>> In the remote of my mind I remember reading something about a
> >>> bug/situation which caused data corruption.
> >>> Can someone confirm that sharding is stable enough to be used in
> >>> production and won't cause any data loss?
> >> There were a few bugs yeah. I can tell you that in 3.7.15 (and I assume
> >> later versions) it works well as long as you don't try to add new bricks
> >> to your volumes (we use it in production for HA virtual machine disks).
> >> Apparently that bug was fixed recently, so latest versions should be
> >> pretty stable yeah.
> >
> > I'm using 3.8.9, so I suppose all known bugs have been fixed there (also
> the one with adding briks)
> >
> > I'll then proceed with some tests before going to production.
>
> I am still asking myself how such bug could happen on a clustered storage
> software, where adding bricks is a base feature for scalable solution, like
> Gluster. Or maybe is it that STM releases are really under tested compared
> to LTM ones ? Could we states that STM release are really not made for
> production, or at least really risky ?
>

Not entirely true. The same bug existed in LTM release too.

I did try reproducing the bug on my setup as soon as Lindsay, Kevin and
others started reporting about it, but it was never reproducible on my
setup.
Absence of proper logging in libgfapi upon failures only made it harder to
debug, even when the users successfully recreated the issue and shared
their logs. It was only after Satheesaran recreated it successfully with
FUSE mount that the real debugging could begin, when fuse-bridge translator
logged the exact error code for failure.

-Krutika


> Sorry if the question could sounds a bit rude, but I think it still
> remains for newish peoples that had to make a choice on which release is
> better for production ;-)
>
> Cheers
>
> Cédric
>
> >
> > Thank you
> >
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-users
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Sharding?

2017-03-10 Thread Cedric Lemarchand

> On 10 Mar 2017, at 10:33, Alessandro Briosi  wrote:
> 
> Il 10/03/2017 10:28, Kevin Lemonnier ha scritto:
>>> I haven't done any test yet, but I was under the impression that
>>> sharding feature isn't so stable/mature yet.
>>> In the remote of my mind I remember reading something about a
>>> bug/situation which caused data corruption.
>>> Can someone confirm that sharding is stable enough to be used in
>>> production and won't cause any data loss?
>> There were a few bugs yeah. I can tell you that in 3.7.15 (and I assume
>> later versions) it works well as long as you don't try to add new bricks
>> to your volumes (we use it in production for HA virtual machine disks).
>> Apparently that bug was fixed recently, so latest versions should be
>> pretty stable yeah.
> 
> I'm using 3.8.9, so I suppose all known bugs have been fixed there (also the 
> one with adding briks)
> 
> I'll then proceed with some tests before going to production.

I am still asking myself how such a bug could happen in clustered storage 
software where adding bricks is a core feature of a scalable solution like 
Gluster. Or is it that STM releases are really under-tested compared to LTM 
ones? Could we say that STM releases are really not made for production, or at 
least are really risky?

Sorry if the question sounds a bit rude, but I think it still stands for new 
people who have to choose which release is better for production ;-)

Cheers

Cédric

> 
> Thank you
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Sharding?

2017-03-10 Thread Krutika Dhananjay
On Fri, Mar 10, 2017 at 3:03 PM, Alessandro Briosi  wrote:

> Il 10/03/2017 10:28, Kevin Lemonnier ha scritto:
>
> I haven't done any test yet, but I was under the impression that
> sharding feature isn't so stable/mature yet.
> In the remote of my mind I remember reading something about a
> bug/situation which caused data corruption.
> Can someone confirm that sharding is stable enough to be used in
> production and won't cause any data loss?
>
> There were a few bugs yeah. I can tell you that in 3.7.15 (and I assume
> later versions) it works well as long as you don't try to add new bricks
> to your volumes (we use it in production for HA virtual machine disks).
> Apparently that bug was fixed recently, so latest versions should be
> pretty stable yeah.
>
>
> I'm using 3.8.9, so I suppose all known bugs have been fixed there (also
> the one with adding briks)
>

No. That one is out for review and yet to be merged.

... which again reminds me ...

Niels,

Care to merge the two patches?

https://review.gluster.org/#/c/16749/
https://review.gluster.org/#/c/16750/

-Krutika


> I'll then proceed with some tests before going to production.
>
> Thank you
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Sharding?

2017-03-10 Thread Kevin Lemonnier
> I'm using 3.8.9, so I suppose all known bugs have been fixed there (also
> the one with adding briks)

Can't comment on that, I just saw they fixed it, not sure in which version.
I'd wait for someone who knows to confirm that before going into production
if adding bricks is something you'll need !

-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Sharding?

2017-03-10 Thread Alessandro Briosi
On 10/03/2017 10:28, Kevin Lemonnier wrote:
>> I haven't done any test yet, but I was under the impression that
>> sharding feature isn't so stable/mature yet.
>> In the remote of my mind I remember reading something about a
>> bug/situation which caused data corruption.
>> Can someone confirm that sharding is stable enough to be used in
>> production and won't cause any data loss?
> There were a few bugs yeah. I can tell you that in 3.7.15 (and I assume
> later versions) it works well as long as you don't try to add new bricks
> to your volumes (we use it in production for HA virtual machine disks).
> Apparently that bug was fixed recently, so latest versions should be
> pretty stable yeah.

I'm using 3.8.9, so I suppose all known bugs have been fixed there (also
the one with adding bricks)

I'll then proceed with some tests before going to production.

Thank you

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Sharding?

2017-03-10 Thread Kevin Lemonnier
> I haven't done any test yet, but I was under the impression that
> sharding feature isn't so stable/mature yet.
> In the remote of my mind I remember reading something about a
> bug/situation which caused data corruption.
> Can someone confirm that sharding is stable enough to be used in
> production and won't cause any data loss?

There were a few bugs yeah. I can tell you that in 3.7.15 (and I assume
later versions) it works well as long as you don't try to add new bricks
to your volumes (we use it in production for HA virtual machine disks).
Apparently that bug was fixed recently, so latest versions should be
pretty stable yeah.

-- 
Kevin Lemonnier
PGP Fingerprint : 89A5 2283 04A0 E6E9 0111


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Sharding?

2017-03-10 Thread Alessandro Briosi
On 09/03/2017 17:17, Vijay Bellur wrote:
>
>
> On Thu, Mar 9, 2017 at 11:10 AM, Kevin Lemonnier  > wrote:
>
> > I've seen the term sharding pop up on the list a number of times
> but I
> > haven't found any documentation or explanation of what it is.
> Would someone
> > please enlighten me?
>
> It's a way to split the files you put on the volume. With a shard
> size of 64 MB
> for example, the biggest file on the volume will be 64 MB. It's
> transparent
> when accessing the files though, you can still of course write
> your 2 TB file
> and access it as usual.
>
> It's useful for things like healing (only the shard being healed
> is locked,
> and you have a lot less data to transfer) and for things like
> hosting a single
> huge file that would be bigger than one of your replicas.
>
> We use it for VM disks, as it decreases heal times a lot.
>
>
>
> Some more details on sharding can be found at [1].
>
>  
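
For context, a minimal sketch of how sharding is enabled on a volume (VOLNAME
and the 64MB block size are examples; only files created after the option is
turned on get sharded, existing files stay as they are):

gluster volume set VOLNAME features.shard on
gluster volume set VOLNAME features.shard-block-size 64MB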


I haven't done any tests yet, but I was under the impression that
the sharding feature isn't so stable/mature yet.
In the back of my mind I remember reading something about a
bug/situation which caused data corruption.
Can someone confirm that sharding is stable enough to be used in
production and won't cause any data loss?

I'd really like to use it, as it would probably speed up healing a
lot (as I use it to store VM disks).

Thanks,
Alessandro
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Disperse mkdir fails

2017-03-10 Thread Xavier Hernandez

Hi Ram,

On 09/03/17 20:15, Ankireddypalle Reddy wrote:

> Xavi,
> Thanks for checking this.
> 1) mkdir returns errno 5, EIO.
> 2) The specified directory is the parent directory under which all
> the data in the gluster volume will be stored. Currently around 160TB of the
> 262TB is consumed.

I only need the first-level entries of that directory, not the entire
tree of entries. This should be in the order of thousands, right?

We need to make sure that all bricks have the same entries in this
directory. Otherwise we would need to check other things.

> 3) It is extremely difficult to list the exact sequence of FOPs
> that would have been issued to the directory. The storage is heavily used and
> a lot of subdirectories are present inside this directory.
>
> Are you looking for the extended attributes for this directory from
> all the bricks inside the volume? There are about 60 bricks.

If possible, yes.

However, if there are a lot of modifications on that directory while you
are getting the xattrs, it's possible that you get values that look
inconsistent but really aren't.

If possible, you should get that information after pausing all activity on
that directory.
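
For reference, a minimal sketch of gathering those xattrs (the brick root is an
example; run as root on every server, once per brick):

getfattr -d -m . -e hex /path/to/brick/Folder_07.11.2016_23.02/CV_MAGNETIC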


Xavi



Thanks and Regards,
Ram

-Original Message-
From: Xavier Hernandez [mailto:xhernan...@datalab.es]
Sent: Thursday, March 09, 2017 11:15 AM
To: Ankireddypalle Reddy; Gluster Devel (gluster-de...@gluster.org); 
gluster-users@gluster.org
Subject: Re: [Gluster-users] Disperse mkdir fails

Hi Ram,

On 09/03/17 16:52, Ankireddypalle Reddy wrote:

[Attachment: info.txt, 3.35 KB]
KB)

Hi,

I have a disperse gluster volume  with 6 servers. 262TB of
usable capacity.  Gluster version is 3.7.19.

glusterfs1, glusterf2 and glusterfs3 nodes were initially used
for creating the volume. Nodes glusterf4, glusterfs5 and glusterfs6
were later added to the volume.



Directory creation failed on a directory called
/ws/glus/Folder_07.11.2016_23.02/CV_MAGNETIC.

# file: ws/glus/Folder_07.11.2016_23.02/CV_MAGNETIC

glusterfs.gfid.string="e8e51015-616f-4f04-b9d2-92f46eb5cfc7"



gluster mount log contains lot of following errors:

[2017-03-09 15:32:36.773937] W [MSGID: 122056]
[ec-combine.c:875:ec_combine_check] 0-StoragePool-disperse-7:
Mismatching xdata in answers of 'LOOKUP' for
e8e51015-616f-4f04-b9d2-92f46eb5cfc7



The directory seems to be out of sync between nodes
glusterfs1,
glusterfs2 and glusterfs3. Each has different version.



 trusted.ec.version=0x000839f83a4d

 trusted.ec.version=0x00082ea400083a4b

 trusted.ec.version=0x00083a7600083a7b



 Self-heal does not seem to be healing this directory.



This is very similar to what happened the other time. Once more than 1 brick is 
damaged, self-heal cannot do anything to heal it on a 2+1 configuration.

What error does return the mkdir request ?

Does the directory you are trying to create already exist on some brick ?

Can you show all the remaining extended attributes of the directory ?

It would also be useful to have the directory contents on each brick (an 'ls 
-l'). In this case, include the name of the directory you are trying to create.

Can you explain a detailed sequence of operations done on that directory since 
the last time you successfully created a new subdirectory ?
including any metadata change.

Xavi




Thanks and Regards,

Ram

***Legal Disclaimer***
"This communication may contain confidential and privileged material
for the sole use of the intended recipient. Any unauthorized review,
use or distribution by others is strictly prohibited. If you have
received the message by mistake, please advise the sender by reply
email and delete the message. Thank you."
**


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users



***Legal Disclaimer***
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."