Re: [Gluster-users] KVM lockups on Gluster 4.1.1

2018-10-02 Thread Dmitry Melekhov

On 02.10.2018 12:59, Amar Tumballi wrote:
Recently, in one situation, we found that locks were not freed
up because the TCP timeout was never hit.


Can you try the option below and let us know?

`gluster volume set $volname tcp-user-timeout 42`

(ref: https://review.gluster.org/21170/ )
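
A minimal sequence to apply and verify this, with "pool" as a placeholder
volume name, might look like:

    # set the transport-level user timeout (in seconds) on the volume
    gluster volume set pool tcp-user-timeout 42

    # confirm the option took effect
    gluster volume get pool tcp-user-timeout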

Regards,
Amar



Thank you, we'll try this.



On Tue, Oct 2, 2018 at 10:40 AM Dmitry Melekhov wrote:


On 01.10.2018 23:09, Danny Lee wrote:

Ran into this issue too with 4.1.5 with an arbiter setup.  Also
could not run a statedump due to a "Segmentation fault".

Tried with 3.12.13 and had issues with locked files as well.  We
were able to do a statedump and found that some of our files were
"BLOCKED" (xlator.features.locks.vol-locks.inode).  Attached is part
of the statedump.

Also tried clearing the locks using clear-locks, which did remove
the lock, but as soon as I tried to cat the file, it got locked
again and the cat process hung.
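
(For reference, the commands involved are roughly the following; the volume
name and file path are placeholders, and the exact lock to clear should be
taken from the statedump output.)

    # dump brick-process state (including inode/entry locks); the dumps
    # land in /var/run/gluster by default
    gluster volume statedump myvol

    # clear both blocked and granted inode locks on one file
    gluster volume clear-locks myvol /path/to/file kind all inode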


I created an issue in Bugzilla, though I can't find it now :-(
Looks like there has been no activity since I sent all the logs...




On Wed, Aug 29, 2018, 3:13 AM Dmitry Melekhov <d...@belkam.com> wrote:

On 28.08.2018 10:43, Amar Tumballi wrote:



On Tue, Aug 28, 2018 at 11:24 AM, Dmitry Melekhov
<d...@belkam.com> wrote:

Hello!


Yesterday we hit something like this on 4.1.2

CentOS 7.5.


The volume is replicated - two bricks and one arbiter.


We rebooted the arbiter, waited for the heal to finish, and tried to
live-migrate a VM to another node (we run VMs on the gluster nodes):


[2018-08-27 09:56:22.085411] I [MSGID: 115029] [server-handshake.c:763:server_setvolume] 0-pool-server: accepted client from CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-client-6-RECON_NO:-0 (version: 4.1.2)
[2018-08-27 09:56:22.107609] I [MSGID: 115036] [server.c:483:server_rpc_notify] 0-pool-server: disconnecting connection from CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-client-6-RECON_NO:-0
[2018-08-27 09:56:22.107747] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 0-pool-server: Shutting down connection CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-client-6-RECON_NO:-0
[2018-08-27 09:58:37.905829] I [MSGID: 115036] [server.c:483:server_rpc_notify] 0-pool-server: disconnecting connection from CTX_ID:c3eb6cfc-2ef9-470a-89d1-a87170d00da5-GRAPH_ID:0-PID:30292-HOST:father-PC_NAME:pool-client-6-RECON_NO:-0
[2018-08-27 09:58:37.905926] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28c831d8bc55}
[2018-08-27 09:58:37.905959] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=2870a7d6bc55}
[2018-08-27 09:58:37.905979] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=2880a7d6bc55}
[2018-08-27 09:58:37.905997] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28f031d8bc55}
[2018-08-27 09:58:37.906016] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28b07dd5bc55}
[2018-08-27 09:58:37.906034] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28e0a7d6bc55}
[2018-08-27 09:58:37.906056] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=28b845d8bc55}
[2018-08-27 09:58:37.906079] W [inodelk.c:610:pl_inodelk_log_cleanup] 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held by {client=0x7ffb58035bc0, pid=30292 lk-owner=2858a7d8bc55}

Re: [Gluster-users] sharding in glusterfs

2018-10-02 Thread Raghavendra Gowdappa
On Sun, Sep 30, 2018 at 9:54 PM Ashayam Gupta wrote:

> Hi Pranith,
>
> Thanks for your reply; it would be helpful if you could help us with
> the following issues with respect to sharding.
> The gluster version we are using is *glusterfs 4.1.4* on Ubuntu 18.04.1
> LTS
>
>
>- *Shards-Creation Algo*: We were interested in understanding the way
>in which shards are distributed across bricks and nodes. Is it
>round-robin or some other algorithm, and can we change this mechanism
>via some config option? E.g., if we have 2 nodes, each with 2 bricks
>(2*2 = 4 bricks in total), how will the shards be distributed? Will it
>always be an even distribution? (The volume type in this case is plain.)
>
>- *Sharding+Distributed-Volume*: Currently we are using a plain volume
>with sharding enabled and we do not see an even distribution of shards
>across bricks. Can we use sharding with a distributed volume to achieve
>a more even distribution of shards? It would be helpful if you could
>suggest the most efficient way of using sharding; our goal is an evenly
>distributed file system (we have large files, hence sharding), and we
>are not concerned with replication as of now.
>
>
For distribution you need DHT as a descendant of the shard xlator in the
graph. The way the features/shard xlator handles sharding is to create an
independent file for each shard, so an individual shard is visible as an
independent file to the children of the shard xlator. The entire
distribution logic is thus off-loaded to the xlator that handles it (DHT).
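
To make this concrete: every shard after the first block of a file is
stored as a separate file named <gfid-of-base-file>.<shard-number> under a
hidden .shard directory, and DHT places each of those files independently,
just like regular files. A quick way to see this on a brick backend (the
brick path below is only an example):

    # the base file sits at its normal path on some brick; the remaining
    # shards live under .shard on whichever brick DHT hashes each one to
    ls /bricks/brick1/.shard/
    # expected output shape: <gfid-of-base-file>.1  <gfid-of-base-file>.2  ...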


>
>- *Shard-Block-Size*: If we change the *features.shard-block-size*
>value from X to Y after lots of data has been populated, how does this
>affect the existing shards? Are they automatically converted to the new
>size, do we need to run some commands to get this done, or is making
>this change even recommended?
>- *Rebalance-Shard*: As per the docs, whenever we add a new server/node
>to the existing cluster we need to run the rebalance command. We would
>like to know if there are any known issues with rebalancing when
>sharding is enabled (commands sketched after this list).
>
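A hedged aside on the two bullets above, with placeholder volume and file
names: as far as I recall, the block size a sharded file was created with
is pinned in the trusted.glusterfs.shard.block-size xattr on the base
file, and rebalancing after adding bricks uses the standard CLI:

    # inspect the recorded shard block size of an existing file, reading
    # the xattr from the brick backend (path is a placeholder)
    getfattr -e hex -n trusted.glusterfs.shard.block-size /bricks/brick1/data/bigfile.img

    # standard rebalance cycle after adding bricks to the volume
    gluster volume rebalance myvol start
    gluster volume rebalance myvol status
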
> We would highly appreciate it if you could point us to the latest
> sharding docs; we tried to search but could not find anything better
> than this:
> https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/shard/
>
> Thanks
> Ashayam
>
>
> On Thu, Sep 20, 2018 at 7:47 PM Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>>
>>
>> On Wed, Sep 19, 2018 at 11:37 AM Ashayam Gupta <
>> ashayam.gu...@alpha-grep.com> wrote:
>>
>>> Please find our workload details as requested:
>>>
>>> * Only 1 write-mount point as of now
>>> * Read-mount: since we auto-scale our machines, this can be as many as
>>> 300-400 machines during peak times
>>> * > "multiple concurrent reads means that Reads will not happen until
>>> the file is completely written to" - Yes, in our current scenario we
>>> can ensure that this is indeed the case.
>>>
>>> But when you say it only supports single-writer workloads, we would
>>> like to understand the following scenarios with respect to multiple
>>> writers and the current behaviour of glusterfs with sharding:
>>>
>>>- Multiple writers write to different files
>>>
>> When I say multiple writers, I mean multiple mounts. Since you were
>> saying earlier there is only one mount which does all writes, everything
>> should work as expected.
>>
>>>
>>>- Multiple writers write to the same file
>>>   - they write to the same file but to different shards of it
>>>   - they write to the same file (no guarantee that they write to
>>>   different shards)
>>>
>> As long as the above happens from the same mount, things should be fine.
>> Otherwise there could be problems.
>>
>>
>>> There might be some more cases which are known to you; it would be
>>> helpful if you could describe those scenarios as well, or point us to
>>> the relevant documents.
>>>
>>> Also, it would be helpful if you could suggest the most stable version
>>> of glusterfs with the sharding feature, since we would like to use this
>>> in production.
>>>
>>
>> It has been stable for a while, so use any of the latest maintained
>> releases, like 3.12.x or 4.1.x.
>>
>> As I was mentioning already, sharding is mainly tested with
>> VM/gluster-block workloads, so there could be some corner cases with a
>> single-writer workload that we never ran into with the VM/block workloads
>> we test; you may run into them. Do let us know if you find something out
>> of the ordinary and we can take a look. What I would suggest is to use
>> one of the maintained releases and run the workloads you have for some
>> time to test things out; once you feel confident, you can put it in
>> production.
>>
>> HTH
>>
>>>
>>> Thanks
>>> Ashayam Gupta
>>>
>>> On Tue, Sep 18, 2018 at 11:00 AM Pranith Kumar Karampuri <
>>> pkara...@redhat.com> wrote:
>>>


 On Mon, Sep 17, 2018 at 

[Gluster-users] Gluster problems permission denied LOOKUP () /etc/samba/private/msg.sock

2018-10-02 Thread Diego Remolina
Dear all,

I have a two-node setup running on CentOS with gluster version
glusterfs-3.10.12-1.el7.x86_64.

One of my nodes died (motherboard issue). Since I had to keep things
running, I modified the quorum to below 50% to make sure I could
still run on one server.
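
(For context, one way to relax quorum like this, assuming server-side
quorum was in use; the exact value is an example, not a record of the
command actually run:)

    # cluster-wide: let the pool stay quorate with one of two servers up
    gluster volume set all cluster.server-quorum-ratio 50%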

The server runs oVirt and 2 VMs on top of a volume called vmstorage. I
also had a third node in the peer list, but never configured it as an
arbiter, so it just shows up in gluster v status. The server also runs
a file server with Samba to serve files to Windows machines.

The issue is that since starting the server on its own as the samba
server, I have been seeing permission-denied errors for the "export"
volume in /var/log/glusterfs/export.log.

The errors look like this and repeat over and over:

[2018-10-02 11:46:56.327925] I [MSGID: 139001]
[posix-acl.c:269:posix_acl_log_permit_denied] 0-posix-acl-autoload:
client: -, gfid: 5b5bed22-ace0-410d-8623-4f1a31069b81,
req(uid:1051,gid:513,perm:1,ngrps:2),
ctx(uid:0,gid:0,in-groups:0,perm:700,updated-fop:LOOKUP, acl:-)
[Permission denied]
[2018-10-02 11:46:56.328004] W [fuse-bridge.c:490:fuse_entry_cbk]
0-glusterfs-fuse: 20599112: LOOKUP() /etc/samba/private/msg.sock/15149
=> -1 (Permission denied)
[2018-10-02 11:46:56.328185] W [fuse-bridge.c:490:fuse_entry_cbk]
0-glusterfs-fuse: 20599113: LOOKUP() /etc/samba/private/msg.sock/15149
=> -1 (Permission denied)
[2018-10-02 11:47:53.766562] W [fuse-bridge.c:490:fuse_entry_cbk]
0-glusterfs-fuse: 20600590: LOOKUP() /etc/samba/private/msg.sock/15149
=> -1 (Permission denied)

The gluster volume export is mounted on /export; samba and ctdb are
instructed to use /export/etc/samba/private and /export/lock, which
are on the gluster file system, for the clustered tdb files etc.
However, I keep getting log messages showing that fuse is trying to
access a folder that does not exist: /etc/samba/private/msg.sock.
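
One note that may help narrow this down: paths in 0-glusterfs-fuse log
lines are relative to the mount point, so the LOOKUP above actually
resolves inside the export volume. A first check, using the paths from
this setup:

    # /etc/samba/private/msg.sock in the fuse log resolves under the mount;
    # the posix-acl line shows the request came from uid 1051 / gid 513
    # against a mode-700 directory owned by root
    ls -ld /export/etc/samba/private/msg.sock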

Why is this, and how can I fix it?

[root@ysmha01 export]# gluster v status export
Status of volume: export
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.0.1.6:/bricks/hdds/brick           49153     0          Y       3516
Self-heal Daemon on localhost               N/A       N/A        Y       3710
Self-heal Daemon on 10.0.1.5                N/A       N/A        Y       4380

Task Status of Volume export
------------------------------------------------------------------------------
There are no active volume tasks

These are all the volume options currently set:

http://termbin.com/1xm5
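
(The same list should be reproducible locally; with the special key "all",
gluster prints every option for the volume, defaults included - assuming
this 3.10 build supports it:)

    # dump every option, set or default, for the export volume
    gluster volume get export all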

Diego
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] KVM lockups on Gluster 4.1.1

2018-10-02 Thread Amar Tumballi
Recently, in one situation, we found that locks were not freed up
because the TCP timeout was never hit.

Can you try the option below and let us know?

`gluster volume set $volname tcp-user-timeout 42`

(ref: https://review.gluster.org/21170/ )

Regards,
Amar


On Tue, Oct 2, 2018 at 10:40 AM Dmitry Melekhov  wrote:

> On 01.10.2018 23:09, Danny Lee wrote:
>
> Ran into this issue too with 4.1.5 with an arbiter setup.  Also could not
> run a statedump due to a "Segmentation fault".
>
> Tried with 3.12.13 and had issues with locked files as well.  We were able
> to do a statedump and found that some of our files were "BLOCKED"
> (xlator.features.locks.vol-locks.inode).  Attached is part of the statedump.
>
> Also tried clearing the locks using clear-locks, which did remove the
> lock, but as soon as I tried to cat the file, it got locked again and the
> cat process hung.
>
>
> I created an issue in Bugzilla, though I can't find it now :-(
> Looks like there has been no activity since I sent all the logs...
>
>
>
> On Wed, Aug 29, 2018, 3:13 AM Dmitry Melekhov  wrote:
>
>> On 28.08.2018 10:43, Amar Tumballi wrote:
>>
>>
>>
>> On Tue, Aug 28, 2018 at 11:24 AM, Dmitry Melekhov  wrote:
>>
>>> Hello!
>>>
>>>
>>> Yesterday we hit something like this on 4.1.2
>>>
>>> CentOS 7.5.
>>>
>>>
>>> The volume is replicated - two bricks and one arbiter.
>>>
>>>
>>> We rebooted the arbiter, waited for the heal to finish, and tried to
>>> live-migrate a VM to another node (we run VMs on the gluster nodes):
>>>
>>>
>>> [2018-08-27 09:56:22.085411] I [MSGID: 115029]
>>> [server-handshake.c:763:server_setvolume] 0-pool-server: accepted client
>>> from
>>> CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-
>>> client-6-RECON_NO:-0 (version: 4.1.2)
>>> [2018-08-27 09:56:22.107609] I [MSGID: 115036]
>>> [server.c:483:server_rpc_notify] 0-pool-server: disconnecting connection
>>> from
>>> CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-
>>> client-6-RECON_NO:-0
>>> [2018-08-27 09:56:22.107747] I [MSGID: 101055]
>>> [client_t.c:444:gf_client_unref] 0-pool-server: Shutting down connection
>>> CTX_ID:b55f4a90-e241-48ce-bd4d-268c8a956f4a-GRAPH_ID:0-PID:8887-HOST:son-PC_NAME:pool-clien
>>> t-6-RECON_NO:-0
>>> [2018-08-27 09:58:37.905829] I [MSGID: 115036]
>>> [server.c:483:server_rpc_notify] 0-pool-server: disconnecting connection
>>> from
>>> CTX_ID:c3eb6cfc-2ef9-470a-89d1-a87170d00da5-GRAPH_ID:0-PID:30292-HOST:father-PC_NAME:p
>>> ool-client-6-RECON_NO:-0
>>> [2018-08-27 09:58:37.905926] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28c831d8bc55}
>>> [2018-08-27 09:58:37.905959] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=2870a7d6bc55}
>>> [2018-08-27 09:58:37.905979] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=2880a7d6bc55}
>>> [2018-08-27 09:58:37.905997] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28f031d8bc55}
>>> [2018-08-27 09:58:37.906016] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28b07dd5bc55}
>>> [2018-08-27 09:58:37.906034] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28e0a7d6bc55}
>>> [2018-08-27 09:58:37.906056] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28b845d8bc55}
>>> [2018-08-27 09:58:37.906079] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=2858a7d8bc55}
>>> [2018-08-27 09:58:37.906098] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=2868a8d7bc55}
>>> [2018-08-27 09:58:37.906121] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28f80bd7bc55}
>>> ...
>>>
>>> [2018-08-27 09:58:37.907375] W [inodelk.c:610:pl_inodelk_log_cleanup]
>>> 0-pool-server: releasing lock on 12172afe-f0a4-4e10-bc0f-c5e4e0d9f318 held
>>> by {client=0x7ffb58035bc0, pid=30292 lk-owner=28a8cdd6bc55}
>>> [2018-08-27 09:58:37.907393] W