[Gluster-users] remote operation failed [Transport endpoint is not connected]
Hello, I am stuck with a failing heal on my gluster 2x replica, with messages in glustershd.log such as:

[2018-11-21 05:28:07.813003] E [MSGID: 114031] [client-rpc-fops.c:1646:client3_3_entrylk_cbk] 0-gv1-client-0: remote operation failed [Transport endpoint is not connected]

When this message appears on either of the replica nodes, my command "watch gluster volume heal statistics" shows no more progress and the status stays unchanged afterward. I am running glusterfs on top of ZFS, basically as storage for small read-only files. There was a thread here with Shyam Ranganathan and Reiner Keller where the core of the problem was the storage running out of inodes and hitting "no space left" errors, which obviously cannot be my case as I am on top of ZFS. However, the similarity between us is that we were previously on 3.10 and, after various issues with that version, upgraded to 3.12 on Ubuntu 16.04 with kernel 4.4.0-116-generic. Has anybody faced the issue above? Can you advise what can be done? It has been over a month without an effective self-heal process completing...
Here is my gluster cluster info:

Volume Name: gv1
Type: Replicate
Volume ID: f1c955a1-7a92-4b1b-acb5-8b72b41aaace
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: IMG-01:/images/storage/brick1
Brick2: IMG-02:/images/storage/brick1
Options Reconfigured:
cluster.self-heal-daemon: enable
cluster.eager-lock: off
client.event-threads: 4
performance.cache-max-file-size: 8
features.scrub: Inactive
features.bitrot: off
network.inode-lru-limit: 5
nfs.disable: true
performance.readdir-ahead: on
server.statedump-path: /tmp
cluster.background-self-heal-count: 32
performance.md-cache-timeout: 30
cluster.readdir-optimize: on
cluster.shd-max-threads: 4
cluster.lookup-optimize: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
server.event-threads: 4

Thank you
--
Hamid Safe
www.devopt.net
+989361491768

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
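[Editor's note: not from the original message — a sketch of first checks for a "Transport endpoint is not connected" heal stall. The volume name gv1 is taken from the info above; the brick port is an assumption, read the real one from "gluster volume status".]

```shell
# Run on each replica node to narrow down the stall.

# 1) Confirm both brick processes and the self-heal daemon are online:
gluster volume status gv1

# 2) List entries still pending heal (a stuck entry often points at the
#    file/gfid the self-heal daemon is blocked on):
gluster volume heal gv1 info

# 3) Check that the peers still see each other:
gluster peer status

# 4) On the node logging the error, verify a TCP connection to the other
#    node's brick port (49152 here is only a typical default -- take the
#    actual port from "gluster volume status gv1"):
nc -vz IMG-02 49152
```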
Re: [Gluster-users] Gluster snapshot & geo-replication
Hi Marcus,

/var/log/glusterfs/snaps/urd-gds-volume/snapd.log is the log file of the snapview daemon that is mainly used for user serviceable snapshots. Are you using that feature? I.e. are you accessing the snapshots of your volume from the main volume's mount point?

Some other information that might be helpful:
1) output of "gluster volume info"
2) log files from the gluster nodes (/var/log/glusterfs)

Can you please provide the above information?

NOTE: If you are not using the user serviceable snapshots feature, then you can turn it off. This will stop the snapview daemon and thus prevent its log file from growing. The command to turn off user serviceable snapshots is "gluster volume set <volname> features.uss disable"

Regards

On Fri, Nov 16, 2018 at 5:41 PM Marcus Pedersén wrote:
> Hi all,
>
> I am using CentOS 7 and Gluster version 4.1.3
>
> I am using thin LVM and create snapshots once a day, of
> course deleting the oldest ones after a while.
>
> Creating a snap fails every now and then with the following different
> errors:
> Error : Request timed out
>
> or
>
> failed: Brick ops failed on urd-gds-002. changelog notify failed
>
> (Where the server name is a different host in the gluster cluster each
> time)
>
> I have discovered that the log for snaps grows large, endlessly?
>
> The log:
>
> /var/log/glusterfs/snaps/urd-gds-volume/snapd.log
>
> is now of size 21G and continues to grow.
>
> I removed the file about 2 weeks ago and it was about the same size.
>
> Is this the way it should be?
>
> See a part of the log below.
>
> Second of all, I have stopped the geo-replication as I never managed to
> make it work.
>
> Even when it is stopped and you try to pause geo-replication, you still
> get the response:
>
> Geo-replication paused successfully.
>
> Should there be an error instead?
>
> Resuming gives an error:
>
> geo-replication command failed
> Geo-replication session between urd-gds-volume and
> geouser@urd-gds-geo-001::urd-gds-volume
> is not Paused.
>
> This is related to bug 1547446
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1547446
>
> The fix should be present from 4.0 and onwards
>
> Should I report this in the same bug?
>
> Thanks a lot!
>
> Best regards
>
> Marcus Pedersén
>
> /var/log/glusterfs/snaps/urd-gds-volume/snapd.log:
>
> [2018-11-13 18:51:16.498206] E [server-handshake.c:402:server_first_lookup] 0-urd-gds-volume-server: first lookup on subdir (/interbull/common) failed: Invalid argument
> [2018-11-13 18:51:16.498752] I [MSGID: 115036] [server.c:483:server_rpc_notify] 0-urd-gds-volume-server: disconnecting connection from iqn-A003.iqnet.org-2653-2018/08/14-18:53:49:637444-urd-gds-volume-snapd-client-0-1638773
> [2018-11-13 18:51:16.502120] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 0-urd-gds-volume-server: Shutting down connection iqn-A003.iqnet.org-2653-2018/08/14-18:53:49:637444-urd-gds-volume-snapd-client-0-1638773
> [2018-11-13 18:51:16.589263] I [addr.c:55:compare_addr_and_update] 0-snapd-urd-gds-volume: allowed = "*", received addr = "192.168.67.118"
> [2018-11-13 18:51:16.589324] I [MSGID: 115029] [server-handshake.c:763:server_setvolume] 0-urd-gds-volume-server: accepted client from iqn-D001.iqnet.org-20166-2018/08/14-19:10:55:360137-urd-gds-volume-snapd-client-0-1638735 (version: 3.13.1)
> [2018-11-13 18:51:16.593003] E [server-handshake.c:385:server_first_lookup] 0-snapd-urd-gds-volume: lookup on root failed: Permission denied
> [2018-11-13 18:51:16.593177] E [server-handshake.c:342:do_path_lookup] 0-snapd-urd-gds-volume: first lookup on subdir (interbull) failed: Permission denied
> [2018-11-13 18:51:16.593206] E [server-handshake.c:402:server_first_lookup] 0-urd-gds-volume-server: first lookup on subdir (/interbull/home) failed: Invalid argument
> [2018-11-13 18:51:16.593678] I [MSGID: 115036] [server.c:483:server_rpc_notify] 0-urd-gds-volume-server: disconnecting connection from
iqn-D001.iqnet.org-20166-2018/08/14-19:10:55:360137-urd-gds-volume-snapd-client-0-1638735
> [2018-11-13 18:51:16.597201] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 0-urd-gds-volume-server: Shutting down connection iqn-D001.iqnet.org-20166-2018/08/14-19:10:55:360137-urd-gds-volume-snapd-client-0-1638735
> [root@urd-gds-001 ~]# tail -n 100 /var/log/glusterfs/snaps/urd-gds-volume/snapd.log
> [2018-11-13 18:52:09.782058] I [MSGID: 115036] [server.c:483:server_rpc_notify] 0-urd-gds-volume-server: disconnecting connection from iqn-A002.iqnet.org-24786-2018/08/14-18:39:54:890651-urd-gds-volume-snapd-client-0-1638767
> [2018-11-13 18:52:09.785473] I [MSGID: 101055] [client_t.c:444:gf_client_unref] 0-urd-gds-volume-server: Shutting down connection iqn-A002.iqnet.org-24786-2018/08/14-18:39:54:890651-urd-gds-volume-snapd-client-0-1638767
> [2018-11-13 18:52:09.821147] I [addr.c:55:compare_addr_and_update] 0-snapd-urd-gds-volume: allowed = "*", received addr = "192.168.67.115"
> [2018-11-13
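[Editor's note: a sketch of the "turn off user serviceable snapshots" step described above, assuming the volume name urd-gds-volume from this thread. Verify the option name against your gluster version before running.]

```shell
# Turn off user serviceable snapshots; this stops the snapview daemon,
# so /var/log/glusterfs/snaps/urd-gds-volume/snapd.log stops growing:
gluster volume set urd-gds-volume features.uss disable

# Confirm the setting took effect and no snapd process remains:
gluster volume get urd-gds-volume features.uss
pgrep -af snapd
```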
[Gluster-users] Gluster Community Meeting, Nov 21 15:00 UTC
We'll be in #gluster-meeting on freenode at 15:00 UTC on Wednesday, Nov 21st. https://bit.ly/gluster-community-meetings has the agenda, feel free to add!

- amye
--
Amye Scavarda | a...@redhat.com | Gluster Community Lead
Re: [Gluster-users] Gluster 3.12.14: wrong quota in Distributed Dispersed Volume
Reply inline.

On Tue, Nov 20, 2018 at 3:53 PM Gudrun Mareike Amedick wrote:
>
> Hi,
>
> I think I know what happened. According to the logs, the crawlers received a
> signum(15). They seemed to have died before having finished. Probably too
> much to do simultaneously. I have disabled and re-enabled quota and will set
> the quotas again with more time.
>
> Is there a way to restart a crawler that was killed too soon?

No. The disable and enable of quota starts a new crawl.

> If I restart a server while a crawler is running, will the crawler be
> restarted, too? We'll need to do some hardware fixing on one of the servers
> soon and I need to know whether I have to check the crawlers first before
> shutting it down.

During the shutdown of the server the crawl will be killed. (The data usage shown will be updated as per what has been crawled.) The crawl won't be restarted on starting the server. Only quotad will be restarted (which is not the same as the crawl). For the crawl to happen you will have to restart quota.

> Thanks for the pointers
>
> Gudrun Amedick
>
> On Tuesday, 20.11.2018 at 11:38 +0530, Hari Gowtham wrote:
> > Hi,
> >
> > Can you check if the quota crawl finished? Without it having finished
> > the quota list will show incorrect values.
> > Looking at the under-accounting, it looks like the crawl is not yet
> > finished (it does take a lot of time as it has to crawl the whole
> > filesystem).
> >
> > If the crawl has finished and the usage is still showing wrong values
> > then there should be an accounting issue.
> > The easy way to fix this is to try restarting quota. This will not
> > cause any problems. The only downside is the limits won't hold true
> > while the quota is disabled,
> > till it's enabled and the crawl finishes.
> > Or you can try using the quota fsck script
> > https://review.gluster.org/#/c/glusterfs/+/19179/ to fix your
> > accounting issue.
> >
> > Regards,
> > Hari.
> > On Mon, Nov 19, 2018 at 10:05 PM Frank Ruehlemann wrote:
> > >
> > > Hi,
> > >
> > > we're running a Distributed Dispersed volume with Gluster 3.12.14 at
> > > Debian 9.6 (Stretch).
> > >
> > > We migrated our data (>300TB) from a pure Distributed volume into this
> > > Dispersed volume with cp, followed by multiple rsyncs.
> > > After the migration was successful we enabled quotas again with "gluster
> > > volume quota $VOLUME enable", which finished successfully.
> > > And we set our required quotas with "gluster volume quota $VOLUME
> > > limit-usage $PATH $QUOTA", which finished without errors too.
> > >
> > > But our "gluster volume quota $VOLUME list" shows wrong values.
> > > For example:
> > > A directory with ~170TB of data shows only 40.8TB Used.
> > > When we sum up all quoted directories we're way under the ~310TB that
> > > "df -h /$volume" shows.
> > > And "df -h /$volume/$directory" shows wrong values for nearly all
> > > directories.
> > >
> > > All 72 8TB bricks and all quota daemons of the 6 servers are visible and
> > > online in "gluster volume status $VOLUME".
> > >
> > > In quotad.log I found multiple warnings like this:
> > >
> > > > [2018-11-16 09:21:25.738901] W [dict.c:636:dict_unref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/features/quotad.so(+0x1d58) [0x7f6844be7d58] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/features/quotad.so(+0x2b92) [0x7f6844be8b92] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_unref+0xc0) [0x7f684b0db640] ) 0-dict: dict is NULL [Invalid argument]
> > >
> > > In some brick logs I found those:
> > >
> > > > [2018-11-19 07:23:30.932327] I [MSGID: 120020] [quota.c:2198:quota_unlink_cbk] 0-$VOLUME-quota: quota context not set inode (gfid:f100f7a9-0779-4b4c-880f-c8b3b4bdc49d) [Invalid argument]
> > >
> > > and (replaced the volume name with "$VOLUME") those:
> > >
> > > > The message "W [MSGID: 120003] [quota.c:821:quota_build_ancestry_cbk] 0-$VOLUME-quota: parent is NULL [Invalid argument]" repeated 13 times between [2018-11-19 15:28:54.089404] and [2018-11-19 15:30:12.792175]
> > > > [2018-11-19 15:31:34.559348] W [MSGID: 120003] [quota.c:821:quota_build_ancestry_cbk] 0-$VOLUME-quota: parent is NULL [Invalid argument]
> > >
> > > I already found that setting the flag "trusted.glusterfs.quota.dirty"
> > > might help, but I'm unsure about the consequences that will be triggered.
> > > And I'm unsure about the necessary version flag.
> > >
> > > Has anyone an idea how to fix this?
> > >
> > > Best Regards,
> > > --
> > > Frank Rühlemann
> > > IT-Systemtechnik
> > >
> > > UNIVERSITÄT ZU LÜBECK
> > > IT-Service-Center
> > >
> > > Ratzeburger Allee 160
> > > 23562 Lübeck
> > > Tel +49 451 3101 2034
> > > Fax +49 451 3101 2004
> > > ruehlem...@itsc.uni-luebeck.de
> > > www.itsc.uni-luebeck.de
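[Editor's note: the quota restart Hari describes can be sketched as below. $VOLUME and the limit path are placeholders, and the warning mirrors the caveat from the reply: limits are not enforced from the disable until the new crawl finishes.]

```shell
# Restart quota to force a fresh accounting crawl.
# WARNING: quotas are not enforced between the disable and the end of the
# crawl that follows the enable.
gluster volume quota $VOLUME disable
gluster volume quota $VOLUME enable

# Disabling drops the configured limits, so re-apply them afterwards:
gluster volume quota $VOLUME limit-usage /some/dir 10TB

# Watch the Used column converge as the crawl progresses:
gluster volume quota $VOLUME list
```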
Re: [Gluster-users] Gluster 3.12.14: wrong quota in Distributed Dispersed Volume
Hi,

I think I know what happened. According to the logs, the crawlers received a signum(15). They seemed to have died before having finished. Probably too much to do simultaneously. I have disabled and re-enabled quota and will set the quotas again with more time.

Is there a way to restart a crawler that was killed too soon?

If I restart a server while a crawler is running, will the crawler be restarted, too? We'll need to do some hardware fixing on one of the servers soon and I need to know whether I have to check the crawlers first before shutting it down.

Thanks for the pointers

Gudrun Amedick

On Tuesday, 20.11.2018 at 11:38 +0530, Hari Gowtham wrote:
> Hi,
>
> Can you check if the quota crawl finished? Without it having finished
> the quota list will show incorrect values.
> Looking at the under-accounting, it looks like the crawl is not yet
> finished (it does take a lot of time as it has to crawl the whole
> filesystem).
>
> If the crawl has finished and the usage is still showing wrong values
> then there should be an accounting issue.
> The easy way to fix this is to try restarting quota. This will not
> cause any problems. The only downside is the limits won't hold true
> while the quota is disabled,
> till it's enabled and the crawl finishes.
> Or you can try using the quota fsck script
> https://review.gluster.org/#/c/glusterfs/+/19179/ to fix your
> accounting issue.
>
> Regards,
> Hari.
> On Mon, Nov 19, 2018 at 10:05 PM Frank Ruehlemann wrote:
> >
> > Hi,
> >
> > we're running a Distributed Dispersed volume with Gluster 3.12.14 at
> > Debian 9.6 (Stretch).
> >
> > We migrated our data (>300TB) from a pure Distributed volume into this
> > Dispersed volume with cp, followed by multiple rsyncs.
> > After the migration was successful we enabled quotas again with "gluster
> > volume quota $VOLUME enable", which finished successfully.
> > And we set our required quotas with "gluster volume quota $VOLUME
> > limit-usage $PATH $QUOTA", which finished without errors too.
> >
> > But our "gluster volume quota $VOLUME list" shows wrong values.
> > For example:
> > A directory with ~170TB of data shows only 40.8TB Used.
> > When we sum up all quoted directories we're way under the ~310TB that
> > "df -h /$volume" shows.
> > And "df -h /$volume/$directory" shows wrong values for nearly all
> > directories.
> >
> > All 72 8TB bricks and all quota daemons of the 6 servers are visible and
> > online in "gluster volume status $VOLUME".
> >
> > In quotad.log I found multiple warnings like this:
> >
> > > [2018-11-16 09:21:25.738901] W [dict.c:636:dict_unref] (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/features/quotad.so(+0x1d58) [0x7f6844be7d58] -->/usr/lib/x86_64-linux-gnu/glusterfs/3.12.14/xlator/features/quotad.so(+0x2b92) [0x7f6844be8b92] -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_unref+0xc0) [0x7f684b0db640] ) 0-dict: dict is NULL [Invalid argument]
> >
> > In some brick logs I found those:
> >
> > > [2018-11-19 07:23:30.932327] I [MSGID: 120020] [quota.c:2198:quota_unlink_cbk] 0-$VOLUME-quota: quota context not set inode (gfid:f100f7a9-0779-4b4c-880f-c8b3b4bdc49d) [Invalid argument]
> >
> > and (replaced the volume name with "$VOLUME") those:
> >
> > > The message "W [MSGID: 120003] [quota.c:821:quota_build_ancestry_cbk] 0-$VOLUME-quota: parent is NULL [Invalid argument]" repeated 13 times between [2018-11-19 15:28:54.089404] and [2018-11-19 15:30:12.792175]
> > > [2018-11-19 15:31:34.559348] W [MSGID: 120003] [quota.c:821:quota_build_ancestry_cbk] 0-$VOLUME-quota: parent is NULL [Invalid argument]
> >
> > I already found that setting the flag "trusted.glusterfs.quota.dirty" might
> > help, but I'm unsure about the consequences that will be triggered.
> > And I'm unsure about the necessary version flag.
> >
> > Has anyone an idea how to fix this?
> >
> > Best Regards,
> > --
> > Frank Rühlemann
> > IT-Systemtechnik
> >
> > UNIVERSITÄT ZU LÜBECK
> > IT-Service-Center
> >
> > Ratzeburger Allee 160
> > 23562 Lübeck
> > Tel +49 451 3101 2034
> > Fax +49 451 3101 2004
> > ruehlem...@itsc.uni-luebeck.de
> > www.itsc.uni-luebeck.de
Re: [Gluster-users] Deleted file sometimes remains in .glusterfs/unlink
Hello Ravi,

I am using Gluster v4.1.5. I have a replica 4 volume. This is the info:

Volume Name: testv1
Type: Replicate
Volume ID: a5b2d650-4e93-4334-94bb-3105acb112d1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: fs-davids-c1-n1:/gluster/brick1/glusterbrick
Brick2: fs-davids-c1-n2:/gluster/brick1/glusterbrick
Brick3: fs-davids-c1-n3:/gluster/brick1/glusterbrick
Brick4: fs-davids-c1-n4:/gluster/brick1/glusterbrick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
user.smb: disable
features.read-only: off
features.worm: off
features.worm-file-level: on
features.retention-mode: enterprise
features.default-retention-period: 120
network.ping-timeout: 10
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.nl-cache: on
performance.nl-cache-timeout: 600
client.event-threads: 32
server.event-threads: 32
cluster.lookup-optimize: on
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
performance.cache-samba-metadata: on
performance.cache-ima-xattrs: on
performance.io-thread-count: 64
cluster.use-compound-fops: on
performance.cache-size: 512MB
performance.cache-refresh-timeout: 10
performance.read-ahead: off
performance.write-behind-window-size: 4MB
performance.write-behind: on
storage.build-pgfid: on
auth.ssl-allow: *
client.ssl: on
server.ssl: on
features.utime: on
storage.ctime: on
features.bitrot: on
features.scrub: Active
features.scrub-freq: daily
cluster.enable-shared-storage: enable

Regards
David

On Tue, Nov 20, 2018 at 07:33, Ravishankar N <ravishan...@redhat.com> wrote:
>
> On 11/19/2018 08:18 PM, David Spisla wrote:
>
> Hello Gluster Community,
>
> sometimes it happens that a file accessed via FUSE or SMB will remain in
> .glusterfs/unlink after deleting it. The command 'df -hT' still prints the
> volume capacity from before the file was deleted.
> Another observation is that after waiting a whole night the file is removed
> completely and the capacity is correct. Is this behaviour "works as
> designed"?
>
> Is this a replicate volume? Files end up in .glusterfs/unlink post
> deletion only if there is still an fd open on the file. Perhaps there was
> an ongoing data self-heal, or another application had not yet closed the
> file descriptor?
> Which version of gluster are you using and what is the volume info?
> -Ravi
>
> The issue was mentioned here already:
> https://lists.gluster.org/pipermail/gluster-devel/2016-July/049952.html
>
> and there seems to be a fix. But unfortunately it still occurs and there
> is only the workaround to restart the brick processes or wait for some
> hours.
>
> Regards
> David Spisla
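[Editor's note: not from the thread — a sketch demonstrating the "unlinked but still open" semantics Ravi describes, which is the mechanism behind entries lingering in .glusterfs/unlink. The brick-side check in the comments assumes the brick path /gluster/brick1/glusterbrick from this message.]

```shell
# Demo of an unlinked file surviving while a process holds an open fd:
tmp=$(mktemp)
exec 3>"$tmp"            # hold fd 3 open on the file
rm "$tmp"                # unlink it; the data survives until fd 3 closes
ls -l "/proc/$$/fd/3"    # on Linux the symlink target now ends in "(deleted)"
exec 3>&-                # closing the fd finally frees the space

# On a brick node, one could check the brick process the same way before
# resorting to a brick restart (path is an assumption from this thread):
# BRICK_PID=$(pgrep -f 'glusterfsd.*glusterbrick' | head -n 1)
# ls -l /proc/$BRICK_PID/fd | grep unlink
```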