Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

2019-03-19 Thread Kotresh Hiremath Ravishankar
Hi Andy,

This is an issue with non-root geo-rep sessions and is not fixed yet. Could
you please raise a bug for it?
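
For reference, the root-user workaround described in the quoted message below
(re-creating the session with "create force") amounts to roughly the following;
the volume and slave names are taken from the status output quoted below, and
the exact options can differ per setup:

# Re-create the geo-rep session as root, then restart it (illustrative sketch)
gluster volume geo-replication glustervol1 slave_1::glustervol1 create push-pem force
gluster volume geo-replication glustervol1 slave_1::glustervol1 start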

Thanks,
Kotresh HR

On Wed, Mar 20, 2019 at 11:53 AM Andy Coates  wrote:

> We're seeing the same permission denied errors when running as a non-root
> geosync user.
>
> Does anyone know what the underlying issue is?
>
> On Wed, 26 Sep 2018 at 00:40, Kotte, Christian (Ext) <
> christian.ko...@novartis.com> wrote:
>
>> I changed the replication to use the root user and re-created the
>> session with “create force”. Now the files and folders are replicated,
>> and the “permission denied” and “New folder” errors have disappeared, but
>> old files are not deleted.
>>
>>
>>
>> Looks like the history crawl is in some kind of a loop:
>>
>>
>>
>> [root@master ~]# gluster volume geo-replication status
>>
>>
>>
>> MASTER NODE    MASTER VOL     MASTER BRICK            SLAVE USER    SLAVE                         SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
>> --------------------------------------------------------------------------------------------------------------------------------------------------
>> master         glustervol1    /bricks/brick1/brick    root          ssh://slave_1::glustervol1    slave_1       Active    Hybrid Crawl    N/A
>> master         glustervol1    /bricks/brick1/brick    root          ssh://slave_2::glustervol1    slave_2       Active    Hybrid Crawl    N/A
>> master         glustervol1    /bricks/brick1/brick    root          ssh://slave_3::glustervol1    slave_3       Active    Hybrid Crawl    N/A
>>
>>
>>
>> tail -f
>> /var/log/glusterfs/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.log
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
>> 104, in cl_history_changelog
>>
>> raise ChangelogHistoryNotAvailable()
>>
>> ChangelogHistoryNotAvailable
>>
>> [2018-09-25 14:10:44.196011] E [repce(worker
>> /bricks/brick1/brick):197:__call__] RepceClient: call failed
>> call=29945:139700517484352:1537884644.19  method=history
>> error=ChangelogHistoryNotAvailable
>>
>> [2018-09-25 14:10:44.196405] I [resource(worker
>> /bricks/brick1/brick):1295:service_loop] GLUSTER: Changelog history not
>> available, using xsync
>>
>> [2018-09-25 14:10:44.221385] I [master(worker
>> /bricks/brick1/brick):1623:crawl] _GMaster: starting hybrid crawl
>> stime=(0, 0)
>>
>> [2018-09-25 14:10:44.223382] I [gsyncdstatus(worker
>> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl
>> Status Change  status=Hybrid Crawl
>>
>> [2018-09-25 14:10:46.225296] I [master(worker
>> /bricks/brick1/brick):1634:crawl] _GMaster: processing xsync changelog
>> path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick/xsync/XSYNC-CHANGELOG.1537884644
>>
>> [2018-09-25 14:13:36.157408] I [gsyncd(config-get):297:main] : Using
>> session config file
>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>>
>> [2018-09-25 14:13:36.377880] I [gsyncd(status):297:main] : Using
>> session config file
>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>>
>> [2018-09-25 14:31:10.145035] I [master(worker
>> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
>> duration=1212.5316  num_files=1474  job=2  return_code=11
>>
>> [2018-09-25 14:31:10.152637] E [syncdutils(worker
>> /bricks/brick1/brick):801:errlog] Popen: command returned error cmd=rsync
>> -aR0 --inplace --files-from=- --super --stats --numeric-ids
>> --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no
>> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
>> -p 22 -oControlMaster=auto -S
>> /tmp/gsyncd-aux-ssh-gg758Z/caec4d1d03cc28bc1853f692e291164f.sock
>> slave_3:/proc/15919/cwd error=11
>>
>> [2018-09-25 14:31:10.237371] I [repce(agent
>> /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching
>> EOF.
>>
>> [2018-09-25 14:31:10.430820] I
>> [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status
>> Change  status=Faulty
>>
>> [2018-09-25 14:31:20.541475] I [monitor(monitor):158:monitor] Monitor:
>> starting gsyncd worker  brick=/bricks/brick1/brick  slave_node=slave_3
>>
>> [2018-09-25 14:31:20.806518] I [gsyncd(agent
>> /bricks/brick1/brick):297:main] : Using session config file
>> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>>
>> [2018-09-25 14:31:20.816536] I [changelogagent(agent
>> /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
>>
>> [2018-09-25 14:31:20.821574] I [g

Re: [Gluster-users] Docu - how to debug issues

2019-03-19 Thread Ravishankar N


On 20/03/19 10:29 AM, Strahil wrote:


Hello Community,

Is there a documentation page describing what information needs to be
gathered in advance in order to help the devs resolve issues?

So far I couldn't find one - but I may have missed it.

The volume info, the gluster version of the clients/servers, and all gluster logs
under /var/log/glusterfs/ are the first things that you would need to
provide if you were to raise a Bugzilla or a GitHub issue. After that,
it is mostly issue specific. Some pointers are in
https://docs.gluster.org/en/latest/Troubleshooting/.
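
As a rough sketch, gathering those basics on an affected node could look like
this (the volume name and the output file names are only placeholders):

gluster volume info myvol > volume-info.txt
gluster --version > gluster-version.txt       # run on every client and server involved
tar czf glusterfs-logs.tar.gz /var/log/glusterfs/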


A consistent reproducer is also something that is good to have as it 
helps speed up the RCA.


HTH,
Ravi


If not, it would be nice to have that info posted somewhere.
For example - FUSE issues - do 1, 2, 3...
Same for other client-side issues, and then for cluster-side ones as well.
I guess this would save a lot of 'what is the output of gluster
volume info vol' questions.


Best Regards,
Strahil Nikolov


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied

2019-03-19 Thread Andy Coates
We're seeing the same permission denied errors when running as a non-root
geosync user.

Does anyone know what the underlying issue is?

On Wed, 26 Sep 2018 at 00:40, Kotte, Christian (Ext) <
christian.ko...@novartis.com> wrote:

> I changed the replication to use the root user and re-created the
> session with “create force”. Now the files and folders are replicated,
> and the “permission denied” and “New folder” errors have disappeared, but
> old files are not deleted.
>
>
>
> Looks like the history crawl is in some kind of a loop:
>
>
>
> [root@master ~]# gluster volume geo-replication status
>
>
>
> MASTER NODE    MASTER VOL     MASTER BRICK            SLAVE USER    SLAVE                         SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
> --------------------------------------------------------------------------------------------------------------------------------------------------
> master         glustervol1    /bricks/brick1/brick    root          ssh://slave_1::glustervol1    slave_1       Active    Hybrid Crawl    N/A
> master         glustervol1    /bricks/brick1/brick    root          ssh://slave_2::glustervol1    slave_2       Active    Hybrid Crawl    N/A
> master         glustervol1    /bricks/brick1/brick    root          ssh://slave_3::glustervol1    slave_3       Active    Hybrid Crawl    N/A
>
>
>
> tail -f
> /var/log/glusterfs/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.log
>
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line
> 104, in cl_history_changelog
>
> raise ChangelogHistoryNotAvailable()
>
> ChangelogHistoryNotAvailable
>
> [2018-09-25 14:10:44.196011] E [repce(worker
> /bricks/brick1/brick):197:__call__] RepceClient: call failed
> call=29945:139700517484352:1537884644.19  method=history
> error=ChangelogHistoryNotAvailable
>
> [2018-09-25 14:10:44.196405] I [resource(worker
> /bricks/brick1/brick):1295:service_loop] GLUSTER: Changelog history not
> available, using xsync
>
> [2018-09-25 14:10:44.221385] I [master(worker
> /bricks/brick1/brick):1623:crawl] _GMaster: starting hybrid crawl
> stime=(0, 0)
>
> [2018-09-25 14:10:44.223382] I [gsyncdstatus(worker
> /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl
> Status Change  status=Hybrid Crawl
>
> [2018-09-25 14:10:46.225296] I [master(worker
> /bricks/brick1/brick):1634:crawl] _GMaster: processing xsync changelog
> path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick/xsync/XSYNC-CHANGELOG.1537884644
>
> [2018-09-25 14:13:36.157408] I [gsyncd(config-get):297:main] : Using
> session config file
> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>
> [2018-09-25 14:13:36.377880] I [gsyncd(status):297:main] : Using
> session config file
> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>
> [2018-09-25 14:31:10.145035] I [master(worker
> /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken
> duration=1212.5316  num_files=1474  job=2  return_code=11
>
> [2018-09-25 14:31:10.152637] E [syncdutils(worker
> /bricks/brick1/brick):801:errlog] Popen: command returned error cmd=rsync
> -aR0 --inplace --files-from=- --super --stats --numeric-ids
> --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no
> -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
> -p 22 -oControlMaster=auto -S
> /tmp/gsyncd-aux-ssh-gg758Z/caec4d1d03cc28bc1853f692e291164f.sock
> slave_3:/proc/15919/cwd error=11
>
> [2018-09-25 14:31:10.237371] I [repce(agent
> /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching
> EOF.
>
> [2018-09-25 14:31:10.430820] I
> [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status
> Change  status=Faulty
>
> [2018-09-25 14:31:20.541475] I [monitor(monitor):158:monitor] Monitor:
> starting gsyncd worker  brick=/bricks/brick1/brick  slave_node=slave_3
>
> [2018-09-25 14:31:20.806518] I [gsyncd(agent
> /bricks/brick1/brick):297:main] : Using session config file
> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>
> [2018-09-25 14:31:20.816536] I [changelogagent(agent
> /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
>
> [2018-09-25 14:31:20.821574] I [gsyncd(worker
> /bricks/brick1/brick):297:main] : Using session config file
> path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>
> [2018-09-25 14:31:20.882128] I [resource(worker
> /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection
> between master and slave...
>
> [2018-09-25 14:31:24.169857] I [resource(worker
> /bricks/brick1/brick):1424:connect_remote]

Re: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP

2019-03-19 Thread Amar Tumballi Suryanarayan
On Wed, Mar 20, 2019 at 9:52 AM Artem Russakovskii 
wrote:

> Can I roll back performance.write-behind: off and lru-limit=0 then? I'm
> waiting for the debug packages to be available for OpenSUSE, then I can
> help Amar with another debug session.
>
>
Yes, the write-behind issue is now fixed, so you can re-enable write-behind. Also
remove lru-limit=0, so that you can benefit from the garbage
collection introduced in 5.4.
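
As a rough sketch (volume, server and mount point names are illustrative),
re-enabling write-behind is one volume-set call, and lru-limit=0 goes away by
remounting the client without that mount option:

gluster volume set myvol performance.write-behind on
# remount the fuse client without "-o lru-limit=0", e.g.:
umount /mnt/myvol
mount -t glusterfs server1:/myvol /mnt/myvol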

Let's get to fixing the problem once the debuginfo packages are available.



> In the meantime, have you had time to set up 1x4 replicate testing? I was
> told you were only testing 1x3, and it's the 4th brick that may be causing
> the crash, which is consistent with this whole time only 1 of 4 bricks
> constantly crashing. The other 3 have been rock solid. I'm hoping you could
> find the issue without a debug session this way.
>
>
That is still my gut feeling. I added a basic test case with 4 bricks,
https://review.gluster.org/#/c/glusterfs/+/22328/. But I think this
particular issue happens only on a certain pattern of access for a 1x4
setup. Let's get to the root of it once we have debuginfo packages for the SUSE
builds.

-Amar

Sincerely,
> Artem
>
> --
> Founder, Android Police , APK Mirror
> , Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
>  | @ArtemR
> 
>
>
> On Tue, Mar 19, 2019 at 8:27 PM Nithya Balachandran 
> wrote:
>
> > Hi Artem,
> >
> > I think you are running into a different crash. The ones reported which
> > were prevented by turning off write-behind are now fixed.
> > We will need to look into the one you are seeing to see why it is
> > happening.
> >
> > Regards,
> > Nithya
> >
> >
> > On Tue, 19 Mar 2019 at 20:25, Artem Russakovskii 
> > wrote:
> >
> >> The flood is indeed fixed for us on 5.5. However, the crashes are not.
> >>
> >> Sincerely,
> >> Artem
> >>
> >> --
> >> Founder, Android Police , APK Mirror
> >> , Illogical Robot LLC
> >> beerpla.net | +ArtemRussakovskii
> >>  | @ArtemR
> >> 
> >>
> >>
> >> On Mon, Mar 18, 2019 at 5:41 AM Hu Bert  wrote:
> >>
> >>> Hi Amar,
> >>>
> >>> if you refer to this bug:
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test
> >>> setup i haven't seen those entries, while copying & deleting a few GBs
> >>> of data. For a final statement we have to wait until i updated our
> >>> live gluster servers - could take place on tuesday or wednesday.
> >>>
> >>> Maybe other users can do an update to 5.4 as well and report back here.
> >>>
> >>>
> >>> Hubert
> >>>
> >>>
> >>>
> >>> Am Mo., 18. März 2019 um 11:36 Uhr schrieb Amar Tumballi Suryanarayan
> >>> :
> >>> >
> >>> > Hi Hu Bert,
> >>> >
> >>> > Appreciate the feedback. Also are the other boiling issues related to
> >>> logs fixed now?
> >>> >
> >>> > -Amar
> >>> >
> >>> > On Mon, Mar 18, 2019 at 3:54 PM Hu Bert 
> >>> wrote:
> >>> >>
> >>> >> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2
> >>> >> volumes done. In 'gluster peer status' the peers stay connected
> during
> >>> >> the upgrade, no 'peer rejected' messages. No cksum mismatches in the
> >>> >> logs. Looks good :-)
> >>> >>
> >>> >> Am Mo., 18. März 2019 um 09:54 Uhr schrieb Hu Bert <
> >>> revi...@googlemail.com>:
> >>> >> >
> >>> >> > Good morning :-)
> >>> >> >
> >>> >> > for debian the packages are there:
> >>> >> >
> >>>
> https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/
> >>> >> >
> >>> >> > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if
> >>> there
> >>> >> > are some errors etc. and report back.
> >>> >> >
> >>> >> > btw: no release notes for 5.4 and 5.5 so far?
> >>> >> > https://docs.gluster.org/en/latest/release-notes/ ?
> >>> >> >
> >>> >> > Am Fr., 15. März 2019 um 14:28 Uhr schrieb Shyam Ranganathan
> >>> >> > :
> >>> >> > >
> >>> >> > > We created a 5.5 release tag, and it is under packaging now. It
> >>> should
> >>> >> > > be packaged and ready for testing early next week and should be
> >>> released
> >>> >> > > close to mid-week next week.
> >>> >> > >
> >>> >> > > Thanks,
> >>> >> > > Shyam
> >>> >> > > On 3/13/19 12:34 PM, Artem Russakovskii wrote:
> >>> >> > > > Wednesday now with no update :-/
> >>> >> > > >
> >>> >> > > > Sincerely,
> >>> >> > > > Artem
> >>> >> > > >
> >>> >> > > > --
> >>> >> > > > Founder, Android Police , APK
> >>> Mirror
> >>> >> > > > , Illogical Robot LLC
> >>> >> > > > beerpla.net  | +ArtemRussakovskii
> >>> >> > > >  | @ArtemR
> >>> >> > > > 
> >>> >> > > >
> >>> >> > > >
> >>> >> > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii <
> >>> archon...@gmail.com

Re: [Gluster-users] recovery from reboot time?

2019-03-19 Thread Amar Tumballi Suryanarayan
There are 2 things that happen after a reboot.

1. glusterd (the management layer) does a sanity check of its volumes, sees
whether anything changed while it was down, and tries to correct
its state.
  - This is fine as long as the number of volumes or the number of nodes
is small (small here meaning < 100).

2. If it is a replicate or disperse volume, the self-heal daemon
checks whether any self-heals are pending.
  - This does an 'index' crawl to find which files actually changed while
one of the bricks/nodes was down.
  - If this list is big, it can sometimes take a while.

But 'days/weeks/months' is not expected/observed behavior. Is there anything
in the log files? If not, can you run 'strace -f' on the pid that is
consuming the most CPU? (a 1-minute strace sample is good enough).
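
A rough way to capture that sample and to check how much heal work is pending
(the PID and the volume name are placeholders):

timeout 60 strace -f -p <pid-of-busy-process> -o /tmp/strace-sample.txt   # ~1 minute sample
gluster volume heal myvol info                                            # entries still to be healed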

-Amar


On Wed, Mar 20, 2019 at 2:05 AM Alvin Starr  wrote:

> We have a simple replicated volume with one 17TB brick on each node.
>
> There is something like 35M files and directories on the volume.
>
> One of the servers rebooted and is now "doing something".
>
> It kind of looks like it's doing some kind of sanity check with the node
> that did not reboot, but it's hard to say, and it looks like it may run for
> hours/days/months.
>
> Will Gluster take a long time to resync with lots of little files?
>
>
> --
> Alvin Starr   ||   land:  (905)513-7688
> Netvel Inc.   ||   Cell:  (416)806-0133
> al...@netvel.net  ||
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users



-- 
Amar Tumballi (amarts)
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Help analise statedumps

2019-03-19 Thread Amar Tumballi Suryanarayan
It is really good to hear this news.

The one thing we did in 5.4 (and which is present in 6.0 too) is
implementing garbage-collection logic in the fuse module, which keeps
memory usage in check. It looks like the feature is working as expected.
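
For anyone who wants to verify this on their own mounts, a statedump of the
fuse client shows the memory pools; roughly (the PID is a placeholder):

# glusterfs processes write a statedump (by default under /var/run/gluster)
# when they receive SIGUSR1:
kill -USR1 <pid-of-glusterfs-fuse-client>
# brick-side statedumps can be taken per volume via the CLI:
gluster volume statedump myvol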

Regards,
Amar

On Wed, Mar 20, 2019 at 7:24 AM Sankarshan Mukhopadhyay <
sankarshan.mukhopadh...@gmail.com> wrote:

> On Tue, Mar 19, 2019 at 11:09 PM Pedro Costa  wrote:
>
> > Sorry to revive an old thread, but just to let you know that with the
> latest 5.4 version this has virtually stopped happening.
> >
> > I can’t ascertain for sure yet, but since the update the memory
> footprint of Gluster has been massively reduced.
> >
> > Thanks to everyone, great job.
>
> Good news is always fantastic to hear! Thank you for reviving the
> thread and providing feedback.
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users



-- 
Amar Tumballi (amarts)
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Transport endpoint is not connected failures in 5.3 under high I/O load

2019-03-19 Thread Amar Tumballi Suryanarayan
Hi Brandon,

There were a few concerns raised about 5.3 issues recently, and we fixed some
of them and made 5.5 (in 5.4 we faced an upgrade issue, so 5.5 is the
recommended upgrade version).

Can you please upgrade to version 5.5?
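
A rough per-node sketch of that upgrade on CentOS 7, assuming the
centos-release-gluster5 repo already shown below (please follow the official
upgrade guide for the full procedure):

systemctl stop glusterd
yum clean metadata
yum update 'glusterfs*'      # picks up 5.5 once it lands in the enabled repo
systemctl start glusterd
gluster --version            # should now report 5.5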

-Amar


On Mon, Mar 18, 2019 at 10:16 PM  wrote:

> Hello list,
>
>
>
> We are having critical failures under load of CentOS7 glusterfs 5.3 with
> our servers losing their local mount point with the issue - "Transport
> endpoint is not connected"
>
>
>
> Not sure if it is related but the logs are full of the following message.
>
>
>
> [2019-03-18 14:00:02.656876] E [MSGID: 101191]
> [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch
> handler
>
>
>
> We operate multiple separate glusterfs distributed clusters of about 6-8
> nodes.  Our 2 biggest, separate, and most I/O active glusterfs clusters are
> both having the issues.
>
>
>
> We are trying to use glusterfs as a unified file system for pureftpd
> backup services for a VPS service.  We have a relatively small backup
> window of the weekend where all our servers backup at the same time.  When
> backups start early on Saturday it causes a sustained massive amount of FTP
> file upload I/O for around 48 hours with all the compressed backup files
> being uploaded.   For our london 8 node cluster for example there is about
> 45 TB of uploads in ~48 hours currently.
>
>
>
> We do have some other smaller issues with directory listing under this
> load too but, it has been working for a couple years since 3.x until we've
> updated recently and randomly now all servers are losing their glusterfs
> mount with the "Transport endpoint is not connected" issue.
>
>
>
> Our glusterfs servers are all mostly the same with small variations.
> Mostly they are supermicro E3 cpu, 16 gb ram, LSI raid10 hdd (with and
> without bbu).  Drive arrays vary between 4-16 sata3 hdd drives each node
> depending on if the servers are older or newer. Firmware is kept up-to-date
> as well as running the latest LSI compiled driver.  the newer 16 drive
> backup servers have 2 x 1Gbit LACP teamed interfaces also.
>
>
>
> [root@lonbaknode3 ~]# uname -r
>
> 3.10.0-957.5.1.el7.x86_64
>
>
>
> [root@lonbaknode3 ~]# rpm -qa |grep gluster
>
> centos-release-gluster5-1.0-1.el7.centos.noarch
>
> glusterfs-libs-5.3-2.el7.x86_64
>
> glusterfs-api-5.3-2.el7.x86_64
>
> glusterfs-5.3-2.el7.x86_64
>
> glusterfs-cli-5.3-2.el7.x86_64
>
> glusterfs-client-xlators-5.3-2.el7.x86_64
>
> glusterfs-server-5.3-2.el7.x86_64
>
> glusterfs-fuse-5.3-2.el7.x86_64
>
> [root@lonbaknode3 ~]#
>
>
>
> [root@lonbaknode3 ~]# gluster volume info all
>
>
>
> Volume Name: volbackups
>
> Type: Distribute
>
> Volume ID: 32bf4fe9-5450-49f8-b6aa-05471d3bdffa
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 8
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: lonbaknode3.domain.net:/lvbackups/brick
>
> Brick2: lonbaknode4.domain.net:/lvbackups/brick
>
> Brick3: lonbaknode5.domain.net:/lvbackups/brick
>
> Brick4: lonbaknode6.domain.net:/lvbackups/brick
>
> Brick5: lonbaknode7.domain.net:/lvbackups/brick
>
> Brick6: lonbaknode8.domain.net:/lvbackups/brick
>
> Brick7: lonbaknode9.domain.net:/lvbackups/brick
>
> Brick8: lonbaknode10.domain.net:/lvbackups/brick
>
> Options Reconfigured:
>
> transport.address-family: inet
>
> nfs.disable: on
>
> cluster.min-free-disk: 1%
>
> performance.cache-size: 8GB
>
> performance.cache-max-file-size: 128MB
>
> diagnostics.brick-log-level: WARNING
>
> diagnostics.brick-sys-log-level: WARNING
>
> client.event-threads: 3
>
> performance.client-io-threads: on
>
> performance.io-thread-count: 24
>
> network.inode-lru-limit: 1048576
>
> performance.parallel-readdir: on
>
> performance.cache-invalidation: on
>
> performance.md-cache-timeout: 600
>
> features.cache-invalidation: on
>
> features.cache-invalidation-timeout: 600
>
> [root@lonbaknode3 ~]#
>
>
>
> Mount output shows the following:
>
>
>
> lonbaknode3.domain.net:/volbackups on /home/volbackups type
> fuse.glusterfs
> (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
>
>
>
> If you notice anything in our volume or mount settings above missing or
> otherwise bad feel free to let us know.  Still learning this glusterfs.  I
> tried searching for any recommended performance settings but, it's not
> always clear which setting is most applicable or beneficial to our workload.
>
>
>
> I have just found this post that looks like it is the same issues.
>
>
>
> https://lists.gluster.org/pipermail/gluster-users/2019-March/035958.html
>
>
>
> We have not yet tried the suggestion of "performance.write-behind: off"
> but, we will do so if that is recommended.
>
>
>
> Could someone knowledgeable advise anything for these issues?
>
>
>
> If any more information is needed do let us know.
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users



-- 
Amar Tumballi (amarts)

[Gluster-users] Docu - how to debug issues

2019-03-19 Thread Strahil
Hello Community,

Is there a documentation page describing what information needs to be gathered in
advance in order to help the devs resolve issues?
So far I couldn't find one - but I may have missed it.
If not, it would be nice to have that info posted somewhere.
For example - FUSE issues - do 1, 2, 3...
Same for other client-side issues, and then for cluster-side ones as well.
I guess this would save a lot of 'what is the output of gluster volume info
vol' questions.

Best Regards,
Strahil Nikolov___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP

2019-03-19 Thread Artem Russakovskii
Can I roll back performance.write-behind: off and lru-limit=0 then? I'm
waiting for the debug packages to be available for OpenSUSE, then I can
help Amar with another debug session.

In the meantime, have you had time to set up 1x4 replicate testing? I was
told you were only testing 1x3, and it's the 4th brick that may be causing
the crash, which is consistent with this whole time only 1 of 4 bricks
constantly crashing. The other 3 have been rock solid. I'm hoping you could
find the issue without a debug session this way.

Sincerely,
Artem

--
Founder, Android Police , APK Mirror
, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
 | @ArtemR



On Tue, Mar 19, 2019 at 8:27 PM Nithya Balachandran 
wrote:

> Hi Artem,
>
> I think you are running into a different crash. The ones reported which
> were prevented by turning off write-behind are now fixed.
> We will need to look into the one you are seeing to see why it is
> happening.
>
> Regards,
> Nithya
>
>
> On Tue, 19 Mar 2019 at 20:25, Artem Russakovskii 
> wrote:
>
>> The flood is indeed fixed for us on 5.5. However, the crashes are not.
>>
>> Sincerely,
>> Artem
>>
>> --
>> Founder, Android Police , APK Mirror
>> , Illogical Robot LLC
>> beerpla.net | +ArtemRussakovskii
>>  | @ArtemR
>> 
>>
>>
>> On Mon, Mar 18, 2019 at 5:41 AM Hu Bert  wrote:
>>
>>> Hi Amar,
>>>
>>> if you refer to this bug:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test
>>> setup i haven't seen those entries, while copying & deleting a few GBs
>>> of data. For a final statement we have to wait until i updated our
>>> live gluster servers - could take place on tuesday or wednesday.
>>>
>>> Maybe other users can do an update to 5.4 as well and report back here.
>>>
>>>
>>> Hubert
>>>
>>>
>>>
>>> Am Mo., 18. März 2019 um 11:36 Uhr schrieb Amar Tumballi Suryanarayan
>>> :
>>> >
>>> > Hi Hu Bert,
>>> >
>>> > Appreciate the feedback. Also are the other boiling issues related to
>>> logs fixed now?
>>> >
>>> > -Amar
>>> >
>>> > On Mon, Mar 18, 2019 at 3:54 PM Hu Bert 
>>> wrote:
>>> >>
>>> >> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2
>>> >> volumes done. In 'gluster peer status' the peers stay connected during
>>> >> the upgrade, no 'peer rejected' messages. No cksum mismatches in the
>>> >> logs. Looks good :-)
>>> >>
>>> >> Am Mo., 18. März 2019 um 09:54 Uhr schrieb Hu Bert <
>>> revi...@googlemail.com>:
>>> >> >
>>> >> > Good morning :-)
>>> >> >
>>> >> > for debian the packages are there:
>>> >> >
>>> https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/
>>> >> >
>>> >> > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if
>>> there
>>> >> > are some errors etc. and report back.
>>> >> >
>>> >> > btw: no release notes for 5.4 and 5.5 so far?
>>> >> > https://docs.gluster.org/en/latest/release-notes/ ?
>>> >> >
>>> >> > Am Fr., 15. März 2019 um 14:28 Uhr schrieb Shyam Ranganathan
>>> >> > :
>>> >> > >
>>> >> > > We created a 5.5 release tag, and it is under packaging now. It
>>> should
>>> >> > > be packaged and ready for testing early next week and should be
>>> released
>>> >> > > close to mid-week next week.
>>> >> > >
>>> >> > > Thanks,
>>> >> > > Shyam
>>> >> > > On 3/13/19 12:34 PM, Artem Russakovskii wrote:
>>> >> > > > Wednesday now with no update :-/
>>> >> > > >
>>> >> > > > Sincerely,
>>> >> > > > Artem
>>> >> > > >
>>> >> > > > --
>>> >> > > > Founder, Android Police , APK
>>> Mirror
>>> >> > > > , Illogical Robot LLC
>>> >> > > > beerpla.net  | +ArtemRussakovskii
>>> >> > > >  | @ArtemR
>>> >> > > > 
>>> >> > > >
>>> >> > > >
>>> >> > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii <
>>> archon...@gmail.com
>>> >> > > > > wrote:
>>> >> > > >
>>> >> > > > Hi Amar,
>>> >> > > >
>>> >> > > > Any updates on this? I'm still not seeing it in OpenSUSE
>>> build
>>> >> > > > repos. Maybe later today?
>>> >> > > >
>>> >> > > > Thanks.
>>> >> > > >
>>> >> > > > Sincerely,
>>> >> > > > Artem
>>> >> > > >
>>> >> > > > --
>>> >> > > > Founder, Android Police ,
>>> APK Mirror
>>> >> > > > , Illogical Robot LLC
>>> >> > > > beerpla.net  | +ArtemRussakovskii
>>> >> > > >  | @ArtemR
>>> >> > > > 
>>> >> > > >
>>> >> > > >
>>> >> > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan
>>> >> > > > mailto:atumb...@redhat.com>> wrote:
>>> >> 

Re: [Gluster-users] Constant fuse client crashes "fixed" by setting performance.write-behind: off. Any hope for a 4.1.8 release?

2019-03-19 Thread Artem Russakovskii
Brandon, I've had performance.write-behind: off for weeks ever since it was
suggested as a fix, but the crashes kept coming.
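
On the other question in the quoted message below (getting crash information
from the CentOS 7 yum-built packages): one rough approach, assuming a core file
was written and that matching debuginfo packages are published for the installed
version (the core path here is only an example), is:

debuginfo-install glusterfs glusterfs-fuse glusterfs-libs
gdb /usr/sbin/glusterfs /path/to/core -batch -ex "thread apply all bt full" > backtrace.txt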

Sincerely,
Artem

--
Founder, Android Police , APK Mirror
, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
 | @ArtemR



On Tue, Mar 19, 2019 at 9:01 AM  wrote:

> Hey Artem,
>
>
>
> Wondering have you tried this "performance.write-behind: off" setting?
> I've added this to my multiple separate gluster clusters but, I won't know
> until weekend ftp backups run again if it helps with our situation as a
> workaround.
>
>
>
> We need this fixed highest priority I know that though.
>
>
>
> Can anyone please advise what steps can I take to get similar crash log
> information from CentOS 7 yum repo built gluster?  Would that help if I
> shared that?
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0

2019-03-19 Thread Vijay Bellur
I tried this configuration on my local setup and the test passed fine.

Adding the fuse and write-behind maintainers in Gluster to check if they
are aware of any oddities with using mmap & fuse.

Thanks,
Vijay

On Tue, Mar 19, 2019 at 2:21 PM Jim Kinney  wrote:

> Volume Name: home
> Type: Replicate
> Volume ID: 5367adb1-99fc-44c3-98c4-71f7a41e628a
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp,rdma
> Bricks:
> Brick1: bmidata1:/data/glusterfs/home/brick/brick
> Brick2: bmidata2:/data/glusterfs/home/brick/brick
> Options Reconfigured:
> performance.client-io-threads: off
> storage.build-pgfid: on
> cluster.self-heal-daemon: enable
> performance.readdir-ahead: off
> nfs.disable: off
>
>
> There are 11 other volumes and all are similar.
>
>
> On Tue, 2019-03-19 at 13:59 -0700, Vijay Bellur wrote:
>
> Thank you for the reproducer! Can you please let us know the output of
> `gluster volume info`?
>
> Regards,
> Vijay
>
> On Tue, Mar 19, 2019 at 12:53 PM Jim Kinney  wrote:
>
> This python will fail when writing to a file in a glusterfs fuse mounted
> directory.
>
> import mmap
>
> # write a simple example file
> with open("hello.txt", "wb") as f:
> f.write("Hello Python!\n")
>
> with open("hello.txt", "r+b") as f:
> # memory-map the file, size 0 means whole file
> mm = mmap.mmap(f.fileno(), 0)
> # read content via standard file methods
> print mm.readline()  # prints "Hello Python!"
> # read content via slice notation
> print mm[:5]  # prints "Hello"
> # update content using slice notation;
> # note that new content must have same size
> mm[6:] = " world!\n"
> # ... and read again using standard file methods
> mm.seek(0)
> print mm.readline()  # prints "Hello  world!"
> # close the map
> mm.close()
>
>
>
>
>
>
>
> On Tue, 2019-03-19 at 12:06 -0400, Jim Kinney wrote:
>
> Native mount issue with multiple clients (centos7 glusterfs 3.12).
>
> Seems to hit python 2.7 and 3+. User tries to open file(s) for write on
> long process and system eventually times out.
>
> Switching to NFS stops the error.
>
> No bug notice yet. Too many pans on the fire :-(
>
> On Tue, 2019-03-19 at 18:42 +0530, Amar Tumballi Suryanarayan wrote:
>
> Hi Jim,
>
> On Tue, Mar 19, 2019 at 6:21 PM Jim Kinney  wrote:
>
>
> Issues with glusterfs fuse mounts cause issues with python file open for
> write. We have to use nfs to avoid this.
>
> Really want to see better back-end tools to facilitate cleaning up of
> glusterfs failures. If system is going to use hard linked ID, need a
> mapping of id to file to fix things. That option is now on for all exports.
> It should be the default. If a host is down and users delete files by the
> thousands, gluster _never_ catches up. Finding path names for ids across
> even a 40TB mount, much less the 200+TB one, is a slow process. A network
> outage of 2 minutes and one system didn't get the call to recursively
> delete several dozen directories each with several thousand files.
>
>
>
> Are you talking about some issues in geo-replication module or some other
> application using native mount? Happy to take the discussion forward about
> these issues.
>
> Are there any bugs open on this?
>
> Thanks,
> Amar
>
>
>
>
> nfs
> On March 19, 2019 8:09:01 AM EDT, Hans Henrik Happe  wrote:
>
> Hi,
>
> Looking into something else I fell over this proposal. Being a shop that
> are going into "Leaving GlusterFS" mode, I thought I would give my two
> cents.
>
> While being partially an HPC shop with a few Lustre filesystems,  we chose
> GlusterFS for an archiving solution (2-3 PB), because we could find files
> in the underlying ZFS filesystems if GlusterFS went sour.
>
> We have used the access to the underlying files plenty, because of the
> continuous instability of GlusterFS'. Meanwhile, Lustre have been almost
> effortless to run and mainly for that reason we are planning to move away
> from GlusterFS.
>
> Reading this proposal kind of underlined that "Leaving GluserFS" is the
> right thing to do. While I never understood why GlusterFS has been in
> feature crazy mode instead of stabilizing mode, taking away crucial
> features I don't get. With RoCE, RDMA is getting mainstream. Quotas are
> very useful, even though the current implementation are not perfect.
> Tiering also makes so much sense, but, for large files, not on a per-file
> level.
>
> To be honest we only use quotas. We got scared of trying out new
> performance features that potentially would open up a new back of issues.
>
> Sorry for being such a buzzkill. I really wanted it to be different.
>
> Cheers,
> Hans Henrik
> On 19/07/2018 08.56, Amar Tumballi wrote:
>
>
> * Hi all, Over last 12 years of Gluster, we have developed many features,
> and continue to support most of it till now. But along the way, we have
> figured out better methods of doing things. Also we are not actively
> maintaining some of these features. We are now thinking of cleaning up

Re: [Gluster-users] NFS export of gluster - solution

2019-03-19 Thread Jiffin Thottan



- Original Message -
From: "Sankarshan Mukhopadhyay" 
Cc: "gluster-users" 
Sent: Tuesday, March 19, 2019 10:07:36 AM
Subject: Re: [Gluster-users] NFS export of gluster - solution

On Tue, Mar 19, 2019 at 9:25 AM Jiffin Thottan  wrote:
>
> Thanks Valerio for sharing the information
>
> - Original Message -
> From: "Valerio Luccio" 
> To: "gluster-users" 
> Sent: Monday, March 18, 2019 8:37:46 PM
> Subject: [Gluster-users] NFS export of gluster - solution
>
> So, I recently started NFS-exporting my gluster volume so that I could mount
> it from a legacy Mac OS X server. Every 24-36 hours the export seemed to
> freeze, causing the server to seize up. The ganesha log was filled with
> errors related to RQUOTA. Frank Filz of the nfs-ganesha project suggested that
> I try setting "Enable_RQUOTA = false;" in the NFS_CORE_PARAM config
> block of the ganesha.conf file, and that seems to have done the trick: 5
> days and counting without a problem.
>

Does this configuration change need to be updated in any existing
documentation (for Gluster, nfs-ganesha)?

Created a pull for the same https://github.com/gluster/glusterdocs/pull/461
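
For reference, the change described above amounts to roughly the following
(the config path and service name are the usual defaults; adjust as needed):

# In /etc/ganesha/ganesha.conf, the relevant block becomes:
#   NFS_CORE_PARAM {
#       Enable_RQUOTA = false;
#   }
systemctl restart nfs-ganesha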

--
Jiffin


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP

2019-03-19 Thread Nithya Balachandran
Hi Artem,

I think you are running into a different crash. The ones reported which
were prevented by turning off write-behind are now fixed.
We will need to look into the one you are seeing to see why it is happening.

Regards,
Nithya


On Tue, 19 Mar 2019 at 20:25, Artem Russakovskii 
wrote:

> The flood is indeed fixed for us on 5.5. However, the crashes are not.
>
> Sincerely,
> Artem
>
> --
> Founder, Android Police , APK Mirror
> , Illogical Robot LLC
> beerpla.net | +ArtemRussakovskii
>  | @ArtemR
> 
>
>
> On Mon, Mar 18, 2019 at 5:41 AM Hu Bert  wrote:
>
>> Hi Amar,
>>
>> if you refer to this bug:
>> https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test
>> setup i haven't seen those entries, while copying & deleting a few GBs
>> of data. For a final statement we have to wait until i updated our
>> live gluster servers - could take place on tuesday or wednesday.
>>
>> Maybe other users can do an update to 5.4 as well and report back here.
>>
>>
>> Hubert
>>
>>
>>
>> Am Mo., 18. März 2019 um 11:36 Uhr schrieb Amar Tumballi Suryanarayan
>> :
>> >
>> > Hi Hu Bert,
>> >
>> > Appreciate the feedback. Also are the other boiling issues related to
>> logs fixed now?
>> >
>> > -Amar
>> >
>> > On Mon, Mar 18, 2019 at 3:54 PM Hu Bert  wrote:
>> >>
>> >> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2
>> >> volumes done. In 'gluster peer status' the peers stay connected during
>> >> the upgrade, no 'peer rejected' messages. No cksum mismatches in the
>> >> logs. Looks good :-)
>> >>
>> >> Am Mo., 18. März 2019 um 09:54 Uhr schrieb Hu Bert <
>> revi...@googlemail.com>:
>> >> >
>> >> > Good morning :-)
>> >> >
>> >> > for debian the packages are there:
>> >> >
>> https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/
>> >> >
>> >> > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if there
>> >> > are some errors etc. and report back.
>> >> >
>> >> > btw: no release notes for 5.4 and 5.5 so far?
>> >> > https://docs.gluster.org/en/latest/release-notes/ ?
>> >> >
>> >> > Am Fr., 15. März 2019 um 14:28 Uhr schrieb Shyam Ranganathan
>> >> > :
>> >> > >
>> >> > > We created a 5.5 release tag, and it is under packaging now. It
>> should
>> >> > > be packaged and ready for testing early next week and should be
>> released
>> >> > > close to mid-week next week.
>> >> > >
>> >> > > Thanks,
>> >> > > Shyam
>> >> > > On 3/13/19 12:34 PM, Artem Russakovskii wrote:
>> >> > > > Wednesday now with no update :-/
>> >> > > >
>> >> > > > Sincerely,
>> >> > > > Artem
>> >> > > >
>> >> > > > --
>> >> > > > Founder, Android Police , APK
>> Mirror
>> >> > > > , Illogical Robot LLC
>> >> > > > beerpla.net  | +ArtemRussakovskii
>> >> > > >  | @ArtemR
>> >> > > > 
>> >> > > >
>> >> > > >
>> >> > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii <
>> archon...@gmail.com
>> >> > > > > wrote:
>> >> > > >
>> >> > > > Hi Amar,
>> >> > > >
>> >> > > > Any updates on this? I'm still not seeing it in OpenSUSE
>> build
>> >> > > > repos. Maybe later today?
>> >> > > >
>> >> > > > Thanks.
>> >> > > >
>> >> > > > Sincerely,
>> >> > > > Artem
>> >> > > >
>> >> > > > --
>> >> > > > Founder, Android Police , APK
>> Mirror
>> >> > > > , Illogical Robot LLC
>> >> > > > beerpla.net  | +ArtemRussakovskii
>> >> > > >  | @ArtemR
>> >> > > > 
>> >> > > >
>> >> > > >
>> >> > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan
>> >> > > > mailto:atumb...@redhat.com>> wrote:
>> >> > > >
>> >> > > > We are talking days. Not weeks. Considering already it is
>> >> > > > Thursday here. 1 more day for tagging, and packaging.
>> May be ok
>> >> > > > to expect it on Monday.
>> >> > > >
>> >> > > > -Amar
>> >> > > >
>> >> > > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii
>> >> > > > mailto:archon...@gmail.com>>
>> wrote:
>> >> > > >
>> >> > > > Is the next release going to be an imminent hotfix,
>> i.e.
>> >> > > > something like today/tomorrow, or are we talking
>> weeks?
>> >> > > >
>> >> > > > Sincerely,
>> >> > > > Artem
>> >> > > >
>> >> > > > --
>> >> > > > Founder, Android Police <
>> http://www.androidpolice.com>, APK
>> >> > > > Mirror , Illogical Robot
>> LLC
>> >> > > > beerpla.net  |
>> +ArtemRussakovskii
>> >> > > > 

Re: [Gluster-users] / - is in split-brain

2019-03-19 Thread Nithya Balachandran
Hi,

What is the output of the gluster volume info ?
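
i.e., run from any node of the cluster:

gluster volume info gv1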

Thanks,
Nithya

On Wed, 20 Mar 2019 at 01:58, Pablo Schandin  wrote:

> Hello all!
>
> I had a volume with only a local brick running vms and recently added a
> second (remote) brick to the volume. After adding the brick, the heal
> command reported the following:
>
> root@gluster-gu1:~# gluster volume heal gv1 info
>> Brick gluster-gu1:/mnt/gv_gu1/brick
>> / - Is in split-brain
>> Status: Connected
>> Number of entries: 1
>> Brick gluster-gu2:/mnt/gv_gu1/brick
>> Status: Connected
>> Number of entries: 0
>
>
> All other files healed correctly. I noticed that in the xfs of the brick I
> see a directory named localadmin but when I ls the gluster volume
> mountpoint I got an error and a lot of ???
>
> root@gluster-gu1:/var/lib/vmImages_gu1# ll
>> ls: cannot access 'localadmin': No data available
>> d?  ? ??   ?? localadmin/
>
>
> This goes for both servers that have that volume gv1 mounted. Both see
> that directory like that. While in the xfs brick
> /mnt/gv_gu1/brick/localadmin is an accessible directory.
>
> root@gluster-gu1:/mnt/gv_gu1/brick/localadmin# ll
>> total 4
>> drwxr-xr-x 2 localadmin root6 Mar  7 09:40 ./
>> drwxr-xr-x 6 root   root 4096 Mar  7 09:40 ../
>
>
> When I added the second brick to the volume, this localadmin folder was
> not replicated there I imagine because of this strange behavior.
>
> Can someone help me with this?
> Thanks!
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Help analise statedumps

2019-03-19 Thread Sankarshan Mukhopadhyay
On Tue, Mar 19, 2019 at 11:09 PM Pedro Costa  wrote:

> Sorry to revive an old thread, but just to let you know that with the latest 5.4
> version this has virtually stopped happening.
>
> I can’t ascertain for sure yet, but since the update the memory footprint of 
> Gluster has been massively reduced.
>
> Thanks to everyone, great job.

Good news is always fantastic to hear! Thank you for reviving the
thread and providing feedback.
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-Maintainers] Proposal to mark few features as Deprecated / SunSet from Version 5.0

2019-03-19 Thread Sankarshan Mukhopadhyay
Now that there are sufficient detail in place, could a Gluster team
member file a RHBZ and post it back to this thread?

On Wed, Mar 20, 2019 at 2:51 AM Jim Kinney  wrote:
>
> Volume Name: home
> Type: Replicate
> Volume ID: 5367adb1-99fc-44c3-98c4-71f7a41e628a
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp,rdma
> Bricks:
> Brick1: bmidata1:/data/glusterfs/home/brick/brick
> Brick2: bmidata2:/data/glusterfs/home/brick/brick
> Options Reconfigured:
> performance.client-io-threads: off
> storage.build-pgfid: on
> cluster.self-heal-daemon: enable
> performance.readdir-ahead: off
> nfs.disable: off
>
>
> There are 11 other volumes and all are similar.
>
>
> On Tue, 2019-03-19 at 13:59 -0700, Vijay Bellur wrote:
>
> Thank you for the reproducer! Can you please let us know the output of 
> `gluster volume info`?
>
> Regards,
> Vijay
>
> On Tue, Mar 19, 2019 at 12:53 PM Jim Kinney  wrote:
>
> This python will fail when writing to a file in a glusterfs fuse mounted 
> directory.
>
> import mmap
>
> # write a simple example file
> with open("hello.txt", "wb") as f:
> f.write("Hello Python!\n")
>
> with open("hello.txt", "r+b") as f:
> # memory-map the file, size 0 means whole file
> mm = mmap.mmap(f.fileno(), 0)
> # read content via standard file methods
> print mm.readline()  # prints "Hello Python!"
> # read content via slice notation
> print mm[:5]  # prints "Hello"
> # update content using slice notation;
> # note that new content must have same size
> mm[6:] = " world!\n"
> # ... and read again using standard file methods
> mm.seek(0)
> print mm.readline()  # prints "Hello  world!"
> # close the map
> mm.close()
>
>
>
>
>
>
>
> On Tue, 2019-03-19 at 12:06 -0400, Jim Kinney wrote:
>
> Native mount issue with multiple clients (centos7 glusterfs 3.12).
>
> Seems to hit python 2.7 and 3+. User tries to open file(s) for write on long 
> process and system eventually times out.
>
> Switching to NFS stops the error.
>
> No bug notice yet. Too many pans on the fire :-(
>
> On Tue, 2019-03-19 at 18:42 +0530, Amar Tumballi Suryanarayan wrote:
>
> Hi Jim,
>
> On Tue, Mar 19, 2019 at 6:21 PM Jim Kinney  wrote:
>
>
> Issues with glusterfs fuse mounts cause issues with python file open for 
> write. We have to use nfs to avoid this.
>
> Really want to see better back-end tools to facilitate cleaning up of 
> glusterfs failures. If system is going to use hard linked ID, need a mapping 
> of id to file to fix things. That option is now on for all exports. It should 
> be the default. If a host is down and users delete files by the thousands, 
> gluster _never_ catches up. Finding path names for ids across even a 40TB 
> mount, much less the 200+TB one, is a slow process. A network outage of 2 
> minutes and one system didn't get the call to recursively delete several 
> dozen directories each with several thousand files.
>
>
>
> Are you talking about some issues in geo-replication module or some other 
> application using native mount? Happy to take the discussion forward about 
> these issues.
>
> Are there any bugs open on this?
>
> Thanks,
> Amar
>
>
>
>
> nfs
> On March 19, 2019 8:09:01 AM EDT, Hans Henrik Happe  wrote:
>
> Hi,
>
> Looking into something else I fell over this proposal. Being a shop that are 
> going into "Leaving GlusterFS" mode, I thought I would give my two cents.
>
> While being partially an HPC shop with a few Lustre filesystems,  we chose 
> GlusterFS for an archiving solution (2-3 PB), because we could find files in 
> the underlying ZFS filesystems if GlusterFS went sour.
>
> We have used the access to the underlying files plenty, because of the 
> continuous instability of GlusterFS'. Meanwhile, Lustre have been almost 
> effortless to run and mainly for that reason we are planning to move away 
> from GlusterFS.
>
> Reading this proposal kind of underlined that "Leaving GluserFS" is the right 
> thing to do. While I never understood why GlusterFS has been in feature crazy 
> mode instead of stabilizing mode, taking away crucial features I don't get. 
> With RoCE, RDMA is getting mainstream. Quotas are very useful, even though 
> the current implementation are not perfect. Tiering also makes so much sense, 
> but, for large files, not on a per-file level.
>
> To be honest we only use quotas. We got scared of trying out new performance 
> features that potentially would open up a new back of issues.
>
> Sorry for being such a buzzkill. I really wanted it to be different.
>
> Cheers,
> Hans Henrik
>
> On 19/07/2018 08.56, Amar Tumballi wrote:
>
> Hi all,
>
> Over last 12 years of Gluster, we have developed many features, and continue 
> to support most of it till now. But along the way, we have figured out better 
> methods of doing things. Also we are not actively maintaining some of these 
> features.
>
> We are now thinking of cleaning up some of these ‘unsupported’ features, and 
> mark them as 

Re: [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0

2019-03-19 Thread Jim Kinney
Volume Name: home
Type: Replicate
Volume ID: 5367adb1-99fc-44c3-98c4-71f7a41e628a
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp,rdma
Bricks:
Brick1: bmidata1:/data/glusterfs/home/brick/brick
Brick2: bmidata2:/data/glusterfs/home/brick/brick
Options Reconfigured:
performance.client-io-threads: off
storage.build-pgfid: on
cluster.self-heal-daemon: enable
performance.readdir-ahead: off
nfs.disable: off


There are 11 other volumes and all are similar.


On Tue, 2019-03-19 at 13:59 -0700, Vijay Bellur wrote:
> Thank you for the reproducer! Can you please let us know the output
> of `gluster volume info`?
> Regards,
> Vijay
> 
> On Tue, Mar 19, 2019 at 12:53 PM Jim Kinney 
> wrote:
> > This python will fail when writing to a file in a glusterfs fuse
> > mounted directory.
> > 
> > import mmap
> >   
> > # write a simple example file
> > with open("hello.txt", "wb") as f:
> > f.write("Hello Python!\n")
> > 
> > with open("hello.txt", "r+b") as f:
> > # memory-map the file, size 0 means whole file
> > mm = mmap.mmap(f.fileno(), 0)
> > # read content via standard file methods
> > print mm.readline()  # prints "Hello Python!"
> > # read content via slice notation
> > print mm[:5]  # prints "Hello"
> > # update content using slice notation;
> > # note that new content must have same size
> > mm[6:] = " world!\n"
> > # ... and read again using standard file methods
> > mm.seek(0)
> > print mm.readline()  # prints "Hello  world!"
> > # close the map
> > mm.close()
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > On Tue, 2019-03-19 at 12:06 -0400, Jim Kinney wrote:
> > > Native mount issue with multiple clients (centos7 glusterfs
> > > 3.12).
> > > Seems to hit python 2.7 and 3+. User tries to open file(s) for
> > > write on long process and system eventually times out.
> > > Switching to NFS stops the error.
> > > No bug notice yet. Too many pans on the fire :-(
> > > On Tue, 2019-03-19 at 18:42 +0530, Amar Tumballi Suryanarayan
> > > wrote:
> > > > Hi Jim,
> > > > 
> > > > On Tue, Mar 19, 2019 at 6:21 PM Jim Kinney <
> > > > jim.kin...@gmail.com> wrote:
> > > > > 
> > > > >   
> > > > >   
> > > > > Issues with glusterfs fuse mounts cause issues with python
> > > > > file open for write. We have to use nfs to avoid this. 
> > > > > 
> > > > > Really want to see better back-end tools to facilitate
> > > > > cleaning up of glusterfs failures. If system is going to use
> > > > > hard linked ID, need a mapping of id to file to fix things.
> > > > > That option is now on for all exports. It should be the
> > > > > default. If a host is down and users delete files by the
> > > > > thousands, gluster _never_ catches up. Finding path names for
> > > > > ids across even a 40TB mount, much less  the 200+TB one, is a
> > > > > slow process. A network outage of 2 minutes and one system
> > > > > didn't get the call to recursively delete several dozen
> > > > > directories each with several thousand files. 
> > > > > 
> > > > > 
> > > > 
> > > > Are you talking about some issues in geo-replication module or
> > > > some other application using native mount? Happy to take the
> > > > discussion forward about these issues. 
> > > > Are there any bugs open on this?
> > > > Thanks,Amar 
> > > > > nfsOn March 19, 2019 8:09:01 AM EDT, Hans Henrik Happe <
> > > > > ha...@nbi.dk> wrote:
> > > > > > Hi,
> > > > > > Looking into something else I fell over this proposal.
> > > > > > Being a
> > > > > >   shop that are going into "Leaving GlusterFS" mode, I
> > > > > > thought I
> > > > > >   would give my two cents.
> > > > > > 
> > > > > > 
> > > > > > While being partially an HPC shop with a few Lustre
> > > > > > filesystems, 
> > > > > >   we chose GlusterFS for an archiving solution (2-3
> > > > > > PB), because we
> > > > > >   could find files in the underlying ZFS filesystems if
> > > > > > GlusterFS
> > > > > >   went sour.
> > > > > > We have used the access to the underlying files plenty,
> > > > > > because
> > > > > >   of the continuous instability of GlusterFS'.
> > > > > > Meanwhile, Lustre
> > > > > >   have been almost effortless to run and mainly for
> > > > > > that reason we
> > > > > >   are planning to move away from GlusterFS.
> > > > > > Reading this proposal kind of underlined that "Leaving
> > > > > > GluserFS"
> > > > > >   is the right thing to do. While I never understood
> > > > > > why GlusterFS
> > > > > >   has been in feature crazy mode instead of stabilizing
> > > > > > mode, taking
> > > > > >   away crucial features I don't get. With RoCE, RDMA is
> > > > > > getting
> > > > > >   mainstream. Quotas are very useful, even though the
> > > > > > current
> > > > > >   implementation are not perfect. Tiering also makes so
> > > > > > much sense,
> > > > > >   but, for large files, not on a per-file level.
> > > > > > To be honest w

Re: [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0

2019-03-19 Thread Vijay Bellur
Thank you for the reproducer! Can you please let us know the output of
`gluster volume info`?

Regards,
Vijay

On Tue, Mar 19, 2019 at 12:53 PM Jim Kinney  wrote:

> This python will fail when writing to a file in a glusterfs fuse mounted
> directory.
>
> import mmap
>
> # write a simple example file
> with open("hello.txt", "wb") as f:
> f.write("Hello Python!\n")
>
> with open("hello.txt", "r+b") as f:
> # memory-map the file, size 0 means whole file
> mm = mmap.mmap(f.fileno(), 0)
> # read content via standard file methods
> print mm.readline()  # prints "Hello Python!"
> # read content via slice notation
> print mm[:5]  # prints "Hello"
> # update content using slice notation;
> # note that new content must have same size
> mm[6:] = " world!\n"
> # ... and read again using standard file methods
> mm.seek(0)
> print mm.readline()  # prints "Hello  world!"
> # close the map
> mm.close()
>
>
>
>
>
>
>
> On Tue, 2019-03-19 at 12:06 -0400, Jim Kinney wrote:
>
> Native mount issue with multiple clients (centos7 glusterfs 3.12).
>
> Seems to hit python 2.7 and 3+. User tries to open file(s) for write on
> long process and system eventually times out.
>
> Switching to NFS stops the error.
>
> No bug notice yet. Too many pans on the fire :-(
>
> On Tue, 2019-03-19 at 18:42 +0530, Amar Tumballi Suryanarayan wrote:
>
> Hi Jim,
>
> On Tue, Mar 19, 2019 at 6:21 PM Jim Kinney  wrote:
>
>
> Issues with glusterfs fuse mounts cause issues with python file open for
> write. We have to use nfs to avoid this.
>
> Really want to see better back-end tools to facilitate cleaning up of
> glusterfs failures. If system is going to use hard linked ID, need a
> mapping of id to file to fix things. That option is now on for all exports.
> It should be the default. If a host is down and users delete files by the
> thousands, gluster _never_ catches up. Finding path names for ids across
> even a 40TB mount, much less the 200+TB one, is a slow process. A network
> outage of 2 minutes and one system didn't get the call to recursively
> delete several dozen directories each with several thousand files.
>
>
>
> Are you talking about some issues in geo-replication module or some other
> application using native mount? Happy to take the discussion forward about
> these issues.
>
> Are there any bugs open on this?
>
> Thanks,
> Amar
>
>
>
>
> nfs
> On March 19, 2019 8:09:01 AM EDT, Hans Henrik Happe  wrote:
>
> Hi,
>
> Looking into something else I fell over this proposal. Being a shop that
> are going into "Leaving GlusterFS" mode, I thought I would give my two
> cents.
>
> While being partially an HPC shop with a few Lustre filesystems,  we chose
> GlusterFS for an archiving solution (2-3 PB), because we could find files
> in the underlying ZFS filesystems if GlusterFS went sour.
>
> We have used the access to the underlying files plenty, because of the
> continuous instability of GlusterFS'. Meanwhile, Lustre have been almost
> effortless to run and mainly for that reason we are planning to move away
> from GlusterFS.
>
> Reading this proposal kind of underlined that "Leaving GluserFS" is the
> right thing to do. While I never understood why GlusterFS has been in
> feature crazy mode instead of stabilizing mode, taking away crucial
> features I don't get. With RoCE, RDMA is getting mainstream. Quotas are
> very useful, even though the current implementation are not perfect.
> Tiering also makes so much sense, but, for large files, not on a per-file
> level.
>
> To be honest we only use quotas. We got scared of trying out new
> performance features that potentially would open up a new back of issues.
>
> Sorry for being such a buzzkill. I really wanted it to be different.
>
> Cheers,
> Hans Henrik
> On 19/07/2018 08.56, Amar Tumballi wrote:
>
>
> * Hi all, Over last 12 years of Gluster, we have developed many features,
> and continue to support most of it till now. But along the way, we have
> figured out better methods of doing things. Also we are not actively
> maintaining some of these features. We are now thinking of cleaning up some
> of these ‘unsupported’ features, and mark them as ‘SunSet’ (i.e., would be
> totally taken out of codebase in following releases) in next upcoming
> release, v5.0. The release notes will provide options for smoothly
> migrating to the supported configurations. If you are using any of these
> features, do let us know, so that we can help you with ‘migration’.. Also,
> we are happy to guide new developers to work on those components which are
> not actively being maintained by current set of developers. List of
> features hitting sunset: ‘cluster/stripe’ translator: This translator was
> developed very early in the evolution of GlusterFS, and addressed one of
> the very common question of Distributed FS, which is “What happens if one
> of my file is bigger than the available brick. Say, I have 2 TB hard drive,
> exported in glusterfs, my file is 

[Gluster-users] recovery from reboot time?

2019-03-19 Thread Alvin Starr

We have a simple replicated volume with one 17TB brick on each node.

There is something like 35M files and directories on the volume.

One of the servers rebooted and is now "doing something".

It kind of looks like it's doing some kind of sanity check with the node 
that did not reboot, but it's hard to say, and it looks like it may run for 
hours/days/months.


Will Gluster take a long time to resync with lots of little files?
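
For reference, a minimal sketch of the commands commonly used to watch self-heal
progress on a replicated volume after a node comes back; the volume name is a
placeholder, not taken from the message above:

# entries still pending heal, listed per brick
gluster volume heal <VOLNAME> info
# per-brick count of entries still to be healed
gluster volume heal <VOLNAME> statistics heal-count
# the self-heal daemon logs its progress here on each server
tail -f /var/log/glusterfs/glustershd.log

A heal-count that shrinks over time is usually the sign that the resync is making
progress rather than looping.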


--
Alvin Starr   ||   land:  (905)513-7688
Netvel Inc.   ||   Cell:  (416)806-0133
al...@netvel.net  ||

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] / - is in split-brain

2019-03-19 Thread Pablo Schandin
Hello all!

I had a volume with only a local brick running vms and recently added a
second (remote) brick to the volume. After adding the brick, the heal
command reported the following:

root@gluster-gu1:~# gluster volume heal gv1 info
> Brick gluster-gu1:/mnt/gv_gu1/brick
> / - Is in split-brain
> Status: Connected
> Number of entries: 1
> Brick gluster-gu2:/mnt/gv_gu1/brick
> Status: Connected
> Number of entries: 0


All other files healed correctly. I noticed that in the XFS filesystem of the
brick I can see a directory named localadmin, but when I ls the gluster volume
mountpoint I get an error and a lot of question marks:

root@gluster-gu1:/var/lib/vmImages_gu1# ll
> ls: cannot access 'localadmin': No data available
> d?  ? ??   ?? localadmin/


This goes for both servers that have the volume gv1 mounted; both see that
directory like that, while in the XFS brick /mnt/gv_gu1/brick/localadmin is an
accessible directory:

root@gluster-gu1:/mnt/gv_gu1/brick/localadmin# ll
> total 4
> drwxr-xr-x 2 localadmin root6 Mar  7 09:40 ./
> drwxr-xr-x 6 root   root 4096 Mar  7 09:40 ../


When I added the second brick to the volume, this localadmin folder was not
replicated there, I imagine because of this strange behavior.

Can someone help me with this?
Thanks!
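
For what it's worth, a hedged sketch of how the replication metadata behind such
a split-brain report is often inspected; the brick path is taken from the message
above, and the commands are run on each server against its own brick:

# dump the trusted.afr.* and trusted.gfid xattrs of the brick root and the problem entry
getfattr -d -m . -e hex /mnt/gv_gu1/brick
getfattr -d -m . -e hex /mnt/gv_gu1/brick/localadmin

Mismatching trusted.afr.* or trusted.gfid values between the two bricks are
typically what the heal logic reports as split-brain on the parent directory.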
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0

2019-03-19 Thread Jim Kinney
This python will fail when writing to a file in a glusterfs fuse
mounted directory.
import mmap

# write a simple example file
with open("hello.txt", "wb") as f:
    f.write("Hello Python!\n")

with open("hello.txt", "r+b") as f:
    # memory-map the file, size 0 means whole file
    mm = mmap.mmap(f.fileno(), 0)
    # read content via standard file methods
    print mm.readline()  # prints "Hello Python!"
    # read content via slice notation
    print mm[:5]  # prints "Hello"
    # update content using slice notation;
    # note that new content must have same size
    mm[6:] = " world!\n"
    # ... and read again using standard file methods
    mm.seek(0)
    print mm.readline()  # prints "Hello  world!"
    # close the map
    mm.close()
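
A minimal sketch of how a reproducer like the one above is typically exercised
against the two mount types discussed in this thread; the script name and mount
points are placeholders, not taken from the original message:

# save the snippet above as repro_mmap.py, then run it from each mount of the same volume
cd /mnt/glustervol-fuse && python repro_mmap.py   # fuse mount: reported to fail
cd /mnt/glustervol-nfs && python repro_mmap.py    # NFS mount: reported to work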





On Tue, 2019-03-19 at 12:06 -0400, Jim Kinney wrote:
> Native mount issue with multiple clients (centos7 glusterfs 3.12).
> Seems to hit python 2.7 and 3+. User tries to open file(s) for write
> on long process and system eventually times out.
> Switching to NFS stops the error.
> No bug notice yet. Too many pans on the fire :-(
> On Tue, 2019-03-19 at 18:42 +0530, Amar Tumballi Suryanarayan wrote:
> > Hi Jim,
> > 
> > On Tue, Mar 19, 2019 at 6:21 PM Jim Kinney 
> > wrote:
> > > 
> > >   
> > >   
> > > Issues with glusterfs fuse mounts cause issues with python file
> > > open for write. We have to use nfs to avoid this. 
> > > 
> > > Really want to see better back-end tools to facilitate cleaning
> > > up of glusterfs failures. If system is going to use hard linked
> > > ID, need a mapping of id to file to fix things. That option is
> > > now on for all exports. It should be the default  If a host is
> > > down and users delete files by the thousands, gluster _never_
> > > catches up. Finding path names for ids across even a 40TB mount,
> > > much less  the 200+TB one, is a slow process. A network outage of
> > > 2 minutes and one system didn't get the call to recursively
> > > delete several dozen directories each with several thousand
> > > files. 
> > > 
> > > 
> > 
> > Are you talking about some issues in geo-replication module or some
> > other application using native mount? Happy to take the discussion
> > forward about these issues. 
> > Are there any bugs open on this?
> > Thanks,
> > Amar
> > > nfs
> > > On March 19, 2019 8:09:01 AM EDT, Hans Henrik Happe <
> > > ha...@nbi.dk> wrote:
> > > > Hi,
> > > > Looking into something else I fell over this proposal.
> > > > Being a
> > > >   shop that are going into "Leaving GlusterFS" mode, I
> > > > thought I
> > > >   would give my two cents.
> > > > 
> > > > 
> > > > While being partially an HPC shop with a few Lustre
> > > > filesystems, 
> > > >   we chose GlusterFS for an archiving solution (2-3 PB),
> > > > because we
> > > >   could find files in the underlying ZFS filesystems if
> > > > GlusterFS
> > > >   went sour.
> > > > We have used the access to the underlying files plenty,
> > > > because
> > > >   of the continuous instability of GlusterFS'. Meanwhile,
> > > > Lustre
> > > >   have been almost effortless to run and mainly for that
> > > > reason we
> > > >   are planning to move away from GlusterFS.
> > > > Reading this proposal kind of underlined that "Leaving
> > > > GluserFS"
> > > >   is the right thing to do. While I never understood why
> > > > GlusterFS
> > > >   has been in feature crazy mode instead of stabilizing
> > > > mode, taking
> > > >   away crucial features I don't get. With RoCE, RDMA is
> > > > getting
> > > >   mainstream. Quotas are very useful, even though the
> > > > current
> > > >   implementation are not perfect. Tiering also makes so
> > > > much sense,
> > > >   but, for large files, not on a per-file level.
> > > > To be honest we only use quotas. We got scared of trying
> > > > out new
> > > >   performance features that potentially would open up a new
> > > > back of
> > > >   issues.
> > > > Sorry for being such a buzzkill. I really wanted it to be
> > > >   different.
> > > > 
> > > > 
> > > > Cheers,
> > > > 
> > > >   Hans Henrik
> > > > 
> > > > 
> > > > On 19/07/2018 08.56, Amar Tumballi
> > > >   wrote:
> > > > 
> > > > 
> > > > 
> > > > >   
> > > > >   
> > > > >   Hi all,
> > > > >   Over last 12 years of Gluster, we have developed
> > > > > many features, and continue to support most of it till now.
> > > > > But along the way, we have figured out better methods of
> > > > > doing things. Also we are not actively maintaining some of
> > > > > these features.
> > > > >   We are now thinking of cleaning up some of these
> > > > > ‘unsupported’ features, and mark them as ‘SunSet’ (i.e.,
> > > > > would be totally taken out of codebase in following releases)
> > > > > in next upcoming release, v5.0. The release notes will
> > > > > provide options for smoothly migrating to the supported
> > > > > configurations.
> > > 

Re: [Gluster-users] Help analise statedumps

2019-03-19 Thread Pedro Costa
Hi,

Sorry to revive an old thread, but just to let you know that with the latest
5.4 version this has virtually stopped happening.

I can’t ascertain for sure yet, but since the update the memory footprint of 
Gluster has been massively reduced.

Thanks to everyone, great job.

Cheers,
P.

From: Pedro Costa
Sent: 04 February 2019 11:28
To: 'Sanju Rakonde' 
Cc: 'gluster-users' 
Subject: RE: [Gluster-users] Help analise statedumps

Hi Sanju,

If it helps, here’s also a statedump (taken just now) since the reboot’s:

https://pmcdigital.sharepoint.com/:u:/g/EbsT2RZsuc5BsRrf7F-fw-4BocyeogW-WvEike_sg8CpZg?e=a7nTqS

Many thanks,
P.

From: Pedro Costa
Sent: 04 February 2019 10:12
To: 'Sanju Rakonde' <srako...@redhat.com>
Cc: gluster-users <gluster-users@gluster.org>
Subject: RE: [Gluster-users] Help analise statedumps

Hi Sanju,

The process was `glusterfs`, yes I took the statedump for the same process 
(different PID since it was rebooted).

Cheers,
P.

From: Sanju Rakonde <srako...@redhat.com>
Sent: 04 February 2019 06:10
To: Pedro Costa <pedro@pmc.digital>
Cc: gluster-users <gluster-users@gluster.org>
Subject: Re: [Gluster-users] Help analise statedumps

Hi,

Can you please specify which process has leak? Have you took the statedump of 
the same process which has leak?

Thanks,
Sanju

On Sat, Feb 2, 2019 at 3:15 PM Pedro Costa <pedro@pmc.digital> wrote:
Hi,

I have a 3x replicated cluster running 4.1.7 on Ubuntu 16.04.5; all 3 replicas
are also clients hosting a Node.js/Nginx web server.

The current configuration is as such:

Volume Name: gvol1
Type: Replicate
Volume ID: XX
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: vm00:/srv/brick1/gvol1
Brick2: vm01:/srv/brick1/gvol1
Brick3: vm02:/srv/brick1/gvol1
Options Reconfigured:
cluster.self-heal-readdir-size: 2KB
cluster.self-heal-window-size: 2
cluster.background-self-heal-count: 20
network.ping-timeout: 5
disperse.eager-lock: off
performance.parallel-readdir: on
performance.readdir-ahead: on
performance.rda-cache-limit: 128MB
performance.cache-refresh-timeout: 10
performance.nl-cache-timeout: 600
performance.nl-cache: on
cluster.nufa: on
performance.enable-least-priority: off
server.outstanding-rpc-limit: 128
performance.strict-o-direct: on
cluster.shd-max-threads: 12
client.event-threads: 4
cluster.lookup-optimize: on
network.inode-lru-limit: 9
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.cache-samba-metadata: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
features.utime: on
storage.ctime: on
server.event-threads: 4
performance.cache-size: 256MB
performance.read-ahead: on
cluster.readdir-optimize: on
cluster.strict-readdir: on
performance.io-thread-count: 8
server.allow-insecure: on
cluster.read-hash-mode: 0
cluster.lookup-unhashed: auto
cluster.choose-local: on

I believe there’s a memory leak somewhere; the memory usage just keeps going up
until it hangs one or more nodes, sometimes taking the whole cluster down.

I have taken 2 statedumps on one of the nodes, one where the memory is too high 
and another just after a reboot with the app running and the volume fully 
healed.

https://pmcdigital.sharepoint.com/:u:/g/EYDsNqTf1UdEuE6B0ZNVPfIBf_I-AbaqHotB1lJOnxLlTg?e=boYP09
 (high memory)

https://pmcdigital.sharepoint.com/:u:/g/EWZBsnET2xBHl6OxO52RCfIBvQ0uIDQ1GKJZ1GrnviyMhg?e=wI3yaY
  (after reboot)
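
For context, a hedged sketch of how statedumps like these are typically
produced; the volume name is taken from the message above, while the client PID
is a placeholder:

# dump the state of the brick processes of a volume (written to /var/run/gluster by default)
gluster volume statedump gvol1
# dump the state of a fuse client: send SIGUSR1 to its glusterfs mount process
kill -USR1 <pid-of-glusterfs-client-process>

Comparing the per-translator memory accounting sections of a "high memory" dump
against a fresh one is the usual way to narrow down where the memory is held.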

Any help would be greatly appreciated,

Kindest Regards,

Pedro Maia Costa
Senior Developer, pmc.digital
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


--
Thanks,
Sanju
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0

2019-03-19 Thread Jim Kinney
Native mount issue with multiple clients (centos7 glusterfs 3.12).
Seems to hit python 2.7 and 3+. The user tries to open file(s) for write in a
long-running process and the system eventually times out.
Switching to NFS stops the error.
No bug notice yet. Too many pans on the fire :-(
On Tue, 2019-03-19 at 18:42 +0530, Amar Tumballi Suryanarayan wrote:
> Hi Jim,
> 
> On Tue, Mar 19, 2019 at 6:21 PM Jim Kinney 
> wrote:
> > 
> >   
> >   
> > Issues with glusterfs fuse mounts cause issues with python file
> > open for write. We have to use nfs to avoid this. 
> > 
> > Really want to see better back-end tools to facilitate cleaning up
> > of glusterfs failures. If system is going to use hard linked ID,
> > need a mapping of id to file to fix things. That option is now on
> > for all exports. It should be the default  If a host is down and
> > users delete files by the thousands, gluster _never_ catches up.
> > Finding path names for ids across even a 40TB mount, much less  the
> > 200+TB one, is a slow process. A network outage of 2 minutes and
> > one system didn't get the call to recursively delete several dozen
> > directories each with several thousand files. 
> > 
> > 
> 
> Are you talking about some issues in geo-replication module or some
> other application using native mount? Happy to take the discussion
> forward about these issues. 
> Are there any bugs open on this?
> Thanks,
> Amar
> > nfs
> > On March 19, 2019 8:09:01 AM EDT, Hans Henrik Happe <
> > ha...@nbi.dk> wrote:
> > > Hi,
> > > Looking into something else I fell over this proposal. Being
> > > a
> > >   shop that are going into "Leaving GlusterFS" mode, I
> > > thought I
> > >   would give my two cents.
> > > 
> > > 
> > > While being partially an HPC shop with a few Lustre
> > > filesystems, 
> > >   we chose GlusterFS for an archiving solution (2-3 PB),
> > > because we
> > >   could find files in the underlying ZFS filesystems if
> > > GlusterFS
> > >   went sour.
> > > We have used the access to the underlying files plenty,
> > > because
> > >   of the continuous instability of GlusterFS'. Meanwhile,
> > > Lustre
> > >   have been almost effortless to run and mainly for that
> > > reason we
> > >   are planning to move away from GlusterFS.
> > > Reading this proposal kind of underlined that "Leaving
> > > GluserFS"
> > >   is the right thing to do. While I never understood why
> > > GlusterFS
> > >   has been in feature crazy mode instead of stabilizing mode,
> > > taking
> > >   away crucial features I don't get. With RoCE, RDMA is
> > > getting
> > >   mainstream. Quotas are very useful, even though the current
> > >   implementation are not perfect. Tiering also makes so much
> > > sense,
> > >   but, for large files, not on a per-file level.
> > > To be honest we only use quotas. We got scared of trying out
> > > new
> > >   performance features that potentially would open up a new
> > > back of
> > >   issues.
> > > Sorry for being such a buzzkill. I really wanted it to be
> > >   different.
> > > 
> > > 
> > > Cheers,
> > > 
> > >   Hans Henrik
> > > 
> > > 
> > > On 19/07/2018 08.56, Amar Tumballi
> > >   wrote:
> > > 
> > > 
> > > 
> > > >   
> > > >   
> > > >   Hi all,
> > > >   Over last 12 years of Gluster, we have developed many
> > > > features, and continue to support most of it till now. But
> > > > along the way, we have figured out better methods of doing
> > > > things. Also we are not actively maintaining some of these
> > > > features.
> > > >   We are now thinking of cleaning up some of these
> > > > ‘unsupported’ features, and mark them as ‘SunSet’ (i.e., would
> > > > be totally taken out of codebase in following releases) in next
> > > > upcoming release, v5.0. The release notes will provide options
> > > > for smoothly migrating to the supported configurations.
> > > >   If you are using any of these features, do let us
> > > > know, so that we can help you with ‘migration’.. Also, we are
> > > > happy to guide new developers to work on those components which
> > > > are not actively being maintained by current set of developers.
> > > >   List of features hitting sunset:
> > > >   ‘cluster/stripe’ translator:
> > > >   This translator was developed very early in the
> > > > evolution of GlusterFS, and addressed one of the very common
> > > > question of Distributed FS, which is “What happens if one of my
> > > > file is bigger than the available brick. Say, I have 2 TB hard
> > > > drive, exported in glusterfs, my file is 3 TB”. While it solved
> > > > the purpose, it was very hard to handle failure scenarios, and
> > > > give a real good experience to our users with this feature.
> > > > Over the time, Gluster solved the problem with it’s ‘Shard’
> > > > feature, which solves the problem in much better way, and
> > > > provides much 

Re: [Gluster-users] Constant fuse client crashes "fixed" by setting performance.write-behind: off. Any hope for a 4.1.8 release?

2019-03-19 Thread brandon
Hey Artem,

 

Wondering, have you tried this "performance.write-behind: off" setting?  I've
added it to my multiple separate gluster clusters, but I won't know whether it
helps with our situation as a workaround until the weekend FTP backups run
again.
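
For reference, a minimal sketch of applying and then verifying the workaround
discussed in this thread; the volume name is a placeholder:

# disable the write-behind translator on the volume
gluster volume set <VOLNAME> performance.write-behind off
# confirm the option is now set
gluster volume get <VOLNAME> performance.write-behind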

 

We need this fixed with the highest priority, I know that though.

 

Can anyone please advise what steps I can take to get similar crash log
information from the CentOS 7 yum-repo-built gluster?  Would it help if I
shared that?
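
On a stock package install, the fuse client normally writes these traces into
its mount log, so a hedged first step is simply to pull them out of
/var/log/glusterfs; the exact log file name depends on the mount point:

# crash reports start with "pending frames" / "signal received" in the client mount log
grep -B 5 -A 25 "signal received" /var/log/glusterfs/*.log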

 

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Release 6: Tagged and ready for packaging

2019-03-19 Thread Shyam Ranganathan
Hi,

RC1 testing is complete and blockers have been addressed. The release is
now tagged for a final round of packaging and package testing before
release.

Thanks for testing out the RC builds and reporting issues that needed to
be addressed.

As packaging and final package testing is finishing up, we would be
writing the upgrade guide for the release as well, before announcing the
release for general consumption.

Shyam
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Constant fuse client crashes "fixed" by setting performance.write-behind: off. Any hope for a 4.1.8 release?

2019-03-19 Thread Artem Russakovskii
I upgraded the node that was crashing to 5.5 yesterday. Today, it got
another crash. This is a 1x4 replicate cluster, you can find the config
mentioned in my previous reports, and Amar should have it as well. Here's
the log:

==> mnt-_data1.log <==
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk]
0-_data1-replicate-0: selecting local read_child
_data1-client-3" repeated 4 times between [2019-03-19
14:40:50.741147] and [2019-03-19 14:40:56.874832]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(1) op(READ)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-03-19 14:40:57
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.5
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7ff841f8364c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7ff841f8dd26]
/lib64/libc.so.6(+0x36160)[0x7ff84114a160]
/lib64/libc.so.6(gsignal+0x110)[0x7ff84114a0e0]
/lib64/libc.so.6(abort+0x151)[0x7ff84114b6c1]
/lib64/libc.so.6(+0x2e6fa)[0x7ff8411426fa]
/lib64/libc.so.6(+0x2e772)[0x7ff841142772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7ff8414d80b8]
/usr/lib64/glusterfs/5.5/xlator/cluster/replicate.so(+0x5de3d)[0x7ff839fbae3d]
/usr/lib64/glusterfs/5.5/xlator/cluster/replicate.so(+0x70d51)[0x7ff839fcdd51]
/usr/lib64/glusterfs/5.5/xlator/protocol/client.so(+0x58e1f)[0x7ff83a252e1f]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7ff841d4e820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7ff841d4eb6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7ff841d4b063]
/usr/lib64/glusterfs/5.5/rpc-transport/socket.so(+0xa0ce)[0x7ff83b9690ce]
/usr/lib64/libglusterfs.so.0(+0x85519)[0x7ff841fe1519]
/lib64/libpthread.so.0(+0x7559)[0x7ff8414d5559]
/lib64/libc.so.6(clone+0x3f)[0x7ff84120c81f]
-

Sincerely,
Artem

--
Founder, Android Police , APK Mirror
, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
 | @ArtemR



On Mon, Mar 18, 2019 at 9:46 PM Amar Tumballi Suryanarayan <
atumb...@redhat.com> wrote:

> Due to this issue, along with few other logging issues, we did make a
> glusterfs-5.5 release, which has the fix for particular crash.
>
> Regards,
> Amar
>
> On Tue, 19 Mar, 2019, 1:04 AM ,  wrote:
>
>> Hello Ville-Pekka and list,
>>
>>
>>
>> I believe we are experiencing similar gluster fuse client crashes on 5.3
>> as mentioned here.  This morning I made a post in regards.
>>
>>
>>
>> https://lists.gluster.org/pipermail/gluster-users/2019-March/036036.html
>>
>>
>>
>> Has this "performance.write-behind: off" setting continued to be all you
>> needed to workaround the issue?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Brandon
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Upgrade 5.3 -> 5.4 on debian: public IP is used instead of LAN IP

2019-03-19 Thread Artem Russakovskii
The flood is indeed fixed for us on 5.5. However, the crashes are not.

Sincerely,
Artem

--
Founder, Android Police , APK Mirror
, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii
 | @ArtemR



On Mon, Mar 18, 2019 at 5:41 AM Hu Bert  wrote:

> Hi Amar,
>
> if you refer to this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1674225 : in the test
> setup i haven't seen those entries, while copying & deleting a few GBs
> of data. For a final statement we have to wait until i updated our
> live gluster servers - could take place on tuesday or wednesday.
>
> Maybe other users can do an update to 5.4 as well and report back here.
>
>
> Hubert
>
>
>
> On Mon, 18 Mar 2019 at 11:36, Amar Tumballi Suryanarayan
> wrote:
> >
> > Hi Hu Bert,
> >
> > Appreciate the feedback. Also are the other boiling issues related to
> logs fixed now?
> >
> > -Amar
> >
> > On Mon, Mar 18, 2019 at 3:54 PM Hu Bert  wrote:
> >>
> >> update: upgrade from 5.3 -> 5.5 in a replicate 3 test setup with 2
> >> volumes done. In 'gluster peer status' the peers stay connected during
> >> the upgrade, no 'peer rejected' messages. No cksum mismatches in the
> >> logs. Looks good :-)
> >>
> >> On Mon, 18 Mar 2019 at 09:54, Hu Bert <
> revi...@googlemail.com> wrote:
> >> >
> >> > Good morning :-)
> >> >
> >> > for debian the packages are there:
> >> >
> https://download.gluster.org/pub/gluster/glusterfs/5/5.5/Debian/stretch/amd64/apt/pool/main/g/glusterfs/
> >> >
> >> > I'll do an upgrade of a test installation 5.3 -> 5.5 and see if there
> >> > are some errors etc. and report back.
> >> >
> >> > btw: no release notes for 5.4 and 5.5 so far?
> >> > https://docs.gluster.org/en/latest/release-notes/ ?
> >> >
> >> > On Fri, 15 Mar 2019 at 14:28, Shyam Ranganathan
> >> > wrote:
> >> > >
> >> > > We created a 5.5 release tag, and it is under packaging now. It
> should
> >> > > be packaged and ready for testing early next week and should be
> released
> >> > > close to mid-week next week.
> >> > >
> >> > > Thanks,
> >> > > Shyam
> >> > > On 3/13/19 12:34 PM, Artem Russakovskii wrote:
> >> > > > Wednesday now with no update :-/
> >> > > >
> >> > > > Sincerely,
> >> > > > Artem
> >> > > >
> >> > > > --
> >> > > > Founder, Android Police , APK
> Mirror
> >> > > > , Illogical Robot LLC
> >> > > > beerpla.net  | +ArtemRussakovskii
> >> > > >  | @ArtemR
> >> > > > 
> >> > > >
> >> > > >
> >> > > > On Tue, Mar 12, 2019 at 10:28 AM Artem Russakovskii <
> archon...@gmail.com
> >> > > > > wrote:
> >> > > >
> >> > > > Hi Amar,
> >> > > >
> >> > > > Any updates on this? I'm still not seeing it in OpenSUSE build
> >> > > > repos. Maybe later today?
> >> > > >
> >> > > > Thanks.
> >> > > >
> >> > > > Sincerely,
> >> > > > Artem
> >> > > >
> >> > > > --
> >> > > > Founder, Android Police , APK
> Mirror
> >> > > > , Illogical Robot LLC
> >> > > > beerpla.net  | +ArtemRussakovskii
> >> > > >  | @ArtemR
> >> > > > 
> >> > > >
> >> > > >
> >> > > > On Wed, Mar 6, 2019 at 10:30 PM Amar Tumballi Suryanarayan
> >> > > > mailto:atumb...@redhat.com>> wrote:
> >> > > >
> >> > > > We are talking days. Not weeks. Considering already it is
> >> > > > Thursday here. 1 more day for tagging, and packaging. May
> be ok
> >> > > > to expect it on Monday.
> >> > > >
> >> > > > -Amar
> >> > > >
> >> > > > On Thu, Mar 7, 2019 at 11:54 AM Artem Russakovskii
> >> > > > mailto:archon...@gmail.com>> wrote:
> >> > > >
> >> > > > Is the next release going to be an imminent hotfix,
> i.e.
> >> > > > something like today/tomorrow, or are we talking
> weeks?
> >> > > >
> >> > > > Sincerely,
> >> > > > Artem
> >> > > >
> >> > > > --
> >> > > > Founder, Android Police ,
> APK
> >> > > > Mirror , Illogical Robot
> LLC
> >> > > > beerpla.net  |
> +ArtemRussakovskii
> >> > > >  |
> @ArtemR
> >> > > > 
> >> > > >
> >> > > >
> >> > > > On Tue, Mar 5, 2019 at 11:09 AM Artem Russakovskii
> >> > > > mailto:archon...@gmail.com>>
> wrote:
> >> > > >
> >> > > > Ended up downgrading to 5.3 just in case. Peer
> status
> >> > > > and volume status are OK now.
> >> > > >
> >> > > > zypper install --oldpa

Re: [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0

2019-03-19 Thread Hans Henrik Happe

On 19/03/2019 14.10, Amar Tumballi Suryanarayan wrote:
> Hi Hans,
>
> Thanks for the honest feedback. Appreciate this.
>
> On Tue, Mar 19, 2019 at 5:39 PM Hans Henrik Happe  > wrote:
>
> Hi,
>
> Looking into something else I fell over this proposal. Being a
> shop that are going into "Leaving GlusterFS" mode, I thought I
> would give my two cents.
>
> While being partially an HPC shop with a few Lustre filesystems, 
> we chose GlusterFS for an archiving solution (2-3 PB), because we
> could find files in the underlying ZFS filesystems if GlusterFS
> went sour.
>
> We have used the access to the underlying files plenty, because of
> the continuous instability of GlusterFS'. Meanwhile, Lustre have
> been almost effortless to run and mainly for that reason we are
> planning to move away from GlusterFS.
>
> Reading this proposal kind of underlined that "Leaving GluserFS"
> is the right thing to do. While I never understood why GlusterFS
> has been in feature crazy mode instead of stabilizing mode, taking
> away crucial features I don't get. With RoCE, RDMA is getting
> mainstream. Quotas are very useful, even though the current
> implementation are not perfect. Tiering also makes so much sense,
> but, for large files, not on a per-file level.
>
>
> It is a right concern to raise, and removing the existing features is
> not a good thing most of the times. But, one thing we noticed over the
> years is, the features which we develop, and not take to completion
> cause the major heart-burn. People think it is present, and it is
> already few years since its introduced, but if the developers are not
> working on it, users would always feel that the product doesn't work,
> because that one feature didn't work. 
>
> Other than Quota in the proposal email, for all other features, even
> though we have *some* users, we are inclined towards deprecating them,
> considering projects overall goals of stability in the longer run.
>  
>
> To be honest we only use quotas. We got scared of trying out new
> performance features that potentially would open up a new back of
> issues.
>
> About Quota, we heard enough voices, so we will make sure we keep it.
> The original email was 'Proposal', and hence these opinions matter for
> decision.
>
> Sorry for being such a buzzkill. I really wanted it to be different.
>
> We hear you. Please let us know one thing, which were the versions you
> tried ?
>
We started at 3.6, four years ago. Now we are at 3.12.15, working towards
moving to 4.1.latest.
> We hope in coming months, our recent focus on Stability and Technical
> debt reduction will help you to re-look at Gluster after sometime.
That's great to hear.
>
> Cheers,
> Hans Henrik
>
> On 19/07/2018 08.56, Amar Tumballi wrote:
>> *
>>
>> Hi all,
>>
>> Over last 12 years of Gluster, we have developed many features,
>> and continue to support most of it till now. But along the way,
>> we have figured out better methods of doing things. Also we are
>> not actively maintaining some of these features.
>>
>> We are now thinking of cleaning up some of these ‘unsupported’
>> features, and mark them as ‘SunSet’ (i.e., would be totally taken
>> out of codebase in following releases) in next upcoming release,
>> v5.0. The release notes will provide options for smoothly
>> migrating to the supported configurations.
>>
>> If you are using any of these features, do let us know, so that
>> we can help you with ‘migration’.. Also, we are happy to guide
>> new developers to work on those components which are not actively
>> being maintained by current set of developers.
>>
>>
>>   List of features hitting sunset:
>>
>>
>> ‘cluster/stripe’ translator:
>>
>> This translator was developed very early in the evolution of
>> GlusterFS, and addressed one of the very common question of
>> Distributed FS, which is “What happens if one of my file is
>> bigger than the available brick. Say, I have 2 TB hard drive,
>> exported in glusterfs, my file is 3 TB”. While it solved the
>> purpose, it was very hard to handle failure scenarios, and give a
>> real good experience to our users with this feature. Over the
>> time, Gluster solved the problem with it’s ‘Shard’ feature, which
>> solves the problem in much better way, and provides much better
>> solution with existing well supported stack. Hence the proposal
>> for Deprecation.
>>
>> If you are using this feature, then do write to us, as it needs a
>> proper migration from existing volume to a new full supported
>> volume type before you upgrade.
>>
>>
>> ‘storage/bd’ translator:
>>
>> This feature got into the code base 5 years back with this patch
>> [1]. Plan was to use a block
>> device directly as a b

Re: [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0

2019-03-19 Thread Amar Tumballi Suryanarayan
Hi Jim,

On Tue, Mar 19, 2019 at 6:21 PM Jim Kinney  wrote:

>
> Issues with glusterfs fuse mounts cause issues with python file open for
> write. We have to use nfs to avoid this.
>
> Really want to see better back-end tools to facilitate cleaning up of
> glusterfs failures. If system is going to use hard linked ID, need a
> mapping of id to file to fix things. That option is now on for all exports.
> It should be the default If a host is down and users delete files by the
> thousands, gluster _never_ catches up. Finding path names for ids across
> even a 40TB mount, much less the 200+TB one, is a slow process. A network
> outage of 2 minutes and one system didn't get the call to recursively
> delete several dozen directories each with several thousand files.
>
>
Are you talking about some issues in the geo-replication module, or some other
application using the native mount? Happy to take the discussion forward on
these issues.

Are there any bugs open on this?

Thanks,
Amar


>
>
> nfs
> On March 19, 2019 8:09:01 AM EDT, Hans Henrik Happe  wrote:
>>
>> Hi,
>>
>> Looking into something else I fell over this proposal. Being a shop that
>> are going into "Leaving GlusterFS" mode, I thought I would give my two
>> cents.
>>
>> While being partially an HPC shop with a few Lustre filesystems,  we
>> chose GlusterFS for an archiving solution (2-3 PB), because we could find
>> files in the underlying ZFS filesystems if GlusterFS went sour.
>>
>> We have used the access to the underlying files plenty, because of the
>> continuous instability of GlusterFS'. Meanwhile, Lustre have been almost
>> effortless to run and mainly for that reason we are planning to move away
>> from GlusterFS.
>>
>> Reading this proposal kind of underlined that "Leaving GluserFS" is the
>> right thing to do. While I never understood why GlusterFS has been in
>> feature crazy mode instead of stabilizing mode, taking away crucial
>> features I don't get. With RoCE, RDMA is getting mainstream. Quotas are
>> very useful, even though the current implementation are not perfect.
>> Tiering also makes so much sense, but, for large files, not on a per-file
>> level.
>>
>> To be honest we only use quotas. We got scared of trying out new
>> performance features that potentially would open up a new back of issues.
>>
>> Sorry for being such a buzzkill. I really wanted it to be different.
>>
>> Cheers,
>> Hans Henrik
>> On 19/07/2018 08.56, Amar Tumballi wrote:
>>
>>
>> * Hi all, Over last 12 years of Gluster, we have developed many features,
>> and continue to support most of it till now. But along the way, we have
>> figured out better methods of doing things. Also we are not actively
>> maintaining some of these features. We are now thinking of cleaning up some
>> of these ‘unsupported’ features, and mark them as ‘SunSet’ (i.e., would be
>> totally taken out of codebase in following releases) in next upcoming
>> release, v5.0. The release notes will provide options for smoothly
>> migrating to the supported configurations. If you are using any of these
>> features, do let us know, so that we can help you with ‘migration’.. Also,
>> we are happy to guide new developers to work on those components which are
>> not actively being maintained by current set of developers. List of
>> features hitting sunset: ‘cluster/stripe’ translator: This translator was
>> developed very early in the evolution of GlusterFS, and addressed one of
>> the very common question of Distributed FS, which is “What happens if one
>> of my file is bigger than the available brick. Say, I have 2 TB hard drive,
>> exported in glusterfs, my file is 3 TB”. While it solved the purpose, it
>> was very hard to handle failure scenarios, and give a real good experience
>> to our users with this feature. Over the time, Gluster solved the problem
>> with it’s ‘Shard’ feature, which solves the problem in much better way, and
>> provides much better solution with existing well supported stack. Hence the
>> proposal for Deprecation. If you are using this feature, then do write to
>> us, as it needs a proper migration from existing volume to a new full
>> supported volume type before you upgrade. ‘storage/bd’ translator: This
>> feature got into the code base 5 years back with this patch
>> [1]. Plan was to use a block device
>> directly as a brick, which would help to handle disk-image storage much
>> easily in glusterfs. As the feature is not getting more contribution, and
>> we are not seeing any user traction on this, would like to propose for
>> Deprecation. If you are using the feature, plan to move to a supported
>> gluster volume configuration, and have your setup ‘supported’ before
>> upgrading to your new gluster version. ‘RDMA’ transport support: Gluster
>> started supporting RDMA while ib-verbs was still new, and very high-end
>> infra around that time were using Infiniband. Engineers did work with
>> Mellanox, and got the technology into GlusterFS for bett

Re: [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0

2019-03-19 Thread Amar Tumballi Suryanarayan
Hi Hans,

Thanks for the honest feedback. Appreciate this.

On Tue, Mar 19, 2019 at 5:39 PM Hans Henrik Happe  wrote:

> Hi,
>
> Looking into something else I fell over this proposal. Being a shop that
> are going into "Leaving GlusterFS" mode, I thought I would give my two
> cents.
>
> While being partially an HPC shop with a few Lustre filesystems,  we chose
> GlusterFS for an archiving solution (2-3 PB), because we could find files
> in the underlying ZFS filesystems if GlusterFS went sour.
>
> We have used the access to the underlying files plenty, because of the
> continuous instability of GlusterFS'. Meanwhile, Lustre have been almost
> effortless to run and mainly for that reason we are planning to move away
> from GlusterFS.
>
> Reading this proposal kind of underlined that "Leaving GluserFS" is the
> right thing to do. While I never understood why GlusterFS has been in
> feature crazy mode instead of stabilizing mode, taking away crucial
> features I don't get. With RoCE, RDMA is getting mainstream. Quotas are
> very useful, even though the current implementation are not perfect.
> Tiering also makes so much sense, but, for large files, not on a per-file
> level.
>
>
It is a fair concern to raise, and removing existing features is not a good
thing most of the time. But one thing we have noticed over the years is that
the features which we develop and do not take to completion cause the most
heartburn. People think a feature is present, and it has already been a few
years since it was introduced, but if the developers are not working on it,
users will always feel that the product doesn't work, because that one feature
didn't work.

Other than Quota in the proposal email, for all other features, even though we
have *some* users, we are inclined towards deprecating them, considering the
project's overall goal of stability in the longer run.


> To be honest we only use quotas. We got scared of trying out new
> performance features that potentially would open up a new back of issues.
>
About Quota, we heard enough voices, so we will make sure we keep it. The
original email was a 'Proposal', and hence these opinions matter for the decision.

> Sorry for being such a buzzkill. I really wanted it to be different.
>
We hear you. Please let us know one thing: which were the versions you
tried?

We hope that in the coming months our recent focus on stability and technical
debt reduction will help you take another look at Gluster.


> Cheers,
> Hans Henrik
> On 19/07/2018 08.56, Amar Tumballi wrote:
>
>
> * Hi all, Over last 12 years of Gluster, we have developed many features,
> and continue to support most of it till now. But along the way, we have
> figured out better methods of doing things. Also we are not actively
> maintaining some of these features. We are now thinking of cleaning up some
> of these ‘unsupported’ features, and mark them as ‘SunSet’ (i.e., would be
> totally taken out of codebase in following releases) in next upcoming
> release, v5.0. The release notes will provide options for smoothly
> migrating to the supported configurations. If you are using any of these
> features, do let us know, so that we can help you with ‘migration’.. Also,
> we are happy to guide new developers to work on those components which are
> not actively being maintained by current set of developers. List of
> features hitting sunset: ‘cluster/stripe’ translator: This translator was
> developed very early in the evolution of GlusterFS, and addressed one of
> the very common question of Distributed FS, which is “What happens if one
> of my file is bigger than the available brick. Say, I have 2 TB hard drive,
> exported in glusterfs, my file is 3 TB”. While it solved the purpose, it
> was very hard to handle failure scenarios, and give a real good experience
> to our users with this feature. Over the time, Gluster solved the problem
> with it’s ‘Shard’ feature, which solves the problem in much better way, and
> provides much better solution with existing well supported stack. Hence the
> proposal for Deprecation. If you are using this feature, then do write to
> us, as it needs a proper migration from existing volume to a new full
> supported volume type before you upgrade. ‘storage/bd’ translator: This
> feature got into the code base 5 years back with this patch
> [1]. Plan was to use a block device
> directly as a brick, which would help to handle disk-image storage much
> easily in glusterfs. As the feature is not getting more contribution, and
> we are not seeing any user traction on this, would like to propose for
> Deprecation. If you are using the feature, plan to move to a supported
> gluster volume configuration, and have your setup ‘supported’ before
> upgrading to your new gluster version. ‘RDMA’ transport support: Gluster
> started supporting RDMA while ib-verbs was still new, and very high-end
> infra around that time were using Infiniband. Engineers did work with
> Mellanox, and got the technol

Re: [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0

2019-03-19 Thread Jim Kinney
0">For my uses, the RDMA transport is essential. Much of my storage is used for 
HPC systems and IB is the network layer. We still use v3.12.
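
For readers in a similar position, a minimal sketch of how to confirm which
transport a volume was created with before planning any move off RDMA; the
volume name is a placeholder:

# Transport-type will show tcp, rdma, or tcp,rdma
gluster volume info <VOLNAME> | grep -i transport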

Issues with glusterfs fuse mounts cause issues with python file open for write. 
We have to use nfs to avoid this. 

Really want to see better back-end tools to facilitate cleaning up of glusterfs
failures. If the system is going to use hard-linked IDs, we need a mapping of ID
to file to fix things. That option is now on for all exports; it should be the
default. If a host is down and users delete files by the thousands, gluster
_never_ catches up. Finding path names for IDs across even a 40TB mount, much
less the 200+TB one, is a slow process. A network outage of 2 minutes meant one
system didn't get the call to recursively delete several dozen directories, each
with several thousand files.



On March 19, 2019 8:09:01 AM EDT, Hans Henrik Happe  wrote:
>Hi,
>
>Looking into something else I fell over this proposal. Being a shop
>that
>are going into "Leaving GlusterFS" mode, I thought I would give my two
>cents.
>
>While being partially an HPC shop with a few Lustre filesystems,  we
>chose GlusterFS for an archiving solution (2-3 PB), because we could
>find files in the underlying ZFS filesystems if GlusterFS went sour.
>
>We have used the access to the underlying files plenty, because of the
>continuous instability of GlusterFS'. Meanwhile, Lustre have been
>almost
>effortless to run and mainly for that reason we are planning to move
>away from GlusterFS.
>
>Reading this proposal kind of underlined that "Leaving GluserFS" is the
>right thing to do. While I never understood why GlusterFS has been in
>feature crazy mode instead of stabilizing mode, taking away crucial
>features I don't get. With RoCE, RDMA is getting mainstream. Quotas are
>very useful, even though the current implementation are not perfect.
>Tiering also makes so much sense, but, for large files, not on a
>per-file level.
>
>To be honest we only use quotas. We got scared of trying out new
>performance features that potentially would open up a new back of
>issues.
>
>Sorry for being such a buzzkill. I really wanted it to be different.
>
>Cheers,
>Hans Henrik
>
>On 19/07/2018 08.56, Amar Tumballi wrote:
>> *
>>
>> Hi all,
>>
>> Over last 12 years of Gluster, we have developed many features, and
>> continue to support most of it till now. But along the way, we have
>> figured out better methods of doing things. Also we are not actively
>> maintaining some of these features.
>>
>> We are now thinking of cleaning up some of these ‘unsupported’
>> features, and mark them as ‘SunSet’ (i.e., would be totally taken out
>> of codebase in following releases) in next upcoming release, v5.0.
>The
>> release notes will provide options for smoothly migrating to the
>> supported configurations.
>>
>> If you are using any of these features, do let us know, so that we
>can
>> help you with ‘migration’.. Also, we are happy to guide new
>developers
>> to work on those components which are not actively being maintained
>by
>> current set of developers.
>>
>>
>>   List of features hitting sunset:
>>
>>
>> ‘cluster/stripe’ translator:
>>
>> This translator was developed very early in the evolution of
>> GlusterFS, and addressed one of the very common question of
>> Distributed FS, which is “What happens if one of my file is bigger
>> than the available brick. Say, I have 2 TB hard drive, exported in
>> glusterfs, my file is 3 TB”. While it solved the purpose, it was very
>> hard to handle failure scenarios, and give a real good experience to
>> our users with this feature. Over the time, Gluster solved the
>problem
>> with it’s ‘Shard’ feature, which solves the problem in much better
>> way, and provides much better solution with existing well supported
>> stack. Hence the proposal for Deprecation.
>>
>> If you are using this feature, then do write to us, as it needs a
>> proper migration from existing volume to a new full supported volume
>> type before you upgrade.
>>
>>
>> ‘storage/bd’ translator:
>>
>> This feature got into the code base 5 years back with this patch
>> [1]. Plan was to use a block device
>> directly as a brick, which would help to handle disk-image storage
>> much easily in glusterfs.
>>
>> As the feature is not getting more contribution, and we are not
>seeing
>> any user traction on this, would like to propose for Deprecation.
>>
>> If you are using the feature, plan to move to a supported gluster
>> volume configuration, and have your setup ‘supported’ before
>upgrading
>> to your new gluster version.
>>
>>
>> ‘RDMA’ transport support:
>>
>> Gluster started supporting RDMA while ib-verbs was still new, and
>very
>> high-end infra around that time were using Infiniband. Engineers did
>> work with Mellanox, and got the technology into GlusterFS for better
>> data migration, data copy. While current day kernels support very
>good
>> speed with 

Re: [Gluster-users] Proposal to mark few features as Deprecated / SunSet from Version 5.0

2019-03-19 Thread Hans Henrik Happe
Hi,

Looking into something else, I fell over this proposal. Being a shop that is
going into "Leaving GlusterFS" mode, I thought I would give my two cents.

While being partially an HPC shop with a few Lustre filesystems, we chose
GlusterFS for an archiving solution (2-3 PB), because we could find files in
the underlying ZFS filesystems if GlusterFS went sour.

We have used the access to the underlying files plenty, because of the
continuous instability of GlusterFS. Meanwhile, Lustre has been almost
effortless to run, and mainly for that reason we are planning to move away
from GlusterFS.

Reading this proposal kind of underlined that "Leaving GlusterFS" is the right
thing to do. While I never understood why GlusterFS has been in feature-crazy
mode instead of stabilizing mode, taking away crucial features is something I
don't get. With RoCE, RDMA is getting mainstream. Quotas are very useful, even
though the current implementation is not perfect. Tiering also makes so much
sense, but, for large files, not on a per-file level.

To be honest, we only use quotas. We got scared of trying out new performance
features that potentially would open up a new bag of issues.

Sorry for being such a buzzkill. I really wanted it to be different.

Cheers,
Hans Henrik

On 19/07/2018 08.56, Amar Tumballi wrote:
> *
>
> Hi all,
>
> Over last 12 years of Gluster, we have developed many features, and
> continue to support most of it till now. But along the way, we have
> figured out better methods of doing things. Also we are not actively
> maintaining some of these features.
>
> We are now thinking of cleaning up some of these ‘unsupported’
> features, and mark them as ‘SunSet’ (i.e., would be totally taken out
> of codebase in following releases) in next upcoming release, v5.0. The
> release notes will provide options for smoothly migrating to the
> supported configurations.
>
> If you are using any of these features, do let us know, so that we can
> help you with ‘migration’.. Also, we are happy to guide new developers
> to work on those components which are not actively being maintained by
> current set of developers.
>
>
>   List of features hitting sunset:
>
>
> ‘cluster/stripe’ translator:
>
> This translator was developed very early in the evolution of
> GlusterFS, and addressed one of the very common question of
> Distributed FS, which is “What happens if one of my file is bigger
> than the available brick. Say, I have 2 TB hard drive, exported in
> glusterfs, my file is 3 TB”. While it solved the purpose, it was very
> hard to handle failure scenarios, and give a real good experience to
> our users with this feature. Over the time, Gluster solved the problem
> with it’s ‘Shard’ feature, which solves the problem in much better
> way, and provides much better solution with existing well supported
> stack. Hence the proposal for Deprecation.
>
> If you are using this feature, then do write to us, as it needs a
> proper migration from existing volume to a new full supported volume
> type before you upgrade.
>
>
> ‘storage/bd’ translator:
>
> This feature got into the code base 5 years back with this patch
> [1]. Plan was to use a block device
> directly as a brick, which would help to handle disk-image storage
> much easily in glusterfs.
>
> As the feature is not getting more contribution, and we are not seeing
> any user traction on this, would like to propose for Deprecation.
>
> If you are using the feature, plan to move to a supported gluster
> volume configuration, and have your setup ‘supported’ before upgrading
> to your new gluster version.
>
>
> ‘RDMA’ transport support:
>
> Gluster started supporting RDMA while ib-verbs was still new, and very
> high-end infra around that time were using Infiniband. Engineers did
> work with Mellanox, and got the technology into GlusterFS for better
> data migration, data copy. While current day kernels support very good
> speed with IPoIB module itself, and there are no more bandwidth for
> experts in these area to maintain the feature, we recommend migrating
> over to TCP (IP based) network for your volume.
>
> If you are successfully using RDMA transport, do get in touch with us
> to prioritize the migration plan for your volume. Plan is to work on
> this after the release, so by version 6.0, we will have a cleaner
> transport code, which just needs to support one type.
>
>
> ‘Tiering’ feature
>
> Gluster’s tiering feature which was planned to be providing an option
> to keep your ‘hot’ data in different location than your cold data, so
> one can get better performance. While we saw some users for the
> feature, it needs much more attention to be completely bug free. At
> the time, we are not having any active maintainers for the feature,
> and hence suggesting to take it out of the ‘supported’ tag.
>
> If you are willing to take it up, and maintain it, do let us know, and
> we are happy to assist you.
>
> If you are