Re: [Gluster-users] hook script question related to ctdb, shared storage, and bind mounts

2019-11-09 Thread Erik Jacobson
> Here is what was the setup :

I thought I'd share an update in case it helps others. Your ideas
inspired me to try a different approach.

We support 4 main distros (and 2 variants of some). We try not to
provide our own versions of distro-supported packages like CTDB where
possible. So a concern for me in modifying services is that they could
be replaced in package updates. There are ways to mitigate that, but
that thought combined with your ideas led me to try this:

- Be sure the ctdb service is disabled
- Added a systemd service of my own, oneshot, that runs a helper script
  (a rough sketch of the helper is below)
- The helper script first ensures the gluster volumes show up
  (I use localhost in my case and besides, in our environment, we don't
  want CTDB to have a public IP anyway until NFS can be served, so this
  helps there too)
- Even with the gluster volume showing good, the first attempts to mount
  gluster volumes during init startup fail. So the helper script keeps
  looping until they work. It seems they work on the 2nd try (after a 3s
  sleep at failure).
- Once the mounts are confirmed working and mounted, my helper then
  starts the ctdb service.
- Awkward CTDB problems (where the lock check sometimes fails to detect
  a lock problem) are avoided, since we won't start CTDB until we're 100%
  sure the gluster lock is mounted and pointing at gluster.
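
Roughly, the helper does something like this (an illustrative sketch
only; the volume name, mount points, and timings are placeholders, not
the exact script we ship):

#!/bin/bash
# Illustrative helper sketch -- names and paths are examples.

# 1. Wait until glusterd can report the lock volume at all.
until gluster --mode=script volume info ctdb >/dev/null 2>&1; do
    sleep 3
done

# 2. Keep retrying the fstab mounts; the first attempt during init
#    often fails even when 'volume info' looks fine.
for mnt in /gluster/lock /gluster/shared; do
    until mountpoint -q "$mnt"; do
        mount "$mnt" || sleep 3
    done
done

# 3. Only start CTDB once the lock area really is a gluster mount and
#    not a plain local directory.
if grep -qs ' /gluster/lock fuse.glusterfs ' /proc/mounts; then
    systemctl start ctdb
else
    echo "gluster lock is not a gluster mount; not starting ctdb" >&2
    exit 1
fi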

The above is working in prototype form so I'm going to start adding
my bind mounts to the equation.

I think I have a solution that will work now and I thank you so much for
the ideas.

I'm now taking things from prototype form to something we can provide
to people.


With regard to pacemaker: there are a few pacemaker solutions that I've
touched, and one I even helped implement. Now, it could be that I'm not
an expert at writing rules, but pacemaker seems to have often given us
more trouble than the problems it solves. I believe this is due to the
complexity and power of the software. I am not knocking
pacemaker. However, a person really has to be a pacemaker expert
to avoid a mistake that could cause downtime. So I have attempted
to avoid pacemaker in the new solution. I know there are down sides --
fencing is there for a reason -- but as far as I can tell the decision
has been right for us. CTDB is less complicated, even if it does not
provide 100% true, full HA capabilities. That said, in the solution, I've been
careful to future-proof a move to pacemaker. For example, on the gluster
servers/NFS servers, I bring up IP aliases (interfaces) on the network where the
BMCs reside, so we can seamlessly switch to pacemaker with
IPMI/BMC/Redfish fencing later if needed, without causing too much pain in
the field with deployed servers.
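
For illustration only (the address and interface name here are made up,
not our real layout), bringing up such an alias is just something like:

ip addr add 10.148.0.21/16 dev eth1 label eth1:nfs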

I do realize there are tools to help configure pacemaker for you. Some
that I've tried have given me mixed results, perhaps due to the
complexity of the networking setup in the solutions we have.

As we start to deploy this to more locations, I'll get a feel for whether
a move to pacemaker is right or not. I just share this in the interest
of learning. I'm always willing to learn and improve if I've overlooked
something.

Erik


Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/118564314

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/118564314

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] hook script question related to ctdb, shared storage, and bind mounts

2019-11-05 Thread Erik Jacobson
On Tue, Nov 05, 2019 at 05:05:08AM +0200, Strahil wrote:
> Sure,
> 
> Here is what was the setup :

Thank you! You're very kind to send me this. I will verify it with my
setup soon. Hoping to rid myself of these dependency problems. Thank you!!!

Erik




Re: [Gluster-users] hook script question related to ctdb, shared storage, and bind mounts

2019-11-04 Thread Strahil
Sure,

Here is what was the setup :


[root@ovirt1 ~]# systemctl cat var-run-gluster-shared_storage.mount --no-pager
# /run/systemd/generator/var-run-gluster-shared_storage.mount
# Automatically generated by systemd-fstab-generator

[Unit]
SourcePath=/etc/fstab
Documentation=man:fstab(5) man:systemd-fstab-generator(8)

[Mount]
What=gluster1:/gluster_shared_storage
Where=/var/run/gluster/shared_storage
Type=glusterfs
Options=defaults,x-systemd.requires=glusterd.service,x-systemd.automount

[root@ovirt1 ~]# systemctl cat var-run-gluster-shared_storage.automount --no-pager
# /run/systemd/generator/var-run-gluster-shared_storage.automount
# Automatically generated by systemd-fstab-generator

[Unit]
SourcePath=/etc/fstab
Documentation=man:fstab(5) man:systemd-fstab-generator(8)
Before=remote-fs.target
After=glusterd.service
Requires=glusterd.service
[Automount]
Where=/var/run/gluster/shared_storage


[root@ovirt1 ~]# systemctl cat glusterd --no-pager
# /etc/systemd/system/glusterd.service
[Unit]
Description=GlusterFS, a clustered file-system server
Requires=rpcbind.service gluster_bricks-engine.mount gluster_bricks-data.mount gluster_bricks-isos.mount
After=network.target rpcbind.service gluster_bricks-engine.mount gluster_bricks-data.mount gluster_bricks-isos.mount
Before=network-online.target

[Service]
Type=forking
PIDFile=/var/run/glusterd.pid
LimitNOFILE=65536
Environment="LOG_LEVEL=INFO"
EnvironmentFile=-/etc/sysconfig/glusterd
ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS
KillMode=process
SuccessExitStatus=15

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/glusterd.service.d/99-cpu.conf
[Service]
CPUAccounting=yes
Slice=glusterfs.slice

[root@ovirt1 ~]# systemctl cat ctdb  --no-pager
# /etc/systemd/system/ctdb.service
[Unit]
Description=CTDB
Documentation=man:ctdbd(1) man:ctdb(7)
After=network-online.target time-sync.target glusterd.service var-run-gluster-shared_storage.automount
Conflicts=var-lib-nfs-rpc_pipefs.mount

[Service]
Environment=SYSTEMD_LOG_LEVEL=debug
Type=forking
LimitCORE=infinity
PIDFile=/run/ctdb/ctdbd.pid
ExecStartPre=/bin/bash -c "sleep 2; if [ -f /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us ]; then echo 1 > /sys/fs/cgroup/cpu/system.slice/cpu.rt_runtime_us; fi"
ExecStartPre=/bin/bash -c 'if [[ $(find /var/log/log.ctdb -type f -size +20971520c 2>/dev/null) ]]; then truncate -s 0 /var/log/log.ctdb;fi'
ExecStartPre=/bin/bash -c 'if [ -d "/var/run/gluster/shared_storage/lock" ] ;then exit 4; fi'
ExecStart=/usr/sbin/ctdbd_wrapper /run/ctdb/ctdbd.pid start
ExecStop=/usr/sbin/ctdbd_wrapper /run/ctdb/ctdbd.pid stop
KillMode=control-group
Restart=no

[Install]
WantedBy=multi-user.target


[root@ovirt1 ~]# systemctl cat nfs-ganesha --no-pager
# /usr/lib/systemd/system/nfs-ganesha.service
# This file is part of nfs-ganesha.
#
# There can only be one NFS-server active on a system. When NFS-Ganesha is
# started, the kernel NFS-server should have been stopped. This is achieved by
# the 'Conflicts' directive in this unit.
#
# The Network Locking Manager (rpc.statd) is provided by the nfs-utils package.
# NFS-Ganesha comes with its own nfs-ganesha-lock.service to resolve potential
# conflicts in starting multiple rpc.statd processes. See the comments in the
# nfs-ganesha-lock.service for more details.
#

[Unit]
Description=NFS-Ganesha file server
Documentation=http://github.com/nfs-ganesha/nfs-ganesha/wiki
After=rpcbind.service nfs-ganesha-lock.service
Wants=rpcbind.service nfs-ganesha-lock.service
Conflicts=nfs.target

After=nfs-ganesha-config.service
Wants=nfs-ganesha-config.service

[Service]
Type=forking
Environment="NOFILE=1048576"
EnvironmentFile=-/run/sysconfig/ganesha
ExecStart=/bin/bash -c "${NUMACTL} ${NUMAOPTS} /usr/bin/ganesha.nfsd ${OPTIONS} ${EPOCH}"
ExecStartPost=-/bin/bash -c "prlimit --pid $MAINPID --nofile=$NOFILE:$NOFILE"
ExecStartPost=-/bin/bash -c "/usr/bin/sleep 2 && /bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.init_fds_limit"
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/bin/dbus-send --system --dest=org.ganesha.nfsd --type=method_call /org/ganesha/nfsd/admin org.ganesha.nfsd.admin.shutdown

[Install]
WantedBy=multi-user.target
Also=nfs-ganesha-lock.service


I can't guarantee that it will work 100% in your setup, but I remember I had
only a few hiccups after a full powerdown+powerup of all nodes.

P.S.: I still prefer corosync/pacemaker, but in my setup I cannot have fencing,
and in a hyperconverged setup it gets even more complex. If your cluster is
gluster only, consider pacemaker for that task.

Best Regards,
Strahil Nikolov

On Nov 4, 2019 15:57, Erik Jacobson wrote:
>
> Thank you! I am very interested. I hadn't considered the automounter 
> idea. 
>
> Also, your fstab has a different dependency approach than mine otherwise 
> as well. 
>
> If you happen to have the examples handy, I'll give them a shot here. 
>
> I'm looking 

Re: [Gluster-users] hook script question related to ctdb, shared storage, and bind mounts

2019-11-04 Thread Erik Jacobson
Thank you! I am very interested. I hadn't considered the automounter
idea.

Also, your fstab takes a different dependency approach than mine did.

If you happen to have the examples handy, I'll give them a shot here.

I'm looking forward to emerging from this dark place of dependencies not
working!!

Thank you so much for writing back,

Erik

On Mon, Nov 04, 2019 at 06:59:10AM +0200, Strahil wrote:
> Hi Erik,
> 
> I took another approach.
> 
> 1.  I got a systemd mount unit for my ctdb lock volume's brick:
> [root@ovirt1 system]# grep var /etc/fstab
> gluster1:/gluster_shared_storage /var/run/gluster/shared_storage/ glusterfs defaults,x-systemd.requires=glusterd.service,x-systemd.automount 0 0
> 
> As you can see - it is an automounter, because sometimes it fails to mount on 
> time
> 
> 2.  I got custom systemd services for glusterd,ctdb and vdo -  as I need to 
> 'put' dependencies for each of those.
> 
> Now, I'm no longer using ctdb & NFS Ganesha (as my version of ctdb cannot use
> hostnames and my environment is a little bit crazy), but I can still provide
> hints on how I did it.
> 
> Best Regards,
> Strahil Nikolov
> 
> On Nov 3, 2019 22:46, Erik Jacobson wrote:
> >
> > So, I have a solution I have written about in the past that is based on 
> > gluster with CTDB for IP and a level of redundancy. 
> >
> > It's been working fine except for a few quirks I need to work out on 
> > giant clusters when I get access. 
> >
> > I have 3x9 gluster volume, each are also NFS servers, using gluster 
> > NFS (ganesha isn't reliable for my workload yet). There are 9 IP 
> > aliases spread across 9 servers. 
> >
> > I also have many bind mounts that point to the shared storage as a 
> > source, and the /gluster/lock volume ("ctdb") of course. 
> >
> > glusterfs 4.1.6 (rhel8 today, but I use rhel7, rhel8, sles12, and 
> > sles15) 
> >
> > Things work well when everything is up and running. IP failover works 
> > well when one of the servers goes down. My issue is when that server 
> > comes back up. Despite my best efforts with systemd fstab dependencies, 
> > the shared storage areas including the gluster lock for CTDB do not 
> > always get mounted before CTDB starts. This causes trouble for CTDB 
> > correctly joining the collective. I also have problems where my 
> > bind mounts can happen before the shared storage is mounted, despite my 
> > attempts at preventing this with dependencies in fstab. 
> >
> > I decided a better approach would be to use a gluster hook and just 
> > mount everything I need as I need it, and start up ctdb when I know and 
> > verify that /gluster/lock is really gluster and not a local disk. 
> >
> > I started down a road of doing this with a start host hook and after 
> > spending a while at it, I realized my logic error. This will only fire 
> > when the volume is *started*, not when a server that was down re-joins. 
> >
> > I took a look at the code, glusterd-hooks.c, and found that support 
> > for "brick start" is not in place for a hook script but it's nearly 
> > there: 
> >
> >     [GD_OP_START_BRICK] = EMPTY, 
> > ... 
> >
> > and no entry in glusterd_hooks_add_op_args() yet. 
> >
> >
> > Before I make a patch for my own use, I wanted to do a sanity check and 
> > find out if others have solved this better than the road I'm heading 
> > down. 
> >
> > What I was thinking of doing is enabling a brick start hook, and 
> > do my processing for volumes being mounted from there. However, I 
> > suppose brick start is a bad choice for the case of simply stopping and 
> > starting the volume, because my processing would try to complete before 
> > the gluster volume was fully started. It would probably work for a brick 
> > "coming back and joining" but not "stop volume/start volume". 
> >
> > Any suggestions? 
> >
> > My end goal is: 
> > - mount shared storage every boot 
> > - only attempt to mount when gluster is available (_netdev doesn't seem 
> >    to be enough) 
> > - never start ctdb unless /gluster/lock is a shared storage and not a 
> >    directory. 
> > - only do my bind mounts from shared storage in to the rest of the 
> >    layout when we are sure the shared storage is mounted (don't 
> >    bind-mount using an empty directory as a source by accident!) 
> >
> > Thanks so much for reading my question, 
> >
> > Erik 
> >  
> >
> > Community Meeting Calendar: 
> >
> > APAC Schedule - 
> > Every 2nd and 4th Tuesday at 11:30 AM IST 
> > Bridge: https://bluejeans.com/118564314  
> >
> > NA/EMEA Schedule - 
> > Every 1st and 3rd Tuesday at 01:00 PM EDT 
> > Bridge: https://bluejeans.com/118564314  
> >
> > Gluster-users mailing list 
> > Gluster-users@gluster.org 
> > https://lists.gluster.org/mailman/listinfo/gluster-users  


Erik Jacobson
Software Engineer

erik.jacob...@hpe.com
+1 612 851 0550 Office

Eagan, MN
hpe.com



Re: [Gluster-users] hook script question related to ctdb, shared storage, and bind mounts

2019-11-03 Thread Strahil
Hi Erik,

I took another approach.

1.  I got a systemd mount unit for my ctdb lock volume's brick:
[root@ovirt1 system]# grep var /etc/fstab
gluster1:/gluster_shared_storage /var/run/gluster/shared_storage/ glusterfs defaults,x-systemd.requires=glusterd.service,x-systemd.automount 0 0

As you can see - it is an automounter, because sometimes it fails to mount on 
time

2.  I got custom systemd services for glusterd,ctdb and vdo -  as I need to 
'put' dependencies for each of those.
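
(Side note, and only a generic sketch rather than what I actually used:
the same kind of dependencies can also be attached with a systemd
drop-in instead of shipping a full replacement unit, for example:

mkdir -p /etc/systemd/system/ctdb.service.d
cat > /etc/systemd/system/ctdb.service.d/10-gluster-deps.conf <<'EOF'
[Unit]
Requires=var-run-gluster-shared_storage.automount
After=glusterd.service var-run-gluster-shared_storage.automount
EOF
systemctl daemon-reload

My full replacement units are shown elsewhere in this thread.)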

Now, I'm no longer using ctdb & NFS Ganesha (as my version of ctdb cannot use
hostnames and my environment is a little bit crazy), but I can still provide
hints on how I did it.

Best Regards,
Strahil Nikolov

On Nov 3, 2019 22:46, Erik Jacobson wrote:
>
> So, I have a solution I have written about in the past that is based on 
> gluster with CTDB for IP and a level of redundancy. 
>
> It's been working fine except for a few quirks I need to work out on 
> giant clusters when I get access. 
>
> I have 3x9 gluster volume, each are also NFS servers, using gluster 
> NFS (ganesha isn't reliable for my workload yet). There are 9 IP 
> aliases spread across 9 servers. 
>
> I also have many bind mounts that point to the shared storage as a 
> source, and the /gluster/lock volume ("ctdb") of course. 
>
> glusterfs 4.1.6 (rhel8 today, but I use rhel7, rhel8, sles12, and 
> sles15) 
>
> Things work well when everything is up and running. IP failover works 
> well when one of the servers goes down. My issue is when that server 
> comes back up. Despite my best efforts with systemd fstab dependencies, 
> the shared storage areas including the gluster lock for CTDB do not 
> always get mounted before CTDB starts. This causes trouble for CTDB 
> correctly joining the collective. I also have problems where my 
> bind mounts can happen before the shared storage is mounted, despite my 
> attempts at preventing this with dependencies in fstab. 
>
> I decided a better approach would be to use a gluster hook and just 
> mount everything I need as I need it, and start up ctdb when I know and 
> verify that /gluster/lock is really gluster and not a local disk. 
>
> I started down a road of doing this with a start host hook and after 
> spending a while at it, I realized my logic error. This will only fire 
> when the volume is *started*, not when a server that was down re-joins. 
>
> I took a look at the code, glusterd-hooks.c, and found that support 
> for "brick start" is not in place for a hook script but it's nearly 
> there: 
>
>     [GD_OP_START_BRICK] = EMPTY, 
> ... 
>
> and no entry in glusterd_hooks_add_op_args() yet. 
>
>
> Before I make a patch for my own use, I wanted to do a sanity check and 
> find out if others have solved this better than the road I'm heading 
> down. 
>
> What I was thinking of doing is enabling a brick start hook, and 
> do my processing for volumes being mounted from there. However, I 
> suppose brick start is a bad choice for the case of simply stopping and 
> starting the volume, because my processing would try to complete before 
> the gluster volume was fully started. It would probably work for a brick 
> "coming back and joining" but not "stop volume/start volume". 
>
> Any suggestions? 
>
> My end goal is: 
> - mount shared storage every boot 
> - only attempt to mount when gluster is available (_netdev doesn't seem 
>    to be enough) 
> - never start ctdb unless /gluster/lock is a shared storage and not a 
>    directory. 
> - only do my bind mounts from shared storage in to the rest of the 
>    layout when we are sure the shared storage is mounted (don't 
>    bind-mount using an empty directory as a source by accident!) 
>
> Thanks so much for reading my question, 
>
> Erik 
>  
>
> Community Meeting Calendar: 
>
> APAC Schedule - 
> Every 2nd and 4th Tuesday at 11:30 AM IST 
> Bridge: https://bluejeans.com/118564314 
>
> NA/EMEA Schedule - 
> Every 1st and 3rd Tuesday at 01:00 PM EDT 
> Bridge: https://bluejeans.com/118564314 
>
> Gluster-users mailing list 
> Gluster-users@gluster.org 
> https://lists.gluster.org/mailman/listinfo/gluster-users 




[Gluster-users] hook script question related to ctdb, shared storage, and bind mounts

2019-11-03 Thread Erik Jacobson
So, I have a solution I have written about in the past that is based on
gluster with CTDB for IP and a level of redundancy.

It's been working fine except for a few quirks I need to work out on
giant clusters when I get access.

I have a 3x9 gluster volume; the servers are also NFS servers, using gluster
NFS (ganesha isn't reliable for my workload yet). There are 9 IP
aliases spread across the 9 servers.

I also have many bind mounts that point to the shared storage as a
source, and the /gluster/lock volume ("ctdb") of course.

glusterfs 4.1.6 (rhel8 today, but I use rhel7, rhel8, sles12, and
sles15)

Things work well when everything is up and running. IP failover works
well when one of the servers goes down. My issue is when that server
comes back up. Despite my best efforts with systemd fstab dependencies,
the shared storage areas including the gluster lock for CTDB do not
always get mounted before CTDB starts. This causes trouble for CTDB
correctly joining the collective. I also have problems where my
bind mounts can happen before the shared storage is mounted, despite my
attempts at preventing this with dependencies in fstab.

I decided a better approach would be to use a gluster hook and just
mount everything I need as I need it, and start up ctdb when I know and
verify that /gluster/lock is really gluster and not a local disk.

I started down a road of doing this with a start host hook and after
spending a while at it, I realized my logic error. This will only fire
when the volume is *started*, not when a server that was down re-joins.

I took a look at the code, glusterd-hooks.c, and found that support
for "brick start" is not in place for a hook script but it's nearly
there:

[GD_OP_START_BRICK] = EMPTY,
...

and no entry in glusterd_hooks_add_op_args() yet.


Before I make a patch for my own use, I wanted to do a sanity check and
find out if others have solved this better than the road I'm heading
down.

What I was thinking of doing is enabling a brick start hook, and
doing my processing for volumes being mounted from there. However, I
suppose brick start is a bad choice for the case of simply stopping and
starting the volume, because my processing would try to complete before
the gluster volume was fully started. It would probably work for a brick
"coming back and joining" but not "stop volume/start volume".

Any suggestions?

My end goal is:
 - mount shared storage every boot
 - only attempt to mount when gluster is available (_netdev doesn't seem
   to be enough)
 - never start ctdb unless /gluster/lock is shared storage and not a
   plain local directory
 - only do my bind mounts from shared storage into the rest of the
   layout when we are sure the shared storage is mounted (don't
   bind-mount using an empty directory as a source by accident! See the
   sketch just below.)
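
What I'm picturing for that last one is a guard roughly like this
(illustrative only; the paths are hypothetical):

# Only bind-mount once the source really is gluster-backed, never an
# empty local directory left behind on the root disk.
if grep -qs ' /gluster/shared fuse.glusterfs ' /proc/mounts; then
    mount --bind /gluster/shared/images /srv/images
else
    echo "shared storage not mounted; skipping bind mounts" >&2
    exit 1
fi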

Thanks so much for reading my question,

Erik


Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/118564314

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/118564314

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users