Fix Policy Based Routing for private gateway static routes

2022-01-08 Thread Vivek Kumar
Hello Folks,

The issue below was supposed to be fixed after 4.13, but it looks like it's
still present in the current release. I have run into it in 4.15.2:

https://github.com/apache/cloudstack/pull/3604 




Vivek Kumar
Sr. Manager - Cloud & DevOps 
IndiQus Technologies
24*7  O +91 11 4055 1411  |   M +91 7503460090 
www.indiqus.com





Re: 4.16.0: Unpredictable failure to successfully complete deploy-kube-system on HA k8s clusters

2022-01-08 Thread William (B.J.) Lawson, MD
Hi Pearl, thanks for your help.

You are correct -- in all cases, running the deploy-kube-system script
manually on the failed node completes successfully and without issues.

I suspect you are also right that there may be a race condition in which
setup-kube-system has not completed successfully before deploy-kube-system
first runs. However, by the time we log into a node that has failed initial
setup, setup-kube-system appears to have completed normally:

● setup-kube-system.service
 Loaded: loaded (/etc/systemd/system/setup-kube-system.service; static)
 Active: inactive (dead)
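Note that "inactive (dead)" on its own does not distinguish a unit that ran
its script and exited cleanly from a static unit that was never started at
all; both show the same state. Below is a minimal Python 3.7+ sketch, assuming
the node has systemd's `systemctl`, that reads the unit's `ActiveState`,
`Result`, `ExecMainStatus` and `ExecMainExitTimestamp` properties instead. The
helper names are hypothetical, and treating an empty exit timestamp as "never
ran" is an assumption about how the unit is launched.

```python
import subprocess
import sys

# Properties read via `systemctl show`. ExecMainExitTimestamp is empty if the
# unit's main process has never exited (for example, it was never started).
PROPS = ["ActiveState", "Result", "ExecMainStatus", "ExecMainExitTimestamp"]


def parse_show_output(text):
    """Parse the KEY=VALUE lines printed by `systemctl show` into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            props[key] = value
    return props


def query_unit(unit):
    """Run `systemctl show -p ... <unit>` and return the parsed properties."""
    cmd = ["systemctl", "show"]
    for prop in PROPS:
        cmd += ["-p", prop]
    cmd.append(unit)
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_show_output(out)


def completed_successfully(props):
    """True only if the main process actually exited, with status 0."""
    return (
        props.get("Result") == "success"
        and props.get("ExecMainStatus") == "0"
        and bool(props.get("ExecMainExitTimestamp"))
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    unit_props = query_unit(sys.argv[1])
    print(unit_props)
    print("completed successfully:", completed_successfully(unit_props))
```

On a failed node this might be run as
`python3 check_unit.py setup-kube-system.service` (hypothetical file name),
and its output compared with a node where deploy-kube-system succeeded.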

We are running Kubernetes version 1.22.2 and have tried the templates from
both CloudStack and ShapeBlue, although their checksums are identical, so we
didn't expect a difference there.

Are there any longer-running operations in setup-kube-system that might be
worth trying to explore in more detail?

I've seen in the source code that there are separate cloud-init files
for k8s-control-node-add (I think that means "additional"?) and
k8s-control-node (which might be the primary control node), so I was
thinking that some step in the k8s-control-node-add file might be tripping
things up, since the issue only shows up with more than one control node.

However, the nodes that fail to complete seem to be random: typically it is
one or more of the additional control nodes, and some compute nodes also
fail to complete.

I'd appreciate any thoughts or logging techniques for gathering more
information!

BJ

On Thu, Jan 6, 2022 at 11:29 PM Pearl d'Silva wrote:

> Hi,
>
> Could you please try running the deploy-kube-system script manually on the
> node? That may give us a hint as to what the issue is. It is possible that
> the setup-kube-system service did not complete successfully, particularly
> the 'kubeadm init' operation, and deploy-kube-system requires the
> setup-kube-system service to have run successfully. So it may also be worth
> checking the status of the setup-kube-system service. Could you also please
> share the Kubernetes version?
>
> Thanks,
> Pearl
>
>
> 
> From: William (B.J.) Lawson, MD 
> Sent: Thursday, January 6, 2022 8:04 PM
> To: users@cloudstack.apache.org 
> Subject: 4.16.0: Unpredictable failure to successfully complete
> deploy-kube-system on HA k8s clusters
>
> Good morning. We have two CloudStack 4.16.0 environments where HA k8s
> clusters (meaning clusters with more than one control node) consistently
> fail to provision.
>
> Clusters with one control node reliably deploy their VMs and networking and
> start. However, when we allocate two or more control nodes, the k8s cluster
> invariably remains in the "Starting" state indefinitely, even though all of
> the VMs have started.
>
> Logging into the nodes reveals that not all of the nodes are running
> deploy-kube-system successfully. The failed nodes lack a "success" file in
> the core user's home directory. In every case, we can manually
> re-run deploy-kube-system and the process will complete on that node.
>
> When looking at the cloud-init.log file, we see:
>
> ###
>
> 2022-01-06 13:57:58,275 - subp.py[DEBUG]: Running command
> ['/var/lib/cloud/instance/scripts/runcmd'] with allowed return codes [0]
> (shell=False, capture=False)
> 2022-01-06 13:57:58,296 - subp.py[DEBUG]: Unexpected error while running
> command.
> Command: ['/var/lib/cloud/instance/scripts/runcmd']
> Exit code: 5
> Reason: -
> Stdout: -
> Stderr: -
> 2022-01-06 13:57:58,296 - cc_scripts_user.py[WARNING]: Failed to run module
> scripts-user (scripts in /var/lib/cloud/instance/scripts)
> 2022-01-06 13:57:58,296 - handlers.py[DEBUG]: finish:
> modules-final/config-scripts-user: FAIL: running config-scripts-user with
> frequency once-per-instance
> 2022-01-06 13:57:58,296 - util.py[WARNING]: Running module scripts-user
> ( '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>)
> failed
> 2022-01-06 13:57:58,296 - util.py[DEBUG]: Running module scripts-user
> ( '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>)
> failed
> Traceback (most recent call last):
>   File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 848, in _run_modules
>     ran, _r = cc.run(run_name, mod.handle, func_args,
>   File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 54, in run
>     return self._runners.run(name, functor, args, freq, clear_on_fail)
>   File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 185, in run
>     results = functor(*args)
>   File "/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py", line 45, in handle
>     subp.runparts(runparts_path)
>   File "/usr/lib/python3/dist-packages/cloudinit/subp.py", line 384, in runparts
>     raise RuntimeError(
> RuntimeError: Runparts: 1 failures (runcmd) in 1 attempted commands
> 2022-01-06 13:57:58,300 - stages.py[DEBUG]: Running module
> ssh-authkey-fingerprints ( 'cloudinit.config.cc_ssh_authkey_fingerprints' from
>
>
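The "Exit code: 5" in the log excerpt is the exit status of cloud-init's
generated runcmd script as a whole, and with capture=False its stdout and
stderr are not recorded there. When comparing failed and healthy nodes, a
rough Python sketch like the one below could pull every failed command and
its exit code out of a cloud-init log. The regular expressions assume the
"Command:" / "Exit code:" layout shown in this excerpt, and the function name
is hypothetical.

```python
import re
import sys

# Lines emitted by cloud-init's subp.py after
# "Unexpected error while running command.", as seen in the excerpt.
COMMAND_RE = re.compile(r"^Command: (?P<cmd>.+)$")
EXIT_RE = re.compile(r"^Exit code: (?P<code>-?\d+)$")


def failed_commands(log_text):
    """Return (command, exit_code) pairs found in a cloud-init log."""
    results = []
    pending = None  # most recent "Command:" seen, awaiting its exit code
    for raw in log_text.splitlines():
        line = raw.strip()
        m = COMMAND_RE.match(line)
        if m:
            pending = m.group("cmd")
            continue
        m = EXIT_RE.match(line)
        if m and pending is not None:
            results.append((pending, int(m.group("code"))))
            pending = None
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 scan_cloud_init.py /var/log/cloud-init.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for cmd, code in failed_commands(fh.read()):
            print(f"{cmd} exited with {code}")
```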