Re: OKD 3.9 to 3.10 upgrade failure on CentOS

2018-12-05 Thread Dharmit Shah
Hi Dan,

In-line responses below.

On 30/11, Dan Pungă wrote:
> Hi Dharmit,
> 
> What you're experiencing looks a lot like a problem I had with the upgrade.
> I ended up doing a fresh install.
> 
> I tried fiddling around with the ansible config, and while trying to get my
> head around what was happening I discovered an issue about node names. This
> reply from Michael Gugino shed some light on the matter:
> https://github.com/openshift/openshift-ansible/issues/9935#issuecomment-423268110
> 
> Basically my problem was that the upgrade playbook of OKD 3.10 expected
> the node names from the previously installed version to be the short
> names and not the FQDNs.

My understanding is that 3.10 requires a proper DNS setup in the
cluster. The inventory file needs to list the FQDNs of the systems in
the cluster, not their IP addresses.
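
A rough way to check which form each inventory entry takes (IP address,
short name, or FQDN) is sketched below in Python. The function name and
the sample hostnames are made up for illustration; nothing here comes
from openshift-ansible itself.

```python
import ipaddress


def classify_host(entry):
    """Classify an inventory host entry as 'ip', 'short', or 'fqdn'."""
    try:
        # Succeeds for both IPv4 and IPv6 literals.
        ipaddress.ip_address(entry)
        return "ip"
    except ValueError:
        pass
    # A dot inside the name (ignoring a trailing root dot) means it
    # carries a domain part; otherwise treat it as a short name.
    return "fqdn" if "." in entry.rstrip(".") else "short"


# Hypothetical entries, not taken from any real inventory.
for host in ["10.0.0.5", "node1", "node1.example.com"]:
    print(host, classify_host(host))
```

This only looks at the string itself; it does not check that the name
actually resolves in DNS.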

> I guess I was precisely in your position and I really didn't know what else
> to try except doing a fresh install. I have no idea if there is a way of
> changing node names of a running cluster. Maybe someone who knows more about
> the internals could be of help in this respect...

I'm not sure how to change the node names either. But I *think* it could
be done by removing a node from the cluster and then adding it back
through the scale-up playbook. There's documentation for this. It's
easier said than done, but if you're careful it is not impossible.

> Since I see your installation is also a fresh one, maybe it would be worth
> uninstalling 3.9 and installing 3.10. Or maybe have a try at the newest
> 3.11.

This is my test environment where I can play however I wish to.
Unfortunately, I can't do the same with production where we are supposed
to upgrade as well. :(

I managed to fix the issue in my test environment and am going to
upgrade the production cluster soon.

Since the 3.9 to 3.10 upgrade wasn't working, we planned to uninstall OKD
by executing the uninstall playbook
(/usr/share/ansible/openshift-ansible/playbooks/adhoc/uninstall.yml). We
planned to re-use the Jenkins PV after a successful 3.11 deployment.
While doing the 3.11 setup I faced this issue [1].

I decided to completely remove the configuration for "kubeletArguments"
from the hosts file. In 3.9, kubelet arguments could be configured by
setting "openshift_node_kubelet_args". With 3.10, that variable is
deprecated and the arguments have to be specified in
"openshift_node_groups". I'm guessing I was doing something wrong there.
Or maybe it's an issue with the OKD documentation, which mentions
arguments for container garbage collection [2] that are not listed in
the upstream kubelet documentation [3]. I have no clue!

But after removing the kubeletArguments from "openshift_node_groups",
the 3.9 to 3.10 upgrade using the playbook went just fine!
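
For illustration only, here is a Python sketch of what that removal
amounts to. It assumes "openshift_node_groups" is a list of dicts with
'name', 'labels' and optional 'edits' keys, which is how I understand
the 3.10 inventory format. The group name, the kubelet argument and the
helper name strip_kubelet_edits are examples, not my actual
configuration.

```python
import ast

# Illustrative value only: one node group carrying a kubeletArguments
# edit, written the way it might appear in an inventory file.
NODE_GROUPS = (
    "[{'name': 'node-config-compute', "
    "'labels': ['node-role.kubernetes.io/compute=true'], "
    "'edits': [{'key': 'kubeletArguments.maximum-dead-containers', "
    "'value': ['20']}]}]"
)


def strip_kubelet_edits(raw):
    """Return the node groups with every kubeletArguments.* edit removed."""
    groups = ast.literal_eval(raw)
    cleaned = []
    for group in groups:
        group = dict(group)  # copy so the parsed input is left untouched
        edits = [
            edit for edit in group.get("edits", [])
            if not edit.get("key", "").startswith("kubeletArguments")
        ]
        if edits:
            group["edits"] = edits
        else:
            # Drop the 'edits' key entirely when nothing is left in it.
            group.pop("edits", None)
        cleaned.append(group)
    return cleaned


print(strip_kubelet_edits(NODE_GROUPS))
```

In practice the same result comes from simply deleting the 'edits'
entries by hand in the hosts file.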

Hope that helps. :)

Regards,
Dharmit

[1] https://github.com/openshift/openshift-ansible/issues/10774
[2] https://docs.okd.io/3.10/admin_guide/garbage_collection.html#container-garbage-collection
[3] https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/

> 
> Hope it helps,
> 
> Dan
> 
> On 20.11.2018 04:38, Dharmit Shah wrote:
> > Hi,
> > 
> > I'm trying to upgrade my OKD 3.9 cluster to 3.10 using
> > openshift-ansible. I have already described the problem in detail and
> > provided logs on the GitHub issue [1].
> > 
> > I could really use some help on this issue!
> > 
> > Regards,
> > Dharmit
> > 
> > [1] https://github.com/openshift/openshift-ansible/issues/10690
> > 

-- 
Dharmit Shah
Red Hat Developer Tools (https://developers.redhat.com/)
irc, mattermost: dharmit
https://dharmitshah.com

___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


Re: OKD 3.9 to 3.10 upgrade failure on CentOS

2018-11-29 Thread Dan Pungă

Hi Dharmit,

What you're experiencing looks a lot like a problem I had with the 
upgrade. I ended up doing a fresh install.


I tried fiddling around with the ansible config, and while trying to
get my head around what was happening I discovered an issue about node
names. This reply from Michael Gugino shed some light on the matter:
https://github.com/openshift/openshift-ansible/issues/9935#issuecomment-423268110


Basically my problem was that the upgrade playbook of OKD 3.10 expected
the node names from the previously installed version to be the short
names and not the FQDNs.


I guess I was precisely in your position and I really didn't know what 
else to try except doing a fresh install. I have no idea if there is a 
way of changing node names of a running cluster. Maybe someone who knows 
more about the internals could be of help in this respect...


Since I see your installation is also a fresh one, maybe it would be worth
uninstalling 3.9 and installing 3.10. Or maybe have a try at the
newest 3.11.


Hope it helps,

Dan

On 20.11.2018 04:38, Dharmit Shah wrote:
> Hi,
> 
> I'm trying to upgrade my OKD 3.9 cluster to 3.10 using
> openshift-ansible. I have already described the problem in detail and
> provided logs on the GitHub issue [1].
> 
> I could really use some help on this issue!
> 
> Regards,
> Dharmit
> 
> [1] https://github.com/openshift/openshift-ansible/issues/10690





OKD 3.9 to 3.10 upgrade failure on CentOS

2018-11-19 Thread Dharmit Shah
Hi,

I'm trying to upgrade my OKD 3.9 cluster to 3.10 using
openshift-ansible. I have already described the problem in detail and
provided logs on the GitHub issue [1].

I could really use some help on this issue!

Regards,
Dharmit

[1] https://github.com/openshift/openshift-ansible/issues/10690

-- 
Dharmit Shah
Red Hat Developer Tools (https://developers.redhat.com/)
irc, mattermost: dharmit
https://dharmitshah.com
