Thanks for all of your help everyone,

I've been busy with other things but was able to pick up where I left off with Magnum. After fixing some issues I have been able to provision a working Kubernetes cluster.

I'm still having issues getting Docker Swarm working. I've tried both Docker and Flannel as the networking layer, but neither works. After investigating, the issue seems to be that etcd.service is not installed (the unit file doesn't exist), so the master doesn't come up; the minion swarm node is provisioned but cannot join the cluster because there is no etcd.
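For reference, this is roughly how I've been verifying that on the master (generic systemd/rpm checks, nothing Magnum-specific):

    systemctl status etcd.service      # confirms the unit is missing
    ls /etc/systemd/system/ /usr/lib/systemd/system/ | grep -i etcd
    rpm -q etcd                        # is the package even present in the image?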

Has anybody seen this issue before? I've been digging through all the cloud-init logs and cannot see anything that would cause this.

I also have a separate issue: when provisioning through magnum-ui in Horizon and selecting Ubuntu with Mesos, I get the error "The Parameter (nodes_affinity_policy) was not provided". nodes_affinity_policy does have a default value in magnum.conf, so I'm starting to think this might be an issue with the magnum-ui dashboard?
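For reference, the default I'm referring to looks roughly like this in magnum.conf (I believe the option lives in the [cluster] section):

    [cluster]
    nodes_affinity_policy = soft-anti-affinity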

Best regards
Tobias

On 08/04/2018 06:24 PM, Joe Topjian wrote:
We recently deployed Magnum and I've been making my way through getting both Swarm and Kubernetes running. I also ran into some initial issues. These notes may or may not help, but I thought I'd share them just in case:

* We're using Barbican for SSL. I have not tried with the internal x509keypair.
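For reference, that only amounts to the following in magnum.conf (barbican is the default cert_manager_type):

    [certificates]
    cert_manager_type = barbican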

* I was only able to get things running with Fedora Atomic 27, specifically the version used in the Magnum docs: https://docs.openstack.org/magnum/latest/install/launch-instance.html

Anything beyond that wouldn't even boot in my cloud. I haven't dug into this.
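For reference, the upload from that doc boils down to something like the following; the os_distro property is what Magnum keys off of, and the filename is just whichever qcow2 build you grab:

    openstack image create Fedora-Atomic-27 \
      --disk-format qcow2 --container-format bare \
      --file Fedora-Atomic-27.qcow2 \
      --property os_distro=fedora-atomic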

* Kubernetes requires a Cluster Template to have a label of cert_manager_api=true set in order for the cluster to fully come up (at least, it didn't work for me until I set this).
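A template with that label looks roughly like this; the names, flavors and networks below are just placeholders for whatever fits your cloud:

    openstack coe cluster template create k8s-atomic \
      --image Fedora-Atomic-27 \
      --coe kubernetes \
      --external-network public \
      --master-flavor m1.small --flavor m1.small \
      --network-driver flannel \
      --labels cert_manager_api=true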

As far as troubleshooting methods go, check the cloud-init logs on the individual instances to see if any of the "parts" have failed to run. Manually re-run the parts on the command-line to get a better idea of why they failed. Review the actual script, figure out the variable interpolation and how it relates to the Cluster Template being used.
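Concretely, that amounts to something like this on an affected instance (the part-XXX filename is just a placeholder; pick whichever one failed):

    grep -iE 'fail|error|traceback' /var/log/cloud-init-output.log
    ls /var/lib/cloud/instance/scripts/                     # the individual "parts"
    sudo bash -x /var/lib/cloud/instance/scripts/part-011   # re-run the failing one with tracing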

Eventually I was able to get clusters running with the stock driver/templates, but wanted to tune them in order to better fit in our cloud, so I've "forked" them. This is in no way a slight against the existing drivers/templates nor do I recommend doing this until you reach a point where the stock drivers won't meet your needs. But I mention it because it's possible to do and it's not terribly hard. This is still a work-in-progress and a bit hacky:

https://github.com/cybera/magnum-templates

Hope that helps,
Joe

On Fri, Aug 3, 2018 at 6:46 AM, Tobias Urdin <tobias.ur...@binero.se> wrote:

    Hello,

    I'm testing out Magnum and have so far only had issues.
    I've tried deploying Docker Swarm (on Fedora Atomic 27 and Fedora
    Atomic 28) and Kubernetes (on Fedora Atomic 27) and haven't been
    able to get either working.

    Running Queens, is there any information about supported images?
    Is Magnum still maintained to support Fedora Atomic?
    What is in charge of populating the certificates inside the
    instances? This seems to be the root of all my issues. I'm not
    using Barbican but the x509keypair driver; is that the reason?

    Perhaps I missed some documentation stating that x509keypair does
    not support what I'm trying to do?
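
    For what it's worth, a rough way to check whether the CA/certs are
    generated at all, as opposed to never being pushed to the
    instances (the cluster name below is a placeholder):

        openstack coe ca show swarm-test      # does Magnum have a CA for the cluster?
        # and on a master/node:
        ls -l /etc/docker/ /etc/etcd/certs/ /etc/kubernetes/certs/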

    I've seen the following issues:

    Docker:
    * Master does not start and listen on TCP because of certificate
    issues:
    dockerd-current[1909]: Could not load X509 key pair (cert:
    "/etc/docker/server.crt", key: "/etc/docker/server.key")

    * Node does not start with:
    Dependency failed for Docker Application Container Engine.
    docker.service: Job docker.service/start failed with result
    'dependency'.

    Kubernetes:
    * Master etcd does not start because /run/etcd does not exist.
    ** When that is created, it still fails to start because of a
    certificate error:
    2018-08-03 12:41:16.554257 C | etcdmain: open
    /etc/etcd/certs/server.crt: no such file or directory

    * Master kube-apiserver does not start because of a certificate
    error:
    unable to load server certificate: open
    /etc/kubernetes/certs/server.crt: no such file or directory

    * The master's Heat script just sleeps forever waiting for port
    8080 (kube-apiserver) to become available, so it can never
    kubectl apply the final steps.

    * Node does not even start and times out when Heat deploys it,
    probably because the master never finishes.

    Any help is appreciated; perhaps I've missed something crucial.
    I've not tested Kubernetes on CoreOS yet.

    Best regards
    Tobias




