Re: Atomic Centos, can't upgrade
We're using postgres 9.5. It was working fine before the upgrade.
Unfortunately, we upgraded atomic AND Openshift at the same time, so I can't tell if it's a problem with docker 1.10 or openshift 1.2.1. I'd tend to say Docker 1.10, but we need to isolate this first.

Thanks,
Philippe
___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
Re: Atomic Centos, can't upgrade
Sorry, wrong thread, disregard my reply.

On Mon, Jul 18, 2016 at 12:59 PM, Scott Dodson wrote:
> We've worked around this change by checking for those booleans and
> setting each if they exist.
> https://github.com/openshift/openshift-ansible/pull/2166
>
> On Mon, Jul 18, 2016 at 12:56 PM, Josh Berkus wrote:
>> On 07/15/2016 08:59 AM, Philippe Lafoucrière wrote:
>>> We're having a potential issue. One postgresql service is not starting
>>> on the beta cluster:
>>>
>>> FATAL: could not open shared memory segment "/PostgreSQL.1804289383":
>>> Permission denied
>>>
>>> We need to investigate that, but it could be related to docker mounts
>>> (especially /dev/shm)
>>
>> This is due to a change in Docker, lemme see if I can find docs on it.
>> I know that Docker added constraints on dynamic shared memory at some point.
>>
>> --
>> Josh Berkus
>> Project Atomic
>> Red Hat OSAS
Re: Atomic Centos, can't upgrade
On 07/18/2016 09:59 AM, Scott Dodson wrote:
> We've worked around this change by checking for those booleans and
> setting each if they exist.
> https://github.com/openshift/openshift-ansible/pull/2166

What version of Postgres is in this container?

--
Josh Berkus
Project Atomic
Red Hat OSAS
Re: Atomic Centos, can't upgrade
We've worked around this change by checking for those booleans and setting each if they exist.
https://github.com/openshift/openshift-ansible/pull/2166

On Mon, Jul 18, 2016 at 12:56 PM, Josh Berkus wrote:
> On 07/15/2016 08:59 AM, Philippe Lafoucrière wrote:
>> We're having a potential issue. One postgresql service is not starting
>> on the beta cluster:
>>
>> FATAL: could not open shared memory segment "/PostgreSQL.1804289383":
>> Permission denied
>>
>> We need to investigate that, but it could be related to docker mounts
>> (especially /dev/shm)
>
> This is due to a change in Docker, lemme see if I can find docs on it.
> I know that Docker added constraints on dynamic shared memory at some point.
>
> --
> Josh Berkus
> Project Atomic
> Red Hat OSAS
Re: Atomic Centos, can't upgrade
On Mon, Jul 11, 2016, at 09:56 AM, Scott Dodson wrote:
> That commit is mostly related to the fact that we cannot
> upgrade/downgrade docker on atomic host like can on RHEL so abort the
> docker upgrade playbook early.

For short term fixes, it is however possible to use `atomic host deploy` to reset to an earlier known version. But it's not a long term solution because that also means one isn't getting kernel security updates and such.

We're working on new mechanisms addressing the privileged/system container case.
Re: Atomic Centos, can't upgrade
https://docs.openshift.org/latest/dev_guide/shared_memory.html fixed the issue, but it seems something changed regarding /dev or shm docker mounts between 1.2.0 and 1.2.1.
Can someone confirm?
Re: Atomic Centos, can't upgrade
I confirm: it's fixed :) thanks!
Re: Atomic Centos, can't upgrade
We pulled that into v1.2.1 along with the security update. Can you give that a try?

On Thu, Jul 14, 2016 at 11:11 AM, Philippe Lafoucrière wrote:
>
> On Tue, Jul 12, 2016 at 5:22 PM, Scott Dodson wrote:
>>
>> I'll see if I can get openshift/node:v1.2.0 rebuilt with this fix but
>> you can also rebuild the node image placing the docker wrapper script
>> in /usr/local/bin
>
>
> Any news on this?
> Thanks
Re: Atomic Centos, can't upgrade
On Tue, Jul 12, 2016 at 5:22 PM, Scott Dodson wrote:
> I'll see if I can get openshift/node:v1.2.0 rebuilt with this fix but
> you can also rebuild the node image placing the docker wrapper script
> in /usr/local/bin

Any news on this?
Thanks
Re: Atomic Centos, can't upgrade
Testing it right away. Thanks guys :)
Re: Atomic Centos, can't upgrade
https://github.com/openshift/origin/pull/9046 is the real fix for this. I'll see if I can get openshift/node:v1.2.0 rebuilt with this fix but you can also rebuild the node image placing the docker wrapper script in /usr/local/bin

On Tue, Jul 12, 2016 at 4:36 PM, Philippe Lafoucrière wrote:
> Good catch Scott:
>
>
> [plafoucriere@atomic-test-node-1 origin]# docker info
> /usr/bin/docker-current: error while loading shared libraries:
> libseccomp.so.2: cannot open shared object file: No such file or directory
Re: Atomic Centos, can't upgrade
Good catch Scott:

[plafoucriere@atomic-test-node-1 origin]# docker info
/usr/bin/docker-current: error while loading shared libraries: libseccomp.so.2: cannot open shared object file: No such file or directory
Re: Atomic Centos, can't upgrade
Maybe this is another bug? Can you exec into your node container and try to run `docker info` and see what errors it yields?

On Tue, Jul 12, 2016 at 2:13 PM, Philippe Lafoucrière wrote:
>
> On Tue, Jul 12, 2016 at 10:27 AM, Scott Dodson wrote:
>>
>> Lets say openshift-ansible for now.
>
>
> ok thanks
>
>>
>> I suspect that adding `-v /etc/sysconfig/docker:/etc/sysconfig/docker`
>> to ExecStart in /etc/systemd/system/origin-node.service will fix this,
>> also verify that you've got `-v
>> /usr/bin/docker-current:/usr/bin/docker-current` too but the current
>> installer should take care of that. After you've added that `systemctl
>> daemon-reload && systemctl restart origin-node`
>
>
> I think that's exactly what
> https://github.com/openshift/openshift-ansible/pull/2037 is supposed to do,
> and while it was allowing the previous atomic update, it's not working this
> time :(
>
> It's already there on our nodes:
>
> -bash-4.2# cat /etc/sysconfig/origin-node-dep
> DOCKER_ADDTL_BIND_MOUNTS=--volume=/usr/bin/docker-current:/usr/bin/docker-current:ro
> --volume=/etc/sysconfig/docker:/etc/sysconfig/docker:ro
>
>
>>
>> This is actually fixed in the v1.3 images because docker runs chroot
>> /rootfs.
>
>
> yes, but 1.3 is still Alpha, we can't install it on our production clusters.
>
> Thanks,
> Philippe
Re: Atomic Centos, can't upgrade
Let's say openshift-ansible for now.

I suspect that adding `-v /etc/sysconfig/docker:/etc/sysconfig/docker` to ExecStart in /etc/systemd/system/origin-node.service will fix this. Also verify that you've got `-v /usr/bin/docker-current:/usr/bin/docker-current` too, but the current installer should take care of that. After you've added that, run `systemctl daemon-reload && systemctl restart origin-node`.

This is actually fixed in the v1.3 images because docker runs chroot /rootfs.

On Mon, Jul 11, 2016 at 1:47 PM, Philippe Lafoucrière wrote:
>
> On Mon, Jul 11, 2016 at 9:56 AM, Scott Dodson wrote:
>>
>> That commit is mostly related to the fact that we cannot
>> upgrade/downgrade docker on atomic host like can on RHEL so abort the
>> docker upgrade playbook early.
>
>
> Ok, I get it now, thanks.
>
> Anyway, we couldn't fix our beta cluster, and had to restore snapshots, as
> nothing was deploying anymore ("Failed to setup network for pod [...]").
> Even with the latest version of the playbook :(
> Should I open an issue in openshift, or openshift-ansible project for that?
>
> Thanks
> Philippe
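
The ExecStart edit suggested above can be sketched as a small shell function. This is only an illustration, not what the installer does: `add_sysconfig_mount` is a hypothetical helper name, and the sketch assumes the ExecStart line contains the literal text `docker run`, which may not hold for every version of the unit file.

```shell
#!/bin/sh
# Sketch only: add the /etc/sysconfig/docker bind mount to the ExecStart
# line of a unit file, unless the mount is already present.
# ASSUMPTION: ExecStart contains the literal text "docker run"; check your
# own unit file before relying on this.
add_sysconfig_mount() {
    unit="$1"
    mount="/etc/sysconfig/docker:/etc/sysconfig/docker"

    # Nothing to do if the mount is already configured.
    if grep -qF -- "$mount" "$unit"; then
        return 0
    fi

    tmp="$(mktemp)" || return 1
    # Insert the -v flag right after the first "docker run" on ExecStart lines,
    # then copy the result back so the unit file keeps its permissions.
    sed "/^ExecStart=/ s|docker run|docker run -v ${mount}|" "$unit" > "$tmp" &&
        cat "$tmp" > "$unit"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

It would be called as `add_sysconfig_mount /etc/systemd/system/origin-node.service`, followed by `systemctl daemon-reload && systemctl restart origin-node`. Calling it twice leaves the file unchanged the second time.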
Re: Atomic Centos, can't upgrade
On Mon, Jul 11, 2016 at 9:56 AM, Scott Dodson wrote:
> That commit is mostly related to the fact that we cannot
> upgrade/downgrade docker on atomic host like can on RHEL so abort the
> docker upgrade playbook early.

Ok, I get it now, thanks.

Anyway, we couldn't fix our beta cluster, and had to restore snapshots, as nothing was deploying anymore ("Failed to setup network for pod [...]"). Even with the latest version of the playbook :(
Should I open an issue in openshift, or openshift-ansible project for that?

Thanks
Philippe
Re: Atomic Centos, can't upgrade
That commit is mostly related to the fact that we cannot upgrade/downgrade docker on atomic host like we can on RHEL, so it aborts the docker upgrade playbook early.

On Sun, Jul 10, 2016 at 2:45 PM, Philippe Lafoucrière wrote:
> Sounds like docker 1.10 is a bad idea, I found this commit:
>
> https://github.com/openshift/openshift-ansible/commit/b377f9d85df11c532281c213eda1869596642204
>
> I was probably running openshift-ansible with a wrong tag :(
Re: Atomic Centos, can't upgrade
Sounds like docker 1.10 is a bad idea, I found this commit:

https://github.com/openshift/openshift-ansible/commit/b377f9d85df11c532281c213eda1869596642204

I was probably running openshift-ansible with a wrong tag :(
Re: Atomic Centos, can't upgrade
We have updated our beta cluster to latest atomic centos:

-bash-4.2# atomic host status
  TIMESTAMP (UTC)       VERSION      ID           OSNAME               REFSPEC
* 2016-07-07 21:23:41   7.20160707   cd47a72eb5   centos-atomic-host   centos-atomic-host:centos-atomic-host/7/x86_64/standard
  2016-06-10 13:15:00   7.20160610   3c3786d1dd   centos-atomic-host   centos-atomic-host:centos-atomic-host/7/x86_64/standard

GPG: Found 1 signature on the booted deployment (*):

  Signature made Thu Jul 7 23:34:40 2016 using RSA key ID F17E745691BA8335
  Good signature from "CentOS Atomic SIG"

And the problem re-appeared:

Jul 10 01:40:08 atomic-test-node-1 origin-node[3150]: I0710 01:40:08.000177 3201 manager.go:1400] Container "0cf256d23de1b837a295233491e6650c90519fa2d0807d37f95a8164a842257b gemnasium-enterprise/gemnasium-enterprise-7-8unp4" exited after 121.527557ms
Jul 10 01:40:08 atomic-test-node-1.priv.tech-angels.net origin-node[3150]: E0710 01:40:08.000243 3201 pod_workers.go:138] Error syncing pod 6e4dd3f7-462d-11e6-89a2-005056b17dcc, skipping: failed to "SetupNetwork" for "gemnasium-enterprise-7-8unp4_gemnasium-enterprise" with SetupNetworkError: "Failed to setup network for pod \"gemnasium-enterprise-7-8unp4_gemnasium-enterprise(6e4dd3f7-462d-11e6-89a2-005056b17dcc)\" using network plugins \"redhat/openshift-ovs-multitenant\": exit status 1; Skipping pod"

Running the playbook doesn't seem to fix the problem this time.
I've seen docker has been updated to 1.10, could it be an issue?

Thanks
Philippe
Re: Atomic Centos, can't upgrade
Thanks Tobias for the detailed help!
I should have thought of running ansible again; I was focused on the error.
Re: Atomic Centos, can't upgrade
Hi,

that's a known problem with a known fix, but maybe some publicity around it might be good.

> We have tried to update our atomic host centos 7, with the
> tree 3c3786d1dd (from the tree e39c28570a), but deployments are all
> failing after the updates on the nodes:
>
> Error syncing pod, skipping: failed to "SetupNetwork" for "some_deploy"
> with SetupNetworkError: "Failed to setup network for pod
> \"some_deploy(d080f8d4-3498-11e6-8512-005056b1755a)\" using network
> plugins \"redhat/openshift-ovs-subnet\": exit status 1; Skipping pod"

It's a problem that is fixed by either
* reprovisioning via openshift-ansible, or
* upgrading to openshift v1.3 (alpha or latest).

The problem is that, due to the docker-current/docker-latest diversion, /usr/bin/docker is a shell script that needs /etc/sysconfig/docker, but that is not mounted by the old origin-node systemd unit.

See
https://github.com/openshift/openshift-ansible/pull/2037
for the fix for origin pre 1.3. Origin v1.3 will run docker chrooted to the host fs, so the problem does not manifest itself.

Cheers,
Tobias Flore
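
To see whether a node is affected by the missing mount described above, a check along these lines may help. It is a minimal sketch: `missing_docker_mounts` is a hypothetical helper, not part of openshift-ansible, and it only looks for the two bind mounts named in this thread in whatever file it is given (the unit file, or a sysconfig file holding the extra docker arguments).

```shell
#!/bin/sh
# Sketch only: print each host path the docker wrapper needs that is NOT
# bind-mounted according to the given file.
# The two paths are the ones named in this thread; the helper name is made up.
missing_docker_mounts() {
    file="$1"
    for p in /usr/bin/docker-current /etc/sysconfig/docker; do
        # Matches both "-v path:path" and "--volume=path:path[:ro]".
        if ! grep -qF -- "${p}:${p}" "$file"; then
            echo "$p"
        fi
    done
}
```

For example, `missing_docker_mounts /etc/systemd/system/origin-node.service` prints nothing when both mounts are present. An empty result only rules out this particular cause; other failures, such as a missing shared library inside the node image, would not show up here.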