Re: Postgres WAS Atomic Centos, can't upgrade
On 07/18/2016 07:07 PM, Philippe Lafoucrière wrote:
> We're using postgres 9.5.
> It was working fine before the upgrade. Unfortunately, we upgraded
> atomic AND Openshift at the same time, so I can't tell if it's a problem
> with docker 1.10 or openshift 1.2.1.
> I'd tend to say Docker 1.10, but we need to isolate this first.

Is this the official Postgres docker hub image, or something else?

I'm asking because you shouldn't be getting that particular error; it's
something we resolved a while ago (in the Postgres project).

--
Josh Berkus
Project Atomic
Red Hat OSAS
___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
Re: Atomic Centos, can't upgrade
We're using postgres 9.5.
It was working fine before the upgrade. Unfortunately, we upgraded
atomic AND Openshift at the same time, so I can't tell if it's a problem
with docker 1.10 or openshift 1.2.1.
I'd tend to say Docker 1.10, but we need to isolate this first.

Thanks,
Philippe
Re: Atomic Centos, can't upgrade
Sorry, wrong thread, disregard my reply.

On Mon, Jul 18, 2016 at 12:59 PM, Scott Dodson wrote:
> We've worked around this change by checking for those booleans and
> setting each if they exist.
> https://github.com/openshift/openshift-ansible/pull/2166
>
> On Mon, Jul 18, 2016 at 12:56 PM, Josh Berkus wrote:
>> On 07/15/2016 08:59 AM, Philippe Lafoucrière wrote:
>>> We're having a potential issue. One postgresql service is not starting
>>> on the beta cluster:
>>>
>>> FATAL: could not open shared memory segment "/PostgreSQL.1804289383":
>>> Permission denied
>>>
>>> We need to investigate that, but it could be related to docker mounts
>>> (especially /dev/shm)
>>
>> This is due to a change in Docker, lemme see if I can find docs on it.
>> I know that Docker added constraints on dynamic shared memory at some point.
>>
>> --
>> Josh Berkus
>> Project Atomic
>> Red Hat OSAS
Re: Atomic Centos, can't upgrade
On 07/18/2016 09:59 AM, Scott Dodson wrote:
> We've worked around this change by checking for those booleans and
> setting each if they exist.
> https://github.com/openshift/openshift-ansible/pull/2166

What version of Postgres is in this container?

--
Josh Berkus
Project Atomic
Red Hat OSAS
Re: Atomic Centos, can't upgrade
We've worked around this change by checking for those booleans and
setting each if they exist.
https://github.com/openshift/openshift-ansible/pull/2166

On Mon, Jul 18, 2016 at 12:56 PM, Josh Berkus wrote:
> On 07/15/2016 08:59 AM, Philippe Lafoucrière wrote:
>> We're having a potential issue. One postgresql service is not starting
>> on the beta cluster:
>>
>> FATAL: could not open shared memory segment "/PostgreSQL.1804289383":
>> Permission denied
>>
>> We need to investigate that, but it could be related to docker mounts
>> (especially /dev/shm)
>
> This is due to a change in Docker, lemme see if I can find docs on it.
> I know that Docker added constraints on dynamic shared memory at some point.
>
> --
> Josh Berkus
> Project Atomic
> Red Hat OSAS
Re: Atomic Centos, can't upgrade
On Mon, Jul 11, 2016, at 09:56 AM, Scott Dodson wrote:
> That commit is mostly related to the fact that we cannot
> upgrade/downgrade docker on atomic host like we can on RHEL so abort the
> docker upgrade playbook early.

As a short-term fix, it is however possible to use `atomic host deploy`
to reset to an earlier known version. But it's not a long-term solution,
because that also means one isn't getting kernel security updates and such.

We're working on new mechanisms addressing the privileged/system
container case.
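A rough sketch of how the earlier version could be picked out for `atomic host deploy`. The helper name `previous_atomic_version` is made up for illustration, and the parsing assumes the column layout of the `atomic host status` output pasted elsewhere in this thread (timestamp, version, ID); newer releases may print a different format.

```shell
# Hypothetical helper: read `atomic host status` output on stdin and print
# the version of the first deployment that is NOT booted. The booted
# deployment is prefixed with "*", so its first field is not a date.
# Assumes the column layout shown in this thread (date, time, version, ID).
previous_atomic_version() {
    awk '$1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ { print $3; exit }'
}
```

On a host this might be used as `atomic host deploy "$(atomic host status | previous_atomic_version)"` followed by a reboot. Check the printed version by hand first, since deploying an older tree also rolls back kernel and security updates.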
Re: Atomic Centos, can't upgrade
https://docs.openshift.org/latest/dev_guide/shared_memory.html fixed the
issue, but it seems something changed regarding /dev or shm docker mounts
between 1.2.0 and 1.2.1.
Can someone confirm?
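For readers who cannot reach the linked page: as far as I recall, it describes mounting a memory-backed emptyDir volume at /dev/shm in the pod. The snippet below is only a sketch under that assumption. The function name `shm_pod_fragment`, the volume name `dshm`, and the container name and image are placeholders, not values taken from this thread.

```shell
# Hypothetical helper: print a pod-spec fragment that mounts a
# memory-backed emptyDir at /dev/shm. All names are placeholders.
shm_pod_fragment() {
    cat <<'EOF'
spec:
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory
  containers:
    - name: postgresql
      image: example/postgresql
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm
EOF
}
```

The printed fragment would need to be merged into the pod template of the affected deployment configuration; it is not a complete object on its own.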
Re: Atomic Centos, can't upgrade
I confirm: it's fixed :) thanks!
Re: Atomic Centos, can't upgrade
We pulled that into v1.2.1 along with the security update. Can you
give that a try?

On Thu, Jul 14, 2016 at 11:11 AM, Philippe Lafoucrière wrote:
>
> On Tue, Jul 12, 2016 at 5:22 PM, Scott Dodson wrote:
>>
>> I'll see if I can get openshift/node:v1.2.0 rebuilt with this fix but
>> you can also rebuild the node image placing the docker wrapper script
>> in /usr/local/bin
>
> Any news on this?
> Thanks
Re: Atomic Centos, can't upgrade
On Tue, Jul 12, 2016 at 5:22 PM, Scott Dodson wrote:
> I'll see if I can get openshift/node:v1.2.0 rebuilt with this fix but
> you can also rebuild the node image placing the docker wrapper script
> in /usr/local/bin

Any news on this?
Thanks
Re: Atomic Centos, can't upgrade
Testing it right away.
Thanks guys :)
Re: Atomic Centos, can't upgrade
https://github.com/openshift/origin/pull/9046 is the real fix for this.

I'll see if I can get openshift/node:v1.2.0 rebuilt with this fix but
you can also rebuild the node image placing the docker wrapper script
in /usr/local/bin

On Tue, Jul 12, 2016 at 4:36 PM, Philippe Lafoucrière wrote:
> Good catch Scott:
>
> [plafoucriere@atomic-test-node-1 origin]# docker info
> /usr/bin/docker-current: error while loading shared libraries:
> libseccomp.so.2: cannot open shared object file: No such file or directory
Re: Atomic Centos, can't upgrade
Good catch Scott:

[plafoucriere@atomic-test-node-1 origin]# docker info
/usr/bin/docker-current: error while loading shared libraries: libseccomp.so.2: cannot open shared object file: No such file or directory
Re: Atomic Centos, can't upgrade
Maybe this is another bug? Can you exec into your node container and
try to run `docker info` and see what errors it yields?

On Tue, Jul 12, 2016 at 2:13 PM, Philippe Lafoucrière wrote:
>
> On Tue, Jul 12, 2016 at 10:27 AM, Scott Dodson wrote:
>>
>> Let's say openshift-ansible for now.
>
> ok thanks
>
>> I suspect that adding `-v /etc/sysconfig/docker:/etc/sysconfig/docker`
>> to ExecStart in /etc/systemd/system/origin-node.service will fix this,
>> also verify that you've got `-v
>> /usr/bin/docker-current:/usr/bin/docker-current` too but the current
>> installer should take care of that. After you've added that `systemctl
>> daemon-reload && systemctl restart origin-node`
>
> I think that's exactly what
> https://github.com/openshift/openshift-ansible/pull/2037 is supposed to do,
> and while it was allowing the previous atomic update, it's not working this
> time :(
>
> It's already there on our nodes:
>
> -bash-4.2# cat /etc/sysconfig/origin-node-dep
> DOCKER_ADDTL_BIND_MOUNTS=--volume=/usr/bin/docker-current:/usr/bin/docker-current:ro --volume=/etc/sysconfig/docker:/etc/sysconfig/docker:ro
>
>> This is actually fixed in the v1.3 images because docker runs chroot
>> /rootfs.
>
> yes, but 1.3 is still Alpha, we can't install it on our production clusters.
>
> Thanks,
> Philippe
Re: Atomic Centos, can't upgrade
Let's say openshift-ansible for now.

I suspect that adding `-v /etc/sysconfig/docker:/etc/sysconfig/docker`
to ExecStart in /etc/systemd/system/origin-node.service will fix this.
Also verify that you've got `-v
/usr/bin/docker-current:/usr/bin/docker-current` too, but the current
installer should take care of that. After you've added that, run
`systemctl daemon-reload && systemctl restart origin-node`.

This is actually fixed in the v1.3 images because docker runs chroot
/rootfs.

On Mon, Jul 11, 2016 at 1:47 PM, Philippe Lafoucrière wrote:
>
> On Mon, Jul 11, 2016 at 9:56 AM, Scott Dodson wrote:
>>
>> That commit is mostly related to the fact that we cannot
>> upgrade/downgrade docker on atomic host like we can on RHEL so abort the
>> docker upgrade playbook early.
>
> Ok, I get it now, thanks.
>
> Anyway, we couldn't fix our beta cluster, and had to restore snapshots, as
> nothing was deploying anymore ("Failed to setup network for pod [...]").
> Even with the latest version of the playbook :(
> Should I open an issue in openshift, or openshift-ansible project for that?
>
> Thanks
> Philippe
Re: Atomic Centos, can't upgrade
On Mon, Jul 11, 2016 at 9:56 AM, Scott Dodson wrote:
> That commit is mostly related to the fact that we cannot
> upgrade/downgrade docker on atomic host like we can on RHEL so abort the
> docker upgrade playbook early.

Ok, I get it now, thanks.

Anyway, we couldn't fix our beta cluster, and had to restore snapshots, as
nothing was deploying anymore ("Failed to setup network for pod [...]").
Even with the latest version of the playbook :(
Should I open an issue in openshift, or openshift-ansible project for that?

Thanks
Philippe
Re: Atomic Centos, can't upgrade
That commit is mostly related to the fact that we cannot
upgrade/downgrade docker on atomic host like we can on RHEL, so we abort
the docker upgrade playbook early.

On Sun, Jul 10, 2016 at 2:45 PM, Philippe Lafoucrière wrote:
> Sounds like docker 1.10 is a bad idea, I found this commit:
>
> https://github.com/openshift/openshift-ansible/commit/b377f9d85df11c532281c213eda1869596642204
>
> I was probably running openshift-ansible with a wrong tag :(
Re: Atomic Centos, can't upgrade
Sounds like docker 1.10 is a bad idea, I found this commit:

https://github.com/openshift/openshift-ansible/commit/b377f9d85df11c532281c213eda1869596642204

I was probably running openshift-ansible with a wrong tag :(
Re: Atomic Centos, can't upgrade
We have updated our beta cluster to latest atomic centos:

-bash-4.2# atomic host status
  TIMESTAMP (UTC)         VERSION        ID             OSNAME                 REFSPEC
* 2016-07-07 21:23:41     7.20160707     cd47a72eb5     centos-atomic-host     centos-atomic-host:centos-atomic-host/7/x86_64/standard
  2016-06-10 13:15:00     7.20160610     3c3786d1dd     centos-atomic-host     centos-atomic-host:centos-atomic-host/7/x86_64/standard

GPG: Found 1 signature on the booted deployment (*):

  Signature made Thu Jul 7 23:34:40 2016 using RSA key ID F17E745691BA8335
  Good signature from "CentOS Atomic SIG"

And the problem re-appeared:

Jul 10 01:40:08 atomic-test-node-1 origin-node[3150]: I0710 01:40:08.000177 3201 manager.go:1400] Container "0cf256d23de1b837a295233491e6650c90519fa2d0807d37f95a8164a842257b gemnasium-enterprise/gemnasium-enterprise-7-8unp4" exited after 121.527557ms
Jul 10 01:40:08 atomic-test-node-1.priv.tech-angels.net origin-node[3150]: E0710 01:40:08.000243 3201 pod_workers.go:138] Error syncing pod 6e4dd3f7-462d-11e6-89a2-005056b17dcc, skipping: failed to "SetupNetwork" for "gemnasium-enterprise-7-8unp4_gemnasium-enterprise" with SetupNetworkError: "Failed to setup network for pod \"gemnasium-enterprise-7-8unp4_gemnasium-enterprise(6e4dd3f7-462d-11e6-89a2-005056b17dcc)\" using network plugins \"redhat/openshift-ovs-multitenant\": exit status 1; Skipping pod"

Running the playbook doesn't seem to fix the problem this time.
I've seen docker has been updated to 1.10, could it be an issue?

Thanks
Philippe
Re: Atomic Centos, can't upgrade
Thanks Tobias for the detailed help!
I should have thought of running ansible again; I was focused on the error.
Re: Atomic Centos, can't upgrade
Hi,

that's a known problem with a known fix, but maybe some publicity around
it might be good.

> We have tried to update our atomic host centos 7, with the
> tree 3c3786d1dd (from the tree e39c28570a), but deployments are all
> failing after the updates on the nodes:
>
> Error syncing pod, skipping: failed to "SetupNetwork" for "some_deploy"
> with SetupNetworkError: "Failed to setup network for pod
> \"some_deploy(d080f8d4-3498-11e6-8512-005056b1755a)\" using network
> plugins \"redhat/openshift-ovs-subnet\": exit status 1; Skipping pod"

It's a problem that is fixed by either

* reprovisioning via openshift-ansible, or
* upgrading to openshift v1.3 (alpha or latest).

The problem is that, due to the docker-current/docker-latest diversion,
/usr/bin/docker is a shell script that needs /etc/sysconfig/docker, but
that file is not mounted by the old origin-node systemd unit. See
https://github.com/openshift/openshift-ansible/pull/2037 for the fix for
origin pre 1.3. Origin v1.3 will run docker chrooted to the host fs, so
the problem does not manifest itself.

Cheers,
Tobias Flore
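A quick way to check whether a node is affected might look like the sketch below. The function name `missing_docker_mounts` is invented for illustration; the two paths are the ones named in this thread, and the check only looks for a `<path>:<path>` bind mount string in whatever file it is given.

```shell
# Hypothetical helper: print each docker-related path that is NOT bind
# mounted according to the given file (a unit file or its env file).
# A bind mount is expected to appear as "<path>:<path>" in a -v or
# --volume argument; nothing is printed when both mounts are present.
missing_docker_mounts() {
    file="$1"
    for path in /etc/sysconfig/docker /usr/bin/docker-current; do
        if ! grep -qF -- "${path}:${path}" "$file"; then
            echo "$path"
        fi
    done
}
```

For example, `missing_docker_mounts /etc/systemd/system/origin-node.service` could be run on each node; any path it prints would still need to be mounted, either by rerunning openshift-ansible or by editing the unit by hand.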
Atomic Centos, can't upgrade
Hi,

We have tried to update our atomic host centos 7, with the
tree 3c3786d1dd (from the tree e39c28570a), but deployments are all
failing after the updates on the nodes:

Error syncing pod, skipping: failed to "SetupNetwork" for "some_deploy"
with SetupNetworkError: "Failed to setup network for pod
\"some_deploy(d080f8d4-3498-11e6-8512-005056b1755a)\" using network
plugins \"redhat/openshift-ovs-subnet\": exit status 1; Skipping pod"

Where can I file a report for that? The Openshift or the Atomic host
bug tracker (wherever it is)?

Thanks