Re: Atomic Centos, can't upgrade

2016-07-18 Thread Philippe Lafoucrière
We're using postgres 9.5.
It was working fine before the upgrade. Unfortunately, we upgraded atomic
AND Openshift at the same time, so I can't tell if it's a problem with
docker 1.10 or openshift 1.2.1.
I'd tend to say Docker 1.10, but we need to isolate this first.

Thanks,
Philippe
___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


Re: Atomic Centos, can't upgrade

2016-07-18 Thread Scott Dodson
Sorry, wrong thread, disregard my reply.

On Mon, Jul 18, 2016 at 12:59 PM, Scott Dodson  wrote:
> We've worked around this change by checking for those booleans and
> setting each if they exist.
> https://github.com/openshift/openshift-ansible/pull/2166
>
> On Mon, Jul 18, 2016 at 12:56 PM, Josh Berkus  wrote:
>> On 07/15/2016 08:59 AM, Philippe Lafoucrière wrote:
>>> We're having a potential issue. One postgresql service is not starting
>>> on the beta cluster:
>>>
>>> FATAL:  could not open shared memory segment "/PostgreSQL.1804289383":
>>> Permission denied
>>>
>>> We need to investigate that, but it could be related to docker mounts
>>> (especially /dev/shm)
>>
>> This is due to a change in Docker, lemme see if I can find docs on it.
>> I know that Docker added constraints on dynamic shared memory at some point.
>>
>>
>> --
>> --
>> Josh Berkus
>> Project Atomic
>> Red Hat OSAS



Re: Atomic Centos, can't upgrade

2016-07-18 Thread Josh Berkus
On 07/18/2016 09:59 AM, Scott Dodson wrote:
> We've worked around this change by checking for those booleans and
> setting each if they exist.
> https://github.com/openshift/openshift-ansible/pull/2166

What version of Postgres is in this container?

-- 
--
Josh Berkus
Project Atomic
Red Hat OSAS



Re: Atomic Centos, can't upgrade

2016-07-18 Thread Scott Dodson
We've worked around this change by checking for those booleans and
setting each if they exist.
https://github.com/openshift/openshift-ansible/pull/2166

On Mon, Jul 18, 2016 at 12:56 PM, Josh Berkus  wrote:
> On 07/15/2016 08:59 AM, Philippe Lafoucrière wrote:
>> We're having a potential issue. One postgresql service is not starting
>> on the beta cluster:
>>
>> FATAL:  could not open shared memory segment "/PostgreSQL.1804289383":
>> Permission denied
>>
>> We need to investigate that, but it could be related to docker mounts
>> (especially /dev/shm)
>
> This is due to a change in Docker, lemme see if I can find docs on it.
> I know that Docker added constraints on dynamic shared memory at some point.
>
>
> --
> --
> Josh Berkus
> Project Atomic
> Red Hat OSAS



Re: Atomic Centos, can't upgrade

2016-07-15 Thread Colin Walters
On Mon, Jul 11, 2016, at 09:56 AM, Scott Dodson wrote:
> That commit is mostly related to the fact that we cannot
> upgrade/downgrade docker on atomic host like can on RHEL so abort the
> docker upgrade playbook early.

For short-term fixes, however, it is possible to use `atomic host deploy` to
reset to an earlier known version. But it's not a long-term solution, because
that also means one isn't getting kernel security updates and such.
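
A minimal sketch of the short-term reset described above. The function name
`deploy_earlier_tree` is made up for illustration, and it assumes the `atomic`
CLI on the host provides the `host status` and `host deploy` subcommands
mentioned in this thread. It only stages the older tree; a reboot is still
needed to boot into it.

```shell
#!/bin/bash
# Sketch only: pin an Atomic host back to an earlier tree version.
# deploy_earlier_tree is a hypothetical helper name, not part of any tool.
deploy_earlier_tree() {
    local version="$1"
    if [ -z "$version" ]; then
        echo "usage: deploy_earlier_tree VERSION" >&2
        return 2
    fi
    # Show the known deployments first; the booted one is marked with '*'.
    atomic host status || return 1
    # Stage the requested version; it becomes active on the next boot.
    atomic host deploy "$version" || return 1
    echo "Deployed $version; reboot to activate it."
}

# Example (7.20160610 is the earlier tree listed elsewhere in this thread):
#   deploy_earlier_tree 7.20160610 && systemctl reboot
```

As noted above, staying on an old tree also means missing kernel security
updates, so this is a stopgap only.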

We're working on new mechanisms addressing the privileged/system container case.



Re: Atomic Centos, can't upgrade

2016-07-15 Thread Philippe Lafoucrière
https://docs.openshift.org/latest/dev_guide/shared_memory.html

fixed the issue, but it seems something changed regarding the /dev or shm
docker mounts between 1.2.0 and 1.2.1.
Can someone confirm?
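
A quick diagnostic sketch for the "could not open shared memory segment"
error quoted in this thread: on Linux, POSIX shared memory segments are
created under /dev/shm, so a first check is whether the container can create
a file there at all. The function name `shm_writable` is hypothetical, and
the sketch assumes `mktemp` exists in the container image.

```shell
#!/bin/bash
# Sketch only: check that files can be created in the shared memory mount.
# shm_writable is a hypothetical helper name.
shm_writable() {
    local dir="${1:-/dev/shm}"
    local probe
    # Try to create a uniquely named probe file inside the directory.
    if probe="$(mktemp "${dir}/probe.XXXXXX" 2>/dev/null)"; then
        rm -f "$probe"
        echo "${dir} is writable"
    else
        echo "cannot create files in ${dir}"
        return 1
    fi
}

# Example: run inside the postgresql container as the database user.
#   shm_writable
```

If this fails inside the pod but succeeds on the host, the mount setup is the
likely culprit rather than PostgreSQL itself.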


Re: Atomic Centos, can't upgrade

2016-07-15 Thread Philippe Lafoucrière
I confirm: it's fixed :)
thanks!


Re: Atomic Centos, can't upgrade

2016-07-14 Thread Scott Dodson
We pulled that into v1.2.1 along with the security update. Can you
give that a try?

On Thu, Jul 14, 2016 at 11:11 AM, Philippe Lafoucrière
 wrote:
>
> On Tue, Jul 12, 2016 at 5:22 PM, Scott Dodson  wrote:
>>
>> I'll see if I can get openshift/node:v1.2.0 rebuilt with this fix but
>> you can also rebuild the node image placing the docker wrapper script
>> in /usr/local/bin
>
>
> Any news on this?
> Thanks
>



Re: Atomic Centos, can't upgrade

2016-07-14 Thread Philippe Lafoucrière
On Tue, Jul 12, 2016 at 5:22 PM, Scott Dodson  wrote:

> I'll see if I can get openshift/node:v1.2.0 rebuilt with this fix but
> you can also rebuild the node image placing the docker wrapper script
> in /usr/local/bin
>

Any news on this?
Thanks


Re: Atomic Centos, can't upgrade

2016-07-12 Thread Philippe Lafoucrière
Testing it right away.
Thanks guys :)


Re: Atomic Centos, can't upgrade

2016-07-12 Thread Scott Dodson
https://github.com/openshift/origin/pull/9046 is the real fix for this.

I'll see if I can get openshift/node:v1.2.0 rebuilt with this fix but
you can also rebuild the node image placing the docker wrapper script
in /usr/local/bin

On Tue, Jul 12, 2016 at 4:36 PM, Philippe Lafoucrière
 wrote:
> Good catch Scott:
>
>
> [plafoucriere@atomic-test-node-1 origin]# docker info
> /usr/bin/docker-current: error while loading shared libraries:
> libseccomp.so.2: cannot open shared object file: No such file or directory



Re: Atomic Centos, can't upgrade

2016-07-12 Thread Philippe Lafoucrière
Good catch Scott:


[plafoucriere@atomic-test-node-1 origin]# docker info
/usr/bin/docker-current: error while loading shared libraries:
libseccomp.so.2: cannot open shared object file: No such file or directory
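
To see which libraries a bind-mounted binary cannot resolve, `ldd` can be
used. A minimal sketch follows; the function name `missing_shared_libs` is
made up for illustration, and it assumes `ldd` and `awk` are present in the
node image.

```shell
#!/bin/bash
# Sketch only: list shared libraries that a binary cannot resolve.
# missing_shared_libs is a hypothetical helper name.
missing_shared_libs() {
    local binary="${1:-/usr/bin/docker-current}"
    # ldd prints "libfoo.so.N => not found" for each unresolved library;
    # keep only the library name from those lines.
    ldd "$binary" 2>/dev/null | awk '/not found/ {print $1}'
}

# Example: run this inside the node container, not on the host.
#   missing_shared_libs /usr/bin/docker-current
```

An empty result means every library resolved in that environment.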


Re: Atomic Centos, can't upgrade

2016-07-12 Thread Scott Dodson
Maybe this is another bug? Can you exec into your node container and
try to run `docker info` and see what errors it yields?
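
A sketch of that check as a small helper. The container name `origin-node`
is an assumption (substitute whatever `docker ps` shows for your node), and
`check_docker_in_node` is a hypothetical function name.

```shell
#!/bin/bash
# Sketch only: run `docker info` inside the node container and show errors.
# The default container name "origin-node" is an assumption.
check_docker_in_node() {
    local container="${1:-origin-node}"
    local output
    # Capture stderr as well, since loader errors are printed there.
    if output="$(docker exec "$container" docker info 2>&1)"; then
        echo "docker info works inside ${container}"
    else
        echo "docker info failed inside ${container}:"
        echo "$output"
        return 1
    fi
}

# Example (run on the host):
#   check_docker_in_node origin-node
```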

On Tue, Jul 12, 2016 at 2:13 PM, Philippe Lafoucrière
 wrote:
>
> On Tue, Jul 12, 2016 at 10:27 AM, Scott Dodson  wrote:
>>
>> Lets say openshift-ansible for now.
>
>
> ok thanks
>
>>
>> I suspect that adding `-v /etc/sysconfig/docker:/etc/sysconfig/docker`
>> to ExecStart in /etc/systemd/system/origin-node.service will fix this,
>> also verify that you've got `-v
>> /usr/bin/docker-current:/usr/bin/docker-current` too but the current
>> installer should take care of that. After you've added that `systemctl
>> daemon-reload && systemctl restart origin-node`
>
>
> I think that's exactly what
> https://github.com/openshift/openshift-ansible/pull/2037 is supposed to do,
> and while it was allowing the previous atomic update, it's not working this
> time :(
>
> It's already there on our nodes:
>
>  -bash-4.2# cat /etc/sysconfig/origin-node-dep
> DOCKER_ADDTL_BIND_MOUNTS=--volume=/usr/bin/docker-current:/usr/bin/docker-current:ro
> --volume=/etc/sysconfig/docker:/etc/sysconfig/docker:ro
>
>
>>
>> This is actually fixed in the v1.3 images because docker runs chroot
>> /rootfs.
>
>
> yes, but 1.3 is still Alpha, we can't install it on our production clusters.
>
> Thanks,
> Philippe



Re: Atomic Centos, can't upgrade

2016-07-12 Thread Scott Dodson
Let's say openshift-ansible for now.

I suspect that adding `-v /etc/sysconfig/docker:/etc/sysconfig/docker`
to ExecStart in /etc/systemd/system/origin-node.service will fix this.
Also verify that you've got `-v
/usr/bin/docker-current:/usr/bin/docker-current` too, though the current
installer should take care of that. After you've added that, run `systemctl
daemon-reload && systemctl restart origin-node`.
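
A minimal sketch for checking whether both bind mounts are already present
before editing anything. The function name `missing_docker_mounts` is made
up for illustration; it simply greps a file for the `path:path` pairs named
above, so it matches both the `-v` and `--volume=` spellings, with or without
a trailing `:ro`.

```shell
#!/bin/bash
# Sketch only: report which of the two bind mounts are absent from a file.
# missing_docker_mounts is a hypothetical helper name.
missing_docker_mounts() {
    local file="${1:-/etc/systemd/system/origin-node.service}"
    local path missing=0
    for path in /etc/sysconfig/docker /usr/bin/docker-current; do
        # Look for "src:dst" where both sides are the same host path.
        if ! grep -qF -- "${path}:${path}" "$file"; then
            echo "missing: ${path}"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   missing_docker_mounts /etc/systemd/system/origin-node.service
#   # after fixing ExecStart:
#   systemctl daemon-reload && systemctl restart origin-node
```

The same check can be pointed at any other file that carries the mount
options for the node service.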

This is actually fixed in the v1.3 images because docker runs chroot /rootfs.

On Mon, Jul 11, 2016 at 1:47 PM, Philippe Lafoucrière
 wrote:
>
> On Mon, Jul 11, 2016 at 9:56 AM, Scott Dodson  wrote:
>>
>> That commit is mostly related to the fact that we cannot
>> upgrade/downgrade docker on atomic host like can on RHEL so abort the
>> docker upgrade playbook early.
>
>
> Ok, I get it now, thanks.
>
> Anyway, we couldn't fix our beta cluster, and had to restore snapshots, as
> nothing was deploying anymore ("Failed to setup network for pod [...]").
> Even with the latest version of the playbook :(
> Should I open an issue in openshift, or openshift-ansible project for that?
>
> Thanks
> Philippe
>



Re: Atomic Centos, can't upgrade

2016-07-11 Thread Philippe Lafoucrière
On Mon, Jul 11, 2016 at 9:56 AM, Scott Dodson  wrote:

> That commit is mostly related to the fact that we cannot
> upgrade/downgrade docker on atomic host like can on RHEL so abort the
> docker upgrade playbook early.
>

Ok, I get it now, thanks.

Anyway, we couldn't fix our beta cluster, and had to restore snapshots, as
nothing was deploying anymore ("Failed to setup network for pod [...]").
Even with the latest version of the playbook :(
Should I open an issue in openshift, or openshift-ansible project for that?

Thanks
Philippe


Re: Atomic Centos, can't upgrade

2016-07-11 Thread Scott Dodson
That commit mostly reflects the fact that we cannot upgrade/downgrade
docker on atomic host like we can on RHEL, so we abort the docker upgrade
playbook early.

On Sun, Jul 10, 2016 at 2:45 PM, Philippe Lafoucrière
 wrote:
> Sounds like docker 1.10 is a bad idea, I found this commit:
>
> https://github.com/openshift/openshift-ansible/commit/b377f9d85df11c532281c213eda1869596642204
>
> I was probably running openshift-ansible with a wrong tag :(



Re: Atomic Centos, can't upgrade

2016-07-10 Thread Philippe Lafoucrière
Sounds like docker 1.10 is a bad idea, I found this commit:

https://github.com/openshift/openshift-ansible/commit/b377f9d85df11c532281c213eda1869596642204

I was probably running openshift-ansible with a wrong tag :(


Re: Atomic Centos, can't upgrade

2016-07-09 Thread Philippe Lafoucrière
We have updated our beta cluster to latest atomic centos:

-bash-4.2# atomic host status
  TIMESTAMP (UTC)      VERSION     ID          OSNAME              REFSPEC
* 2016-07-07 21:23:41  7.20160707  cd47a72eb5  centos-atomic-host  centos-atomic-host:centos-atomic-host/7/x86_64/standard
  2016-06-10 13:15:00  7.20160610  3c3786d1dd  centos-atomic-host  centos-atomic-host:centos-atomic-host/7/x86_64/standard

GPG: Found 1 signature on the booted deployment (*):

  Signature made Thu Jul  7 23:34:40 2016 using RSA key ID F17E745691BA8335
  Good signature from "CentOS Atomic SIG "


And the problem re-appeared:

Jul 10 01:40:08 atomic-test-node-1 origin-node[3150]: I0710 01:40:08.000177
   3201 manager.go:1400] Container
"0cf256d23de1b837a295233491e6650c90519fa2d0807d37f95a8164a842257b
gemnasium-enterprise/gemnasium-enterprise-7-8unp4" exited after 121.527557ms
Jul 10 01:40:08 atomic-test-node-1.priv.tech-angels.net origin-node[3150]:
E0710 01:40:08.000243    3201 pod_workers.go:138] Error syncing pod
6e4dd3f7-462d-11e6-89a2-005056b17dcc, skipping: failed to "SetupNetwork"
for "gemnasium-enterprise-7-8unp4_gemnasium-enterprise" with
SetupNetworkError: "Failed to setup network for pod
\"gemnasium-enterprise-7-8unp4_gemnasium-enterprise(6e4dd3f7-462d-11e6-89a2-005056b17dcc)\"
using network plugins \"redhat/openshift-ovs-multitenant\": exit status 1;
Skipping pod"
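
Log lines like the ones above can be filtered to get a list of affected
pods. A minimal sketch, assuming the node logs to the journal under the
`origin-node` unit and that each journal entry is on a single line (the
wrapping above comes from the mail client). The function name
`failed_network_pods` is hypothetical.

```shell
#!/bin/bash
# Sketch only: print each distinct pod that hit a SetupNetwork failure.
# Reads node log lines on stdin; failed_network_pods is a hypothetical name.
failed_network_pods() {
    # Pull out the 'failed to "SetupNetwork" for "<pod>"' fragment,
    # strip everything except the pod name, then de-duplicate.
    grep -o 'failed to "SetupNetwork" for "[^"]*"' \
        | sed 's/.*for "\(.*\)"/\1/' \
        | sort -u
}

# Example:
#   journalctl -u origin-node --no-pager | failed_network_pods
```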


Running the playbook doesn't seem to fix the problem this time. I see that
docker has been updated to 1.10; could that be the issue?
Thanks
Philippe


Re: Atomic Centos, can't upgrade

2016-06-17 Thread Philippe Lafoucrière
Thanks Tobias for the detailed help!
I should have thought of running ansible again; I was focused on the error.


Re: Atomic Centos, can't upgrade

2016-06-17 Thread Tobias Florek
Hi,

That's a known problem with a known fix, but some publicity around
it might be good.


> We have tried to update our atomic host centos 7, with the
> tree 3c3786d1dd (from the tree e39c28570a), but deployments are all
> failing after the updates on the nodes:
> 
> Error syncing pod, skipping: failed to "SetupNetwork" for "some_deploy"
> with SetupNetworkError: "Failed to setup network for pod
> \"some_deploy(d080f8d4-3498-11e6-8512-005056b1755a)\" using network
> plugins \"redhat/openshift-ovs-subnet\": exit status 1; Skipping pod"


It's a problem that is fixed by either
 * reprovisioning via openshift-ansible, or
 * upgrading to openshift v1.3 (alpha or latest).

The problem is that, due to the docker-current/docker-latest diversion,
/usr/bin/docker is a shell script that needs /etc/sysconfig/docker, but
that file is not mounted by the old origin-node systemd unit.

See
 https://github.com/openshift/openshift-ansible/pull/2037
for the fix for origin pre 1.3. Origin v1.3 will run docker chrooted to
the host fs, so the problem does not manifest itself.
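
To illustrate why the missing mount matters, here is a simplified model of
such a wrapper. This is not the actual /usr/bin/docker script: the variable
name `DOCKERBINARY`, the fallback to docker-current, and the function name
`pick_docker_binary` are all assumptions made for the sake of the example.

```shell
#!/bin/bash
# Sketch only: a simplified, hypothetical model of a docker wrapper script.
# DOCKERBINARY is an assumed variable name; the real script may differ.
pick_docker_binary() {
    local config="${1:-/etc/sysconfig/docker}"
    # Without the config file the wrapper has nothing to read, which is
    # the situation inside a container that does not mount it.
    if [ ! -r "$config" ]; then
        echo "cannot read $config" >&2
        return 1
    fi
    (
        # Source the config in a subshell so its variables do not leak.
        . "$config"
        echo "${DOCKERBINARY:-/usr/bin/docker-current}"
    )
}

# Example:
#   pick_docker_binary /etc/sysconfig/docker
```

In this model, an unreadable config file is an immediate failure, which is
why the fix is to mount the file into the node container.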

Cheers,
 Tobias Florek
