Re: Postgres WAS Atomic Centos, can't upgrade

2016-07-19 Thread Josh Berkus
On 07/18/2016 07:07 PM, Philippe Lafoucrière wrote:
> We're using postgres 9.5.
> It was working fine before the upgrade. Unfortunately, we upgraded
> atomic AND Openshift at the same time, so I can't tell if it's a problem
> with docker 1.10 or openshift 1.2.1.
> I'd tend to say Docker 1.10, but we need to isolate this first.

Is this the official Postgres docker hub image, or something else?

I'm asking because you shouldn't be getting that particular error; it's
something we resolved a while ago (in the Postgres project).

-- 
--
Josh Berkus
Project Atomic
Red Hat OSAS

___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


Re: Atomic Centos, can't upgrade

2016-07-18 Thread Philippe Lafoucrière
We're using postgres 9.5.
It was working fine before the upgrade. Unfortunately, we upgraded atomic
AND Openshift at the same time, so I can't tell if it's a problem with
docker 1.10 or openshift 1.2.1.
I'd tend to say Docker 1.10, but we need to isolate this first.

Thanks,
Philippe


Re: Atomic Centos, can't upgrade

2016-07-18 Thread Scott Dodson
Sorry, wrong thread, disregard my reply.

On Mon, Jul 18, 2016 at 12:59 PM, Scott Dodson wrote:
> We've worked around this change by checking for those booleans and
> setting each if they exist.
> https://github.com/openshift/openshift-ansible/pull/2166
>
> On Mon, Jul 18, 2016 at 12:56 PM, Josh Berkus wrote:
>> On 07/15/2016 08:59 AM, Philippe Lafoucrière wrote:
>>> We're having a potential issue. One postgresql service is not starting
>>> on the beta cluster:
>>>
>>> FATAL:  could not open shared memory segment "/PostgreSQL.1804289383":
>>> Permission denied
>>>
>>> We need to investigate that, but it could be related to docker mounts
>>> (especially /dev/shm)
>>
>> This is due to a change in Docker, lemme see if I can find docs on it.
>> I know that Docker added constraints on dynamic shared memory at some point.
>>
>>
>> --
>> --
>> Josh Berkus
>> Project Atomic
>> Red Hat OSAS


Re: Atomic Centos, can't upgrade

2016-07-18 Thread Josh Berkus
On 07/18/2016 09:59 AM, Scott Dodson wrote:
> We've worked around this change by checking for those booleans and
> setting each if they exist.
> https://github.com/openshift/openshift-ansible/pull/2166

What version of Postgres is in this container?

-- 
--
Josh Berkus
Project Atomic
Red Hat OSAS


Re: Atomic Centos, can't upgrade

2016-07-18 Thread Scott Dodson
We've worked around this change by checking for those booleans and
setting each if they exist.
https://github.com/openshift/openshift-ansible/pull/2166

On Mon, Jul 18, 2016 at 12:56 PM, Josh Berkus wrote:
> On 07/15/2016 08:59 AM, Philippe Lafoucrière wrote:
>> We're having a potential issue. One postgresql service is not starting
>> on the beta cluster:
>>
>> FATAL:  could not open shared memory segment "/PostgreSQL.1804289383":
>> Permission denied
>>
>> We need to investigate that, but it could be related to docker mounts
>> (especially /dev/shm)
>
> This is due to a change in Docker, lemme see if I can find docs on it.
> I know that Docker added constraints on dynamic shared memory at some point.
>
>
> --
> --
> Josh Berkus
> Project Atomic
> Red Hat OSAS


Re: Atomic Centos, can't upgrade

2016-07-15 Thread Colin Walters
On Mon, Jul 11, 2016, at 09:56 AM, Scott Dodson wrote:
> That commit is mostly related to the fact that we cannot
> upgrade/downgrade docker on atomic host like we can on RHEL, so we abort
> the docker upgrade playbook early.

For short-term fixes, it is however possible to use `atomic host deploy` to
reset to an earlier known version.  But it's not a long-term solution because
that also means one isn't getting kernel security updates and such.
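
For illustration, a version to go back to can be picked out of `atomic host
status` output with a small filter. This is only a sketch under assumptions:
`previous_versions` is a hypothetical helper, it expects the column layout of
the status output posted in this thread (date, time, version, ...), and the
sample text is copied from that output.

```shell
#!/bin/sh
# Hypothetical helper: read `atomic host status` text on stdin and print the
# VERSION column of every deployment that is not currently booted.
# The booted deployment starts with "*", so its first field is not a date.
previous_versions() {
    awk '$1 ~ /^[0-9]+-[0-9]+-[0-9]+$/ { print $3 }'
}

# Sample status text taken from this thread (REFSPEC column omitted).
previous_versions <<'EOF'
  TIMESTAMP (UTC)         VERSION      ID             OSNAME
* 2016-07-07 21:23:41     7.20160707   cd47a72eb5     centos-atomic-host
  2016-06-10 13:15:00     7.20160610   3c3786d1dd     centos-atomic-host
EOF
# The printed version is a candidate argument for `atomic host deploy`;
# consult that command's help on your host for the arguments it accepts.
```

With the sample above this prints 7.20160610, the deployment from before the
update. As noted, staying on an old tree is a stopgap only.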

We're working on new mechanisms addressing the privileged/system container case.


Re: Atomic Centos, can't upgrade

2016-07-15 Thread Philippe Lafoucrière
Following the instructions at
https://docs.openshift.org/latest/dev_guide/shared_memory.html
fixed the issue, but it seems something changed regarding the /dev or shm
docker mounts between 1.2.0 and 1.2.1.
Can someone confirm?
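
One quick way to narrow this down is to check, from inside the affected
container, whether files can be created under /dev/shm at all, since on Linux
that is where POSIX shared memory segments such as "/PostgreSQL.1804289383"
are created. The helper below is a hypothetical sketch for that check; it is
not part of Docker, OpenShift or Postgres.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the current user can create (and remove)
# a file in the given directory, which defaults to /dev/shm.
can_write_shm() {
    shm_dir=${1:-/dev/shm}
    probe="$shm_dir/shm-probe.$$"
    # The subshell keeps a failed redirection from aborting the caller.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

if can_write_shm; then
    echo "/dev/shm is writable"
else
    echo "/dev/shm is NOT writable (compare the Permission denied error)"
fi
```

A failure here points at the mount or its permissions rather than at
Postgres itself.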


Re: Atomic Centos, can't upgrade

2016-07-15 Thread Philippe Lafoucrière
I confirm: it's fixed :)
thanks!


Re: Atomic Centos, can't upgrade

2016-07-14 Thread Scott Dodson
We pulled that into v1.2.1 along with the security update. Can you
give that a try?

On Thu, Jul 14, 2016 at 11:11 AM, Philippe Lafoucrière wrote:
>
> On Tue, Jul 12, 2016 at 5:22 PM, Scott Dodson wrote:
>>
>> I'll see if I can get openshift/node:v1.2.0 rebuilt with this fix but
>> you can also rebuild the node image placing the docker wrapper script
>> in /usr/local/bin
>
>
> Any news on this?
> Thanks
>


Re: Atomic Centos, can't upgrade

2016-07-14 Thread Philippe Lafoucrière
On Tue, Jul 12, 2016 at 5:22 PM, Scott Dodson wrote:

> I'll see if I can get openshift/node:v1.2.0 rebuilt with this fix but
> you can also rebuild the node image placing the docker wrapper script
> in /usr/local/bin
>

Any news on this?
Thanks


Re: Atomic Centos, can't upgrade

2016-07-12 Thread Philippe Lafoucrière
Testing it right away.
Thanks guys :)


Re: Atomic Centos, can't upgrade

2016-07-12 Thread Scott Dodson
https://github.com/openshift/origin/pull/9046 is the real fix for this.

I'll see if I can get openshift/node:v1.2.0 rebuilt with this fix but
you can also rebuild the node image placing the docker wrapper script
in /usr/local/bin

On Tue, Jul 12, 2016 at 4:36 PM, Philippe Lafoucrière wrote:
> Good catch Scott:
>
>
> [plafoucriere@atomic-test-node-1 origin]# docker info
> /usr/bin/docker-current: error while loading shared libraries:
> libseccomp.so.2: cannot open shared object file: No such file or directory


Re: Atomic Centos, can't upgrade

2016-07-12 Thread Philippe Lafoucrière
Good catch Scott:


[plafoucriere@atomic-test-node-1 origin]# docker info
/usr/bin/docker-current: error while loading shared libraries:
libseccomp.so.2: cannot open shared object file: No such file or directory


Re: Atomic Centos, can't upgrade

2016-07-12 Thread Scott Dodson
Maybe this is another bug? Can you exec into your node container and
try to run `docker info` and see what errors it yields?

On Tue, Jul 12, 2016 at 2:13 PM, Philippe Lafoucrière wrote:
>
> On Tue, Jul 12, 2016 at 10:27 AM, Scott Dodson wrote:
>>
>> Let's say openshift-ansible for now.
>
>
> ok thanks
>
>>
>> I suspect that adding `-v /etc/sysconfig/docker:/etc/sysconfig/docker`
>> to ExecStart in /etc/systemd/system/origin-node.service will fix this.
>> Also verify that you've got `-v
>> /usr/bin/docker-current:/usr/bin/docker-current` too, but the current
>> installer should take care of that. After you've added that, run
>> `systemctl daemon-reload && systemctl restart origin-node`.
>
>
> I think that's exactly what
> https://github.com/openshift/openshift-ansible/pull/2037 is supposed to do,
> and while it was allowing the previous atomic update, it's not working this
> time :(
>
> It's already there on our nodes:
>
>  -bash-4.2# cat /etc/sysconfig/origin-node-dep
> DOCKER_ADDTL_BIND_MOUNTS=--volume=/usr/bin/docker-current:/usr/bin/docker-current:ro
> --volume=/etc/sysconfig/docker:/etc/sysconfig/docker:ro
>
>
>>
>> This is actually fixed in the v1.3 images because docker runs chroot
>> /rootfs.
>
>
> yes, but 1.3 is still Alpha, we can't install it on our production clusters.
>
> Thanks,
> Philippe


Re: Atomic Centos, can't upgrade

2016-07-12 Thread Scott Dodson
Let's say openshift-ansible for now.

I suspect that adding `-v /etc/sysconfig/docker:/etc/sysconfig/docker`
to ExecStart in /etc/systemd/system/origin-node.service will fix this.
Also verify that you've got `-v
/usr/bin/docker-current:/usr/bin/docker-current` too, but the current
installer should take care of that. After you've added that, run
`systemctl daemon-reload && systemctl restart origin-node`.

This is actually fixed in the v1.3 images because docker runs chroot /rootfs.
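
The check described above could be scripted roughly as follows. This is a
sketch, not something the installer ships: `unit_mounts_path` is a
hypothetical helper, and it assumes the bind mounts appear literally as
`path:path` on the ExecStart line (or its backslash-continued lines) of the
unit file.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the ExecStart command in a unit file
# bind-mounts host_path at the same path inside the container.
unit_mounts_path() {
    unit_file=$1
    host_path=$2
    # Print the ExecStart line and any backslash-continued lines after it,
    # then look for the literal "path:path" mount specification.
    awk '/^ExecStart=/ { grab = 1 }
         grab { print; if ($0 !~ /\\$/) grab = 0 }' "$unit_file" |
        grep -qF -- "$host_path:$host_path"
}

unit=/etc/systemd/system/origin-node.service
if [ -r "$unit" ]; then
    for p in /etc/sysconfig/docker /usr/bin/docker-current; do
        unit_mounts_path "$unit" "$p" || echo "missing mount for $p in $unit"
    done
    # After editing the unit by hand, reload and restart:
    #   systemctl daemon-reload && systemctl restart origin-node
fi
```

The script only reports; editing the unit and restarting the node is left as
a manual step so nothing is restarted by accident.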

On Mon, Jul 11, 2016 at 1:47 PM, Philippe Lafoucrière wrote:
>
> On Mon, Jul 11, 2016 at 9:56 AM, Scott Dodson wrote:
>>
>> That commit is mostly related to the fact that we cannot
>> upgrade/downgrade docker on atomic host like we can on RHEL, so we abort
>> the docker upgrade playbook early.
>
>
> Ok, I get it now, thanks.
>
> Anyway, we couldn't fix our beta cluster, and had to restore snapshots, as
> nothing was deploying anymore ("Failed to setup network for pod [...]").
> Even with the latest version of the playbook :(
> Should I open an issue in openshift, or openshift-ansible project for that?
>
> Thanks
> Philippe
>


Re: Atomic Centos, can't upgrade

2016-07-11 Thread Philippe Lafoucrière
On Mon, Jul 11, 2016 at 9:56 AM, Scott Dodson wrote:

> That commit is mostly related to the fact that we cannot
> upgrade/downgrade docker on atomic host like we can on RHEL, so we abort
> the docker upgrade playbook early.
>

Ok, I get it now, thanks.

Anyway, we couldn't fix our beta cluster, and had to restore snapshots, as
nothing was deploying anymore ("Failed to setup network for pod [...]").
Even with the latest version of the playbook :(
Should I open an issue in the openshift or the openshift-ansible project for
that?

Thanks
Philippe


Re: Atomic Centos, can't upgrade

2016-07-11 Thread Scott Dodson
That commit is mostly related to the fact that we cannot
upgrade/downgrade docker on atomic host like we can on RHEL, so we abort
the docker upgrade playbook early.

On Sun, Jul 10, 2016 at 2:45 PM, Philippe Lafoucrière wrote:
> Sounds like docker 1.10 is a bad idea, I found this commit:
>
> https://github.com/openshift/openshift-ansible/commit/b377f9d85df11c532281c213eda1869596642204
>
> I was probably running openshift-ansible with a wrong tag :(


Re: Atomic Centos, can't upgrade

2016-07-10 Thread Philippe Lafoucrière
Sounds like docker 1.10 is a bad idea, I found this commit:

https://github.com/openshift/openshift-ansible/commit/b377f9d85df11c532281c213eda1869596642204

I was probably running openshift-ansible with a wrong tag :(


Re: Atomic Centos, can't upgrade

2016-07-09 Thread Philippe Lafoucrière
We have updated our beta cluster to the latest atomic centos:

-bash-4.2# atomic host status
  TIMESTAMP (UTC)       VERSION      ID           OSNAME               REFSPEC
* 2016-07-07 21:23:41   7.20160707   cd47a72eb5   centos-atomic-host   centos-atomic-host:centos-atomic-host/7/x86_64/standard
  2016-06-10 13:15:00   7.20160610   3c3786d1dd   centos-atomic-host   centos-atomic-host:centos-atomic-host/7/x86_64/standard

GPG: Found 1 signature on the booted deployment (*):

  Signature made Thu Jul  7 23:34:40 2016 using RSA key ID F17E745691BA8335
  Good signature from "CentOS Atomic SIG "


And the problem re-appeared:

Jul 10 01:40:08 atomic-test-node-1 origin-node[3150]: I0710 01:40:08.000177    3201 manager.go:1400] Container "0cf256d23de1b837a295233491e6650c90519fa2d0807d37f95a8164a842257b gemnasium-enterprise/gemnasium-enterprise-7-8unp4" exited after 121.527557ms
Jul 10 01:40:08 atomic-test-node-1.priv.tech-angels.net origin-node[3150]: E0710 01:40:08.000243    3201 pod_workers.go:138] Error syncing pod 6e4dd3f7-462d-11e6-89a2-005056b17dcc, skipping: failed to "SetupNetwork" for "gemnasium-enterprise-7-8unp4_gemnasium-enterprise" with SetupNetworkError: "Failed to setup network for pod \"gemnasium-enterprise-7-8unp4_gemnasium-enterprise(6e4dd3f7-462d-11e6-89a2-005056b17dcc)\" using network plugins \"redhat/openshift-ovs-multitenant\": exit status 1; Skipping pod"


Running the playbook doesn't seem to fix the problem this time. I see that
docker has been updated to 1.10; could that be the issue?

Thanks
Philippe


Re: Atomic Centos, can't upgrade

2016-06-17 Thread Philippe Lafoucrière
Thanks Tobias for the detailed help!
I should have thought of running ansible again; I was focused on the error.


Re: Atomic Centos, can't upgrade

2016-06-17 Thread Tobias Florek
Hi,

that's a known problem with a known fix, but maybe some publicity around
it might be good.


> We have tried to update our atomic host centos 7, with the
> tree 3c3786d1dd (from the tree e39c28570a), but deployments are all
> failing after the updates on the nodes:
> 
> Error syncing pod, skipping: failed to "SetupNetwork" for "some_deploy"
> with SetupNetworkError: "Failed to setup network for pod
> \"some_deploy(d080f8d4-3498-11e6-8512-005056b1755a)\" using network
> plugins \"redhat/openshift-ovs-subnet\": exit status 1; Skipping pod"


It's a problem that is fixed by either
 * reprovisioning via openshift-ansible, or
 * upgrading to openshift v1.3 (alpha or latest).

The problem is that, due to the docker-current/docker-latest diversion,
/usr/bin/docker is a shell script that needs /etc/sysconfig/docker, but
that file is not mounted by the old origin-node systemd unit.

See
 https://github.com/openshift/openshift-ansible/pull/2037
for the fix for origin pre 1.3. Origin v1.3 will run docker chrooted to
the host fs, so the problem does not manifest itself.
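
As a rough illustration of the mount requirement described above, the
following shell sketch reports which of the two host paths mentioned in this
thread are missing from an option string. The function name
`missing_docker_mounts` is made up for this example and is not part of
openshift-ansible; the sample value is modelled on the
DOCKER_ADDTL_BIND_MOUNTS setting quoted elsewhere in the thread.

```shell
#!/bin/sh
# Sketch only: print each host path that the /usr/bin/docker wrapper needs
# but that is not bind-mounted (at the same path) in the given options.
# missing_docker_mounts is a hypothetical helper, not an OpenShift tool.
missing_docker_mounts() {
    mount_opts=$1
    for needed in /usr/bin/docker-current /etc/sysconfig/docker; do
        case "$mount_opts" in
            *"$needed:$needed"*) ;;          # mounted, nothing to report
            *) printf '%s\n' "$needed" ;;    # absent from the options
        esac
    done
}

# Sample value modelled on the DOCKER_ADDTL_BIND_MOUNTS setting in this thread.
sample_opts='--volume=/usr/bin/docker-current:/usr/bin/docker-current:ro --volume=/etc/sysconfig/docker:/etc/sysconfig/docker:ro'
missing_docker_mounts "$sample_opts"
```

An empty result only shows that the mounts are configured. As the later
libseccomp error in this thread shows, docker inside the node container can
still fail for other reasons.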

Cheers,
 Tobias Florek


Atomic Centos, can't upgrade

2016-06-17 Thread Philippe Lafoucrière
Hi,

We have tried to update our atomic host centos 7, with the tree 3c3786d1dd
(from the tree e39c28570a), but deployments are all failing after the
updates on the nodes:

Error syncing pod, skipping: failed to "SetupNetwork" for "some_deploy"
with SetupNetworkError: "Failed to setup network for pod
\"some_deploy(d080f8d4-3498-11e6-8512-005056b1755a)\" using network plugins
\"redhat/openshift-ovs-subnet\": exit status 1; Skipping pod"

Where can I file a report for that? The OpenShift or the Atomic Host bug
tracker (wherever it is)?

Thanks