Issues running 'cotd' demo on Ubuntu Linux 16.04

2017-09-05 Thread Benjamin
Hi there,

I am trying to follow the book "DevOps with OpenShift" and am having some
trouble when I get to the point of launching the 'cotd' container demo,
specifically this command:

oc new-app --name='cotd' --labels name='cotd' \
  php~https://github.com/devops-with-openshift/cotd.git -e SELECTOR=cats

It appears to spool up and build okay, but then fails in a crash loop:

AH00558: httpd: Could not reliably determine the server's fully qualified
domain name, using 172.17.0.2. Set the 'ServerName' directive globally to
suppress this message
(13)Permission denied: AH00058: Error retrieving pid file
/opt/rh/httpd24/root/var/run/httpd/httpd.pid
AH00059: Remove it before continuing if it is corrupted.

I'm running Ubuntu 16.04 64-bit, fully up to date including the kernel,
with the latest stable oc binary (Origin v3.6.0) and Docker installed from
its official repository. OpenShift itself otherwise seems to work fine.
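
One thing I plan to try (just a hunch on my part, not a confirmed
diagnosis): Ubuntu's default Docker storage driver (aufs) may be
mishandling the file permissions the httpd24 image expects, which would
explain the pid-file "Permission denied". Switching Docker to overlay2 in
/etc/docker/daemon.json and restarting Docker might be worth a shot:

```json
{
  "storage-driver": "overlay2"
}
```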

FWIW, I've used the exact same book, commands, and process to successfully
get the demo running on two different Macs, so this looks like an
Ubuntu-specific issue. I would switch to Fedora, but my workstation is
required to run Ubuntu for various annoying reasons, and I'd like to get
this working there since it has 32GB of RAM, 2-4x more than any of my
other machines!

Version information follows:

foo@bar:~$ oc version
oc v3.6.0+c4dd4cf
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://127.0.0.1:8443
openshift v3.6.0+c4dd4cf
kubernetes v1.6.1+5115d708d7

foo@bar:~$ docker -v
Docker version 17.06.1-ce, build 874a737

foo@bar:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial

foo@bar:~$ uname -a
Linux bar 4.4.0-93-generic #116-Ubuntu SMP Fri Aug 11 21:17:51 UTC 2017
x86_64 x86_64 x86_64 GNU/Linux
___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users


Re: Let's Encrypt certificates

2017-09-05 Thread Mateus Caruccio
Hey.
How ready is this for production use? Are there any plans to change the
interfaces/mechanisms in the near future?

Thanks, great job! ;)


Re: Let's Encrypt certificates

2017-09-05 Thread Tim Dudgeon

Tomas

Thanks, that helped.

The problem was that it wasn't clear that you needed to install into a 
new project, and then update the


oc adm policy add-cluster-role-to-user acme-controller 
system:serviceaccount:acme:default


command and replace acme with the name of the project. Once done it 
installs fine and issues certificates as described.
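
In other words, the namespace embedded in the service-account name has to
match the project the controller actually runs in. A minimal sketch (the
project name "acme-controller" below is just an example, substitute your
own):

```shell
# The SA name embeds the namespace: system:serviceaccount:<project>:<sa>.
PROJECT=acme-controller
SA="system:serviceaccount:${PROJECT}:default"
# Then grant the role with:
#   oc adm policy add-cluster-role-to-user acme-controller "$SA"
echo "$SA"
```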


Thanks
Tim



Re: Let's Encrypt certificates

2017-09-05 Thread Tomas Nozicka
Hi Tim,

(see inline...)

On Tue, 2017-09-05 at 17:12 +0100, Tim Dudgeon wrote:
> Thanks.
> 
> I'm having problems getting this running.
> When I deploy the deploymentconfig the pod fails to start and the
> logs 
> contain these errors:
> 
> > 2017-09-05T16:03:11.764025351Z  ERROR cmd.go:138 Unable to
> > bootstrap 
> > certificate database: 'User
> 
> and
> > 2017-09-05T16:03:11.766213869Z  ERROR cmd.go:173 Couln't
> > initialize 
> > RouteController: 'RouteController could not find its own service: 
> > 'User "system:serviceaccount:acme-controller:default" cannot get 
> > services in project "acme-controller"''
misconfigured SA is system:serviceaccount:acme-controller:default
- notably the namespace is **acme-controller**

> 
> I already deployed the clusterrole and executed
> 
> > oc adm policy add-cluster-role-to-user acme-controller 
> > system:serviceaccount:acme:default
> 
> Even tried as suggested:
> 
> > oc adm policy add-cluster-role-to-user cluster-admin 
> > system:serviceaccount:acme:default
You are modifying SA in namespace **acme** not **acme-controller**

> 
> I tried this in the default project and in a new acme-controller
> project.
> 
> Could you help describe steps to get this running in a new openshift 
> environment?
Try looking at the exact steps our CI is using to create it from
scratch but it should work as described in our docs.

  https://github.com/tnozicka/openshift-acme/blob/master/.travis.yml#L67-L73

> 
> Thanks
> Tim
> 

Re: Let's Encrypt certificates

2017-09-05 Thread Tim Dudgeon

Thanks.

I'm having problems getting this running.
When I deploy the deploymentconfig the pod fails to start and the logs 
contain these errors:


2017-09-05T16:03:11.764025351Z  ERROR cmd.go:138 Unable to bootstrap 
certificate database: 'User

and
2017-09-05T16:03:11.766213869Z  ERROR cmd.go:173 Couln't initialize 
RouteController: 'RouteController could not find its own service: 
'User "system:serviceaccount:acme-controller:default" cannot get 
services in project "acme-controller"''

I already deployed the clusterrole and executed

oc adm policy add-cluster-role-to-user acme-controller 
system:serviceaccount:acme:default

Even tried as suggested:

oc adm policy add-cluster-role-to-user cluster-admin 
system:serviceaccount:acme:default

I tried this in the default project and in a new acme-controller project.

Could you help describe steps to get this running in a new openshift 
environment?


Thanks
Tim



On 04/09/2017 09:44, Tomas Nozicka wrote:

Hi Tim,

On Mon, 2017-09-04 at 09:16 +0100, Tim Dudgeon wrote:

Tomas

Thanks for that. Looks very interesting.

I've looked it over and not totally sure how to use this.

Am I right that if this controller is deployed and running correctly, then
all you need to do for any route is add the 'kubernetes.io/tls-acme:
"true"' annotation to it, and the controller will handle creating the
initial certificate and renewing it as needed?

Correct.
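
For reference, a route opted in to the controller looks something like
this (the name, service, and host below are made up for illustration):

```yaml
apiVersion: v1
kind: Route
metadata:
  name: myapp                          # illustrative
  annotations:
    kubernetes.io/tls-acme: "true"     # opt this route in to openshift-acme
spec:
  host: myapp.example.com              # illustrative
  to:
    kind: Service
    name: myapp
```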


And in doing so it will generate/renew the certificate for the hostname,
add/update it as a secret, and update the route definition to use that
certificate?

For Routes it will generate a secret with that certificate and also
inline it into the Route as it doesn't support referencing it.
(Ingresses do, but the project doesn't support those yet.) The secret
can be useful for checking or mounting it into pods directly if you
don't want to terminate your TLS in the router but in pods.


And this will only apply to external routes? Some mechanism, such as the
Ansible playbook, will still be required to maintain the certificates that
are used internally by the OpenShift infrastructure?

I have some thoughts on this but no code :/

As I said at this point you need to bootstrap the infra using your own
CA/self-signed cert and then you can expose the OpenShift API + web
console using a Route. This should work fine even for 'oc' client
unless the Router is down and you need to fix it. For that rare case,
when only the admin will need to log in to fix the router he can use
the internal cert or ssh into the cluster directly.

So this hack should cover all the use cases for users except this
special case for an admin.


Thanks
Tim

On 25/08/2017 17:09, Tomas Nozicka wrote:

Hi Tim,

there is a controller to take care about generating and renewing
Let's
Encrypt certificates for you.

https://github.com/tnozicka/openshift-acme

That said it won't generate it for masters but you can expose
master
API using Route and certificate for that Route would be fully
managed
by openshift-acme.

Further integrations might be possible in future but this is how
you
can get it done now.

Regards,
Tomas


On Fri, 2017-08-25 at 16:27 +0100, Tim Dudgeon wrote:

Does anyone have any experience with how best to use Let's Encrypt
certificates for an OpenShift Origin cluster?

In one sense this is simple. The Ansible installer can be configured to
use a custom certificate and key to sign all the certificates it
generates, and doing so ensures you don't get the dreaded "This site is
insecure" messages from your browser. And there is a playbook for updating
certificates (which is essential, as Let's Encrypt certificates are
short-lived), so this must be automated.

But how best to set this up and automate the certificate generation and
renewal?

Let's assume Ansible is being run from a separate machine that is not part
of the cluster and needs to deploy those custom certificates to the
master(s). The certificate needs to be present on the ansible machine but
needs to apply to the master(s) (or load balancer?). So you can't just
generate the certificate on the ansible machine (e.g. using the
--standalone option for certbot) as it would not be for the right machine.

Similarly, it doesn't seem right to request and update the certificates on
the master (which master, in the case of multiple masters?), and those
certificates need to be present on the ansible machine.

It seems like the answer might be to run a process on the ansible machine
that requests the certificates using the webroot plugin, which in doing so
places the magical key used to verify ownership of the domain under the
https://your.site.com/.well-known/acme-challenge location? But how to go
about doing this? Ports 80 and 443 seem to be in use on the cluster, but
not serving up any particular content. How do I place the content there?

I'm hoping others have already needed to handle this problem and can point
to some best practice.

Thanks
Tim



Re: Metrics not accessible

2017-09-05 Thread Tim Dudgeon

Still no joy with this.
I retried with the latest code and still hitting the same problem.
Metrics does not seem to be working with a new Ansible install.
I'm using a minimal setup with an inventory like this:


[OSEv3:children]
masters
nodes
etcd
nfs

[OSEv3:vars]
ansible_ssh_user=centos
ansible_become=yes

openshift_deployment_type=origin
openshift_release=v3.6

openshift_disable_check=disk_availability,docker_storage,memory_availability

openshift_hosted_metrics_deploy=true
openshift_hosted_metrics_storage_kind=nfs
openshift_hosted_metrics_storage_access_modes=['ReadWriteOnce']
openshift_hosted_metrics_storage_nfs_directory=/exports
openshift_hosted_metrics_storage_nfs_options='*(rw,root_squash)'
openshift_hosted_metrics_storage_volume_name=metrics
openshift_hosted_metrics_storage_volume_size=10Gi
openshift_hosted_metrics_storage_labels={'storage': 'metrics'}

[masters]
ip-10-0-113-31.eu-west-1.compute.internal

[etcd]
ip-10-0-113-31.eu-west-1.compute.internal

[nfs]
ip-10-0-113-31.eu-west-1.compute.internal

[nodes]
ip-10-0-113-31.eu-west-1.compute.internal 
openshift_node_labels="{'region': 'infra','zone': 'default'}" 
openshift_schedulable=true


When the install completes, the pods in the openshift-infra project end up
like this:



NAME                         READY   STATUS             RESTARTS   AGE
hawkular-cassandra-1-4m7lq   1/1     Running            0          16m
hawkular-metrics-0nl1q       0/1     CrashLoopBackOff   7          16m
heapster-cgw0b               0/1     Running            1          16m


The hawkular-metrics pod is failing, and it looks like it's because it
can't connect to the cassandra pod.
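
To pick the failing pod out of a listing like the above without eyeballing
it, you can filter on the STATUS column (the listing is saved to a file
here just for the example; normally you'd pipe `oc get pods -n
openshift-infra` instead):

```shell
# Filter a saved pod listing for crash-looping pods (column 3 = STATUS)
cat <<'EOF' > pods.txt
hawkular-cassandra-1-4m7lq   1/1   Running            0   16m
hawkular-metrics-0nl1q       0/1   CrashLoopBackOff   7   16m
heapster-cgw0b               0/1   Running            1   16m
EOF
awk '$3 == "CrashLoopBackOff" {print $1}' pods.txt
# → hawkular-metrics-0nl1q
```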

The full log of the hawkular-metrics pod is here:
https://gist.github.com/tdudgeon/f3099911eed441817369ee03635aad7d

Any help resolving this would be appreciated.

Tim







Re: oc -w timeout

2017-09-05 Thread Mateus Caruccio
Thanks a lot! It would have taken me forever to realize the masters are
behind an ELB ;)
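
For anyone finding this later: with a classic AWS ELB the relevant knob is
the idle timeout (default 60s, maximum 3600s), set via the
ConnectionSettings attribute passed to `aws elb
modify-load-balancer-attributes`:

```json
{"ConnectionSettings": {"IdleTimeout": 3600}}
```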

Best

--
Mateus Caruccio / Master of Puppets
GetupCloud.com
We make the infrastructure invisible
Gartner Cool Vendor 2017

2017-09-05 9:44 GMT-03:00 Philippe Lafoucrière <
philippe.lafoucri...@tech-angels.com>:

> Hi,
>
> You might want to take a look at this thread:
> https://lists.openshift.redhat.com/openshift-archives/users/2017-June/msg00135.html
>
> Cheers
>


Re: oc -w timeout

2017-09-05 Thread Philippe Lafoucrière
Hi,

You might want to take a look at this thread:
https://lists.openshift.redhat.com/openshift-archives/users/2017-June/msg00135.html
Cheers


oc -w timeout

2017-09-05 Thread Mateus Caruccio
Hi there.
Where is the setting to change the timeout of watch operations? I'm
getting disconnected after 5 minutes and would like to increase this value.

--
Mateus Caruccio / Master of Puppets
GetupCloud.com
We make the infrastructure invisible
Gartner Cool Vendor 2017