Re: Node startup Failure on SDN

2016-08-16 Thread Jonathan Yu
On Aug 15, 2016 11:08, "Skarbek, John" wrote:
>
> So I figured it out. NTP went kaboom on one of our master nodes.
>
> ERROR: [DCli0015 from diagnostic
> ConfigContexts@openshift/origin/pkg/diagnostics/client/config_contexts.go:285]
>    For client config context 'default/cluster:8443/system:admin':
>    The server URL is 'https://cluster:8443'
>    The user authentication is 'system:admin/cluster:8443'
>    The current project is 'default'
>    (*url.Error) Get https://cluster:8443/api: x509: certificate has expired
>    or is not yet valid
>    Diagnostics does not have an explanation for what this means. Please
>    report this error so one can be added.
>
> I ended up finding that the master node clock just…. I have no idea:
>
> [/etc/origin/master]# date
> Wed Feb 14 12:23:13 UTC 2001
>
> I’d like to suggest that diagnostics check the validity dates of all the
> certificates, perhaps do some sort of NTP check, and maybe even go the
> extra mile and compare the time on the server to …life. I have no idea why
> my master node decided to go back to Valentine's Day in 2001. I think I was
> single way back when.

Good idea. At minimum, it seems worthwhile to record the build date for the
binary and check the system clock against that. I think Chrome does
something similar; perhaps figuring out how Chrome handles this is a
reasonable starting point.
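
A minimal sketch of the build-date idea, written in Python for brevity (the diagnostics themselves are Go). `BUILD_DATE`, `clock_is_plausible`, and `check_clock` are hypothetical names, and the sketch assumes a build timestamp is embedded in the binary at compile time. A clock that reads earlier than the build date cannot be right; a clock that runs ahead would need a different check.

```python
from datetime import datetime, timezone

# Hypothetical value: a timestamp stamped into the binary at build time.
BUILD_DATE = datetime(2016, 8, 1, tzinfo=timezone.utc)


def clock_is_plausible(now, build_date=BUILD_DATE):
    """Return False when the clock reads earlier than the build date."""
    return now >= build_date


def check_clock(now=None):
    """Return a warning string if the clock looks wrong, else None."""
    if now is None:
        now = datetime.now(timezone.utc)
    if clock_is_plausible(now):
        return None
    return "system clock reads %s, which is before the build date %s; check NTP" % (
        now.isoformat(),
        BUILD_DATE.isoformat(),
    )


# The clock reading reported in this thread: Wed Feb 14 12:23:13 UTC 2001.
skewed = datetime(2001, 2, 14, 12, 23, 13, tzinfo=timezone.utc)
print(check_clock(skewed))
```

This only catches a clock that has jumped backwards past the build date, which is the failure described in this thread.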
>
>
>
> --
> John Skarbek
>
> On August 15, 2016 at 13:32:13, Skarbek, John (john.skar...@ca.com) wrote:
>>
>> It would appear the certificate is valid until 2018:
>>
>> [/etc/origin/node]# openssl x509 -enddate -in system:node:node-001.crt
>> notAfter=Mar 21 15:18:10 2018 GMT
>>
>> Got any other ideas?
>>
>>
>>
>> --
>> John Skarbek
>>
>> On August 15, 2016 at 13:27:57, Clayton Coleman (ccole...@redhat.com) wrote:
>>>
>>> The node's client certificate may have expired - that's a common failure
>>> mode.
>>>
>>> On Aug 15, 2016, at 1:23 PM, Skarbek, John wrote:
>>>
>>>> Good Morning,
>>>>
>>>> We recently had a node go down. Upon trying to get it back online, the
>>>> origin-node service fails to start. The rest of the cluster appears to be
>>>> just fine, so with the desire to troubleshoot, what can I look at to
>>>> determine the root cause of the following error:
>>>>
>>>> Aug 15 17:12:59 node-001 origin-node[14536]: E0815 17:12:59.469682 14536 common.go:194] Failed to obtain ClusterNetwork: the server has asked for the client to provide credentials (get clusterNetworks default)
>>>> Aug 15 17:12:59 node-001 origin-node[14536]: F0815 17:12:59.469705 14536 node.go:310] error: SDN node startup failed: the server has asked for the client to provide credentials (get clusterNetworks default)
>>>>
>>>> --
>>>> John Skarbek

 ___
 users mailing list
 users@lists.openshift.redhat.com
 http://lists.openshift.redhat.com/openshiftmm/listinfo/users


Re: Node startup Failure on SDN

2016-08-15 Thread Skarbek, John
So I figured it out. NTP went kaboom on one of our master nodes.

ERROR: [DCli0015 from diagnostic 
ConfigContexts@openshift/origin/pkg/diagnostics/client/config_contexts.go:285]
   For client config context 'default/cluster:8443/system:admin':
   The server URL is 'https://cluster:8443'
   The user authentication is 'system:admin/cluster:8443'
   The current project is 'default'
   (*url.Error) Get https://cluster:8443/api: x509: certificate has expired 
or is not yet valid
   Diagnostics does not have an explanation for what this means. Please 
report this error so one can be added.



I ended up finding that the master node clock just…. I have no idea:

[/etc/origin/master]# date
Wed Feb 14 12:23:13 UTC 2001


I’d like to suggest that diagnostics check the validity dates of all the
certificates, perhaps do some sort of NTP check, and maybe even go the extra
mile and compare the time on the server to …life. I have no idea why my master
node decided to go back to Valentine's Day in 2001. I think I was single way
back when.
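
A rough sketch of what such a check could look like, in Python rather than the Go used by the diagnostics. `classify_validity` is a hypothetical helper; it relies on the standard library's `ssl.cert_time_to_seconds` to parse OpenSSL-style timestamps. The `notBefore` value in the usage lines is an assumed placeholder, since only `notAfter` appears in this thread.

```python
import calendar
import ssl
import time


def classify_validity(not_before, not_after, now=None):
    """Compare a clock reading with a certificate's validity window.

    not_before and not_after use the OpenSSL text form,
    e.g. "Mar 21 15:18:10 2018 GMT". now is seconds since the epoch.
    """
    if now is None:
        now = time.time()
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    if now < start:
        return "not yet valid (is the local clock behind?)"
    if now > end:
        return "expired"
    return "valid"


# The clock reading from this thread: Wed Feb 14 12:23:13 UTC 2001.
skewed = calendar.timegm((2001, 2, 14, 12, 23, 13))
# Assumed notBefore, for illustration only; notAfter is the value reported.
print(classify_validity("Mar 21 15:18:10 2016 GMT",
                        "Mar 21 15:18:10 2018 GMT",
                        skewed))
```

Splitting the generic "has expired or is not yet valid" message into its two cases points at the clock right away when the failure is "not yet valid".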


--
John Skarbek




Re: Node startup Failure on SDN

2016-08-15 Thread Skarbek, John
It would appear the certificate is valid until 2018:

[/etc/origin/node]# openssl x509 -enddate -in system:node:node-001.crt
notAfter=Mar 21 15:18:10 2018 GMT
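
For scripting this kind of check, here is a small sketch in Python that parses the `notAfter=` line printed above and reports how long the certificate has left. `seconds_until_expiry` is a hypothetical helper name; the timestamp parsing uses the standard library's `ssl.cert_time_to_seconds`.

```python
import calendar
import ssl
import time


def seconds_until_expiry(enddate_line, now=None):
    """Parse a 'notAfter=...' line and return the seconds left before expiry.

    A negative result means the certificate has expired according to `now`
    (seconds since the epoch, defaulting to the local clock).
    """
    key, _, value = enddate_line.strip().partition("=")
    if key != "notAfter" or not value:
        raise ValueError("expected a 'notAfter=' line, got %r" % enddate_line)
    if now is None:
        now = time.time()
    return ssl.cert_time_to_seconds(value) - now


# Evaluate against the time of the log entries in this thread (Aug 15, 2016).
now = calendar.timegm((2016, 8, 15, 17, 12, 59))
left = seconds_until_expiry("notAfter=Mar 21 15:18:10 2018 GMT", now)
print("days left: %d" % (left // 86400))
```

An end-date check alone says nothing about a clock that is running behind: the certificate also looks unexpired from a clock set to 2001. Checking `notBefore` as well (`openssl x509 -startdate`) covers that case.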

Got any other ideas?


--
John Skarbek




Node startup Failure on SDN

2016-08-15 Thread Skarbek, John
Good Morning,

We recently had a node go down. Upon trying to get it back online, the
origin-node service fails to start. The rest of the cluster appears to be just
fine, so with the desire to troubleshoot, what can I look at to determine the
root cause of the following error:

Aug 15 17:12:59 node-001 origin-node[14536]: E0815 17:12:59.469682   14536 
common.go:194] Failed to obtain ClusterNetwork: the server has asked for the 
client to provide credentials (get clusterNetworks default)
Aug 15 17:12:59 node-001 origin-node[14536]: F0815 17:12:59.469705   14536 
node.go:310] error: SDN node startup failed: the server has asked for the 
client to provide credentials (get clusterNetworks default)



--
John Skarbek