Status: New
Owner: ----

New issue 1179 by bcandler...@googlemail.com: gnt-instance move socat error - certificate commonName does not match hostname
https://code.google.com/p/ganeti/issues/detail?id=1179

*What software version are you running?*

root@nuc1:~# gnt-cluster --version
gnt-cluster (ganeti 2.15.2-3) 2.15.2

root@nuc1:~# gnt-cluster version
Software version: 2.15.2
Internode protocol: 2150000
Configuration format: 2150000
OS api version: 20
Export interface: 0
VCS version: (ganeti) version 2.15.2-3

root@nuc1:~# hspace --version
hspace (ganeti) version 2.15.2-3
compiled with ghc 7.10
running on linux x86_64

*What distribution are you using?*

Two nodes both ubuntu 16.04

    10.10.0.238 = nuc1.ws.nsrc.org
    10.10.0.239 = nuc2.ws.nsrc.org

*What steps will reproduce the problem?*

root@nuc1:~# gnt-instance list -o name,pnode,snodes pfsense1
Instance Primary_node     Secondary_Nodes
pfsense1 nuc2.ws.nsrc.org

root@nuc1:~# gnt-instance move -n nuc1.ws.nsrc.org -d pfsense1
2016-05-08 16:22:41,729: gnt-instance move pid=6299 cli:1218 DEBUG Command line: gnt-instance move -n nuc1.ws.nsrc.org -d pfsense1
Instance pfsense1 will be moved. This requires a shutdown of the
instance. Continue?
y/[n]/?: y
Sun May 8 16:22:47 2016 - INFO: Not checking memory on the secondary node as instance will not be started
Sun May 8 16:22:48 2016 - INFO: Shutting down instance pfsense1 on source node nuc2.ws.nsrc.org
Sun May 8 16:22:49 2016 Exporting disk/0 from nuc2.ws.nsrc.org to nuc1.ws.nsrc.org
Sun May  8 16:22:53 2016 disk/0 is now listening, starting export
Sun May 8 16:22:56 2016 - WARNING: import 'import-disk0-2016-05-08_16_22_49-MXuNPT' on nuc1.ws.nsrc.org failed: Exited with status 1
Sun May 8 16:22:56 2016 disk/0 failed to receive data: Exited with status 1 (recent output: 0+0 records in\n0+0 records out\n0 bytes copied, 4.81635 s, 0.0 kB/s)
Sun May 8 16:22:56 2016 - WARNING: Aborting export 'export-disk0-2016-05-08_16_22_54-MM8F_x' on 0b180aa9-2c45-4250-a54a-e56c41b5cc68
Sun May 8 16:22:56 2016 - WARNING: export 'export-disk0-2016-05-08_16_22_54-MM8F_x' on nuc2.ws.nsrc.org failed: Exited with status 1
Sun May 8 16:22:56 2016 disk/0 failed to send data: Exited with status 1 (recent output: socat: E certificate is valid but its commonName does not match hostname\ndd: dd: error writing 'standard output': Broken pipe\ndd: error writing 'standard output': Broken pipe\n2+0 records in\n1+0 records out\n1114112 bytes (1.1 MB, 1.1 MiB) copied, 0.0976589 s, 11.4 MB/s)
Sun May  8 16:22:57 2016  - WARNING: Some disks failed to copy, aborting
2016-05-08 16:22:57,864: gnt-instance move pid=6299 cli:1225 ERROR Error during command processing
Traceback (most recent call last):
  File "/usr/share/ganeti/2.15/ganeti/cli.py", line 1221, in GenericMain
    result = func(options, args)
  File "/usr/share/ganeti/2.15/ganeti/client/gnt_instance.py", line 851, in MoveInstance
    SubmitOrSend(op, opts, cl=cl)
  File "/usr/share/ganeti/2.15/ganeti/cli.py", line 1011, in SubmitOrSend
    return SubmitOpCode(op, cl=cl, feedback_fn=feedback_fn, opts=opts)
  File "/usr/share/ganeti/2.15/ganeti/cli.py", line 976, in SubmitOpCode
    reporter=reporter)
  File "/usr/share/ganeti/2.15/ganeti/cli.py", line 955, in PollJob
    return GenericPollJob(job_id, _LuxiJobPollCb(cl), reporter)
  File "/usr/share/ganeti/2.15/ganeti/cli.py", line 777, in GenericPollJob
    errors.MaybeRaise(msg)
  File "/usr/share/ganeti/2.15/ganeti/errors.py", line 531, in MaybeRaise
    raise errcls(*args)
OpExecError: Errors during disk copy: Failed to transfer instance data
Failure: command execution error:
Errors during disk copy: Failed to transfer instance data
root@nuc1:~#

"gnt-cluster renew-crypto --new-node-certificates" is successful, but does not affect the results. Restarting ganeti on both nodes makes no difference.

Replacing /usr/bin/socat with:

#!/bin/sh
echo "$0" "$@" >>/tmp/socat.out
exec socat.real "$@"

allowed me to capture the command arguments.

=== On nuc1: receiver of image and cluster master ===
/usr/bin/socat -ls -d -d -b1048576 -u OPENSSL-LISTEN:0,reuseaddr,forever,intervall=0.01,keepalive,keepidle=60,keepintvl=10,keepcnt=5,verify=1,method=TLS1,cipher=HIGH:-DES:-3DES:-EXPORT:-DH,compress=none,key=/var/lib/ganeti/server.pem,cert=/var/lib/ganeti/server.pem,cafile=/var/run/ganeti/import-export/import-disk0-2016-05-08_16_27_40-mDO5uf/ca,pf=ipv4 stdout

=== On nuc2: sender of image ===
/usr/bin/socat -ls -d -d -b1048576 -u stdin OPENSSL:10.10.0.238:46423,connect-timeout=20,retry=10,intervall=1,keepalive,keepidle=60,keepintvl=10,keepcnt=5,verify=1,method=TLS1,cipher=HIGH:-DES:-3DES:-EXPORT:-DH,compress=none,key=/var/lib/ganeti/server.pem,cert=/var/lib/ganeti/server.pem,cafile=/var/run/ganeti/import-export/export-disk0-2016-05-08_16_27_45-u0Sxcf/ca,pf=ipv4

OK: so it seems that the sender is trying to connect by IP address (10.10.0.238 = nuc1), but socat fails because the hostname in the certificate is "nuc1.ws.nsrc.org".

It says in the socat manpage:

    NOTE: Up to version 1.7.2.4 the server certificate was only checked for
    validity against the system certificate store or cafile or capath, but not
    for match with the server’s name or its IP address. Since version 1.7.3.0
    socat checks the peer certificate for match with the <host> parameter or
    the value of the openssl-commonname option. Socat tries to match it against
    the certificates subject commonName, and the certifications extension
    subjectAltName DNS names. Wildcards in the certificate are supported.
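
To illustrate the check that note describes, here is a minimal Python sketch of that kind of name matching. It is not socat's actual code: the function names are made up for illustration, wildcard handling is simplified to a single leading "*." label, and the commonName value is the one assumed in this report.

```python
def name_matches(pattern, target):
    """Case-insensitive comparison; a leading '*.' stands for exactly one label."""
    pattern, target = pattern.lower(), target.lower()
    if pattern.startswith("*."):
        # Drop the first label of the target and compare the remainder.
        _, _, rest = target.partition(".")
        return rest != "" and rest == pattern[2:]
    return pattern == target


def peer_cert_accepted(target, common_name, san_dns=()):
    """True if the connect target matches the CN or any subjectAltName DNS name."""
    candidates = (common_name,) + tuple(san_dns)
    return any(name_matches(name, target) for name in candidates)


# Connecting by IP address, as the sender does in this report:
by_ip = peer_cert_accepted("10.10.0.238", "nuc1.ws.nsrc.org")
# Connecting by node name instead:
by_name = peer_cert_accepted("nuc1.ws.nsrc.org", "nuc1.ws.nsrc.org")
```

Under those assumptions, the IP target cannot match a certificate that only carries a hostname, while the node name does.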

I think therefore that the connection should be made by name rather than by IP address, *or* the openssl-commonname option should be added on the sender side.
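
As a rough sketch of the second alternative, the sender's socat address could keep the IP target and carry the expected name separately. The helper below is hypothetical and is not taken from the ganeti source; it only shows how the option string would change, using the openssl-commonname option named in the manpage.

```python
def sender_address(host, port, options, commonname=None):
    """Build a socat OPENSSL connect address from a target and option list.

    Hypothetical helper for illustration only. If commonname is given,
    an openssl-commonname option is appended so that socat checks the
    peer certificate against that name instead of the connect target.
    """
    parts = ["OPENSSL:%s:%d" % (host, port)] + list(options)
    if commonname:
        parts.append("openssl-commonname=%s" % commonname)
    return ",".join(parts)


# Shortened option list; the full list is in the captured command above.
addr = sender_address(
    "10.10.0.238", 46423,
    ["connect-timeout=20", "verify=1"],
    commonname="nuc1.ws.nsrc.org",
)
```

The connection would still go to the IP address, so no name resolution is needed at transfer time; only the certificate comparison changes.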

It's possible that my cluster is in an invalid state, as it has been through several upgrades, but I don't think so: it seems to work OK apart from instance move, and gnt-cluster verify is fine. Here is the relevant config excerpt (reformatted via 'python -m json.tool'):

=== /var/lib/ganeti/config.data ===

...
    "nodes": {
        "0b180aa9-2c45-4250-a54a-e56c41b5cc68": {
            "ctime": 1462607653.915405,
            "drained": false,
            "group": "4f283bc2-68c9-4392-9155-65a569e06a97",
            "master_candidate": true,
            "master_capable": true,
            "mtime": 1462607653.915405,
            "name": "nuc2.ws.nsrc.org",
            "ndparams": {},
            "offline": false,
            "powered": true,
            "primary_ip": "10.10.0.239",
            "secondary_ip": "10.10.0.239",
            "serial_no": 1,
            "tags": [],
            "uuid": "0b180aa9-2c45-4250-a54a-e56c41b5cc68",
            "vm_capable": true
        },
        "52cee655-b738-45ff-b920-b797840a6113": {
            "ctime": 1462717482.202958,
            "drained": false,
            "group": "4f283bc2-68c9-4392-9155-65a569e06a97",
            "master_candidate": true,
            "master_capable": true,
            "mtime": 1462717482.202958,
            "name": "nuc1.ws.nsrc.org",
            "ndparams": {},
            "offline": false,
            "powered": true,
            "primary_ip": "10.10.0.238",
            "secondary_ip": "10.10.0.238",
            "serial_no": 1,
            "tags": [],
            "uuid": "52cee655-b738-45ff-b920-b797840a6113",
            "vm_capable": true
        }
...

== ssconf_node_primary_ips ==

nuc1.ws.nsrc.org 10.10.0.238
nuc2.ws.nsrc.org 10.10.0.239

== ssconf_master_candidates_ips ==

10.10.0.239
10.10.0.238
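
For the "connect by name" alternative, the name for a given primary IP can be recovered from data of the shape shown in ssconf_node_primary_ips. A minimal Python sketch, assuming one "name ip" pair per line as in the listing above (the function name is made up, and this is not how ganeti itself does the lookup):

```python
def primary_ip_to_name(ssconf_text):
    """Map each primary IP to its node name from 'name ip' lines."""
    mapping = {}
    for line in ssconf_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        name, ip = fields
        mapping[ip] = name
    return mapping


listing = """\
nuc1.ws.nsrc.org 10.10.0.238
nuc2.ws.nsrc.org 10.10.0.239
"""
names = primary_ip_to_name(listing)
target_name = names["10.10.0.238"]
```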
