I am developing as cinder plugin and I recently faced one problem while running the cinder driver's certification test ( devstack/driver_certs/cinder_driver_cert.sh )
Here is what happened and my understanding of the problem (correct me if I am wrong) >From the test logs I can see the following information: {2} tempest.api.compute.volumes.test_attach_volume.AttachVolumeTestJSON.test_list_get_volume_attachments [238.283153s] ... FAILED ..... Captured traceback: ~~~~~~~~~~~~~~~~~~~ Traceback (most recent call last): File "tempest/api/compute/volumes/test_attach_volume.py", line 46, in _detach self.volumes_client.wait_for_volume_status(volume_id, 'available') File "tempest/services/volume/json/volumes_client.py", line 182, in wait_for_volume_status raise exceptions.TimeoutException(message) TimeoutException: Request timed out Details: Volume bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a failed to reach available status (current: detaching) within the required time (196 s). Traceback (most recent call last): _StringException: Empty attachments: stderr stdout when I check what the test case actually does, I see the following 44 if self.attachment: 45 self.servers_client.detach_volume(server_id, volume_id) 46 self.volumes_client.wait_for_volume_status(volume_id, 'available') So the volume failed to detach within the timeout period. To do simple the test that failed does the following: 1 attach a volume 2 wait for the volume to be attached ( by checking the volume status to be 'in-use') 3 detach the volume 4 wait for the volume to be detached ( by checking the volume status to be 'available') In nova-compute's log I see an Exception that seems to be the reason why we get this timeout. This happens when the detach request is received by nova-compute: 1226617609] Exception during message handling: <type 'NoneType'> can't be decoded 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last): 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher executor_callback)) 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher executor_callback) 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args) 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 414, in decorated_function 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs) 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/exception.py", line 88, in wrapped 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher payload) 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 82, in __exit__ 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb) 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/exception.py", line 71, in wrapped 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher return f(self, context, *args, **kw) 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 298, in decorated_function 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher pass 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 82, in __exit__ 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb) 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 283, in decorated_function 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs) 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 326, in decorated_function 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher kwargs['instance'], e, sys.exc_info()) 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 82, in __exit__ 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb) 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 314, in decorated_function 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs) 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 4730, in detach_volume 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher self._detach_volume(context, instance, bdm) 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 4671, in _detach_volume 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher connection_info = jsonutils.loads(bdm.connection_info) 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_serialization/jsonutils.py", line 213, in loads 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher return json.loads(encodeutils.safe_decode(s, encoding), **kwargs) 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_utils/encodeutils.py", line 33, in safe_decode 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher raise TypeError("%s can't be decoded" % type(text)) 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher TypeError: <type 'NoneType'> can't be decoded 2015-02-26 13:50:01.151 10931 TRACE oslo_messaging.rpc.dispatcher When I run only this specific test case, the problem does not occur. Hence I added some logs in nova-compute to find out the difference between these two executions. From those, I can see that the connection_info field is None when the Error is triggered. This log is printed when nova receive the detach request ( just before parsing the connection_info field ). 2015-02-26 13:50:00.572 AUDIT nova.compute.manager [req-9930623e-c073-4c5b-bcaf-f4098245277f AttachVolumeTestJSON-1822846528 AttachVolumeTestJSON-1226617609] print connection info : BlockDeviceMapping(boot_index=None,connection_info=None,created_at=2015-02-26T05:49:36Z,delete_on_termination=False,deleted=False,deleted_at=None,destination_type='volume',device_name='/dev/vdb',device_type=None,disk_bus=None,guest_format=None,id=166,image_id=None,instance=<?>,instance_uuid=9aeea261-30af-46eb-9fbb-53d03b65b43f,no_device=False,snapshot_id=None,source_type='volume',updated_at=None,volume_id='bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a',volume_size=None) Then I tried to figure out why this field was None and went back to the previous step : attach volume. From my basic understanding, when nova attach a volume, it will update two things ( source : nova/virt/libvirt/block_device.py) - 1 The volume status ( in cinder's database ) - 2 The connection_info ( after the call to virt_driver.attach_volume ) >From the logs in cinder (c-api.log ) The first time the volume status is mentioned as 'in-use' is at 13:49:59.491 in (cinder-api's log) 2-26 13:49:59.491 31396 INFO cinder.api.v1.volumes [req-dc56cd34-8ae3-42b0-8c35-8c01cd142955 690795b5c8c74e4fb6d292eeebef12d5 b49a40aa157e460592f3e3ac9a9f16eb - - -] vol={'migration_status': None, 'provider_id': None, 'availability_zone': u'nova', 'terminated_at': None, 'updated_at': datetime.datetime(2015, 2, 26, 5, 49, 58), 'provider_geometry': None, 'replication_extended_status': None, 'replication_status': u'disabled', 'snapshot_id': None, 'ec2_id': None, 'mountpoint': u'/dev/vdb', 'deleted_at': None, 'id': u'bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a', 'size': 1L, 'user_id': u'690795b5c8c74e4fb6d292eeebef12d5', 'attach_time': u'2015-02-26T05:49:52.128113', 'attached_host': None, 'display_description': None, 'volume_admin_metadata': u'[<cinder.db.sqlalchemy.models.VolumeAdminMetadata object at 0x7fa32ed8cd10>, <cinder.db.sqlalchemy.models.VolumeAdminMetadata object at 0x7fa32ef789d0>]', 'project_id': u'b49a40aa157e460592f3e3ac9a9f16eb', 'launched_at': datetime.datetime(2015, 2, 26, 5, 49, 35), 'scheduled_at': datetime.datetime(2015, 2, 26, 5, 49, 33), 'status': u'in-use', 'volume_type_id': None, 'deleted': False, 'provider_location': u'999993877', 'host': u'cwisr3s18#disco', 'consistencygroup_id': None, 'source_volid': None, 'provider_auth': None, 'display_name': u'test', 'instance_uuid': u'9aeea261-30af-46eb-9fbb-53d03b65b43f', 'bootable': False, 'created_at': datetime.datetime(2015, 2, 26, 5, 49, 32), 'attach_status': u'attached', 'volume_type': None, 'consistencygroup': None, 'volume_metadata': [], '_name_id': None, 'encryption_key_id': None, 'replication_driver_data': None, 'metadata': {u'readonly': u'False', u'attached_mode': u'rw'}} Which matches approximately the time the test script got the in-use status 2015-02-26 13:49:59,558 31513 DEBUG [tempest_lib.common.rest_client] Request (AttachVolumeTestJSON:test_list_get_volume_attachments): 200 GET http://10.203.18.1:8776/v1/b49a40aa157e460592f3e3ac9a9f16eb/volumes/bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a 0.099s Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'} Body: None Response - Headers: {'status': '200', 'content-length': '748', 'content-location': 'http://10.203.18.1:8776/v1/b49a40aa157e460592f3e3ac9a9f16eb/volumes/bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a', 'x-compute-request-id': 'req-dc56cd34-8ae3-42b0-8c35-8c01cd142955', 'connection': 'close', 'date': 'Thu, 26 Feb 2015 05:49:59 GMT', 'content-type': 'application/json', 'x-openstack-request-id': 'req-dc56cd34-8ae3-42b0-8c35-8c01cd142955'} Body: {"volume": {"status": "in-use", "display_name": "test", "attachments": [{"host_name": null, "device": "/dev/vdb", "server_id": "9aeea261-30af-46eb-9fbb-53d03b65b43f", "id": "bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a", "volume_id": "bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a"}], "availability_zone": "nova", "bootable": "false", "encrypted": false, "created_at": "2015-02-26T05:49:32.000000", "os-vol-tenant-attr:tenant_id": "b49a40aa157e460592f3e3ac9a9f16eb", "os-volume-replication:driver_data": null, "display_description": null, "os-volume-replication:extended_status": null, "volume_type": null, "snapshot_id": null, "source_volid": null, "metadata": {"readonly": "False", "attached_mode": "rw"}, "id": "bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a", "size": 1}} After that the test script will call detach volume and we will face the error previously mentioned Besides I also added one log at the very last line of the function attach in nova/virt/block_device.py in order to check the connection info @update_db def attach(self, context, instance, volume_api, virt_driver, do_check_attach=True, do_driver_attach=False): >From Nova-cpu's log I can see that the log is displayed at 2015-02-26 13:50:02.003 AUDIT nova.virt.block_device [req-fd8124ca-d608-44a0-a1b9-7cb933fd840f AttachVolumeTestJSON-1822846528 AttachVolumeTestJSON-1226617609] test thelo 7 {u'driver_volume_type': u'disco', 'serial': u'bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a', u'data': {'device_path': u'/dev/dms999993877', u'access_mode': u'rw', u'disco_id': u'999993877', u'qos_specs': None, u'name': u'volume-bfcd0dd4-2abd-4c06-b8b8-e05d77306d6a'}} Which means that the detach_volume is called before attach_volume functions actually ends and potentially update the connection_info field in the database. To confirm this logic, I added a "sleep" before we detach the volume in tempest test and run the test again. This time the test passed. Does this logic seem correct? If yes, is that a bug and shouldn't we guarantee that the connection_info is saved into the database before we update the volume status? Thanks for taking the time to read this, Thelo ==================================================================== This email may contain confidential information. Please do not use or disclose it in any way and delete it if you are not the intended recipient.
__________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev