[ https://issues.apache.org/jira/browse/CLOUDSTACK-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joris van Lieshout updated CLOUDSTACK-692: ------------------------------------------ Description: Hi there, I think we ran into a bug due to a concurrence of circumstances regarding snapshotting and the cleanup of snapshots. The CleanupSnapshotBackup process on the SSVM deletes vhd files that are not known in the database but when, especially long running snapshot, are being copied to secondary storeage there is a gap between the start and finish of the VDI-copy, where the uuid of the destination vhd is not registered in the database. If the CleanupSnapshotBackup deletes the destinaion vhd during this window it results in hanging sparse_dd process on the XenServer hypervisor pointing to a tapdisk2 process with no file behind it. ===Secondary storage vm (2 hour time difference due to time zone). second to last line you see the vhd being deleted. 2013-09-04 03:14:45,721 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:) Request:Seq 261-1870805144: { Cmd , MgmtId: 345052370018, via: 261, Ver: v1, Flags: 100011, [{"CleanupSnapshotBackupCommand":{"secondaryStoragePoolURL":"nfs://mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1","dcId":1,"accountId":45,"volumeId":5863,"validBackupUUIDs":["1a56760b-d1c0-4620-8cf7-271951500d70","b6157bc9-085b-4ed6-95c2-4341f31c64bf","1ff967e3-3606-4112-9155-b1145b2ef576","12fbe4e3-1fdd-4c35-a961-0fce07cff584","278e9915-4f94-40c8-bef4-9c6bc82d4653","6fba1dd7-4736-47b3-9eed-148304c0e192","b9d8c9d8-6445-463b-b4e1-ab3b3f3a67a2","40ba5d72-c69a-46c2-973b-0570c1cabeac","774f2b0e-cdaf-4594-a9f9-4f872dcaad6e","8269f50b-6bec-427c-8186-540df6a75dbf","7b0c6e75-40cf-4dd7-826a-09b39f3da7b5","df7eac9c-137a-4655-9d21-d781916351f1","11ec2db1-a2fc-4221-ae1a-c1ab2bd59509","dfc348e1-af50-4d77-b4a0-6e86fc954e1c","98f64c0f-7498-4c70-8b70-beaefd723b45","c42f9dd5-079d-4b77-86dc-c19b7fbed817"],"wait":0}}] } 2013-09-04 03:14:45,722 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:) Processing command: com.cloud.agent.api.CleanupSnapshotBackupCommand 2013-09-04 03:14:45,723 DEBUG [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-2:) Executing: mount 2013-09-04 03:14:45,732 DEBUG [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-2:) Execution is successful. 2013-09-04 03:14:45,772 WARN [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-2:) snapshot 8ca9fea4-8a98-4cc3-bba7-cc1dcf32bb24.vhd is not recorded in DB, remove it 2013-09-04 03:14:45,772 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:) Seq 261-1870805144: { Ans: , MgmtId: 345052370018, via: 261, Ver: v1, Flags: 10, [{"Answer":{"result":true,"wait":0}}] } ==== management-server.log. here you see the snapshot being created, the copyToSecStorage process starting, eventually timing out due to the hanging vdi-copy, failing on retrying because vdi in use (although not existing any more the vdi is still know on xen), retrying some more on another HV and eventuall giving up because it tries to create a duplicate SR. 2013-09-04 04:27:10,931 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-69:job-95137) Executing com.cloud.api.commands.CreateSnapshotCmd for job-95137 2013-09-04 04:27:10,971 DEBUG [agent.transport.Request] (Job-Executor-69:job-95137) Seq 91-780303147: Sending { Cmd , MgmtId: 345052370017, via: 91, Ver: v1, Flags: 100011, [{"ManageSnapshotCommand":{"_commandSwitch":"-c","_volumePath":"9cb7af90-ca88-4b34-aa6f-bc21c3d4a3aa","_pool":{"id":208,"uuid":"b290385b-466d-3243-a939-3d242164e034","host":"mccpnas3-4-vip1.mccp.mcinfra.net","path":"/volumes/pool0/MCCP-S-SBP1-1_MCCP-XEN-1","port":2049,"type":"NetworkFilesystem"},"_snapshotName":"vlstws3_ROOT-2736_20130904022710","_snapshotId":71889,"_vmName":"i-45-2736-VM","wait":0}}] } 2013-09-04 04:27:10,971 DEBUG [agent.transport.Request] (Job-Executor-69:job-95137) Seq 91-780303147: Executing: { Cmd , MgmtId: 345052370017, via: 91, Ver: v1, Flags: 100011, [{"ManageSnapshotCommand":{"_commandSwitch":"-c","_volumePath":"9cb7af90-ca88-4b34-aa6f-bc21c3d4a3aa","_pool":{"id":208,"uuid":"b290385b-466d-3243-a939-3d242164e034","host":"mccpnas3-4-vip1.mccp.mcinfra.net","path":"/volumes/pool0/MCCP-S-SBP1-1_MCCP-XEN-1","port":2049,"type":"NetworkFilesystem"},"_snapshotName":"vlstws3_ROOT-2736_20130904022710","_snapshotId":71889,"_vmName":"i-45-2736-VM","wait":0}}] } 2013-09-04 04:27:12,949 DEBUG [agent.transport.Request] (Job-Executor-69:job-95137) Seq 91-780303147: Received: { Ans: , MgmtId: 345052370017, via: 91, Ver: v1, Flags: 10, { ManageSnapshotAnswer } } 2013-09-04 04:27:12,991 DEBUG [agent.transport.Request] (Job-Executor-69:job-95137) Seq 91-780303148: Sending { Cmd , MgmtId: 345052370017, via: 91, Ver: v1, Flags: 100011, [{"BackupSnapshotCommand":{"isVolumeInactive":false,"vmName":"i-45-2736-VM","snapshotId":71889,"pool":{"id":208,"uuid":"b290385b-466d-3243-a939-3d242164e034","host":"mccpnas3-4-vip1.mccp.mcinfra.net","path":"/volumes/pool0/MCCP-S-SBP1-1_MCCP-XEN-1","port":2049,"type":"NetworkFilesystem"},"primaryStoragePoolNameLabel":"b290385b-466d-3243-a939-3d242164e034","snapshotUuid":"6396ec81-4128-44ca-a121-d9f1bf789d6c","snapshotName":"vlstws3_ROOT-2736_20130904022710","secondaryStorageUrl":"nfs://mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1","dcId":1,"accountId":45,"volumeId":5863,"volumePath":"9cb7af90-ca88-4b34-aa6f-bc21c3d4a3aa","wait":21600}}] } 2013-09-04 04:27:12,991 DEBUG [agent.transport.Request] (Job-Executor-69:job-95137) Seq 91-780303148: Executing: { Cmd , MgmtId: 345052370017, via: 91, Ver: v1, Flags: 100011, [{"BackupSnapshotCommand":{"isVolumeInactive":false,"vmName":"i-45-2736-VM","snapshotId":71889,"pool":{"id":208,"uuid":"b290385b-466d-3243-a939-3d242164e034","host":"mccpnas3-4-vip1.mccp.mcinfra.net","path":"/volumes/pool0/MCCP-S-SBP1-1_MCCP-XEN-1","port":2049,"type":"NetworkFilesystem"},"primaryStoragePoolNameLabel":"b290385b-466d-3243-a939-3d242164e034","snapshotUuid":"6396ec81-4128-44ca-a121-d9f1bf789d6c","snapshotName":"vlstws3_ROOT-2736_20130904022710","secondaryStorageUrl":"nfs://mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1","dcId":1,"accountId":45,"volumeId":5863,"volumePath":"9cb7af90-ca88-4b34-aa6f-bc21c3d4a3aa","wait":21600}}] } 2013-09-04 04:27:12,992 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-441:null) Seq 91-780303148: Executing request 2013-09-04 04:27:13,561 DEBUG [xen.resource.CitrixResourceBase] (DirectAgent-441:null) Creating a not shared SR for nfs://mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1/snapshots/45/5863 2013-09-04 04:27:14,499 DEBUG [xen.resource.CitrixResourceBase] (DirectAgent-441:null) Checking 00e04f60-4c80-37e4-9abe-855b16fbc9e1 or SR a4ca3b8c-6e6d-5f07-cb71-f6c9b7f950cb on XS[d868bf32-d747-4ec0-b1c2-fba3df1c85bd-192.168.52.15] 2013-09-04 04:27:14,512 DEBUG [xen.resource.CitrixResourceBase] (DirectAgent-441:null) Host 192.168.52.15 OpaqueRef:5ee9850b-b15c-b39b-dd27-ff226a1b3126: Created a SR; UUID is a4ca3b8c-6e6d-5f07-cb71-f6c9b7f950cb device config is {serverpath=/volumes/pool0/MCCP-SHARED-1-1/snapshots/45/5863, server=mccpnas7.storage.mccp.mcinfra.net} 2013-09-04 10:27:12,992 DEBUG [agent.manager.AgentAttache] (Job-Executor-69:job-95137) Seq 91-780303148: Waiting some more time because this is the current command 2013-09-04 10:27:34,262 WARN [xen.resource.CitrixResourceBase] (DirectAgent-441:null) Async 21600 seconds timeout for task com.xensource.xenapi.Task@ec1aac0e 2013-09-04 10:27:34,585 DEBUG [xen.resource.CitrixResourceBase] (DirectAgent-441:null) Host 192.168.52.15 OpaqueRef:5ee9850b-b15c-b39b-dd27-ff226a1b3126: Removing SR 2013-09-04 10:27:34,624 DEBUG [xen.resource.CitrixResourceBase] (DirectAgent-441:null) Host 192.168.52.15 OpaqueRef:5ee9850b-b15c-b39b-dd27-ff226a1b3126: Catch XenAPIException: This operation cannot be performed because this VDI is in use by some other operation 2013-09-04 10:27:34,650 DEBUG [xen.resource.CitrixResourceBase] (DirectAgent-441:null) Host 192.168.52.15 OpaqueRef:5ee9850b-b15c-b39b-dd27-ff226a1b3126: Catch XenAPIException: This operation cannot be performed because this VDI is in use by some other operation 2013-09-04 10:27:34,650 WARN [xen.resource.CitrixResourceBase] (DirectAgent-441:null) Host 192.168.52.15 OpaqueRef:5ee9850b-b15c-b39b-dd27-ff226a1b3126: Unable to remove SR 2013-09-04 10:27:34,650 WARN [xen.resource.CitrixResourceBase] (DirectAgent-441:null) BackupSnapshot Failed due to Async 21600 seconds timeout for task com.xensource.xenapi.Task@ec1aac0e 2013-09-04 10:27:34,651 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-441:null) Seq 91-780303148: Response Received: 2013-09-04 10:27:34,651 DEBUG [agent.transport.Request] (DirectAgent-441:null) Seq 91-780303148: Processing: { Ans: , MgmtId: 345052370017, via: 91, Ver: v1, Flags: 10, [{"BackupSnapshotAnswer":{"full":true,"result":false,"details":"BackupSnapshot Failed due to Async 21600 seconds timeout for task com.xensource.xenapi.Task@ec1aac0e","wait":0}}] } 2013-09-04 10:27:34,651 DEBUG [agent.transport.Request] (Job-Executor-69:job-95137) Seq 91-780303148: Received: { Ans: , MgmtId: 345052370017, via: 91, Ver: v1, Flags: 10, { BackupSnapshotAnswer } } 2013-09-04 10:27:34,651 DEBUG [storage.snapshot.SnapshotManagerImpl] (Job-Executor-69:job-95137) The result for com.cloud.agent.api.BackupSnapshotCommand is BackupSnapshot Failed due to Async 21600 seconds timeout for task com.xensource.xenapi.Task@ec1aac0e through 91 2013-09-04 10:29:34,652 DEBUG [storage.snapshot.SnapshotManagerImpl] (Job-Executor-69:job-95137) Retrying... 2013-09-04 10:29:34,659 DEBUG [agent.manager.ClusteredAgentAttache] (Job-Executor-69:job-95137) Seq 89-1251159458: Forwarding Seq 89-1251159458: { Cmd , MgmtId: 345052370017, via: 89, Ver: v1, Flags: 100011, [{"BackupSnapshotCommand":{"isVolumeInactive":false,"vmName":"i-45-2736-VM","snapshotId":71889,"pool":{"id":208,"uuid":"b290385b-466d-3243-a939-3d242164e034","host":"mccpnas3-4-vip1.mccp.mcinfra.net","path":"/volumes/pool0/MCCP-S-SBP1-1_MCCP-XEN-1","port":2049,"type":"NetworkFilesystem"},"primaryStoragePoolNameLabel":"b290385b-466d-3243-a939-3d242164e034","snapshotUuid":"6396ec81-4128-44ca-a121-d9f1bf789d6c","snapshotName":"vlstws3_ROOT-2736_20130904022710","secondaryStorageUrl":"nfs://mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1","dcId":1,"accountId":45,"volumeId":5863,"volumePath":"9cb7af90-ca88-4b34-aa6f-bc21c3d4a3aa","wait":21600}}] } to 345052370018 2013-09-04 10:29:43,085 DEBUG [agent.transport.Request] (Job-Executor-69:job-95137) Seq 89-1251159458: Received: { Ans: , MgmtId: 345052370017, via: 89, Ver: v1, Flags: 10, { BackupSnapshotAnswer } } 2013-09-04 10:29:43,086 DEBUG [storage.snapshot.SnapshotManagerImpl] (Job-Executor-69:job-95137) The result for com.cloud.agent.api.BackupSnapshotCommand is BackupSnapshot Failed due to Can not create second storage SR mountpoint: mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1/snapshots/45/5863 due to SR_BACKEND_FAILURE_40The SR scan failed [opterr=['INTERNAL_ERROR', 'Db_exn.Uniqueness_constraint_violation("VDI", "uuid", "df7eac9c-137a-4655-9d21-d781916351f1")']] through 89 2013-09-04 10:30:17,489 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-441:null) Ping from 109 2013-09-04 10:31:18,345 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-441:null) Seq 95-1001324546: Executing request 2013-09-04 10:31:18,642 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-441:null) Seq 95-1001324546: Response Received: 2013-09-04 10:31:18,642 DEBUG [agent.transport.Request] (DirectAgent-441:null) Seq 95-1001324546: Processing: { Ans: , MgmtId: 345052370017, via: 95, Ver: v1, Flags: 10, [{"Answer":{"result":true,"wait":0}}] } 2013-09-04 10:31:19,423 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-441:null) Seq 41-1598291970: Executing request 2013-09-04 10:31:19,663 DEBUG [agent.manager.DirectAgentAttache] (DirectAgent-441:null) Seq 41-1598291970: Response Received: 2013-09-04 10:31:19,663 DEBUG [agent.transport.Request] (DirectAgent-441:null) Seq 41-1598291970: Processing: { Ans: , MgmtId: 345052370017, via: 41, Ver: v1, Flags: 10, [{"Answer":{"result":true,"wait":0}}] } 2013-09-04 10:31:43,086 DEBUG [storage.snapshot.SnapshotManagerImpl] (Job-Executor-69:job-95137) Retrying... 2013-09-04 10:31:43,091 DEBUG [agent.transport.Request] (Job-Executor-69:job-95137) Seq 93-156960768: Sending { Cmd , MgmtId: 345052370017, via: 93, Ver: v1, Flags: 100011, [{"BackupSnapshotCommand":{"isVolumeInactive":false,"vmName":"i-45-2736-VM","snapshotId":71889,"pool":{"id":208,"uuid":"b290385b-466d-3243-a939-3d242164e034","host":"mccpnas3-4-vip1.mccp.mcinfra.net","path":"/volumes/pool0/MCCP-S-SBP1-1_MCCP-XEN-1","port":2049,"type":"NetworkFilesystem"},"primaryStoragePoolNameLabel":"b290385b-466d-3243-a939-3d242164e034","snapshotUuid":"6396ec81-4128-44ca-a121-d9f1bf789d6c","snapshotName":"vlstws3_ROOT-2736_20130904022710","secondaryStorageUrl":"nfs://mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1","dcId":1,"accountId":45,"volumeId":5863,"volumePath":"9cb7af90-ca88-4b34-aa6f-bc21c3d4a3aa","wait":21600}}] } 2013-09-04 10:31:43,091 DEBUG [agent.transport.Request] (Job-Executor-69:job-95137) Seq 93-156960768: Executing: { Cmd , MgmtId: 345052370017, via: 93, Ver: v1, Flags: 100011, [{"BackupSnapshotCommand":{"isVolumeInactive":false,"vmName":"i-45-2736-VM","snapshotId":71889,"pool":{"id":208,"uuid":"b290385b-466d-3243-a939-3d242164e034","host":"mccpnas3-4-vip1.mccp.mcinfra.net","path":"/volumes/pool0/MCCP-S-SBP1-1_MCCP-XEN-1","port":2049,"type":"NetworkFilesystem"},"primaryStoragePoolNameLabel":"b290385b-466d-3243-a939-3d242164e034","snapshotUuid":"6396ec81-4128-44ca-a121-d9f1bf789d6c","snapshotName":"vlstws3_ROOT-2736_20130904022710","secondaryStorageUrl":"nfs://mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1","dcId":1,"accountId":45,"volumeId":5863,"volumePath":"9cb7af90-ca88-4b34-aa6f-bc21c3d4a3aa","wait":21600}}] } 2013-09-04 10:31:56,375 DEBUG [agent.transport.Request] (Job-Executor-69:job-95137) Seq 93-156960768: Received: { Ans: , MgmtId: 345052370017, via: 93, Ver: v1, Flags: 10, { BackupSnapshotAnswer } } 2013-09-04 10:31:56,376 DEBUG [storage.snapshot.SnapshotManagerImpl] (Job-Executor-69:job-95137) The result for com.cloud.agent.api.BackupSnapshotCommand is BackupSnapshot Failed due to Can not create second storage SR mountpoint: mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1/snapshots/45/5863 due to SR_BACKEND_FAILURE_40The SR scan failed [opterr=['INTERNAL_ERROR', 'Db_exn.Uniqueness_constraint_violation("VDI", "uuid", "df7eac9c-137a-4655-9d21-d781916351f1")']] through 93 2013-09-04 10:33:56,376 DEBUG [storage.snapshot.SnapshotManagerImpl] (Job-Executor-69:job-95137) Retrying... 2013-09-04 10:33:56,381 DEBUG [agent.transport.Request] (Job-Executor-69:job-95137) Seq 95-1001356346: Sending { Cmd , MgmtId: 345052370017, via: 95, Ver: v1, Flags: 100011, [{"BackupSnapshotCommand":{"isVolumeInactive":false,"vmName":"i-45-2736-VM","snapshotId":71889,"pool":{"id":208,"uuid":"b290385b-466d-3243-a939-3d242164e034","host":"mccpnas3-4-vip1.mccp.mcinfra.net","path":"/volumes/pool0/MCCP-S-SBP1-1_MCCP-XEN-1","port":2049,"type":"NetworkFilesystem"},"primaryStoragePoolNameLabel":"b290385b-466d-3243-a939-3d242164e034","snapshotUuid":"6396ec81-4128-44ca-a121-d9f1bf789d6c","snapshotName":"vlstws3_ROOT-2736_20130904022710","secondaryStorageUrl":"nfs://mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1","dcId":1,"accountId":45,"volumeId":5863,"volumePath":"9cb7af90-ca88-4b34-aa6f-bc21c3d4a3aa","wait":21600}}] } 2013-09-04 10:33:56,381 DEBUG [agent.transport.Request] (Job-Executor-69:job-95137) Seq 95-1001356346: Executing: { Cmd , MgmtId: 345052370017, via: 95, Ver: v1, Flags: 100011, [{"BackupSnapshotCommand":{"isVolumeInactive":false,"vmName":"i-45-2736-VM","snapshotId":71889,"pool":{"id":208,"uuid":"b290385b-466d-3243-a939-3d242164e034","host":"mccpnas3-4-vip1.mccp.mcinfra.net","path":"/volumes/pool0/MCCP-S-SBP1-1_MCCP-XEN-1","port":2049,"type":"NetworkFilesystem"},"primaryStoragePoolNameLabel":"b290385b-466d-3243-a939-3d242164e034","snapshotUuid":"6396ec81-4128-44ca-a121-d9f1bf789d6c","snapshotName":"vlstws3_ROOT-2736_20130904022710","secondaryStorageUrl":"nfs://mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1","dcId":1,"accountId":45,"volumeId":5863,"volumePath":"9cb7af90-ca88-4b34-aa6f-bc21c3d4a3aa","wait":21600}}] } 2013-09-04 10:34:13,682 DEBUG [agent.transport.Request] (Job-Executor-69:job-95137) Seq 95-1001356346: Received: { Ans: , MgmtId: 345052370017, via: 95, Ver: v1, Flags: 10, { BackupSnapshotAnswer } } 2013-09-04 10:34:13,683 DEBUG [storage.snapshot.SnapshotManagerImpl] (Job-Executor-69:job-95137) The result for com.cloud.agent.api.BackupSnapshotCommand is BackupSnapshot Failed due to Can not create second storage SR mountpoint: mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1/snapshots/45/5863 due to SR_BACKEND_FAILURE_40The SR scan failed [opterr=['INTERNAL_ERROR', 'Db_exn.Uniqueness_constraint_violation("VDI", "uuid", "df7eac9c-137a-4655-9d21-d781916351f1")']] through 95 2013-09-04 10:36:13,683 DEBUG [storage.snapshot.SnapshotManagerImpl] (Job-Executor-69:job-95137) Retrying... 2013-09-04 10:36:13,685 WARN [storage.snapshot.SnapshotManagerImpl] (Job-Executor-69:job-95137) Storage unavailable 2013-09-04 10:36:13,685 WARN [storage.snapshot.SnapshotManagerImpl] (Job-Executor-69:job-95137) Failed to back up snapshot on secondary storage, deleting the record from the DB 2013-09-04 10:36:13,702 ERROR [cloud.api.ApiDispatcher] (Job-Executor-69:job-95137) Exception while executing CreateSnapshotCmd: 2013-09-04 10:36:13,702 DEBUG [cloud.async.AsyncJobManagerImpl] (Job-Executor-69:job-95137) Complete async job-95137, jobStatus: 2, resultCode: 530, result: Error Code: 530 Error text: Created snapshot: com.cloud.storage.SnapshotVO$$EnhancerByCGLIB$$3173e384@40abfe02 on primary but failed to backup on secondary was: Hi there, I think we ran into a bug due to a concurrence of circumstances regarding snapshotting and the cleanup of snapshots. The StorageManager-Scavenger instructs the StorageVM to delete a snapshot that is still in the process of being created on a hypervisor at that time when the volume has older snapshots that do need scavenging. ==== The SR gets mounted for the snapshot to be created on. 2012-12-16 08:02:53,831 DEBUG [xen.resource.CitrixResourceBase] (DirectAgent-293:null) Host 192.168.###.42 OpaqueRef:fae7f8be-8cf1-7b84-3d30-7202e172b530: Created a SR; UUID is 1f7530d8-4615-c220-7f37-0 5862ddbfe3b device config is {serverpath=/pool0/####-###-dc-1-sec1/snapshots/163/1161, server=192.168.###.14} ==== The SMlog on the xenserver show that at this time the snapshot is still being created. 2012-12-16 08:37:08,768 DEBUG [agent.transport.Request] (StorageManager-Scavenger-1:null) Seq 159-1958616345: Sending { Cmd , MgmtId: 345052433504, via: 159, Ver: v1, Flags: 100011, [{"CleanupSnapshot BackupCommand":{"secondaryStoragePoolURL":"nfs://192.168.###.14/pool0/####-###-dc-1-sec1","dcId":2,"accountId":163,"volumeId":1161,"validBackupUUIDs":["b714a0ee-406e-4100-a75d-bc594391dca9","209bc1dd-f6 1a-486c-aecf-335590a907eb"],"wait":0}}] } ==== At this time we start seeing tapdisk errors on the XenServer indicating that the vhd file is gone. Dec 16 08:37:08 ####vm8 tapdisk[26553]: ERROR: errno -116 at vhd_complete: /var/run/sr-mount/1f7530d8-4615-c220-7f37-05862ddbfe3b/073893a6-e9cb-4cf6-8070-c6cf771db5d7.vhd: op: 2, lsec: 448131408, secs: 88, nbytes: 45056, blk: 109407, blk_offset: 330368935 Dec 16 08:37:08 ####vm8 tapdisk[26553]: ERROR: errno -116 at vhd_complete: /var/run/sr-mount/1f7530d8-4615-c220-7f37-05862ddbfe3b/073893a6-e9cb-4cf6-8070-c6cf771db5d7.vhd: op: 2, lsec: 448131496, secs: 40, nbytes: 20480, blk: 109407, blk_offset: 330368935 Dec 16 08:37:08 ####vm8 tapdisk[26553]: ERROR: errno -116 at vhd_complete: /var/run/sr-mount/1f7530d8-4615-c220-7f37-05862ddbfe3b/073893a6-e9cb-4cf6-8070-c6cf771db5d7.vhd: op: 4, lsec: 448131072, secs: 1, nbytes: 512, blk: 109407, blk_offset: 330368935 Dec 16 08:37:08 ####vm8 tapdisk[26553]: ERROR: errno -116 at __tapdisk_vbd_complete_td_request: req tap-77.0: write 0x0058 secs @ 0x1ab5f150 - Stale NFS file handle Dec 16 08:37:08 ####vm8 tapdisk[26553]: ERROR: errno -116 at __tapdisk_vbd_complete_td_request: req tap-77.1: write 0x0028 secs @ 0x1ab5f1a8 - Stale NFS file handle > The StorageManager-Scavenger deletes snapshots that are still in the process > of being created. > ---------------------------------------------------------------------------------------------- > > Key: CLOUDSTACK-692 > URL: https://issues.apache.org/jira/browse/CLOUDSTACK-692 > Project: CloudStack > Issue Type: Bug > Security Level: Public(Anyone can view this level - this is the > default.) > Components: Snapshot > Reporter: Joris van Lieshout > Priority: Minor > > Hi there, > I think we ran into a bug due to a concurrence of circumstances regarding > snapshotting and the cleanup of snapshots. > The CleanupSnapshotBackup process on the SSVM deletes vhd files that are not > known in the database but when, especially long running snapshot, are being > copied to secondary storeage there is a gap between the start and finish of > the VDI-copy, where the uuid of the destination vhd is not registered in the > database. If the CleanupSnapshotBackup deletes the destinaion vhd during this > window it results in hanging sparse_dd process on the XenServer hypervisor > pointing to a tapdisk2 process with no file behind it. > ===Secondary storage vm (2 hour time difference due to time zone). second to > last line you see the vhd being deleted. > 2013-09-04 03:14:45,721 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:) > Request:Seq 261-1870805144: { Cmd , MgmtId: 345052370018, via: 261, Ver: v1, > Flags: 100011, > [{"CleanupSnapshotBackupCommand":{"secondaryStoragePoolURL":"nfs://mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1","dcId":1,"accountId":45,"volumeId":5863,"validBackupUUIDs":["1a56760b-d1c0-4620-8cf7-271951500d70","b6157bc9-085b-4ed6-95c2-4341f31c64bf","1ff967e3-3606-4112-9155-b1145b2ef576","12fbe4e3-1fdd-4c35-a961-0fce07cff584","278e9915-4f94-40c8-bef4-9c6bc82d4653","6fba1dd7-4736-47b3-9eed-148304c0e192","b9d8c9d8-6445-463b-b4e1-ab3b3f3a67a2","40ba5d72-c69a-46c2-973b-0570c1cabeac","774f2b0e-cdaf-4594-a9f9-4f872dcaad6e","8269f50b-6bec-427c-8186-540df6a75dbf","7b0c6e75-40cf-4dd7-826a-09b39f3da7b5","df7eac9c-137a-4655-9d21-d781916351f1","11ec2db1-a2fc-4221-ae1a-c1ab2bd59509","dfc348e1-af50-4d77-b4a0-6e86fc954e1c","98f64c0f-7498-4c70-8b70-beaefd723b45","c42f9dd5-079d-4b77-86dc-c19b7fbed817"],"wait":0}}] > } > 2013-09-04 03:14:45,722 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:) > Processing command: com.cloud.agent.api.CleanupSnapshotBackupCommand > 2013-09-04 03:14:45,723 DEBUG [storage.resource.NfsSecondaryStorageResource] > (agentRequest-Handler-2:) Executing: mount > 2013-09-04 03:14:45,732 DEBUG [storage.resource.NfsSecondaryStorageResource] > (agentRequest-Handler-2:) Execution is successful. > 2013-09-04 03:14:45,772 WARN [storage.resource.NfsSecondaryStorageResource] > (agentRequest-Handler-2:) snapshot 8ca9fea4-8a98-4cc3-bba7-cc1dcf32bb24.vhd > is not recorded in DB, remove it > 2013-09-04 03:14:45,772 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:) > Seq 261-1870805144: { Ans: , MgmtId: 345052370018, via: 261, Ver: v1, Flags: > 10, [{"Answer":{"result":true,"wait":0}}] } > ==== management-server.log. here you see the snapshot being created, the > copyToSecStorage process starting, eventually timing out due to the hanging > vdi-copy, failing on retrying because vdi in use (although not existing any > more the vdi is still know on xen), retrying some more on another HV and > eventuall giving up because it tries to create a duplicate SR. > 2013-09-04 04:27:10,931 DEBUG [cloud.async.AsyncJobManagerImpl] > (Job-Executor-69:job-95137) Executing > com.cloud.api.commands.CreateSnapshotCmd for job-95137 > 2013-09-04 04:27:10,971 DEBUG [agent.transport.Request] > (Job-Executor-69:job-95137) Seq 91-780303147: Sending { Cmd , MgmtId: > 345052370017, via: 91, Ver: v1, Flags: 100011, > [{"ManageSnapshotCommand":{"_commandSwitch":"-c","_volumePath":"9cb7af90-ca88-4b34-aa6f-bc21c3d4a3aa","_pool":{"id":208,"uuid":"b290385b-466d-3243-a939-3d242164e034","host":"mccpnas3-4-vip1.mccp.mcinfra.net","path":"/volumes/pool0/MCCP-S-SBP1-1_MCCP-XEN-1","port":2049,"type":"NetworkFilesystem"},"_snapshotName":"vlstws3_ROOT-2736_20130904022710","_snapshotId":71889,"_vmName":"i-45-2736-VM","wait":0}}] > } > 2013-09-04 04:27:10,971 DEBUG [agent.transport.Request] > (Job-Executor-69:job-95137) Seq 91-780303147: Executing: { Cmd , MgmtId: > 345052370017, via: 91, Ver: v1, Flags: 100011, > [{"ManageSnapshotCommand":{"_commandSwitch":"-c","_volumePath":"9cb7af90-ca88-4b34-aa6f-bc21c3d4a3aa","_pool":{"id":208,"uuid":"b290385b-466d-3243-a939-3d242164e034","host":"mccpnas3-4-vip1.mccp.mcinfra.net","path":"/volumes/pool0/MCCP-S-SBP1-1_MCCP-XEN-1","port":2049,"type":"NetworkFilesystem"},"_snapshotName":"vlstws3_ROOT-2736_20130904022710","_snapshotId":71889,"_vmName":"i-45-2736-VM","wait":0}}] > } > 2013-09-04 04:27:12,949 DEBUG [agent.transport.Request] > (Job-Executor-69:job-95137) Seq 91-780303147: Received: { Ans: , MgmtId: > 345052370017, via: 91, Ver: v1, Flags: 10, { ManageSnapshotAnswer } } > 2013-09-04 04:27:12,991 DEBUG [agent.transport.Request] > (Job-Executor-69:job-95137) Seq 91-780303148: Sending { Cmd , MgmtId: > 345052370017, via: 91, Ver: v1, Flags: 100011, > [{"BackupSnapshotCommand":{"isVolumeInactive":false,"vmName":"i-45-2736-VM","snapshotId":71889,"pool":{"id":208,"uuid":"b290385b-466d-3243-a939-3d242164e034","host":"mccpnas3-4-vip1.mccp.mcinfra.net","path":"/volumes/pool0/MCCP-S-SBP1-1_MCCP-XEN-1","port":2049,"type":"NetworkFilesystem"},"primaryStoragePoolNameLabel":"b290385b-466d-3243-a939-3d242164e034","snapshotUuid":"6396ec81-4128-44ca-a121-d9f1bf789d6c","snapshotName":"vlstws3_ROOT-2736_20130904022710","secondaryStorageUrl":"nfs://mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1","dcId":1,"accountId":45,"volumeId":5863,"volumePath":"9cb7af90-ca88-4b34-aa6f-bc21c3d4a3aa","wait":21600}}] > } > 2013-09-04 04:27:12,991 DEBUG [agent.transport.Request] > (Job-Executor-69:job-95137) Seq 91-780303148: Executing: { Cmd , MgmtId: > 345052370017, via: 91, Ver: v1, Flags: 100011, > [{"BackupSnapshotCommand":{"isVolumeInactive":false,"vmName":"i-45-2736-VM","snapshotId":71889,"pool":{"id":208,"uuid":"b290385b-466d-3243-a939-3d242164e034","host":"mccpnas3-4-vip1.mccp.mcinfra.net","path":"/volumes/pool0/MCCP-S-SBP1-1_MCCP-XEN-1","port":2049,"type":"NetworkFilesystem"},"primaryStoragePoolNameLabel":"b290385b-466d-3243-a939-3d242164e034","snapshotUuid":"6396ec81-4128-44ca-a121-d9f1bf789d6c","snapshotName":"vlstws3_ROOT-2736_20130904022710","secondaryStorageUrl":"nfs://mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1","dcId":1,"accountId":45,"volumeId":5863,"volumePath":"9cb7af90-ca88-4b34-aa6f-bc21c3d4a3aa","wait":21600}}] > } > 2013-09-04 04:27:12,992 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-441:null) Seq 91-780303148: Executing request > 2013-09-04 04:27:13,561 DEBUG [xen.resource.CitrixResourceBase] > (DirectAgent-441:null) Creating a not shared SR for > nfs://mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1/snapshots/45/5863 > 2013-09-04 04:27:14,499 DEBUG [xen.resource.CitrixResourceBase] > (DirectAgent-441:null) Checking 00e04f60-4c80-37e4-9abe-855b16fbc9e1 or SR > a4ca3b8c-6e6d-5f07-cb71-f6c9b7f950cb on > XS[d868bf32-d747-4ec0-b1c2-fba3df1c85bd-192.168.52.15] > 2013-09-04 04:27:14,512 DEBUG [xen.resource.CitrixResourceBase] > (DirectAgent-441:null) Host 192.168.52.15 > OpaqueRef:5ee9850b-b15c-b39b-dd27-ff226a1b3126: Created a SR; UUID is > a4ca3b8c-6e6d-5f07-cb71-f6c9b7f950cb device config is > {serverpath=/volumes/pool0/MCCP-SHARED-1-1/snapshots/45/5863, > server=mccpnas7.storage.mccp.mcinfra.net} > 2013-09-04 10:27:12,992 DEBUG [agent.manager.AgentAttache] > (Job-Executor-69:job-95137) Seq 91-780303148: Waiting some more time because > this is the current command > 2013-09-04 10:27:34,262 WARN [xen.resource.CitrixResourceBase] > (DirectAgent-441:null) Async 21600 seconds timeout for task > com.xensource.xenapi.Task@ec1aac0e > 2013-09-04 10:27:34,585 DEBUG [xen.resource.CitrixResourceBase] > (DirectAgent-441:null) Host 192.168.52.15 > OpaqueRef:5ee9850b-b15c-b39b-dd27-ff226a1b3126: Removing SR > 2013-09-04 10:27:34,624 DEBUG [xen.resource.CitrixResourceBase] > (DirectAgent-441:null) Host 192.168.52.15 > OpaqueRef:5ee9850b-b15c-b39b-dd27-ff226a1b3126: Catch XenAPIException: This > operation cannot be performed because this VDI is in use by some other > operation > 2013-09-04 10:27:34,650 DEBUG [xen.resource.CitrixResourceBase] > (DirectAgent-441:null) Host 192.168.52.15 > OpaqueRef:5ee9850b-b15c-b39b-dd27-ff226a1b3126: Catch XenAPIException: This > operation cannot be performed because this VDI is in use by some other > operation > 2013-09-04 10:27:34,650 WARN [xen.resource.CitrixResourceBase] > (DirectAgent-441:null) Host 192.168.52.15 > OpaqueRef:5ee9850b-b15c-b39b-dd27-ff226a1b3126: Unable to remove SR > 2013-09-04 10:27:34,650 WARN [xen.resource.CitrixResourceBase] > (DirectAgent-441:null) BackupSnapshot Failed due to Async 21600 seconds > timeout for task com.xensource.xenapi.Task@ec1aac0e > 2013-09-04 10:27:34,651 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-441:null) Seq 91-780303148: Response Received: > 2013-09-04 10:27:34,651 DEBUG [agent.transport.Request] > (DirectAgent-441:null) Seq 91-780303148: Processing: { Ans: , MgmtId: > 345052370017, via: 91, Ver: v1, Flags: 10, > [{"BackupSnapshotAnswer":{"full":true,"result":false,"details":"BackupSnapshot > Failed due to Async 21600 seconds timeout for task > com.xensource.xenapi.Task@ec1aac0e","wait":0}}] } > 2013-09-04 10:27:34,651 DEBUG [agent.transport.Request] > (Job-Executor-69:job-95137) Seq 91-780303148: Received: { Ans: , MgmtId: > 345052370017, via: 91, Ver: v1, Flags: 10, { BackupSnapshotAnswer } } > 2013-09-04 10:27:34,651 DEBUG [storage.snapshot.SnapshotManagerImpl] > (Job-Executor-69:job-95137) The result for > com.cloud.agent.api.BackupSnapshotCommand is BackupSnapshot Failed due to > Async 21600 seconds timeout for task com.xensource.xenapi.Task@ec1aac0e > through 91 > 2013-09-04 10:29:34,652 DEBUG [storage.snapshot.SnapshotManagerImpl] > (Job-Executor-69:job-95137) Retrying... > 2013-09-04 10:29:34,659 DEBUG [agent.manager.ClusteredAgentAttache] > (Job-Executor-69:job-95137) Seq 89-1251159458: Forwarding Seq 89-1251159458: > { Cmd , MgmtId: 345052370017, via: 89, Ver: v1, Flags: 100011, > [{"BackupSnapshotCommand":{"isVolumeInactive":false,"vmName":"i-45-2736-VM","snapshotId":71889,"pool":{"id":208,"uuid":"b290385b-466d-3243-a939-3d242164e034","host":"mccpnas3-4-vip1.mccp.mcinfra.net","path":"/volumes/pool0/MCCP-S-SBP1-1_MCCP-XEN-1","port":2049,"type":"NetworkFilesystem"},"primaryStoragePoolNameLabel":"b290385b-466d-3243-a939-3d242164e034","snapshotUuid":"6396ec81-4128-44ca-a121-d9f1bf789d6c","snapshotName":"vlstws3_ROOT-2736_20130904022710","secondaryStorageUrl":"nfs://mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1","dcId":1,"accountId":45,"volumeId":5863,"volumePath":"9cb7af90-ca88-4b34-aa6f-bc21c3d4a3aa","wait":21600}}] > } to 345052370018 > 2013-09-04 10:29:43,085 DEBUG [agent.transport.Request] > (Job-Executor-69:job-95137) Seq 89-1251159458: Received: { Ans: , MgmtId: > 345052370017, via: 89, Ver: v1, Flags: 10, { BackupSnapshotAnswer } } > 2013-09-04 10:29:43,086 DEBUG [storage.snapshot.SnapshotManagerImpl] > (Job-Executor-69:job-95137) The result for > com.cloud.agent.api.BackupSnapshotCommand is BackupSnapshot Failed due to Can > not create second storage SR mountpoint: > mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1/snapshots/45/5863 > due to SR_BACKEND_FAILURE_40The SR scan failed [opterr=['INTERNAL_ERROR', > 'Db_exn.Uniqueness_constraint_violation("VDI", "uuid", > "df7eac9c-137a-4655-9d21-d781916351f1")']] through 89 > 2013-09-04 10:30:17,489 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-441:null) Ping from 109 > 2013-09-04 10:31:18,345 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-441:null) Seq 95-1001324546: Executing request > 2013-09-04 10:31:18,642 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-441:null) Seq 95-1001324546: Response Received: > 2013-09-04 10:31:18,642 DEBUG [agent.transport.Request] > (DirectAgent-441:null) Seq 95-1001324546: Processing: { Ans: , MgmtId: > 345052370017, via: 95, Ver: v1, Flags: 10, > [{"Answer":{"result":true,"wait":0}}] } > 2013-09-04 10:31:19,423 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-441:null) Seq 41-1598291970: Executing request > 2013-09-04 10:31:19,663 DEBUG [agent.manager.DirectAgentAttache] > (DirectAgent-441:null) Seq 41-1598291970: Response Received: > 2013-09-04 10:31:19,663 DEBUG [agent.transport.Request] > (DirectAgent-441:null) Seq 41-1598291970: Processing: { Ans: , MgmtId: > 345052370017, via: 41, Ver: v1, Flags: 10, > [{"Answer":{"result":true,"wait":0}}] } > 2013-09-04 10:31:43,086 DEBUG [storage.snapshot.SnapshotManagerImpl] > (Job-Executor-69:job-95137) Retrying... > 2013-09-04 10:31:43,091 DEBUG [agent.transport.Request] > (Job-Executor-69:job-95137) Seq 93-156960768: Sending { Cmd , MgmtId: > 345052370017, via: 93, Ver: v1, Flags: 100011, > [{"BackupSnapshotCommand":{"isVolumeInactive":false,"vmName":"i-45-2736-VM","snapshotId":71889,"pool":{"id":208,"uuid":"b290385b-466d-3243-a939-3d242164e034","host":"mccpnas3-4-vip1.mccp.mcinfra.net","path":"/volumes/pool0/MCCP-S-SBP1-1_MCCP-XEN-1","port":2049,"type":"NetworkFilesystem"},"primaryStoragePoolNameLabel":"b290385b-466d-3243-a939-3d242164e034","snapshotUuid":"6396ec81-4128-44ca-a121-d9f1bf789d6c","snapshotName":"vlstws3_ROOT-2736_20130904022710","secondaryStorageUrl":"nfs://mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1","dcId":1,"accountId":45,"volumeId":5863,"volumePath":"9cb7af90-ca88-4b34-aa6f-bc21c3d4a3aa","wait":21600}}] > } > 2013-09-04 10:31:43,091 DEBUG [agent.transport.Request] > (Job-Executor-69:job-95137) Seq 93-156960768: Executing: { Cmd , MgmtId: > 345052370017, via: 93, Ver: v1, Flags: 100011, > [{"BackupSnapshotCommand":{"isVolumeInactive":false,"vmName":"i-45-2736-VM","snapshotId":71889,"pool":{"id":208,"uuid":"b290385b-466d-3243-a939-3d242164e034","host":"mccpnas3-4-vip1.mccp.mcinfra.net","path":"/volumes/pool0/MCCP-S-SBP1-1_MCCP-XEN-1","port":2049,"type":"NetworkFilesystem"},"primaryStoragePoolNameLabel":"b290385b-466d-3243-a939-3d242164e034","snapshotUuid":"6396ec81-4128-44ca-a121-d9f1bf789d6c","snapshotName":"vlstws3_ROOT-2736_20130904022710","secondaryStorageUrl":"nfs://mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1","dcId":1,"accountId":45,"volumeId":5863,"volumePath":"9cb7af90-ca88-4b34-aa6f-bc21c3d4a3aa","wait":21600}}] > } > 2013-09-04 10:31:56,375 DEBUG [agent.transport.Request] > (Job-Executor-69:job-95137) Seq 93-156960768: Received: { Ans: , MgmtId: > 345052370017, via: 93, Ver: v1, Flags: 10, { BackupSnapshotAnswer } } > 2013-09-04 10:31:56,376 DEBUG [storage.snapshot.SnapshotManagerImpl] > (Job-Executor-69:job-95137) The result for > com.cloud.agent.api.BackupSnapshotCommand is BackupSnapshot Failed due to Can > not create second storage SR mountpoint: > mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1/snapshots/45/5863 > due to SR_BACKEND_FAILURE_40The SR scan failed [opterr=['INTERNAL_ERROR', > 'Db_exn.Uniqueness_constraint_violation("VDI", "uuid", > "df7eac9c-137a-4655-9d21-d781916351f1")']] through 93 > 2013-09-04 10:33:56,376 DEBUG [storage.snapshot.SnapshotManagerImpl] > (Job-Executor-69:job-95137) Retrying... > 2013-09-04 10:33:56,381 DEBUG [agent.transport.Request] > (Job-Executor-69:job-95137) Seq 95-1001356346: Sending { Cmd , MgmtId: > 345052370017, via: 95, Ver: v1, Flags: 100011, > [{"BackupSnapshotCommand":{"isVolumeInactive":false,"vmName":"i-45-2736-VM","snapshotId":71889,"pool":{"id":208,"uuid":"b290385b-466d-3243-a939-3d242164e034","host":"mccpnas3-4-vip1.mccp.mcinfra.net","path":"/volumes/pool0/MCCP-S-SBP1-1_MCCP-XEN-1","port":2049,"type":"NetworkFilesystem"},"primaryStoragePoolNameLabel":"b290385b-466d-3243-a939-3d242164e034","snapshotUuid":"6396ec81-4128-44ca-a121-d9f1bf789d6c","snapshotName":"vlstws3_ROOT-2736_20130904022710","secondaryStorageUrl":"nfs://mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1","dcId":1,"accountId":45,"volumeId":5863,"volumePath":"9cb7af90-ca88-4b34-aa6f-bc21c3d4a3aa","wait":21600}}] > } > 2013-09-04 10:33:56,381 DEBUG [agent.transport.Request] > (Job-Executor-69:job-95137) Seq 95-1001356346: Executing: { Cmd , MgmtId: > 345052370017, via: 95, Ver: v1, Flags: 100011, > [{"BackupSnapshotCommand":{"isVolumeInactive":false,"vmName":"i-45-2736-VM","snapshotId":71889,"pool":{"id":208,"uuid":"b290385b-466d-3243-a939-3d242164e034","host":"mccpnas3-4-vip1.mccp.mcinfra.net","path":"/volumes/pool0/MCCP-S-SBP1-1_MCCP-XEN-1","port":2049,"type":"NetworkFilesystem"},"primaryStoragePoolNameLabel":"b290385b-466d-3243-a939-3d242164e034","snapshotUuid":"6396ec81-4128-44ca-a121-d9f1bf789d6c","snapshotName":"vlstws3_ROOT-2736_20130904022710","secondaryStorageUrl":"nfs://mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1","dcId":1,"accountId":45,"volumeId":5863,"volumePath":"9cb7af90-ca88-4b34-aa6f-bc21c3d4a3aa","wait":21600}}] > } > 2013-09-04 10:34:13,682 DEBUG [agent.transport.Request] > (Job-Executor-69:job-95137) Seq 95-1001356346: Received: { Ans: , MgmtId: > 345052370017, via: 95, Ver: v1, Flags: 10, { BackupSnapshotAnswer } } > 2013-09-04 10:34:13,683 DEBUG [storage.snapshot.SnapshotManagerImpl] > (Job-Executor-69:job-95137) The result for > com.cloud.agent.api.BackupSnapshotCommand is BackupSnapshot Failed due to Can > not create second storage SR mountpoint: > mccpnas7.storage.mccp.mcinfra.net/volumes/pool0/MCCP-SHARED-1-1/snapshots/45/5863 > due to SR_BACKEND_FAILURE_40The SR scan failed [opterr=['INTERNAL_ERROR', > 'Db_exn.Uniqueness_constraint_violation("VDI", "uuid", > "df7eac9c-137a-4655-9d21-d781916351f1")']] through 95 > 2013-09-04 10:36:13,683 DEBUG [storage.snapshot.SnapshotManagerImpl] > (Job-Executor-69:job-95137) Retrying... > 2013-09-04 10:36:13,685 WARN [storage.snapshot.SnapshotManagerImpl] > (Job-Executor-69:job-95137) Storage unavailable > 2013-09-04 10:36:13,685 WARN [storage.snapshot.SnapshotManagerImpl] > (Job-Executor-69:job-95137) Failed to back up snapshot on secondary storage, > deleting the record from the DB > 2013-09-04 10:36:13,702 ERROR [cloud.api.ApiDispatcher] > (Job-Executor-69:job-95137) Exception while executing CreateSnapshotCmd: > 2013-09-04 10:36:13,702 DEBUG [cloud.async.AsyncJobManagerImpl] > (Job-Executor-69:job-95137) Complete async job-95137, jobStatus: 2, > resultCode: 530, result: Error Code: 530 Error text: Created snapshot: > com.cloud.storage.SnapshotVO$$EnhancerByCGLIB$$3173e384@40abfe02 on primary > but failed to backup on secondary -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira