tschacara opened a new issue, #7593:
URL: https://github.com/apache/cloudstack/issues/7593
ISSUE TYPE
Bug Report
COMPONENT NAME
Management, Storage, KVM
CLOUDSTACK VERSION
4.17.2.0
CONFIGURATION
1 X Management Server
2 X Primary Storage
1 X Secondary Storage
3 X KVM Servers
Advanced Network - Default Security Group.
OS / ENVIRONMENT
The Management, KVM and Storage servers all run Rocky Linux 8.7
SUMMARY
Recurring volume snapshots fail, timing out after 3600 seconds.
STEPS TO REPRODUCE
Schedule a recurring snapshot of a VM's DATA volume (1 TB) while the VM is online.
LOG HERE
~~~
2023-06-05 15:56:57,212 DEBUG [c.c.s.s.SnapshotSchedulerImpl] (SnapshotPollTask:ctx-f75b0142) (logid:1000c112) Got 0 snapshots to be executed at 2023-06-05 18:56:57 GMT
2023-06-05 15:56:59,143 DEBUG [o.a.c.s.d.d.CloudStackPrimaryDataStoreDriverImpl] (Work-Job-Executor-1:ctx-b15acf5a job-15633/job-15636 ctx-a16b6ca1) (logid:8443133e) Failed to take snapshot: 375
    at org.apache.cloudstack.storage.snapshot.SnapshotServiceImpl.takeSnapshot(SnapshotServiceImpl.java:208)
    at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.takeSnapshot(DefaultSnapshotStrategy.java:439)
    at com.cloud.storage.snapshot.SnapshotManagerImpl.takeSnapshot(SnapshotManagerImpl.java:1240)
2023-06-05 15:56:59,147 DEBUG [o.a.c.s.s.SnapshotServiceImpl] (Work-Job-Executor-1:ctx-b15acf5a job-15633/job-15636 ctx-a16b6ca1) (logid:8443133e) create snapshot FileServer_DATA-414_20230605175657 failed: com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:15, com.cloud.exception.OperationTimedoutException: Commands 4851502698584866841 to Host 15 timed out after 3600
2023-06-05 15:56:59,173 DEBUG [o.a.c.s.s.DefaultSnapshotStrategy] (Work-Job-Executor-1:ctx-b15acf5a job-15633/job-15636 ctx-a16b6ca1) (logid:8443133e) Failed to take snapshot: com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:15, com.cloud.exception.OperationTimedoutException: Commands 4851502698584866841 to Host 15 timed out after 3600
2023-06-05 15:56:59,188 DEBUG [c.c.s.s.SnapshotManagerImpl] (Work-Job-Executor-1:ctx-b15acf5a job-15633/job-15636 ctx-a16b6ca1) (logid:8443133e) Failed to create snapshotcom.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to Agent:15, com.cloud.exception.OperationTimedoutException: Commands 4851502698584866841 to Host 15 timed out after 3600
2023-06-05 15:56:59,200 DEBUG [c.c.r.ResourceLimitManagerImpl] (Work-Job-Executor-1:ctx-b15acf5a job-15633/job-15636 ctx-a16b6ca1) (logid:8443133e) Updating resource Type = snapshot count for Account = 219 Operation = decreasing Amount = 1
2023-06-05 15:56:59,232 ERROR [o.a.c.s.v.VolumeServiceImpl] (Work-Job-Executor-1:ctx-b15acf5a job-15633/job-15636 ctx-a16b6ca1) (logid:8443133e) Take snapshot: 1191 failed
    at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.takeSnapshot(DefaultSnapshotStrategy.java:442)
    at com.cloud.storage.snapshot.SnapshotManagerImpl.takeSnapshot(SnapshotManagerImpl.java:1240)
    at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.takeSnapshot(DefaultSnapshotStrategy.java:442)
    at com.cloud.storage.snapshot.SnapshotManagerImpl.takeSnapshot(SnapshotManagerImpl.java:1240)
2023-06-05 15:56:59,303 ERROR [o.a.c.a.c.u.s.CreateSnapshotCmd] (API-Job-Executor-1:ctx-225af267 job-15633 ctx-cb317a27) (logid:8443133e) Failed to create snapshot due to an internal error creating snapshot for volume d2f310f4-af8e-41fa-a94d-a1863d566503
    at org.apache.cloudstack.api.command.user.snapshot.CreateSnapshotCmd.execute(CreateSnapshotCmd.java:219)
    at org.apache.cloudstack.storage.snapshot.DefaultSnapshotStrategy.takeSnapshot(DefaultSnapshotStrategy.java:442)
    at com.cloud.storage.snapshot.SnapshotManagerImpl.takeSnapshot(SnapshotManagerImpl.java:1240)
2023-06-05 15:56:59,322 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-225af267 job-15633) (logid:8443133e) Complete async job-15633, jobStatus: FAILED, resultCode: 530, result: org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"530","errortext":"Failed to create snapshot due to an internal error creating snapshot for volume d2f310f4-af8e-41fa-a94d-a1863d566503"}
2023-06-05 15:56:59,328 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl] (API-Job-Executor-1:ctx-225af267 job-15633) (logid:8443133e) Done executing org.apache.cloudstack.api.command.user.snapshot.CreateSnapshotCmd for job-15633
~~~
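For anyone triaging a longer management-server log, here is a minimal Python sketch that pulls the timeout events out of a log excerpt. The regular expression is modelled only on the message text shown in the log above; the function name `find_timeouts` and the `sample` string are illustrative and not part of CloudStack.

```python
import re

# Pattern modelled on the timeout message as it appears in the log excerpt.
TIMEOUT_RE = re.compile(
    r"OperationTimedoutException: Commands (?P<seq>\d+) to Host (?P<host>\d+) "
    r"timed out after (?P<seconds>\d+)"
)


def find_timeouts(log_text):
    """Return the unique (sequence, host, seconds) tuples found in log_text."""
    # A single message may be wrapped across several lines, so collapse
    # all whitespace before matching.
    flat = " ".join(log_text.split())
    seen = []
    for m in TIMEOUT_RE.finditer(flat):
        entry = (m.group("seq"), int(m.group("host")), int(m.group("seconds")))
        if entry not in seen:
            seen.append(entry)
    return seen


# Sample copied from the log excerpt in this report.
sample = """
com.cloud.utils.exception.CloudRuntimeException: Failed to send command, due to
Agent:15, com.cloud.exception.OperationTimedoutException: Commands
4851502698584866841 to Host 15 timed out after 3600
"""

print(find_timeouts(sample))
```

Run against the excerpt in this report, every timeout message refers to the same command sequence on Host 15, with a limit of 3600 seconds.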
EXPECTED RESULTS
Volume snapshot is created and transported to secondary storage.
ACTUAL RESULTS
~~~
The DATA volume snapshot process successfully creates the snapshot on the VM.
According to the logs, the operation times out after 3600 seconds.
The error is reported both in the ACS log and in the UI.
ACS interrupts the process after 3600 seconds and does not copy the data to
the secondary storage.
Even after the error is reported, the volume snapshot keeps running in the
background but is never transferred. This gives the impression that the
hypervisor continues to execute the step it was asked to perform, but that
once the timeout is hit, the remaining steps needed to complete the request
are not executed.
~~~
SNAPSHOTS
~~~
virsh snapshot-list i-219-414-VM
 Name                                   Creation Time               State
-----------------------------------------------------------------------------------
 19290884-7bd7-4b29-863a-e2400f4db35f   2023-06-03 14:23:01 -0300   disk-snapshot
 3de38bcb-8e2c-4616-9ade-66524eae41b7   2023-05-27 00:08:01 -0300   disk-snapshot
 40a49d8b-294b-4139-97d1-3fb3d5318b5e   2023-05-20 00:08:01 -0300   disk-snapshot
 4561cfde-8e21-495c-878a-e3511b3a1b0b   2023-05-27 00:03:01 -0300   disk-snapshot
 60f9e8ea-a15c-4c2d-a5f9-320670c44c48   2023-06-05 14:56:59 -0300   disk-snapshot
 6cb5b473-9b47-42bb-90ce-99871ed42517   2023-06-03 17:44:44 -0300   disk-snapshot
 6d9f0c1a-15ff-4e9c-b739-95406ae86649   2023-05-15 08:15:40 -0300   disk-snapshot
 6e4efb0b-f17e-4488-b1a4-3b2cc8053c44   2023-05-06 00:00:38 -0300   disk-snapshot
 94d85369-9914-4ed2-9263-04823f992c38   2023-06-05 12:03:54 -0300   disk-snapshot
 9e3466e2-5479-4abd-8003-f37336234799   2023-06-03 12:18:01 -0300   disk-snapshot
 a722a007-0a8f-447c-96df-14e38360dfc5   2023-05-08 10:20:38 -0300   disk-snapshot
 af3236fc-a865-4916-bfcf-3433de227a70   2023-06-03 00:03:01 -0300   disk-snapshot
 b54669a5-d7bc-46c0-9fda-f7cf822ad512   2023-05-15 08:20:40 -0300   disk-snapshot
 cb237c76-0adb-4288-8a0d-699ff4e63d92   2023-06-03 00:12:11 -0300   disk-snapshot
 d61caeaa-5a0c-4b5e-852b-6069a5136bf3   2023-05-20 00:03:01 -0300   disk-snapshot

/mnt/a6c33213-6d0f-365a-845c-021a5b5c2aed/snapshots
drwxr-xr-x. 2 root root 4.0K Jun 5 14:56 .
drwxr-xr-x. 4 root root 4.0K Jun 5 14:56 ..
-rw-r--r--. 1 root root 217G Jun 3 16:26 19290884-7bd7-4b29-863a-e2400f4db35f
-rw-r--r--. 1 root root 243G Jun 5 17:08 60f9e8ea-a15c-4c2d-a5f9-320670c44c48
-rw-r--r--. 1 root root 217G Jun 3 19:46 6cb5b473-9b47-42bb-90ce-99871ed42517
-rw-r--r--. 1 root root 278G Jun 5 14:37 94d85369-9914-4ed2-9263-04823f992c38
-rw-r--r--. 1 root root 217G Jun 3 14:21 9e3466e2-5479-4abd-8003-f37336234799
-rw-r--r--. 1 root root 217G Jun 3 02:14 cb237c76-0adb-4288-8a0d-699ff4e63d92
~~~
GLOBAL SETTINGS
~~~
job.cancel.threshold.minutes
    Time (in minutes) for async-jobs to be forcely cancelled if it has been in process for long
    Advanced    default(60)    set(1440)
copy.volume.wait
    In second, timeout for copy volume command
    Storage    default(10800)    set(86400)
backup.snapshot.wait
    In second, timeout for BackupSnapshotCommand
    Storage    default(21600)    set(86400)
~~~
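As a quick cross-check, the sketch below converts the three configured values listed above to seconds and compares them with the 3600-second timeout seen in the log. The names in the code are illustrative. The values are copied from this report, not read from a live system.

```python
# Timeout reported in the log: "timed out after 3600".
OBSERVED_TIMEOUT_S = 3600

# Current "set" values from the GLOBAL SETTINGS listing, with their units.
configured = {
    "job.cancel.threshold.minutes": (1440, "minutes"),
    "copy.volume.wait": (86400, "seconds"),
    "backup.snapshot.wait": (86400, "seconds"),
}


def to_seconds(value, unit):
    """Normalise a setting value to seconds."""
    return value * 60 if unit == "minutes" else value


def matching_settings(observed_s, settings):
    """Return the names of settings whose value equals the observed timeout."""
    return [
        name
        for name, (value, unit) in settings.items()
        if to_seconds(value, unit) == observed_s
    ]


for name, (value, unit) in configured.items():
    print(f"{name}: {to_seconds(value, unit)} s")

print("settings equal to the observed timeout:",
      matching_settings(OBSERVED_TIMEOUT_S, configured))
```

All three settings work out to 86400 seconds, so none of them matches the observed 3600 seconds. This suggests, without proving it, that the limit being hit is controlled by some other setting.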