Re: [ovirt-users] USER_CREATE_SNAPSHOT_FINISHED_FAILURE with Cinder storage stuck

2017-08-23 Thread Maor Lipchuk
On Tue, Aug 22, 2017 at 1:08 PM, Matthias Leopold
 wrote:
>
>
> On 2017-08-22 at 09:33, Maor Lipchuk wrote:
>>
>> On Mon, Aug 21, 2017 at 6:12 PM, Matthias Leopold
>>  wrote:
>>>
>>> Hi,
>>>
>>> we're experimenting with Cinder/Ceph Storage on oVirt 4.1.3. When we
>>> tried
>>> to snapshot a VM (2 disks on Cinder storage domain) the task never
>>> finished
>>> and now seems to be in an uninterruptible loop. We tried to stop it in
>>> various (brute force) ways, but the below messages (one of the disks as
>>> an
>>> example) are cluttering engine.log every 10 seconds. We tried the
>>> following:
>>>
>>> - deleting the VM
>>> - restarting ovirt-engine service
>>> - vdsClient -s 0 getAllTasksStatuses on SPM host (no result)
>>> - restarting vdsmd service on SPM host
>>> - /usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh -u engine -d engine -c c841c979-70ea-4e06-b9c4-9c5ce014d76d
>>>
>>> None of this helped. How do we get rid of this failed transaction?
>>>
>>> thx
>>> matthias
>>>
>>> 2017-08-21 16:40:44,798+02 INFO
>>> [org.ovirt.engine.core.utils.transaction.TransactionSupport]
>>> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d]
>>> transaction
>>> rolled back
>>> 2017-08-21 16:40:44,799+02 ERROR
>>> [org.ovirt.engine.core.bll.job.ExecutionHandler]
>>> (DefaultQuartzScheduler7)
>>> [080af640-bac3-4990-8bf4-6829551b538d] Exception:
>>> org.springframework.dao.DataIntegrityViolationException:
>>> CallableStatementCallback; SQL [{call insertstep(?, ?, ?, ?, ?, ?, ?, ?,
>>> ?,
>>> ?, ?, ?, ?, ?)}]; ERROR: insert or update on table "step" violates
>>> foreign
>>> key constraint "fk_step_job"
>>> 2017-08-21 16:40:44,805+02 ERROR
>>> [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand]
>>> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Ending
>>> command
>>> 'org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand'
>>> with
>>> failure.
>>> 2017-08-21 16:40:44,807+02 WARN
>>> [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand]
>>> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] No
>>> snapshot
>>> was created for VM 'c0235316-81c4-48be-9521-b86b338c7d20' which is in
>>> LOCKED
>>> status
>>> 2017-08-21 16:40:44,810+02 INFO
>>> [org.ovirt.engine.core.utils.transaction.TransactionSupport]
>>> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d]
>>> transaction
>>> rolled back
>>> 2017-08-21 16:40:44,810+02 WARN
>>> [org.ovirt.engine.core.bll.lock.InMemoryLockManager]
>>> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Trying
>>> to
>>> release exclusive lock which does not exist, lock key:
>>> 'c0235316-81c4-48be-9521-b86b338c7d20VM'
>>> 2017-08-21 16:40:44,810+02 INFO
>>> [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand]
>>> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Lock
>>> freed
>>> to object
>>> 'EngineLock:{exclusiveLocks='[c0235316-81c4-48be-9521-b86b338c7d20=VM]',
>>> sharedLocks=''}'
>>> 2017-08-21 16:40:44,829+02 ERROR
>>> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>>> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d]
>>> EVENT_ID:
>>> USER_CREATE_SNAPSHOT_FINISHED_FAILURE(69), Correlation ID:
>>> 080af640-bac3-4990-8bf4-6829551b538d, Job ID:
>>> a3be8af1-8d33-4d35-9672-215ac7c9959f, Call Stack: null, Custom Event ID:
>>> -1,
>>> Message: Failed to complete snapshot 'test' creation for VM ''.
>>> 2017-08-21 16:40:44,829+02 ERROR
>>> [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller]
>>> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Failed
>>> invoking callback end method 'onFailed' for command
>>> 'c841c979-70ea-4e06-b9c4-9c5ce014d76d' with exception 'null', the
>>> callback
>>> is marked for end method retries
>>>
>>>
>
>>
>>
>> Hi Matthias,
>>
>> Can you please attach the full engine log, including the first occurrence
>> of the error, so we can trace its origin and fix it?
>> Does it reproduce consistently?
>>
>> The engine does not use VDSM tasks to manage Cinder. It uses Cinder as an
>> external provider and relies on the COCO infrastructure for async tasks.
>> COCO tasks are tracked in the database, in the command_entities table.
>> Basically, if you remove all references to the command ID from
>> command_entities and restart the engine, you should not see it any more.
>>
>> Regards,
>> Maor
>>
>
> Hi Maor,
>
> Thanks very much for replying. First I tried cleaning the command_entities
> table and then restarting the engine, as you suggested. This didn't work
> entirely; these two entries

Can you please try stopping the engine first, and only then cleaning the
command_entities table?
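
The ordering suggested here (engine stopped before the table is touched,
presumably so that a running engine cannot write the entries back) could be
scripted roughly as follows. This is a minimal sketch, not an official oVirt
procedure: the systemctl unit name "ovirt-engine" and the psql user and
database "engine" are assumptions taken from the commands quoted in this
thread, and cleanup_plan is a hypothetical helper that only builds the
command list without running anything.

```python
from typing import List


def cleanup_plan(sql: str, db_user: str = "engine", db_name: str = "engine") -> List[List[str]]:
    """Return the commands to run, in order: stop engine, clean DB, start engine.

    Nothing is executed here; the caller decides how to run each step.
    The service name and the psql user/database are assumptions.
    """
    return [
        ["systemctl", "stop", "ovirt-engine"],
        ["psql", "-U", db_user, "-d", db_name, "-c", sql],
        ["systemctl", "start", "ovirt-engine"],
    ]


if __name__ == "__main__":
    statement = (
        "DELETE FROM command_entities "
        "WHERE command_id = 'c841c979-70ea-4e06-b9c4-9c5ce014d76d';"
    )
    for step in cleanup_plan(statement):
        print(" ".join(step))
```

Printing the plan first, rather than executing it, leaves room to review the
SQL and take a database backup before anything is changed.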

>
> engine=# select command_id, command_type, root_command_id, status from command_entities;
>   command_id  | command_type | root_command_id | status
> 

Re: [ovirt-users] USER_CREATE_SNAPSHOT_FINISHED_FAILURE with Cinder storage stuck

2017-08-22 Thread Matthias Leopold



On 2017-08-22 at 09:33, Maor Lipchuk wrote:

> Hi Matthias,
>
> Can you please attach the full engine log, including the first occurrence
> of the error, so we can trace its origin and fix it?
> Does it reproduce consistently?
>
> The engine does not use VDSM tasks to manage Cinder. It uses Cinder as an
> external provider and relies on the COCO infrastructure for async tasks.
> COCO tasks are tracked in the database, in the command_entities table.
> Basically, if you remove all references to the command ID from
> command_entities and restart the engine, you should not see it any more.
>
> Regards,
> Maor



Hi Maor,

Thanks very much for replying. First I tried cleaning the
command_entities table and then restarting the engine, as you suggested.
This didn't work entirely; these two entries


engine=# select command_id, command_type, root_command_id, status from command_entities;
              command_id              | command_type |           root_command_id            | status
--------------------------------------+--------------+--------------------------------------+--------
 c841c979-70ea-4e06-b9c4-9c5ce014d76d |          206 | c841c979-70ea-4e06-b9c4-9c5ce014d76d | FAILED
 65fa094e-1609-47ea-bf0d-611e3d5b9358 |          206 | 65fa094e-1609-47ea-bf0d-611e3d5b9358 | FAILED



keep appearing and still cause messages in engine.log like

 2017-08-22 11:54:57,109+02 WARN 
[org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand] 
(DefaultQuartzScheduler8) [080af640-bac3-4990-8bf4-6829551b538d] No 
snapshot was created for VM 'c0235316-81c4-48be-9521-b86b338c7d20' which 
is in LOCKED status

Re: [ovirt-users] USER_CREATE_SNAPSHOT_FINISHED_FAILURE with Cinder storage stuck

2017-08-22 Thread Maor Lipchuk
On Mon, Aug 21, 2017 at 6:12 PM, Matthias Leopold
 wrote:
> Hi,
>
> we're experimenting with Cinder/Ceph Storage on oVirt 4.1.3. When we tried
> to snapshot a VM (2 disks on Cinder storage domain) the task never finished
> and now seems to be in an uninterruptible loop. We tried to stop it in
> various (brute force) ways, but the below messages (one of the disks as an
> example) are cluttering engine.log every 10 seconds. We tried the following:
>
> - deleting the VM
> - restarting ovirt-engine service
> - vdsClient -s 0 getAllTasksStatuses on SPM host (no result)
> - restarting vdsmd service on SPM host
> - /usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh -u engine -d engine -c c841c979-70ea-4e06-b9c4-9c5ce014d76d
>
> None of this helped. How do we get rid of this failed transaction?
>
> thx
> matthias
>
> 2017-08-21 16:40:44,798+02 INFO
> [org.ovirt.engine.core.utils.transaction.TransactionSupport]
> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] transaction
> rolled back
> 2017-08-21 16:40:44,799+02 ERROR
> [org.ovirt.engine.core.bll.job.ExecutionHandler] (DefaultQuartzScheduler7)
> [080af640-bac3-4990-8bf4-6829551b538d] Exception:
> org.springframework.dao.DataIntegrityViolationException:
> CallableStatementCallback; SQL [{call insertstep(?, ?, ?, ?, ?, ?, ?, ?, ?,
> ?, ?, ?, ?, ?)}]; ERROR: insert or update on table "step" violates foreign
> key constraint "fk_step_job"
> 2017-08-21 16:40:44,805+02 ERROR
> [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand]
> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Ending
> command
> 'org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand' with
> failure.
> 2017-08-21 16:40:44,807+02 WARN
> [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand]
> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] No snapshot
> was created for VM 'c0235316-81c4-48be-9521-b86b338c7d20' which is in LOCKED
> status
> 2017-08-21 16:40:44,810+02 INFO
> [org.ovirt.engine.core.utils.transaction.TransactionSupport]
> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] transaction
> rolled back
> 2017-08-21 16:40:44,810+02 WARN
> [org.ovirt.engine.core.bll.lock.InMemoryLockManager]
> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Trying to
> release exclusive lock which does not exist, lock key:
> 'c0235316-81c4-48be-9521-b86b338c7d20VM'
> 2017-08-21 16:40:44,810+02 INFO
> [org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand]
> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Lock freed
> to object
> 'EngineLock:{exclusiveLocks='[c0235316-81c4-48be-9521-b86b338c7d20=VM]',
> sharedLocks=''}'
> 2017-08-21 16:40:44,829+02 ERROR
> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] EVENT_ID:
> USER_CREATE_SNAPSHOT_FINISHED_FAILURE(69), Correlation ID:
> 080af640-bac3-4990-8bf4-6829551b538d, Job ID:
> a3be8af1-8d33-4d35-9672-215ac7c9959f, Call Stack: null, Custom Event ID: -1,
> Message: Failed to complete snapshot 'test' creation for VM ''.
> 2017-08-21 16:40:44,829+02 ERROR
> [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller]
> (DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Failed
> invoking callback end method 'onFailed' for command
> 'c841c979-70ea-4e06-b9c4-9c5ce014d76d' with exception 'null', the callback
> is marked for end method retries
>
>
>
>
>
>
>
> ___
> Users mailing list
> Users@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users


Hi Matthias,

Can you please attach the full engine log, including the first occurrence
of the error, so we can trace its origin and fix it?
Does it reproduce consistently?

The engine does not use VDSM tasks to manage Cinder. It uses Cinder as an
external provider and relies on the COCO infrastructure for async tasks.
COCO tasks are tracked in the database, in the command_entities table.
Basically, if you remove all references to the command ID from
command_entities and restart the engine, you should not see it any more.
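
As a rough illustration of "remove all references of the command id", the
statement could be generated like this. It is only a sketch under stated
assumptions: the column names command_id and root_command_id come from the
query shown elsewhere in this thread, build_cleanup_sql is a hypothetical
helper, and whether other tables also reference the command is not covered
here.

```python
import uuid
from typing import Iterable


def build_cleanup_sql(command_ids: Iterable[str]) -> str:
    """Build a DELETE that removes rows referencing the given command IDs.

    Each ID is parsed as a UUID first, so malformed input raises ValueError
    instead of ending up inside the SQL text.
    """
    ids = [str(uuid.UUID(cid)) for cid in command_ids]
    if not ids:
        raise ValueError("at least one command ID is required")
    id_list = ", ".join(f"'{cid}'" for cid in ids)
    return (
        "DELETE FROM command_entities "
        f"WHERE command_id IN ({id_list}) "
        f"OR root_command_id IN ({id_list});"
    )


if __name__ == "__main__":
    print(build_cleanup_sql(["c841c979-70ea-4e06-b9c4-9c5ce014d76d"]))
```

Matching on root_command_id as well is a design choice meant to catch child
commands of the same root; it is an assumption, not something confirmed above.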

Regards,
Maor


[ovirt-users] USER_CREATE_SNAPSHOT_FINISHED_FAILURE with Cinder storage stuck

2017-08-21 Thread Matthias Leopold

Hi,

we're experimenting with Cinder/Ceph Storage on oVirt 4.1.3. When we 
tried to snapshot a VM (2 disks on Cinder storage domain) the task never 
finished and now seems to be in an uninterruptible loop. We tried to 
stop it in various (brute force) ways, but the below messages (one of 
the disks as an example) are cluttering engine.log every 10 seconds. We 
tried the following:


- deleting the VM
- restarting ovirt-engine service
- vdsClient -s 0 getAllTasksStatuses on SPM host (no result)
- restarting vdsmd service on SPM host
- /usr/share/ovirt-engine/setup/dbutils/taskcleaner.sh -u engine -d engine -c c841c979-70ea-4e06-b9c4-9c5ce014d76d


None of this helped. How do we get rid of this failed transaction?

thx
matthias

2017-08-21 16:40:44,798+02 INFO 
[org.ovirt.engine.core.utils.transaction.TransactionSupport] 
(DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] 
transaction rolled back
2017-08-21 16:40:44,799+02 ERROR 
[org.ovirt.engine.core.bll.job.ExecutionHandler] 
(DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] 
Exception: org.springframework.dao.DataIntegrityViolationException: 
CallableStatementCallback; SQL [{call insertstep(?, ?, ?, ?, ?, ?, ?, ?, 
?, ?, ?, ?, ?, ?)}]; ERROR: insert or update on table "step" violates 
foreign key constraint "fk_step_job"
2017-08-21 16:40:44,805+02 ERROR 
[org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand] 
(DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Ending 
command 
'org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand' 
with failure.
2017-08-21 16:40:44,807+02 WARN 
[org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand] 
(DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] No 
snapshot was created for VM 'c0235316-81c4-48be-9521-b86b338c7d20' which 
is in LOCKED status
2017-08-21 16:40:44,810+02 INFO 
[org.ovirt.engine.core.utils.transaction.TransactionSupport] 
(DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] 
transaction rolled back
2017-08-21 16:40:44,810+02 WARN 
[org.ovirt.engine.core.bll.lock.InMemoryLockManager] 
(DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Trying 
to release exclusive lock which does not exist, lock key: 
'c0235316-81c4-48be-9521-b86b338c7d20VM'
2017-08-21 16:40:44,810+02 INFO 
[org.ovirt.engine.core.bll.snapshots.CreateAllSnapshotsFromVmCommand] 
(DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Lock 
freed to object 
'EngineLock:{exclusiveLocks='[c0235316-81c4-48be-9521-b86b338c7d20=VM]', 
sharedLocks=''}'
2017-08-21 16:40:44,829+02 ERROR 
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] 
EVENT_ID: USER_CREATE_SNAPSHOT_FINISHED_FAILURE(69), Correlation ID: 
080af640-bac3-4990-8bf4-6829551b538d, Job ID: 
a3be8af1-8d33-4d35-9672-215ac7c9959f, Call Stack: null, Custom Event ID: 
-1, Message: Failed to complete snapshot 'test' creation for VM ''.
2017-08-21 16:40:44,829+02 ERROR 
[org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] 
(DefaultQuartzScheduler7) [080af640-bac3-4990-8bf4-6829551b538d] Failed 
invoking callback end method 'onFailed' for command 
'c841c979-70ea-4e06-b9c4-9c5ce014d76d' with exception 'null', the 
callback is marked for end method retries
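
To find which command IDs are being retried, the CommandCallbacksPoller
messages above can be scanned with a small script. This is a minimal sketch
that assumes the message wording stays exactly as in the excerpt;
stuck_command_ids is a hypothetical helper name, and the pattern tolerates
line wrapping such as the one introduced by the mail client.

```python
import re
from typing import Set

# Matches "callback end method '<name>' for command '<uuid>'". Every gap is
# \s+ so the pattern still matches when the log line has been wrapped.
FAILED_CALLBACK = re.compile(
    r"callback\s+end\s+method\s+'\w+'\s+for\s+command\s+"
    r"'([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})'"
)


def stuck_command_ids(log_text: str) -> Set[str]:
    """Return the distinct command IDs named in failed-callback messages."""
    return set(FAILED_CALLBACK.findall(log_text))


if __name__ == "__main__":
    sample = (
        "Failed \n"
        "invoking callback end method 'onFailed' for command \n"
        "'c841c979-70ea-4e06-b9c4-9c5ce014d76d' with exception 'null', the \n"
        "callback is marked for end method retries\n"
    )
    print(stuck_command_ids(sample))
```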







