[jira] [Commented] (CLOUDSTACK-9864) cleanup stale worker VMs after job expiry time

2017-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968860#comment-15968860
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9864:


Github user DaanHoogland closed the pull request at:

https://github.com/apache/cloudstack/pull/2030


> cleanup stale worker VMs after job expiry time
> --
>
> Key: CLOUDSTACK-9864
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9864
> Project: CloudStack
>  Issue Type: Improvement
>  Security Level: Public (Anyone can view this level - this is the default.)
>  Components: VMware
>Reporter: Daan Hoogland
>Assignee: Daan Hoogland
>  Labels: vmware, vsphere, workers
>
> In the present code, cleaning up worker VMs after a timeout is disabled, with
> the documented reason that there is no API to query vCenter for related tasks.
> ACS has an expiry time for jobs and a cancellation time for jobs.
> - The results of jobs that take longer than the expiry time are ignored.
> - Jobs that are cancelled are forcibly removed after the cancellation expiry
> time.
> Any worker VM remaining after expiry + cancellation time is therefore stale and
> can be removed.
> As some administrators may not want this behaviour, a new setting (false by
> default) must be enabled before stale worker VMs are cleaned up.
> Stale worker VMs will be cleaned after 2 * (expiry-time + cancellation-time)
> as a safety margin.
> Related settings:
> job.expire.minutes: 1440
> job.cancel.threshold.minutes: 60
> vmware.clean.old.worker.vms: false (new)
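To make the safety margin concrete, here is a small worked sketch of the rule "2 * (expiry-time + cancellation-time)" using the default values listed above. The class name StaleWorkerThreshold and the method staleWorkerThreshold are hypothetical and are not part of the CloudStack code base; with the defaults the threshold works out to 2 * (1440 + 60) = 3000 minutes, i.e. 50 hours.

```java
import java.time.Duration;

public class StaleWorkerThreshold {
    // Hypothetical helper: applies the rule 2 * (expiry-time + cancellation-time).
    static Duration staleWorkerThreshold(long jobExpireMinutes, long jobCancelThresholdMinutes) {
        return Duration.ofMinutes(jobExpireMinutes + jobCancelThresholdMinutes).multipliedBy(2);
    }

    public static void main(String[] args) {
        // Defaults: job.expire.minutes = 1440, job.cancel.threshold.minutes = 60
        Duration threshold = staleWorkerThreshold(1440, 60);
        System.out.println(threshold.toMinutes() + " minutes (" + threshold.toHours() + " hours)");
        // prints "3000 minutes (50 hours)"
    }
}
```

In other words, with the defaults a worker VM is only considered stale after roughly two days, well beyond the point where its originating job could still be running.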



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CLOUDSTACK-9864) cleanup stale worker VMs after job expiry time

2017-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968861#comment-15968861
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9864:


Github user DaanHoogland commented on the issue:

https://github.com/apache/cloudstack/pull/2030
  
show me the money (tm)


[jira] [Commented] (CLOUDSTACK-9864) cleanup stale worker VMs after job expiry time

2017-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968859#comment-15968859
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9864:


Github user DaanHoogland commented on the issue:

https://github.com/apache/cloudstack/pull/2030
  
renaming and close-opening for retests. "it works on my laptop (tm)"


[jira] [Commented] (CLOUDSTACK-9864) cleanup stale worker VMs after job expiry time

2017-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968862#comment-15968862
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9864:


GitHub user DaanHoogland reopened a pull request:

https://github.com/apache/cloudstack/pull/2030

WIP: CLOUDSTACK-9864 cleanup stale worker VMs after job expiry time



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shapeblue/cloudstack snapshot-housekeeping

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cloudstack/pull/2030.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2030


commit 40869570fc510fac0d2357f272e96cd4a4518176
Author: Daan Hoogland 
Date:   2017-03-30T14:35:37Z

CE-113 trace logging and rethrow instead of nesting CloudRuntimeException

commit 66d7d846352d52cc539b1dafb5e4d0f1620829a5
Author: Daan Hoogland 
Date:   2017-04-05T12:19:14Z

CE-113 configure workervm gc based on job expiry

commit 996f5834e6a0a9e4dc57d436ceeb5b89e6dc9974
Author: Daan Hoogland 
Date:   2017-04-05T15:35:41Z

CE-113 extra trace log of worker VMs

commit 9a8ea7c0d1c9775ad7e4200db2b3eca93e121909
Author: Daan Hoogland 
Date:   2017-04-06T09:33:53Z

CE-113 removed TODOs

commit e2c0f09609b48f4539f13edcc742ca7e06f0cca2
Author: Daan Hoogland 
Date:   2017-04-07T12:54:19Z

CE-113 use of duration (instead of the old clock-tick-based code)




[jira] [Commented] (CLOUDSTACK-9864) cleanup stale worker VMs after job expiry time

2017-04-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962768#comment-15962768
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9864:


Github user blueorangutan commented on the issue:

https://github.com/apache/cloudstack/pull/2030
  
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-632


[jira] [Commented] (CLOUDSTACK-9864) cleanup stale worker VMs after job expiry time

2017-04-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962746#comment-15962746
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9864:


Github user borisstoyanov commented on the issue:

https://github.com/apache/cloudstack/pull/2030
  
@blueorangutan package


[jira] [Commented] (CLOUDSTACK-9864) cleanup stale worker VMs after job expiry time

2017-04-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962747#comment-15962747
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9864:


Github user blueorangutan commented on the issue:

https://github.com/apache/cloudstack/pull/2030
  
@borisstoyanov a Jenkins job has been kicked to build packages. I'll keep 
you posted as I make progress.


[jira] [Commented] (CLOUDSTACK-9864) cleanup stale worker VMs after job expiry time

2017-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961650#comment-15961650
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9864:


Github user blueorangutan commented on the issue:

https://github.com/apache/cloudstack/pull/2030
  
Trillian test result (tid-983)
Environment: vmware-55u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 51519 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2030-t983-vmware-55u3.zip
Intermittent failure detected: /marvin/tests/smoke/test_deploy_vgpu_enabled_vm.py
Intermittent failure detected: /marvin/tests/smoke/test_internal_lb.py
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_routers_network_ops.py
Intermittent failure detected: /marvin/tests/smoke/test_routers.py
Intermittent failure detected: /marvin/tests/smoke/test_vm_snapshots.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Test completed. 48 look ok, 2 have error(s)


Test | Result | Time (s) | Test File
--- | --- | --- | ---
test_01_test_vm_volume_snapshot | `Failure` | 322.23 | test_vm_snapshots.py
test_04_rvpc_privategw_static_routes | `Failure` | 888.45 | test_privategw_acl.py
test_01_vpc_site2site_vpn | Success | 371.71 | test_vpc_vpn.py
test_01_vpc_remote_access_vpn | Success | 166.99 | test_vpc_vpn.py
test_01_redundant_vpc_site2site_vpn | Success | 593.09 | test_vpc_vpn.py
test_02_VPC_default_routes | Success | 354.29 | test_vpc_router_nics.py
test_01_VPC_nics_after_destroy | Success | 742.66 | test_vpc_router_nics.py
test_05_rvpc_multi_tiers | Success | 675.76 | test_vpc_redundant.py
test_04_rvpc_network_garbage_collector_nics | Success | 1534.04 | test_vpc_redundant.py
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers | Success | 751.75 | test_vpc_redundant.py
test_02_redundant_VPC_default_routes | Success | 704.82 | test_vpc_redundant.py
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL | Success | 1374.94 | test_vpc_redundant.py
test_09_delete_detached_volume | Success | 30.73 | test_volumes.py
test_06_download_detached_volume | Success | 60.53 | test_volumes.py
test_05_detach_volume | Success | 105.26 | test_volumes.py
test_04_delete_attached_volume | Success | 10.18 | test_volumes.py
test_03_download_attached_volume | Success | 20.32 | test_volumes.py
test_02_attach_volume | Success | 58.72 | test_volumes.py
test_01_create_volume | Success | 519.39 | test_volumes.py
test_change_service_offering_for_vm_with_snapshots | Success | 548.99 | test_vm_snapshots.py
test_03_delete_vm_snapshots | Success | 275.23 | test_vm_snapshots.py
test_02_revert_vm_snapshots | Success | 232.04 | test_vm_snapshots.py
test_01_create_vm_snapshots | Success | 161.65 | test_vm_snapshots.py
test_deploy_vm_multiple | Success | 242.48 | test_vm_life_cycle.py
test_deploy_vm | Success | 0.03 | test_vm_life_cycle.py
test_advZoneVirtualRouter | Success | 0.02 | test_vm_life_cycle.py
test_10_attachAndDetach_iso | Success | 26.83 | test_vm_life_cycle.py
test_09_expunge_vm | Success | 125.25 | test_vm_life_cycle.py
test_08_migrate_vm | Success | 60.94 | test_vm_life_cycle.py
test_07_restore_vm | Success | 0.10 | test_vm_life_cycle.py
test_06_destroy_vm | Success | 10.14 | test_vm_life_cycle.py
test_03_reboot_vm | Success | 5.13 | test_vm_life_cycle.py
test_02_start_vm | Success | 20.25 | test_vm_life_cycle.py
test_01_stop_vm | Success | 10.14 | test_vm_life_cycle.py
test_CreateTemplateWithDuplicateName | Success | 206.29 | test_templates.py
test_08_list_system_templates | Success | 0.03 | test_templates.py
test_07_list_public_templates | Success | 0.04 | test_templates.py
test_05_template_permissions | Success | 0.06 | test_templates.py
test_04_extract_template | Success | 10.20 | test_templates.py
test_03_delete_template | Success | 5.09 | test_templates.py
test_02_edit_template | Success | 90.13 | test_templates.py
test_01_create_template | Success | 121.06 | test_templates.py
test_10_destroy_cpvm | Success | 266.87 | test_ssvm.py
test_09_destroy_ssvm | Success | 268.71 | test_ssvm.py
test_08_reboot_cpvm | Success | 156.52 | test_ssvm.py
test_07_reboot_ssvm | Success | 188.45 | test_ssvm.py
test_06_stop_cpvm | Success | 176.88 | test_ssvm.py
test_05_stop_ssvm | Success | 203.78 | test_ssvm.py
test_04_cpvm_internals | Success | 1.20 | test_ssvm.py
test_03_ssvm_internals | Success | 3.39 | test_ssvm.py
test_02_list_cpvm_vm | Success | 0.12 | test_ssvm.py
test_01_list_sec_storage_vm | Success | 0.12 | test_ssvm.py

[jira] [Commented] (CLOUDSTACK-9864) cleanup stale worker VMs after job expiry time

2017-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960681#comment-15960681
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9864:


Github user blueorangutan commented on the issue:

https://github.com/apache/cloudstack/pull/2030
  
@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + vmware-55u3) has 
been kicked to run smoke tests


[jira] [Commented] (CLOUDSTACK-9864) cleanup stale worker VMs after job expiry time

2017-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960678#comment-15960678
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9864:


Github user borisstoyanov commented on the issue:

https://github.com/apache/cloudstack/pull/2030
  
@blueorangutan test centos7 vmware-55u3


[jira] [Commented] (CLOUDSTACK-9864) cleanup stale worker VMs after job expiry time

2017-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960655#comment-15960655
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9864:


Github user blueorangutan commented on the issue:

https://github.com/apache/cloudstack/pull/2030
  
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-624


[jira] [Commented] (CLOUDSTACK-9864) cleanup stale worker VMs after job expiry time

2017-04-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960628#comment-15960628
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9864:


Github user borisstoyanov commented on the issue:

https://github.com/apache/cloudstack/pull/2030
  
@blueorangutan package


[jira] [Commented] (CLOUDSTACK-9864) cleanup stale worker VMs after job expiry time

2017-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958811#comment-15958811
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9864:


Github user abhinandanprateek commented on a diff in the pull request:

https://github.com/apache/cloudstack/pull/2030#discussion_r110145761
  
--- Diff: plugins/hypervisors/vmware/src/com/cloud/hypervisor/vmware/manager/VmwareManagerImpl.java ---
@@ -128,6 +129,7 @@
 public class VmwareManagerImpl extends ManagerBase implements VmwareManager, VmwareStorageMount, Listener, VmwareDatacenterService, Configurable {
 private static final Logger s_logger = Logger.getLogger(VmwareManagerImpl.class);
 
+private static final long MILISECONDS_PER_MINUTE = 6;
--- End diff --

Typo: MILI should be MILLI (MILLISECONDS_PER_MINUTE).
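Besides the spelling, a hand-written "milliseconds per minute" constant is easy to get wrong: one minute is 60 * 1000 = 60,000 ms. A minimal sketch, assuming only the JDK, shows a correctly spelled constant next to java.util.concurrent.TimeUnit, which removes the need for the constant altogether. The class MillisPerMinute and the method hungWorkerTimeoutMillis are hypothetical illustrations, not the code in PR 2030.

```java
import java.util.concurrent.TimeUnit;

public class MillisPerMinute {
    // Correctly spelled constant: 60 seconds * 1000 milliseconds.
    static final long MILLISECONDS_PER_MINUTE = 60_000L;

    // Hypothetical helper: hung-worker timeout in milliseconds,
    // computed with TimeUnit instead of a hand-written constant.
    static long hungWorkerTimeoutMillis(long jobExpireMinutes, long jobCancelThresholdMinutes) {
        return TimeUnit.MINUTES.toMillis(2 * (jobExpireMinutes + jobCancelThresholdMinutes));
    }

    public static void main(String[] args) {
        System.out.println(TimeUnit.MINUTES.toMillis(1) == MILLISECONDS_PER_MINUTE); // true
        System.out.println(hungWorkerTimeoutMillis(1440, 60)); // 180000000
    }
}
```

Using a primitive long here also sidesteps the boxed Long used for hungWorkerTimeout in the diff.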


[jira] [Commented] (CLOUDSTACK-9864) cleanup stale worker VMs after job expiry time

2017-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958812#comment-15958812
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9864:


Github user abhinandanprateek commented on a diff in the pull request:

https://github.com/apache/cloudstack/pull/2030#discussion_r110145877
  
--- Diff: plugins/hypervisors/vmware/src/com/cloud/hypervisor/vmware/manager/VmwareManagerImpl.java ---
@@ -550,15 +552,21 @@ public boolean needRecycle(String workerTag) {
 return true;
 }
 
-// disable time-out check until we have found out a VMware API that can check if
-// there are pending tasks on the subject VM
-/*
-if(System.currentTimeMillis() - startTick > _hungWorkerTimeout) {
-if(s_logger.isInfoEnabled())
-s_logger.info("Worker VM expired, seconds elapsed: " + (System.currentTimeMillis() - startTick) / 1000);
-return true;
-}
- */
+// this time-out check was disabled
+// "until we have found out a VMware API that can check if there are pending tasks on the subject VM"
+// but as we expire jobs and those stale worker VMs stay around untill an MS reboot we opt in to have them removed anyway
+Long hungWorkerTimeout = 2 * (AsyncJobManagerImpl.JobExpireMinutes.value() + AsyncJobManagerImpl.JobCancelThresholdMinutes.value()) * MILISECONDS_PER_MINUTE;
+Long letsSayNow = System.currentTimeMillis();
+if(s_vmwareCleanOldWorderVMs.value() && letsSayNow - startTick > hungWorkerTimeout) {
+if(s_logger.isInfoEnabled()) {
+s_logger.info("Worker VM expired, seconds elapsed: " + (System.currentTimeMillis() - startTick) / 1000);
+}
--- End diff --

For timeouts you may want to use Java's Duration; it is much cleaner.
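A minimal sketch of what the suggested java.time.Duration version of the timeout branch could look like, assuming (as in the diff) that startTick is a System.currentTimeMillis() timestamp and that the opt-in setting, the job expiry minutes and the cancel threshold minutes are passed in as plain values. The class WorkerVmAgeCheck and the method isStaleWorker are hypothetical; this is not the code that was eventually committed.

```java
import java.time.Duration;
import java.time.Instant;

public class WorkerVmAgeCheck {
    // Hypothetical stand-in for the timeout branch of needRecycle(): true only when the
    // opt-in cleanup setting is enabled and the worker VM is older than
    // 2 * (job expiry minutes + job cancel threshold minutes).
    static boolean isStaleWorker(long startTickMillis, long nowMillis, boolean cleanOldWorkerVms,
                                 long jobExpireMinutes, long jobCancelThresholdMinutes) {
        if (!cleanOldWorkerVms) {
            return false; // setting is false by default: never recycle on age alone
        }
        Duration hungWorkerTimeout =
                Duration.ofMinutes(jobExpireMinutes + jobCancelThresholdMinutes).multipliedBy(2);
        Duration age = Duration.ofMillis(nowMillis - startTickMillis);
        return age.compareTo(hungWorkerTimeout) > 0;
    }

    public static void main(String[] args) {
        long start = Instant.parse("2017-04-06T00:00:00Z").toEpochMilli();
        long now = start + Duration.ofHours(51).toMillis();

        System.out.println(isStaleWorker(start, now, true, 1440, 60));  // true: 51 h > 50 h
        System.out.println(isStaleWorker(start, now, false, 1440, 60)); // false: cleanup disabled
        // Elapsed seconds, as reported in the "seconds elapsed" log message of the diff.
        System.out.println(Duration.ofMillis(now - start).getSeconds()); // 183600
    }
}
```

Keeping both the threshold and the age as Duration values means the unit conversion happens once, in Duration.ofMinutes and Duration.ofMillis, rather than through a separate per-minute constant.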


[jira] [Commented] (CLOUDSTACK-9864) cleanup stale worker VMs after job expiry time

2017-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-9864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958542#comment-15958542
 ] 

ASF GitHub Bot commented on CLOUDSTACK-9864:


GitHub user DaanHoogland opened a pull request:

https://github.com/apache/cloudstack/pull/2030

WIP: CLOUDSTACK-9864 cleanup stale worker VMs after job expiry time



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shapeblue/cloudstack snapshot-housekeeping

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cloudstack/pull/2030.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2030


commit 40869570fc510fac0d2357f272e96cd4a4518176
Author: Daan Hoogland 
Date:   2017-03-30T14:35:37Z

CE-113 trace logging and rethrow instead of nesting CloudRuntimeException

commit 66d7d846352d52cc539b1dafb5e4d0f1620829a5
Author: Daan Hoogland 
Date:   2017-04-05T12:19:14Z

CE-113 configure workervm gc based on job expiry

commit 996f5834e6a0a9e4dc57d436ceeb5b89e6dc9974
Author: Daan Hoogland 
Date:   2017-04-05T15:35:41Z

CE-113 extra trace log of worker VMs



