[jira] [Updated] (CLOUDSTACK-7881) Allow VPN IP range to be specified when creating a VPN

2014-12-11 Thread Logan B (JIRA)

 [ 
https://issues.apache.org/jira/browse/CLOUDSTACK-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Logan B updated CLOUDSTACK-7881:

Attachment: f74b1a26db4514b9795ed760504351db8b03ef03.patch

Patch submitted to review board.

> Allow VPN IP range to be specified when creating a VPN
> --
>
> Key: CLOUDSTACK-7881
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7881
> Project: CloudStack
>  Issue Type: Improvement
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: UI
>Affects Versions: 4.4.0
> Environment: CloudStack 4.4.0 w/ KVM Hypervisor on Ubuntu 14.04 LTS
>Reporter: Logan B
>Priority: Minor
> Fix For: 4.5.0, 4.6.0
>
> Attachments: f74b1a26db4514b9795ed760504351db8b03ef03.patch
>
>
> Currently, when creating a VPN on an Isolated Network via the UI, the default 
> VPN IP range (specified in Global Settings) is used.  The API permits 
> overriding this range during VPN creation.
> I would suggest adding a text box to the VPN creation form in the UI to 
> specify an IP range that overrides the default.  While not critical, it can 
> be useful to the end user.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CLOUDSTACK-7847) API: listDomains should display the domain resources, similar to listAccounts

2014-12-01 Thread Logan B (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230039#comment-14230039
 ] 

Logan B commented on CLOUDSTACK-7847:
-

Wei,

Please make sure to post the commit ID here when it's ready.  I'll be happy to 
test it, as we will need to pull this into our 4.5 deployment so we can display 
statistics to our customers without giant loop calls.

> API: listDomains should display the domain resources, similar to listAccounts
> -
>
> Key: CLOUDSTACK-7847
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7847
> Project: CloudStack
>  Issue Type: Improvement
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: API
>Affects Versions: 4.4.0
> Environment: CloudStack 4.4.0 w/ KVM Hypervisor on Ubuntu 14.04 LTS
>Reporter: Logan B
>Assignee: Wei Zhou
> Fix For: 4.6.0
>
>
> Currently the "listDomains" call does not display any resource statistics.
> Since resources can be limited at the Domain level, it would make sense to 
> have the "listDomains" call return the resource limit & usage details the 
> same way "listAccounts" does.
> I would suggest having it return the following details for the domain:
> - Max/Used IPs
> - Max/Used Templates
> - Max/Used Snapshots
> - Max/Used VPC
> - Max/Used Networks
> - Max/Used Memory
> - Max/Used Projects
> - Max/Used vCPU Count
> - Max/Used CPU MHz (This may not actually be tracked by CloudStack)
> - Max/Used Primary Storage
> - Max/Used Secondary Storage
> - I may have missed some.
> This would make it much easier to pull statistics information for a domain, 
> instead of having to use multiple other calls.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CLOUDSTACK-7882) SSH Keypair Creation/Selection in UI

2014-11-11 Thread Logan B (JIRA)
Logan B created CLOUDSTACK-7882:
---

 Summary: SSH Keypair Creation/Selection in UI
 Key: CLOUDSTACK-7882
 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7882
 Project: CloudStack
  Issue Type: Improvement
  Security Level: Public (Anyone can view this level - this is the default.)
  Components: UI
Affects Versions: 4.4.0
 Environment: CloudStack 4.4.0 w/ KVM Hypervisor on Ubuntu 14.04 LTS
Reporter: Logan B
Priority: Minor
 Fix For: 4.5.0, 4.6.0


Currently the API allows for creating SSH keypairs, and specifying keypairs to 
use when deploying a VM (if the correct script is installed in the template).

I would suggest adding a section in the UI (perhaps as a drop down option in 
the instances menu) to create SSH keypairs.  I would then suggest adding an 
option in the Instance Wizard to select a keypair to inject into the instance 
upon creation.

It may also be worth adding a button to the instance menu to inject a new 
keypair upon reboot (like we have for password resets now).  This could be 
enabled/disabled with a template flag (e.g., "SSH Key Enabled," like the 
"Password Enabled" flag we have now).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CLOUDSTACK-7881) Allow VPN IP range to be specified when creating a VPN

2014-11-11 Thread Logan B (JIRA)
Logan B created CLOUDSTACK-7881:
---

 Summary: Allow VPN IP range to be specified when creating a VPN
 Key: CLOUDSTACK-7881
 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7881
 Project: CloudStack
  Issue Type: Improvement
  Security Level: Public (Anyone can view this level - this is the default.)
  Components: UI
Affects Versions: 4.4.0
 Environment: CloudStack 4.4.0 w/ KVM Hypervisor on Ubuntu 14.04 LTS
Reporter: Logan B
Priority: Minor
 Fix For: 4.5.0, 4.6.0


Currently, when creating a VPN on an Isolated Network via the UI, the default 
VPN IP range (specified in Global Settings) is used.  The API permits 
overriding this range during VPN creation.

I would suggest adding a text box to the VPN creation form in the UI to specify 
an IP range that overrides the default.  While not critical, it can be useful 
to the end user.
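
For reference, the override already works through the API today.  A rough 
CloudMonkey sketch (the UUID and IP range below are placeholders):

create remoteaccessvpn publicipid=<public-ip-uuid> iprange=10.1.2.10-10.1.2.50

The UI change would just need to pass the new text box's value through as the 
iprange parameter.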



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CLOUDSTACK-7848) API: updateResourceCount doesn't return all statistics

2014-11-05 Thread Logan B (JIRA)

 [ 
https://issues.apache.org/jira/browse/CLOUDSTACK-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Logan B updated CLOUDSTACK-7848:

Issue Type: Bug  (was: Improvement)

> API: updateResourceCount doesn't return all statistics
> --
>
> Key: CLOUDSTACK-7848
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7848
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: API
>Affects Versions: 4.4.0
> Environment: CloudStack 4.4.0 w/ KVM Hypervisor on Ubuntu 14.04 LTS
>Reporter: Logan B
> Fix For: 4.5.0
>
>
> Currently the "updateResourceCount" API call is not returning correct values 
> for all of the statistics.  Specifically the "Memory Used" and "Secondary 
> Storage Used" are being returned as "0" even if those resources are being 
> used.
> As a workaround right now I'm having to go through other calls to pull this 
> data down.
> I'm unsure if there are other values not being returned correctly, but I can 
> confirm that at least the "IPs Used", "Templates Used", and "Primary Storage 
> Used" values are being returned.
> I have tested this with the "domainid" field specified.  I haven't tested 
> without "domainid" since that is my use case.
> Here is a var_dump of the call with unique information removed:
> object(stdClass)#2 (1) {
>   ["updateresourcecountresponse"]=>
>   object(stdClass)#3 (2) {
> ["count"]=>
> int(12)
> ["resourcecount"]=>
> array(12) {
>   [0]=>
>   object(stdClass)#4 (4) {
> ["domainid"]=>
> string(36) "12345678-91234-56789-1234-567891234"
> ["domain"]=>
> string(7) "Example"
> ["resourcetype"]=>
> string(1) "0"
> ["resourcecount"]=>
> int(2)
>   }
>   [1]=>
>   object(stdClass)#5 (4) {
> ["domainid"]=>
> string(36) "12345678-91234-56789-1234-567891234"
> ["domain"]=>
> string(7) "Example"
> ["resourcetype"]=>
> string(1) "1"
> ["resourcecount"]=>
> int(2)
>   }
>   [2]=>
>   object(stdClass)#6 (4) {
> ["domainid"]=>
> string(36) "12345678-91234-56789-1234-567891234"
> ["domain"]=>
> string(7) "Example"
> ["resourcetype"]=>
> string(1) "2"
> ["resourcecount"]=>
> int(2)
>   }
>   [3]=>
>   object(stdClass)#7 (4) {
> ["domainid"]=>
> string(36) "12345678-91234-56789-1234-567891234"
> ["domain"]=>
> string(7) "Example"
> ["resourcetype"]=>
> string(1) "3"
> ["resourcecount"]=>
> int(2)
>   }
>   [4]=>
>   object(stdClass)#8 (4) {
> ["domainid"]=>
> string(36) "12345678-91234-56789-1234-567891234"
> ["domain"]=>
> string(7) "Example"
> ["resourcetype"]=>
> string(1) "4"
> ["resourcecount"]=>
> int(0)
>   }
>   [5]=>
>   object(stdClass)#9 (4) {
> ["domainid"]=>
> string(36) "12345678-91234-56789-1234-567891234"
> ["domain"]=>
> string(7) "Example"
> ["resourcetype"]=>
> string(1) "5"
> ["resourcecount"]=>
> int(0)
>   }
>   [6]=>
>   object(stdClass)#10 (4) {
> ["domainid"]=>
> string(36) "12345678-91234-56789-1234-567891234"
> ["domain"]=>
> string(7) "Example"
> ["resourcetype"]=>
> string(1) "6"
> ["resourcecount"]=>
> int(1)
>   }
>   [7]=>
>   object(stdClass)#11 (4) {
> ["domainid"]=>
> string(36) "12345678-91234-56789-1234-567891234"
> ["domain"]=>
> string(7) "Example"
> ["resourcetype"]=>
> string(1) "7"
> ["resourcecount"]=>
> int(0)
>   }
>   [8]=>
>   object(stdClass)#12 (4) {
> ["domainid"]=>
> string(36) "12345678-91234-56789-1234-567891234"
> ["domain"]=>
> string(7) "Example"
> ["resourcetype"]=>
> string(1) "8"
> ["resourcecount"]=>
> int(0)
>   }
>   [9]=>
>   object(stdClass)#13 (4) {
> ["domainid"]=>
> string(36) "12345678-91234-56789-1234-567891234"
> ["domain"]=>
> string(7) "Example"
> ["resourcetype"]=>
> string(1) "9"
> ["resourcecount"]=>
> int(0)
>   }
>   [10]=>
>   object(stdClass)#14 (4) {
> ["domainid"]=>
> string(36) "12345678-91234-56789-1234-567891234"
> ["domain"]=>
> string(7) "Example"
> ["resourcetype"]=>
> string(2) "10"
> ["resourcecount"]=>
> float(11811160064)
>   }
>   [11]=>
>   object(stdClass)#

[jira] [Created] (CLOUDSTACK-7848) API: updateResourceCount doesn't return all statistics

2014-11-05 Thread Logan B (JIRA)
Logan B created CLOUDSTACK-7848:
---

 Summary: API: updateResourceCount doesn't return all statistics
 Key: CLOUDSTACK-7848
 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7848
 Project: CloudStack
  Issue Type: Improvement
  Security Level: Public (Anyone can view this level - this is the default.)
  Components: API
Affects Versions: 4.4.0
 Environment: CloudStack 4.4.0 w/ KVM Hypervisor on Ubuntu 14.04 LTS
Reporter: Logan B
 Fix For: 4.5.0


Currently the "updateResourceCount" API call is not returning correct values 
for all of the statistics.  Specifically the "Memory Used" and "Secondary 
Storage Used" are being returned as "0" even if those resources are being used.

As a workaround right now I'm having to go through other calls to pull this 
data down.

I'm unsure if there are other values not being returned correctly, but I can 
confirm that at least the "IPs Used", "Templates Used", and "Primary Storage 
Used" values are being returned.

I have tested this with the "domainid" field specified.  I haven't tested 
without "domainid" since that is my use case.

Here is a var_dump of the call with unique information removed:

object(stdClass)#2 (1) {
  ["updateresourcecountresponse"]=>
  object(stdClass)#3 (2) {
["count"]=>
int(12)
["resourcecount"]=>
array(12) {
  [0]=>
  object(stdClass)#4 (4) {
["domainid"]=>
string(36) "12345678-91234-56789-1234-567891234"
["domain"]=>
string(7) "Example"
["resourcetype"]=>
string(1) "0"
["resourcecount"]=>
int(2)
  }
  [1]=>
  object(stdClass)#5 (4) {
["domainid"]=>
string(36) "12345678-91234-56789-1234-567891234"
["domain"]=>
string(7) "Example"
["resourcetype"]=>
string(1) "1"
["resourcecount"]=>
int(2)
  }
  [2]=>
  object(stdClass)#6 (4) {
["domainid"]=>
string(36) "12345678-91234-56789-1234-567891234"
["domain"]=>
string(7) "Example"
["resourcetype"]=>
string(1) "2"
["resourcecount"]=>
int(2)
  }
  [3]=>
  object(stdClass)#7 (4) {
["domainid"]=>
string(36) "12345678-91234-56789-1234-567891234"
["domain"]=>
string(7) "Example"
["resourcetype"]=>
string(1) "3"
["resourcecount"]=>
int(2)
  }
  [4]=>
  object(stdClass)#8 (4) {
["domainid"]=>
string(36) "12345678-91234-56789-1234-567891234"
["domain"]=>
string(7) "Example"
["resourcetype"]=>
string(1) "4"
["resourcecount"]=>
int(0)
  }
  [5]=>
  object(stdClass)#9 (4) {
["domainid"]=>
string(36) "12345678-91234-56789-1234-567891234"
["domain"]=>
string(7) "Example"
["resourcetype"]=>
string(1) "5"
["resourcecount"]=>
int(0)
  }
  [6]=>
  object(stdClass)#10 (4) {
["domainid"]=>
string(36) "12345678-91234-56789-1234-567891234"
["domain"]=>
string(7) "Example"
["resourcetype"]=>
string(1) "6"
["resourcecount"]=>
int(1)
  }
  [7]=>
  object(stdClass)#11 (4) {
["domainid"]=>
string(36) "12345678-91234-56789-1234-567891234"
["domain"]=>
string(7) "Example"
["resourcetype"]=>
string(1) "7"
["resourcecount"]=>
int(0)
  }
  [8]=>
  object(stdClass)#12 (4) {
["domainid"]=>
string(36) "12345678-91234-56789-1234-567891234"
["domain"]=>
string(7) "Example"
["resourcetype"]=>
string(1) "8"
["resourcecount"]=>
int(0)
  }
  [9]=>
  object(stdClass)#13 (4) {
["domainid"]=>
string(36) "12345678-91234-56789-1234-567891234"
["domain"]=>
string(7) "Example"
["resourcetype"]=>
string(1) "9"
["resourcecount"]=>
int(0)
  }
  [10]=>
  object(stdClass)#14 (4) {
["domainid"]=>
string(36) "12345678-91234-56789-1234-567891234"
["domain"]=>
string(7) "Example"
["resourcetype"]=>
string(2) "10"
["resourcecount"]=>
float(11811160064)
  }
  [11]=>
  object(stdClass)#15 (4) {
["domainid"]=>
string(36) "12345678-91234-56789-1234-567891234"
["domain"]=>
string(7) "Example"
["resourcetype"]=>
string(2) "11"
["resourcecount"]=>
int(0)
  }
}
  }
}
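
For anyone decoding the dump above: the resourcetype integers appear to follow 
the ordinal order of CloudStack's Resource.ResourceType enum.  Assuming that 
ordering holds in 4.4 (an assumption on my part, not verified against the 
source), a small Java reference:

// Index = resourcetype value from the API response.  If this mapping is
// right, types 9 (memory) and 11 (secondary storage) are exactly the two
// entries returned as 0 in the dump above, matching the reported bug.
static final String[] RESOURCE_TYPE_NAMES = {
    "user_vm",           // 0
    "public_ip",         // 1
    "volume",            // 2
    "snapshot",          // 3
    "template",          // 4
    "project",           // 5
    "network",           // 6
    "vpc",               // 7
    "cpu",               // 8
    "memory",            // 9  (returned as int(0) above)
    "primary_storage",   // 10 (float(11811160064) = 11 GiB, looks sane)
    "secondary_storage"  // 11 (returned as int(0) above)
};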




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CLOUDSTACK-7847) API: listDomains should display the domain resources, similar to listAccounts

2014-11-05 Thread Logan B (JIRA)
Logan B created CLOUDSTACK-7847:
---

 Summary: API: listDomains should display the domain resources, 
similar to listAccounts
 Key: CLOUDSTACK-7847
 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7847
 Project: CloudStack
  Issue Type: Improvement
  Security Level: Public (Anyone can view this level - this is the default.)
  Components: API
Affects Versions: 4.4.0
 Environment: CloudStack 4.4.0 w/ KVM Hypervisor on Ubuntu 14.04 LTS
Reporter: Logan B
 Fix For: 4.5.0


Currently the "listDomains" call does not display any resource statistics.

Since resources can be limited at the Domain level, it would make sense to have 
the "listDomains" call return the resource limit & usage details the same way 
"listAccounts" does.

I would suggest having it return the following details for the domain:
- Max/Used IPs
- Max/Used Templates
- Max/Used Snapshots
- Max/Used VPC
- Max/Used Networks
- Max/Used Memory
- Max/Used Projects
- Max/Used vCPU Count
- Max/Used CPU MHz (This may not actually be tracked by CloudStack)
- Max/Used Primary Storage
- Max/Used Secondary Storage
- I may have missed some.

This would make it much easier to pull statistics information for a domain, 
instead of having to use multiple other calls.
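
To make the ask concrete: listAccounts already returns paired limit/usage 
fields today, and the proposal is simply for listDomains to carry the same 
fields.  A rough CloudMonkey sketch (field names are from the 4.4 listAccounts 
response as I remember them, e.g. iplimit/iptotal, memorylimit/memorytotal, 
primarystoragelimit/primarystoragetotal; the UUID is a placeholder):

list accounts domainid=<domain-uuid> listall=true
list domains id=<domain-uuid>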



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CLOUDSTACK-7845) Strict Implicit Dedication should allow for deploying owned Virtual Routers on dedicated host

2014-11-05 Thread Logan B (JIRA)
Logan B created CLOUDSTACK-7845:
---

 Summary: Strict Implicit Dedication should allow for deploying 
owned Virtual Routers on dedicated host
 Key: CLOUDSTACK-7845
 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7845
 Project: CloudStack
  Issue Type: Improvement
  Security Level: Public (Anyone can view this level - this is the default.)
  Components: SystemVM, Virtual Router
Affects Versions: 4.4.0
 Environment: CloudStack 4.4.0 w/ KVM Hypervisor on Ubuntu 14.04 LTS
Reporter: Logan B
 Fix For: 4.5.0


Currently the best method of isolation for domains or accounts is Strict 
Implicit Dedication.  The reasoning is as follows:

Goal: Dedicate a resource (host, cluster, or pod) to an account or domain.

Problems:
- Explicit Dedication: An account or domain's VMs are all deployed on its 
dedicated resources.  However, System VMs (Virtual Routers) belonging to OTHER 
accounts can also be deployed on those same resources (host, cluster, or pod).  
This is not desirable.

- Preferred Implicit Dedication: An account or domain's VMs are deployed on its 
dedicated resources.  However, if those resources are near full utilization 
there is a chance that the account or domain's VMs will be deployed on 
resources that are not dedicated.  This is less likely, but also undesirable.

We are currently using both explicit and implicit dedication.  The explicit 
dedication ensures that the first VM deployed is provisioned on the dedicated 
resources, while the implicit dedication ensures that other accounts can't 
deploy resources on the same dedicated resources (intentionally or not).

Proposed changes:

Currently Virtual Routers are considered to be owned by the "system" account, 
even though they are each tied to a specific user account.  This probably 
doesn't need to change, but it makes a solution to the above issue easier since 
Virtual Routers are already tagged/associated with user accounts.

I would suggest changing the Strict Implicit Dedication planner, and the 
Virtual Router deployment planner as follows:

- Strict Implicit Dedication: When selecting a host for strict implicit 
dedication, Virtual Routers belonging to the account that "owns" the resource 
should be ignored.  Virtual Routers or other System VMs belonging to OTHER 
accounts should still be considered, and cause the deployment to fail.  (A 
rough sketch of this check follows this list.)

- Virtual Router deployment: Virtual Routers belonging to an account should 
prefer deployment on explicitly or implicitly dedicated resources belonging to 
that same account.  In addition, deployment should not fail if the strictly 
implicitly dedicated resource's owner and the Virtual Router's "owner" match.
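
As promised above, a rough Java sketch of the strict-dedication check.  None of 
the helper names here are actual CloudStack planner APIs; this is illustrative 
only:

// Hypothetical check inside the strict implicit dedication planner:
// the resource owner's own Virtual Routers should not count against
// strict dedication, but anything owned by another account should.
static boolean hostViolatesStrictDedication(List<VMInstanceVO> vmsOnHost,
        long ownerAccountId) {
    for (VMInstanceVO vm : vmsOnHost) {
        boolean sameOwner = (vm.getAccountId() == ownerAccountId);
        boolean isRouter = (vm.getType() == VirtualMachine.Type.DomainRouter);
        if (isRouter && sameOwner) {
            continue; // ignore the owner's own VRs
        }
        if (!sameOwner) {
            return true; // VM or system VM from another account
        }
    }
    return false;
}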


The end goal here is to provide absolute isolation for accounts or domains with 
dedicated resources.  If someone pays for a 'private cloud' with dedicated 
hardware then all of their deployed services should end up on that hardware, 
and no other account/domain should be able to utilize the dedicated resources 
of another.  This ensures that an outage or issue on a public resource doesn't 
affect the dedicated/private infrastructure, and "public" users can't consume 
private resources being paid for by someone else.

Currently the only way this is possible is by dedicating an entire zone to an 
account, but that is far from ideal, and makes management of the overall 
deployment/networking/etc. much more of a hassle.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CLOUDSTACK-7844) IP Reservation in Isolated Networks doesn't work as expected

2014-11-05 Thread Logan B (JIRA)
Logan B created CLOUDSTACK-7844:
---

 Summary: IP Reservation in Isolated Networks doesn't work as 
expected
 Key: CLOUDSTACK-7844
 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7844
 Project: CloudStack
  Issue Type: Bug
  Security Level: Public (Anyone can view this level - this is the default.)
  Components: Virtual Router
Affects Versions: 4.4.0
 Environment: CloudStack 4.4.0 w/ KVM Hypervisor on Ubuntu 14.04 LTS
Reporter: Logan B
 Fix For: 4.5.0


When using the IP Reservation functionality on an Isolated Guest Network in an 
Advanced Zone, it doesn't work as expected.

Goal: Create Isolated Network with 10.1.1.0/24 subnet.  Configure network with 
IP reservation to 10.1.1.0/25.

Test:
1. Create Isolated Guest Network with VR/DHCP/Etc. (Using the default 
'IsolatedNetworkOfferingWithSourceNAT' offering).  Use default Guest CIDR 
(10.1.1.0/24).
2. Deploy VM on network to "Implement" it.*  Make sure VM has a NIC in 
10.1.1.0/25. (ex: 10.1.1.50).
3. Edit network and set "Guest CIDR" to 10.1.1.0/25.  After saving the "Guest 
CIDR" field should display 10.1.1.0/25, and the "Network CIDR" field should be 
10.1.1.0/24.
4. NOTE: At this point everything should be working as expected.  Problems 
don't occur until the next step.
5. Restart the network you created (with "Cleanup" checked).
6. Reboot the VM you created earlier, or run dhclient on the primary interface.
7. The VM will now have a /25 (255.255.255.128) netmask, instead of the /24 it 
was initially deployed with.
8. Manually modify the VM's IP and netmask to be outside the Guest CIDR, but 
still within the network CIDR (e.g., 10.1.1.150/24), and create a default route 
via the VR IP (e.g., 10.1.1.1).

Expected Result:
- No VMs should be deployed in the reserved range.
- IPs in the reserved range (10.1.1.127 - 10.1.1.254) should be able to ping 
VMs in the Guest CIDR range (10.1.1.2 - 10.1.1.125), and vice versa.
- The virtual router should still have a 255.255.255.0 netmask, and provide 
routing/DHCP/etc for the entire subnet (10.1.1.0/24).
- New VMs created on the guest network should get an IP in the Guest CIDR range 
(10.1.1.0/25) but have the Network CIDR netmask (255.255.255.0).

Observed Result:
- No VMs are deployed in the reserved range.
- IPs in the reserved range (10.1.1.127 - 10.1.1.254) are NOT ABLE to ping VMs 
in the Guest CIDR range (10.1.1.2 - 10.1.1.125), and vice versa.
- The virtual router has a /25 (255.255.255.128) netmask, and only provides 
routing/DHCP for addresses in that subnet.
- New VMs created on the network are deployed in the Guest CIDR range 
(10.1.1.0/25) with a /25 (255.255.255.128) netmask, instead of a /24 
(255.255.255.0) netmask.

I'm assuming this is not the intended behavior.  I posted these results on the 
dev list, but didn't get much traction.

I would assume this can be resolved in one of two ways:
- Option A) Ensure that the Virtual Router always pulls its netmask/routing 
from the Network CIDR.  As I understand it, CloudStack manually creates static 
DHCP entries when provisioning VMs, so I don't think any networking changes 
should take effect on the VR when implementing IP reservation.  (If anything, 
we should just update the "dhcp-range" instead of the netmask/routing; see the 
sketch below.)
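
To illustrate Option A: the VR's DHCP is served by dnsmasq, and dnsmasq can 
already hand out a netmask wider than the lease range.  A rough sketch of what 
the VR's dnsmasq configuration could look like, using the addresses from the 
example above (the actual config file layout on the VR may differ):

# Lease only from the guest CIDR, but advertise the /24 network netmask:
dhcp-range=10.1.1.2,10.1.1.125,255.255.255.0,infinite

This way the reserved half of the /24 stays reachable without any extra routes.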

- Option B) When IP reservation is in effect the virtual router should be 
updated with a route to the reserved range (10.1.1.128/25).  That way it will 
still be reachable if we manually set a /24 netmask on hosts in the reserved 
range.  This option seems like a workaround rather than a fix, so Option A 
would be preferred.

Note that this problem ONLY comes up when the Virtual Router is cleaned up or 
re-deployed.  Because of this, it may not be caught in standard testing, but it 
can cause problems when the router is restarted for 
HA/migrations/maintenance/etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CLOUDSTACK-6938) Cannot create template from snapshot when using S3 storage

2014-06-23 Thread Logan B (JIRA)

 [ 
https://issues.apache.org/jira/browse/CLOUDSTACK-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Logan B resolved CLOUDSTACK-6938.
-

Resolution: Fixed

Fixed with 736bf540e8ef759a101d221622c64f3b3c3ed425

> Cannot create template from snapshot when using S3 storage
> --
>
> Key: CLOUDSTACK-6938
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6938
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: Snapshot
>Affects Versions: 4.4.0
> Environment: KVM + S3 Secondary Storage
>Reporter: Logan B
>Priority: Critical
> Fix For: 4.4.0
>
>
> When trying to create a template from a snapshot with S3 secondary storage, 
> the command immediately fails with a NullPointerException.
> This appears to only happen when there is a pre-existing snapshot folder in 
> the NFS staging store.  This indicates that there is something wrong with the 
> copy command (e.g., it's using 'mkdir' instead of 'mkdir -p').
> The issue can be worked around by deleting the existing snapshot folder on 
> the staging store every time you want to create a new template.  This is 
> obviously not viable for end users.
> This issue should be fixed before 4.4 ships because it should be a stupid 
> simple thing to correct, but completely breaks restoring snapshots for end 
> users.  Waiting for 4.5 would be far too long for an issue like this.
> 2014-06-18 21:13:54,789 DEBUG [cloud.agent.Agent] 
> (agentRequest-Handler-2:null) Processing command: 
> org.apache.cloudstack.storage.command.CopyCommand
> 2014-06-18 21:13:54,789 INFO  [storage.resource.NfsSecondaryStorageResource] 
> (agentRequest-Handler-2:null) Determined host 172.16.48.99 corresponds to IP 
> 172.16.48.99
> 2014-06-18 21:13:54,797 ERROR [storage.resource.NfsSecondaryStorageResource] 
> (agentRequest-Handler-2:null) Unable to create directory 
> /mnt/SecStorage/6b9bdec9-fdc9-3fdd-a5f8-0481df177ae8/snapshots/2/25 to copy 
> from S3 to cache.
> I'm guessing it's an issue with the mkdirs() function in the code, but I've 
> been unable to find it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CLOUDSTACK-6938) Cannot create template from snapshot when using S3 storage

2014-06-20 Thread Logan B (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039078#comment-14039078
 ] 

Logan B commented on CLOUDSTACK-6938:
-

I've posted a patch for this issue to the review board & mailing list.  Seems 
to be working for me, but I have no idea if the logic is actually sound.

> Cannot create template from snapshot when using S3 storage
> --
>
> Key: CLOUDSTACK-6938
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6938
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: Snapshot
>Affects Versions: 4.4.0
> Environment: KVM + S3 Secondary Storage
>Reporter: Logan B
>Priority: Critical
> Fix For: 4.4.0
>
>
> When trying to create a template from a snapshot with S3 secondary storage, 
> the command immediately fails with a NullPointerException.
> This appears to only happen when there is a pre-existing snapshot folder in 
> the NFS staging store.  This indicates that there is something wrong with the 
> copy command (e.g., it's using 'mkdir' instead of 'mkdir -p').
> The issue can be worked around by deleting the existing snapshot folder on 
> the staging store every time you want to create a new template.  This is 
> obviously not viable for end users.
> This issue should be fixed before 4.4 ships because it should be a stupid 
> simple thing to correct, but completely breaks restoring snapshots for end 
> users.  Waiting for 4.5 would be far too long for an issue like this.
> 2014-06-18 21:13:54,789 DEBUG [cloud.agent.Agent] 
> (agentRequest-Handler-2:null) Processing command: 
> org.apache.cloudstack.storage.command.CopyCommand
> 2014-06-18 21:13:54,789 INFO  [storage.resource.NfsSecondaryStorageResource] 
> (agentRequest-Handler-2:null) Determined host 172.16.48.99 corresponds to IP 
> 172.16.48.99
> 2014-06-18 21:13:54,797 ERROR [storage.resource.NfsSecondaryStorageResource] 
> (agentRequest-Handler-2:null) Unable to create directory 
> /mnt/SecStorage/6b9bdec9-fdc9-3fdd-a5f8-0481df177ae8/snapshots/2/25 to copy 
> from S3 to cache.
> I'm guessing it's an issue with the mkdirs() function in the code, but I've 
> been unable to find it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CLOUDSTACK-6938) Cannot create template from snapshot when using S3 storage

2014-06-20 Thread Logan B (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038846#comment-14038846
 ] 

Logan B commented on CLOUDSTACK-6938:
-

Understandable, though a bug that makes an existing feature unusable seems like 
it should be fixed sooner rather than later.  Since I doubt 4.5 will release 
before September, I think something this simple should be looked at.  I'm 
attempting to come up with a patch, test it, and submit it for review, but 
having never done any real development before I don't know if I can get it in 
and approved before an RC build.

> Cannot create template from snapshot when using S3 storage
> --
>
> Key: CLOUDSTACK-6938
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6938
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: Snapshot
>Affects Versions: 4.4.0
> Environment: KVM + S3 Secondary Storage
>Reporter: Logan B
>Priority: Critical
> Fix For: 4.4.0
>
>
> When trying to create a template from a snapshot with S3 secondary storage, 
> the command immediately fails with a NullPointerException.
> This appears to only happen when there is a pre-existing snapshot folder in 
> the NFS staging store.  This indicates that there is something wrong with the 
> copy command (e.g., it's using 'mkdir' instead of 'mkdir -p').
> The issue can be worked around by deleting the existing snapshot folder on 
> the staging store every time you want to create a new template.  This is 
> obviously not viable for end users.
> This issue should be fixed before 4.4 ships because it should be a stupid 
> simple thing to correct, but completely breaks restoring snapshots for end 
> users.  Waiting for 4.5 would be far too long for an issue like this.
> 2014-06-18 21:13:54,789 DEBUG [cloud.agent.Agent] 
> (agentRequest-Handler-2:null) Processing command: 
> org.apache.cloudstack.storage.command.CopyCommand
> 2014-06-18 21:13:54,789 INFO  [storage.resource.NfsSecondaryStorageResource] 
> (agentRequest-Handler-2:null) Determined host 172.16.48.99 corresponds to IP 
> 172.16.48.99
> 2014-06-18 21:13:54,797 ERROR [storage.resource.NfsSecondaryStorageResource] 
> (agentRequest-Handler-2:null) Unable to create directory 
> /mnt/SecStorage/6b9bdec9-fdc9-3fdd-a5f8-0481df177ae8/snapshots/2/25 to copy 
> from S3 to cache.
> I'm guessing it's an issue with the mkdirs() function in the code, but I've 
> been unable to find it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CLOUDSTACK-6938) Cannot create template from snapshot when using S3 storage

2014-06-18 Thread Logan B (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036468#comment-14036468
 ] 

Logan B commented on CLOUDSTACK-6938:
-

This is the relevant bit of code:

In NfsSecondaryStorageResource.java:

if (!downloadDirectory.mkdirs()) {
final String errMsg = "Unable to create directory " + 
downloadPath + " to copy from S3 to cache.";
s_logger.error(errMsg);
return new CopyCmdAnswer(errMsg);
} else {
s_logger.debug("Directory " + downloadPath + " already exists");
}

I believe mkdirs() returns false if the directory already exists.  So this 
failure logic is prone to breaking.

Better logic might be:

if (downloadDirectory.exists()) {
    s_logger.debug("Directory " + downloadPath + " already exists");
} else {
    if (!downloadDirectory.mkdirs()) {
        // Only fail when the directory is genuinely missing and
        // could not be created.
        final String errMsg = "Unable to create directory " + downloadPath +
            " to copy from S3 to cache.";
        s_logger.error(errMsg);
        return new CopyCmdAnswer(errMsg);
    }
}

I'm not a programmer, but it seems that checking for the existing path before 
blindly failing would be better here.  If this code checks out I'll try to 
figure out how to offer a commit for cherry picking.
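
For completeness, java.nio offers a variant that sidesteps the mkdirs() 
ambiguity entirely.  This is only a sketch of an alternative approach, not a 
tested fix for this code path:

import java.io.IOException;
import java.nio.file.Files;

// Files.createDirectories() is a no-op if the directory already exists;
// it only throws if the path exists as a non-directory or genuinely
// cannot be created.
try {
    Files.createDirectories(downloadDirectory.toPath());
} catch (IOException e) {
    final String errMsg = "Unable to create directory " + downloadPath +
        " to copy from S3 to cache.";
    s_logger.error(errMsg, e);
    return new CopyCmdAnswer(errMsg);
}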

> Cannot create template from snapshot when using S3 storage
> --
>
> Key: CLOUDSTACK-6938
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6938
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: Snapshot
>Affects Versions: 4.4.0
> Environment: KVM + S3 Secondary Storage
>Reporter: Logan B
>Priority: Blocker
> Fix For: 4.4.0
>
>
> When trying to create a template from a snapshot with S3 secondary storage, 
> the command immediately fails with a NullPointerException.
> This appears to only happen when there is a pre-existing snapshot folder in 
> the NFS staging store.  This indicates that there is something wrong with the 
> copy command (e.g., it's using 'mkdir' instead of 'mkdir -p').
> The issue can be worked around by deleting the existing snapshot folder on 
> the staging store every time you want to create a new template.  This is 
> obviously not viable for end users.
> This issue should be fixed before 4.4 ships because it should be a stupid 
> simple thing to correct, but completely breaks restoring snapshots for end 
> users.  Waiting for 4.5 would be far too long for an issue like this.
> 2014-06-18 21:13:54,789 DEBUG [cloud.agent.Agent] 
> (agentRequest-Handler-2:null) Processing command: 
> org.apache.cloudstack.storage.command.CopyCommand
> 2014-06-18 21:13:54,789 INFO  [storage.resource.NfsSecondaryStorageResource] 
> (agentRequest-Handler-2:null) Determined host 172.16.48.99 corresponds to IP 
> 172.16.48.99
> 2014-06-18 21:13:54,797 ERROR [storage.resource.NfsSecondaryStorageResource] 
> (agentRequest-Handler-2:null) Unable to create directory 
> /mnt/SecStorage/6b9bdec9-fdc9-3fdd-a5f8-0481df177ae8/snapshots/2/25 to copy 
> from S3 to cache.
> I'm guessing it's an issue with the mkdirs() function in the code, but I've 
> been unable to find it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (CLOUDSTACK-6938) Cannot create template from snapshot when using S3 storage

2014-06-18 Thread Logan B (JIRA)

 [ 
https://issues.apache.org/jira/browse/CLOUDSTACK-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Logan B updated CLOUDSTACK-6938:


Fix Version/s: 4.4.0

> Cannot create template from snapshot when using S3 storage
> --
>
> Key: CLOUDSTACK-6938
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6938
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: Snapshot
>Affects Versions: 4.4.0
> Environment: KVM + S3 Secondary Storage
>Reporter: Logan B
>Priority: Blocker
> Fix For: 4.4.0
>
>
> When trying to create a template from a snapshot with S3 secondary storage, 
> the command immediately fails with a NullPointerException.
> This appears to only happen when there is a pre-existing snapshot folder in 
> the NFS staging store.  This indicates that there is something wrong with the 
> copy command (e.g., it's using 'mkdir' instead of 'mkdir -p').
> The issue can be worked around by deleting the existing snapshot folder on 
> the staging store every time you want to create a new template.  This is 
> obviously not viable for end users.
> This issue should be fixed before 4.4 ships because it should be a stupid 
> simple thing to correct, but completely breaks restoring snapshots for end 
> users.  Waiting for 4.5 would be far too long for an issue like this.
> 2014-06-18 21:13:54,789 DEBUG [cloud.agent.Agent] 
> (agentRequest-Handler-2:null) Processing command: 
> org.apache.cloudstack.storage.command.CopyCommand
> 2014-06-18 21:13:54,789 INFO  [storage.resource.NfsSecondaryStorageResource] 
> (agentRequest-Handler-2:null) Determined host 172.16.48.99 corresponds to IP 
> 172.16.48.99
> 2014-06-18 21:13:54,797 ERROR [storage.resource.NfsSecondaryStorageResource] 
> (agentRequest-Handler-2:null) Unable to create directory 
> /mnt/SecStorage/6b9bdec9-fdc9-3fdd-a5f8-0481df177ae8/snapshots/2/25 to copy 
> from S3 to cache.
> I'm guessing it's an issue with the mkdirs() function in the code, but I've 
> been unable to find it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CLOUDSTACK-6938) Cannot create template from snapshot when using S3 storage

2014-06-18 Thread Logan B (JIRA)
Logan B created CLOUDSTACK-6938:
---

 Summary: Cannot create template from snapshot when using S3 storage
 Key: CLOUDSTACK-6938
 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6938
 Project: CloudStack
  Issue Type: Bug
  Security Level: Public (Anyone can view this level - this is the default.)
  Components: Snapshot
Affects Versions: 4.4.0
 Environment: KVM + S3 Secondary Storage
Reporter: Logan B
Priority: Blocker


When trying to create a template from a snapshot with S3 secondary storage, the 
command immediately fails with a NullPointerException.

This appears to only happen when there is a pre-existing snapshot folder in the 
NFS staging store.  This indicates that there is something wrong with the copy 
command (e.g., it's using 'mkdir' instead of 'mkdir -p').

The issue can be worked around by deleting the existing snapshot folder on the 
staging store every time you want to create a new template.  This is obviously 
not viable for end users.

This issue should be fixed before 4.4 ships because it should be a stupid 
simple thing to correct, but completely breaks restoring snapshots for end 
users.  Waiting for 4.5 would be far too long for an issue like this.

2014-06-18 21:13:54,789 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) 
Processing command: org.apache.cloudstack.storage.command.CopyCommand
2014-06-18 21:13:54,789 INFO  [storage.resource.NfsSecondaryStorageResource] 
(agentRequest-Handler-2:null) Determined host 172.16.48.99 corresponds to IP 
172.16.48.99
2014-06-18 21:13:54,797 ERROR [storage.resource.NfsSecondaryStorageResource] 
(agentRequest-Handler-2:null) Unable to create directory 
/mnt/SecStorage/6b9bdec9-fdc9-3fdd-a5f8-0481df177ae8/snapshots/2/25 to copy 
from S3 to cache.

I'm guessing it's an issue with the mkdirs() function in the code, but I've 
been unable to find it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (CLOUDSTACK-6473) Debian 7 Virtual Router ip_conntrack_max not set at boot

2014-04-22 Thread Logan B (JIRA)
Logan B created CLOUDSTACK-6473:
---

 Summary: Debian 7 Virtual Router ip_conntrack_max not set at boot
 Key: CLOUDSTACK-6473
 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6473
 Project: CloudStack
  Issue Type: Bug
  Security Level: Public (Anyone can view this level - this is the default.)
  Components: Virtual Router
Affects Versions: 4.3.0
 Environment: XenServer 6.2
CloudStack 4.3.0
Debian 7 SystemVM/Virtual Router
Reporter: Logan B
 Fix For: 4.3.1


The Problem:
The Debian 7 Virtual Router VMs for XenServer experience intermittent 
connectivity problems.  This affects all VMs behind the virtual router in 
various ways: SSH failures, Apache connection failures, etc.

This issue also affects various functions within CloudStack that attempt to 
connect to the Virtual Router (updating firewall rules, NAT, etc.).

The Cause:
It appears that the issue is being caused by a low default limit for the 
net.ipv4.netfilter.ip_conntrack_max sysctl.  The issue can be easily diagnosed 
in /var/log/messages:
Apr 22 15:45:34 r-5602-VM kernel: [ 1085.117498] nf_conntrack: table full, 
dropping packet.
Apr 22 15:45:34 r-5602-VM kernel: [ 1085.133095] nf_conntrack: table full, 
dropping packet.
Apr 22 15:45:34 r-5602-VM kernel: [ 1085.145440] nf_conntrack: table full, 
dropping packet.

The default setting for ip_conntrack_max is '3796': 
# sysctl net.ipv4.netfilter.ip_conntrack_max
net.ipv4.netfilter.ip_conntrack_max = 3796

As per /etc/sysctl.conf this setting should be '100':
net.ipv4.netfilter.ip_conntrack_max=100

It would appear that this setting is not being correctly applied when the 
virtual router boots.

The Solution:
- A temporary workaround is to manually set the ip_conntrack_max sysctl to the 
correct value:
# sysctl -w net.ipv4.netfilter.ip_conntrack_max=100

It's likely that this sysctl is being run at boot before the conntrack module 
is loaded, so it doesn't take effect.  There are various solutions suggested 
around the web, any of which should work fine; one simple variant is sketched 
below.
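
For example, assuming the template keeps the value in /etc/sysctl.conf, the 
settings can simply be re-applied late in boot, once the netfilter modules are 
guaranteed to be loaded (e.g., from /etc/rc.local):

# Re-apply sysctl.conf after netfilter modules are loaded:
sysctl -p /etc/sysctl.conf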

To resolve this problem a new System VM template should be created.  I'm 
assuming this can be done between CloudStack releases.  I know there is 
supposed to be a new template released to fix the Heartbleed vulnerability, so 
this would be a good fix to include with that updated template.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CLOUDSTACK-3535) No HA actions are performed when a KVM host goes offline

2013-08-07 Thread Logan B (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732577#comment-13732577
 ] 

Logan B commented on CLOUDSTACK-3535:
-

Does the submitted fix address the same issue in XenServer?  If not, then I 
don't think this can be flagged as "fixed."

> No HA actions are performed when a KVM host goes offline
> 
>
> Key: CLOUDSTACK-3535
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3535
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: Hypervisor Controller, KVM, Management Server
>Affects Versions: 4.1.0, 4.1.1, 4.2.0
> Environment: KVM (CentOS 6.3) with CloudStack 4.1
>Reporter: Paul Angus
>Assignee: edison su
>Priority: Blocker
> Fix For: 4.2.0
>
> Attachments: management-server.log.Agent
>
>
> If a KVM host 'goes down', CloudStack does not perform HA for instances which 
> are marked as HA enabled on that host (including system VMs)
> CloudStack does not show the host as disconnected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CLOUDSTACK-3421) When hypervisor is down, no HA occurs with log output "Agent state cannot be determined, do nothing"

2013-07-23 Thread Logan B (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716611#comment-13716611
 ] 

Logan B commented on CLOUDSTACK-3421:
-

This issue is related to: CLOUDSTACK-3535 and affects XenServer/XCP as well.

The patch to address the split-brain issue isn't a fix as much as a workaround.  
Further steps need to be taken to test whether the loss of communication is in 
the link between the management server and the network, or between the host and 
the network.

If the management server can communicate with all but one host in the cluster, 
it shouldn't just "do nothing."  At the very least it needs to alert an 
administrator that there's a potential problem.  As has been mentioned, right 
now a host can go down with no indication that it has happened until customers 
start reporting outages.

> When hypervisor is down, no HA occurs with log output "Agent state cannot be 
> determined, do nothing"
> 
>
> Key: CLOUDSTACK-3421
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3421
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: KVM, Management Server
>Affects Versions: 4.1.0
> Environment: CentOS 6.4 minimal install
> Libvirt, KVM/Qemu
> CloudStack 4.1
> GlusterFS 3.2, replicated+distributed as primary storage via Shared Mount 
> Point
> 3 physical servers
> * 1 management server, running NFS secondary storage
> ** 1 nic, management+storage
> * 2 hypervisor nodes, running glusterfs-server 
> ** 4x nic, management+storage, public, guest, gluster peering
> * Advanced zone
> * KVM
> * 4 networks: 
>  eth0: cloudbr0: management+secondary storage, 
>  eth2: cloudbr1: public
>  eth3: cloudbr2: guest
>  eth1: gluster peering
> * Shared Mount Point
> * custom network offering with redundant routers enabled
> * global settings tweaked to increase speed of identifying down state
> ** ping.interval: 10sec
>Reporter: Gerard Lynch
>Priority: Critical
> Fix For: 4.1.1, 4.2.0, Future
>
> Attachments: catalina_management-server.zip
>
>
> We wanted to test CloudStack's HA capabilities by simulating outages to find 
> out how long it would take to recover.  One of the tests was simulating loss 
> of a hypervisor node by shutting it down.   When we tested this, we found 
> that CloudStack failed to bring up any of the VMs (System or Instance), which 
> were on the down node, until the node was powered back up and reconnected.
> In the logs, we see repeating occurrences of:
> INFO  [utils.exception.CSExceptionErrorCode] (AgentTaskPool-11:) Could not 
> find exception: com.cloud.exception.OperationTimedoutException in error code 
> list for exceptions
> INFO  [utils.exception.CSExceptionErrorCode] (AgentTaskPool-10:) Could not 
> find exception: com.cloud.exception.OperationTimedoutException in error code 
> list for exceptions
> WARN  [agent.manager.AgentAttache] (AgentTaskPool-11:) Seq 14-660013135: 
> Timed out on Seq 14-660013135:  { Cmd , MgmtId: 93515041483, via: 14, Ver: 
> v1, Flags: 100011, [{"CheckHealthCommand":{"wait":50}}] }
> WARN  [agent.manager.AgentAttache] (AgentTaskPool-10:) Seq 15-1097531400: 
> Timed out on Seq 15-1097531400:  { Cmd , MgmtId: 93515041483, via: 15, Ver: 
> v1, Flags: 100011, [{"CheckHealthCommand":{"wait":50}}] }
> WARN  [agent.manager.AgentManagerImpl] (AgentTaskPool-11:) Operation timed 
> out: Commands 660013135 to Host 14 timed out after 100
> WARN  [agent.manager.AgentManagerImpl] (AgentTaskPool-10:) Operation timed 
> out: Commands 1097531400 to Host 15 timed out after 100
> WARN  [agent.manager.AgentManagerImpl] (AgentTaskPool-11:) Agent state cannot 
> be determined, do nothing
> WARN  [agent.manager.AgentManagerImpl] (AgentTaskPool-10:) Agent state cannot 
> be determined, do nothing
> To reproduce: 
> 1. Build the environment as detailed above
> 2. Register an ISO
> 3. Create a new guest network using the custom network offering (that offers 
> redundant routers)
> 3. Provision an instance
> 4. Ensure the system VMs and instance are on the first hypervisor node
> 5. Shutdown the first hypervisor node (or pull the plug)
> Expected result:
>   All system VMs and instance(s) should be brought up on the 2nd hypervisor 
> node.
> Actual result:
>   We see the first hypervisor node marked "disconnected."
>   All System VMs and the Instance are still marked "Running", however ping to 
> any of them fails. 
>   Ping to the redundant router on the 2nd hypervisor node is still working.
>   We see in the logs 
>   "INFO  [utils.exception.CSExceptionErrorCode] (AgentTaskPool-11:) Could not 
> find exception: com.cloud.exception.OperationTimedoutException in erro

[jira] [Commented] (CLOUDSTACK-3535) No HA actions are performed when a KVM host goes offline

2013-07-16 Thread Logan B (JIRA)

[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710029#comment-13710029
 ] 

Logan B commented on CLOUDSTACK-3535:
-

Please note that this bug does not only affect KVM.  We have experienced the 
same issue with XCP 1.6/XenServer hosts.

The problem stems from a previous fix to prevent a potential split-brain issue 
when the management server loses connectivity to the cluster.  The AgentImpl 
function used to mark the host as down when it couldn't be reached; now it just 
marks it as "unable to determine state" and does nothing.  This does fix the 
split-brain issue, but if the host actually goes down then HA will never take 
over.

I realize this is a tricky fix, and my programming knowledge is minimal, but I 
do have a suggestion.  The only time the management server should run into an 
actual split-brain issue is if it loses connectivity to the clusters.  Could 
the following logic be implemented?

If the management server cannot ping the host:
    Try to ping the management gateway.
    If the management server CAN ping the gateway:
        Try to ping the other hosts in the cluster.
        If the other hosts CAN be pinged (and the gateway can be pinged):
            Start HA and send a host-down report/alert.
        Else (the other hosts CANNOT be pinged, but the gateway can be):
            Send a cluster connectivity alert, and do nothing with HA.
    Else (the management server CANNOT ping the gateway):
        Attempt to send a management connectivity alert, and do nothing with 
        HA.
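
In case it helps, a rough Java sketch of that decision tree.  This is 
illustrative only, not actual CloudStack agent-manager code; the probe and the 
alerting/HA hooks are stand-ins:

import java.net.InetAddress;
import java.util.List;

public class HostDownCheck {
    enum Decision { START_HA, CLUSTER_ALERT, MGMT_ALERT }

    // Basic reachability probe; real code would likely reuse the
    // management server's existing ping/investigator machinery.
    static boolean ping(String ip) {
        try {
            return InetAddress.getByName(ip).isReachable(2000);
        } catch (Exception e) {
            return false;
        }
    }

    static Decision decideOnUnreachableHost(String gatewayIp,
            List<String> otherHostIps) {
        if (!ping(gatewayIp)) {
            // The management server may be the isolated party:
            // do nothing with HA, just attempt to alert.
            return Decision.MGMT_ALERT;
        }
        for (String peer : otherHostIps) {
            if (ping(peer)) {
                // Gateway and at least one peer respond while the host
                // does not: treat the host as down and start HA.
                return Decision.START_HA;
            }
        }
        // Gateway responds but no cluster peers do: likely a cluster
        // connectivity problem, so alert instead of starting HA.
        return Decision.CLUSTER_ALERT;
    }
}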

The only time I could see this causing an issue is if the networking for Host A 
goes down, HA migrates VMs to Host B, and then Host A's networking comes back 
up with running VMs.  I don't see this being a very likely scenario though.

A short term solution would be to at least trigger some sort of alert/e-mail 
when the host status cannot be determined.  That way manual intervention can be 
started much more quickly.  Right now a host can be offline indefinitely 
without any notice.

> No HA actions are performed when a KVM host goes offline
> 
>
> Key: CLOUDSTACK-3535
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3535
> Project: CloudStack
>  Issue Type: Bug
>  Security Level: Public(Anyone can view this level - this is the 
> default.) 
>  Components: Hypervisor Controller, KVM, Management Server
>Affects Versions: 4.1.0, Future
> Environment: KVM (CentOS 6.3) with CloudStack 4.1
>Reporter: Paul Angus
>
> If a KVM host 'goes down', CloudStack does not perform HA for instances which 
> are marked as HA enabled on that host (including system VMs)
> CloudStack does not show the host as disconnected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira