[jira] [Updated] (CLOUDSTACK-7881) Allow VPN IP range to be specified when creating a VPN
[ https://issues.apache.org/jira/browse/CLOUDSTACK-7881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Logan B updated CLOUDSTACK-7881:
--------------------------------
    Attachment: f74b1a26db4514b9795ed760504351db8b03ef03.patch

Patch submitted to the review board.

> Allow VPN IP range to be specified when creating a VPN
> -------------------------------------------------------
>
>                 Key: CLOUDSTACK-7881
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7881
>         Attachments: f74b1a26db4514b9795ed760504351db8b03ef03.patch

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
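For reference, the API-side override the issue mentions can be exercised directly: createRemoteAccessVpn accepts an iprange parameter. A minimal Python sketch of a signed request, with the UUID and keys as placeholders (the signing scheme follows the CloudStack API documentation: sort the parameters, URL-encode values, lowercase the query string, HMAC-SHA1 with the secret key):

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def sign(params, secret_key):
    # Sort parameters, URL-encode the values, lowercase the whole query
    # string, then HMAC-SHA1 it with the account's secret key (base64 result).
    query = "&".join(f"{k}={quote(str(v), safe='')}" for k, v in sorted(params.items()))
    digest = hmac.new(secret_key.encode(), query.lower().encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

params = {
    "command": "createRemoteAccessVpn",
    "publicipid": "UUID-OF-THE-SOURCE-NAT-IP",   # placeholder
    "iprange": "10.1.2.100-10.1.2.150",          # overrides the Global Settings default
    "apikey": "YOUR-API-KEY",                    # placeholder
    "response": "json",
}
params["signature"] = sign(params, "YOUR-SECRET-KEY")
# Send these parameters as a GET to http://<management-server>:8080/client/api
```

The UI change requested here would simply populate `iprange` from the proposed text box instead of omitting it.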
[jira] [Commented] (CLOUDSTACK-7847) API: listDomains should display the domain resources, similar to listAccounts
[ https://issues.apache.org/jira/browse/CLOUDSTACK-7847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230039#comment-14230039 ]

Logan B commented on CLOUDSTACK-7847:
-------------------------------------
Wei,

Please make sure to post the commit ID here when it's ready. I'll be happy to test it, as we need to pull this into our 4.5 deployment so we can display statistics to our customers without making large numbers of looped API calls.

> API: listDomains should display the domain resources, similar to listAccounts
> ------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-7847
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7847
>            Assignee: Wei Zhou
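The "looped calls" workaround amounts to fetching listAccounts for the domain and summing the per-account counters client-side. A sketch of that rollup, assuming the input is the already-decoded account list and using usage field names as they appear in listAccounts responses (e.g. vmtotal, iptotal):

```python
def domain_totals(accounts, fields=("vmtotal", "iptotal", "memorytotal", "primarystoragetotal")):
    """Sum per-account usage counters into one per-domain view.

    `accounts` is the decoded listAccounts response list for a single domain;
    `fields` are usage counters assumed to be present on each account record.
    """
    totals = dict.fromkeys(fields, 0)
    for acct in accounts:
        for f in fields:
            totals[f] += acct.get(f, 0)   # missing fields count as zero
    return totals
```

Having listDomains return these figures directly would replace this N-calls-per-domain pattern with a single request.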
[jira] [Created] (CLOUDSTACK-7882) SSH Keypair Creation/Selection in UI
Logan B created CLOUDSTACK-7882:
-----------------------------------

             Summary: SSH Keypair Creation/Selection in UI
                 Key: CLOUDSTACK-7882
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7882
             Project: CloudStack
          Issue Type: Improvement
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: UI
    Affects Versions: 4.4.0
         Environment: CloudStack 4.4.0 w/ KVM Hypervisor on Ubuntu 14.04 LTS
            Reporter: Logan B
            Priority: Minor
             Fix For: 4.5.0, 4.6.0

Currently the API allows for creating SSH keypairs, and for specifying keypairs to use when deploying a VM (if the correct script is installed in the template).

I would suggest adding a section in the UI (perhaps as a drop-down option in the Instances menu) to create SSH keypairs. I would then suggest adding an option in the Instance Wizard to select a keypair to inject into the instance upon creation.

It may also be worth adding a button to the instance menu to inject a new keypair upon reboot (like we have for password resets now). This could be enabled/disabled with a template flag (e.g., "SSH Key Enabled", like the "Password Enabled" flag we have now).
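The two API calls the UI would wrap already exist: createSSHKeyPair (which returns the private key once) and deployVirtualMachine with its keypair parameter. A sketch of the query strings involved; the keypair name and UUIDs are placeholders, and real requests additionally need an apikey and signature:

```python
from urllib.parse import urlencode

def api_query(command, **kwargs):
    # Build the (unsigned) query string for a CloudStack API call.
    params = {"command": command, "response": "json", **kwargs}
    return urlencode(sorted(params.items()))

# Create a keypair; the response contains the private key exactly once.
create_q = api_query("createSSHKeyPair", name="deploy-key")

# Deploy an instance with that keypair injected (UUIDs are placeholders).
deploy_q = api_query("deployVirtualMachine",
                     serviceofferingid="OFFERING-UUID",
                     templateid="TEMPLATE-UUID",
                     zoneid="ZONE-UUID",
                     keypair="deploy-key")
```

The proposed Instance Wizard option would just populate `keypair` the way the password-enabled flow already populates its fields.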
[jira] [Created] (CLOUDSTACK-7881) Allow VPN IP range to be specified when creating a VPN
Logan B created CLOUDSTACK-7881:
-----------------------------------

             Summary: Allow VPN IP range to be specified when creating a VPN
                 Key: CLOUDSTACK-7881
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7881
             Project: CloudStack
          Issue Type: Improvement
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: UI
    Affects Versions: 4.4.0
         Environment: CloudStack 4.4.0 w/ KVM Hypervisor on Ubuntu 14.04 LTS
            Reporter: Logan B
            Priority: Minor
             Fix For: 4.5.0, 4.6.0

Currently, when creating a VPN on an Isolated Network via the UI, the default VPN IP range (specified in Global Settings) is used. The API permits overriding this range during VPN creation.

I would suggest adding a text box to the VPN creation form in the UI to specify an IP range that overrides the default. While not critical, it can be useful to the end user.
[jira] [Updated] (CLOUDSTACK-7848) API: updateResourceCount doesn't return all statistics
[ https://issues.apache.org/jira/browse/CLOUDSTACK-7848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Logan B updated CLOUDSTACK-7848:
--------------------------------
    Issue Type: Bug  (was: Improvement)

> API: updateResourceCount doesn't return all statistics
> -------------------------------------------------------
>
>                 Key: CLOUDSTACK-7848
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7848
[jira] [Created] (CLOUDSTACK-7848) API: updateResourceCount doesn't return all statistics
Logan B created CLOUDSTACK-7848:
-----------------------------------

             Summary: API: updateResourceCount doesn't return all statistics
                 Key: CLOUDSTACK-7848
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7848
             Project: CloudStack
          Issue Type: Improvement
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: API
    Affects Versions: 4.4.0
         Environment: CloudStack 4.4.0 w/ KVM Hypervisor on Ubuntu 14.04 LTS
            Reporter: Logan B
             Fix For: 4.5.0

Currently the "updateResourceCount" API call is not returning correct values for all of the statistics. Specifically, "Memory Used" and "Secondary Storage Used" are returned as "0" even when those resources are in use.

As a workaround right now I'm having to use other calls to pull this data down.

I'm unsure whether other values are being returned incorrectly, but I can confirm that at least the "IPs Used", "Templates Used", and "Primary Storage Used" values are being returned.

I have tested this with the "domainid" field specified. I haven't tested without "domainid", since that is my use case.
Here is a var_dump of the call with unique information removed:

object(stdClass)#2 (1) {
  ["updateresourcecountresponse"]=>
  object(stdClass)#3 (2) {
    ["count"]=> int(12)
    ["resourcecount"]=> array(12) {
      [0]=>  object(stdClass)#4 (4)  { ["domainid"]=> string(36) "12345678-91234-56789-1234-567891234" ["domain"]=> string(7) "Example" ["resourcetype"]=> string(1) "0"  ["resourcecount"]=> int(2) }
      [1]=>  object(stdClass)#5 (4)  { ["domainid"]=> string(36) "12345678-91234-56789-1234-567891234" ["domain"]=> string(7) "Example" ["resourcetype"]=> string(1) "1"  ["resourcecount"]=> int(2) }
      [2]=>  object(stdClass)#6 (4)  { ["domainid"]=> string(36) "12345678-91234-56789-1234-567891234" ["domain"]=> string(7) "Example" ["resourcetype"]=> string(1) "2"  ["resourcecount"]=> int(2) }
      [3]=>  object(stdClass)#7 (4)  { ["domainid"]=> string(36) "12345678-91234-56789-1234-567891234" ["domain"]=> string(7) "Example" ["resourcetype"]=> string(1) "3"  ["resourcecount"]=> int(2) }
      [4]=>  object(stdClass)#8 (4)  { ["domainid"]=> string(36) "12345678-91234-56789-1234-567891234" ["domain"]=> string(7) "Example" ["resourcetype"]=> string(1) "4"  ["resourcecount"]=> int(0) }
      [5]=>  object(stdClass)#9 (4)  { ["domainid"]=> string(36) "12345678-91234-56789-1234-567891234" ["domain"]=> string(7) "Example" ["resourcetype"]=> string(1) "5"  ["resourcecount"]=> int(0) }
      [6]=>  object(stdClass)#10 (4) { ["domainid"]=> string(36) "12345678-91234-56789-1234-567891234" ["domain"]=> string(7) "Example" ["resourcetype"]=> string(1) "6"  ["resourcecount"]=> int(1) }
      [7]=>  object(stdClass)#11 (4) { ["domainid"]=> string(36) "12345678-91234-56789-1234-567891234" ["domain"]=> string(7) "Example" ["resourcetype"]=> string(1) "7"  ["resourcecount"]=> int(0) }
      [8]=>  object(stdClass)#12 (4) { ["domainid"]=> string(36) "12345678-91234-56789-1234-567891234" ["domain"]=> string(7) "Example" ["resourcetype"]=> string(1) "8"  ["resourcecount"]=> int(0) }
      [9]=>  object(stdClass)#13 (4) { ["domainid"]=> string(36) "12345678-91234-56789-1234-567891234" ["domain"]=> string(7) "Example" ["resourcetype"]=> string(1) "9"  ["resourcecount"]=> int(0) }
      [10]=> object(stdClass)#14 (4) { ["domainid"]=> string(36) "12345678-91234-56789-1234-567891234" ["domain"]=> string(7) "Example" ["resourcetype"]=> string(2) "10" ["resourcecount"]=> float(11811160064) }
      [11]=> object(stdClass)#15 (4) { ["domainid"]=> string(36) "12345678-91234-56789-1234-567891234" ["domain"]=> string(7) "Example" ["resourcetype"]=> string(2) "11" ["resourcecount"]=> int(0) }
    }
  }
}
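To read the dump, the numeric resourcetype codes map to resource kinds. The mapping below is an assumption based on the ResourceType ordering in the CloudStack source and should be double-checked against the deployed version; with it, types 9 (memory) and 11 (secondary storage) are among the zeroed entries, consistent with the report:

```python
# Assumed resourcetype code mapping (verify against the
# com.cloud.configuration.Resource.ResourceType enum for your version).
RESOURCE_TYPES = {
    0: "user_vm", 1: "public_ip", 2: "volume", 3: "snapshot",
    4: "template", 5: "project", 6: "network", 7: "vpc",
    8: "cpu", 9: "memory", 10: "primary_storage", 11: "secondary_storage",
}

# Counts extracted from the var_dump above, keyed by resourcetype code.
dump_counts = {0: 2, 1: 2, 2: 2, 3: 2, 4: 0, 5: 0, 6: 1,
               7: 0, 8: 0, 9: 0, 10: 11811160064, 11: 0}

# Names of every resource the response reports as zero.
zeroed = sorted(RESOURCE_TYPES[t] for t, n in dump_counts.items() if n == 0)
```

Note that some zeroes (templates, projects) may be genuine; the bug report only claims memory and secondary storage are wrong.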
[jira] [Created] (CLOUDSTACK-7847) API: listDomains should display the domain resources, similar to listAccounts
Logan B created CLOUDSTACK-7847:
-----------------------------------

             Summary: API: listDomains should display the domain resources, similar to listAccounts
                 Key: CLOUDSTACK-7847
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7847
             Project: CloudStack
          Issue Type: Improvement
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: API
    Affects Versions: 4.4.0
         Environment: CloudStack 4.4.0 w/ KVM Hypervisor on Ubuntu 14.04 LTS
            Reporter: Logan B
             Fix For: 4.5.0

Currently the "listDomains" call does not display any resource statistics. Since resources can be limited at the domain level, it would make sense for the "listDomains" call to return the resource limit & usage details the same way "listAccounts" does.

I would suggest having it return the following details for the domain:
- Max/Used IPs
- Max/Used Templates
- Max/Used Snapshots
- Max/Used VPCs
- Max/Used Networks
- Max/Used Memory
- Max/Used Projects
- Max/Used vCPU Count
- Max/Used CPU MHz (this may not actually be tracked by CloudStack)
- Max/Used Primary Storage
- Max/Used Secondary Storage
(I may have missed some.)

This would make it much easier to pull statistics for a domain, instead of having to use multiple other calls.
[jira] [Created] (CLOUDSTACK-7845) Strict Implicit Dedication should allow for deploying owned Virtual Routers on dedicated host
Logan B created CLOUDSTACK-7845:
-----------------------------------

             Summary: Strict Implicit Dedication should allow for deploying owned Virtual Routers on dedicated host
                 Key: CLOUDSTACK-7845
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7845
             Project: CloudStack
          Issue Type: Improvement
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: SystemVM, Virtual Router
    Affects Versions: 4.4.0
         Environment: CloudStack 4.4.0 w/ KVM Hypervisor on Ubuntu 14.04 LTS
            Reporter: Logan B
             Fix For: 4.5.0

Currently the best method of isolation for domains or accounts is Strict Implicit Dedication. The reasoning is as follows:

Goal: Dedicate a resource (host, cluster, or pod) to an account or domain.

Problems:
- Explicit Dedication: The account's or domain's VMs are all deployed on its dedicated resources. However, System VMs (Virtual Routers) belonging to OTHER accounts can also be deployed on those same resources (host, cluster, or pod). This is not desirable.
- Preferred Implicit Dedication: The account's or domain's VMs are deployed on its dedicated resources. However, if those resources are near full utilization, there is a chance that the account's or domain's VMs will be deployed on resources that are not dedicated. This is less likely, but also undesirable.

We are currently using both explicit and implicit dedication. The explicit dedication ensures that the first VM deployed is provisioned on the dedicated resources, while the implicit dedication ensures that other accounts can't deploy resources on the same dedicated resources (intentionally or not).

Proposed changes:

Currently Virtual Routers are considered to be owned by the "system" account, even though each is tied to a specific user account. This probably doesn't need to change; in fact it makes a solution to the above issue easier, since Virtual Routers are already tagged/associated with user accounts.

I would suggest changing the Strict Implicit Dedication planner and the Virtual Router deployment planner as follows:
- Strict Implicit Dedication: When selecting a host for strict implicit dedication, Virtual Routers belonging to the account that "owns" the resource should be ignored. Virtual Routers or other System VMs belonging to OTHER accounts should still be considered, and should cause the deployment to fail.
- Virtual Router deployment: Virtual Routers belonging to an account should prefer deployment on explicitly or implicitly dedicated resources belonging to that same account. In addition, deployment should not fail if the Strict Implicitly Dedicated resource owner and the Virtual Router "owner" match.

The end goal here is to provide absolute isolation for accounts or domains with dedicated resources. If someone pays for a 'private cloud' with dedicated hardware, then all of their deployed services should end up on that hardware, and no other account/domain should be able to utilize the dedicated resources of another. This ensures that an outage or issue on a public resource doesn't affect the dedicated/private infrastructure, and that "public" users can't consume private resources being paid for by someone else.

Currently the only way this is possible is by dedicating an entire zone to an account, but that is far from ideal, and it makes management of the overall deployment/networking/etc. much more of a hassle.
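The proposed planner change can be expressed as a host-eligibility predicate. A hypothetical sketch (the VM records and the linked_account field are illustrative, not CloudStack's actual data model): a host passes if every VM on it effectively belongs to the deploying account, where a system-owned virtual router counts as belonging to the user account it serves.

```python
def host_eligible_for_strict_implicit(host_vms, deploying_account):
    """Proposed Strict Implicit Dedication rule (illustrative sketch).

    Each vm is a dict with an "account" key; virtual routers carry
    account == "system" plus a hypothetical "linked_account" naming the
    user account they serve.
    """
    for vm in host_vms:
        if vm["account"] == "system":
            owner = vm.get("linked_account")   # hypothetical field
        else:
            owner = vm["account"]
        if owner != deploying_account:
            # A VM (or another account's VR) on the host blocks dedication.
            return False
    return True
```

Under this rule the account's own VR no longer blocks its own strict-implicit deployments, while other accounts' VMs and VRs still do, which is exactly the behavior the two bullet points above describe.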
[jira] [Created] (CLOUDSTACK-7844) IP Reservation in Isolated Networks doesn't work as expected
Logan B created CLOUDSTACK-7844:
-----------------------------------

             Summary: IP Reservation in Isolated Networks doesn't work as expected
                 Key: CLOUDSTACK-7844
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7844
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: Virtual Router
    Affects Versions: 4.4.0
         Environment: CloudStack 4.4.0 w/ KVM Hypervisor on Ubuntu 14.04 LTS
            Reporter: Logan B
             Fix For: 4.5.0

When using the IP Reservation functionality on an Isolated Guest Network in an Advanced Zone, it doesn't work as expected.

Goal: Create an Isolated Network with the 10.1.1.0/24 subnet. Configure the network with an IP reservation of 10.1.1.0/25.

Test:
1. Create an Isolated Guest Network with VR/DHCP/etc. (using the default 'IsolatedNetworkOfferingWithSourceNAT' offering). Use the default Guest CIDR (10.1.1.0/24).
2. Deploy a VM on the network to "implement" it. Make sure the VM has a NIC in 10.1.1.0/25 (e.g., 10.1.1.50).
3. Edit the network and set "Guest CIDR" to 10.1.1.0/25. After saving, the "Guest CIDR" field should display 10.1.1.0/25, and the "Network CIDR" field should be 10.1.1.0/24.
4. NOTE: At this point everything should be working as expected. Problems don't occur until the next step.
5. Restart the network you created (with "Cleanup" checked).
6. Reboot the VM you created earlier, or run dhclient on its primary interface.
7. The VM will now have a /25 (255.255.255.128) netmask, instead of the /24 it was initially deployed with.
8. Manually modify the VM's IP and netmask to be outside the Guest CIDR but still within the Network CIDR (e.g., 10.1.1.150/24), and create a default route via the VR IP (e.g., 10.1.1.1).

Expected Result:
- No VMs should be deployed in the reserved range.
- IPs in the reserved range (10.1.1.128 - 10.1.1.254) should be able to ping VMs in the Guest CIDR range (10.1.1.2 - 10.1.1.126), and vice versa.
- The virtual router should still have a 255.255.255.0 netmask, and provide routing/DHCP/etc. for the entire subnet (10.1.1.0/24).
- New VMs created on the guest network should get an IP in the Guest CIDR range (10.1.1.0/25) but have the Network CIDR netmask (255.255.255.0).

Observed Result:
- No VMs are deployed in the reserved range.
- IPs in the reserved range (10.1.1.128 - 10.1.1.254) are NOT ABLE to ping VMs in the Guest CIDR range (10.1.1.2 - 10.1.1.126), and vice versa.
- The virtual router has a /25 (255.255.255.128) netmask, and only provides routing/DHCP for addresses in that subnet.
- New VMs created on the network are deployed in the Guest CIDR range (10.1.1.0/25) with a /25 (255.255.255.128) netmask, instead of a /24 (255.255.255.0) netmask.

I'm assuming this is not the intended behavior. I posted these results on the dev list, but didn't get much traction.

I would assume this can be resolved in one of two ways:
- Option A) Ensure that the Virtual Router always pulls its netmask/routing from the Network CIDR. As I understand it, CloudStack manually creates static DHCP entries when provisioning VMs, so I don't think any networking changes should take effect on the VR when implementing IP reservation. (If anything, we should just update the "dhcp-range" instead of the netmask/routing.)
- Option B) When IP reservation is in effect, the virtual router should be updated with a route to the reserved range (10.1.1.128/25). That way it will still be reachable if we manually set a /24 netmask on hosts in the reserved range. This option seems like a workaround rather than a fix, so Option A would be preferred.

Note that this problem ONLY comes up when the Virtual Router is cleaned up or re-deployed. Because of this it may not be caught in standard testing, but it can cause problems when the router is restarted for HA/migrations/maintenance/etc.
[jira] [Resolved] (CLOUDSTACK-6938) Cannot create template from snapshot when using S3 storage
[ https://issues.apache.org/jira/browse/CLOUDSTACK-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Logan B resolved CLOUDSTACK-6938.
---------------------------------
    Resolution: Fixed

Fixed with 736bf540e8ef759a101d221622c64f3b3c3ed425

> Cannot create template from snapshot when using S3 storage
> -----------------------------------------------------------
>
>                 Key: CLOUDSTACK-6938
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6938
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public (Anyone can view this level - this is the default.)
>          Components: Snapshot
>    Affects Versions: 4.4.0
>         Environment: KVM + S3 Secondary Storage
>            Reporter: Logan B
>            Priority: Critical
>             Fix For: 4.4.0
>
> When trying to create a template from a snapshot with S3 secondary storage, the command immediately fails with a NullPointerException.
> This appears to only happen when there is a pre-existing snapshot folder in the NFS staging store. This indicates that there is something wrong with the copy command (e.g., it's using 'mkdir' instead of 'mkdir -p').
> The issue can be worked around by deleting the existing snapshot folder on the staging store every time you want to create a new template. This is obviously not viable for end users.
> This issue should be fixed before 4.4 ships: it should be a very simple thing to correct, but it completely breaks restoring snapshots for end users. Waiting for 4.5 would be far too long for an issue like this.
>
> 2014-06-18 21:13:54,789 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) Processing command: org.apache.cloudstack.storage.command.CopyCommand
> 2014-06-18 21:13:54,789 INFO [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-2:null) Determined host 172.16.48.99 corresponds to IP 172.16.48.99
> 2014-06-18 21:13:54,797 ERROR [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-2:null) Unable to create directory /mnt/SecStorage/6b9bdec9-fdc9-3fdd-a5f8-0481df177ae8/snapshots/2/25 to copy from S3 to cache.
>
> I'm guessing it's an issue with the mkdirs() function in the code, but I've been unable to find it.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (CLOUDSTACK-6938) Cannot create template from snapshot when using S3 storage
[ https://issues.apache.org/jira/browse/CLOUDSTACK-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14039078#comment-14039078 ]

Logan B commented on CLOUDSTACK-6938:
-------------------------------------
I've posted a patch for this issue to the review board & mailing list. It seems to be working for me, but I have no idea if the logic is actually sound.

> Cannot create template from snapshot when using S3 storage
> -----------------------------------------------------------
>
>                 Key: CLOUDSTACK-6938
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6938
[jira] [Commented] (CLOUDSTACK-6938) Cannot create template from snapshot when using S3 storage
[ https://issues.apache.org/jira/browse/CLOUDSTACK-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038846#comment-14038846 ]

Logan B commented on CLOUDSTACK-6938:
-------------------------------------
Understandable, though a bug that makes existing features unusable seems like it should be fixed sooner rather than later. Since I doubt 4.5 will release before September, I think something this simple should be looked at.

I'm attempting to come up with a patch, test it, and submit it for review, but having never done any real development before, I don't know if I can get it in and approved before an RC build.

> Cannot create template from snapshot when using S3 storage
> -----------------------------------------------------------
>
>                 Key: CLOUDSTACK-6938
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6938
[jira] [Commented] (CLOUDSTACK-6938) Cannot create template from snapshot when using S3 storage
[ https://issues.apache.org/jira/browse/CLOUDSTACK-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036468#comment-14036468 ]

Logan B commented on CLOUDSTACK-6938:
-------------------------------------
This is the relevant bit of code, in NfsSecondaryStorageResource.java:

    if (!downloadDirectory.mkdirs()) {
        final String errMsg = "Unable to create directory " + downloadPath + " to copy from S3 to cache.";
        s_logger.error(errMsg);
        return new CopyCmdAnswer(errMsg);
    } else {
        s_logger.debug("Directory " + downloadPath + " already exists");
    }

I believe mkdirs() returns false if the directory already exists, so this failure logic is prone to breaking. Better logic might be:

    if (downloadDirectory.exists()) {
        s_logger.debug("Directory " + downloadPath + " already exists");
    } else {
        if (!downloadDirectory.mkdirs()) {
            final String errMsg = "Unable to create directory " + downloadPath + " to copy from S3 to cache.";
            s_logger.error(errMsg);
            return new CopyCmdAnswer(errMsg);
        }
    }

I'm not a programmer, but it seems that checking for the existing path before blindly failing would be better here. If this code checks out, I'll try to figure out how to offer a commit for cherry-picking.

> Cannot create template from snapshot when using S3 storage
> -----------------------------------------------------------
>
>                 Key: CLOUDSTACK-6938
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6938
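The 'mkdir' vs 'mkdir -p' behavior the description guesses at is easy to demonstrate: Java's File.mkdirs() returns false when the directory already exists, which is why a pre-existing snapshot folder trips the error path. The idempotent pattern the proposed fix aims for, sketched in Python with a throwaway temp directory:

```python
import os
import tempfile

# Mirror the snapshots/<account>/<id> layout from the error log under a
# temporary base directory so the sketch is self-contained.
base = tempfile.mkdtemp()
path = os.path.join(base, "snapshots", "2", "25")

# Like the exists()-check-then-mkdirs() fix: creating an already-existing
# directory tree is treated as success, not as an error.
os.makedirs(path, exist_ok=True)   # first call creates the tree
os.makedirs(path, exist_ok=True)   # second call is a no-op, no exception
assert os.path.isdir(path)
```

The equivalent Java-side behavior is what the comment's exists()-first rewrite provides: only a genuine failure to create the directory reaches the error return.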
[jira] [Updated] (CLOUDSTACK-6938) Cannot create template from snapshot when using S3 storage
[ https://issues.apache.org/jira/browse/CLOUDSTACK-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Logan B updated CLOUDSTACK-6938: Fix Version/s: 4.4.0 > Cannot create template from snapshot when using S3 storage > -- > > Key: CLOUDSTACK-6938 > URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6938 > Project: CloudStack > Issue Type: Bug > Security Level: Public(Anyone can view this level - this is the > default.) > Components: Snapshot >Affects Versions: 4.4.0 > Environment: KVM + S3 Secondary Storage >Reporter: Logan B >Priority: Blocker > Fix For: 4.4.0 > > > When trying to create a template from a snapshot with S3 secondary storage, > the command immediately fails with a NullPointerException. > This appears to only happen when there is a pre-existing snapshot folder in > the NFS staging store. This indicates that there is something wrong with the > copy command (e.g., it's using 'mkdir' instead of 'mkdir -p'). > The issue can be worked around by deleting the existing snapshot folder on > the staging store every time you want to create a new template. This is > obviously not viable for end users. > This issue should be fixed before 4.4 ships because it should be a stupid > simple thing to correct, but completely breaks restoring snapshots for end > users. Waiting for 4.5 would be far too long for an issue like this. > 2014-06-18 21:13:54,789 DEBUG [cloud.agent.Agent] > (agentRequest-Handler-2:null) Processing command: > org.apache.cloudstack.storage.command.CopyCommand > 2014-06-18 21:13:54,789 INFO [storage.resource.NfsSecondaryStorageResource] > (agentRequest-Handler-2:null) Determined host 172.16.48.99 corresponds to IP > 172.16.48.99 > 2014-06-18 21:13:54,797 ERROR [storage.resource.NfsSecondaryStorageResource] > (agentRequest-Handler-2:null) Unable to create directory > /mnt/SecStorage/6b9bdec9-fdc9-3fdd-a5f8-0481df177ae8/snapshots/2/25 to copy > from S3 to cache. 
> I'm guessing it's an issue with the mkdirs() function in the code, but I've > been unable to find it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CLOUDSTACK-6938) Cannot create template from snapshot when using S3 storage
Logan B created CLOUDSTACK-6938: --- Summary: Cannot create template from snapshot when using S3 storage Key: CLOUDSTACK-6938 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6938 Project: CloudStack Issue Type: Bug Security Level: Public (Anyone can view this level - this is the default.) Components: Snapshot Affects Versions: 4.4.0 Environment: KVM + S3 Secondary Storage Reporter: Logan B Priority: Blocker When trying to create a template from a snapshot with S3 secondary storage, the command immediately fails with a NullPointerException. This appears to only happen when there is a pre-existing snapshot folder in the NFS staging store. This indicates that there is something wrong with the copy command (e.g., it's using 'mkdir' instead of 'mkdir -p'). The issue can be worked around by deleting the existing snapshot folder on the staging store every time you want to create a new template. This is obviously not viable for end users. This issue should be fixed before 4.4 ships because it should be a stupid simple thing to correct, but completely breaks restoring snapshots for end users. Waiting for 4.5 would be far too long for an issue like this. 2014-06-18 21:13:54,789 DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) Processing command: org.apache.cloudstack.storage.command.CopyCommand 2014-06-18 21:13:54,789 INFO [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-2:null) Determined host 172.16.48.99 corresponds to IP 172.16.48.99 2014-06-18 21:13:54,797 ERROR [storage.resource.NfsSecondaryStorageResource] (agentRequest-Handler-2:null) Unable to create directory /mnt/SecStorage/6b9bdec9-fdc9-3fdd-a5f8-0481df177ae8/snapshots/2/25 to copy from S3 to cache. I'm guessing it's an issue with the mkdirs() function in the code, but I've been unable to find it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (CLOUDSTACK-6473) Debian 7 Virtual Router ip_conntrack_max not set at boot
Logan B created CLOUDSTACK-6473: --- Summary: Debian 7 Virtual Router ip_conntrack_max not set at boot Key: CLOUDSTACK-6473 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6473 Project: CloudStack Issue Type: Bug Security Level: Public (Anyone can view this level - this is the default.) Components: Virtual Router Affects Versions: 4.3.0 Environment: XenServer 6.2 CloudStack 4.3.0 Debian 7 SystemVM/Virtual Router Reporter: Logan B Fix For: 4.3.1

The Problem: The Debian 7 Virtual Router VMs for XenServer experience intermittent connectivity problems. This affects all VMs behind the virtual router in various ways: SSH failures, Apache connection failures, etc. This issue also affects various functions within CloudStack that attempt to connect to the Virtual Router (updating firewall rules, NAT, etc.)

The Cause: It appears that the issue is being caused by a low default limit for the net.ipv4.netfilter.ip_conntrack_max sysctl. The issue can be easily diagnosed in /var/log/messages:

Apr 22 15:45:34 r-5602-VM kernel: [ 1085.117498] nf_conntrack: table full, dropping packet.
Apr 22 15:45:34 r-5602-VM kernel: [ 1085.133095] nf_conntrack: table full, dropping packet.
Apr 22 15:45:34 r-5602-VM kernel: [ 1085.145440] nf_conntrack: table full, dropping packet.

The default setting for ip_conntrack_max is '3796':

# sysctl net.ipv4.netfilter.ip_conntrack_max
net.ipv4.netfilter.ip_conntrack_max = 3796

As per /etc/sysctl.conf this setting should be '100':

net.ipv4.netfilter.ip_conntrack_max=100

It would appear that this setting is not being correctly applied when the virtual router boots.

The Solution: A temporary workaround is to manually set the ip_conntrack_max sysctl to the correct value:

# sysctl -w net.ipv4.netfilter.ip_conntrack_max=100

It's likely that this sysctl is being run at boot before the conntrack module is loaded, so it doesn't take effect. There are various solutions suggested around the web, any of which should work fine. 
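One such approach, sketched here as an assumption rather than the official SystemVM fix: re-apply the sysctl after the conntrack module has loaded, for example from /etc/rc.local, which Debian 7 runs at the end of boot. The module name and placement are assumptions for illustration; the value should match whatever /etc/sysctl.conf specifies.

```shell
# Hypothetical boot-time workaround (not the official SystemVM fix):
# load the conntrack module first, then re-apply the limit, so the
# setting is not silently dropped before the module exists.
# Placed in /etc/rc.local (before the final "exit 0") on Debian 7.
modprobe ip_conntrack 2>/dev/null || modprobe nf_conntrack  # whichever name the kernel uses
sysctl -w net.ipv4.netfilter.ip_conntrack_max=100
```

An equivalent alternative is listing the module in /etc/modules so it loads before the normal sysctl pass runs; either way the ordering problem is the thing being fixed, not the value itself.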
To resolve this problem a new System VM template should be created. I'm assuming this can be done in between CloudStack releases. I know there is supposed to be a new template released to fix the HeartBleed vulnerability, so this would be a good fix to include with that updated template. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CLOUDSTACK-3535) No HA actions are performed when a KVM host goes offline
[ https://issues.apache.org/jira/browse/CLOUDSTACK-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732577#comment-13732577 ] Logan B commented on CLOUDSTACK-3535: - Does the submitted fix address the same issue in XenServer? If not then I don't think this can be flagged as "fixed." > No HA actions are performed when a KVM host goes offline > > > Key: CLOUDSTACK-3535 > URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3535 > Project: CloudStack > Issue Type: Bug > Security Level: Public(Anyone can view this level - this is the > default.) > Components: Hypervisor Controller, KVM, Management Server >Affects Versions: 4.1.0, 4.1.1, 4.2.0 > Environment: KVM (CentOS 6.3) with CloudStack 4.1 >Reporter: Paul Angus >Assignee: edison su >Priority: Blocker > Fix For: 4.2.0 > > Attachments: management-server.log.Agent > > > If a KVM host 'goes down', CloudStack does not perform HA for instances which > are marked as HA enabled on that host (including system VMs) > CloudStack does not show the host as disconnected. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CLOUDSTACK-3421) When hypervisor is down, no HA occurs with log output "Agent state cannot be determined, do nothing"
[ https://issues.apache.org/jira/browse/CLOUDSTACK-3421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716611#comment-13716611 ] Logan B commented on CLOUDSTACK-3421: - This issue is related to CLOUDSTACK-3535 and affects XenServer/XCP as well. The patch to address the split brain issue isn't a fix as much as a workaround. Further steps need to be taken to test whether the loss of communication is in the link between the management server and the network, or between the host and the network. If the management server can communicate with all but one host in the cluster, it shouldn't just "do nothing." At the very least it needs to alert an administrator that there's a potential problem. As has been mentioned, right now if a host goes down there's no indication that it has happened until customers start reporting outages. > When hypervisor is down, no HA occurs with log output "Agent state cannot be > determined, do nothing" > > > Key: CLOUDSTACK-3421 > URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3421 > Project: CloudStack > Issue Type: Bug > Security Level: Public(Anyone can view this level - this is the > default.) 
> Components: KVM, Management Server >Affects Versions: 4.1.0 > Environment: CentOS 6.4 minimal install > Libvirt, KVM/Qemu > CloudStack 4.1 > GlusterFS 3.2, replicated+distributed as primary storage via Shared Mount > Point > 3 physical servers > * 1 management server, running NFS secondary storage > ** 1 nic, management+storage > * 2 hypervisor nodes, running glusterfs-server > ** 4x nic, management+storage, public, guest, gluster peering > * Advanced zone > * KVM > * 4 networks: > eth0: cloudbr0: management+secondary storage, > eth2: cloudbr1: public > eth3: cloudbr2: guest > eth1: gluster peering > * Shared Mount Point > * custom network offering with redundant routers enabled > * global settings tweaked to increase speed of identifying down state > ** ping.interval: 10sec >Reporter: Gerard Lynch >Priority: Critical > Fix For: 4.1.1, 4.2.0, Future > > Attachments: catalina_management-server.zip > > > We wanted to test CloudStack's HA capabilities by simulating outages to find > out how long it would take to recover. One of the tests was simulating loss > of a hypervisor node by shutting it down. When we tested this, we found > that CloudStack failed to bring up any of the VMs (System or Instance), which > were on the down node, until the node was powered back up and reconnected. 
> In the logs, we see repeating occurances of: > INFO [utils.exception.CSExceptionErrorCode] (AgentTaskPool-11:) Could not > find exception: com.cloud.exception.OperationTimedoutException in error code > list for exceptions > INFO [utils.exception.CSExceptionErrorCode] (AgentTaskPool-10:) Could not > find exception: com.cloud.exception.OperationTimedoutException in error code > list for exceptions > WARN [agent.manager.AgentAttache] (AgentTaskPool-11:) Seq 14-660013135: > Timed out on Seq 14-660013135: { Cmd , MgmtId: 93515041483, via: 14, Ver: > v1, Flags: 100011, [{"CheckHealthCommand":{"wait":50}}] } > WARN [agent.manager.AgentAttache] (AgentTaskPool-10:) Seq 15-1097531400: > Timed out on Seq 15-1097531400: { Cmd , MgmtId: 93515041483, via: 15, Ver: > v1, Flags: 100011, [{"CheckHealthCommand":{"wait":50}}] } > WARN [agent.manager.AgentManagerImpl] (AgentTaskPool-11:) Operation timed > out: Commands 660013135 to Host 14 timed out after 100 > WARN [agent.manager.AgentManagerImpl] (AgentTaskPool-10:) Operation timed > out: Commands 1097531400 to Host 15 timed out after 100 > WARN [agent.manager.AgentManagerImpl] (AgentTaskPool-11:) Agent state cannot > be determined, do nothing > WARN [agent.manager.AgentManagerImpl] (AgentTaskPool-10:) Agent state cannot > be determined, do nothing > To reproduce: > 1. Build the environment as detailed above > 2. Register an ISO > 3. Create a new guest network using the custom network offering (that offers > redundant routers) > 3. Provision an instance > 4. Ensure the system VMs and instance are on the first hypervisor node > 5. Shutdown the first hypervisor node (or pull the plug) > Expected result: > All system VMs and instance(s) should be brought up on the 2nd hypervisor > node. > Actual result: > We see the first hypervisor node marked "disconnected." > All System VMs and the Instance are still marked "Running", however ping to > any of them fails. > Ping to the redundant router on the 2nd hypervisor node is still working. 
> We see in the logs > "INFO [utils.exception.CSExceptionErrorCode] (AgentTaskPool-11:) Could not > find exception: com.cloud.exception.OperationTimedoutException in erro
[jira] [Commented] (CLOUDSTACK-3535) No HA actions are performed when a KVM host goes offline
[ https://issues.apache.org/jira/browse/CLOUDSTACK-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710029#comment-13710029 ] Logan B commented on CLOUDSTACK-3535: - Please note that this bug does not only affect KVM. We have experienced the same issue with XCP 1.6/XenServer hosts. The problem stems from a previous fix to prevent a potential split brain issue when the management server loses connectivity to the cluster. The AgentImpl function used to mark the host as down when it couldn't be reached; now it just marks it as "unable to determine state" and does nothing. This does fix the split brain issue, but if the host actually goes down then HA will never take over. I realize this is a tricky fix, and my programming knowledge is minimal, but I do have a suggestion for a fix. The only time the management server should run into an actual split brain issue is if it loses connectivity to the clusters. Could the following logic be implemented?

If the management server cannot ping the host:
    Try to ping the management gateway.
    If the management server CAN ping the gateway:
        Try to ping the other hosts in the cluster.
        If the other hosts can be pinged AND the gateway can be pinged:
            Start HA and send a host-down report/alert.
        Else if the other hosts CANNOT be pinged AND the gateway CAN be pinged:
            Send a cluster connectivity alert, and do nothing with HA.
    Else (the management server CANNOT ping the gateway):
        Attempt to send a management connectivity alert, and do nothing with HA.

The only time I could see this causing an issue is if the networking for Host A goes down, HA migrates VMs to Host B, and then Host A's networking comes back up with running VMs. I don't see this being a very likely scenario, though. A short term solution would be to at least trigger some sort of alert/e-mail when the host status cannot be determined. 
That way manual intervention can be started much more quickly. Right now a host can be offline indefinitely without any notice. > No HA actions are performed when a KVM host goes offline > > > Key: CLOUDSTACK-3535 > URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3535 > Project: CloudStack > Issue Type: Bug > Security Level: Public(Anyone can view this level - this is the > default.) > Components: Hypervisor Controller, KVM, Management Server >Affects Versions: 4.1.0, Future > Environment: KVM (CentOS 6.3) with CloudStack 4.1 >Reporter: Paul Angus > > If a KVM host 'goes down', CloudStack does not perform HA for instances which > are marked as HA enabled on that host (including system VMs) > CloudStack does not show the host as disconnected. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
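The decision tree suggested in the comment above could be sketched in code roughly as follows. Every name here (Pinger, HaDecision, HostAction) is a hypothetical illustration of the proposed logic, not an actual CloudStack class or API:

```java
import java.util.List;

// Hypothetical sketch of the ping-based decision tree proposed in the comment:
// decide between starting HA and raising connectivity alerts, instead of
// "Agent state cannot be determined, do nothing."
public class HaDecision {

    enum HostAction { NONE, START_HA, ALERT_CLUSTER_CONNECTIVITY, ALERT_MGMT_CONNECTIVITY }

    interface Pinger {
        boolean canPing(String target);
    }

    static HostAction decide(Pinger p, String host, String gateway, List<String> clusterPeers) {
        if (p.canPing(host)) {
            return HostAction.NONE; // host is reachable: nothing to do
        }
        if (!p.canPing(gateway)) {
            // the management server may itself be cut off: never start HA here
            return HostAction.ALERT_MGMT_CONNECTIVITY;
        }
        // gateway is reachable; check the rest of the cluster
        boolean anyPeerReachable = clusterPeers.stream().anyMatch(p::canPing);
        if (anyPeerReachable) {
            // only this host is unreachable: safe to fail over, and alert
            return HostAction.START_HA;
        }
        // whole cluster unreachable but gateway is up: likely a cluster-side
        // network problem, so alert instead of blindly starting HA
        return HostAction.ALERT_CLUSTER_CONNECTIVITY;
    }

    public static void main(String[] args) {
        Pinger gatewayAndOnePeerUp = t -> t.equals("gw") || t.equals("peer1");
        // host down, gateway up, a peer up -> START_HA
        System.out.println(decide(gatewayAndOnePeerUp, "host1", "gw", List.of("peer1", "peer2")));
    }
}
```

The key safety property is the same one the split-brain workaround was after: HA only starts when the management server has positive evidence that its own network path is healthy and that the failure is isolated to the one host.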