Re: [jira] [Commented] (CLOUDSTACK-3163) KVM Virtual Router startup time is painfully long

Musayev, Ilya Thu, 18 Jul 2013 23:49:33 -0700

Marcus

How far are you from prototype to a usable patch?


On a somewhat similar topic, this is more on the router VM (rvm) side.
For vmware, while only a single call is made, userdata.sh had to be enhanced 
because of too many identical entries in the htaccess file (i.e. each vm 
creation would add about 10 or more entries, so you end up with a huge htaccess 
file, when in reality all you need is 10 or so lines of rewrite rules in total, 
not 10 per vm). While the logic in the script attempted to check for 
duplicates, it was not optimal and so it would fail.

I enhanced userdata.sh for vmware only I think, but I believe the code was 
somewhat similar between the two, so perhaps kvm/xen can also benefit.
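
To illustrate the duplicate check described above, here is a minimal sketch in Python. It is not the actual userdata.sh change from CLOUDSTACK-2053; the function name and the two rewrite rules are hypothetical stand-ins. The point is only that each rule gets appended once, however many VMs are created.

```python
def add_rules_once(htaccess_text, rules):
    """Return htaccess_text with each rule appended only if it is not already present."""
    lines = htaccess_text.splitlines()
    seen = set(line.strip() for line in lines)
    for rule in rules:
        key = rule.strip()
        if key not in seen:
            lines.append(key)
            seen.add(key)
    return ("\n".join(lines) + "\n") if lines else ""


# Hypothetical rewrite rules, for illustration only; the real ones live in userdata.sh.
RULES = [
    "RewriteRule ^user-data$ ../userdata/%{REMOTE_ADDR}/user-data [L,NC,QSA]",
    "RewriteRule ^meta-data/(.+)$ ../metadata/%{REMOTE_ADDR}/$1 [L,NC,QSA]",
]

text = ""
for _ in range(3):  # simulate three VM creations against the same htaccess file
    text = add_rules_once(text, RULES)

print(text.count("RewriteRule"))  # the file holds one copy of each rule
```

Comparing whole stripped lines keeps the check simple; a partial match (as grep without anchors would do) is one way such a check can misfire.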

I've committed a patch to the 4.1 branch under CLOUDSTACK-2053 and will 
recommit to 4.2 and master tomorrow.

Since you've spent time on this and figured out the kvm logic, perhaps you can 
see if this fix is applicable to the kvm rvm as well.

Thanks

Regards
Ilya

- All mistakes in this message are not mine but Android's.


-------- Original message --------
From: Marcus Sorensen <[email protected]>
Date:
To: [email protected]
Cc: [email protected]
Subject: Re: [jira] [Commented] (CLOUDSTACK-3163) KVM Virtual Router startup 
time is painfully long


I've prototyped a fix for this, and it took the VmDataCommand from ~7
seconds on restarting one VM down to ~300ms.  For rebooting a router,
with multiple VMs connected, that should be significant. I'm just
dumping the data sent to vmdata into a file as json, copying that up
to the router, and processing it there.
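
A rough sketch of that approach, not the prototype itself: the host packs all of a VM's data into one JSON document, and a script on the router unpacks it into files. The (folder, filename, content) entry shape, the directory layout, and both function names are assumptions for illustration; the copy to the router (a single scp) is left out.

```python
import json
import os
import tempfile


def serialize_vmdata(vm_ip, entries):
    """Host side: pack every (folder, filename, content) entry for one VM into one JSON string."""
    return json.dumps({"ip": vm_ip, "entries": [list(e) for e in entries]})


def apply_vmdata(payload, root):
    """Router side: write each entry to root/<folder>/<ip>/<filename> and return the paths."""
    data = json.loads(payload)
    written = []
    for folder, filename, content in data["entries"]:
        target_dir = os.path.join(root, folder, data["ip"])
        os.makedirs(target_dir, exist_ok=True)
        path = os.path.join(target_dir, filename)
        with open(path, "w") as fh:
            fh.write(content or "")
        written.append(path)
    return written


# Hypothetical sample data standing in for the vmdata of one VM.
entries = [
    ("userdata", "user-data", "#!/bin/sh\necho hi\n"),
    ("metadata", "instance-id", "i-2-10-VM"),
    ("metadata", "local-hostname", "vm10"),
]
payload = serialize_vmdata("10.1.1.10", entries)

root = tempfile.mkdtemp()  # stands in for the web root on the router
paths = apply_vmdata(payload, root)
print(len(paths))
```

All file writes happen locally on the router in one process, so the per-file ssh round trips disappear.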

On Thu, Jul 18, 2013 at 3:04 PM, Marcus Sorensen (JIRA) <[email protected]> wrote:
>
>     [ https://issues.apache.org/jira/browse/CLOUDSTACK-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13712845#comment-13712845 ]
>
> Marcus Sorensen commented on CLOUDSTACK-3163:
> ---------------------------------------------
>
> ... and each vmdata.sh calls ssh and/or scp several times. Off the top
> of my head, it seems like we could serialize that cmd.getVmData()
> output to maybe JSON or something, get it up on the router in one
> call, and then process it there in a python script.
>
> On Thu, Jul 18, 2013 at 7:08 AM, Wido den Hollander (JIRA)
>
>
>> KVM Virtual Router startup time is painfully long
>> -------------------------------------------------
>>
>>                 Key: CLOUDSTACK-3163
>>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-3163
>>             Project: CloudStack
>>          Issue Type: Bug
>>      Security Level: Public(Anyone can view this level - this is the 
>> default.)
>>          Components: KVM
>>    Affects Versions: pre-4.0.0
>>         Environment: CloudPlatform 3.0.3, but I don't see any changes to the 
>> relevant code (I think) on master
>>            Reporter: Andrew Bayer
>>            Priority: Critical
>>
>> When you've got a couple thousand instances, spread across 10 or so pods, 
>> virtual router startup time is near crippling - actually, if you don't 
>> enable the option to have virtual routers only populated with instances in 
>> their pod, it *is* crippling, in that the virtual routers don't finish 
>> starting before the management server decides they've timed out and tries to 
>> start a new one.
>> This seems to be the result of a few painful inefficiencies:
>> - The same codepath is followed whether you're adding a new instance to an 
>> already running VR, or adding two hundred already running instances to a new 
>> VR. So each ssh/scp/sed/cp/chmod/etc command is replicated for each 
>> instance, rather than finding efficiencies by doing things across the whole 
>> set of instances.
>> - But what really eats up the time is the population of vm data - for each 
>> piece of vm data (which, from a rough look at the code, seems to be 
>> something like 10 or 11 data files), there are something like 7 ssh calls 
>> and an scp call. So that means that per instance, we have somewhere around 
>> 80 to 90 ssh/scp calls, plus the single ssh call for dhcp_entry.sh. So with 
>> 200 instances, that's 1600 to 1800 ssh/scp calls on a single VR, with all 
>> the overhead entailed in opening that many ssh connections, starting bash, 
>> etc, etc... Given that in my experience, a VR with ~200 instances takes ~90 
>> minutes to start up (I may be misremembering slightly - it could be ~200 
>> instances takes closer to 60 minutes, and ~300 takes closer to 90), that 
>> works out to 3 seconds or so per ssh/scp, which doesn't seem implausible to 
>> me.
>> So, this shouldn't be this way. At a minimum, there's no reason not to 
>> offload the whole process from a script run on the host making repeated ssh 
>> calls to the VR to a script on the VR that gets called from the host, albeit 
>> possibly a temporary one that's generated on the fly and copied over to the 
>> VR. That alone would probably save most of the VR startup time, just by 
>> dropping the number of ssh/scp connections per instance from 80-90 to 3 
>> (dhcp_entry.sh call, scp of temporary script, execution of temporary script).
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
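
A quick tally of the connection counts in the quoted report, using its own figures (7 ssh calls and 1 scp per data file, 10 or 11 data files, one dhcp_entry.sh call per instance). The function names are made up for this sketch. Multiplying through for 200 instances gives roughly 16,000 to 18,000 connections rather than 1,600 to 1,800, which would put the implied cost of a 90-minute startup nearer 0.3 seconds per connection.

```python
def total_calls(instances, data_files, ssh_per_file=7, scp_per_file=1, dhcp_calls=1):
    """Total ssh/scp connections when every instance is processed one file at a time."""
    per_instance = data_files * (ssh_per_file + scp_per_file) + dhcp_calls
    return instances * per_instance


def seconds_per_call(startup_minutes, calls):
    """Average time per connection implied by a given total startup time."""
    return startup_minutes * 60.0 / calls


low = total_calls(200, 10)   # 10 data files per instance
high = total_calls(200, 11)  # 11 data files per instance
batched = 200 * 3            # dhcp_entry.sh call, scp of script, execution of script

print(low, high, batched)
print(round(seconds_per_call(90, high), 2), round(seconds_per_call(90, low), 2))
```

With the batched approach the count drops to about 600 connections for the same 200 instances.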
