Re: [Openstack] launching multiple VMs takes very long time
> where should I start looking to find out why things are so slow?

Check the glance node. The first prototype I set up had an incredibly slow glance node, which led to this kind of behavior. Are the compute nodes downloading at the same time, or sequentially?

Mark

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help : https://help.launchpad.net/ListHelp
Re: [Openstack] launching multiple VMs takes very long time
so I don't think there is a network or disk bottleneck now. watching instances launch, I see that the disk image itself gets copied out to the compute node in < 1 minute and the libvirt XML file gets created. then there is a pause of ~20 minutes, then the kvm process starts up and off the VM goes just fine. aside from messages like:

2013-05-01 19:43:40.363+0000: 2377: warning : virAuditSend:135 : Failed to send audit message virt=kvm vm="instance-0000014c" uuid=d72afa21-844a-4ea0-8b96-0a4e5e4da5eb vm-ctx=? img-ctx=? model=stack: Operation not permitted

(and these don't start until the kvm process starts), I see nothing in the logs. but I don't have debug on yet, which I will do next.

what is causing the pause in between is the question. hopefully turning debug on will help.

s

On 05/01/2013 11:43 AM, Rick Jones wrote:
> The logs on the compute node(s) will show how long it took to actually
> retrieve the image yes?

--
Steve Heistand
NASA Ames Research Center
SciCon Group
Mail Stop 258-6
steve.heist...@nasa.gov
(650) 604-4369
Moffett Field, CA 94035-1000
"Any opinions expressed are those of our alien overlords, not my own."
# For Remedy#
#Action: Resolve#
#Resolution: Resolved #
#Reason: No Further Action Required #
#Tier1: User Code #
#Tier2: Other #
#Tier3: Assistance #
#Notification: None #
Re: [Openstack] launching multiple VMs takes very long time
On 05/01/2013 10:43 AM, Steve Heistand wrote:
> there may be some network issues going on here, trying to shove some amount of data bigger than a few Gig seems to start slowing things down.

If you think there is an actual network problem, you could, I suppose, try exploring that with netperf or iperf. It seems unlikely that the network could retain a "memory" of a previous transfer that causes a subsequent, non-overlapping transfer to run more slowly.

The logs on the compute node(s) will show how long it took to actually retrieve the image, yes?

rick jones
Re: [Openstack] launching multiple VMs takes very long time
You might take a look at iptraf during a slow deployment to determine if the utilization on a particular interface is high.

-----Original Message-----
From: Openstack [mailto:openstack-bounces+eric_e_smith=dell@lists.launchpad.net] On Behalf Of Steve Heistand
Sent: Wednesday, May 01, 2013 12:43 PM
To: openstack@lists.launchpad.net
Subject: Re: [Openstack] launching multiple VMs takes very long time
Re: [Openstack] launching multiple VMs takes very long time
ok, a followup to this. it's not any sort of disk access issue. I created a new disk to hold the glance directory, and running atop while launching shows all the activity on the original root drive and nothing on the new glance disk. normally, at idle, the root drive does get heavily used by something, mostly keystone and rabbit, so atop shows that partition as busy. when launching a VM it doesn't seem to get any busier, though. (maybe an LVM / is a bad idea.)

I've also tried launching tiny (i.e. cirros) images, and they too take ages to boot into tiny flavors.

the really, really weird part is that I ran an experiment of launching only 1 vm at a time. the time this takes grows each time I launch one:

1st vm  < 2 minutes
2nd     2.5m
3rd     4m
4th     8m
5th     10m
6th     12m
...

there may be some network issues going on here; trying to shove some amount of data bigger than a few Gig seems to start slowing things down.

s
Re: [Openstack] launching multiple VMs takes very long time
On 04/30/2013 01:36 PM, Melanie Witt wrote:
> This presentation from the summit might be of interest to you:
>
> http://www.openstack.org/summit/portland-2013/session-videos/presentation/scaling-the-boot-barrier-identifying-and-eliminating-contention-in-openstack

A nice presentation. Based on his comment at the end, I did the web search and found his slides at:

http://www.cs.utoronto.ca/~peter/feiner_slides_openstack_summit_portland_2013.pdf

A caveat/nit/whatnot about looking at overall system CPU utilization and assuming there is no CPU bottleneck (the "hardware" portion at the beginning) at points even well below 100% utilization: with multiple CPUs in a system now, there are many ways to arrive at 50% overall CPU utilization. It could be that all the CPUs are indeed at 50% utilization, but it could also be that half the CPUs are at 100% and the other half are idle. Now, perhaps that fits in the space between a hardware and a software bottleneck, but I'd be cautious about overall CPU utilization figures.

For example, a single CPU or a small number of CPUs saturating can happen rather easily in some "networking" workloads: the CPU servicing interrupts from the NIC (or CPUs, if the NIC is multiqueue) can saturate. I'd consider that a hardware saturation, even though many of the other CPUs in the system are largely idle. That is why later versions of netperf have a way to report the ID and utilization of the most-utilized CPU during a test, in addition to the overall CPU utilization.

rick jones
Re: [Openstack] launching multiple VMs takes very long time
this certainly looks promising. thanks

s

On 04/30/2013 01:36 PM, Melanie Witt wrote:
> This presentation from the summit might be of interest to you:
>
> http://www.openstack.org/summit/portland-2013/session-videos/presentation/scaling-the-boot-barrier-identifying-and-eliminating-contention-in-openstack
Re: [Openstack] launching multiple VMs takes very long time
This presentation from the summit might be of interest to you:

http://www.openstack.org/summit/portland-2013/session-videos/presentation/scaling-the-boot-barrier-identifying-and-eliminating-contention-in-openstack

I couldn't find just the slide deck anywhere so far.

Your issue is different in that you're seeing a 30-minute launch time, which is extreme, but this info might help you get a high-level picture of the flow.

Melanie

On 4/30/13 11:42 AM, Steve Heistand wrote:
> if I launch 4 instances (of the same snapshot as before) it takes 30 minutes.
Re: [Openstack] launching multiple VMs takes very long time
They are all booting the same image. There still may be random-read issues if they aren't mostly synced in copying the file to the compute node, but there isn't much load on the disk while the instances are being launched. It's just a SATA drive at this point, so the performance isn't great, but it's better than this.

The network isn't being used much during launch either, at least as near as I can tell from netstat and from trying to move data around independently of openstack. For example, scp'ing a file from the controller to the compute node being launched on shows basically no difference in speed compared with no launch going on at the same time.

I did have to drop the MTU of 9000 on the network devices. Not sure if it was the switch or what, but after so much data, performance just tanked and stayed bad. With the normal MTU of 1500 it stayed good.

On 04/30/2013 11:56 AM, Rick Jones wrote:
> Is each instance using the same image or a different one? If a
> different one, the "workload" to the glance storage will become (I
> suspect) a random read workload accessing the four different images even
> though each individual stream may be sequential. How much "storage
> oomph" do you have on/serving the glance/controller node?
>
> After that, what do the network statistics look like?
Re: [Openstack] launching multiple VMs takes very long time
Perhaps network throughput where the snapshot is hosted (the controller node)?

-----Original Message-----
From: Openstack [mailto:openstack-bounces+eric_e_smith=dell@lists.launchpad.net] On Behalf Of Steve Heistand
Sent: Tuesday, April 30, 2013 1:43 PM
To: openstack@lists.launchpad.net
Subject: [Openstack] launching multiple VMs takes very long time
Re: [Openstack] launching multiple VMs takes very long time
On 04/30/2013 11:42 AM, Steve Heistand wrote:
> the glance image storage is on the controller node, the snapshots are 2-3G in size.
>
> where should I start looking to find out why things are so slow?

I am something of a "networking guy" so that will color my response :)

Is each instance using the same image or a different one? If a different one, the "workload" to the glance storage will become (I suspect) a random-read workload accessing the four different images, even though each individual stream may be sequential. How much "storage oomph" do you have on/serving the glance/controller node?

After that, what do the network statistics look like? Start, I suppose, with the glance/controller node. Take some snapshots over an interval in each case and run them through something like beforeafter:

netstat -s > before
...wait a defined/consistent moment...
netstat -s > after
beforeafter before after > delta

and go from there.

rick jones

/*
 * beforeafter
 *
 * SYNOPSIS
 *
 *   beforeafter before_file after_file
 *
 * Description
 *
 *   Subtract the numbers in before_file from the numbers in
 *   after_file.
 *
 * Example
 *
 *   # netstat -s > netstat.before
 *   # run some test here
 *   # netstat -s > netstat.after
 *   # beforeafter netstat.before netstat.after
 *
 * Note
 *
 *   The "long double" is usually implemented as the IEEE double
 *   extended precision: its mantissa is _at_least_ 64 bits.
 *   Therefore, "long double" should be able to handle 64-bit
 *   integer numbers.
 */
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

char *USAGE = "before_file after_file";

int
main(int argc, char *argv[])
{
    FILE *fp1;      /* "before" file */
    FILE *fp2;      /* "after" file */
    char *fname1;
    char *fname2;
    int i;
    int c;
    int c2;
    int separator = 1;
    int separator2 = 1;
    unsigned int n1 = 0;
    unsigned int n2 = 0;
    long double d1 = 0.0;
    long double d2 = 0.0;
    long double delta = 0.0;
    long double p31;
    long double p32;
    long double p63;
    long double p64;

    /*
     * Check # of arguments.
     */
    if (argc != 3) {
        long double x;

        fprintf(stderr, "Usage: %s %s\n", argv[0], USAGE);
        printf("\n");
        printf(" Testing how many decimal digits can be handled ...\n");
        printf(" #digits #bits 1=pass, 0=fail\n");
        x = 999999999.0L - 999999998.0L;
        printf("%10s %6s %1.0Lf\n", "9", "~30", x);
        x = 999999999999999999.0L - 999999999999999998.0L;
        printf("%10s %6s %1.0Lf\n", "18", "~60", x);
        x = 999999999999999999999999999.0L - 999999999999999999999999998.0L;
        printf("%10s %6s %1.0Lf\n", "27", "~90", x);
        x = 999999999999999999999999999999.0L - 999999999999999999999999999998.0L;
        printf("%10s %6s %1.0Lf\n", "30", "~100", x);
        x = 999999999999999999999999999999999.0L - 999999999999999999999999999999998.0L;
        printf("%10s %6s %1.0Lf\n", "33", "~110", x);
        x = 999999999999999999999999999999999999.0L - 999999999999999999999999999999999998.0L;
        printf("%10s %6s %1.0Lf\n", "36", "~120", x);
        exit(1);
    }

    /*
     * Open files.
     */
    fname1 = argv[1];
    fname2 = argv[2];
    fp1 = fopen(fname1, "r");
    fp2 = fopen(fname2, "r");
    if (!fp1 || !fp2) {
        fprintf(stderr, "fp1 = %p fp2 = %p", (void *)fp1, (void *)fp2);
        perror("Could not open files");
        exit(2);
    }

    /*
     * Prepare for 32-bit and 64-bit overflow check.
     */
    for (p31 = 1.0, i = 0; i < 31; i++) {
        p31 *= 2.0;         /* 2^31 */
    }
    p32 = p31 * 2.0;        /* 2^32 */
    p63 = p31 * p32;        /* 2^63 */
    p64 = p63 * 2.0;        /* 2^64 */

    /*
     * Parse.
     */
    while ((c = getc(fp1)) != EOF) {
        if (c == ' ' || c == '\t' || c == ':' || c == '(' || c == '\n') {
            printf("%c", c);
            separator = 1;
        } else if (!isdigit(c)) {
            printf("%c", c);
            separator = 0;
        } else {
            if (separator == 0) {
                printf("%c", c);    /* this digit is a part of a word */
                continue;
            }
            n1 = c - '0';
            d1 = n1;
            while ((c = getc(fp1)) != EOF) {
                if (isdigit(c)) {
                    n1 = c - '0';
                    d1 = 10.0 * d1 + n1;
                } else {
                    break;
                }
            }
            /*
             * Find the counterpart in the "after" file.
             */
            while ((c2 = getc(fp2)) != EOF) {
                if (c2 == ' ' || c2 == '\t' || c2 == ':' || c2 == '(' || c2 == '\n') {
                    separator2 = 1;
                } else if (!isdigit(c2)) {
                    separator2 = 0;
                } else {
                    if (separator2 == 0) {
                        continue;
                    }
                    n2 = c2 - '0';
                    d2 = n2;
                    whi
[Openstack] launching multiple VMs takes very long time
if I launch one vm at a time it doesn't take very long to start up the instance, maybe a minute. if I launch 4 instances (of the same snapshot as before) it takes 30 minutes.

they are all launching to different compute nodes, and the controllers are all multicore. I don't see any processes on the compute nodes taking much cpu power; the controller has a keystone process mostly sucking up 1 core, and loads and loads of beam.smp processes from rabbitmq, but none of them are really taking any cpu time.

the glance image storage is on the controller node; the snapshots are 2-3G in size.

where should I start looking to find out why things are so slow?

thanks

s