Re: [ovirt-devel] XML benchmarks
----- Original Message -----
> From: "Francesco Romani"
> To: devel@ovirt.org
> Cc: "Nir Soffer", "Martin Sivak"
> Sent: Monday, June 30, 2014 12:14:34 PM
> Subject: Re: [ovirt-devel] XML benchmarks
>
> ----- Original Message -----
> > From: "Francesco Romani"
> > To: "Nir Soffer"
> > Cc: devel@ovirt.org
> > Sent: Monday, June 30, 2014 8:47:15 AM
> > Subject: Re: [ovirt-devel] XML benchmarks
> >
> > ----- Original Message -----
> > > From: "Nir Soffer"
> > > To: "Francesco Romani"
> > > Cc: devel@ovirt.org, "Martin Sivak"
> > > Sent: Sunday, June 29, 2014 10:34:08 AM
> > > Subject: Re: [ovirt-devel] XML benchmarks
> > >
> > > > CPU measurement: just opened a terminal and ran 'htop' in it.
> > > > CPU profile: clustered around the sampling interval. Usage negligible
> > > > most of the time, peak on sampling as shown below
> > > >
> > > > 300 VMs
> > > > minidom: ~38% CPU
> > > > cElementTree: ~5% CPU
> > >
> > > What is 38% - (38% of one core? how many cores are on the machine?)
> >
> > 4 cores: 2 physical, 2 logical. I'm prepping a more precise test
> > using a better and less ambiguous indicator.
>
> Here it is. Attached is an updated script (xmlbench2.py) which uses 'psutil'
> (https://pypi.python.org/pypi/psutil) to gather the samples.
>
> CPU usage is sampled every 500 ms (half a second). 100% is one core.
> My laptop reports 4 cores (dual core with hyperthreading).
>
> See the attached graphs for easier consumption, and their gnuplot recipe.
>
> cpu_300t_3m.png: load using the test script with 300 threads, each thread
> runs ~3 minutes
> cpu_500t_3m.png: load using the test script with 500 threads, each thread
> runs ~3 minutes
>
> Sampling is not really accurate, but it is more than enough to get an idea.

Nice!

_______________________________________________
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel
Re: [ovirt-devel] XML benchmarks
----- Original Message -----
> From: "Francesco Romani"
> To: "Nir Soffer"
> Cc: devel@ovirt.org
> Sent: Monday, June 30, 2014 8:47:15 AM
> Subject: Re: [ovirt-devel] XML benchmarks
>
> ----- Original Message -----
> > From: "Nir Soffer"
> > To: "Francesco Romani"
> > Cc: devel@ovirt.org, "Martin Sivak"
> > Sent: Sunday, June 29, 2014 10:34:08 AM
> > Subject: Re: [ovirt-devel] XML benchmarks
> >
> > > CPU measurement: just opened a terminal and ran 'htop' in it.
> > > CPU profile: clustered around the sampling interval. Usage negligible
> > > most of the time, peak on sampling as shown below
> > >
> > > 300 VMs
> > > minidom: ~38% CPU
> > > cElementTree: ~5% CPU
> >
> > What is 38% - (38% of one core? how many cores are on the machine?)
>
> 4 cores: 2 physical, 2 logical. I'm prepping a more precise test
> using a better and less ambiguous indicator.

Here it is. Attached is an updated script (xmlbench2.py) which uses 'psutil'
(https://pypi.python.org/pypi/psutil) to gather the samples.

CPU usage is sampled every 500 ms (half a second). 100% is one core.
My laptop reports 4 cores (dual core with hyperthreading).

See the attached graphs for easier consumption, and their gnuplot recipe.

cpu_300t_3m.png: load using the test script with 300 threads, each thread
runs ~3 minutes
cpu_500t_3m.png: load using the test script with 500 threads, each thread
runs ~3 minutes

Sampling is not really accurate, but it is more than enough to get an idea.
--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani

#!/usr/bin/env python

import sys
import threading
import time

import xml.dom.minidom
import xml.etree.cElementTree
import xml.etree.ElementTree

import psutil


def eprint(s):
    # diagnostics go to stderr, so stdout carries only the CSV samples
    sys.stderr.write('%s\n' % s)


class Worker(threading.Thread):
    def __init__(self, func, xml, delay, numruns):
        super(Worker, self).__init__()
        self.daemon = True
        self.func = func
        self.xml = xml
        self.delay = delay
        self.numruns = numruns

    def mustgo(self):
        # numruns=None means "run forever"; otherwise allow exactly
        # numruns iterations (the original check stopped one run early)
        if self.numruns is None:
            return True
        self.numruns -= 1
        return self.numruns >= 0

    def run(self):
        while self.mustgo():
            time.sleep(self.delay)
            self.func(self.xml)


PARSERS = {
    'md': xml.dom.minidom.parseString,
    'et': xml.etree.ElementTree.fromstring,
    'cet': xml.etree.cElementTree.fromstring
}


def runner(xml, mode, nthreads, delay, numruns):
    workers = []
    for i in range(nthreads):
        w = Worker(PARSERS[mode], xml, delay, numruns)
        w.start()
        workers.append(w)

    p = psutil.Process()
    p.cpu_percent()  # see psutil docs. Discard the first one
    samples = []
    ts = 0.0
    # sample this process' CPU usage every 500 ms until all workers exit
    while any(w.is_alive() for w in workers):
        time.sleep(0.5)
        ts += 0.5
        samples.append((ts, p.cpu_percent()))
    return samples


def _usage():
    eprint("usage: xmlbench xmlpath mode nthreads [delay [numruns]]")
    eprint("available modes: %s" % ' '.join(PARSERS.keys()))


def _main(args):
    if len(args) < 3:
        _usage()
        sys.exit(1)

    xmlpath = args[0]
    mode = args[1]
    nthreads = int(args[2])
    delay = int(args[3]) if len(args) > 3 else 15
    numruns = int(args[4]) if len(args) > 4 else None
    if mode not in PARSERS:
        _usage()
        sys.exit(2)

    with open(xmlpath, 'rt') as xml:
        samples = runner(xml.read(), mode, nthreads, delay, numruns)
    # one "timestamp,cpu_percent" pair per line, ready for gnuplot
    for (ts, value) in samples:
        print('%f,%f' % (ts, value))


if __name__ == "__main__":
    _main(sys.argv[1:])

plot.sh
Description: application/shellscript
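The xmlbench2.py script prints one "timestamp,cpu_percent" pair per line, with 100% meaning one fully busy core. A minimal sketch of how that output could be summarized follows. The function name `summarize`, the `ncores` argument and the "share of the whole machine" figure are assumptions added for illustration; they are not part of the attached script or of plot.sh.

```python
import csv
import io


def summarize(csv_text, ncores):
    """Summarize 'timestamp,cpu_percent' lines as printed by the script.

    cpu_percent follows the convention from the thread: 100.0 == one core.
    ncores is supplied by the caller (4 on the laptop described here).
    """
    rows = csv.reader(io.StringIO(csv_text))
    values = [float(row[1]) for row in rows if row]
    if not values:
        raise ValueError("no samples")
    peak = max(values)
    return {
        "samples": len(values),
        "mean": sum(values) / len(values),
        "peak": peak,
        # Share of the whole machine used at the peak, in percent.
        "peak_of_machine": peak / ncores,
    }


if __name__ == "__main__":
    # Made-up sample data, only to show the input format.
    demo = "0.500000,0.000000\n1.000000,40.000000\n1.500000,20.000000\n"
    print(summarize(demo, ncores=4))
```

Dividing by the core count answers the "38% of what?" question directly: the same peak can be read either per core or per machine.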
Re: [ovirt-devel] XML benchmarks
----- Original Message -----
> From: "Nir Soffer"
> To: "Francesco Romani"
> Cc: devel@ovirt.org, "Martin Sivak"
> Sent: Sunday, June 29, 2014 10:34:08 AM
> Subject: Re: [ovirt-devel] XML benchmarks
>
> > CPU measurement: just opened a terminal and ran 'htop' in it.
> > CPU profile: clustered around the sampling interval. Usage negligible
> > most of the time, peak on sampling as shown below
> >
> > 300 VMs
> > minidom: ~38% CPU
> > cElementTree: ~5% CPU
>
> What is 38% - (38% of one core? how many cores are on the machine?)

4 cores: 2 physical, 2 logical. I'm prepping a more precise test
using a better and less ambiguous indicator.

> Seeing this load created by parsing libvirt xml every 15 seconds, I think
> we should consider decreasing the sample rate suggested in
> http://gerrit.ovirt.org/28712, or collecting the data in another way.

As for collecting the data in another way, doing it right after the event
that triggers the change would probably be the best option.
As a last resort, we can move the code to cElementTree.

Bests,

--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani
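The parser gap discussed in this message can also be checked in isolation, without threads. Below is a rough sketch using `timeit`. The small XML string is a made-up stand-in for the libvirt domain XML, and `time_parser` is a hypothetical helper name. Note that on Python 3 `xml.etree.ElementTree` already uses the C accelerator that Python 2 exposed separately as `cElementTree`, so the sketch imports only the former.

```python
import timeit
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Made-up stand-in for a libvirt domain XML; the real one is much larger.
SAMPLE = "<domain><devices>%s</devices></domain>" % (
    "<disk><target dev='vda'/></disk>" * 50)


def time_parser(parse, xml_text, number=200):
    """Return the average seconds per parse over `number` runs."""
    return timeit.timeit(lambda: parse(xml_text), number=number) / number


if __name__ == "__main__":
    for name, parse in (("minidom", xml.dom.minidom.parseString),
                        ("ElementTree", ET.fromstring)):
        print("%-12s %.6f s/parse" % (name, time_parser(parse, SAMPLE)))
```

Absolute numbers depend on the machine and on the document; only the ratio between the two parsers is meaningful here.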
Re: [ovirt-devel] XML benchmarks
It's good to see us moving away from minidom. I do think, though, that there
is a place for abstracting out the common use cases, so that we are not tied
to one API and we do the optimal thing for most use cases.

----- Original Message -----
> From: "Francesco Romani"
> To: devel@ovirt.org
> Sent: Friday, June 27, 2014 3:30:14 PM
> Subject: [ovirt-devel] XML benchmarks
>
> Hi,
>
> Due to the recent discussion (http://gerrit.ovirt.org/#/c/28712/), and as
> part of the ongoing focus on scalability and performance
> (http://gerrit.ovirt.org/#/c/17694/ and many others),
>
> I took the chance to do a very quick and dirty benchmark to see how much it
> really costs to do XML processing in sampling threads (thanks to Nir for
> the kickstart!), and, in general, how much the XML processing costs.
>
> Please find attached the test script and the example XML
> (a real one made by VDSM master on my RHEL6.5 box).
>
> On my laptop:
>
> $ lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                4
> On-line CPU(s) list:   0-3
> Thread(s) per core:    2
> Core(s) per socket:    2
> Socket(s):             1
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 58
> Model name:            Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
> Stepping:              9
> CPU MHz:               1359.375
> CPU max MHz:           3600.
> CPU min MHz:           1200.
> BogoMIPS:              5786.91
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              4096K
> NUMA node0 CPU(s):     0-3
>
> 8 GiB of RAM, running GNOME desktop and the usual development stuff
>
> xmlbench.py linuxvm1.xml MODE 300
>
> MODE is either 'md' (minidom) or 'cet' (cElementTree).
> This will run $NUMTHREADS threads fast and loose without synchronization.
> We can actually have this behaviour if a customer just mass-starts VMs.
> In general I expect some clustering of the sampling activity, not a nice
> evenly interleaved time sequence.
>
> CPU measurement: just opened a terminal and ran 'htop' in it.
> CPU profile: clustered around the sampling interval. Usage negligible most
> of the time, peak on sampling as shown below
>
> 300 VMs
> minidom: ~38% CPU
> cElementTree: ~5% CPU
>
> 500 VMs
> minidom: ~48% CPU
> cElementTree: ~6% CPU
>
> 1000 VMs
> python thread error :)
>
> File "/usr/lib64/python2.7/threading.py", line 746, in start
>     _start_new_thread(self.__bootstrap, ())
> thread.error: can't start new thread
>
> I think this is another proof (if we need more of them) that
> * we _really need_ to move away from the 1 thread per VM model ->
>   http://gerrit.ovirt.org/#/c/29189/ and friends! Let's fire up the
>   discussion!
> * we should move to cElementTree anyway in the near future: faster
>   processing, scales better, nicer API.
>   It is also a pet peeve of mine, I do have some patches floating but we
>   still need some preparation work in the virt package.
>
> --
> Francesco Romani
> RedHat Engineering Virtualization R & D
> Phone: 8261328
> IRC: fromani
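One way to read the suggestion at the top of this message (abstract the common use cases so callers are not tied to a parser API) is a thin wrapper that answers domain-level questions. The sketch below is purely illustrative: the class name `DomainXML`, its methods and the element paths are assumptions, not existing VDSM code.

```python
import xml.etree.ElementTree as ET


class DomainXML(object):
    """Hypothetical wrapper: callers ask questions, not parser APIs."""

    def __init__(self, xml_text):
        # The parser choice is confined to this one line.
        self._root = ET.fromstring(xml_text)

    def disk_targets(self):
        """Return the 'dev' attribute of every disk target, in order."""
        return [t.get("dev")
                for t in self._root.findall("./devices/disk/target")]

    def text_of(self, path, default=None):
        """Return the text of the first element matching path."""
        return self._root.findtext(path, default)


if __name__ == "__main__":
    # Made-up document, only to show the calling convention.
    dom = DomainXML("<domain><name>vm1</name><devices>"
                    "<disk><target dev='vda'/></disk>"
                    "</devices></domain>")
    print(dom.text_of("name"), dom.disk_targets())
```

With this shape, swapping the underlying parser later touches only the constructor and the two accessors, not every caller.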
Re: [ovirt-devel] XML benchmarks
----- Original Message -----
> From: "Francesco Romani"
> To: devel@ovirt.org
> Sent: Friday, June 27, 2014 3:30:14 PM
> Subject: [ovirt-devel] XML benchmarks
>
> Hi,
>
> Due to the recent discussion (http://gerrit.ovirt.org/#/c/28712/), and as
> part of the ongoing focus on scalability and performance
> (http://gerrit.ovirt.org/#/c/17694/ and many others),
>
> I took the chance to do a very quick and dirty benchmark to see how much it
> really costs to do XML processing in sampling threads (thanks to Nir for
> the kickstart!), and, in general, how much the XML processing costs.
>
> Please find attached the test script and the example XML
> (a real one made by VDSM master on my RHEL6.5 box).
>
> On my laptop:
>
> $ lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                4
> On-line CPU(s) list:   0-3
> Thread(s) per core:    2
> Core(s) per socket:    2
> Socket(s):             1
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 58
> Model name:            Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
> Stepping:              9
> CPU MHz:               1359.375
> CPU max MHz:           3600.
> CPU min MHz:           1200.
> BogoMIPS:              5786.91
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              4096K
> NUMA node0 CPU(s):     0-3
>
> 8 GiB of RAM, running GNOME desktop and the usual development stuff
>
> xmlbench.py linuxvm1.xml MODE 300
>
> MODE is either 'md' (minidom) or 'cet' (cElementTree).
> This will run $NUMTHREADS threads fast and loose without synchronization.
> We can actually have this behaviour if a customer just mass-starts VMs.
> In general I expect some clustering of the sampling activity, not a nice
> evenly interleaved time sequence.
>
> CPU measurement: just opened a terminal and ran 'htop' in it.
> CPU profile: clustered around the sampling interval. Usage negligible most
> of the time, peak on sampling as shown below
>
> 300 VMs
> minidom: ~38% CPU
> cElementTree: ~5% CPU

What is 38% - (38% of one core? how many cores are on the machine?)

>
> 500 VMs
> minidom: ~48% CPU
> cElementTree: ~6% CPU
>
> 1000 VMs
> python thread error :)
>
> File "/usr/lib64/python2.7/threading.py", line 746, in start
>     _start_new_thread(self.__bootstrap, ())
> thread.error: can't start new thread
>
> I think this is another proof (if we need more of them) that
> * we _really need_ to move away from the 1 thread per VM model ->
>   http://gerrit.ovirt.org/#/c/29189/ and friends! Let's fire up the
>   discussion!
> * we should move to cElementTree anyway in the near future: faster
>   processing, scales better, nicer API.
>   It is also a pet peeve of mine, I do have some patches floating but we
>   still need some preparation work in the virt package.

Seeing this load created by parsing libvirt xml every 15 seconds, I think
we should consider decreasing the sample rate suggested in
http://gerrit.ovirt.org/28712, or collecting the data in another way.

Nir
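"Collecting the data in another way" could mean parsing only when something actually changed, instead of on every 15-second tick. The following is a minimal sketch of that idea under stated assumptions: `CachedDomainInfo`, `fetch_xml` and `invalidate` are hypothetical names, and how the change event is delivered is left out entirely. It is not what the patch at http://gerrit.ovirt.org/28712 does.

```python
import threading
import xml.etree.ElementTree as ET


class CachedDomainInfo(object):
    """Hypothetical cache: parse on demand, only after a change event."""

    def __init__(self, fetch_xml):
        self._fetch_xml = fetch_xml   # callable returning the XML string
        self._lock = threading.Lock()
        self._root = None
        self.parse_count = 0          # exposed only to make the demo visible

    def invalidate(self):
        """Call from the event handler when the domain changes."""
        with self._lock:
            self._root = None

    def root(self):
        """Return the parsed tree, re-parsing only if invalidated."""
        with self._lock:
            if self._root is None:
                self._root = ET.fromstring(self._fetch_xml())
                self.parse_count += 1
            return self._root


if __name__ == "__main__":
    cache = CachedDomainInfo(lambda: "<domain><name>vm1</name></domain>")
    for _ in range(10):               # ten sampling ticks, one parse
        cache.root()
    print(cache.parse_count)
```

The sampling thread keeps its cadence, but the expensive parse happens once per change rather than once per tick.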
[ovirt-devel] XML benchmarks
Hi,

Due to the recent discussion (http://gerrit.ovirt.org/#/c/28712/), and as part
of the ongoing focus on scalability and performance
(http://gerrit.ovirt.org/#/c/17694/ and many others),

I took the chance to do a very quick and dirty benchmark to see how much it
really costs to do XML processing in sampling threads (thanks to Nir for the
kickstart!), and, in general, how much the XML processing costs.

Please find attached the test script and the example XML
(a real one made by VDSM master on my RHEL6.5 box).

On my laptop:

$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Model name:            Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
Stepping:              9
CPU MHz:               1359.375
CPU max MHz:           3600.
CPU min MHz:           1200.
BogoMIPS:              5786.91
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K
NUMA node0 CPU(s):     0-3

8 GiB of RAM, running GNOME desktop and the usual development stuff

xmlbench.py linuxvm1.xml MODE 300

MODE is either 'md' (minidom) or 'cet' (cElementTree).
This will run $NUMTHREADS threads fast and loose without synchronization.
We can actually have this behaviour if a customer just mass-starts VMs.
In general I expect some clustering of the sampling activity, not a nice
evenly interleaved time sequence.

CPU measurement: just opened a terminal and ran 'htop' in it.
CPU profile: clustered around the sampling interval. Usage negligible most of
the time, peak on sampling as shown below

300 VMs
minidom: ~38% CPU
cElementTree: ~5% CPU

500 VMs
minidom: ~48% CPU
cElementTree: ~6% CPU

1000 VMs
python thread error :)

File "/usr/lib64/python2.7/threading.py", line 746, in start
    _start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread

I think this is another proof (if we need more of them) that
* we _really need_ to move away from the 1 thread per VM model ->
  http://gerrit.ovirt.org/#/c/29189/ and friends! Let's fire up the
  discussion!
* we should move to cElementTree anyway in the near future: faster
  processing, scales better, nicer API.
  It is also a pet peeve of mine, I do have some patches floating but we
  still need some preparation work in the virt package.

--
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani

#!/usr/bin/env python

import sys
import threading
import time

#import lxml.etree
import xml.dom.minidom
import xml.etree.cElementTree
import xml.etree.ElementTree


class Worker(threading.Thread):
    def __init__(self, func, xml, delay, numruns):
        super(Worker, self).__init__()
        self.daemon = True
        self.func = func
        self.xml = xml
        self.delay = delay
        self.numruns = numruns

    def mustgo(self):
        # numruns=None means "run forever"; otherwise allow exactly
        # numruns iterations (the original check stopped one run early)
        if self.numruns is None:
            return True
        self.numruns -= 1
        return self.numruns >= 0

    def run(self):
        print('%s delay=%i starting!' % (self.name, self.delay))
        while self.mustgo():
            time.sleep(self.delay)
            print('%s go' % (self.name))
            self.func(self.xml)
            print('%s done!' % (self.name))


PARSERS = {
    'md': xml.dom.minidom.parseString,
    #'lx': lxml.etree.fromstring,
    'et': xml.etree.ElementTree.fromstring,
    'cet': xml.etree.cElementTree.fromstring
}


def runner(xml, mode, nthreads, delay, numruns):
    workers = []
    for i in range(nthreads):
        w = Worker(PARSERS[mode], xml, delay, numruns)
        w.start()
        workers.append(w)

    if numruns is None:
        # workers are daemon threads: keep the main thread alive forever
        while True:
            time.sleep(1.0)
    else:
        for w in workers:
            w.join()


def _usage():
    print("usage: xmlbench xmlpath mode nthreads [delay [numruns]]")
    print("available modes: %s" % ' '.join(PARSERS.keys()))


def _main(args):
    if len(args) < 3:
        _usage()
        sys.exit(1)

    xmlpath = args[0]
    mode = args[1]
    nthreads = int(args[2])
    delay = int(args[3]) if len(args) > 3 else 15
    # convert to int: the worker does arithmetic on numruns
    numruns = int(args[4]) if len(args) > 4 else None
    if mode not in PARSERS:
        _usage()
        sys.exit(2)

    with open(xmlpath, 'rt') as xml:
        runner(xml.read(), mode, nthreads, delay, numruns)


if __name__ == "__main__":
    _main(sys.argv[1:])

linuxvm1.xml
Description: XML document
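The "can't start new thread" failure at 1000 VMs comes from starting one thread per VM. A bounded pool avoids it, because the thread count no longer grows with the number of VMs. The sketch below only illustrates that general idea with `concurrent.futures` from the Python 3 standard library. The function `sample_all` and its arguments are hypothetical, and it is not the design proposed in http://gerrit.ovirt.org/#/c/29189/.

```python
import threading
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def sample_all(xml_by_vm, max_workers=4):
    """Parse every VM's XML using at most max_workers threads.

    Returns (results, threads_used): results maps VM id to the root tag,
    threads_used is the set of worker thread names that did the parsing.
    """
    threads_used = set()
    lock = threading.Lock()

    def parse_one(item):
        vm_id, xml_text = item
        with lock:
            threads_used.add(threading.current_thread().name)
        return vm_id, ET.fromstring(xml_text).tag

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(pool.map(parse_one, xml_by_vm.items()))
    return results, threads_used


if __name__ == "__main__":
    # 1000 made-up "VMs" handled by 4 threads instead of 1000.
    docs = dict((i, "<domain/>") for i in range(1000))
    results, used = sample_all(docs, max_workers=4)
    print(len(results), len(used))
```

The trade-off is latency: with a small pool the samples are taken one after another rather than all at once, which also spreads out the CPU peak seen in the measurements.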