Re: [ovirt-devel] XML benchmarks

2014-06-30 Thread Nir Soffer
- Original Message -
> From: "Francesco Romani" 
> To: devel@ovirt.org
> Cc: "Nir Soffer" , "Martin Sivak" 
> Sent: Monday, June 30, 2014 12:14:34 PM
> Subject: Re: [ovirt-devel] XML benchmarks
> 
> - Original Message -
> > From: "Francesco Romani" 
> > To: "Nir Soffer" 
> > Cc: devel@ovirt.org
> > Sent: Monday, June 30, 2014 8:47:15 AM
> > Subject: Re: [ovirt-devel] XML benchmarks
> > 
> > - Original Message -
> > > From: "Nir Soffer" 
> > > To: "Francesco Romani" 
> > > Cc: devel@ovirt.org, "Martin Sivak" 
> > > Sent: Sunday, June 29, 2014 10:34:08 AM
> > > Subject: Re: [ovirt-devel] XML benchmarks
> > 
> > > > CPU measurement: just opened a terminal and ran 'htop' on it.
> > > > CPU profile: clustered around the sampling interval. Usage negligible
> > > > most of the time, peak on sampling as shown below
> > > > 
> > > > 300 VMs
> > > > minidom: ~38% CPU
> > > > cElementTree: ~5% CPU
> > > 
> > > > What is 38%? (38% of one core? How many cores are on the machine?)
> > 
> > 4 cores: 2 physical, 2 logical. I'm prepping a more precise test
> > using a better and less ambiguous indicator.
> 
> Here it is. Attached is an updated script (xmlbench2.py) which uses 'psutil'
> (https://pypi.python.org/pypi/psutil) to gather the samples.
> 
> CPU is sampled every 500 ms (half a second). 100% is one core.
> My laptop reports 4 cores (dual-core with hyperthreading).
> 
> See the attached graphs for easier consumption, and their gnuplot recipe.
> 
> cpu_300t_3m.png: load using the test script with 300 threads, each thread
> runs ~3 minutes
> cpu_500t_3m.png: load using the test script with 500 threads, each thread
> runs ~3 minutes
> 
> sampling is not really accurate but it is more than enough to get an idea.

Nice!
___
Devel mailing list
Devel@ovirt.org
http://lists.ovirt.org/mailman/listinfo/devel


Re: [ovirt-devel] XML benchmarks

2014-06-30 Thread Francesco Romani
- Original Message -
> From: "Francesco Romani" 
> To: "Nir Soffer" 
> Cc: devel@ovirt.org
> Sent: Monday, June 30, 2014 8:47:15 AM
> Subject: Re: [ovirt-devel] XML benchmarks
> 
> - Original Message -
> > From: "Nir Soffer" 
> > To: "Francesco Romani" 
> > Cc: devel@ovirt.org, "Martin Sivak" 
> > Sent: Sunday, June 29, 2014 10:34:08 AM
> > Subject: Re: [ovirt-devel] XML benchmarks
> 
> > > CPU measurement: just opened a terminal and ran 'htop' on it.
> > > CPU profile: clustered around the sampling interval. Usage negligible
> > > most of the time, peak on sampling as shown below
> > > 
> > > 300 VMs
> > > minidom: ~38% CPU
> > > cElementTree: ~5% CPU
> > 
> > What is 38%? (38% of one core? How many cores are on the machine?)
> 
> 4 cores: 2 physical, 2 logical. I'm prepping a more precise test
> using a better and less ambiguous indicator.

Here it is. Attached is an updated script (xmlbench2.py) which uses 'psutil'
(https://pypi.python.org/pypi/psutil) to gather the samples.

CPU is sampled every 500 ms (half a second). 100% is one core.
My laptop reports 4 cores (dual-core with hyperthreading).

See the attached graphs for easier consumption, and their gnuplot recipe.

cpu_300t_3m.png: load using the test script with 300 threads, each thread runs
~3 minutes
cpu_500t_3m.png: load using the test script with 500 threads, each thread runs
~3 minutes

Sampling is not really accurate, but it is more than enough to get an idea.
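
For reading the numbers without the graphs, here is a minimal sketch (not part
of xmlbench2.py) that summarizes the 'ts,value' lines the script prints. The
helper name `summarize`, the `ncpus=4` default and the sample values are made
up for illustration; only the output format and the "100% is one core"
convention come from the script.

```python
import csv
import io


def summarize(csv_text, ncpus=4):
    """Return peak and mean CPU percent from 'ts,value' lines.

    Values follow the script's convention: 100.0 means one full core.
    ncpus is an assumption (4 logical CPUs, as on the laptop above) and is
    only used to express the peak as a share of the whole machine.
    """
    rows = csv.reader(io.StringIO(csv_text))
    values = [float(row[1]) for row in rows if row]
    peak = max(values)
    mean = sum(values) / len(values)
    return {
        'peak': peak,
        'mean': mean,
        'peak_machine_share': peak / (100.0 * ncpus),
    }


# Hypothetical samples, not measured data.
example = "0.500000,0.000000\n1.000000,40.000000\n1.500000,2.000000\n"
stats = summarize(example)
print(stats)
```

Expressing the peak as a share of the whole machine avoids the "percent of
what?" ambiguity of a bare htop reading.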

-- 
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani
#!/usr/bin/env python

import sys
import threading
import time
import xml.dom.minidom
import xml.etree.cElementTree
import xml.etree.ElementTree

import psutil


def eprint(s):
    sys.stderr.write('%s\n' % s)


class Worker(threading.Thread):
    def __init__(self, func, xml, delay, numruns):
        super(Worker, self).__init__()
        self.daemon = True
        self.func = func
        self.xml = xml
        self.delay = delay
        self.numruns = numruns

    def mustgo(self):
        if self.numruns is not None:
            self.numruns -= 1
            if self.numruns <= 0:
                return False
        return True

    def run(self):
        while self.mustgo():
            time.sleep(self.delay)
            self.func(self.xml)


PARSERS = {
    'md': xml.dom.minidom.parseString,
    'et': xml.etree.ElementTree.fromstring,
    'cet': xml.etree.cElementTree.fromstring
}


def runner(xml, mode, nthreads, delay, numruns):
    workers = []
    for i in range(nthreads):
        w = Worker(PARSERS[mode], xml, delay, numruns)
        w.start()
        workers.append(w)

    p = psutil.Process()
    p.cpu_percent()  # see psutil docs. Discard the first one
    samples = []

    ts = 0.0
    while any(w.is_alive() for w in workers):
        time.sleep(0.5)
        ts += 0.5
        samples.append((ts, p.cpu_percent()))

    return samples


def _usage():
    eprint("usage: xmlbench xmlpath mode nthreads [delay [numruns]]")
    eprint("available modes: %s" % ' '.join(PARSERS.keys()))


def _main(args):
    if len(args) < 3:
        _usage()
        sys.exit(1)
    else:
        xmlpath = args[0]
        mode = args[1]
        nthreads = int(args[2])
        delay = int(args[3]) if len(args) > 3 else 15
        numruns = int(args[4]) if len(args) > 4 else None
        if mode not in PARSERS:
            _usage()
            sys.exit(2)
        with open(xmlpath, 'rt') as xml:
            samples = runner(xml.read(), mode, nthreads, delay, numruns)
        for (ts, value) in samples:
            print '%f,%f' % (ts, value)


if __name__ == "__main__":
    _main(sys.argv[1:])


plot.sh
Description: application/shellscript

Re: [ovirt-devel] XML benchmarks

2014-06-29 Thread Francesco Romani
- Original Message -
> From: "Nir Soffer" 
> To: "Francesco Romani" 
> Cc: devel@ovirt.org, "Martin Sivak" 
> Sent: Sunday, June 29, 2014 10:34:08 AM
> Subject: Re: [ovirt-devel] XML benchmarks

> > CPU measurement: just opened a terminal and ran 'htop' on it.
> > CPU profile: clustered around the sampling interval. Usage negligible most
> > of the time, peak on sampling as shown below
> > 
> > 300 VMs
> > minidom: ~38% CPU
> > cElementTree: ~5% CPU
> 
> What is 38%? (38% of one core? How many cores are on the machine?)

4 cores: 2 physical, 2 logical. I'm prepping a more precise test
using a better and less ambiguous indicator.
 

> Seeing this load created by parsing libvirt XML every 15 seconds, I think
> we should consider decreasing the sample rate suggested in
> http://gerrit.ovirt.org/28712, or collecting the data in another way.

Collecting the data in another way is probably best, ideally right after the
event that triggers the change. As a last resort, we can move the code to
cElementTree.
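
To get a feel for the parser difference outside the threaded benchmark, here
is a minimal timing sketch. It is an illustration only: the tiny XML, the
function names and the repeat count are made up, and it uses
xml.etree.ElementTree rather than cElementTree, because on Python 3 ElementTree
picks up the C accelerator by itself and the cElementTree module was removed
in Python 3.9.

```python
import timeit
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Tiny stand-in for a libvirt domain XML; a real one is much larger.
DOMXML = """<domain type='kvm'>
  <name>linuxvm1</name>
  <devices>
    <disk type='file' device='disk'><target dev='vda' bus='virtio'/></disk>
    <disk type='file' device='cdrom'><target dev='hdc' bus='ide'/></disk>
  </devices>
</domain>"""


def targets_minidom(text):
    # Parse with minidom and collect the 'dev' attribute of every <target>.
    dom = xml.dom.minidom.parseString(text)
    return [t.getAttribute('dev') for t in dom.getElementsByTagName('target')]


def targets_etree(text):
    # Same extraction using the ElementTree API.
    root = ET.fromstring(text)
    return [t.get('dev') for t in root.iter('target')]


def bench(func, number=200):
    # Total seconds spent in `number` parse-and-extract calls.
    return timeit.timeit(lambda: func(DOMXML), number=number)


md_time = bench(targets_minidom)
et_time = bench(targets_etree)
print('minidom: %.4fs  ElementTree: %.4fs' % (md_time, et_time))
```

The absolute numbers depend on the machine and on the document size, so this
is only useful as a relative comparison on the same box.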

Best,

-- 
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani


Re: [ovirt-devel] XML benchmarks

2014-06-29 Thread Saggi Mizrahi
It's good to see us moving away from minidom.
I do think, though, that there is a place for abstracting out
common use cases, so that we are not tied to one API and
do the optimal thing for most use cases.
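
A minimal sketch of what such an abstraction could look like, assuming the
common use cases are things like "get the VM name" or "list an attribute of
every device of a given kind". The class `DomainXML` and its methods are
hypothetical names for illustration, not existing VDSM code; callers would see
only these methods, so the parser behind them could be swapped later.

```python
import xml.etree.ElementTree as ET


class DomainXML(object):
    """Hypothetical wrapper hiding the XML library from callers."""

    def __init__(self, text):
        # The only place that knows which parser is in use.
        self._root = ET.fromstring(text)

    def name(self):
        return self._root.findtext('name')

    def device_attrs(self, kind, child, attr):
        """Values of `attr` on <child> elements under every <devices>/<kind>."""
        path = './devices/%s/%s' % (kind, child)
        return [e.get(attr) for e in self._root.findall(path)]


# Made-up sample document for illustration.
SAMPLE = """<domain type='kvm'>
  <name>linuxvm1</name>
  <devices>
    <disk type='file'><target dev='vda' bus='virtio'/></disk>
    <interface type='bridge'><target dev='vnet0'/></interface>
  </devices>
</domain>"""

dom = DomainXML(SAMPLE)
print(dom.name(), dom.device_attrs('disk', 'target', 'dev'))
```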

- Original Message -
> From: "Francesco Romani" 
> To: devel@ovirt.org
> Sent: Friday, June 27, 2014 3:30:14 PM
> Subject: [ovirt-devel] XML benchmarks
> 
> Hi,
> 
> Due to the recent discussion (http://gerrit.ovirt.org/#/c/28712/), and as
> part of the ongoing focus on scalability and performance
> (http://gerrit.ovirt.org/#/c/17694/ and many others),
> 
> I took the chance to do a very quick and dirty benchmark to see how much it
> really costs to do XML processing in sampling threads (thanks to Nir for the
> kickstart!), and, in general, how much XML processing costs.
> 
> Please find attached the test script and the example XML
> (real one made by VDSM master on my RHEL6.5 box).
> 
> On my laptop:
> 
> $ lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                4
> On-line CPU(s) list:   0-3
> Thread(s) per core:    2
> Core(s) per socket:    2
> Socket(s):             1
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 58
> Model name:            Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
> Stepping:              9
> CPU MHz:               1359.375
> CPU max MHz:           3600.
> CPU min MHz:           1200.
> BogoMIPS:              5786.91
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              4096K
> NUMA node0 CPU(s):     0-3
> 
> 8 GiB of RAM, running GNOME desktop and the usual development stuff
> 
> xmlbench.py linuxvm1.xml MODE 300
> 
> MODE is either 'md' (minidom) or 'cet' (cElementTree).
> This will run $NUMTHREADS threads fast and loose without synchronization.
> We can actually have this behaviour if a customer just mass-starts VMs.
> In general I expect some clustering of the sampling activity, not a nice
> evenly interleaved time sequence.
> 
> CPU measurement: just opened a terminal and ran 'htop' on it.
> CPU profile: clustered around the sampling interval. Usage negligible most of
> the time, peak on sampling as shown below
> 
> 300 VMs
> minidom: ~38% CPU
> cElementTree: ~5% CPU
> 
> 500 VMs
> minidom: ~48% CPU
> cElementTree: ~6% CPU
> 
> 1000 VMs
> python thread error :)
> 
>   File "/usr/lib64/python2.7/threading.py", line 746, in start
>     _start_new_thread(self.__bootstrap, ())
> thread.error: can't start new thread
> 
> 
> I think this is further proof (if we need more) that
> * we _really need_ to move away from the 1 thread per VM model ->
>   http://gerrit.ovirt.org/#/c/29189/ and friends! Let's fire up the
>   discussion!
> * we should move to cElementTree anyway in the near future: faster
>   processing, scales better, nicer API.
>   It is also a pet peeve of mine. I do have some patches floating around, but
>   we still need some preparation work in the virt package.
> 
> 
> --
> Francesco Romani
> RedHat Engineering Virtualization R & D
> Phone: 8261328
> IRC: fromani
> 


Re: [ovirt-devel] XML benchmarks

2014-06-29 Thread Nir Soffer
- Original Message -
> From: "Francesco Romani" 
> To: devel@ovirt.org
> Sent: Friday, June 27, 2014 3:30:14 PM
> Subject: [ovirt-devel] XML benchmarks
> 
> Hi,
> 
> Due to the recent discussion (http://gerrit.ovirt.org/#/c/28712/), and as
> part of the ongoing focus on scalability and performance
> (http://gerrit.ovirt.org/#/c/17694/ and many others),
> 
> I took the chance to do a very quick and dirty benchmark to see how much it
> really costs to do XML processing in sampling threads (thanks to Nir for the
> kickstart!), and, in general, how much XML processing costs.
> 
> Please find attached the test script and the example XML
> (real one made by VDSM master on my RHEL6.5 box).
> 
> On my laptop:
> 
> $ lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                4
> On-line CPU(s) list:   0-3
> Thread(s) per core:    2
> Core(s) per socket:    2
> Socket(s):             1
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 58
> Model name:            Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
> Stepping:              9
> CPU MHz:               1359.375
> CPU max MHz:           3600.
> CPU min MHz:           1200.
> BogoMIPS:              5786.91
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              4096K
> NUMA node0 CPU(s):     0-3
> 
> 8 GiB of RAM, running GNOME desktop and the usual development stuff
> 
> xmlbench.py linuxvm1.xml MODE 300
> 
> MODE is either 'md' (minidom) or 'cet' (cElementTree).
> This will run $NUMTHREADS threads fast and loose without synchronization.
> We can actually have this behaviour if a customer just mass-starts VMs.
> In general I expect some clustering of the sampling activity, not a nice
> evenly interleaved time sequence.
> 
> CPU measurement: just opened a terminal and ran 'htop' on it.
> CPU profile: clustered around the sampling interval. Usage negligible most of
> the time, peak on sampling as shown below
> 
> 300 VMs
> minidom: ~38% CPU
> cElementTree: ~5% CPU

What is 38%? (38% of one core? How many cores are on the machine?)

> 
> 500 VMs
> minidom: ~48% CPU
> cElementTree: ~6% CPU
> 
> 1000 VMs
> python thread error :)
> 
>   File "/usr/lib64/python2.7/threading.py", line 746, in start
>     _start_new_thread(self.__bootstrap, ())
> thread.error: can't start new thread
> 
> 
> I think this is further proof (if we need more) that
> * we _really need_ to move away from the 1 thread per VM model ->
>   http://gerrit.ovirt.org/#/c/29189/ and friends! Let's fire up the
>   discussion!
> * we should move to cElementTree anyway in the near future: faster
>   processing, scales better, nicer API.
>   It is also a pet peeve of mine. I do have some patches floating around, but
>   we still need some preparation work in the virt package.

Seeing this load created by parsing libvirt XML every 15 seconds, I think
we should consider decreasing the sample rate suggested in
http://gerrit.ovirt.org/28712, or collecting the data in another way.
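
As a back-of-the-envelope illustration of what the sample rate means in parse
volume, here is a small sketch. It assumes one XML parse per VM per interval,
as in the benchmark script; the 30 and 60 second intervals are hypothetical
alternatives, not values proposed in the patch.

```python
def parses_per_second(n_vms, interval_s):
    """Average parse rate if each VM's XML is parsed once per interval."""
    return float(n_vms) / interval_s


for interval_s in (15, 30, 60):
    print('300 VMs, every %2ds: %.1f parses/s'
          % (interval_s, parses_per_second(300, interval_s)))
```

This is only the average; with unsynchronized threads the parses tend to
cluster, so the instantaneous rate at the peaks is higher.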

Nir


[ovirt-devel] XML benchmarks

2014-06-27 Thread Francesco Romani
Hi,

Due to the recent discussion (http://gerrit.ovirt.org/#/c/28712/), and as part
of the ongoing focus on scalability and performance
(http://gerrit.ovirt.org/#/c/17694/ and many others),

I took the chance to do a very quick and dirty benchmark to see how much it
really costs to do XML processing in sampling threads (thanks to Nir for the
kickstart!), and, in general, how much XML processing costs.

Please find attached the test script and the example XML
(real one made by VDSM master on my RHEL6.5 box).

On my laptop:

$ lscpu 
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Model name:            Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
Stepping:              9
CPU MHz:               1359.375
CPU max MHz:           3600.
CPU min MHz:           1200.
BogoMIPS:              5786.91
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K
NUMA node0 CPU(s):     0-3

8 GiB of RAM, running GNOME desktop and the usual development stuff

xmlbench.py linuxvm1.xml MODE 300

MODE is either 'md' (minidom) or 'cet' (cElementTree).
This will run $NUMTHREADS threads fast and loose without synchronization.
We can actually have this behaviour if a customer just mass-starts VMs.
In general I expect some clustering of the sampling activity, not a nice evenly
interleaved time sequence.

CPU measurement: just opened a terminal and ran 'htop' on it.
CPU profile: clustered around the sampling interval. Usage negligible most of
the time, peak on sampling as shown below

300 VMs
minidom: ~38% CPU
cElementTree: ~5% CPU

500 VMs
minidom: ~48% CPU
cElementTree: ~6% CPU

1000 VMs
python thread error :)

  File "/usr/lib64/python2.7/threading.py", line 746, in start
    _start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread


I think this is further proof (if we need more) that
* we _really need_ to move away from the 1 thread per VM model ->
  http://gerrit.ovirt.org/#/c/29189/ and friends! Let's fire up the discussion!
* we should move to cElementTree anyway in the near future: faster processing,
  scales better, nicer API.
  It is also a pet peeve of mine. I do have some patches floating around, but
  we still need some preparation work in the virt package.
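
As a rough illustration of what moving away from one thread per VM could look
like (only a sketch, not the design proposed in the gerrit change above), a
bounded pool of workers can parse the XML for any number of VMs. The name
`sample_all`, the `max_workers=8` default and the toy XML are made up for
illustration; `concurrent.futures` is in the standard library on Python 3,
while Python 2.7 would need a backport.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def sample_all(xml_by_vm, parse=ET.fromstring, max_workers=8):
    """Parse every VM's XML once using a bounded pool of threads.

    Returns a dict mapping each VM id to its parsed root element.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input values.
        roots = pool.map(parse, xml_by_vm.values())
        return dict(zip(xml_by_vm.keys(), roots))


# 1000 hypothetical VMs, each with a tiny stand-in XML document.
vms = dict(('vm%d' % i, '<domain><name>vm%d</name></domain>' % i)
           for i in range(1000))
result = sample_all(vms)
print(len(result), result['vm42'].findtext('name'))
```

The number of threads stays at `max_workers` regardless of how many VMs there
are, so the thread limit hit in the 1000 VMs run above does not come into
play.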


-- 
Francesco Romani
RedHat Engineering Virtualization R & D
Phone: 8261328
IRC: fromani
#!/usr/bin/env python

import sys
import threading
import time
#import lxml.etree
import xml.dom.minidom
import xml.etree.cElementTree
import xml.etree.ElementTree


class Worker(threading.Thread):
    def __init__(self, func, xml, delay, numruns):
        super(Worker, self).__init__()
        self.daemon = True
        self.func = func
        self.xml = xml
        self.delay = delay
        self.numruns = numruns

    def mustgo(self):
        if self.numruns is not None:
            self.numruns -= 1
            if self.numruns <= 0:
                return False
        return True

    def run(self):
        print '%s delay=%i starting!' %(self.name, self.delay)
        while self.mustgo():
            time.sleep(self.delay)
            print '%s go' %(self.name)
            self.func(self.xml)
        print '%s done!' %(self.name)


PARSERS = {
    'md': xml.dom.minidom.parseString,
    #'lx': lxml.etree.fromstring,
    'et': xml.etree.ElementTree.fromstring,
    'cet': xml.etree.cElementTree.fromstring
}


def runner(xml, mode, nthreads, delay, numruns):
    workers = []
    for i in range(nthreads):
        w = Worker(PARSERS[mode], xml, delay, numruns)
        w.start()
        workers.append(w)

    if numruns is None:
        while True:
            time.sleep(1.0)
    else:
        for w in workers:
            w.join()


def _usage():
    print "usage: xmlbench xmlpath mode nthreads [delay [numruns]]"
    print "available modes: %s" % ' '.join(PARSERS.keys())


def _main(args):
    if len(args) < 3:
        _usage()
        sys.exit(1)
    else:
        xmlpath = args[0]
        mode = args[1]
        nthreads = int(args[2])
        delay = int(args[3]) if len(args) > 3 else 15
        # numruns must be an int: Worker.mustgo() decrements it
        numruns = int(args[4]) if len(args) > 4 else None
        if mode not in PARSERS:
            _usage()
            sys.exit(2)
        with open(xmlpath, 'rt') as xml:
            runner(xml.read(), mode, nthreads, delay, numruns)


if __name__ == "__main__":
    _main(sys.argv[1:])


linuxvm1.xml
Description: XML document