Re: Multiprocessing performance question

2019-02-21 Thread DL Neil

George: apologies for mis-identifying you as the OP.

Israel:

On 22/02/19 6:04 AM, Israel Brewster wrote:
Actually not a ’toy example’ at all. It is simply the first step in 
gridding some data I am working with - a problem that is solved by tools 
like SatPy, but unfortunately I can’t use SatPy because it doesn’t 
recognize my file format, and you can’t load data directly. Writing a 
custom file importer for SatPy is probably my next step.


Not to focus on the word "toy": the governing issue is the setup cost
compared with the acceleration afforded by the parallel processing. In
this case the former was more-or-less as high as the latter, and your
efforts were insufficiently rewarded.


That said, if the computer were concurrently performing this task and a
number of others, the number of cores available to you would decrease -
at which point speeds start heading backwards!
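
(A small mitigation, if the machine is being shared: cap the pool
explicitly rather than letting it claim every core - a sketch only:)

import os
import multiprocessing

# Pool() with no argument takes os.cpu_count() workers; on a shared
# machine it may be kinder to leave a core or two free.
workers = max(1, (os.cpu_count() or 1) - 2)
pool = multiprocessing.Pool(processes=workers)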


This is largely speculation because only you know the task, objectives, 
and circumstances - however, for those 'playing along at home' and 
learning from your experiment...



That said, the entire process took around 60 seconds to run. As this 
step was taking 10, I figured it would be low-hanging fruit for speeding 
up the process. Obviously I was wrong. For what it’s worth, I did manage 
to re-factor the code, so instead of generating the entire grid 
up-front, I generate the boxes as needed to calculate the overlap with 
the data grid. This brought the processing time down to around 40 
seconds, so a definite improvement there.


Doing it on-demand. Now you're talking! Plus, if you're able to 'fit' 
the data into each box as it is created, that will help justify the 
setup/tear-down overhead cost for each async process.


Well done!
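
For those 'playing along at home', a rough sketch of that idea - do the
per-box work inside each worker and hand back only small numeric results,
so no shapely objects have to be pickled on the return trip (per_box_work()
is just a stand-in for the real overlap calculation):

import itertools
import multiprocessing
from shapely import geometry

def per_box_work(box):
    # stand-in for the real overlap/analysis step
    return box.area

def process_chunk(chunk):
    # Build each box locally, inside the worker, and return only the
    # small numeric results - not the geometries themselves.
    return [per_box_work(geometry.box(x - 1, y, x, y - 1))
            for x, y in chunk]

if __name__ == '__main__':
    xy = list(itertools.product(range(1, 1001), range(1, 801)))
    chunks = [xy[i:i + 80000] for i in range(0, len(xy), 80000)]
    with multiprocessing.Pool() as pool:
        results = list(itertools.chain(*pool.map(process_chunk, chunks)))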




---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

On Feb 20, 2019, at 4:30 PM, DL Neil wrote:


George

On 21/02/19 1:15 PM, george trojan wrote:

def create_box(x_y):
    return geometry.box(x_y[0] - 1, x_y[1], x_y[0], x_y[1] - 1)
x_range = range(1, 1001)
y_range = range(1, 801)
x_y_range = list(itertools.product(x_range, y_range))
grid = list(map(create_box, x_y_range))
Which creates and populates an 800x1000 “grid” (represented as a flat
list at this point) of “boxes”, where a box is a shapely.geometry.box().
This takes about 10 seconds to run.
Looking at this, I am thinking it would lend itself well to
parallelization. Since the box at each “coordinate" is independent of all
others, it seems I should be able to simply split the list up into chunks
and process each chunk in parallel on a separate core. To that end, I
created a multiprocessing pool:



I recall a similar discussion when folk were being encouraged to move
away from monolithic and straight-line processing to modular functions
- it is more (CPU-time) efficient to run in a straight line than it is
to repeatedly call, set up, execute, and return from a function or
sub-routine! i.e. there is an overhead to many/all constructs!


Isn't the 'problem' that it is a 'toy example'? That the amount of 
computing within each parallel process is small in relation to the 
inherent 'overhead'.


Thus, if the code performed a reasonable analytical task within each 
box after it had been defined (increased CPU load), would you then 
notice the expected difference between the single- and multi-process 
implementations?




From AKL to AK
--
Regards =dn
--
https://mail.python.org/mailman/listinfo/python-list




--
Regards =dn
--
https://mail.python.org/mailman/listinfo/python-list


Re: Multiprocessing performance question

2019-02-21 Thread Israel Brewster
Actually not a ’toy example’ at all. It is simply the first step in gridding 
some data I am working with - a problem that is solved by tools like SatPy, but 
unfortunately I can’t use SatPy because it doesn’t recognize my file format, 
and you can’t load data directly. Writing a custom file importer for SatPy is 
probably my next step.

That said, the entire process took around 60 seconds to run. As this step was 
taking 10, I figured it would be low-hanging fruit for speeding up the process. 
Obviously I was wrong. For what it’s worth, I did manage to re-factor the code, 
so instead of generating the entire grid up-front, I generate the boxes as 
needed to calculate the overlap with the data grid. This brought the processing 
time down to around 40 seconds, so a definite improvement there.
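
Roughly, the refactored version looks like this (a simplified sketch;
overlap_with_data() stands in for my actual gridding step):

import itertools
from shapely import geometry

def boxes():
    # Yield each box only when the overlap step asks for it, instead of
    # materialising all 800,000 boxes up front.
    for x, y in itertools.product(range(1, 1001), range(1, 801)):
        yield geometry.box(x - 1, y, x, y - 1)

# for box in boxes():
#     overlap_with_data(box)   # stand-in for the real calculation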
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

> On Feb 20, 2019, at 4:30 PM, DL Neil  wrote:
> 
> George
> 
> On 21/02/19 1:15 PM, george trojan wrote:
>> def create_box(x_y):
>>     return geometry.box(x_y[0] - 1, x_y[1], x_y[0], x_y[1] - 1)
>> x_range = range(1, 1001)
>> y_range = range(1, 801)
>> x_y_range = list(itertools.product(x_range, y_range))
>> grid = list(map(create_box, x_y_range))
>> Which creates and populates an 800x1000 “grid” (represented as a flat list
>> at this point) of “boxes”, where a box is a shapely.geometry.box(). This
>> takes about 10 seconds to run.
>> Looking at this, I am thinking it would lend itself well to
>> parallelization. Since the box at each “coordinate" is independent of all
>> others, it seems I should be able to simply split the list up into chunks
>> and process each chunk in parallel on a separate core. To that end, I
>> created a multiprocessing pool:
> 
> 
> I recall a similar discussion when folk were being encouraged to move away
> from monolithic and straight-line processing to modular functions - it is
> more (CPU-time) efficient to run in a straight line than it is to repeatedly
> call, set up, execute, and return from a function or sub-routine! i.e. there
> is an overhead to many/all constructs!
> 
> Isn't the 'problem' that it is a 'toy example'? That the amount of computing 
> within each parallel process is small in relation to the inherent 'overhead'.
> 
> Thus, if the code performed a reasonable analytical task within each box 
> after it had been defined (increased CPU load), would you then notice the 
> expected difference between the single- and multi-process implementations?
> 
> 
> 
> From AKL to AK
> -- 
> Regards =dn
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Multiprocessing performance question

2019-02-20 Thread george trojan
I don't know whether this is a toy example; having a grid of this size is
not uncommon. True, it would make more sense to distribute more work to
each box, if there were any. One has to find a proper balance, as with
many other things in life. I simply responded to a question by the OP.

George

On Thu, 21 Feb 2019 at 01:30, DL Neil wrote:

> George
>
> On 21/02/19 1:15 PM, george trojan wrote:
> > def create_box(x_y):
> >     return geometry.box(x_y[0] - 1, x_y[1], x_y[0], x_y[1] - 1)
> >
> > x_range = range(1, 1001)
> > y_range = range(1, 801)
> > x_y_range = list(itertools.product(x_range, y_range))
> >
> > grid = list(map(create_box, x_y_range))
> >
> > Which creates and populates an 800x1000 “grid” (represented as a flat list
> > at this point) of “boxes”, where a box is a shapely.geometry.box(). This
> > takes about 10 seconds to run.
> >
> > Looking at this, I am thinking it would lend itself well to
> > parallelization. Since the box at each “coordinate" is independent of all
> > others, it seems I should be able to simply split the list up into chunks
> > and process each chunk in parallel on a separate core. To that end, I
> > created a multiprocessing pool:
>
>
> I recall a similar discussion when folk were being encouraged to move
> away from monolithic and straight-line processing to modular functions -
> it is more (CPU-time) efficient to run in a straight line than it is to
> repeatedly call, set up, execute, and return from a function or
> sub-routine! i.e. there is an overhead to many/all constructs!
>
> Isn't the 'problem' that it is a 'toy example'? That the amount of
> computing within each parallel process is small in relation to the
> inherent 'overhead'.
>
> Thus, if the code performed a reasonable analytical task within each box
> after it had been defined (increased CPU load), would you then notice
> the expected difference between the single- and multi-process
> implementations?
>
>
>
>  From AKL to AK
> --
> Regards =dn
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Multiprocessing performance question

2019-02-20 Thread DL Neil

George

On 21/02/19 1:15 PM, george trojan wrote:

def create_box(x_y):
    return geometry.box(x_y[0] - 1, x_y[1], x_y[0], x_y[1] - 1)

x_range = range(1, 1001)
y_range = range(1, 801)
x_y_range = list(itertools.product(x_range, y_range))

grid = list(map(create_box, x_y_range))

Which creates and populates an 800x1000 “grid” (represented as a flat list
at this point) of “boxes”, where a box is a shapely.geometry.box(). This
takes about 10 seconds to run.

Looking at this, I am thinking it would lend itself well to
parallelization. Since the box at each “coordinate" is independent of all
others, it seems I should be able to simply split the list up into chunks
and process each chunk in parallel on a separate core. To that end, I
created a multiprocessing pool:



I recall a similar discussion when folk were being encouraged to move
away from monolithic and straight-line processing to modular functions -
it is more (CPU-time) efficient to run in a straight line than it is to
repeatedly call, set up, execute, and return from a function or
sub-routine! i.e. there is an overhead to many/all constructs!


Isn't the 'problem' that it is a 'toy example'? That the amount of 
computing within each parallel process is small in relation to the 
inherent 'overhead'.


Thus, if the code performed a reasonable analytical task within each box 
after it had been defined (increased CPU load), would you then notice 
the expected difference between the single- and multi-process 
implementations?
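
A rough way to test that hypothesis would be to fatten the per-box work
and re-time both versions - a sketch only (the buffer/area step is an
arbitrary CPU load, not a meaningful analysis):

import itertools
import multiprocessing
from shapely import geometry

def create_and_analyse(x_y):
    # Same box as before, plus some arbitrary extra CPU work per box;
    # only a small float is returned, so little has to be pickled back.
    box = geometry.box(x_y[0] - 1, x_y[1], x_y[0], x_y[1] - 1)
    return box.buffer(0.5).area

if __name__ == '__main__':
    xy = list(itertools.product(range(1, 1001), range(1, 801)))
    serial = list(map(create_and_analyse, xy))           # one process
    with multiprocessing.Pool() as pool:                 # all cores
        parallel = pool.map(create_and_analyse, xy, chunksize=10000)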




From AKL to AK
--
Regards =dn
--
https://mail.python.org/mailman/listinfo/python-list


Re: Multiprocessing performance question

2019-02-20 Thread george trojan
def create_box(x_y):
    return geometry.box(x_y[0] - 1, x_y[1], x_y[0], x_y[1] - 1)

x_range = range(1, 1001)
y_range = range(1, 801)
x_y_range = list(itertools.product(x_range, y_range))

grid = list(map(create_box, x_y_range))

Which creates and populates an 800x1000 “grid” (represented as a flat list
at this point) of “boxes”, where a box is a shapely.geometry.box(). This
takes about 10 seconds to run.

Looking at this, I am thinking it would lend itself well to
parallelization. Since the box at each “coordinate" is independent of all
others, it seems I should be able to simply split the list up into chunks
and process each chunk in parallel on a separate core. To that end, I
created a multiprocessing pool:

pool = multiprocessing.Pool()

And then called pool.map() rather than just “map”. Somewhat to my surprise,
the execution time was virtually identical. Given the simplicity of my
code, and the presumable ease with which it should be able to be
parallelized, what could explain why the performance did not improve at all
when moving from the single-process map() to the multiprocess map()?

I am aware that in Python 3 the map function doesn’t actually produce a
result until needed, which is why I wrapped everything in calls to
list(), at least for testing.


The reason multiprocessing does not speed things up is the overhead of
pickling/unpickling objects. Here are results on my machine, running
Jupyter notebook:

def create_box(xy):
    return geometry.box(xy[0]-1, xy[1], xy[0], xy[1]-1)

nx = 1000
ny = 800
xrange = range(1, nx+1)
yrange = range(1, ny+1)
xyrange = list(itertools.product(xrange, yrange))

%%time
grid1 = list(map(create_box, xyrange))

CPU times: user 9.88 s, sys: 2.09 s, total: 12 s
Wall time: 10 s

%%time
pool = multiprocessing.Pool()
grid2 = list(pool.map(create_box, xyrange))

CPU times: user 8.48 s, sys: 1.39 s, total: 9.87 s
Wall time: 10.6 s

Results exactly like yours. To see what is going on, I rolled my own
chunking, which allowed me to add some print statements.

%%time
def myfun(chunk):
    g = list(map(create_box, chunk))
    print('chunk', chunk[0], datetime.now().isoformat())
    return g

pool = multiprocessing.Pool()
chunks = [xyrange[i:i+100*ny] for i in range(0, nx*ny, 100*ny)]
print('starting', datetime.now().isoformat())
gridlist = list(pool.map(myfun, chunks))
grid3 = list(itertools.chain(*gridlist))
print('done', datetime.now().isoformat())

starting 2019-02-20T23:03:50.883180
chunk (1, 1) 2019-02-20T23:03:51.674046
chunk (701, 1) 2019-02-20T23:03:51.748765
chunk (201, 1) 2019-02-20T23:03:51.772458
chunk (401, 1) 2019-02-20T23:03:51.798917
chunk (601, 1) 2019-02-20T23:03:51.805113
chunk (501, 1) 2019-02-20T23:03:51.807163
chunk (301, 1) 2019-02-20T23:03:51.818911
chunk (801, 1) 2019-02-20T23:03:51.974715
chunk (101, 1) 2019-02-20T23:03:52.086421
chunk (901, 1) 2019-02-20T23:03:52.692573
done 2019-02-20T23:04:02.477317
CPU times: user 8.4 s, sys: 1.7 s, total: 10.1 s
Wall time: 12.9 s

All ten subprocesses finished within 2 seconds. It took about 10 seconds
to get back and assemble the partial results. The objects have to be
packed, sent back to the parent process, and unpacked. Unpacking is done
by the main (i.e. single) process, and takes almost the same time as
creating the objects from scratch. Essentially the main process does the
following:

%%time
def f(b):
    g1 = b[0].__new__(b[0])
    g1.__setstate__(b[2])
    return g1
buf = [g.__reduce__() for g in grid1]
grid4 = [f(b) for b in buf]

CPU times: user 20 s, sys: 411 ms, total: 20.4 s
Wall time: 20.3 s

The buf line approximates creating the pickle (not exactly, as real
pickled data is a single byte string, not a list). The grid4 line is
essentially what pickle.loads() does.
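
One quick way to confirm this (not part of my original notebook, just a
sanity check) is to time pickling and unpickling the finished grid
directly:

%%time
import pickle
buf = pickle.dumps(grid1)      # what the workers have to produce
grid5 = pickle.loads(buf)      # what the parent process has to undo
print(len(buf), 'bytes')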

I do not think numpy will help here. The Python function box() has to be
called 800k times, and that will take time. np.vectorize(), as the
documentation states, is provided only for convenience; it is implemented
with a for loop. IMO the vectorization would have to be done at the C
level.
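
A hedged aside: if an array-based constructor is available (newer Shapely
releases, 2.0 and later, expose shapely.box as a vectorized creation
function), the 800k Python-level calls can be avoided entirely - a sketch
assuming Shapely >= 2.0 and NumPy:

import numpy as np
import shapely   # Shapely 2.x creation functions work on whole arrays

nx, ny = 1000, 800
x, y = np.meshgrid(np.arange(1, nx + 1), np.arange(1, ny + 1))
# Unit squares with corners (x-1, y-1) and (x, y), built in C in one call.
grid = shapely.box(x - 1, y - 1, x, y)
print(grid.shape)   # (800, 1000) array of polygons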

Greetings from Anchorage

George
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Multiprocessing performance question

2019-02-18 Thread Israel Brewster


> On Feb 18, 2019, at 6:37 PM, Ben Finney  wrote:
> 
> I don't have anything to add regarding your experiments with
> multiprocessing, but:
> 
> Israel Brewster  writes:
> 
>> Which creates and populates an 800x1000 “grid” (represented as a flat
>> list at this point) of “boxes”, where a box is a
>> shapely.geometry.box(). This takes about 10 seconds to run.
> 
> This seems like the kind of task NumPy <http://www.numpy.org/> is
> designed to address: Generating and manipulating large-to-huge arrays of
> numbers, especially numbers that are representable directly in the
> machine's basic number types (such as moderate-scale integers).
> 
> Have you tried using that library and timing the result?

Sort of. I am using that library, and in fact once I get the result I am 
converting it to a NumPy array for further use/processing, however I am still a 
NumPy newbie and have not been able to find a function that generates a numpy 
array from a function. There is the numpy.fromfunction() command, of course, 
but “…the function is called with … each parameter representing the coordinates 
of the array varying along a specific axis…”, which basically means (if my 
understanding/initial testing is correct) that my function would need to work 
with *arrays* of x,y coordinates. But the geometry.box() function needs 
individual x,y coordinates, not arrays, so I’d have to loop through the arrays 
and append to a new one or something to produce the output that numpy needs, 
which puts me back pretty much to the same code I already have.

There may be a way to make it work, but so far I haven’t been able to figure it 
out any better than the code I’ve got followed by converting to a numpy array. 
You do bring up a good point though: there is quite possibly a better way to do 
this, and knowing that would be just as good as knowing why multiprocessing 
doesn’t improve performance. Thanks!
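
For reference, the closest I've come so far is just filling a
pre-allocated NumPy object array inside the loop - essentially the same
code as noted above, but skipping the separate list-to-array conversion
(a sketch):

import itertools
import numpy as np
from shapely import geometry

nx, ny = 1000, 800
grid = np.empty(nx * ny, dtype=object)
# geometry.box() still needs scalar coordinates, so the loop remains.
for i, (x, y) in enumerate(itertools.product(range(1, nx + 1),
                                             range(1, ny + 1))):
    grid[i] = geometry.box(x - 1, y, x, y - 1)
grid = grid.reshape(nx, ny)   # 1000 x 800 array of boxes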
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

> 
> -- 
> \ “You don't need a book of any description to help you have some |
>  `\kind of moral awareness.” —Dr. Francesca Stavrakoloulou, bible |
> _o__)  scholar, 2011-05-08 |
> Ben Finney
> 
> -- 
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Multiprocessing performance question

2019-02-18 Thread Ben Finney
I don't have anything to add regarding your experiments with
multiprocessing, but:

Israel Brewster  writes:

> Which creates and populates an 800x1000 “grid” (represented as a flat
> list at this point) of “boxes”, where a box is a
> shapely.geometry.box(). This takes about 10 seconds to run.

This seems like the kind of task NumPy <http://www.numpy.org/> is
designed to address: Generating and manipulating large-to-huge arrays of
numbers, especially numbers that are representable directly in the
machine's basic number types (such as moderate-scale integers).

Have you tried using that library and timing the result?

-- 
 \ “You don't need a book of any description to help you have some |
  `\kind of moral awareness.” —Dr. Francesca Stavrakoloulou, bible |
_o__)  scholar, 2011-05-08 |
Ben Finney

-- 
https://mail.python.org/mailman/listinfo/python-list