Actually not a ‘toy example’ at all. It is simply the first step in gridding some data I am working with - a problem that is solved by tools like SatPy, but unfortunately I can’t use SatPy because it doesn’t recognize my file format, and you can’t load data directly. Writing a custom file importer for SatPy is probably my next step.
That said, the entire process took around 60 seconds to run. As this step was taking 10, I figured it would be low-hanging fruit for speeding up the process. Obviously I was wrong. For what it’s worth, I did manage to re-factor the code, so instead of generating the entire grid up-front, I generate the boxes as needed to calculate the overlap with the data grid. This brought the processing time down to around 40 seconds, so a definite improvement there.

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145

> On Feb 20, 2019, at 4:30 PM, DL Neil <[email protected]> wrote:
>
> George
>
> On 21/02/19 1:15 PM, george trojan wrote:
>> def create_box(x_y):
>>     return geometry.box(x_y[0] - 1, x_y[1], x_y[0], x_y[1] - 1)
>>
>> x_range = range(1, 1001)
>> y_range = range(1, 801)
>> x_y_range = list(itertools.product(x_range, y_range))
>> grid = list(map(create_box, x_y_range))
>>
>> Which creates and populates an 800x1000 “grid” (represented as a flat list
>> at this point) of “boxes”, where a box is a shapely.geometry.box(). This
>> takes about 10 seconds to run.
>>
>> Looking at this, I am thinking it would lend itself well to
>> parallelization. Since the box at each “coordinate” is independent of all
>> others, it seems I should be able to simply split the list up into chunks
>> and process each chunk in parallel on a separate core. To that end, I
>> created a multiprocessing pool:
>
>
> I recall a similar discussion when folk were being encouraged to move away
> from monolithic and straight-line processing to modular functions - it is
> more (CPU-time) efficient to run in a straight line; than it is to repeatedly
> call, set-up, execute, and return-from a function or sub-routine! ie there is
> an over-head to many/all constructs!
>
> Isn't the 'problem' that it is a 'toy example'? That the amount of computing
> within each parallel process is small in relation to the inherent 'overhead'.
>
> Thus, if the code performed a reasonable analytical task within each box
> after it had been defined (increased CPU load), would you then notice the
> expected difference between the single- and multi-process implementations?
>
>
>
> From AKL to AK
> --
> Regards =dn
> --
> https://mail.python.org/mailman/listinfo/python-list
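
The reply above mentions generating boxes only as needed instead of building the whole grid up front. The code for that was not posted, so what follows is only a rough sketch of one way such an approach could look, not the code actually used. The names `cell_bounds`, `cells_overlapping`, and `boxes_for` are hypothetical, and plain tuples stand in for `shapely.geometry.box` so the example runs without Shapely installed. It assumes each data footprint is given as `(minx, miny, maxx, maxy)` in grid units.

```python
import itertools
import math


def cell_bounds(x, y):
    """Corner coordinates for grid cell (x, y), in the argument order used in the thread.

    Assumption: a plain tuple stands in for geometry.box(x - 1, y, x, y - 1).
    """
    return (x - 1, y, x, y - 1)


def cells_overlapping(bounds, width, height):
    """Return an iterator over the 1-based (x, y) cells that overlap `bounds`.

    `bounds` is (minx, miny, maxx, maxy); cell (x, y) spans [x-1, x] by [y-1, y].
    Cells that only touch the footprint along an edge are skipped.
    """
    minx, miny, maxx, maxy = bounds
    # Clamp the index range to the grid so out-of-range footprints are harmless.
    x_first = max(1, math.floor(minx) + 1)
    x_last = min(width, math.ceil(maxx))
    y_first = max(1, math.floor(miny) + 1)
    y_last = min(height, math.ceil(maxy))
    return itertools.product(range(x_first, x_last + 1),
                             range(y_first, y_last + 1))


def boxes_for(bounds, width=1000, height=800, make_box=cell_bounds):
    """Lazily build boxes only for the cells a footprint can overlap."""
    for x, y in cells_overlapping(bounds, width, height):
        yield make_box(x, y)
```

Iterating over `boxes_for((2.5, 0.5, 4.2, 1.5))` builds six boxes rather than all 800,000. To get real geometries, a wrapper around `shapely.geometry.box` could be passed as `make_box`.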
