New submission from Raymond Hettinger:

Heapify() is implemented with a series of siftup() operations that aggregate 
small heaps into bigger heaps.  

The current algorithm builds all the smallest heaps first, then makes all of 
the next largest size, and so on.  This is not cache friendly because the 
aggregation step operates on smaller heaps built long before.  The new 
algorithm performs the aggregation step immediately after its child heaps are 
built (while they are still likely to be in cache).

The overall algorithm is the same (children built before parents).  The number 
of comparisons is the same.   And the resulting heap is identical.  The only 
difference is performing work while the inputs are still in cache rather than 
waiting until all heaps at the current size have been built.

For small heaps that already fit entirely in L1 cache, there is no benefit, so 
we stick with the current version which has less branching. For larger heaps, 
we switch to the new order.

The timings and benefits depend on the number and sizes of the objects being 
heapified as well as the relative speed of L1 to L2 to L3 to DRAM.

------------------------------------------------------------------

For those who are interested, here timings for heapifying shuffled lists of 
various sizes.  The elements are tuples of length 1 that contain distinct 
integers.

The first row has the time to copy to the data.  It should be substracted from 
the timings on the next two rows which time the new algorithm versus the old 
algorithm.

The benefits don't start to show up until after N is over 1000 (depending on 
the input type, the breakeven point seems to fall somewhere between 1200 and 
2500 on my machine).

N = 100
[2.9262970201671124e-05, 2.9265997000038624e-05, 2.9325950890779495e-05] 
tupledata[:]
[0.0006274560000747442, 0.0006340609397739172, 0.0006361680570989847] 
heapify(tupledata[:])
[0.0006139189936220646, 0.0006186790997162461, 0.000632670009508729] 
heapify_old(tupledata[:])

[2.8867041692137718e-05, 2.8883921913802624e-05, 2.896797377616167e-05] 
tupledata[:]
[0.000608008005656302, 0.0006171419518068433, 0.0006187589606270194] 
heapify(tupledata[:])
[0.0006224410608410835, 0.000638791942037642, 0.0006388520123437047] 
heapify_old(tupledata[:])

[2.89019662886858e-05, 2.8969021514058113e-05, 2.8973910957574844e-05] 
tupledata[:]
[0.0006031119264662266, 0.0006048450013622642, 0.0006136660231277347] 
heapify(tupledata[:])
[0.000612352043390274, 0.0006144039798527956, 0.0006217029877007008] 
heapify_old(tupledata[:])


N = 1000
[0.0002854769118130207, 0.0002856890205293894, 0.00028590403962880373] 
tupledata[:]
[0.006836145068518817, 0.006866019102744758, 0.006885501905344427] 
heapify(tupledata[:])
[0.0067316299537196755, 0.006792359985411167, 0.0067987809889018536] 
heapify_old(tupledata[:])

[0.00028532894793897867, 0.0002853329060599208, 0.00028538203332573175] 
tupledata[:]
[0.006822419003583491, 0.0068415619898587465, 0.006888034055009484] 
heapify(tupledata[:])
[0.006734892027452588, 0.006814536056481302, 0.0068227669689804316] 
heapify_old(tupledata[:])

[0.00028527993708848953, 0.0002854960039258003, 0.0002858199877664447] 
tupledata[:]
[0.006787727936170995, 0.0067988099763169885, 0.006827510078437626] 
heapify(tupledata[:])
[0.0067258820636197925, 0.006815586006268859, 0.006871008081361651] 
heapify_old(tupledata[:])


N = 10000
[0.004415847011841834, 0.004417525022290647, 0.0044295149855315685] tupledata[:]
[0.07748138904571533, 0.07753941905684769, 0.07756883592810482] 
heapify(tupledata[:])
[0.08400217199232429, 0.08420385408680886, 0.08428021904546767] 
heapify_old(tupledata[:])

[0.004418709082528949, 0.004422315978445113, 0.004425868042744696] tupledata[:]
[0.07753065403085202, 0.0775474050315097, 0.07755298691336066] 
heapify(tupledata[:])
[0.08406145800836384, 0.08412359503563493, 0.08419332408811897] 
heapify_old(tupledata[:])

[0.0044234748929739, 0.0044267530320212245, 0.0044296300038695335] tupledata[:]
[0.07729987089987844, 0.07750388595741242, 0.07770221296232194] 
heapify(tupledata[:])
[0.08401058206800371, 0.0840839499142021, 0.08423375408165157] 
heapify_old(tupledata[:])


N = 100000
[0.055330604896880686, 0.05594596697483212, 0.056045167963020504] tupledata[:]
[1.2075877389870584, 1.207723677973263, 1.2084980909712613] 
heapify(tupledata[:])
[1.56127171497792, 1.5691186729818583, 1.575164051959291] 
heapify_old(tupledata[:])

[0.0558202009415254, 0.05597207904793322, 0.0560223578941077] tupledata[:]
[1.2101711059221998, 1.211772706010379, 1.2120026310440153] 
heapify(tupledata[:])
[1.5360360990744084, 1.5435883220052347, 1.5501357419416308] 
heapify_old(tupledata[:])

[0.05999936908483505, 0.06000674597453326, 0.06018067698460072] tupledata[:]
[1.209613809012808, 1.2116600699955598, 1.2144729839637876] 
heapify(tupledata[:])
[1.5371010650414973, 1.5499007020844147, 1.5706949040759355] 
heapify_old(tupledata[:])


N = 1000000
[0.8224946830887347, 0.8234598189592361, 0.8247971039963886] tupledata[:]
[18.152570085017942, 18.340327466023155, 18.413799613015726] 
heapify(tupledata[:])
[19.786154965986498, 19.91440916794818, 19.952165015041828] 
heapify_old(tupledata[:])

[0.8147928019752726, 0.8154206149047241, 0.8169217950198799] tupledata[:]
[18.227028850931674, 18.265947047038935, 18.36190685792826] 
heapify(tupledata[:])
[19.587209751014598, 19.62119024794083, 19.85366743709892] 
heapify_old(tupledata[:])

[0.8098425359930843, 0.8100302360253409, 0.8104055189760402] tupledata[:]
[18.16859135404229, 18.207948053022847, 18.350174001068808] 
heapify(tupledata[:])
[19.6270396419568, 19.85634774295613, 20.017801710986532] 
heapify_old(tupledata[:])


N = 10000000
[16.074914730968885, 16.084022041060962, 16.150474293972366] tupledata[:]
[205.83624146296643, 205.94496312504634, 205.96075649105478] 
heapify(tupledata[:])
[349.2653319039382, 349.8653641429264, 351.06795906298794] 
heapify_old(tupledata[:])

[16.07795425108634, 16.091183452052064, 16.10310253489297] tupledata[:]
[205.1686675120145, 206.0234369279351, 206.10121345799416] heapify(tupledata[:])
[348.308155576, 348.6673470499227, 348.7512100059539] heapify_old(tupledata[:])

[16.074230056954548, 16.098330755950883, 16.106685669976287] tupledata[:]
[205.18946332705673, 205.4753301080782, 205.5993042109767] heapify(tupledata[:])
[349.5718760070158, 349.6067797210999, 351.29123334004544] 
heapify_old(tupledata[:])

----------
assignee: rhettinger
components: Extension Modules
files: draft_heapq.diff
keywords: patch
messages: 242850
nosy: rhettinger
priority: normal
severity: normal
status: open
title: Optimize heapify for better cache utililzation
type: performance
versions: Python 3.5
Added file: http://bugs.python.org/file39331/draft_heapq.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24155>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to