[issue39801] list.insert is slow, likely due to manual memmove

2020-05-17 Thread Raymond Hettinger


Change by Raymond Hettinger:


--
assignee: rhettinger -> 


[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Stefan Pochmann


Stefan Pochmann added the comment:

I misspoke: I do the insertions at -1 and at -500 not on the same list, but 
each on a fresh list (of length 2**20 + 1).


[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Stefan Pochmann


Stefan Pochmann added the comment:

I think I have a decent way to isolate the memmove-vs-for-loop times in Python 
code. Here are example results: five rounds of inserting with list.insert or 
with slice assignment. The times for each of the two methods are pretty stable:

    insert     slice
  0.024221  0.006754
  0.024157  0.006710
  0.024015  0.006734
  0.024206  0.006731
  0.024123  0.006769

For each run, I build a list of length 2**20 and append one value. Then I can 
insert 2**17 more values without the list internally resizing.

Then I measure inserting at index -1 and treat that as the baseline, since it 
includes all the overhead costs, like function calls and other differences 
between the two insertion methods.

Then I measure inserting at index -500 and subtract the baseline time. This 
difference should reflect how long the memmove or the for-loop took, as that's 
the difference between inserting at -1 and at -500. It's what I showed in those 
results above.

I chose -500 here because the use case I have in mind is 
sortedcontainers.SortedList, which I think mostly involves lists of length 1000 
or more (up to 2000), and insertions somewhere in the middle.

Results for index -50 instead:

    insert     slice
  0.002479  0.002217
  0.002546  0.002113
  0.002566  0.002007
  0.002425  0.002081
  0.002555  0.002261

By the way, I'm running Python 3.8.1 (32-bit) on Windows 10 (64-bit) on an 
Intel i5-7200U.

The rest of this message is the benchmark code:

from timeit import repeat

statements = (
  'a.insert(-1,0)',
  'a.insert(-50,0)',
  'a[-1:0] = 0,',
  'a[-50:0] = 0,',
)

# Verify that the statements work and don't cause internal list resizes.
for stmt in statements:
    a = [0] * 2**20
    a.append(0)
    size_before = a.__sizeof__()
    stmt = compile(stmt, '', 'single')
    for _ in range(2**17):
        exec(stmt)
    assert len(a) == 2**20 + 1 + 2**17
    assert a.__sizeof__() == size_before

# Run the benchmark.
print('    insert     slice')
for _ in range(5):
    times = []
    for stmt in statements:
        t = min(repeat(stmt, 'a=[0]*2**20; a.append(0)',
                       number=2**17, repeat=20))
        times.append(t)
    print('%10.6f' * 2 % (times[1] - times[0], times[3] - times[2]))
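
To get closer to the SortedList use case mentioned above (lists of roughly 1000 
elements with insertions near the middle), a minimal variation along the same 
lines could look like the sketch below. The sizes and the insert-then-pop trick 
are illustrative assumptions, not measurements from the runs above.

from timeit import repeat

# Rough sketch (assumed sizes): insert near the middle of a ~1000-element
# list, comparing list.insert with slice assignment.  Each statement pops
# from the end so the list length stays roughly constant.
setup = 'a = [0] * 1000'
for stmt in ('a.insert(500, 0); a.pop()',
             'a[500:500] = [0]; a.pop()'):
    t = min(repeat(stmt, setup, number=100_000, repeat=5))
    print(f'{stmt:<30} {t:.4f} s per 100k operations')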

--
status: pending -> open


[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Stefan Pochmann


Change by Stefan Pochmann:


--
status: open -> pending


[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Stefan Pochmann


Stefan Pochmann added the comment:

I have better benchmarks now and am trying some more, though I guess we roughly 
agree about when memmove is faster and when it's slower, and that the real 
question is how large the common case is.

Do you know where I can find those past investigations? Was memmove *faster* 
for sizes *over* 16? What is the common case, i.e., how do people use 
list.insert?

--
status: pending -> open


[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Raymond Hettinger


Raymond Hettinger added the comment:

Marking this as "pending".  It is more of a personal research project than a 
bug report, feature request, or known optimization.

The underlying hypothesis is that compilers aren't smart enough to generate 
good code for a simple for-loop that moves data.

The evidence is weak, as it involves timing unusual cases with significantly 
different calling patterns that make repeated calls to realloc(), which can 
throw the results off wildly.

--
priority: normal -> low
status: open -> pending
versions: +Python 3.9 -Python 3.8


[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Stefan Pochmann


Stefan Pochmann added the comment:

Good point, Tim. Although binarysort really moves very few slots (I think at 
most 62, and on average about 13). That might not be representative of how 
people use list.insert, either. I think I mainly use it for the mentioned case 
of a "self-sorting list", either naively via bisect.insort with a single list 
or via sortedcontainers' SortedList using multiple lists (I've used it only 
once so far). If I'm not mistaken, SortedList has a default load factor of 1000 
and splits at double that size.

I might do more reasonable benchmarks later (haven't read Raymond's reply yet).

In any case, binarysort does its own inserting, so at least it wouldn't be 
affected if list.insert were changed.


[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Raymond Hettinger


Raymond Hettinger added the comment:

Here's a diff you can try out:

diff --git a/Objects/listobject.c b/Objects/listobject.c
index 3c39c6444b..ac6c9cc07a 100644
--- a/Objects/listobject.c
+++ b/Objects/listobject.c
@@ -272,8 +272,7 @@ ins1(PyListObject *self, Py_ssize_t where, PyObject *v)
     if (where > n)
         where = n;
     items = self->ob_item;
-    for (i = n; --i >= where; )
-        items[i+1] = items[i];
+    memmove(items+where+1, items+where+0, (n - where) * sizeof(PyObject *));
     Py_INCREF(v);
     items[where] = v;
     return 0;

For me, that patch makes little difference in the timings.

Another thing that could be tried is to have ins1() call list_ass_slice() to do 
the work.  That way, both paths will use the same underlying insertion code.  
It would be interesting to see if this makes any difference.

A few other thoughts:

* We usually prefer deques to lists when prepending because the former are O(1) 
and the latter are O(n) (a quick comparison is sketched after the disassembly 
below).

* Modern compilers are very good at optimizing data movements in a simple 
for-loop.

* It is very difficult to get good timings for these kinds of operations.  
Internally, both make a call to list_resize() which in turn uses realloc().  
The latter function can be very fast or very slow depending solely on whether 
it can resize in-place or has to move the data.  Accordingly, timings are 
easily confounded by minor changes to memory usage.  To get a clear-cut timing, 
you would need to isolate just the for-loop or memmove operation and not 
include Python calling overhead, realloc operations, and everything else that 
is going on in the given timings.

* Another source of confounding is the compiler itself.  Changes that make 
small improvements on Clang may hurt the GCC build or vice-versa.  The 
comparison can also be thrown off by whether you're timing a PGO build (which 
tends to do an amazing job of optimizing for-loops).  The results may also 
differ between 32-bit builds and 64-bit builds.

* The timing isn't representative of how people use insert(0, x).  It would be 
a mistake to optimize an uncommon case if it comes at the expense of the common 
case (insertions into small lists).  To guard against this, you would need to 
measure inserts into a small list to find out whether memmove() has more setup 
time than the for-loop (a rough Python-level sketch of such a measurement also 
follows after the disassembly below).  When we've done such investigations in 
the past, for-loops were found to beat memmove() for sizes under 16.  Almost 
certainly this has changed as compilers and processors have changed over the 
years.

* For smaller list sizes, the calling overhead dominates the insertion time.  
Disassembling the two different calls shows a good deal of call overhead.

>>> from dis import dis
>>> dis('a.insert(0,0)')
  1           0 LOAD_NAME                0 (a)
              2 LOAD_METHOD              1 (insert)
              4 LOAD_CONST               0 (0)
              6 LOAD_CONST               0 (0)
              8 CALL_METHOD              2
             10 RETURN_VALUE
>>> dis('a[0:0]=[0]')
  1           0 LOAD_CONST               0 (0)
              2 BUILD_LIST               1
              4 LOAD_NAME                0 (a)
              6 LOAD_CONST               0 (0)
              8 LOAD_CONST               0 (0)
             10 BUILD_SLICE              2
             12 STORE_SUBSCR
             14 LOAD_CONST               1 (None)
             16 RETURN_VALUE
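
As a quick illustration of the deque point above (the sizes here are just 
illustrative, and this times the Python-level calls, not the C internals):

from collections import deque
from timeit import timeit

# Prepending to a deque is O(1); list.insert(0, x) shifts every existing
# element, so it is O(n).  The sizes below are arbitrary.
d = deque(range(100_000))
lst = list(range(100_000))

print('deque.appendleft :', timeit(lambda: d.appendleft(0), number=10_000))
print('list.insert(0, x):', timeit(lambda: lst.insert(0, 0), number=10_000))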
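
And a rough Python-level sketch of the small-list measurement suggested above. 
It cannot isolate the C-level for-loop vs. memmove() cost (call overhead 
dominates at these sizes), so it can only hint at where a crossover might sit; 
the sizes and the insert-then-pop trick are assumptions of this sketch:

from timeit import repeat

# Per-insert cost on small lists, keeping the length roughly constant by
# popping from the end after each front insertion.
for size in (4, 8, 16, 32, 64, 128):
    setup = f'a = [0] * {size}'
    stmt = 'a.insert(0, 0); a.pop()'
    t = min(repeat(stmt, setup, number=100_000, repeat=5))
    print(f'{size:>4} elements: {t:.4f} s per 100k insert+pop')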


[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Tim Peters


Tim Peters added the comment:

Be cautious about this.  Many years ago I looked into this, mostly in the 
context of the list sort's binary insertion sort, and results were all over the 
map, depending on the compiler in use, the optimization level, and the range at 
which (if any!) memmove() was actually faster than a simple loop.  A fancy 
memmove() will (at least) arrange to copy (say) 8 bytes at a time, but the 
overhead of setting that up (+ the function call) made it a loser when the 
number of bytes to be moved was "too small".

Don't know whether the comment in listobject.c's binarysort() still applies:

   Caution: using memmove is much slower under MSVC 5;
   we're not usually moving many slots. */

Optimizing to move 100,000 elements isn't really sane - cutting the constant 
factor on an inherently quadratic-time too-simplistic basic approach isn't 
really doing you a favor ;-)

Also note that sortedcontainers does not use unbounded-length Python lists 
under the covers.  It puts an upper bound on the length of the Python lists it 
uses, precisely to avoid visibly quadratic-time behavior.
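
A toy sketch of that bounded-sublist idea (this is not sortedcontainers' actual 
code; the 2000-element split limit below just echoes the number mentioned 
earlier in the thread): keeping many short sorted sublists bounds how many 
elements any single insert has to shift.

from bisect import insort
import random

LIMIT = 2000  # split threshold, borrowed from the discussion above

def add(sublists, x):
    """Insert x while keeping every sublist sorted and at most ~LIMIT long."""
    if not sublists:
        sublists.append([x])
        return
    # Pick the first sublist whose largest element is >= x (else the last one).
    i = 0
    while i < len(sublists) - 1 and sublists[i][-1] < x:
        i += 1
    sub = sublists[i]
    insort(sub, x)                      # the list.insert happens inside insort
    if len(sub) > LIMIT:                # split an oversized sublist in half
        mid = len(sub) // 2
        sublists[i:i + 1] = [sub[:mid], sub[mid:]]

sublists = []
for v in random.sample(range(10**6), 10**5):
    add(sublists, v)
flat = [v for sub in sublists for v in sub]
assert flat == sorted(flat)
print('largest sublist:', max(map(len, sublists)), 'elements')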

--
nosy: +tim.peters


[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Stefan Pochmann


Stefan Pochmann added the comment:

Benchmarking with two *Python* versions of bisect.insort, the "insert" version 
takes 1.08 seconds to insort the shuffled range(10**5), while the 
slice-assignment version takes only 0.46 seconds:

from timeit import timeit
import random
from bisect import bisect_right

def insort1(a, x):
    lo = bisect_right(a, x)
    a.insert(lo, x)

def insort2(a, x):
    lo = bisect_right(a, x)
    a[lo:lo] = [x]

for _ in range(3):
    a = list(range(10**5))
    random.shuffle(a)
    for insort in insort1, insort2:
        it = iter(a)
        s = []
        print(timeit(lambda: insort(s, next(it)), number=len(a)))
    print()


[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Stefan Pochmann


Stefan Pochmann added the comment:

I believe it also affects bisect.insort, which I occasionally use when I need a 
"self-sorting list" (I can't easily test this, as I'm not set up to modify the 
C version of bisect.insort).

It also affects the popular sortedcontainers package, which relies on such list 
operations being fast: http://www.grantjenks.com/docs/sortedcontainers/
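
For reference, the "self-sorting list" pattern looks roughly like this (a 
minimal illustration of the usage, not a benchmark):

from bisect import insort

# Each insort call finds the position by bisection and then does a
# list.insert at that position, which is the operation discussed here.
data = []
for x in (5, 1, 4, 2, 3):
    insort(data, x)
print(data)  # [1, 2, 3, 4, 5]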


[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Raymond Hettinger


Change by Raymond Hettinger:


--
assignee:  -> rhettinger
nosy: +rhettinger


[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Stefan Pochmann


Change by Stefan Pochmann:


--
title: list.insert is slow due to manual memmove -> list.insert is slow, likely 
due to manual memmove
