[issue46747] bisect.bisect/insort don't document key parameter

2022-02-13 Thread Stefan Pochmann

New submission from Stefan Pochmann :

The signatures for the versions without "_right" suffix are missing the key 
parameter:

bisect.bisect_right(a, x, lo=0, hi=len(a), *, key=None)
bisect.bisect(a, x, lo=0, hi=len(a))

bisect.insort_right(a, x, lo=0, hi=len(a), *, key=None)
bisect.insort(a, x, lo=0, hi=len(a))

https://docs.python.org/3/library/bisect.html#bisect.bisect_right
https://docs.python.org/3/library/bisect.html#bisect.insort_right

--
assignee: docs@python
components: Documentation
messages: 413213
nosy: Stefan Pochmann, docs@python
priority: normal
severity: normal
status: open
title: bisect.bisect/insort don't document key parameter
versions: Python 3.10, Python 3.11

___
Python tracker 
<https://bugs.python.org/issue46747>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46302] IndexError inside list comprehension + workaround

2022-01-07 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

The error occurs because code.strip()[0] is evaluated when code is " ", not "u2".

--
nosy: +Stefan Pochmann

___
Python tracker 
<https://bugs.python.org/issue46302>



[issue45851] statistics.multimode is inefficient (time and space) (mode somewhat, too)

2021-11-20 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

Ok, thanks, had somewhat expected that, as the multimode proposal was rather 
clearly better but the mode proposal not so much.

--

___
Python tracker 
<https://bugs.python.org/issue45851>



[issue45852] statistics.mode test doesn't test what it claims to

2021-11-19 Thread Stefan Pochmann


New submission from Stefan Pochmann :

This test:

def test_counter_data(self):
    # Test that a Counter is treated like any other iterable.
    data = collections.Counter([1, 1, 1, 2])
    # Since the keys of the counter are treated as data points, not the
    # counts, this should return the first mode encountered, 1
    self.assertEqual(self.func(data), 1)

If the mode() code *were* wrong this way (used Counter(data) instead of 
Counter(iter(data))), then the test wouldn't detect it, as mode() would still 
return 1. The test data should be [1, 2, 2, 2] instead, in which case such a 
wrong mode() would return 2.

It used to be correct but wasn't adjusted correctly when mode() switched from 
raising an error for multiple modes to returning the first. The old code was:

def test_counter_data(self):
    # Test that a Counter is treated like any other iterable.
    data = collections.Counter([1, 1, 1, 2])
    # Since the keys of the counter are treated as data points, not the
    # counts, this should raise.
    self.assertRaises(statistics.StatisticsError, self.func, data)
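
The claim can be demonstrated with a small sketch (mode_buggy here is a hypothetical wrong implementation, not actual stdlib code):

```python
from collections import Counter

def mode_like_stdlib(data):
    # statistics.mode builds Counter(iter(data)), so a Counter argument
    # is consumed as an iterable of its *keys*
    return Counter(iter(data)).most_common(1)[0][0]

def mode_buggy(data):
    # hypothetical bug: Counter(data) would copy a Counter's *counts*
    return Counter(data).most_common(1)[0][0]

old_data = Counter([1, 1, 1, 2])  # the test's current data: both return 1
new_data = Counter([1, 2, 2, 2])  # proposed data: the bug would show
assert mode_like_stdlib(old_data) == mode_buggy(old_data) == 1
assert mode_like_stdlib(new_data) == 1
assert mode_buggy(new_data) == 2
```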

--
components: Tests
messages: 406642
nosy: Stefan Pochmann
priority: normal
severity: normal
status: open
title: statistics.mode test doesn't test what it claims to
versions: Python 3.10

___
Python tracker 
<https://bugs.python.org/issue45852>



[issue45851] statistics.multimode is inefficient (time and space) (mode somewhat, too)

2021-11-19 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

(somehow the benchmark script didn't get attached, trying again)

--
Added file: https://bugs.python.org/file50453/multimode_mode.py

___
Python tracker 
<https://bugs.python.org/issue45851>



[issue45851] statistics.multimode is inefficient (time and space) (mode somewhat, too)

2021-11-19 Thread Stefan Pochmann


New submission from Stefan Pochmann :

The current implementation is:

def multimode(data):
    counts = Counter(iter(data)).most_common()
    maxcount, mode_items = next(groupby(counts, key=itemgetter(1)), (0, []))
    return list(map(itemgetter(0), mode_items))

The most_common() does a complete sort of Counter item tuples, taking O(n log 
n) time and quite big O(n) extra space (mostly for all those tuples).

When Raymond Hettinger suggested it in 
https://bugs.python.org/issue35892#msg336338 he said it should have "running 
speed that is optimal for the desired result". But then he detailed that with 
"Slow O(n log n), loads all data in memory, full sort".

Which seems like a mistake, as that's not optimal. It's easy to do in O(n) time 
and O(1) extra memory (in addition to the Counter and result, I mean):

def multimode(data):
    counts = Counter(iter(data))
    if not counts:
        return []
    maxcount = max(counts.values())
    return [value for value, count in counts.items() if count == maxcount]

If there are only very few *different* values then the time/space after 
creating the Counter is insignificant compared to the Counter creation. But if 
there are many different values, it can be significant.
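
A quick standalone check of the proposed version (definition repeated from above so the snippet runs on its own):

```python
from collections import Counter

def multimode(data):
    counts = Counter(iter(data))
    if not counts:
        return []
    maxcount = max(counts.values())
    return [value for value, count in counts.items() if count == maxcount]

# dicts (and Counters) preserve insertion order, so the modes come out
# in first-encountered order, matching the current behavior
assert multimode([1, 1, 2, 2, 3]) == [1, 2]
assert multimode([]) == []
```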

statistics.mode takes O(n) time and O(1) space, which is optimal, but I found 
an apparently faster way anyway (code at end).

For example for data = random.choices(range(n), k=n):

           |     multimode     |        mode
         n | current  proposal | current  proposal
-----------+-------------------+-------------------
         1 |    131%       70% |    125%       58%
        10 |    144%       73% |    119%       53%
       100 |    126%       71% |    108%       29%
     1,000 |    123%       65% |     62%       22%
    10,000 |    172%       55% |     53%       18%
   100,000 |    164%       44% |     55%       20%
 1,000,000 |     85%       20% |     22%        4%
10,000,000 |     56%       12% |     11%        4%

All four start with Counter(iter(data)), so I took that as baseline and the 
above results show relative additional times. For example 55% means if Counter 
construction alone took 10 seconds, the function took 15.5 seconds.

An extreme case, data = list(range(n)):

           |     multimode     |        mode
         n | current  proposal | current  proposal
-----------+-------------------+-------------------
         1 |    128%       67% |    124%       56%
        10 |    187%       93% |    141%       52%
       100 |    316%      149% |    181%       45%
     1,000 |    380%      174% |    213%       46%
    10,000 |    349%      111% |    146%       30%
   100,000 |    397%      128% |    159%       34%
 1,000,000 |    336%       95% |    112%       24%
10,000,000 |    349%       97% |    109%       23%

I also tried a bunch of other cases, didn't find one where my versions weren't 
quite a bit faster.

My mode() version:

from operator import indexOf
from itertools import islice

def mode(data):
    counts = Counter(iter(data))
    if not counts:
        raise StatisticsError('no mode for empty data') from None
    maxcount = max(counts.values())
    index = indexOf(counts.values(), maxcount)
    return next(islice(counts, index, None))
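
Run standalone (with a stand-in StatisticsError so the snippet is self-contained), it relies on a Counter's keys and values iterating in the same insertion order:

```python
from collections import Counter
from operator import indexOf
from itertools import islice

class StatisticsError(ValueError):
    """Stand-in for statistics.StatisticsError, just to be self-contained."""

def mode(data):
    counts = Counter(iter(data))
    if not counts:
        raise StatisticsError('no mode for empty data') from None
    maxcount = max(counts.values())
    # position of the first maximal count; counts.values() and the keys
    # iterate in the same (insertion) order
    index = indexOf(counts.values(), maxcount)
    return next(islice(counts, index, None))

assert mode([3, 1, 1, 2]) == 1
```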

--
components: Library (Lib)
messages: 406638
nosy: Stefan Pochmann
priority: normal
severity: normal
status: open
title: statistics.multimode is inefficient (time and space) (mode somewhat, too)
type: performance
versions: Python 3.10

___
Python tracker 
<https://bugs.python.org/issue45851>



[issue37295] Possible optimizations for math.comb()

2021-11-14 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

Turns out for n=100_000, k=50_000, about 87% of my factors are 1, so they don't 
even need to be turned into Python ints for multiplication, improving the 
multiplication part to 3.05 ms. And a C++ version to produce the factors took 
0.85 ms. Updated estimation:

1510.4 ms  math.comb(n, k)
 460.8 ms  factorial(n) // (factorial(k) * factorial(n-k))
   3.9 ms  *estimation* for mycomb if written in C

--

___
Python tracker 
<https://bugs.python.org/issue37295>



[issue37295] Possible optimizations for math.comb()

2021-11-14 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

And for Raymond's case 4), about running very long and not responding to 
SIGINT, with n=1_000_000 and k=500_000:

150.91 seconds  math.comb(n, k)
 39.11 seconds  factorial(n) // (factorial(k) * factorial(n-k))
  0.40 seconds  mycomb(n, k)
  0.14 seconds  *estimation* for mycomb if written in C

And for n=10_000_000 and k=5_000_000:

   ~4 hours*estimation* for math.comb(n, k)
   ~1 hour *estimation* for factorials solution
  8.3 seconds  mycomb(n, k)
  4.5 seconds  *estimation* for mycomb if written in C

--

___
Python tracker 
<https://bugs.python.org/issue37295>



[issue37295] Possible optimizations for math.comb()

2021-11-14 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

I wrote a Python solution ("mycomb") that computes comb(100_000, 50_000) 
faster, maybe of interest:

1510.4 ms  math.comb(n, k)
 460.8 ms  factorial(n) // (factorial(k) * factorial(n-k))
  27.5 ms  mycomb(n, k)
   6.7 ms  *estimation* for mycomb if written in C

The idea:

                13 * 12 * 11 * 10 * 9 * 8
comb(13, 6)  =  -------------------------  =  13 * 1 * 11 * 1 * 3 * 4
                  1 * 2 * 3 * 4 * 5 * 6

It lists the numerator factors, then divides the denominator factors out of 
them (using primes), then just multiplies.

Preparing the factors for the final multiplication took most of the time, about 
23.1 ms. That part only needs numbers <= n, so it could be done with C ints and 
be much faster. If it's ten times faster, then mycomb in C would take 23.1/10 + 
(27.5-23.1) = 6.71 ms.

See the comb_with_primes.py file.
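
The factor-cancelling idea can be sketched as follows (a simplified gcd-based stand-in for illustration, not the submitted prime-based comb_with_primes.py code):

```python
from math import comb, gcd

def comb_via_factors(n, k):
    # List the numerator factors n, n-1, ..., n-k+1, then divide each
    # denominator factor d = 2..k out of them via gcd, then multiply
    # what remains.
    factors = list(range(n, n - k, -1))
    for d in range(2, k + 1):
        rem = d
        for i, f in enumerate(factors):
            g = gcd(f, rem)
            if g > 1:
                factors[i] = f // g
                rem //= g
                if rem == 1:
                    break
    result = 1
    for f in factors:
        result *= f
    return result

# Reproduces the comb(13, 6) example: the factors end up [13, 1, 11, 1, 3, 4].
assert comb_via_factors(13, 6) == comb(13, 6) == 1716
```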

--
nosy: +Stefan Pochmann
Added file: https://bugs.python.org/file50439/comb_with_primes.py

___
Python tracker 
<https://bugs.python.org/issue37295>



[issue45752] copy module doc wrongly says it doesn't copy arrays

2021-11-13 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

Just saw that it's in copy.py's docstring as well:

"This version does not copy types like module, class, function, method,
nor stack trace, stack frame, nor file, socket, window, nor array, nor
any similar types."

https://github.com/python/cpython/blob/3.10/Lib/copy.py#L41-L43

--

___
Python tracker 
<https://bugs.python.org/issue45752>



[issue45752] copy module doc wrongly says it doesn't copy arrays

2021-11-08 Thread Stefan Pochmann


New submission from Stefan Pochmann :

The doc https://docs.python.org/3/library/copy.html says:

"This module does not copy types like module, method, stack trace, stack frame, 
file, socket, window, array, or any similar types."

But it does copy arrays just fine:

import copy, array
a = array.array('i', [1, 2])
b = copy.copy(a)
a[0] = 3
print(a)
print(b)

Output:

array('i', [3, 2])
array('i', [1, 2])

--
assignee: docs@python
components: Documentation
messages: 405962
nosy: Stefan Pochmann, docs@python
priority: normal
severity: normal
status: open
title: copy module doc wrongly says it doesn't copy arrays
versions: Python 3.10, Python 3.11, Python 3.6, Python 3.7, Python 3.8, Python 
3.9

___
Python tracker 
<https://bugs.python.org/issue45752>



[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-28 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

> This is already faster in pure Python than list.sort() for cases like:

Seems to depend on the system, it's slower on my laptop but faster on GCE:

Python 3.10.0 on my laptop:
 7.42 s  lexisort
 6.83 s  sort
 5.07 s  groupsort

Python 3.9.2 on Google Compute Engine:
 2.09 s  lexisort
 2.64 s  list.sort
 1.52 s  groupsort

This is my groupsort btw:

def groupsort(xs):
    xs.sort(key=itemgetter(0))
    start = 0
    n = len(xs)
    tail = itemgetter(slice(1, None))
    for i in range(1, n+1):
        if i == n or xs[i-1][0] < xs[i][0]:
            sublist = xs[start:i]
            sublist.sort(key=tail)
            xs[start:i] = sublist
            start = i

--

___
Python tracker 
<https://bugs.python.org/issue45530>



[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-22 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

Yes, I'm more familiar with the issue in the context of strings or lists. Your 
example of strings like "'x' * 10_000 + str(i)" looks like something I almost 
certainly used before as counterexample to someone's time complexity claim :-)

In the context of multi-criteria sort I might not have thought of it before, I 
guess because you usually don't have many criteria. But now the optimized 
tuple_elem_compare makes me think we can take even more advantage of it.

I think those type-specific optimized comparison functions are what I left out 
when I previously read the sort, that's why it was new to me. I just saw you 
also use powersort's merge strategy now. Kinda sad, I had seen you lament that 
your own strategy "remains hard to grasp why it's always correct now", while I 
think it's rather straightforward and had considered adding a little 
explanation in the doc.

Bucket sort is a name I considered, but I wrote two, so named them after their 
implementation (groupsort first used itertools.groupby).

--

___
Python tracker 
<https://bugs.python.org/issue45530>



[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-22 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

> It's not surprising the 2nd positions are rarely equal _given_ that the 
> universe has been reduced to the 100 tuples with the same first element.

Yes, exactly.

> In any case, I don't see the point to this exercise ;-)

Just to illustrate why I disagree with what I responded to there. I'm not 
convinced that many matches at the first tuple position are really an 
indication that there'll be many matches at the second position as well, so 
that one should "admit defeat" and that it's "arrogance" to hope for only few 
matches at the second position. I'd wager it's rather typical that the matching 
rate at the second position is significantly lower. If you sort people by last 
and then first name, I think you'd get the same effect as with my above random 
example, for the same reason. If you sort a set of 2D coordinates, you'll even 
*never* get matches at the second position. Even with that SO question's data 
and its massive duplication (especially at the second position) *and* its 
correlation between first and second position (since both "sorted(x)" and 
"x[::2]" are derived from the same x) and its 43% matching rate at the first 
position, there's still only a 27% matching rate at the second position (lower 
than the 1/3 break-even point).

> BTW, don't overlook that tuple_elem_compare can _only_ be used on the very 
> first elements of the tuples.

Yes, that's why I only talked about PyObject_RichCompareBool there.

> > 1) Sort the list only by first position (comparing with only
> >one tuple_elem_compare).

> Not sure what that means, exactly. (the "with only one tuple_elem_compare" 
> part;

Just wanted to make clear it wouldn't use two tuple_elem_compare calls or a 
PyObject_RichCompareBool.

> > 2) Identify equal-first-position-runs (with tuple_elem_compare)
> > and sort each run independently (comparing only positions 1+).

> What are you aiming at? There was no motivation given here.

Much fewer comparisons, especially PyObject_RichCompareBool comparisons. For 
example for sorting a million pairs with first/second element chosen from a 
population of 100,000, I get these numbers of comparisons (per element):

                current  groupsort
----------------------------------
== first          18.60       none   (PyObject_RichCompareBool)
 < first          16.19      19.60   (tuple_elem_compare)
== second          2.41       2.36   (PyObject_RichCompareBool)
 < second          2.41       2.36   (PyObject_RichCompareBool)
----------------------------------
total slow        23.42       4.72   (PyObject_RichCompareBool)
total fast        16.19      19.60   (tuple_elem_compare)
----------------------------------
total             39.61      24.32

(With "current" I mean before your optimizations, which I can't test yet.)

> Will, this issue report is about unsafe_tuple_compare(), so, ya, if the 
> changes you envision aren't limited to that, I'd say you want to open a 
> different issue report and a different PR :-)

Ok, I might try that.

--

___
Python tracker 
<https://bugs.python.org/issue45530>



[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-21 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

I see you mentioned that PyObject_RichCompareBool(..., Py_EQ) might be faster 
for example because it checks identity. Why not do an identity check before the 
ms->tuple_elem_compare calls? Too expensive and rarely successful?
 
> Extending the idea to positions beyond the first is dubious on those grounds: 
> if the first tuple positions in fact often match, then the optimization has 
> already backfired. Time to admit defeat then, not double down on failed 
> arrogance ;-)
 
I don't see that. First and second positions are quite different.

For example I sorted a million tuples where both first and second element are 
randomly chosen from a population of 10,000. So their amounts of duplication 
were the same. But these are the statistics from sorting:
- First position:   18,603,981 equality comparisons, 29.87% equal
- Second position:   5,556,203 equality comparisons,  0.09% equal
Many first positions matched, almost no second positions matched.

One more idea: What do you think of sorting lists of tuples (or tuple keys) in 
two stages?
1) Sort the list only by first position (comparing with only one 
tuple_elem_compare).
2) Identify equal-first-position-runs (with tuple_elem_compare) and sort each 
run independently (comparing only positions 1+).
Some experiments I did with this looked good, not sure if too off-topic to post 
here...

--

___
Python tracker 
<https://bugs.python.org/issue45530>



[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-20 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

Hmm. What do you think about using "<" inside the loop for the remaining tuple 
elements as well? It's even less likely that both the first and the second 
elements are equal.

When the second elements differ, this saves 0.5 PyObject_RichCompareBool 
(average 1.5 "<" instead of one "==" and one "<").

When the second elements are equal, this costs 1 PyObject_RichCompareBool more 
(two "<" instead of one "==").

That's fewer PyObject_RichCompareBool calls if they're equal less than 1/3 of 
the time.
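
The 1/3 figure can be sanity-checked with a tiny cost model (my own sketch of the argument above, not code from the thread):

```python
# Expected PyObject_RichCompareBool calls per second-element comparison,
# as a function of p = probability that the second elements are equal.

def cost_lt_only(p):
    # "<" and, only if it was False, the reversed "<":
    # unequal elements need 1.5 calls on average, equal ones need 2.
    return p * 2 + (1 - p) * 1.5

def cost_eq_then_lt(p):
    # "==" first (1 call if equal), then "<" (2 calls total if unequal).
    return p * 1 + (1 - p) * 2

# The two strategies cost the same exactly at p = 1/3; below that,
# the "<"-only strategy wins.
assert abs(cost_lt_only(1/3) - cost_eq_then_lt(1/3)) < 1e-12
assert cost_lt_only(0.2) < cost_eq_then_lt(0.2)
assert cost_lt_only(0.5) > cost_eq_then_lt(0.5)
```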

--

___
Python tracker 
<https://bugs.python.org/issue45530>



[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-20 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

Ok, I agree, the change only possibly "breaks" expectations that were already 
broken (including my naive sorted-vs-minmax).

Yes, I know sort itself only asks `<` (I've actually read most of its code and 
all its documentation). That's why I was irritated for a moment when I saw a 
`>`.

--

___
Python tracker 
<https://bugs.python.org/issue45530>



[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-19 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

I misinterpreted you then, sorry. I guess also because Elliot shortly after 
retreated from the approach, saying he "rewrote unsafe_tuple_compare to move the 
less-than after the equality testing, to make sure it's 100% consistent".

I'd say the inconsistency the program showed was different. Yes, the xs got 
sorted differently than the ys, but the xs differed from the ys. The proposed 
method would change comparisons of *the same* tuples. You'd for example no 
longer have "sorted(tuples) == [min(tuples), max(tuples)]" for a list of two 
tuples.

Also curious factoid: I recently saw a case where despite the "only <" promise, 
sorting caused an "x > y" comparison (because "y < x" wasn't implemented).

(tiny note: tupsort.py does "range(2", should be "range(1" I think)

--

___
Python tracker 
<https://bugs.python.org/issue45530>



[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-19 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

> What justifies "shouldn't"?

I based that on your old comments. Looked like tuple comparison results in 
sorting shouldn't differ from tuple comparison results elsewhere. I had 
actually proposed this exact method in a comment under your stackoverflow 
answer, but deleted it when I found that it had been discussed and discarded 
back then. But if it feels ok now, then good :-)

--

___
Python tracker 
<https://bugs.python.org/issue45530>



[issue45530] Improve listobject.c's unsafe_tuple_compare()

2021-10-19 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

That's how Elliot did it in the original proposal: 
https://bugs.python.org/issue28685

Back then you pointed out that "Else we can assume u[0] == v[0]" isn't true for 
float('nan') values:
https://github.com/python/cpython/pull/582#issuecomment-285838656
https://github.com/python/cpython/pull/582#issuecomment-285841789

If you sort objects that always return True for both `<` and `==`, this would 
also have the opposite problem, considering tuple u smaller than v when it 
shouldn't.

That said, maybe the function could be optimized for known "well-behaving" 
types?

--
nosy: +Stefan Pochmann

___
Python tracker 
<https://bugs.python.org/issue45530>



[issue45346] Some "truthy" crept in

2021-10-02 Thread Stefan Pochmann


Change by Stefan Pochmann :


--
type:  -> enhancement

___
Python tracker 
<https://bugs.python.org/issue45346>



[issue45346] Some "truthy" crept in

2021-10-02 Thread Stefan Pochmann


New submission from Stefan Pochmann :

Python consistently says true/false, not truthy/falsy. But a few "truthy" have 
crept in now:

https://github.com/python/cpython/search?q=truthy
- Two in Doc/reference/compound_stmts.rst
- One in Doc/library/ast.rst
- One in Lib/test/test_builtin.py

Evidence for consistent true/false usage, if needed:
https://docs.python.org/3/library/stdtypes.html#truth-value-testing
https://docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not
https://docs.python.org/3/reference/compound_stmts.html#the-if-statement
https://docs.python.org/3/reference/compound_stmts.html#the-while-statement
https://docs.python.org/3/reference/expressions.html#boolean-operations

--
assignee: docs@python
components: Documentation
messages: 403056
nosy: Stefan Pochmann, docs@python
priority: normal
severity: normal
status: open
title: Some "truthy" crept in
versions: Python 3.10, Python 3.11

___
Python tracker 
<https://bugs.python.org/issue45346>



[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

I misspoke. I do the insertions at -1 and -500 not on the same list but each on 
a fresh list (of length 2**20+1).

--

___
Python tracker 
<https://bugs.python.org/issue39801>



[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

I think I have a decent way to isolate the memmove vs for-loop times, in Python 
code. Here are example results, five rounds of inserting with list.insert or 
with slice assignment. The times for each of the two ways are pretty stable:

    insert     slice
  0.024221  0.006754
  0.024157  0.006710
  0.024015  0.006734
  0.024206  0.006731
  0.024123  0.006769

For each run, I build a list of length 2**20 and append one value. Then I can 
insert 2**17 more values without the list internally resizing.

Then I measure inserting at index -1 and consider that the baseline, as it 
includes all the overhead costs like function calls and other differences 
between the two insertion methods.

Then I measure inserting at index -500 and subtract the baseline time. This 
difference should reflect how long the memmove or the for-loop took, as that's 
the difference between inserting at -1 and at -500. It's what I showed in those 
results above.

I chose -500 here because the use case I have in mind is 
sortedcontainers.SortedList, which I think mostly involves lists of length 1000 
or more (up to 2000), and insertions somewhere in the middle.

Results for index -50 instead:

    insert     slice
  0.002479  0.002217
  0.002546  0.002113
  0.002566  0.002007
  0.002425  0.002081
  0.002555  0.002261

I'm btw running Python 3.8.1 32-bit on Windows 10 64-bit on an Intel i5-7200U.

Rest of this message is the benchmark code:

from timeit import repeat

statements = (
    'a.insert(-1,0)',
    'a.insert(-50,0)',
    'a[-1:0] = 0,',
    'a[-50:0] = 0,',
)

# Verify that the statements work and don't cause internal list resizes.
for stmt in statements:
    a = [0] * 2**20
    a.append(0)
    size_before = a.__sizeof__()
    stmt = compile(stmt, '', 'single')
    for _ in range(2**17):
        exec(stmt)
    assert len(a) == 2**20 + 1 + 2**17
    assert a.__sizeof__() == size_before

# Run the benchmark.
print('    insert     slice')
for _ in range(5):
    times = []
    for stmt in statements:
        t = min(repeat(stmt, 'a=[0]*2**20; a.append(0)', number=2**17,
                       repeat=20))
        times.append(t)
    print('%10.6f' * 2 % (times[1] - times[0], times[3] - times[2]))

--
status: pending -> open

___
Python tracker 
<https://bugs.python.org/issue39801>



[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Stefan Pochmann


Change by Stefan Pochmann :


--
status: open -> pending

___
Python tracker 
<https://bugs.python.org/issue39801>



[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

I have better benchmarks now and am trying some more, though I guess we roughly 
agree about when memmove is faster and when it's slower but that the real 
question is how large the common case is. 

Do you know where I can find those past investigations? Was memmove *faster* 
for sizes *over* 16? What is the common case, i.e., how do people use 
list.insert?

--
status: pending -> open

___
Python tracker 
<https://bugs.python.org/issue39801>



[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

Good point, Tim. Although binarysort really moves very few slots (I think at 
most 62, and average like 13). That might not be representative for how people 
use list.insert, either. I think I mainly use it for the mentioned case of a 
"self-sorting list", either naively via bisect.insort with a single list or via 
sortedcontainers' SortedList using multiple lists (used it only once so far). 
If I'm not mistaken, SortedList has a default load factor of 1000 and splits at 
double that size.

I might do more reasonable benchmarks later (haven't read Raymond's reply yet).

In any case, binarysort does its own inserting, so at least it wouldn't be 
affected if list.insert were changed.

--

___
Python tracker 
<https://bugs.python.org/issue39801>



[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

Benchmarking with two *Python* versions of bisect.insort, the "insert" version 
takes 1.08 seconds to insort the shuffled range(10**5) while the slice 
assignment version only takes 0.46 seconds:

from timeit import timeit
import random
from bisect import bisect_right

def insort1(a, x):
    lo = bisect_right(a, x)
    a.insert(lo, x)

def insort2(a, x):
    lo = bisect_right(a, x)
    a[lo:lo] = [x]

for _ in range(3):
    a = list(range(10**5))
    random.shuffle(a)
    for insort in insort1, insort2:
        it = iter(a)
        s = []
        print(timeit(lambda: insort(s, next(it)), number=len(a)))
    print()

--

___
Python tracker 
<https://bugs.python.org/issue39801>



[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

I believe it also affects bisect.insort, which I occasionally use when I need a 
"self-sorting list" (I can't easily test it, as I'm not set up to modify the C 
version of bisect.insort).

And also the popular sortedcontainers package, which relies on such list 
operations to be fast: http://www.grantjenks.com/docs/sortedcontainers/

--

___
Python tracker 
<https://bugs.python.org/issue39801>



[issue39801] list.insert is slow, likely due to manual memmove

2020-02-29 Thread Stefan Pochmann


Change by Stefan Pochmann :


--
title: list.insert is slow due to manual memmove -> list.insert is slow, likely 
due to manual memmove

___
Python tracker 
<https://bugs.python.org/issue39801>



[issue39801] list.insert is slow due to manual memmove

2020-02-29 Thread Stefan Pochmann


New submission from Stefan Pochmann :

Using a list's insert function is much slower than using slice assignment:

> python -m timeit -n 100000 -s "a=[]" "a.insert(0,0)"
100000 loops, best of 5: 19.2 usec per loop

> python -m timeit -n 100000 -s "a=[]" "a[0:0]=[0]"
100000 loops, best of 5: 6.78 usec per loop

(Note that the list starts empty but grows to 100,000 elements.)

At first I thought maybe it's the attribute lookup or function call overhead or 
so, but inserting near the end shows that that's negligible:

> python -m timeit -n 10 -s "a=[]" "a.insert(-1,0)"
10 loops, best of 5: 79.1 nsec per loop

I asked at StackOverflow and someone pointed out that list.insert uses a manual 
loop instead of memmove: https://stackoverflow.com/a/60466572/12671057
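
The shell timings above can also be reproduced from a single script (a sketch using the timeit module; function names are mine and absolute numbers will vary by machine):

```python
from timeit import timeit

# Grow a list by repeatedly inserting at the front, once with
# list.insert and once with slice assignment, and time each.
def front_insert(n):
    a = []
    for _ in range(n):
        a.insert(0, 0)
    return a

def front_slice(n):
    a = []
    for _ in range(n):
        a[0:0] = [0]
    return a

n = 10_000
t_insert = timeit(lambda: front_insert(n), number=1)
t_slice = timeit(lambda: front_slice(n), number=1)
print(f"insert: {t_insert:.4f}s  slice: {t_slice:.4f}s")
```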

--
components: Interpreter Core
messages: 362986
nosy: Stefan Pochmann
priority: normal
severity: normal
status: open
title: list.insert is slow due to manual memmove
type: performance
versions: Python 3.8

___
Python tracker 
<https://bugs.python.org/issue39801>
___



[issue39521] reversed(mylist) much slower on Python 3.8 32-bit for Windows

2020-02-01 Thread Stefan Pochmann


Stefan Pochmann  added the comment:

Oh right. The times are correct, I just summarized wrongly there.

--

___
Python tracker 
<https://bugs.python.org/issue39521>
___



[issue39521] reversed(mylist) much slower on Python 3.8 32-bit for Windows

2020-02-01 Thread Stefan Pochmann


Change by Stefan Pochmann :


--
title: reversed(mylist) much slower on Python 3.8.1 32-bit for Windows -> 
reversed(mylist) much slower on Python 3.8 32-bit for Windows

___
Python tracker 
<https://bugs.python.org/issue39521>
___



[issue39521] reversed(mylist) much slower on Python 3.8.1 32-bit for Windows

2020-02-01 Thread Stefan Pochmann


New submission from Stefan Pochmann :

Somehow `reversed` became much slower than `iter` on lists:

List with 1,000 elements:

> python -m timeit -s "a = list(range(1000))" "list(iter(a))"
5 loops, best of 5: 5.73 usec per loop

> python -m timeit -s "a = list(range(1000))" "list(reversed(a))"
2 loops, best of 5: 14.2 usec per loop

List with 1,000,000 elements:

> python -m timeit -s "a = list(range(250)) * 4000" "list(iter(a))"
50 loops, best of 5: 7.08 msec per loop

> python -m timeit -s "a = list(range(250)) * 4000" "list(reversed(a))"
20 loops, best of 5: 15.5 msec per loop

On another machine I tried ten different Python versions and found out it's 
only version 3.8.1 and only the 32-bit version:

32-bit  64-bit
CPython iter   reversed iter   reversed
 3.5.4  19.8 19.9   22.4 22.7
 3.6.8  19.8 19.9   22.3 22.6
 3.7.6  19.9 19.9   22.3 22.5
 3.8.1  19.8 24.9   22.4 22.6

Another time with 3.8.0 instead of 3.8.1:

32-bit  64-bit
CPython iter   reversed iter   reversed
 3.5.4  19.5 19.6   21.9 22.2
 3.6.8  19.5 19.7   21.8 22.1
 3.7.6  19.5 19.6   21.7 22.0
 3.8.0  19.4 24.5   21.7 22.1

I used the "Stable Releases" "executable installer"s from here:
https://www.python.org/downloads/windows/

More details here: https://stackoverflow.com/q/60005302/12671057
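
For reference, the shell timings can be reproduced from a script (a sketch; the 32-bit 3.8 slowdown will of course only show up on an affected build):

```python
from timeit import timeit

a = list(range(1000))
t_iter = timeit(lambda: list(iter(a)), number=10_000)
t_rev = timeit(lambda: list(reversed(a)), number=10_000)
print(f"iter: {t_iter:.3f}s  reversed: {t_rev:.3f}s")

# Sanity check: both traverse the same elements, in opposite order.
assert list(reversed(a)) == list(iter(a))[::-1]
```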

--
components: Build
messages: 361195
nosy: Stefan Pochmann
priority: normal
severity: normal
status: open
title: reversed(mylist) much slower on Python 3.8.1 32-bit for Windows
type: performance
versions: Python 3.8

___
Python tracker 
<https://bugs.python.org/issue39521>
___



[issue32767] Mutating a list while iterating: clarify the docs

2018-02-07 Thread Stefan Pochmann

Stefan Pochmann <stefan.pochm...@gmail.com> added the comment:

And `bytearray`.

--

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32767>
___



[issue32767] Mutating a list while iterating: clarify the docs

2018-02-06 Thread Stefan Pochmann

Stefan Pochmann <stefan.pochm...@gmail.com> added the comment:

Yes, there's also `array.array`.
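
The pitfall the issue describes for lists applies to these types too; a small illustration with a bytearray (example is mine, not from the issue):

```python
b = bytearray(b"abcd")
seen = []
for x in b:
    seen.append(x)
    del b[0]  # mutating while iterating: the index-based iterator skips elements

# Only 'a' (97) and 'c' (99) were visited; 'b' and 'd' were skipped.
print(seen)  # [97, 99]
print(b)     # bytearray(b'cd')
```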

--
nosy: +Stefan Pochmann

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32767>
___



[issue32341] itertools "generators"?

2017-12-15 Thread Stefan Pochmann

New submission from Stefan Pochmann <stefan.pochm...@gmail.com>:

The itertools page https://docs.python.org/3.6/library/itertools.html says 
"Combinatoric generators:" above the table with product(), permutations(), etc. 
But as far as I understand, they're not generators and they don't produce 
generator iterators. I think the table title should be "Combinatoric 
iterators:" instead, in line with "Infinite Iterators:" and "Iterators 
terminating on the shortest input sequence:" of the two tables above it.

Reference:
https://docs.python.org/3/glossary.html#term-generator
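
This is easy to verify at the interpreter (a small check consistent with the glossary definition):

```python
import itertools
import types

p = itertools.permutations('ab')
# Generator iterators are instances of types.GeneratorType;
# itertools objects are ordinary iterator classes implemented in C.
print(isinstance(p, types.GeneratorType))  # False
print(type(p).__name__)                    # permutations

def gen():
    yield 1

print(isinstance(gen(), types.GeneratorType))  # True
```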

--
assignee: docs@python
components: Documentation
messages: 308425
nosy: Stefan Pochmann, docs@python
priority: normal
severity: normal
status: open
title: itertools "generators"?
versions: Python 2.7, Python 3.5, Python 3.6, Python 3.7

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32341>
___



[issue31082] reduce takes iterable, not just sequence

2017-07-30 Thread Stefan Pochmann

New submission from Stefan Pochmann:

functools.reduce has a parameter called "iterable" and it only needs to be an 
iterable, not a sequence.

The paragraph documenting it says "sequence" instead of "iterable" six times:
https://docs.python.org/3/library/functools.html#functools.reduce

The help shown by executing "help(functools.reduce)" in Python additionally 
actually names the parameter "sequence" in the signature.
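
A quick illustration that reduce accepts any iterable, not just a sequence (example is mine):

```python
from functools import reduce
import operator

# A generator expression is an iterable but not a sequence
# (no len(), no indexing), yet reduce consumes it fine.
squares = (i * i for i in range(5))
total = reduce(operator.add, squares)
print(total)  # 30
```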

--
assignee: docs@python
components: Documentation
messages: 299520
nosy: Stefan Pochmann, docs@python
priority: normal
severity: normal
status: open
title: reduce takes iterable, not just sequence
type: enhancement
versions: Python 3.6, Python 3.7

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue31082>
___



[issue31071] Bad error message about maps not iterable

2017-07-28 Thread Stefan Pochmann

New submission from Stefan Pochmann:

Python 3.6 makes it sound like maps aren't iterable:

>>> map(str, *map(int, [[]]))
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    map(str, *map(int, [[]]))
TypeError: type object argument after * must be an iterable, not map

More, including a likely explanation, in my question and its answer here:
https://stackoverflow.com/q/45363330/1672429

Apparently the TypeError from int([]) gets mistaken for a TypeError indicating 
non-iterability of the map object.
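
A minimal way to see the two TypeErrors involved (a sketch; the exact message wording varies between Python versions):

```python
# The inner failure: int() rejects a list argument.
try:
    int([])
except TypeError as e:
    inner_msg = str(e)

# The same failure raised during * unpacking; in Python 3.6 the
# unpacking code misreported the map object itself as not iterable.
try:
    map(str, *map(int, [[]]))
except TypeError as e:
    outer_msg = str(e)

print(inner_msg)
print(outer_msg)
```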

--
messages: 299402
nosy: Stefan Pochmann
priority: normal
severity: normal
status: open
title: Bad error message about maps not iterable
type: behavior
versions: Python 3.6

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue31071>
___



[issue29709] Short-circuiting not only on False and True

2017-03-03 Thread Stefan Pochmann

New submission from Stefan Pochmann:

The notes at 
https://docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not 
say that `or` "only evaluates the second argument if the first one is False" 
and that `and` "only evaluates the second argument if the first one is True". 
Should say "false" and "true" instead of "False" and "True".
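
A short demonstration that short-circuiting keys on truthiness, not the bool singletons False and True (illustrative example, not from the docs):

```python
# Falsey/truthy non-bool operands short-circuit the same way:
print([] or "fallback")  # fallback
print("x" and 42)        # 42

calls = []
def side_effect():
    calls.append(1)
    return True

# First operand is a falsey non-bool, so the right side never runs:
result = 0 and side_effect()
print(result, calls)  # 0 []
```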

--
assignee: docs@python
components: Documentation
messages: 21
nosy: Stefan Pochmann, docs@python
priority: normal
severity: normal
status: open
title: Short-circuiting not only on False and True
type: enhancement
versions: Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue29709>
___



[issue29580] "Built-in Functions" not being functions

2017-02-16 Thread Stefan Pochmann

Stefan Pochmann added the comment:

The page also contains many references like "As repr(), return a string 
containing..." where the "()" should be removed.

--

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue29580>
___



[issue29580] "Built-in Functions" not being functions

2017-02-16 Thread Stefan Pochmann

New submission from Stefan Pochmann:

About https://docs.python.org/3/library/functions.html: The title "Built-in 
Functions", the table header "Built-in Functions" and the "functions" in the 
URL all suggest that what's on this page are functions. But many things on that 
page don't appear to be functions, like "range" or "dict". They appear to be 
callable types instead. For example "range":

>>> type(range)
<class 'range'>
>>> import types
>>> isinstance(range, types.FunctionType)
False
>>> isinstance(range, types.BuiltinFunctionType)
False

Compare with "abs":

>>> type(abs)
<class 'builtin_function_or_method'>
>>> isinstance(abs, types.FunctionType)
False
>>> isinstance(abs, types.BuiltinFunctionType)
True

The text between title and table talks about "functions and types", but the 
title/tableheader/URL disagree.

Also, the table is wrong in saying that "abs()" etc are functions. Should say 
"abs" instead, i.e., remove the "()".

Many, like "abs", aren't even allowed to be called like that, as it results in 
"TypeError: abs() takes exactly one argument (0 given)".

--
assignee: docs@python
components: Documentation
messages: 287961
nosy: Stefan Pochmann, docs@python
priority: normal
severity: normal
status: open
title: "Built-in Functions" not being functions
type: enhancement
versions: Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue29580>
___