[issue42699] Use `.join(k for k in g)` instead of `.join([k for k in g])`

2020-12-21 Thread Steven D'Aprano


Steven D'Aprano  added the comment:

On Mon, Dec 21, 2020 at 09:11:48PM +, Samuel Marks wrote:

> There were only 12k occurrences, I'm sure I could manually go through that
> in an afternoon. Would you accept it then?

Assuming "an afternoon" is half a work day, so 4 hours, that's 1.2 
seconds per occurrence. So no, not even if you were a trusted core 
developer.
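
For anyone checking the figure, a quick sketch of the arithmetic (the 4-hour afternoon is the assumption stated above; the variable names are only for illustration):

```python
# Review budget per occurrence, using the figures quoted in this thread.
occurrences = 12_000                 # "12k occurrences"
afternoon_hours = 4                  # assumed: half a work day
seconds_available = afternoon_hours * 60 * 60
seconds_per_occurrence = seconds_available / occurrences
print(f"{seconds_per_occurrence:.1f} seconds per occurrence")
```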

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue42699] Use `.join(k for k in g)` instead of `.join([k for k in g])`

2020-12-21 Thread Samuel Marks


Samuel Marks  added the comment:

There were only 12k occurrences, I'm sure I could manually go through that
in an afternoon. Would you accept it then?

On Tue, 22 Dec 2020, 12:22 am Eric V. Smith,  wrote:

>
> Eric V. Smith  added the comment:
>
> See https://github.com/ikamensh/flynt#dangers-of-conversion for reasons.
>
> Would I like to see all string literal formatting done with f-strings? Yes!
>
> Would I accept the risk and hassle of doing it blindly? No.
>

--
nosy: +SamuelMarks




[issue42699] Use `.join(k for k in g)` instead of `.join([k for k in g])`

2020-12-21 Thread Eric V. Smith


Eric V. Smith  added the comment:

See https://github.com/ikamensh/flynt#dangers-of-conversion for reasons.

Would I like to see all string literal formatting done with f-strings? Yes!

Would I accept the risk and hassle of doing it blindly? No.
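
As an illustration of the kind of hazard meant here (my own minimal example, not quoted from the linked page): %-formatting treats a tuple on the right-hand side as the argument list, while an f-string formats the tuple itself, so a mechanical rewrite can change the output. The helper names below are hypothetical.

```python
# Illustration only: the same value formatted two ways.
def percent_style(arg):
    # With a tuple on the right, % unpacks it as the argument list.
    try:
        return "%s" % arg
    except TypeError as exc:
        return f"TypeError: {exc}"


def fstring_style(arg):
    # An f-string formats the object itself, tuple or not.
    return f"{arg}"


for value in ((1,), ("a", "b")):
    print(repr(value), "->", percent_style(value), "|", fstring_style(value))
```

A one-element tuple silently yields different text, and a two-element tuple raises under `%` but not in the f-string, which is why each occurrence needs a human look.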




[issue42699] Use `.join(k for k in g)` instead of `.join([k for k in g])`

2020-12-21 Thread Samuel Marks


Samuel Marks  added the comment:

EDIT: Just found https://github.com/ikamensh/flynt




[issue42699] Use `.join(k for k in g)` instead of `.join([k for k in g])`

2020-12-21 Thread Samuel Marks

Samuel Marks  added the comment:

I suppose that's a good justification to never improve/upgrade the syntax and 
quality of the codebase.

In terms of automatic upgrades of the codebase, one could always replicate the 
approach I use in doctrans—i.e., use of `ast` and/or `inspect`—to automatically 
upgrade syntax from `%` and `.format` to use f-strings.

Would that be acceptable?
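
A minimal sketch of the detection half of such a tool, using only the standard `ast` module. This is not doctrans's code; the function name and sample source are hypothetical. It only locates `"literal" % something` expressions; deciding whether each rewrite is safe is the harder part.

```python
import ast


def find_percent_format_candidates(source):
    """Return sorted (line, column) pairs for `"literal" % something`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.BinOp)
            and isinstance(node.op, ast.Mod)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.left.value, str)
        ):
            hits.append((node.lineno, node.col_offset))
    return sorted(hits)


# Hypothetical sample: one string format, one integer modulo.
sample = 'name = "x"\nmsg = "hello %s" % name\nn = 7 % 3\n'
print(find_percent_format_candidates(sample))
```

The integer modulo on the last line is skipped because its left operand is not a string literal.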




[issue42699] Use `.join(k for k in g)` instead of `.join([k for k in g])`

2020-12-21 Thread Eric V. Smith


Eric V. Smith  added the comment:

> Wait I don't understand why you wouldn't accept a wholesale replacement of 
> all `%` and `format` with f-strings through the entire CPython codebase 
> [master branch]?

For such a large change it's difficult to review every single change and ensure 
it's correct. I'm guessing there are thousands of occurrences.

In addition, it runs the risk of breaking any existing pull requests.

And in the vast majority of cases it wouldn't make any noticeable performance 
difference. So why risk the breakage and endure the hassle?




[issue42699] Use `.join(k for k in g)` instead of `.join([k for k in g])`

2020-12-21 Thread Samuel Marks


Samuel Marks  added the comment:

Wait I don't understand why you wouldn't accept a wholesale replacement of all 
`%` and `format` with f-strings through the entire CPython codebase [master 
branch]?

BTW: Kinda unrelated, but would be great to have perspective on this little 
project - https://github.com/SamuelMarks/doctrans - I'm interested in wholesale 
enforcement of code consistency and translating between constructs (e.g., 
dataclass to/fro argparse to/fro class method; ReST to/fro Google to/fro NumPy 
docstrings; inline types to/fro types in docstrings; with more planned)




[issue42699] Use `.join(k for k in g)` instead of `.join([k for k in g])`

2020-12-21 Thread Eric V. Smith


Eric V. Smith  added the comment:

@samuelmarks:

One place where it is possible to improve performance is replacing 
%-formatting or str.format with f-strings. This does move significant work to 
compile time.

However, we'd be unlikely to accept a wholesale stdlib change that swaps in 
f-strings. Instead, if there were specific places where benchmarks showed real 
world improvements, we should look at those on a case-by-case basis.

Also note that replacing %-formatting with .format() is almost always a 
performance pessimization.
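
For anyone who wants to measure this on their own machine, a small sketch (the format string, values and run count are arbitrary choices for illustration, and timings will vary by build and platform):

```python
import timeit

name, count = "spam", 3

# The three spellings produce the same text.
percent = "%s has %d items" % (name, count)
dot_format = "{} has {} items".format(name, count)
f_string = f"{name} has {count} items"
assert percent == dot_format == f_string

setup = "name, count = 'spam', 3"
statements = [
    ("%-formatting", "'%s has %d items' % (name, count)"),
    ("str.format", "'{} has {} items'.format(name, count)"),
    ("f-string", "f'{name} has {count} items'"),
]
for label, stmt in statements:
    seconds = timeit.timeit(stmt, setup=setup, number=100_000)
    print(f"{label:>13}: {seconds:.4f} s for 100,000 runs")
```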




[issue42699] Use `.join(k for k in g)` instead of `.join([k for k in g])`

2020-12-21 Thread Samuel Marks

Samuel Marks  added the comment:

Yeah I hear ya, was just trying for the most concise issue title. I tend to 
[over]use `map`, `filter`, `filterfalse` and other `itertools` and `operator` 
methods in my own codebase.

Surprised with that result, that using an explicit list is actually faster. 
Seems like an obvious* micro-optimisation. *But don't want to say that unless 
I'm actually maintaining/contributing-to your C code.

It's also surprising—last time I checked—that lists are faster to construct 
than tuples. When I create codebases or maintain other people's, I try to:
- remove all unnecessary mutability (incl. replacing lists with tuples);
- flatten `.append` occurrences into generator comprehensions or map;
- remove all indentation-creating for-loops, replacing them with 
comprehensions or map and functions or lambdas;
- combine generators rather than concatenate lists.

The general idea here is to evaluate at the time of computation, and be lazy 
everywhere else. So searching the whole codebase for lists and other mutable 
structures is a good first step.
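
A small sketch of what those rewrites look like (the data and names are made up for illustration):

```python
from itertools import chain

words = ["alpha", "beta", "gamma"]

# Loop-and-append form.
upper_loop = []
for word in words:
    upper_loop.append(word.upper())

# Same result as a comprehension (eager: builds the list immediately).
upper_list = [word.upper() for word in words]

# Lazy form: nothing is computed until something consumes it.
upper_lazy = (word.upper() for word in words)

# Combine iterables without building an intermediate concatenated list.
combined = chain(upper_lazy, ("DELTA",))
result = ", ".join(combined)
print(result)
```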

But maybe efficiency losses, like those shown here, mean that this would only 
aid [maybe] readability, understandability, traceability & whatever other 
functional -ility; but not performance?

:(




[issue42699] Use `.join(k for k in g)` instead of `.join([k for k in g])`

2020-12-20 Thread Steven D'Aprano


Steven D'Aprano  added the comment:

By the way, it is almost always wrong to write "k for k in iterable" when you 
can just write "iterable" or "list(iterable)".
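
A quick sketch showing that the four spellings are interchangeable as far as the result goes (using `ascii_letters` as the input, as in the timings in this message):

```python
from string import ascii_letters

via_genexp = "".join(k for k in ascii_letters)
via_listcomp = "".join([k for k in ascii_letters])
via_list = "".join(list(ascii_letters))
via_iterable = "".join(ascii_letters)

# Identical output; only the amount of work done differs.
print(via_genexp == via_listcomp == via_list == via_iterable)
```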

Here are some micro-benchmarks:


[steve ~]$ python3.9 -m timeit -s "from string import ascii_letters" "''.join(k for k in ascii_letters)"
10 loops, best of 5: 2.3 usec per loop

[steve ~]$ python3.9 -m timeit -s "from string import ascii_letters" "''.join([k for k in ascii_letters])"
20 loops, best of 5: 1.57 usec per loop

[steve ~]$ python3.9 -m timeit -s "from string import ascii_letters" "''.join(list(ascii_letters))"
50 loops, best of 5: 749 nsec per loop

[steve ~]$ python3.9 -m timeit -s "from string import ascii_letters" "''.join(ascii_letters)"
50 loops, best of 5: 737 nsec per loop

--
nosy: +steven.daprano




[issue42699] Use `.join(k for k in g)` instead of `.join([k for k in g])`

2020-12-20 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

Sorry Samuel, but this would be a performance degradation.  The reason is that 
the algorithm of str.join makes two passes over the input, so it runs faster 
when the input is already a list; otherwise, it would have to do the additional 
work of creating a list.

--
nosy: +rhettinger
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed




[issue42699] Use `.join(k for k in g)` instead of `.join([k for k in g])`

2020-12-20 Thread Ken Jin


Ken Jin  added the comment:

Sorry for intruding, but I thought I'd offer some rudimentary, non-scientific 
benchmarks for this:

[MSC v.1928 32 bit (Intel)] on win32  # a debug build of python, no compiler optimizations

import timeit
# gen comp
timeit.timeit("''.join(str(_) for _ in range(1000))", number=1)
11.15456029957

# list comp
timeit.timeit("''.join([str(_) for _ in range(1000)])", number=1)
9.98751089961

The list comp is slightly faster than the gen comp. Interestingly, if one were 
to use python -m timeit instead, the gen comp would show better results since 
it has a better 'best of 5' timing. IMO, total time is a more accurate 
representation than best of 5 since the latter gets skewed by outliers.
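
To look at both views side by side, a rough sketch using `timeit.repeat` (the `measure` helper and the repeat/number values are arbitrary choices for illustration, and results depend on the build and machine):

```python
import timeit

GEN = "''.join(str(_) for _ in range(1000))"
LIST = "''.join([str(_) for _ in range(1000)])"


def measure(stmt, repeat=5, number=200):
    """Time `stmt` `repeat` times, `number` executions per run."""
    runs = timeit.repeat(stmt, repeat=repeat, number=number)
    return {"best": min(runs), "total": sum(runs), "runs": runs}


for label, stmt in (("gen comp", GEN), ("list comp", LIST)):
    stats = measure(stmt)
    print(f"{label:>9}: best={stats['best']:.4f}s total={stats['total']:.4f}s")
```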

--
nosy: +kj




[issue42699] Use `.join(k for k in g)` instead of `.join([k for k in g])`

2020-12-20 Thread Josh Rosenberg


Josh Rosenberg  added the comment:

This is a pessimization given the current implementation of str.join; it calls 
PySequence_Fast as the very first step, which is effectively free for a tuple 
or list input (just reference count manipulation), but must convert a generator 
expression to a list (which is slower than building the list with a listcomp in 
the first place).

It does this so it can do two passes, one to compute the final length (and max 
ordinal) of the string, allowing it to allocate just once, and one to build the 
new string.

In theory, it might be rewritten to use PyUnicodeWriter under-the-hood for 
single-pass operation, but as is, a generator expression is slower than a 
listcomp for this task.
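
A pure-Python model of that two-pass shape, for illustration only. The real code is C inside CPython; `two_pass_join` is a hypothetical name, and the model ignores the max-ordinal tracking mentioned above.

```python
def two_pass_join(separator, iterable):
    """Toy model of a join that materialises its input, then makes two passes."""
    # Stand-in for PySequence_Fast: lists and tuples are used as-is,
    # anything else (such as a generator) is first copied into a list.
    if isinstance(iterable, (list, tuple)):
        items = iterable
    else:
        items = list(iterable)

    # Pass 1: compute the final length so the buffer is allocated once.
    total = sum(len(item) for item in items)
    if items:
        total += len(separator) * (len(items) - 1)

    # Pass 2: copy each piece into its place in the preallocated buffer.
    buffer = [""] * total
    position = 0
    for index, item in enumerate(items):
        if index:
            buffer[position:position + len(separator)] = separator
            position += len(separator)
        buffer[position:position + len(item)] = item
        position += len(item)
    return "".join(buffer)  # the model leans on the builtin for the final string


print(two_pass_join(", ", (word for word in ["a", "bc", "def"])))
```

The point of the model is the first step: a generator pays for the extra copy into a list before either pass can start.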

--
nosy: +josh.r




[issue42699] Use `.join(k for k in g)` instead of `.join([k for k in g])`

2020-12-20 Thread Samuel Marks


Samuel Marks  added the comment:

@eric.smith No benchmarks offhand, but I'd expect it to be a very minor 
improvement (if detectable).

If this gets accepted I'll probably do a bunch of little changes like this, to 
improve things, e.g., replace '%' with '.format' (or f-strings, whatever you 
prefer), ensure `.iterkeys()`/`.iteritems()` validity, and collapse some 
obvious `.append` cases with list comprehensions.

The idea I'm going off is that when one is debugging their Python code and 
steps into the Python source, the quality of that source code should be as 
good as or better than the code the higher-level Python developer is writing.

Constructing unnecessary lists is one such code quality issue.




[issue42699] Use `.join(k for k in g)` instead of `.join([k for k in g])`

2020-12-20 Thread Eric V. Smith


Eric V. Smith  added the comment:

Do you have any benchmarks to show this is an actual improvement? Oftentimes 
it is not.

--
nosy: +eric.smith




[issue42699] Use `.join(k for k in g)` instead of `.join([k for k in g])`

2020-12-20 Thread Samuel Marks

New submission from Samuel Marks :

This is an extremely minor improvement. Rather than create a `list`—using a 
comprehension—then have it consumed by `.join`, one can skip the list 
construction entirely.

(I remember this working from at least Python 2.7… probably earlier also)

--
messages: 383474
nosy: samuelmarks
priority: normal
pull_requests: 22737
severity: normal
status: open
title: Use `.join(k for k in g)` instead of `.join([k for k in g])`
type: performance
versions: Python 3.10, Python 3.6, Python 3.7, Python 3.8, Python 3.9
