[issue43923] Can't create generic NamedTuple as of py3.9

2021-11-10 Thread David Lukeš

David Lukeš  added the comment:

This is unfortunate, especially since it used to work... Going forward, is the 
intention not to support this use case? Or is it possible that support for 
generic NamedTuples will be re-added in the future?

--
nosy: +dlukes

___
Python tracker 
<https://bugs.python.org/issue43923>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue43604] Fix tempfile.mktemp()

2021-03-23 Thread David Lukeš

David Lukeš  added the comment:

> You can use TemporaryDirectory.

That was actually the first approach I tried :) I even thought this could be 
used to make `mktemp` safe -- just create the name in a `TemporaryDirectory`.

However, after reading through the mailing list thread, I realized this just 
restricts the potential collision/hijacking to misbehaving/malicious processes 
running under the same user or under the super user. But the core problem with 
too easily guessable filenames (= not random enough, or not at all, as in your 
example) remains. Correct me if I'm wrong though.
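For reference, here is a minimal sketch of the `TemporaryDirectory` approach as I understand it. The helper name `make_fifo_in_private_dir` and the fixed file name `fifo` are made up for illustration. The directory is created for the owner only, so the fixed name inside it is only reachable by processes running as the same user or as root, which is exactly the residual exposure described above.

```python
import os
import stat
import tempfile


def make_fifo_in_private_dir():
    """Create a FIFO with a fixed name inside a fresh private directory.

    Hypothetical helper for illustration only. Returns the
    TemporaryDirectory object (which owns cleanup) and the FIFO path.
    """
    tmpdir = tempfile.TemporaryDirectory()
    fifo_path = os.path.join(tmpdir.name, "fifo")  # fixed, guessable name
    os.mkfifo(fifo_path, 0o600)  # POSIX only
    return tmpdir, fifo_path


tmpdir, fifo_path = make_fifo_in_private_dir()
try:
    is_fifo = stat.S_ISFIFO(os.stat(fifo_path).st_mode)
    # Permission bits granted to group and others on the directory.
    dir_mode_for_others = stat.S_IMODE(os.stat(tmpdir.name).st_mode) & 0o077
finally:
    tmpdir.cleanup()  # removes the directory and the FIFO inside it
```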

Sorry, I should probably have mentioned this in OP. I thought about doing so, 
but then it turned out very long even without it, so I decided it would be 
better to discuss it only if someone else mentions it.

--

___
Python tracker 
<https://bugs.python.org/issue43604>
___



[issue43604] Fix tempfile.mktemp()

2021-03-23 Thread David Lukeš

David Lukeš  added the comment:

> A secure `mktemp` could be as simple as ...

Though in practice, I'd rather be inclined to make the change in 
`tempfile._RandomNameSequence`, so as to get the same behavior across the 
entire module, instead of special-casing `mktemp`. As Guido van Rossum points 
out (see 
<https://mail.python.org/pipermail/python-dev/2019-March/156746.html>), that 
would improve the security of all the names generated by the `tempfile` module, 
not just `mktemp`:

> Hm, the random sequence (implemented in tempfile._RandomNameSequence) is
> currently derived from the random module, which is not cryptographically
> secure. Maybe all we need to do is replace its source of randomness with
> one derived from the secrets module. That seems a one-line change.
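To make the idea concrete, here is a rough sketch of a name sequence that draws from `secrets` instead of `random`. It is not the actual `tempfile._RandomNameSequence` code: the class name `SecureNameSequence`, the alphabet and the default length are all assumptions chosen for the example.

```python
import secrets
import string


class SecureNameSequence:
    """Endless iterator of random name fragments (hypothetical sketch)."""

    # Assumed alphabet: lowercase letters, digits and underscore.
    characters = string.ascii_lowercase + string.digits + "_"

    def __init__(self, length=8):
        self.length = length

    def __iter__(self):
        return self

    def __next__(self):
        # secrets.choice uses the OS's cryptographically secure source.
        return "".join(
            secrets.choice(self.characters) for _ in range(self.length)
        )


names = SecureNameSequence()
candidate = next(names)
```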

--

___
Python tracker 
<https://bugs.python.org/issue43604>
___



[issue43604] Fix tempfile.mktemp()

2021-03-23 Thread David Lukeš

New submission from David Lukeš :

I recently came across a non-testing use case for `tempfile.mktemp()` where I 
struggle to find a viable alternative -- temporary named pipes (FIFOs):

```
import os
import tempfile
import subprocess as sp

fifo_path = tempfile.mktemp()
os.mkfifo(fifo_path, 0o600)
try:
    proc = sp.Popen(["cat", fifo_path], stdout=sp.PIPE, text=True)
    with open(fifo_path, "w") as fifo:
        for c in "Kočka leze dírou, pes oknem.":
            print(c, file=fifo)
    proc.wait()
finally:
    os.unlink(fifo_path)

for l in proc.stdout:
    print(l.strip())
```

(`cat` is obviously just a stand-in for some useful program which needs to read 
from a file, but you want to send it input from Python.)

`os.mkfifo()` needs a path which doesn't point to an existing file, so it's not 
possible to use a `tempfile.NamedTemporaryFile(delete=False)`, close it, and 
pass its `.name` attribute to `mkfifo()`.
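A small sketch of the failure mode, assuming a POSIX system: the file left behind by `NamedTemporaryFile(delete=False)` still exists after closing, so `mkfifo()` refuses to reuse its path.

```python
import os
import tempfile

# Create a regular temporary file and keep it on disk after closing.
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()

error = None
try:
    os.mkfifo(tmp.name, 0o600)  # the path already exists
except FileExistsError as exc:
    error = exc
finally:
    os.unlink(tmp.name)  # clean up the regular file
```

Deleting the file first and then calling `mkfifo()` on the same path would reopen the window between choosing the name and creating the file, so it doesn't really help.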

I know there has been some discussion regarding `mktemp()` in the relatively 
recent past (see the Python-Dev thread starting with 
<https://mail.python.org/pipermail/python-dev/2019-March/156721.html>). There 
has also been some confusion as to what actually makes it unsafe (see 
<https://mail.python.org/pipermail/python-dev/2019-March/156778.html>). Before 
the discussion petered out, it looked like people were reaching a consensus 
"that mktemp() could be made secure by using a longer name generated by a 
secure random generator" (quoting from the previous link).

A secure `mktemp` could be as simple as (see 
<https://mail.python.org/pipermail/python-dev/2019-March/156765.html>):

```
def mktemp(suffix='', prefix='tmp', dir=None):
    if dir is None:
        dir = gettempdir()
    return _os.path.join(
        dir, prefix + secrets.token_urlsafe(ENTROPY_BYTES) + suffix
    )
```

There's been some discussion as to what `ENTROPY_BYTES` should be. I like 
Steven D'Aprano's suggestion (see 
<https://mail.python.org/pipermail/python-dev/2019-March/156777.html>) of 
having an overkill default just to be on the safe side, which can be overridden 
if needed. Of course, the security implications of lowering it should be 
clearly documented.
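As a standalone sketch of what an overridable default could look like: the function name `secure_mktemp`, the `entropy_bytes` parameter and the default of 32 bytes are placeholders I picked for illustration, not values anyone agreed on in the thread.

```python
import os
import secrets
import tempfile

# Hypothetical "overkill" default; the right value is up for discussion.
DEFAULT_ENTROPY_BYTES = 32


def secure_mktemp(suffix="", prefix="tmp", dir=None,
                  entropy_bytes=DEFAULT_ENTROPY_BYTES):
    """Return a hard-to-guess path that does not exist yet (sketch).

    Like mktemp(), this only returns a name; nothing is created.
    """
    if dir is None:
        dir = tempfile.gettempdir()
    token = secrets.token_urlsafe(entropy_bytes)
    return os.path.join(dir, prefix + token + suffix)


path = secure_mktemp(suffix=".fifo")
short_path = secure_mktemp(entropy_bytes=8)  # caller lowers the entropy
```

Whoever creates the file should still use an exclusive creation call (`os.mkfifo`, or `open(path, "x")`), so that a collision fails loudly instead of silently reusing someone else's file.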

Fixing `mktemp` would make it possible to get rid of its hybrid deprecated (in 
the docs) / not deprecated (in code) status, which is somewhat confusing for 
users. Speaking from experience -- when I realized I needed it, the deprecation 
notice led me down this rabbit hole of reading mailing list threads and 
submitting issues :) People could stop losing time worrying about `mktemp` and 
trying to weed it out whenever they come across it (see e.g. 
https://bugs.python.org/issue42278).

So I'm wondering whether there would be interest in:

1. A PR which would modify `mktemp` along the lines sketched above, to make it 
safe in practice. Along with that, it would probably make sense to undeprecate 
it in the docs, or at least indicate that while users should prefer `mkstemp` 
when they're fine with the file being created for them, `mktemp` is alright in 
cases where this is not acceptable.
2. Following that, possibly a PR which would encapsulate the new `mktemp` + 
`mkfifo` into a `TemporaryNamedPipe` or `TemporaryFifo`:

```
import os
import tempfile
import subprocess as sp

with tempfile.TemporaryNamedPipe() as fifo:
    proc = sp.Popen(["cat", fifo.name], stdout=sp.PIPE, text=True)
    for c in "Kočka leze dírou, pes oknem.":
        print(c, file=fifo)
    proc.wait()

for l in proc.stdout:
    print(l.strip())
```

(Caveat: opening the FIFO for writing cannot happen in `__enter__`, it would 
have to be delayed until the first call to `fifo.write()` because it hangs if 
no one is reading from it.)
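To illustrate the caveat, here is a very rough sketch of such a context manager with the open delayed until the first write. Everything in it is hypothetical: the class name `TemporaryFifo`, the explicit `close()` method, and the choice to put the FIFO inside an `mkdtemp()` directory, which I use here only so the example runs without relying on `mktemp`. A reader thread stands in for the external program.

```python
import os
import shutil
import tempfile
import threading


class TemporaryFifo:
    """Named pipe that is removed on exit (hypothetical sketch, POSIX only)."""

    def __init__(self):
        self._dir = tempfile.mkdtemp()  # private directory, assumption
        self.name = os.path.join(self._dir, "fifo")
        os.mkfifo(self.name, 0o600)
        self._file = None

    def write(self, data):
        if self._file is None:
            # Opening for writing blocks until a reader opens the FIFO,
            # which is why it cannot happen in __enter__.
            self._file = open(self.name, "w", encoding="utf-8")
        return self._file.write(data)

    def close(self):
        if self._file is not None:
            self._file.close()  # the reader sees EOF after this
            self._file = None

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
        shutil.rmtree(self._dir, ignore_errors=True)


def read_all(path, sink):
    with open(path, encoding="utf-8") as f:
        sink.append(f.read())


received = []
with TemporaryFifo() as fifo:
    reader = threading.Thread(target=read_all, args=(fifo.name, received))
    reader.start()
    print("Kočka leze dírou, pes oknem.", file=fifo)
    fifo.close()
    reader.join()
```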

--
components: Library (Lib)
messages: 389393
nosy: David Lukeš
priority: normal
severity: normal
status: open
title: Fix tempfile.mktemp()
type: security
versions: Python 3.10

___
Python tracker 
<https://bugs.python.org/issue43604>
___



[issue29842] Make Executor.map work with infinite/large inputs correctly

2021-01-14 Thread David Lukeš

David Lukeš  added the comment:

Any updates on this? Making Executor.map lazier would indeed be more consistent 
and very useful; it would be a shame if the PR went to waste :) It's a feature 
I keep wishing for in comparison with the older and process-only 
multiprocessing API. And eventually, yielding results in the order that tasks 
complete, like multiprocessing.Pool.imap_unordered, could be added on top of 
this, which would be really neat. (I know there's 
concurrent.futures.as_completed, but again, that one doesn't handle infinite 
iterables.)
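Just to show the kind of behavior I mean, here is a sketch of an unordered lazy map built from the existing public API. The function name `lazy_map_unordered` and the `max_pending` parameter are my own inventions, and this is not what the PR does; it only keeps a bounded number of futures in flight so that an infinite iterable can be consumed.

```python
import itertools
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def lazy_map_unordered(executor, fn, iterable, max_pending=8):
    """Yield fn(item) results as tasks complete (hypothetical sketch).

    At most max_pending tasks are submitted ahead of the consumer.
    """
    items = iter(iterable)
    pending = {
        executor.submit(fn, item)
        for item in itertools.islice(items, max_pending)
    }
    while pending:
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        for future in done:
            # Refill the window with one new task per finished task.
            for item in itertools.islice(items, 1):
                pending.add(executor.submit(fn, item))
            yield future.result()


with ThreadPoolExecutor(max_workers=4) as executor:
    squares = list(itertools.islice(
        lazy_map_unordered(executor, lambda x: x * x, itertools.count()),
        10,
    ))
```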

--
nosy: +David Lukeš

___
Python tracker 
<https://bugs.python.org/issue29842>
___



[issue33068] Inconsistencies in parsing (evaluating?) longstrings

2018-03-13 Thread David Lukeš

David Lukeš <dafydd.lu...@gmail.com> added the comment:

Oh, right, of course! Sorry and thanks for taking the time to clarify that :)

--

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33068>
___



[issue33068] Inconsistencies in parsing (evaluating?) longstrings

2018-03-13 Thread David Lukeš

New submission from David Lukeš <dafydd.lu...@gmail.com>:

""" \""" """ evaluates to ' """ ' (as expected), but without the surrounding 
spaces, """\"""""" evaluates to '"' instead of '"""'.

Is this expected behavior? If I'm reading the definition of string syntax in 
https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals
 correctly, it shouldn't be.
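For anyone who wants to poke at this, the tokenizer can be asked how it splits the two literals. This is only an inspection sketch; the helper name `string_tokens` is made up. The unspaced form comes out as two adjacent string literals (a triple-quoted one followed by an empty `""`), which would account for the result.

```python
import ast
import io
import tokenize

# Raw strings hold the source text of the two literals under discussion.
spaced = r'""" \""" """'
unspaced = r'"""\""""""'


def string_tokens(source):
    """Return the text of every STRING token in source."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.STRING
    ]


spaced_tokens = string_tokens(spaced)      # a single literal
unspaced_tokens = string_tokens(unspaced)  # two adjacent literals
spaced_value = ast.literal_eval(spaced)
unspaced_value = ast.literal_eval(unspaced)
```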

--
components: Interpreter Core
messages: 313745
nosy: David Lukeš
priority: normal
severity: normal
status: open
title: Inconsistencies in parsing (evaluating?) longstrings
versions: Python 3.6

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33068>
___



[issue32306] Clarify map API in concurrent.futures

2017-12-21 Thread David Lukeš

David Lukeš <dafydd.lu...@gmail.com> added the comment:

Perfect, thanks!

--

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32306>
___



[issue32306] Clarify map API in concurrent.futures

2017-12-20 Thread David Lukeš

David Lukeš <dafydd.lu...@gmail.com> added the comment:

Yes, sorry for not being quite clear the first time around :)

I eventually found out about Pool.imap (see item 3 on list in OP) and indeed it 
fits my use case very nicely, but my point was that the documentation is 
somewhat misleading with respect to the semantics of built-in `map()` in Python 
3.

Specifically, I would argue that it is unexpected for a function which claims 
to be "Equivalent to map(func, *iterables)" to require allocating a list the 
length of the shortest iterable.

Maybe a code example will make this clearer for potential newcomers to the 
discussion. This is what I would expect to happen (= the behavior of built-in 
`map()` itself), with yielding values from the iterable interleaved with calls 
to the mapped function:

```
>>> def gen():
...     for i in range(3):
...         print("yielding", i)
...         yield i
... 
>>> def add1(i):
...     print("adding 1 to", i)
...     return i + 1
... 
>>> list(map(add1, gen()))
yielding 0
adding 1 to 0
yielding 1
adding 1 to 1
yielding 2
adding 1 to 2
[1, 2, 3]
```

This is what happens instead with `concurrent.futures.Executor.map()`:

```
>>> def my_map(fn, iterable):
...     lst = list(iterable)
...     for i in lst:
...         yield fn(i)
... 
>>> list(my_map(add1, gen()))
yielding 0
yielding 1
yielding 2
adding 1 to 0
adding 1 to 1
adding 1 to 2
[1, 2, 3]
```
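The `my_map` function above is only a model. The same effect can be observed with a real executor; the sketch below assumes `ThreadPoolExecutor` called with default arguments and records which items have been pulled from the generator before any result is requested.

```python
from concurrent.futures import ThreadPoolExecutor

consumed = []  # records every item pulled from the generator


def gen():
    for i in range(3):
        consumed.append(i)
        yield i


def add1(i):
    return i + 1


with ThreadPoolExecutor(max_workers=1) as executor:
    results = executor.map(add1, gen())
    # Snapshot taken before any result has been requested.
    consumed_before_first_result = list(consumed)
    output = list(results)
```

Here `consumed_before_first_result` already contains all three items, i.e. the input was exhausted by the call to `map()` itself.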

--

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32306>
___



[issue32306] Clarify map API in concurrent.futures

2017-12-20 Thread David Lukeš

David Lukeš <dafydd.lu...@gmail.com> added the comment:

Hi Antoine,

Thanks for the response! :) I think the problem lies in the line immediately 
preceding the code you've posted:

```
fs = [self.submit(fn, *args) for args in zip(*iterables)]
```

In other words, all the jobs are first submitted and their futures stored in a 
list, which is then iterated over. This approach obviously breaks down when 
there is a great number of jobs, or when it's part of a pipeline meant for 
processing jobs continuously as they come.
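For comparison, here is a sketch of how the submission could be bounded instead. It is not a patch against the stdlib: `lazy_map` and `max_pending` are names I made up, and timeouts and cancellation are left out. Only a fixed-size window of futures exists at any time, and results still come out in input order.

```python
import collections
import itertools
from concurrent.futures import ThreadPoolExecutor


def lazy_map(executor, fn, *iterables, max_pending=8):
    """Ordered map that submits at most max_pending tasks ahead (sketch)."""
    args_iter = zip(*iterables)
    pending = collections.deque(
        executor.submit(fn, *args)
        for args in itertools.islice(args_iter, max_pending)
    )
    while pending:
        # Wait for the oldest task so that input order is preserved.
        result = pending.popleft().result()
        # Top the window back up with one new task, if any input is left.
        for args in itertools.islice(args_iter, 1):
            pending.append(executor.submit(fn, *args))
        yield result


def add1(i):
    return i + 1


with ThreadPoolExecutor(max_workers=2) as executor:
    first_five = list(itertools.islice(
        lazy_map(executor, add1, itertools.count()), 5
    ))
```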

--

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32306>
___



[issue32306] Clarify map API in concurrent.futures

2017-12-13 Thread David Lukeš

New submission from David Lukeš <dafydd.lu...@gmail.com>:

The docstring for `concurrent.futures.Executor.map` starts by stating that it 
is "Equivalent to map(func, *iterables)". In the case of Python 3, I would 
argue this is true only superficially: with `map`, the user expects 
memory-efficient processing, i.e. that the entire resulting collection will not 
be held in memory at once unless s/he requests so e.g. with `list(map(...))`. 
(In Python 2, the expectations are different of course.) On the other hand, 
while `Executor.map` superficially returns a generator, which seems to align 
with this expectation, what happens behind the scenes is that the call blocks 
until all results are computed and only then starts yielding them. In other 
words, they have to be stored in memory all at once at some point.

The lower-level multiprocessing module also describes 
`multiprocessing.pool.Pool.map` as "A parallel equivalent of the map() built-in 
function", but at least it immediately goes on to add that "It blocks until the 
result is ready.", which is a clear indication that all of the results will 
have to be stored somewhere before being yielded.

I can think of several ways the situation could be improved, listed here from 
most conservative to most progressive:

1. Add "It blocks until the result is ready." to the docstring of 
`Executor.map` as well, preferably somewhere at the beginning.
2. Reword the docstrings of both `Executor.map` and `Pool.map` so that they 
don't describe the functions as "equivalent" to built-in `map`, which raises 
the wrong expectations. ("Similar to map(...), but blocks until all results are 
collected and only then yields them.")
3. I would argue that the function that can be described as semantically 
equivalent to `map` is actually `Pool.imap`, which yields results as they're 
being computed. It would be really nice if this could be added to the 
higher-level `futures` API, along with `Pool.imap_unordered`. `Executor.map` 
simply doesn't work for very long streams of data.
4. Maybe instead of adding `imap` and `imap_unordered` methods to `Executor`, 
it would be a good idea to change the signature of `Executor.map(func, 
*iterables, timeout=None, chunksize=1)` to `Executor.map(func, *iterables, 
timeout=None, chunksize=1, block=True, ordered=True)`, in order to keep the API 
simple with good defaults while providing flexibility via keyword arguments.
5. I would go so far as to say that for me personally, the `block` argument to 
the version of `Executor.map` proposed in #4 above should be `False` by 
default, because that would make it behave most like built-in `map`, which is 
the least surprising behavior. But I've observed that for smaller workloads, 
`imap` tends to be slower than `map`, so I understand it might be a tradeoff 
between performance and semantics. Still, in a higher-level API meant for 
non-experts, I think semantics should be emphasized.

If the latter options seem much too radical, please consider at least something 
along the lines of #1 above; I think it would help people correct their 
expectations when they first encounter the API :)
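To make the difference between the pool methods mentioned above tangible, here is a small sketch. It uses `multiprocessing.pool.ThreadPool`, which offers the same `map` / `imap` / `imap_unordered` methods as `Pool`, purely so that the example runs without worrying about pickling or start methods; that substitution is my choice for the illustration.

```python
from multiprocessing.pool import ThreadPool


def add1(i):
    return i + 1


with ThreadPool(2) as pool:
    # map() blocks and hands back a fully built list.
    eager = pool.map(add1, range(5))

    # imap() hands back an iterator; results are fetched one at a time,
    # in input order.
    lazy = pool.imap(add1, range(5))
    first = next(lazy)
    rest = list(lazy)

    # imap_unordered() yields results in completion order, so sort them
    # here to get something predictable.
    unordered = sorted(pool.imap_unordered(add1, range(5)))
```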

--
components: Library (Lib)
messages: 308221
nosy: David Lukeš
priority: normal
severity: normal
status: open
title: Clarify map API in concurrent.futures
type: enhancement
versions: Python 3.8

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32306>
___



[issue24606] segfault caused by nested calls to map()

2015-07-10 Thread David Lukeš

New submission from David Lukeš:

The following program makes Python 3.4.3 crash with a segmentation fault:

```
#!/usr/bin/env python3

import operator

N = 150000
l = [0]

for i in range(N):
    l = map(operator.add, l, [1])

print(list(l))
```

I suppose the problem is that there are too many nested lazy calls to map, 
which cause a segfault when evaluated. I've played with N and surprisingly, the 
threshold to cause the crash varied slightly (between 130900 and 131000 on my 
machine).

I know that a list-comprehension, which is evaluated straight away, would be 
much more idiomatic for repeated element-wise addition (or numpy arrays for 
that matter, if available). I'm **not advocating this piece of code**, just 
wondering whether there couldn't be a more informative way to make Python bail 
out instead of the segfault? (In my real application, it took me a while to 
figure out where the problem was without a stack trace.)
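For completeness, a sketch of the eager variant, which is what I ended up with as a workaround. Materializing the result on every iteration means no chain of nested iterators is ever built, so the same number of additions runs without trouble. The value of `N` is just an example.

```python
import operator

N = 150000
l = [0]

for i in range(N):
    # list() evaluates the map right away instead of nesting it.
    l = list(map(operator.add, l, [1]))

print(l)
```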

--
messages: 246567
nosy: David Lukeš
priority: normal
severity: normal
status: open
title: segfault caused by nested calls to map()
type: crash
versions: Python 3.4

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24606
___