[issue43923] Can't create generic NamedTuple as of py3.9
David Lukeš added the comment: This is unfortunate, especially since it used to work... Going forward, is the intention not to support this use case? Or is it possible that support for generic NamedTuples will be re-added in the future? -- nosy: +dlukes ___ Python tracker <https://bugs.python.org/issue43923> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue43604] Fix tempfile.mktemp()
David Lukeš added the comment:

> You can use TemporaryDirectory.

That was actually the first approach I tried :) I even thought this could be used to make `mktemp` safe -- just create the name in a `TemporaryDirectory`. However, after reading through the mailing list thread, I realized this just restricts the potential collision/hijacking to misbehaving/malicious processes running under the same user or under the super user. But the core problem with too easily guessable filenames (= not random enough, or not at all, as in your example) remains. Correct me if I'm wrong though.

Sorry, I should probably have mentioned this in the OP. I thought about doing so, but then it turned out very long even without it, so I decided it would be better to discuss it only if someone else mentions it.

--
___ Python tracker <https://bugs.python.org/issue43604> ___
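For reference, the `TemporaryDirectory` approach discussed above could look like the following minimal sketch (POSIX-only, because of `os.mkfifo`; the file name is a placeholder, since inside a private directory it needn't be random):

```python
import os
import stat
import tempfile

# A minimal sketch of the approach described above: create the FIFO
# inside a private TemporaryDirectory (created with mode 0o700), so only
# processes running under the same user (or root) can touch it.
with tempfile.TemporaryDirectory() as tmpdir:
    fifo_path = os.path.join(tmpdir, "fifo")  # name needn't be random here
    os.mkfifo(fifo_path, 0o600)
    assert stat.S_ISFIFO(os.stat(fifo_path).st_mode)
# both the FIFO and the directory are gone once the block exits
```

As the comment above notes, this narrows the attack surface to same-user (or root) processes, but doesn't help callers who need a name in a shared directory such as `/tmp` itself.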
[issue43604] Fix tempfile.mktemp()
David Lukeš added the comment:

> A secure `mktemp` could be as simple as ...

Though in practice, I'd rather be inclined to make the change in `tempfile._RandomNameSequence`, so as to get the same behavior across the entire module, instead of special-casing `mktemp`. As Guido van Rossum points out (see <https://mail.python.org/pipermail/python-dev/2019-March/156746.html>), that would improve the security of all the names generated by the `tempfile` module, not just `mktemp`:

> Hm, the random sequence (implemented in tempfile._RandomNameSequence) is
> currently derived from the random module, which is not cryptographically
> secure. Maybe all we need to do is replace its source of randomness with
> one derived from the secrets module. That seems a one-line change.

--
___ Python tracker <https://bugs.python.org/issue43604> ___
[issue43604] Fix tempfile.mktemp()
New submission from David Lukeš :

I recently came across a non-testing use case for `tempfile.mktemp()` where I struggle to find a viable alternative -- temporary named pipes (FIFOs):

```
import os
import tempfile
import subprocess as sp

fifo_path = tempfile.mktemp()
os.mkfifo(fifo_path, 0o600)
try:
    proc = sp.Popen(["cat", fifo_path], stdout=sp.PIPE, text=True)
    with open(fifo_path, "w") as fifo:
        for c in "Kočka leze dírou, pes oknem.":
            print(c, file=fifo)
    proc.wait()
finally:
    os.unlink(fifo_path)
for l in proc.stdout:
    print(l.strip())
```

(`cat` is obviously just a stand-in for some useful program which needs to read from a file, but you want to send it input from Python.)

`os.mkfifo()` needs a path which doesn't point to an existing file, so it's not possible to use a `tempfile.NamedTemporaryFile(delete=False)`, close it, and pass its `.name` attribute to `mkfifo()`.

I know there has been some discussion regarding `mktemp()` in the relatively recent past (see the Python-Dev thread starting with <https://mail.python.org/pipermail/python-dev/2019-March/156721.html>). There has also been some confusion as to what actually makes it unsafe (see <https://mail.python.org/pipermail/python-dev/2019-March/156778.html>). Before the discussion petered out, it looked like people were reaching a consensus "that mktemp() could be made secure by using a longer name generated by a secure random generator" (quoting from the previous link).

A secure `mktemp` could be as simple as (see <https://mail.python.org/pipermail/python-dev/2019-March/156765.html>):

```
def mktemp(suffix='', prefix='tmp', dir=None):
    if dir is None:
        dir = gettempdir()
    return _os.path.join(dir, prefix + secrets.token_urlsafe(ENTROPY_BYTES) + suffix)
```

There's been some discussion as to what `ENTROPY_BYTES` should be. I like Steven D'Aprano's suggestion (see <https://mail.python.org/pipermail/python-dev/2019-March/156777.html>) of having an overkill default just to be on the safe side, which can be overridden if needed. Of course, the security implications of lowering it should be clearly documented.

Fixing `mktemp` would make it possible to get rid of its hybrid deprecated (in the docs) / not deprecated (in code) status, which is somewhat confusing for users. Speaking from experience -- when I realized I needed it, the deprecation notice led me down this rabbit hole of reading mailing list threads and submitting issues :) People could stop losing time worrying about `mktemp` and trying to weed it out whenever they come across it (see e.g. https://bugs.python.org/issue42278).

So I'm wondering whether there would be interest in:

1. A PR which would modify `mktemp` along the lines sketched above, to make it safe in practice. Along with that, it would probably make sense to undeprecate it in the docs, or at least indicate that while users should prefer `mkstemp` when they're fine with the file being created for them, `mktemp` is alright in cases where this is not acceptable.

2. Following that, possibly a PR which would encapsulate the new `mktemp` + `mkfifo` into a `TemporaryNamedPipe` or `TemporaryFifo`:

```
import os
import tempfile
import subprocess as sp

with tempfile.TemporaryNamedPipe() as fifo:
    proc = sp.Popen(["cat", fifo.name], stdout=sp.PIPE, text=True)
    for c in "Kočka leze dírou, pes oknem.":
        print(c, file=fifo)
    proc.wait()
for l in proc.stdout:
    print(l.strip())
```

(Caveat: opening the FIFO for writing cannot happen in `__enter__`; it would have to be delayed until the first call to `fifo.write()`, because it hangs if no one is reading from it.)
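A rough, runnable prototype of such a context manager (the name and signature below are made up for illustration). Unlike the `TemporaryNamedPipe` sketched above, it yields the *path* rather than a writable file object, precisely because of the caveat just mentioned: opening a FIFO for writing blocks until a reader shows up. It also sidesteps `mktemp` entirely by putting the FIFO in a private temporary directory:

```python
import contextlib
import os
import subprocess as sp
import tempfile

# Hypothetical prototype of the proposed API; POSIX-only (os.mkfifo).
@contextlib.contextmanager
def temporary_named_pipe(suffix=""):
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "fifo" + suffix)
        os.mkfifo(path, 0o600)
        yield path
        # the TemporaryDirectory cleanup removes the FIFO as well

with temporary_named_pipe() as fifo_path:
    # start the reader first, then open the write end (which would
    # otherwise block until a reader connects)
    proc = sp.Popen(["cat", fifo_path], stdout=sp.PIPE, text=True)
    with open(fifo_path, "w") as fifo:
        print("Kočka leze dírou, pes oknem.", file=fifo)
    out, _ = proc.communicate()
```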
--
components: Library (Lib)
messages: 389393
nosy: David Lukeš
priority: normal
severity: normal
status: open
title: Fix tempfile.mktemp()
type: security
versions: Python 3.10

___ Python tracker <https://bugs.python.org/issue43604> ___
[issue29842] Make Executor.map work with infinite/large inputs correctly
David Lukeš added the comment:

Any updates on this? Making Executor.map lazier would indeed be more consistent and very useful, it would be a shame if the PR went to waste :) It's a feature I keep wishing for in comparison with the older and process-only multiprocessing API.

And eventually, yielding results in the order that tasks complete, like multiprocessing.Pool.imap_unordered, could be added on top of this, which would be really neat. (I know there's concurrent.futures.as_completed, but again, that one doesn't handle infinite iterables.)

--
nosy: +David Lukeš

___ Python tracker <https://bugs.python.org/issue29842> ___
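As a sketch of the lazier semantics argued for here (a hypothetical helper, not the actual PR): keep at most `prefetch` futures in flight and yield results in submission order, which makes infinite input iterables workable:

```python
import collections
import itertools
from concurrent.futures import ThreadPoolExecutor

def lazy_map(executor, fn, iterable, prefetch=4):
    it = iter(iterable)
    futures = collections.deque(
        executor.submit(fn, x) for x in itertools.islice(it, prefetch)
    )
    while futures:
        result = futures.popleft().result()  # blocks only on the oldest task
        for x in itertools.islice(it, 1):    # top the window back up
            futures.append(executor.submit(fn, x))
        yield result

with ThreadPoolExecutor(max_workers=2) as ex:
    # an infinite source like itertools.count() would make the current
    # Executor.map submit forever; here it just works
    squares = list(itertools.islice(lazy_map(ex, lambda x: x * x, itertools.count()), 5))
```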
[issue33068] Inconsistencies in parsing (evaluating?) longstrings
David Lukeš <dafydd.lu...@gmail.com> added the comment: Oh, right, of course! Sorry and thanks for taking the time to clarify that :) -- ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33068> ___
[issue33068] Inconsistencies in parsing (evaluating?) longstrings
New submission from David Lukeš <dafydd.lu...@gmail.com>:

`""" \""" """` evaluates to `' """ '` (as expected), but without the surrounding spaces, `"""\""""""` evaluates to `'"'` instead of `'"""'`. Is this expected behavior? If I'm reading the definition of string syntax in https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals correctly, it shouldn't be.

--
components: Interpreter Core
messages: 313745
nosy: David Lukeš
priority: normal
severity: normal
status: open
title: Inconsistencies in parsing (evaluating?) longstrings
versions: Python 3.6

___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue33068> ___
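For the record, the behavior follows from the lexer consuming string literals greedily, combined with implicit concatenation of adjacent literals:

```python
# """\""""""  is tokenized as  """\"""  (a triple-quoted string whose
# body is the single escaped quote \") followed by  ""  (an empty
# string literal); adjacent literals are concatenated, so the whole
# expression is a single double quote.
s = """\""""""

# With surrounding spaces, the two quotes after the escaped quote are
# followed by a space rather than a third quote, so they can't close
# the literal and end up inside the string body instead.
t = """ \""" """
```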
[issue32306] Clarify map API in concurrent.futures
David Lukeš <dafydd.lu...@gmail.com> added the comment: Perfect, thanks! -- ___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue32306> ___
[issue32306] Clarify map API in concurrent.futures
David Lukeš <dafydd.lu...@gmail.com> added the comment:

Yes, sorry for not being quite clear the first time around :) I eventually found out about Pool.imap (see item 3 in the list in the OP) and indeed it fits my use case very nicely, but my point was that the documentation is somewhat misleading with respect to the semantics of built-in `map()` in Python 3. Specifically, I would argue that it is unexpected for a function which claims to be "Equivalent to map(func, *iterables)" to require allocating a list the length of the shortest iterable.

Maybe a code example will make this clearer for potential newcomers to the discussion. This is what I would expect to happen (= the behavior of built-in `map()` itself): yielding values from the iterable is interleaved with calls to the mapped function:

```
>>> def gen():
...     for i in range(3):
...         print("yielding", i)
...         yield i
...
>>> def add1(i):
...     print("adding 1 to", i)
...     return i + 1
...
>>> list(map(add1, gen()))
yielding 0
adding 1 to 0
yielding 1
adding 1 to 1
yielding 2
adding 1 to 2
[1, 2, 3]
```

This is what happens instead with `concurrent.futures.Executor.map()`:

```
>>> def my_map(fn, iterable):
...     lst = list(iterable)
...     for i in lst:
...         yield fn(i)
...
>>> list(my_map(add1, gen()))
yielding 0
yielding 1
yielding 2
adding 1 to 0
adding 1 to 1
adding 1 to 2
[1, 2, 3]
```

--
___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue32306> ___
[issue32306] Clarify map API in concurrent.futures
David Lukeš <dafydd.lu...@gmail.com> added the comment:

Hi Antoine,

Thanks for the response! :) I think the problem lies in the line immediately preceding the code you've posted:

```
fs = [self.submit(fn, *args) for args in zip(*iterables)]
```

In other words, all the jobs are first submitted and their futures stored in a list, which is then iterated over. This approach obviously breaks down when there is a great number of jobs, or when it's part of a pipeline meant for processing jobs continuously as they come.

--
___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue32306> ___
[issue32306] Clarify map API in concurrent.futures
New submission from David Lukeš <dafydd.lu...@gmail.com>:

The docstring for `concurrent.futures.Executor.map` starts by stating that it is "Equivalent to map(func, *iterables)". In the case of Python 3, I would argue this is true only superficially: with `map`, the user expects memory-efficient processing, i.e. that the entire resulting collection will not be held in memory at once unless s/he requests so, e.g. with `list(map(...))`. (In Python 2, the expectations are different of course.) On the other hand, while `Executor.map` superficially returns a generator, which seems to align with this expectation, what happens behind the scenes is that the call blocks until all results are computed and only then starts yielding them. In other words, they have to be stored in memory all at once at some point.

The lower-level multiprocessing module also describes `multiprocessing.pool.Pool.map` as "A parallel equivalent of the map() built-in function", but at least it immediately goes on to add that "It blocks until the result is ready.", which is a clear indication that all of the results will have to be stored somewhere before being yielded.

I can think of several ways the situation could be improved, listed here from most conservative to most progressive:

1. Add "It blocks until the result is ready." to the docstring of `Executor.map` as well, preferably somewhere at the beginning.

2. Reword the docstrings of both `Executor.map` and `Pool.map` so that they don't describe the functions as "equivalent" to built-in `map`, which raises the wrong expectations. ("Similar to map(...), but blocks until all results are collected and only then yields them.")

3. I would argue that the function that can be described as semantically equivalent to `map` is actually `Pool.imap`, which yields results as they're being computed. It would be really nice if this could be added to the higher-level `futures` API, along with `Pool.imap_unordered`. `Executor.map` simply doesn't work for very long streams of data.

4. Maybe instead of adding `imap` and `imap_unordered` methods to `Executor`, it would be a good idea to change the signature of `Executor.map(func, *iterables, timeout=None, chunksize=1)` to `Executor.map(func, *iterables, timeout=None, chunksize=1, block=True, ordered=True)`, in order to keep the API simple with good defaults while providing flexibility via keyword arguments.

5. I would go so far as to say that for me personally, the `block` argument to the version of `Executor.map` proposed in #4 above should be `False` by default, because that would make it behave most like built-in `map`, which is the least surprising behavior. But I've observed that for smaller workloads, `imap` tends to be slower than `map`, so I understand it might be a tradeoff between performance and semantics. Still, in a higher-level API meant for non-experts, I think semantics should be emphasized.

If the latter options seem much too radical, please consider at least something along the lines of #1 above; I think it would help people correct their expectations when they first encounter the API :)

--
components: Library (Lib)
messages: 308221
nosy: David Lukeš
priority: normal
severity: normal
status: open
title: Clarify map API in concurrent.futures
type: enhancement
versions: Python 3.8

___ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue32306> ___
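Points 3 and 4 of the previous message could be prototyped along these lines (a hypothetical helper, not a proposed implementation): an `imap_unordered`-style generator over `concurrent.futures` that keeps a bounded number of futures in flight and yields results as they complete, rather than in submission order:

```python
import itertools
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def lazy_map_unordered(executor, fn, iterable, window=4):
    it = iter(iterable)
    pending = {executor.submit(fn, x) for x in itertools.islice(it, window)}
    while pending:
        # block until at least one in-flight task finishes
        done, pending = wait(pending, return_when=FIRST_COMPLETED)
        # replace each finished task with a fresh one from the input
        for x in itertools.islice(it, len(done)):
            pending.add(executor.submit(fn, x))
        for fut in done:
            yield fut.result()

with ThreadPoolExecutor(max_workers=2) as ex:
    # sorted() only so the result is deterministic for inspection;
    # the generator itself yields in completion order
    doubled = sorted(lazy_map_unordered(ex, lambda x: x * 2, range(10)))
```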
[issue24606] segfault caused by nested calls to map()
New submission from David Lukeš:

The following program makes Python 3.4.3 crash with a segmentation fault:

```
#!/usr/bin/env python3

import operator

N = 50
l = [0]
for i in range(N):
    l = map(operator.add, l, [1])
print(list(l))
```

I suppose the problem is that there are too many nested lazy calls to map, which cause a segfault when evaluated. I've played with N and surprisingly, the threshold to cause the crash varied slightly (between 130900 and 131000 on my machine).

I know that a list comprehension, which is evaluated straight away, would be much more idiomatic for repeated element-wise addition (or numpy arrays for that matter, if available). I'm **not advocating this piece of code**, just wondering whether there couldn't be a more informative way to make Python bail out instead of the segfault? (In my real application, it took me a while to figure out where the problem was without a stack trace.)

--
messages: 246567
nosy: David Lukeš
priority: normal
severity: normal
status: open
title: segfault caused by nested calls to map()
type: crash
versions: Python 3.4

___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue24606 ___
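For comparison, an eager version of the same computation avoids the problem entirely, since forcing each `map` as it is created means there is never a deep chain of lazy map objects to evaluate at the end (the value of `N` below is arbitrary):

```python
import operator

N = 1000  # iteration depth; harmless when each step is evaluated eagerly
l = [0]
for _ in range(N):
    # list() forces the map immediately, so at most one level of map
    # object exists at any time, instead of N nested ones
    l = list(map(operator.add, l, [1]))
```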