[issue17505] [doc] email.header.Header.__unicode__ does not decode header
Hrvoje Nikšić added the comment:

> Any suggestions on what needs to be done for current revisions?

Hi! I'm the person who submitted this issue back in 2013. Let's take a look at how things are in Python 3.10:

    Python 3.10.2 (main, Jan 13 2022, 19:06:22) [GCC 10.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import email
    >>> msg = email.message_from_string('Subject: =?gb2312?b?1eLKx9bQzsSy4srUo6E=?=\n\nfoo\n')
    >>> msg['Subject']
    '=?gb2312?b?1eLKx9bQzsSy4srUo6E=?='

So the headers are still not decoded by default. The `unicode()` invocation in the original description was just an attempt to get a Unicode string out of a byte string (assuming it had been correctly decoded from MIME, which it hadn't). Since Python 3 strings are Unicode already, I'd expect to just get the decoded subject - but that still doesn't happen. The correct way to make it happen is to specify `policy=email.policy.default`:

    >>> msg = email.message_from_string('Subject: =?gb2312?b?1eLKx9bQzsSy4srUo6E=?=\n\nfoo\n', policy=email.policy.default)
    >>> msg['Subject']
    '这是中文测试!'

The docs should point out that you really _want_ to specify the "default" policy (strangely named, since it's not in fact the default). The current docs only say that `message_from_string()` is "equivalent to Parser().parsestr(s)" and that `policy` is interpreted "as with the Parser class constructor". The docs of the Parser constructor don't document `policy` at all, except for noting the version in which it was added.

So, if you want to work on this, my suggestion would be to improve the docs in the following ways:

* in the message_from_string() docs, explain that `policy=email.policy.default` is what you want to pass to get the headers decoded;
* in the Parser docs, explain what the _class and policy arguments to the constructor do, which policies are available, etc. (These things seem to be explained for BytesFeedParser, so you might want to just link to that, or include a shortened version.)
-- ___ Python tracker <https://bugs.python.org/issue17505> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
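To make the contrast in the comment above concrete, here is a minimal sketch (restating the snippets from the message, not new API claims) showing the two policies side by side:

```python
import email
import email.policy

raw = 'Subject: =?gb2312?b?1eLKx9bQzsSy4srUo6E=?=\n\nfoo\n'

# compat32 (the actual default policy): the header stays MIME-encoded.
legacy = email.message_from_string(raw)

# policy=email.policy.default: the header is decoded to readable text.
modern = email.message_from_string(raw, policy=email.policy.default)

print(legacy['Subject'])       # the raw encoded-word, undecoded
print(str(modern['Subject']))  # the decoded Chinese text
```

Note that with `email.policy.default` the returned header object is a `str` subclass, so it can be used anywhere a string is expected.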
[issue33533] Provide an async iterator version of as_completed
Hrvoje Nikšić added the comment:

Justin, thanks for updating the PR! I'll take another look at the code.

--
___ Python tracker <https://bugs.python.org/issue33533> ___
[issue33533] Provide an async-generator version of as_completed
Hrvoje Nikšić added the comment:

Hi, thanks for providing a PR. One thing I noticed is that the implementation in the PR yields the results of the futures from the generator. This issue proposes a generator that instead yields the futures passed to as_completed. This is needed not just for consistency with concurrent.futures.as_completed, but also to allow associating results with the requests that produced them, which is an important use case for as_completed.

An example of how this is useful is the snippet at https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example which you cannot implement with an as_completed that doesn't yield the original futures. The StackOverflow question at [1] inquires about that issue, and would be easily resolved by yielding the futures as concurrent.futures.as_completed does.

As far as I can tell, the needed change is simple: just yield `f` instead of `f.result()` from `_wait_for_one` when invoked from __anext__. (It must still return f.result() when called from __next__, for backward compatibility.)

[1] https://stackoverflow.com/questions/50028465/python-get-reference-to-original-task-after-ordering-tasks-by-completion

--
___ Python tracker <https://bugs.python.org/issue33533> ___
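A sketch of the yield-the-original-futures behavior described above (this is an illustration of the proposal, not the stdlib implementation or the PR's code; it also sets the done callbacks up only once, as suggested later in this thread):

```python
import asyncio

async def as_completed_futures(tasks):
    # Sketch: yield the *original* futures as each one completes.
    # A done callback pushes the finished future onto a queue.
    queue = asyncio.Queue()
    for t in tasks:
        t.add_done_callback(queue.put_nowait)
    for _ in tasks:
        yield await queue.get()

async def demo():
    async def work(n):
        await asyncio.sleep(0.05 * n)
        return n

    # Map each task back to the "request" that produced it.
    tasks = {asyncio.create_task(work(n)): n for n in (3, 1, 2)}
    order = []
    async for t in as_completed_futures(tasks):
        # The yielded object is the original task, so the result
        # can be associated with its request - the key use case.
        order.append((tasks[t], t.result()))
    return order

print(asyncio.run(demo()))
```

Because the yielded object is the task itself, a dict keyed by task (as in the concurrent.futures ThreadPoolExecutor example linked above) works unchanged.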
[issue41159] Nested async dict comprehension fails with SyntaxError
New submission from Hrvoje Nikšić:

Originally brought up on StackOverflow, https://stackoverflow.com/questions/60799366/nested-async-comprehension :

This dict comprehension parses and works correctly:

    async def bar():
        return { n: await foo(n) for n in [1, 2, 3] }

But making it nested fails with a SyntaxError:

    async def bar():
        return { i: { n: await foo(n) for n in [1, 2, 3] } for i in [1,2,3] }

The error reported is:

    File "", line 0
    SyntaxError: asynchronous comprehension outside of an asynchronous function

--
components: Interpreter Core
messages: 372582
nosy: hniksic
priority: normal
severity: normal
status: open
title: Nested async dict comprehension fails with SyntaxError
type: behavior
versions: Python 3.8

___ Python tracker <https://bugs.python.org/issue41159> ___
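On versions affected by this parser limitation, one workaround sketch is to hoist the inner comprehension into a helper coroutine, so that no async comprehension nests inside another (`foo()` below is a hypothetical stand-in for the awaited call in the report):

```python
import asyncio

async def foo(n):
    # Hypothetical stand-in for the real awaited call.
    await asyncio.sleep(0)
    return n * 10

async def inner():
    return {n: await foo(n) for n in [1, 2, 3]}

async def bar():
    # Each comprehension now lives directly inside an async def,
    # sidestepping the nested-async-comprehension SyntaxError.
    return {i: await inner() for i in [1, 2, 3]}

print(asyncio.run(bar()))
```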
[issue36780] Interpreter exit blocks waiting for futures of shut-down ThreadPoolExecutors
Hrvoje Nikšić added the comment:

> The only way I could see this to work as intended without making any
> changes to threading would be to optionally use daemon threads and avoid
> joining the threads in `executor.shutdown()` if `wait_at_exit` is set to
> False in the constructor for the executor.

Agreed, and that is precisely what I suggested in my previous comment. The attached PR already deals with the executor-specific part of the wait, but it would be straightforward to also have it make the executor create daemon threads, and to move the flag to the constructor.

> IMO, it also would require some fairly extensive testing to make sure it
> works as intended, which the patch currently lacks.

It is quite easy to check that a hanging thread (emulated by a simple sleep) is not joined by the executor with the appropriate flag set.

--
___ Python tracker <https://bugs.python.org/issue36780> ___
[issue36780] Interpreter exit blocks waiting for futures of shut-down ThreadPoolExecutors
Hrvoje Nikšić added the comment:

Thanks for the clarification, I didn't know about the change to non-daemon threads. I still think this patch is useful, and it won't harm general use because it is opt-in, just like daemon threads themselves. I suggest updating the PR to specify a non-waiting pool at the point of creation rather than in shutdown(). (That will also make it more acceptable that the same option is not supported for process pools - it is ok for constructor signatures to differ.)

If there is interest, someone should take over the PR, as I have changed jobs and no longer have time to actively pursue this issue.

--
___ Python tracker <https://bugs.python.org/issue36780> ___
[issue36780] Interpreter exit blocks waiting for futures of shut-down ThreadPoolExecutors
Hrvoje Nikšić added the comment:

> I don't think there's much ThreadPoolExecutor can do. If you drop the
> references to the threads, they still exist and they will still be waited
> upon at interpreter exit.

ThreadPoolExecutor introduces additional waiting of its own, and it is this wait the PR adds an option to disable.

> The solution is for you to avoid having hanging threads.

In some cases that is not possible. For example, file IO on network file systems can take an arbitrary amount of time, and there is no way to implement a timeout. DNS lookups have been notorious for not supporting timeouts. Many third-party libraries, such as database drivers, still don't support timeouts. This is a real issue that has bitten many users in production systems; it's not about a toy program using "home-baked networking code".

--
___ Python tracker <https://bugs.python.org/issue36780> ___
[issue38423] Event loop implementation docs advertise set_event_loop which doesn't work with asyncio.run
Change by Hrvoje Nikšić:

--
title: Event loop implementation docs advertise set_event_loop -> Event loop implementation docs advertise set_event_loop which doesn't work with asyncio.run

___ Python tracker <https://bugs.python.org/issue38423> ___
[issue38423] Event loop implementation docs advertise set_event_loop
New submission from Hrvoje Nikšić:

The docs of SelectorEventLoop and ProactorEventLoop contain examples that call asyncio.set_event_loop:

    selector = selectors.SelectSelector()
    loop = asyncio.SelectorEventLoop(selector)
    asyncio.set_event_loop(loop)

But this won't have any effect on code that uses asyncio.run(), because asyncio.run() creates a fresh event loop. Since asyncio.run() is the recommended way to execute async code and is used consistently throughout the documentation, this might be confusing to someone who tries to e.g. use the proactor loop on Windows.

I propose the following:

* add a disclaimer that instantiating the event loop won't affect calls to asyncio.run(), and that loop.run_until_complete() must be used; and
* link to asyncio.set_event_loop_policy(), which does work with asyncio.run(), but doesn't allow fine-tuning the details, such as which selector to use for the SelectorEventLoop.

--
assignee: docs@python
components: Documentation, asyncio
messages: 354288
nosy: asvetlov, docs@python, hniksic, yselivanov
priority: normal
severity: normal
status: open
title: Event loop implementation docs advertise set_event_loop
versions: Python 3.7, Python 3.8, Python 3.9

___ Python tracker <https://bugs.python.org/issue38423> ___
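A sketch of the first proposed disclaimer in practice: a hand-built loop must be driven with run_until_complete(), because asyncio.run() would create and use a fresh loop of its own:

```python
import asyncio
import selectors

async def main():
    # Trivial stand-in coroutine for the sketch.
    return 42

# Build the loop by hand, as in the documented example...
selector = selectors.SelectSelector()
loop = asyncio.SelectorEventLoop(selector)

# ...then drive it directly; asyncio.run(main()) would ignore it.
try:
    result = loop.run_until_complete(main())
finally:
    loop.close()

print(result)  # 42
```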
[issue38192] Fix invocation of EchoClientProtocol
New submission from Hrvoje Nikšić:

This is a followup to issue38178. While testing the new code, I noticed that my change introduced a bug, where the code still attempts to pass "loop" when constructing an EchoClientProtocol. A pull request is attached.

Also, I've noticed that the MyProtocol example under "Connecting Existing Sockets" is still passing an explicit loop, so I created a commit that converts it to the on_con_lost idiom, and included it in the above pull request.

--
assignee: docs@python
components: Documentation, asyncio
messages: 352584
nosy: asvetlov, docs@python, hniksic, yselivanov
priority: normal
severity: normal
status: open
title: Fix invocation of EchoClientProtocol
versions: Python 3.7, Python 3.8, Python 3.9

___ Python tracker <https://bugs.python.org/issue38192> ___
[issue38192] Fix invocation of EchoClientProtocol
Change by Hrvoje Nikšić:

--
keywords: +patch
pull_requests: +15806
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/16202

___ Python tracker <https://bugs.python.org/issue38192> ___
[issue38178] Remove explicit "loop" argument from EchoClientProtocol example
Hrvoje Nikšić added the comment:

Raymond, no problem; I guess I assumed that the authors are following the bug tracker (or have possibly moved on and are inactive). I also had reason to believe the change to be non-controversial, since it is in line with Yury's own recommendations, e.g. from his 2018 EuroPython presentation (slide 14): https://speakerdeck.com/1st1/asyncio-today-and-tomorrow

In any case, I am happy that the pull request has been processed and accepted so quickly - thanks everyone for the great work!

--
___ Python tracker <https://bugs.python.org/issue38178> ___
[issue38178] Remove explicit "loop" argument from EchoClientProtocol example
Change by Hrvoje Nikšić:

--
keywords: +patch
pull_requests: +15769
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/16159

___ Python tracker <https://bugs.python.org/issue38178> ___
[issue38178] Remove explicit "loop" argument from EchoClientProtocol example
New submission from Hrvoje Nikšić:

The EchoClientProtocol example receives a "loop" argument, which is not used at all in the TCP example, and is used to create a future in the UDP example. In modern asyncio code explicit loop arguments are no longer used, since the loop can always be obtained with get_running_loop().

The proposed patch makes the UDP example consistent with the TCP one (by having the constructor accept the on_con_lost future) and removes the loop argument from both.

--
assignee: docs@python
components: Documentation, asyncio
messages: 352478
nosy: asvetlov, docs@python, hniksic, yselivanov
priority: normal
severity: normal
status: open
title: Remove explicit "loop" argument from EchoClientProtocol example
versions: Python 3.7, Python 3.8, Python 3.9

___ Python tracker <https://bugs.python.org/issue38178> ___
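A self-contained sketch of the on_con_lost idiom the patch describes (this is an illustration, not the documentation's exact example; the echo server is included here only so the snippet runs on its own):

```python
import asyncio

class EchoClientProtocol(asyncio.Protocol):
    # No "loop" parameter: the caller creates on_con_lost with
    # get_running_loop(), as the patch proposes.
    def __init__(self, message, on_con_lost):
        self.message = message
        self.on_con_lost = on_con_lost
        self.received = b''

    def connection_made(self, transport):
        self.transport = transport
        transport.write(self.message.encode())

    def data_received(self, data):
        self.received += data
        if len(self.received) >= len(self.message):
            self.transport.close()

    def connection_lost(self, exc):
        self.on_con_lost.set_result(self.received)

class EchoServerProtocol(asyncio.Protocol):
    # Minimal echo server so the sketch is self-contained.
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data)

async def main():
    loop = asyncio.get_running_loop()   # instead of a loop argument
    server = await loop.create_server(EchoServerProtocol, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    on_con_lost = loop.create_future()
    transport, _ = await loop.create_connection(
        lambda: EchoClientProtocol('Hello', on_con_lost), '127.0.0.1', port)
    try:
        return await on_con_lost
    finally:
        transport.close()
        server.close()
        await server.wait_closed()

print(asyncio.run(main()))  # b'Hello'
```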
[issue36780] Interpreter exit blocks waiting for futures of shut-down ThreadPoolExecutors
Hrvoje Nikšić added the comment:

@asvetlov The idea of the new flag is to disable any subsequent waiting for futures after ThreadPoolExecutor.shutdown(wait=False) returns. Currently the additional waiting is implemented using "atexit", so I assumed it referred to process exit. (The documentation at https://docs.python.org/3/library/atexit.html doesn't seem to specify precisely when the callbacks are executed.) But looking at the implementation of atexit, it seems that the atexit callbacks are actually called from Py_FinalizeEx.

I think wait_at_exit is descriptive because in most cases process exit and interpreter shutdown will correlate, but I can still update the docs to make it clearer what "exit" refers to. We should just avoid the word "shutdown" in the flag name, to avoid confusion with executor shutdown.

--
___ Python tracker <https://bugs.python.org/issue36780> ___
[issue36780] Interpreter exit blocks waiting for futures of shut-down ThreadPoolExecutors
Change by Hrvoje Nikšić:

--
keywords: +patch
pull_requests: +13161
stage: -> patch review

___ Python tracker <https://bugs.python.org/issue36780> ___
[issue36794] asyncio.Lock documentation in Py3.8 lacks parts presented in documentation in Py3.6
Hrvoje Nikšić added the comment:

@matrixise I've signed the CLA in the meantime, and it's now confirmed by https://check-python-cla.herokuapp.com/

Thanks for the pointer.

--
___ Python tracker <https://bugs.python.org/issue36794> ___
[issue36794] asyncio.Lock documentation in Py3.8 lacks parts presented in documentation in Py3.6
Hrvoje Nikšić added the comment:

Ok, found it, and I've now updated the GitHub name on my bpo account. I'll gladly sign the CLA if needed; I thought it wasn't necessary for small changes, based on previous experience. Please advise whether it's necessary here.

--
___ Python tracker <https://bugs.python.org/issue36794> ___
[issue36780] Interpreter exit blocks waiting for futures of shut-down ThreadPoolExecutors
Hrvoje Nikšić added the comment:

> Are you interested in writing a patch?

Yes - I wanted to check whether there is interest in the feature before I commit time to writing the patch, documentation, tests, etc. (And I also wanted to check whether there's a better way to do it.) In any case, thanks for picking up on this.

> `disown` might not be the best name - maybe `allow_shutdown` or something.

Disown is an admittedly obscure reference to the shell built-in of the same name (https://tinyurl.com/qfn8ao7). The idea is to get the point across that the pool is truly abandoned, and its running futures are left to their own devices. Maybe wait_at_exit would be a clearer name:

    pool.shutdown(wait=False, wait_at_exit=True)

--
___ Python tracker <https://bugs.python.org/issue36780> ___
[issue36794] asyncio.Lock documentation in Py3.8 lacks parts presented in documentation in Py3.6
Hrvoje Nikšić added the comment:

How do I connect the accounts? Please note that I've previously submitted PRs which have been accepted, e.g. https://bugs.python.org/issue34476 and https://bugs.python.org/issue35465

--
___ Python tracker <https://bugs.python.org/issue36794> ___
[issue36780] Interpreter exit blocks waiting for futures of shut-down ThreadPoolExecutors
Hrvoje Nikšić added the comment:

I agree with the last comment. The disowning functionality is only used in specific circumstances, so it's perfectly fine to keep the functionality as a shutdown flag. I also agree that the change cannot be *unconditional*, for backward compatibility if not for other reasons. The StackOverflow question, and more importantly the existence of shutdown(wait=False), suggest that there are legitimate cases where one doesn't want to wait for all running futures to finish. If a flag to shutdown() is considered to complicate the API, then perhaps we could add an opt-out by subclassing the executor and overriding a semi-private method.

Currently there seems to be no way to just abandon the thread pool. Since user threads don't and never will support cancellation, the options are:

1. add the disown option to shutdown, as suggested
2. monkey-patch concurrent.futures to not block at shutdown
3. make functions executed by the pool externally cancellable
4. roll our own thread pool

#1 is suggested by this issue, and #2 is what we do now, but it's obviously unacceptable in the long term. #3 is infeasible because the functions we submit to the pool rely heavily on HTTP (through "requests") and database communication, which don't support user-driven cancellation. #4 would be technically possible, but it would require reimplementing half of concurrent.futures (badly), just for the purpose of being able to get rid of the mandatory wait at interpreter exit.

--
___ Python tracker <https://bugs.python.org/issue36780> ___
[issue36794] asyncio.Lock documentation in Py3.8 lacks parts presented in documentation in Py3.6
Hrvoje Nikšić added the comment:

Also, the docstring of asyncio.Lock still states:

    When more than one coroutine is blocked in acquire() waiting for
    the state to turn to unlocked, only one coroutine proceeds when a
    release() call resets the state to unlocked; first coroutine which
    is blocked in acquire() is being processed.

--
nosy: +hniksic

___ Python tracker <https://bugs.python.org/issue36794> ___
[issue36780] Interpreter exit blocks waiting for futures of shut-down ThreadPoolExecutors
New submission from Hrvoje Nikšić:

At interpreter shutdown, Python waits for all pending futures of all executors to finish. There seems to be no way to disable the wait for pools that have been explicitly shut down with pool.shutdown(wait=False). The attached script demonstrates the issue.

In our code the futures run blocking network calls that can be canceled by the user. The cancel action automatically cancels the pending futures and informs the running ones that they should exit. The remaining futures are those whose callables are "stuck" in network calls with long or infinite timeouts, such as reading from a non-responding network filesystem. Since those can't be interrupted, we use pool.shutdown(wait=False) to disown the whole pool. This works nicely, until application exit, at which point it blocks trying to wait for the pending futures to finish. This can take an arbitrary amount of time, possibly never finishing.

A similar question has come up on StackOverflow, with the only answer recommending to unregister concurrent.futures.thread._python_exit: https://stackoverflow.com/q/48350257/1600898 . We are ourselves using a similar hack, but we would like to contribute a proper way to disown an executor.

The current behavior is explicitly documented, so presumably it can't be (easily) changed, but we could add a "disown" keyword argument to shutdown(), or a new disown() method, which would serve to explicitly disable the waiting. If this is considered desirable, I will create a pull request.
--
components: Library (Lib)
files: pool-shutdown
messages: 341329
nosy: hniksic
priority: normal
severity: normal
status: open
title: Interpreter exit blocks waiting for futures of shut-down ThreadPoolExecutors
versions: Python 3.7, Python 3.8, Python 3.9
Added file: https://bugs.python.org/file48297/pool-shutdown

___ Python tracker <https://bugs.python.org/issue36780> ___
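A sketch of the behavior the report describes: shutdown(wait=False) itself returns immediately; the blocking happens later, when the interpreter joins worker threads at exit. (A short sleep stands in for the stuck network call here - a truly stuck worker would hang this very script at exit, so it is not reproduced directly.)

```python
import time
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=1)
future = pool.submit(time.sleep, 0.2)  # stand-in for a stuck network call

start = time.monotonic()
pool.shutdown(wait=False)              # returns without waiting...
elapsed = time.monotonic() - start

print(elapsed < 0.1)  # True: shutdown itself did not wait
# ...but interpreter exit still joins the worker thread, which is
# exactly the wait the issue proposes to make optional.
```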
[issue35539] Cannot properly close terminated process
New submission from Hrvoje Nikšić:

It seems impossible to correctly close() an asyncio Process on which terminate() has been invoked. Take the following coroutine:

    async def test():
        proc = await asyncio.create_subprocess_shell(
            "sleep 1", stdout=asyncio.subprocess.PIPE)
        proc.terminate()
        await proc.wait()

After running it with asyncio.run(), Python prints a warning about an "Event loop is closed" exception ignored in BaseSubprocessTransport.__del__. The code does wait for the process to exit, and neither proc nor proc.stdout have a close() method, so the warning seems spurious.

Commenting out proc.terminate() makes the program finish without an exception (but then it waits for a full second, of course). Runnable example attached.

--
components: asyncio
files: terminate.py
messages: 332165
nosy: asvetlov, hniksic, yselivanov
priority: normal
severity: normal
status: open
title: Cannot properly close terminated process
versions: Python 3.7
Added file: https://bugs.python.org/file48008/terminate.py

___ Python tracker <https://bugs.python.org/issue35539> ___
[issue35465] [asyncio] Document loop.add_signal_handler
Hrvoje Nikšić added the comment:

Done, https://github.com/python/cpython/pull/11145

--
___ Python tracker <https://bugs.python.org/issue35465> ___
[issue35465] Document add_signal_handler
New submission from Hrvoje Nikšić:

In https://stackoverflow.com/q/53704709/1600898 a StackOverflow user asked how the add_signal_handler event loop method differs from the signal.signal normally used by Python code. The add_signal_handler documentation is quite brief - if we exclude the parts that explain the exceptions raised and how to pass keyword arguments to the callback, the meat is this sentence:

    Set callback as the handler for the signum signal.

It is only after looking at the source, and understanding asyncio, that one comes to the conclusion that the idea is to run the handler along with other event loop callbacks and coroutines, at a time when it is actually safe to invoke asyncio code. I think this deserves to be mentioned explicitly, for example:

    Set callback as the handler for the signum signal. The callback
    will be invoked in the thread that runs the event loop, along with
    other queued callbacks and runnable coroutines. Unlike signal
    handlers registered using signal.signal(), a callback registered
    with this function is allowed to interact with the event loop.

--
assignee: docs@python
components: Documentation
messages: 331645
nosy: docs@python, hniksic
priority: normal
severity: normal
status: open
title: Document add_signal_handler
versions: Python 3.7, Python 3.8

___ Python tracker <https://bugs.python.org/issue35465> ___
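A sketch of the behavior the proposed wording describes: the registered callback runs as an ordinary event-loop callback, so it can safely touch asyncio objects such as futures. (Unix-only, and it must run in the main thread; SIGUSR1 is used here just as a convenient test signal.)

```python
import asyncio
import os
import signal

async def main():
    loop = asyncio.get_running_loop()
    got_signal = loop.create_future()
    # Calling set_result from a signal.signal handler would be
    # unsafe; registered this way, it runs inside the event loop.
    loop.add_signal_handler(signal.SIGUSR1, got_signal.set_result, 'ok')
    os.kill(os.getpid(), signal.SIGUSR1)  # deliver the signal to ourselves
    try:
        return await got_signal
    finally:
        loop.remove_signal_handler(signal.SIGUSR1)

print(asyncio.run(main()))
```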
[issue33533] Provide an async-generator version of as_completed
Hrvoje Nikšić added the comment:

I didn't start working on the PR, so please go ahead if you're interested.

One small suggestion: if you're implementing this, please note that the proof-of-concept implementation shown in the description is not very efficient, because each call to `wait` has to iterate over all the futures (which can be potentially large in number) to set up and tear down the done callbacks on each one. A more efficient implementation would set up the callbacks only once - see https://stackoverflow.com/a/51403277/1600898 for an example.

--
___ Python tracker <https://bugs.python.org/issue33533> ___
[issue33533] Provide an async-generator version of as_completed
Hrvoje Nikšić added the comment:

If there is interest in this, I'd like to attempt a PR for a sync/async variant of as_completed.

Note that the new docs are *much* clearer, so the first (documentation) problem from the description is now fixed. Although the documentation is still brief, it now contains the key pieces of information: 1) that the futures are actually run in parallel, and 2) that each yielded future produces the next result that becomes available. Neither was actually stated in the old docs (!), so this is a marked improvement.

--
___ Python tracker <https://bugs.python.org/issue33533> ___
[issue34476] asyncio.sleep(0) not documented
Hrvoje Nikšić added the comment:

Agreed about the special case. Minor change suggestion:

    ``sleep()`` always suspends the current task, allowing other tasks to run.

That is, replace "switches execution to another [task]", because there might not be other tasks, or they might not be runnable - the idea is to allow them to run. Also, replace "pause" with "suspend" (because when delay <= 0 there is not really a discernible pause).

https://github.com/python/cpython/pull/9643

If you'd still prefer the previous wording, I'll amend the PR.

--
___ Python tracker <https://bugs.python.org/issue34476> ___
[issue34476] asyncio.sleep(0) not documented
Hrvoje Nikšić added the comment:

The issue is that the current documentation *doesn't* say that "`asyncio.sleep()` always pauses the current task and switches execution to another one"; it just says that it "blocks for _delay_ seconds". With that description a perfectly valid implementation could be further optimized with:

    async def sleep(delay):
        if delay <= 0:
            return
        ...

In that case `await sleep(0)` would *not* cause a task switch. And this is not an unreasonable thing to expect, because there are many other potentially-switching situations in asyncio that sometimes don't cause a switch, such as `await queue.get()` from a non-empty queue or `await stream.readline()` from a socket stream that has a line to provide.

The user who wants to implement "yield control to the event loop" has to look at the source to find out how delay == 0 is handled, and then they have to wonder whether it's an implementation detail. https://github.com/python/asyncio/issues/284 states that the behavior is explicit and here to stay, but that promise has never made it into the actual documentation.

--
___ Python tracker <https://bugs.python.org/issue34476> ___
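A sketch of the behavior under discussion: `await asyncio.sleep(0)` suspends the current task exactly once, letting other runnable tasks proceed before the awaiting task resumes:

```python
import asyncio

async def main():
    order = []

    async def other():
        order.append('other ran')

    task = asyncio.create_task(other())  # scheduled, not yet run
    order.append('before sleep(0)')
    await asyncio.sleep(0)               # yield to the event loop once
    order.append('after sleep(0)')
    await task
    return order

print(asyncio.run(main()))
```

If sleep(0) returned without suspending, 'other ran' would appear last instead of between the two markers.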
[issue34518] Documentation for coroutine objects
Hrvoje Nikšić added the comment:

That's exactly it, thanks! I have no idea how I missed it, despite looking (I thought) carefully.

But yes, they should be linked from https://docs.python.org/3/library/stdtypes.html . Just as there is currently https://docs.python.org/3/library/stdtypes.html#generator-types that links to https://docs.python.org/3/reference/expressions.html#yieldexpr , there could be a #coroutine-types that links to https://docs.python.org/3/reference/datamodel.html#coroutine-objects

Another place to link is https://docs.python.org/3/glossary.html#term-coroutine - it currently does link to the reference, but only to the "async def" syntax. Since https://docs.python.org/3/reference/compound_stmts.html#async-def is linked from many places, it might make sense to mention there that *calling* a coroutine immediately returns a coroutine object, with a link to https://docs.python.org/3/reference/datamodel.html#coroutine-objects

--
___ Python tracker <https://bugs.python.org/issue34518> ___
[issue34518] Documentation for coroutine objects
New submission from Hrvoje Nikšić:

Coroutine objects have public methods such as send, close, and throw, which do not appear to be documented. For example, at https://stackoverflow.com/q/51975658/1600898 a StackOverflow user asks how to abort an already created (but not submitted) coroutine without a RuntimeWarning, the answer being to use the close() method. The user asked where one finds the close method documented.

Currently the methods only appear to be documented in PEP 492, which refers to the generator documentation for details. The glossary entry for coroutine (object) links to PEP 492 and to the async def statement. Various places in the documentation, e.g. the index, link to https://docs.python.org/3/library/asyncio-task.html#coroutine but that page is mostly concerned with the usage of coroutines within asyncio, where the methods on individual coroutine objects should not be used.

I would expect to find documentation on coroutine objects under built-in types, https://docs.python.org/3/library/stdtypes.html . In comparison, the generator-iterator methods are documented in the language reference: https://docs.python.org/3/reference/expressions.html#generator-iterator-methods

--
assignee: docs@python
components: Documentation
messages: 324157
nosy: docs@python, hniksic
priority: normal
severity: normal
status: open
title: Documentation for coroutine objects
versions: Python 3.7

___ Python tracker <https://bugs.python.org/issue34518> ___
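A sketch of the two methods discussed in the report. close() disposes of a created-but-never-awaited coroutine object without triggering the "coroutine ... was never awaited" RuntimeWarning, and send(None) drives the coroutine manually, with StopIteration carrying the return value (the same protocol generators use):

```python
async def coro():
    return 42

# Abort a coroutine that will never be submitted to an event loop.
c = coro()
c.close()   # no RuntimeWarning when c is garbage-collected

# Drive a coroutine by hand: send(None) starts it, and a finished
# coroutine raises StopIteration with the return value attached.
c2 = coro()
try:
    c2.send(None)
    result = None   # not reached for a coroutine that returns at once
except StopIteration as exc:
    result = exc.value

print(result)  # 42
```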
[issue34506] Traceback logged when SSL handshake fails
New submission from Hrvoje Nikšić : When an SSL handshake fails in asyncio, an exception traceback is logged to standard error even if the application code catches the exception. This logging cannot be suppressed, except by providing a custom exception handler for the whole event loop. The question was raised on StackOverflow in https://stackoverflow.com/q/52012488/1600898 To reproduce the issue, run the attached minimal example (taken from the SO question). Expected behavior is for "Error handled" to be printed. Actual behavior is that, in addition to that line, two tracebacks are printed. It looks like a bug that _fatal_error both logs the exception and calls connection_lost on the protocol (through transport._force_close()). If the idea is for the exception not to get swallowed by protocols that don't implement a sensible connection_lost (e.g. because they've just inherited from BaseProtocol, like the echo server examples), then maybe a protocol that propagates the exception in connection_lost should be able to opt out of the additional logging. That way the stream protocols would avoid spurious output for the suppressed exception by default, and the same opt-out mechanism would be available to user-written protocols. -- files: sslerr.py messages: 324113 nosy: hniksic priority: normal severity: normal status: open title: Traceback logged when SSL handshake fails Added file: https://bugs.python.org/file47763/sslerr.py ___ Python tracker <https://bugs.python.org/issue34506> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue34476] asyncio.sleep(0) not documented
Hrvoje Nikšić added the comment: Also, the "Create a coroutine ..." wording in the current documentation is a bit strange - sleep() is already marked as a coroutine, and the documentation of other coroutines simply states their effect in the imperative. -- ___ Python tracker <https://bugs.python.org/issue34476> ___
[issue34476] asyncio.sleep(0) not documented
New submission from Hrvoje Nikšić : Looking at the implementation and at the old issue at https://github.com/python/asyncio/issues/284 shows that asyncio.sleep special-cases asyncio.sleep(0) to mean "yield control to the event loop" without incurring the additional overhead of sleeping. However, this is not documented at https://docs.python.org/3/library/asyncio-task.html#asyncio.sleep . I propose adding to the documentation a statement along the lines of: """ A delay of 0 will relinquish control to the event loop, allowing other tasks and callbacks to run. Control will be returned to the coroutine as soon as they have run, without an explicit delay. """ -- assignee: docs@python components: Documentation, asyncio messages: 323949 nosy: asvetlov, docs@python, hniksic, yselivanov priority: normal severity: normal status: open title: asyncio.sleep(0) not documented versions: Python 3.7 ___ Python tracker <https://bugs.python.org/issue34476> ___
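The "relinquish control" behavior can be demonstrated with a small script (task names are illustrative only):

```python
import asyncio

order = []

async def busy():
    order.append("busy-1")
    await asyncio.sleep(0)   # yield to the event loop, no actual delay
    order.append("busy-2")

async def other():
    order.append("other")

async def main():
    # gather() schedules busy() first, then other()
    await asyncio.gather(busy(), other())

asyncio.run(main())
# busy() runs up to sleep(0), other() gets its turn, then busy() resumes
assert order == ["busy-1", "other", "busy-2"]
```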
[issue33605] Detect accessing event loop from a different thread outside of _debug
Hrvoje Nikšić added the comment: I would definitely not propose or condone sacrificing performance. Part of the reason why I suggested the check is that it can be done efficiently - it is literally a comparison of two integers, both of which are obtained trivially. I would hope that it doesn't affect performance at all, but only measurements will show. I looked at the implementation of threading.get_ident(), and it looks just as one would hope - a cast of pthread_self() to unsigned long, and a call to PyLong_FromUnsignedLong. If needed, it might be further improved by caching the resulting PyObject * in a __thread variable on platforms that support them, but hopefully that is not necessary. (It would require additional code to Py_CLEAR the cached PyObject * when the thread exits.) > I very doubt that correct asyncio program have problems like this. That is true by definition, since what we're discussing is incorrect in asyncio. But we could be better at diagnosing the incorrect use, and we could do so (or so I hope) very efficiently. Failure to emit a diagnostic leads to bugs that are discovered much later or never. As for why people use multi-threading with asyncio, keep in mind that many libraries create threads behind the scenes without the user being aware of it. Like me, you are a regular at python-asyncio tag, so you've seen those. In some cases you can write it off as the user doing something stupid and not reading the docs, but in many others, the mistake is far from obvious, especially for users who are not (yet) asyncio experts. -- ___ Python tracker <https://bugs.python.org/issue33605> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue33605] Detect accessing event loop from a different thread outside of _debug
Hrvoje Nikšić added the comment: > I'd be OK with this if the performance penalty is within 0.5% in > microbenchmarks for asyncio & uvloop. @yselivanov Are you referring to specific microbenchmarks published somewhere, or the general "echo server" style microbenchmarks? -- ___ Python tracker <https://bugs.python.org/issue33605> ___
[issue33605] Detect accessing event loop from a different thread outside of _debug
New submission from Hrvoje Nikšić : Looking at StackOverflow's python-asyncio tag[1], it appears that it's a very common mistake for users to invoke asyncio functions or methods from a thread other than the event loop thread. In some cases this happens because the user is careless with threads and hasn't read the documentation. But in many cases what happens is that a third-party library invoked a callback in a different thread without making it transparent that that's what it's doing. The trouble is that in many cases doing these things, e.g. calling loop.call_soon() or loop.create_task() from the wrong thread, will *appear to work*. The typical event loop is busy servicing different coroutines, so a function or task enqueued without a proper wakeup gets picked up soon enough. This is, of course, a disaster waiting to happen because it could easily lead to corruption of event loop's data structures. But users don't know that, and many of them become aware of the problem only after wondering "why does my code start working when I add a coroutine that does nothing but asyncio.sleep(0.1) in an infinite loop?" Some may never even fix their code, just assuming that asyncio takes a long time to process a new task or something like that. I suggest that asyncio should be stricter about this error and that methods and functions that operate on the event loop, such as call_soon, call_later, create_task, ensure_future, and close, should all call _check_thread() even when not in debug mode. _check_thread() warns that it "should only be called when self._debug == True", hinting at "performance reasons", but that doesn't seem justified. threading.get_ident() is efficiently implemented in C, and comparing that integer to another cached integer is about as efficient an operation as it gets. The added benefit would be a vast improvement of robustness of asyncio-based programs, saving many hours of debugging. 
[1] Here is an incomplete list of questions where the users stumbled on this problem, and that's only from the last three months or so: https://stackoverflow.com/questions/49906034/python-asyncio-run-forever-and-tasks https://stackoverflow.com/questions/49851514/python-websockets-and-gtk-confused-about-asyncio-queue https://stackoverflow.com/questions/49533612/using-asyncio-loop-reference-in-another-file https://stackoverflow.com/questions/49093623/strange-behaviour-when-task-added-to-empty-loop-in-different-thread https://stackoverflow.com/questions/48836285/python-asyncio-event-wait-not-responding-to-event-set https://stackoverflow.com/questions/48833644/how-to-wait-for-asynchronous-callback-in-the-background-i-e-not-invoked-by-us https://stackoverflow.com/questions/48695670/running-asynchronous-code-synchronously-in-separate-thread -- components: asyncio messages: 317324 nosy: asvetlov, hniksic, yselivanov priority: normal severity: normal status: open title: Detect accessing event loop from a different thread outside of _debug type: enhancement versions: Python 3.7, Python 3.8 ___ Python tracker <https://bugs.python.org/issue33605> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
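The proposed always-on check can be sketched as follows. This is a hypothetical skeleton, not asyncio's actual code; `LoopSkeleton` and its attributes are made up for illustration, but the comparison mirrors what `BaseEventLoop._check_thread()` does in debug mode:

```python
import threading

class LoopSkeleton:
    """Hypothetical stand-in for an event loop, showing only the check."""

    def __init__(self):
        self._thread_id = None  # set when the loop starts running

    def run_forever(self):
        self._thread_id = threading.get_ident()
        # ... run callbacks until stopped ...

    def _check_thread(self):
        # Two cached-integer comparisons - cheap enough that it could
        # plausibly run unconditionally, not just in debug mode.
        if self._thread_id is None:
            return
        if threading.get_ident() != self._thread_id:
            raise RuntimeError(
                "Non-thread-safe operation invoked on an event loop "
                "other than the current one")

    def call_soon(self, callback, *args):
        self._check_thread()  # the proposal: check even without _debug
        # ... actually enqueue the callback ...
```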
[issue33533] Provide an async-generator version of as_completed
Hrvoje Nikšić added the comment: Another option occurred to me: as_completed could return an object that implements both the synchronous and the asynchronous iteration protocol:

class as_completed:
    def __init__(self, fs, *, loop=None, timeout=None):
        self.__fs = fs
        self.__loop = loop
        self.__timeout = timeout
        self._iter = None

    def __iter__(self):
        # current implementation here
        ...

    async def __aiter__(self):
        # new async implementation here
        ...

    def __next__(self):
        # defined for backward compatibility with code that expects
        # as_completed() to return an iterator rather than an iterable
        if self._iter is None:
            self._iter = iter(self)
        return next(self._iter)

With that design there wouldn't need to be a new function under a different name; instead, as_completed could just be documented as an asynchronous iterable, with the old synchronous iteration supported for backward compatibility. -- ___ Python tracker <https://bugs.python.org/issue33533> ___
[issue33403] asyncio.tasks.wait does not allow to set custom exception when return_when=FIRST_EXCEPTION
Hrvoje Nikšić added the comment: """ At the moment this can be done but it will cancel all the coroutines with any exception that is raised and at some occasions this may not be desired. """ Does wait() really "cancel all the coroutines"? The documentation doesn't mention wait() canceling anything it only returns them in the `pending` set. It is gather() and wait_for() that cancel automatically. If you want to cancel everything on some exception and not on another, you can easily implement the logic yourself, e.g: tasks = [asyncio.ensure_future(example(x)) for x in range(20)] done, pending = await asyncio.wait(tasks, return_when=FIRST_EXCEPTION) for fut in done: try: fut.result() except CancelException: for fut in pending: fut.cancel() -- nosy: +hniksic ___ Python tracker <https://bugs.python.org/issue33403> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue33544] Asyncio Event.wait() is a hold over from before awaitable, and should be awaitable
Hrvoje Nikšić added the comment: Deprecating Event.wait would be incorrect because Event was designed to mimic the threading.Event class which has a (blocking) wait() method[1]. Supporting `await event` is still worthwhile, though. [1] https://docs.python.org/2/library/threading.html#threading.Event.wait -- nosy: +hniksic ___ Python tracker <https://bugs.python.org/issue33544> ___
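For context, the current spelling that the report wants to keep supporting looks like this (a minimal sketch; `waiter` and the 0-second sleep are illustrative):

```python
import asyncio

async def main():
    event = asyncio.Event()

    async def waiter():
        # The threading.Event.wait() analogue, but awaitable:
        await event.wait()
        return "released"

    task = asyncio.ensure_future(waiter())
    await asyncio.sleep(0)   # let waiter() start and block on the event
    event.set()
    return await task

result = asyncio.run(main())
assert result == "released"
```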
[issue33533] Provide an async-generator version of as_completed
Change by Hrvoje Nikšić : -- type: -> enhancement ___ Python tracker <https://bugs.python.org/issue33533> ___
[issue33533] Provide an async-generator version of as_completed
Hrvoje Nikšić added the comment: Of course, `yield from done` would actually have to be `for future in done: yield future`, since async generators don't support yield from. -- ___ Python tracker <https://bugs.python.org/issue33533> ___
[issue33533] Provide an async-generator version of as_completed
New submission from Hrvoje Nikšić : Judging by questions on the StackOverflow python-asyncio tag[1][2], it seems that users find it hard to understand how to use as_completed correctly. I have identified three issues:

* It's somewhat sparingly documented. A StackOverflow user ([2]) didn't find it obvious that it runs the futures in parallel. Unless one is already aware of the meaning, the term "as completed" could suggest that they are executed and completed sequentially.

* Unlike its concurrent.futures counterpart, it's non-blocking. This sounds like a good idea because it's usable from synchronous code, but it means that the futures it yields aren't completed; you have to await them first. This is confusing for a function with "completed" in the name, and is not the case with concurrent.futures.as_completed, nor with other waiting functions in asyncio (gather, wait, wait_for).

* It yields futures other than those that were passed in. This prevents some usual patterns from working, e.g. associating the results with context data, such as the Python docs themselves use for concurrent.futures.as_completed in https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example . See SO question [1] for a similar request in asyncio.

Here is my proposal to address the issues. I believe the usage problems stem from as_completed predating the concept of async iterators. If we had async iterators back then, as_completed would have been an obvious candidate to be one. In that case it could be both "blocking" (but not for the event loop) and return the original futures. For example:

async def as_completed2(fs):
    pending = set(map(asyncio.ensure_future, fs))
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
        yield from done

(It is straightforward to add support for a timeout argument.)
I propose to deprecate asyncio.as_completed and advertise the async-iterator version like the one presented here - under a nicer name, such as as_done(), or as_completed_async(). [1] https://stackoverflow.com/questions/50028465/python-get-reference-to-original-task-after-ordering-tasks-by-completion [2] https://stackoverflow.com/questions/50355944/yield-from-async-generator-in-python-asyncio -- components: asyncio messages: 316773 nosy: asvetlov, hniksic, yselivanov priority: normal severity: normal status: open title: Provide an async-generator version of as_completed versions: Python 3.8 ___ Python tracker <https://bugs.python.org/issue33533> ___
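A runnable rendition of the proposed approach might look like this. The name `as_completed2`, the worker coroutine, and the delays are all illustrative; the body uses plain yields since async generators don't support yield from:

```python
import asyncio

async def as_completed2(fs):
    # Hypothetical async-generator variant of asyncio.as_completed:
    # yields the *original* futures, already finished, as they complete.
    pending = {asyncio.ensure_future(f) for f in fs}
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_COMPLETED)
        for fut in done:
            yield fut

async def main():
    async def work(delay, value):
        await asyncio.sleep(delay)
        return value

    tasks = [asyncio.ensure_future(work(d, v))
             for d, v in [(0.03, "slow"), (0.01, "fast")]]
    results = []
    async for fut in as_completed2(tasks):
        assert fut in tasks           # the original futures come back
        results.append(fut.result())  # already done; no extra await needed
    return results

results = asyncio.run(main())
# results are in completion order, not submission order
```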
[issue33469] RuntimeError after closing loop that used run_in_executor
Change by Hrvoje Nikšić : -- components: +asyncio nosy: +asvetlov, yselivanov ___ Python tracker <https://bugs.python.org/issue33469> ___
[issue33469] RuntimeError after closing loop that used run_in_executor
New submission from Hrvoje Nikšić : Looking at a StackOverflow question[1], I was unable to find a way to correctly close an event loop that uses run_in_executor() for long-running tasks. The question author tried to implement the following scenario: 1. create some tasks that use run_in_executor 2. run asyncio.wait(tasks, return_when=FIRST_EXCEPTION) 3. cancel pending tasks, if any 4. close the loop and continue with non-async work However, when there are pending tasks from wait(), a RuntimeError is raised some time after step #4. In the test programs, it happens while waiting for the program to finish. I have attached a minimal example to reproduce the issue. The immediate cause is that a callback installed by wrap_future() notices that the underlying concurrent.futures.Future is done and calls loop.call_soon_threadsafe() to copy the result to the asyncio.Future. call_soon_threadsafe() fails when the loop is closed. This would be reasonable behavior if not for the fact that the code explicitly cancelled the asyncio future, and awaited it to ensure that the cancellation took effect. While it is clear that asyncio cannot interrupt a function already running in an executor, it should probably detach the connection between the concurrent future and the asyncio future, to prevent this kind of error (and possibly other problems). For example, the _call_check_cancel closure in _chain_future could remove (or disable) the done_callback installed on source after the call to source.cancel(). Since at that point we know that destination (asyncio.Future) is already canceled, there is no longer value in invoking the done callback for source (concurrent.futures.Future). 
[1] https://stackoverflow.com/q/50279522/1600898 -- files: executor-cancel messages: 316420 nosy: hniksic priority: normal severity: normal status: open title: RuntimeError after closing loop that used run_in_executor versions: Python 3.6, Python 3.7 Added file: https://bugs.python.org/file47583/executor-cancel ___ Python tracker <https://bugs.python.org/issue33469> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue30256] Adding a SyncManager Queue proxy to a SyncManager dict or Namespace proxy raises an exception
Hrvoje Nikšić added the comment: The issue is also present in Python 3.7.0b1. -- versions: +Python 3.7 ___ Python tracker <https://bugs.python.org/issue30256> ___
[issue30256] Adding a SyncManager Queue proxy to a SyncManager dict or Namespace proxy raises an exception
Hrvoje Nikšić added the comment: I encountered this bug while testing the code in this StackOverflow answer: https://stackoverflow.com/a/48565011/1600898 The code at the end of the answer runs on Python 3.5, but fails on 3.6 with the "unexpected keyword argument 'manager_owned'" error. If someone knows of a workaround until the PR is accepted, it would be appreciated as well. -- nosy: +hniksic ___ Python tracker <https://bugs.python.org/issue30256> ___
[issue10049] Add a "no-op" (null) context manager to contextlib
Hrvoje Nikšić added the comment: I am of course willing to sign the CLA (please send further instructions via email), although I don't know how useful my original patch is, given that it caches the null context manager. -- ___ Python tracker <https://bugs.python.org/issue10049> ___
[issue10049] Add a "no-op" (null) context manager to contextlib (Rejected: use contextlib.ExitStack())
Hrvoje Nikšić added the comment: For what it's worth, we are still using our own null context manager function in critical code. We tend to avoid contextlib.ExitStack() for two reasons:

1) It is not immediately clear from looking at the code what ExitStack() means. (Unlike the "contextmanager" decorator, ExitStack is unfamiliar to most developers.)

2) ExitStack's __init__ and __exit__ incur a non-negligible overhead compared to a true do-nothing context manager.

It doesn't surprise me that projects like TensorFlow introduce their own versions of this context manager. Having said that, I can also understand why it wasn't added. It is certainly possible to live without it, and ExitStack() is a more than acceptable replacement for casual use. -- ___ Python tracker <https://bugs.python.org/issue10049> ___
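For the record, later Python versions (3.7+) did add contextlib.nullcontext, which covers exactly the "optional lock" pattern discussed here; a minimal sketch (the `locked` helper is made up for illustration):

```python
from contextlib import nullcontext
import threading

def locked(lock=None):
    # Use the real lock if one was given, otherwise a do-nothing CM.
    cm = lock if lock is not None else nullcontext()
    with cm:
        return "did work"

assert locked() == "did work"
assert locked(threading.Lock()) == "did work"
```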
[issue30637] Syntax error reported on compile(...), but not on compile(..., ast.PyCF_ONLY_AST)
Hrvoje Nikšić added the comment: > Can you suggest a couple of sentences you would have like to have > seen, and where? Thanks, I would suggest adding something like this to the documentation of ast.parse:

"""
``parse`` raises ``SyntaxError`` if the compiled source is invalid, and ``ValueError`` if the source contains null bytes. Note that a successful parse does not guarantee correct syntax of ``source``. Further syntax errors can be detected, and ``SyntaxError`` raised, when the source is compiled to a code object using ``compile`` without the ``ast.PyCF_ONLY_AST`` flag, or executed with ``exec``. For example, a lone ``break`` statement can be parsed, but not converted into a code object or executed.
"""

I don't think the ``compile`` docs need to be changed, partly because they're already sizable, and partly because they don't document individual flags at all. (A reference to the ``ast`` module regarding the flags, like the one for AST objects in the first paragraph, might be a useful addition.) -- ___ Python tracker <http://bugs.python.org/issue30637> ___
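The behavior the suggested text describes can be checked directly:

```python
import ast

# A bare "return" at module level parses fine under PyCF_ONLY_AST...
tree = ast.parse("return")
assert isinstance(tree.body[0], ast.Return)

# ...but compiling the same source to a code object raises SyntaxError
# ("'return' outside function").
try:
    compile("return", "<test>", "exec")
    raised = False
except SyntaxError:
    raised = True
assert raised
```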
[issue30637] Syntax error reported on compile(...), but not on compile(..., ast.PyCF_ONLY_AST)
Hrvoje Nikšić added the comment: > The appropriate fix would probably be to add a sentence to the > `ast.PyCF_ONLY_AST` documentation to say that some syntax errors > are only detected when compiling the AST to a code object. Yes, please. I'm not saying the current behavior is wrong (it makes sense that some constructs are legal as AST, but can't be converted into code), I just found it surprising. In other words, we would have found it very useful for the documentation to mention that code generation performs additional checks on the AST that are not performed during the ast.PyCF_ONLY_AST compilation. -- ___ Python tracker <http://bugs.python.org/issue30637> ___
[issue30637] Syntax error reported on compile(...), but not on compile(..., ast.PyCF_ONLY_AST)
New submission from Hrvoje Nikšić: Our application compiles snippets of user-specified code using the compile built-in with ast.PyCF_ONLY_AST flag. At this stage we catch syntax errors and perform some sanity checks on the AST. The AST is then compiled into actual code using compile() and run using further guards. We found that using a bare "return" in the code works with ast.PyCF_ONLY_AST, but raises SyntaxError when compiled without the flag:

>>> import ast
>>> compile('return', '', 'exec', ast.PyCF_ONLY_AST, True)
<_ast.Module object at 0x7f35df872310>
>>> compile('return', '', 'exec', 0, True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "", line 1
SyntaxError: 'return' outside function

Is this intended behavior? It doesn't seem to be documented anywhere. -- components: Interpreter Core messages: 295771 nosy: hniksic priority: normal severity: normal status: open title: Syntax error reported on compile(...), but not on compile(..., ast.PyCF_ONLY_AST) versions: Python 2.7, Python 3.7 ___ Python tracker <http://bugs.python.org/issue30637> ___
[issue26959] pickle: respect dispatch for functions again
Hrvoje Nikšić added the comment: You can simplify pickle_lambda in the test by using marshal.dumps(code_obj) and marshal.loads(code_obj) to dump and load the code object without going through its entire guts. It would be a shame to have to change a pickle test just because some detail of the code object implementation changes. -- nosy: +hniksic ___ Python tracker <http://bugs.python.org/issue26959> ___
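The suggested marshal round-trip might look like this sketch (the lambda and function names are illustrative, not the actual test code):

```python
import marshal
import types

f = lambda x: x * 2

# Serialize the code object without touching its internal fields...
data = marshal.dumps(f.__code__)

# ...and rebuild an equivalent function from the deserialized code.
code = marshal.loads(data)
g = types.FunctionType(code, globals())
assert g(21) == 42
```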
[issue26198] PyArg_ParseTuple with format "et#" and "es#" detects overflow by raising TypeError instead of ValueError
Hrvoje Nikšić added the comment: Barun, Serhiy, thanks for picking this up so quickly. I would further suggest avoiding a fixed buffer in abspath (_getfullpathname, but only abspath seems to call it). Other filesystem calls use the interface where PyArg_ParseTuple allocates the buffer. This makes it easier to handle errors from the native layer by catching OSError or similar, instead of the very generic TypeError (or ValueError). -- ___ Python tracker <http://bugs.python.org/issue26198> ___
[issue26198] PyArg_ParseTuple with format "et#" and "es#" detects overflow by raising TypeError instead of ValueError
Hrvoje Nikšić added the comment: The problem can be encountered and easily reproduced by calling os.path functions, such as os.path.abspath, with a sufficiently large string on Windows:

>>> os.path.abspath("a" * 1024)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "P:\...\lib\ntpath.py", line 471, in abspath
TypeError: must be (buffer overflow), not str

The error message is somewhat confusing, making it look like the "must be" and "not" arguments are swapped. Ideally, the message could be along the lines of "must be a string of no more than X characters". -- ___ Python tracker <http://bugs.python.org/issue26198> ___
[issue26198] PyArg_ParseTuple with format "et#" and "es#" detects overflow by raising TypeError instead of ValueError
New submission from Hrvoje Nikšić: The documentation for the "es#" format (and the "et#" that derives from it) documents the possibility of providing an already allocated buffer. Buffer overflow is detected and handled as follows: "If the buffer is not large enough, a ValueError will be set." However, the actual behavior is to raise a TypeError. Inspecting the code in getargs.c reveals that convertsimple() handles buffer overflow by returning a formatted message to its caller, convertitem(). Calls to convertitem() that return an error call seterror() to set the error, and seterror() unconditionally sets the PyExc_TypeError. This is not a big issue in practice, and since the behavior is not new, it might be best to simply update the documentation to match the existing practice instead of changing the behavior and risking breaking working code. -- assignee: docs@python components: Documentation, Interpreter Core messages: 258905 nosy: docs@python, hniksic priority: normal severity: normal status: open title: PyArg_ParseTuple with format "et#" and "es#" detects overflow by raising TypeError instead of ValueError type: behavior versions: Python 2.7, Python 3.5 ___ Python tracker <http://bugs.python.org/issue26198> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue21167] float('nan') returns 0.0 on Python compiled with icc
Hrvoje Nikšić added the comment: Note that defaulting to unsafe math in extensions will make *their* use of the Py_NAN macro break under icc. If we go that route ("-fp-model strict" for Python build, but not for extensions), we should also apply the attached patch that defines Py_NAN as Py_HUGE_VAL / Py_HUGE_VAL. -- ___ Python tracker <http://bugs.python.org/issue21167> ___
[issue21167] float('nan') returns 0.0 on Python compiled with icc
Hrvoje Nikšić added the comment: Using -fp-model strict (or other appropriate icc flag) seems like a reasonable resolution. It should likely also be applied to Python 3.x, despite the version field of this issue. (Even if float('nan') happens to work in current 3.x, internal code that returns Py_NAN can and will break.) -- ___ Python tracker <http://bugs.python.org/issue21167> ___
[issue21167] float('nan') returns 0.0 on Python compiled with icc
Hrvoje Nikšić added the comment: Mark:
> > If Python requires strict IEEE 754 floating-point,
>
> It doesn't (at the moment).

Does this imply that my patch is a better fix than requiring the builder to specify -fp-model strict to icc? For our use case either solution is valid. For administrators and users testing or using Python with icc, it might be more convenient for Py_NAN to do the right thing with the default icc flags. -- ___ Python tracker <http://bugs.python.org/issue21167> ___
[issue21167] float('nan') returns 0.0 on Python compiled with icc
Hrvoje Nikšić added the comment: The compilation was performed with the default flags, i.e. without -fp-model strict or similar. If Python requires strict IEEE 754 floating-point, it should probably be mentioned at a prominent place in the documentation. Administrators building Python with alternative compilers or on non-gcc Unix systems might not be aware of the issue. -- ___ Python tracker <http://bugs.python.org/issue21167> ___
[issue21167] float('nan') returns 0.0 on Python compiled with icc
Hrvoje Nikšić added the comment: sys.float_repr_style is 'short'. I haven't actually tried this in Python 3.5, but I did note the same definition of Py_NAN (which is used elsewhere in the code), so I tagged the issue as likely also present in 3.x. -- ___ Python tracker <http://bugs.python.org/issue21167> ___
[issue21167] float('nan') returns 0.0 on Python compiled with icc
New submission from Hrvoje Nikšić: On a Python compiled with the Intel C compiler, float('nan') returns 0.0. This behavior can be reproduced with icc versions 11 and 12. The definition of Py_NAN in include/pymath.h expects `HUGE_VAL * 0.0` to compile to a NaN value on IEEE754 platforms:

/* Py_NAN
 * A value that evaluates to a NaN. On IEEE 754 platforms INF*0 or
 * INF/INF works. Define Py_NO_NAN in pyconfig.h if your platform
 * doesn't support NaNs.
 */
#if !defined(Py_NAN) && !defined(Py_NO_NAN)
#define Py_NAN (Py_HUGE_VAL * 0.)
#endif

I don't have a copy of the standard, but a simple test shows that the above does not hold for icc. When compiled with icc without any flags, the following program outputs "inf 0.00" instead of the expected "inf nan":

#include <math.h>
#include <stdio.h>

int main()
{
    double inf = HUGE_VAL;
    double nan = HUGE_VAL * 0;
    printf("%f %f\n", inf, nan);
    return 0;
}

Compiling the same program with gcc yields the expected output of "inf nan". This issue can be fixed (or worked around, if it's an icc bug) by using a different definition of Py_NAN. On icc, defining Py_NAN as `Py_HUGE_VAL / Py_HUGE_VAL` produces the intended effect. If this is considered suboptimal on other compilers, the definition can be conditionally compiled for icc only. The attached patch fixes the issue, taking the conservative approach of only modifying the icc definition of Py_NAN. -- components: Interpreter Core files: intel-nan.diff keywords: patch messages: 215687 nosy: hniksic priority: normal severity: normal status: open title: float('nan') returns 0.0 on Python compiled with icc type: behavior versions: Python 2.7, Python 3.5 Added file: http://bugs.python.org/file34746/intel-nan.diff ___ Python tracker <http://bugs.python.org/issue21167> ___
[issue19092] ExitStack.__exit__ incorrectly suppresses exceptions in __exit__ callbacks of inner context managers
Hrvoje Nikšić added the comment: Indeed, that works, thanks. Here is the updated patch for review, passing all tests. -- Added file: http://bugs.python.org/file31908/exitstack.diff ___ Python tracker <http://bugs.python.org/issue19092> ___
[issue19092] ExitStack.__exit__ incorrectly suppresses exceptions in __exit__ callbacks of inner context managers
Hrvoje Nikšić added the comment: Here is the updated patch, with a very minor improvement (it no longer unnecessarily holds on to the original exc_info), and with new tests. The tests cover the non-suppression of an exit-exception (which fails without the fix) and the correct suppression of a body-exception by an outer CM. It was not necessary to write a test for suppression of an exit-exception, since this is already tested by test_exit_exception_chaining_suppress().

There is one problem, however: applying my patch mysteriously breaks the existing test_exit_exception_chaining(). It breaks in a peculiar way: the correct exception is propagated, but the exception's context is wrong. Instead of pointing to KeyError, it points to ZeroDivisionError, despite the latter having been correctly suppressed. I placed prints in _fix_exception_context right before the assignment to __context__, to make sure it wasn't broken by my patch, and the assignment appears correct: it sets the context of the IndexError to the KeyError instance. The __context__ is correct immediately before the raise statement. However, after the IndexError is caught inside test_exit_exception_chaining(), its __context__ unexpectedly points to ZeroDivisionError.

It would seem that the difference is in the raise syntax. The old code used "raise", while the new code uses "raise exc[1]", which I assume changes the exception's __context__. Changing "raise exc[1]" to "raise exc[1] from None" didn't help.

--
Added file: http://bugs.python.org/file31897/exitstack.diff

___ Python tracker <http://bugs.python.org/issue19092> ___
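[Editorial note: the effect described above, where re-raising a stored exception object rewrites its __context__ while a bare raise does not, can be reproduced in isolation. A minimal sketch (the function and exception names are illustrative, not from the patch):]

```python
def reraise_stored():
    exc = IndexError('propagated')
    try:
        raise ZeroDivisionError('handled internally')
    except ZeroDivisionError:
        # "raise exc" while another exception is being handled implicitly
        # sets exc.__context__ to that exception, clobbering any context
        # assigned earlier; this mirrors the "raise exc[1]" code path.
        raise exc

try:
    reraise_stored()
except IndexError as e:
    print(type(e.__context__).__name__)  # expected: ZeroDivisionError
```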
[issue19092] ExitStack.__exit__ incorrectly suppresses exceptions in __exit__ callbacks of inner context managers
Hrvoje Nikšić added the comment: Nick, thanks for the review. Do you need me to write the patch for the test suite along with the original patch? -- ___ Python tracker <http://bugs.python.org/issue19092> ___
[issue19092] ExitStack.__exit__ incorrectly suppresses exceptions in __exit__ callbacks of inner context managers
New submission from Hrvoje Nikšić: While using contextlib.ExitStack in our project, we noticed that the __exit__ method of contextlib.ExitStack suppresses an exception raised in any context manager's __exit__ except the outermost one. Here is a test case to reproduce the problem:

class Err:
    def __enter__(self):
        pass
    def __exit__(self, *exc):
        1/0

class Ok:
    def __enter__(self):
        pass
    def __exit__(self, *exc):
        pass

import contextlib
s = contextlib.ExitStack()
s.push(Ok())
s.push(Err())
with s:
    pass

Since the inner context manager raises in __exit__ and neither context manager requests suppression, I would expect to see a ZeroDivisionError raised. What actually happens is that the exception is suppressed. This behavior caused us quite a few headaches before we figured out why the exceptions raised during __exit__ went silently undetected in production.

The problem is in ExitStack.__exit__, which explicitly propagates the exception only if it occurs in the outermost exit callback. The idea behind it appears to be to simply record the raised exception in order to allow exit callbacks of the outer context managers to see it, and get a chance to suppress it. The only exception that is directly re-raised is the one occurring in the outermost exit callback, because it certainly cannot be seen nor suppressed by anyone else. But this reasoning is flawed, because if an exception happens in an inner cm and none of the outer cm's chooses to suppress it, then there will be no one left to raise it.

Simply returning True from ExitStack.__exit__ is of no help, as that only has an effect when an exception was passed into the function in the first place, and even then the caller can only re-raise the earlier exception, not the exception that occurred in the exit callback. And if no exception was sent to ExitStack.__exit__, as is the case in the above code, then no exception will be re-raised at all, effectively suppressing it.
I believe the correct way to handle this is by keeping track of whether an exception actually occurred in one of the _exit_callbacks. As before, the exception is forwarded to the next cm's exit callbacks, but if none of them suppresses it, then the exception is re-raised instead of returning from the function. I am attaching a patch to contextlib.py that implements this change. The patch also makes sure that True is returned from ExitStack.__exit__ only if an exception was actually passed into it.

--
components: Library (Lib)
files: exitstack.diff
keywords: patch
messages: 198411
nosy: hniksic
priority: normal
severity: normal
status: open
title: ExitStack.__exit__ incorrectly suppresses exceptions in __exit__ callbacks of inner context managers
type: behavior
versions: Python 3.4
Added file: http://bugs.python.org/file31872/exitstack.diff

___ Python tracker <http://bugs.python.org/issue19092> ___
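[Editorial note: for comparison, plain nested with statements propagate the inner __exit__ exception as expected; ExitStack is meant to be semantically equivalent to this nesting. A minimal sketch, reusing the class names from the report:]

```python
class Err:
    def __enter__(self):
        pass
    def __exit__(self, *exc):
        1/0  # raises ZeroDivisionError on exit

class Ok:
    def __enter__(self):
        pass
    def __exit__(self, *exc):
        pass

# Equivalent nesting: Ok() outermost, Err() innermost.
try:
    with Ok():
        with Err():
            pass
except ZeroDivisionError:
    print('ZeroDivisionError propagated')  # the behavior the report expects
```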
[issue17505] email.header.Header.__unicode__ does not decode header
Hrvoje Nikšić added the comment: Thanks for pointing out the make_header(decode_header(...)) idiom, which I was indeed not aware of. It solves the problem perfectly.

I agree that it is a doc bug. While make_header is documented in the same place as decode_header and Header itself, it is not explained *why* one should call it when one already has in hand a perfectly valid Header instance. Specifically, it is not at all clear that unicode(h) and unicode(make_header(decode_header(h))) will return different things; I would have expected make_header(decode_header(h)) to return an object indistinguishable from h.

Also, the policy=default parameter in Python 3 sounds great, it's exactly what one would expect.

-- ___ Python tracker <http://bugs.python.org/issue17505> ___
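[Editorial note: in Python 3 the same idiom decodes the header to a str. A minimal sketch using the subject line from the original report:]

```python
from email.header import decode_header, make_header

raw = '=?gb2312?b?1eLKx9bQzsSy4srUo6E=?='

# decode_header() splits the encoded word into (bytes, charset) chunks;
# make_header() decodes and joins them into a displayable Header.
decoded = str(make_header(decode_header(raw)))
print(decoded)  # expected: 这是中文测试！ (u'\u8fd9\u662f\u4e2d\u6587\u6d4b\u8bd5\uff01')
```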
[issue17505] email.header.Header.__unicode__ does not decode header
Hrvoje Nikšić added the comment: An example of the confusion that lack of a clear "convert to unicode" method creates is illustrated by this StackOverflow question: http://stackoverflow.com/q/15516958/1600898 -- ___ Python tracker <http://bugs.python.org/issue17505> ___
[issue17505] email.header.Header.__unicode__ does not decode header
Changes by Hrvoje Nikšić : -- type: -> behavior ___ Python tracker <http://bugs.python.org/issue17505> ___
[issue17505] email.header.Header.__unicode__ does not decode header
New submission from Hrvoje Nikšić: The __unicode__ method is documented to "return the header as a Unicode string". For this to be useful, I would expect it to decode a string such as "=?gb2312?b?1eLKx9bQzsSy4srUo6E=?=" into a Unicode string that can be displayed to the user, in this case u'\u8fd9\u662f\u4e2d\u6587\u6d4b\u8bd5\uff01'. However, unicode(header) returns the not so useful u"=?gb2312?b?1eLKx9bQzsSy4srUo6E=?=".

Looking at the code of __unicode__, it appears that the code does attempt to decode the header into Unicode, but this fails for Headers initialized from a single MIME-quoted string, as is done by the message parser. In other words, __unicode__ is failing to call decode_header. Here is a minimal example demonstrating the problem:

>>> msg = email.message_from_string('Subject: =?gb2312?b?1eLKx9bQzsSy4srUo6E=?=\n\nfoo\n')
>>> unicode(msg['subject'])
u'=?gb2312?b?1eLKx9bQzsSy4srUo6E=?='

Expected output of the last line:

u'\u8fd9\u662f\u4e2d\u6587\u6d4b\u8bd5\uff01'

To get the fully decoded Unicode string, one must use something like:

>>> u''.join(unicode(s, c) for s, c in email.header.decode_header(msg['subject']))

which is unintuitive and hard to teach to new users of the email package. (And looking at the source of __unicode__ it's not even obvious that it's correct: it appears that a space must be added before us-ascii-coded chunks. The joining is non-trivial.) The same problem occurs in Python 3.3 with str(msg['subject']).

--
components: email
messages: 184856
nosy: barry, hniksic, r.david.murray
priority: normal
severity: normal
status: open
title: email.header.Header.__unicode__ does not decode header
versions: Python 2.7

___ Python tracker <http://bugs.python.org/issue17505> ___
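[Editorial note: in current Python 3 the header is decoded transparently when the message is parsed with policy=email.policy.default, as later discussed on this issue. A minimal sketch:]

```python
import email
import email.policy

# With the "default" policy (which is not actually the default),
# header values come back fully MIME-decoded.
msg = email.message_from_string(
    'Subject: =?gb2312?b?1eLKx9bQzsSy4srUo6E=?=\n\nfoo\n',
    policy=email.policy.default)
print(msg['Subject'])  # expected: 这是中文测试！
```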
[issue1469629] __dict__ = self in subclass of dict causes a memory leak?
Hrvoje Nikšić added the comment: Could this patch please be committed to Python? We have just run into this problem in production, where our own variant of AttrDict was shown to be leaking. It is possible to work around the problem by implementing explicit __getattr__ and __setattr__, but that is both slower and trickier to get right. -- nosy: +hniksic ___ Python tracker <http://bugs.python.org/issue1469629> ___
[issue10049] Add a "no-op" (null) context manager to contextlib
Hrvoje Nikšić added the comment: Is there anything else I need to do to have the patch reviewed and applied? I am in no hurry since we're still using 2.x, I'd just like to know if more needs to be done on my part to move the issue forward. My last Python patch was accepted quite some years ago, so I'm not closely familiar with the current approval process. -- ___ Python tracker <http://bugs.python.org/issue10049> ___
[issue10049] Add a "no-op" (null) context manager to contextlib
Hrvoje Nikšić added the comment: Here is a more complete patch that includes input from Nick, as well as the patch to test_contextlib.py and the documentation. For now I've retained the function-returning-singleton approach for consistency and future extensibility. -- keywords: +patch Added file: http://bugs.python.org/file19176/nullcontext.patch ___ Python tracker <http://bugs.python.org/issue10049> ___
[issue10049] Add a "no-op" (null) context manager to contextlib
Hrvoje Nikšić added the comment: I considered using a variable, but I went with the factory function for two reasons: consistency with the rest of contextlib, and equivalence to the contextmanager-based implementation. Another reason is that it leaves the option of adding optional parameters at a later point. I know, optional parameters aren't likely for a "null" constructor, but still... it somehow didn't feel right to relinquish control. -- ___ Python tracker <http://bugs.python.org/issue10049> ___
[issue10049] Add a "no-op" (null) context manager to contextlib
Hrvoje Nikšić added the comment: That is what we are using now, but I think a contextlib.null() would be useful to others, i.e. that its use is a useful idiom to adopt. Specifically I would like to discourage the "duplicated code" idiom from the report, which I've seen all too often. The "closing" constructor is also trivial to define, but it's there for convenience and to promote the use of with statement over try/finally boilerplate. The same goes here: you don't miss the null context manager when you don't have it; you invent other solutions. But when it's already available, it's an elegant pattern. In my experience, if they have to define it to get it, most people won't bother with the pattern and will retain less elegant solutions. -- ___ Python tracker <http://bugs.python.org/issue10049> ___
[issue10049] Add a "no-op" (null) context manager to contextlib
Hrvoje Nikšić added the comment: Thank you for your comments.

@Michael: I will of course write tests and documentation if there is indication that the feature will be accepted for stdlib.

@Antoine: it is true that a null context manager can be easily defined, but it does require a separate generator definition, often repeated in different modules. This is markedly less elegant than just using contextlib.null() in an expression. I'm not acquainted with the history of identity function requests, but note that the identity function can be defined as an expression, using simply lambda x: x. The equivalent expression that evaluates to a null context manager is markedly more convoluted, as shown in my report.

@Éric: The Null/_null/null distinction is an optimization that avoids creating new objects for something that is effectively a singleton. It would be perfectly reasonable to define contextlib.null as Antoine did, but, this being stdlib, I wanted the implementation to be as efficient as (reasonably) possible.

-- ___ Python tracker <http://bugs.python.org/issue10049> ___
[issue10049] Add the null context manager to contextlib
Changes by Hrvoje Nikšić : -- components: +Library (Lib) type: -> feature request ___ Python tracker <http://bugs.python.org/issue10049> ___
[issue10049] Add the null context manager to contextlib
New submission from Hrvoje Nikšić : I find that I frequently need the "null" (no-op) context manager. For example, in code such as:

with transaction or contextlib.null():
    ...

Since there is no easy expression to create a null context manager, we must resort to workarounds, such as:

if transaction:
    with transaction:
        ... code ...
else:
    ... duplicated code ...

(with the usual option of moving the duplicated code to a function, but still.) Or by creating ad-hoc null context managers with the help of contextlib.contextmanager:

if transaction is None:
    transaction = contextlib.contextmanager(lambda: iter([None]))()
with transaction:
    ...

Adding a "null" context manager would be both practical and elegant. I have attached a patch for contextlib.py.

--
files: fl
messages: 118181
nosy: hniksic
priority: normal
severity: normal
status: open
title: Add the null context manager to contextlib
Added file: http://bugs.python.org/file19158/fl

___ Python tracker <http://bugs.python.org/issue10049> ___
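[Editorial note: this feature eventually landed as contextlib.nullcontext() in Python 3.7, so the motivating pattern above becomes a one-liner. A minimal sketch, with `transaction` as a stand-in for an optional context manager:]

```python
from contextlib import nullcontext

transaction = None  # stand-in: no real transaction is active

# Exactly the pattern from the report, with no duplicated code paths.
with transaction or nullcontext():
    print('body runs once, with or without a transaction')
```

nullcontext() also accepts an optional enter_result, so `with nullcontext(obj) as x:` binds x to obj while doing nothing on exit.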
[issue2892] improve cElementTree iterparse error handling
Hrvoje Nikšić added the comment: Here is a small test case that demonstrates the problem, expected behavior and actual behavior:

{{{
for ev in xml.etree.cElementTree.iterparse(StringIO('<root/>rubbish'), events=('start', 'end')):
    print ev
}}}

The above code should first print the two events (start and end), and then raise the exception. In Python 2.7 it runs like this:

{{{
>>> for ev in xml.etree.cElementTree.iterparse(StringIO('<root/>rubbish'), events=('start', 'end')):
...     print ev
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 84, in next
cElementTree.ParseError: junk after document element: line 1, column 7
}}}

Expected behavior, obtained with my patch, is that it runs like this:

{{{
>>> for ev in my_iterparse(StringIO('<root/>rubbish'), events=('start', 'end')):
...     print ev
...
('start', <Element 'root' at 0x...>)
('end', <Element 'root' at 0x...>)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 26, in __iter__
cElementTree.ParseError: junk after document element: line 1, column 7
}}}

The difference is, of course, only visible when printing events. A side-effect-free operation, such as building a list using list(iterparse(...)), would behave exactly the same before and after the change.

-- ___ Python tracker <http://bugs.python.org/issue2892> ___
[issue5141] C API for appending to arrays
Hrvoje Nikšić added the comment: Yes, and I use it in the second example, but the buffer interface doesn't really help with adding new elements into the array. ___ Python tracker <http://bugs.python.org/issue5141> ___
[issue5141] C API for appending to arrays
New submission from Hrvoje Nikšić : The array.array type is an excellent type for storing a large amount of "native" elements, such as integers, chars, doubles, etc., without involving the heavy machinery of numpy. It's both blazingly fast and reasonably efficient with memory.

The one thing missing from the array module is the ability to directly access array values from C. This might seem superfluous, as it's perfectly possible to manipulate array contents from Python/C using PyObject_CallMethod and friends. The problem is that it requires the native values to be marshalled to Python objects, only to be immediately converted back to native values by the array code. This can be a problem when, for example, a numeric array needs to be filled with contents, such as in this hypothetical example:

/* error checking and refcounting subtleties omitted for brevity */
PyObject *load_data(Source *src)
{
    PyObject *array_type = get_array_type();
    PyObject *array = PyObject_CallFunction(array_type, "c", 'd');
    PyObject *append = PyObject_GetAttrString(array, "append");
    while (!source_done(src)) {
        double num = source_next(src);
        PyObject *f = PyFloat_FromDouble(num);
        PyObject *ret = PyObject_CallFunctionObjArgs(append, f, NULL);
        if (!ret)
            return NULL;
        Py_DECREF(ret);
        Py_DECREF(f);
    }
    Py_DECREF(array_type);
    return array;
}

The inner loop must convert each C double to a Python Float, only for the array to immediately extract the double back from the Float and store it into the underlying array of C doubles. This may seem like a nitpick, but it turns out that more than half of the time of this function is spent creating and deleting those short-lived floating-point objects. Float creation is already well-optimized, so opportunities for speedup lie elsewhere. The array object exposes a writable buffer, which can be used to store values directly.
For test purposes I created a faster "append" specialized for doubles, defined like this:

int array_append(PyObject *array, PyObject *appendfun, double val)
{
    PyObject *ret;
    double *buf;
    Py_ssize_t bufsize;
    static PyObject *zero;
    if (!zero)
        zero = PyFloat_FromDouble(0);

    /* append a dummy zero value, created only once */
    ret = PyObject_CallFunctionObjArgs(appendfun, zero, NULL);
    if (!ret)
        return -1;
    Py_DECREF(ret);

    /* store the element directly at the end of the C buffer */
    PyObject_AsWriteBuffer(array, (void **) &buf, &bufsize);
    buf[bufsize / sizeof(double) - 1] = val;
    return 0;
}

This hack actually speeds up array creation by a significant percentage (30-40% in my case, and that's for code that was producing the values by parsing a large text file).

It turns out that an even faster method of creating an array is by using the fromstring() method. fromstring() requires an actual string, not a buffer, so in C++ I created an std::vector with a contiguous array of doubles, passed that array to PyString_FromStringAndSize, and called array.fromstring with the resulting string. Despite all the unnecessary copying, the result was much faster than either of the previous versions.

Would it be possible for the array module to define a C interface for the most frequent operations on array objects, such as appending an item, and getting/setting an item? Failing that, could we at least make fromstring() accept an arbitrary read buffer, not just an actual string?

--
messages: 81039
nosy: hniksic
severity: normal
status: open
title: C API for appending to arrays
type: feature request

___ Python tracker <http://bugs.python.org/issue5141> ___
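[Editorial note: from pure Python, the analogous optimization is to bypass per-item append entirely and bulk-load the buffer; frombytes(), the Python 3 spelling of fromstring(), accepts any bytes-like object, so no intermediate string copy is needed anymore. A minimal sketch comparing the two approaches:]

```python
import array
import struct

values = [1.0, 2.5, 3.25]

# Per-item append: one short-lived Python float object per element.
a1 = array.array('d')
for v in values:
    a1.append(v)

# Bulk load: pack all values once, copy into the array once.
# 'd' in struct is the C double format, matching the array's typecode.
a2 = array.array('d')
a2.frombytes(struct.pack('%dd' % len(values), *values))

print(a1 == a2)  # expected: True
```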
[issue4174] Performance optimization for min() and max() over lists
Hrvoje Nikšić <[EMAIL PROTECTED]> added the comment: Note that the item retrieved by PyList_GET_ITEM must be increffed before being passed to the function. Otherwise mutating the list can remove the item from the list and destroy the underlying object, in which case the current maxitem can refer to a freed object. This pitfall is described in http://docs.python.org/extending/extending.html#thin-ice in some detail. -- nosy: +hniksic ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue4174> ___
[issue3953] __cmp__ still documented in 3.0rc1?
New submission from Hrvoje Nikšić <[EMAIL PROTECTED]>: __cmp__ is apparently still documented at http://docs.python.org/dev/3.0/reference/datamodel.html#object.__cmp__ . If it is going away for 3.0, it should be removed from the documentation as well. -- assignee: georg.brandl components: Documentation messages: 73692 nosy: georg.brandl, hniksic severity: normal status: open title: __cmp__ still documented in 3.0rc1? versions: Python 3.0 ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue3953> ___
[issue2389] Array pickling exposes internal memory representation of elements
Hrvoje Nikšić <[EMAIL PROTECTED]> added the comment: I think preserving integer width is a good idea because it saves us from having to throw overflow errors when unpickling on machines with a different width of C types. The cost is that pickling/unpickling the array might change the array's typecode, which can be a problem for C code that processes the array's buffer and expects the C type to remain invariant.

Instead of sticking to network byte order, I propose to include byte order information in the pickle (for example as '<' or '>', like struct does), so that pickling/unpickling between same-endianness architectures doesn't have to convert at all. Floats are always pickled as IEEE 754, but the same optimization (not having to convert anything) would apply when unpickling a float array on an IEEE 754 architecture.

Preserving widths and including endianness information would allow pickling to be as fast as it is now (with the exception of unicode chars, and floats on non-IEEE-754 platforms). It would also allow unpickling to be as fast between architectures with equal endianness, and correct between others.

___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue2389> ___
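[Editorial note: the byte-order-marker idea can be sketched in Python with struct-style '<'/'>' markers; the writer records its native endianness and the reader converts only when the marker differs from its own. The wire format below is hypothetical, purely for illustration, not the actual pickle format:]

```python
import array
import sys

def dump_array(a):
    # hypothetical format: (typecode, byte-order marker, count, raw payload)
    order = '<' if sys.byteorder == 'little' else '>'
    return (a.typecode, order, len(a), a.tobytes())

def load_array(typecode, order, count, payload):
    a = array.array(typecode)
    a.frombytes(payload)           # interpret bytes in native order
    native = '<' if sys.byteorder == 'little' else '>'
    if order != native:
        a.byteswap()               # convert only on cross-endian loads
    return a

original = array.array('l', [1, 2, 3])
restored = load_array(*dump_array(original))
print(restored == original)  # expected: True
```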
[issue2389] Array pickling exposes internal memory representation of elements
Hrvoje Nikšić <[EMAIL PROTECTED]> added the comment: Unfortunately dumping the internal representation of non-long arrays won't work, for several reasons. First, it breaks when porting pickles between platforms of different endianness such as Intel and SPARC. Then, it ignores the considerable work put into correctly pickling floats, including the support for IEEE 754 special values. Finally, it will break when unpickling Unicode character arrays pickled on different Python versions -- wchar_t is 2 bytes wide on Windows, 4 bytes on Unix.

I believe pickling arrays to compact strings is the right approach on the grounds of efficiency and I wouldn't change it. We must only be careful to pickle to a string with a portable representation of values. The straightforward way to do this is to pick a "standard" size for types (much like the struct module does) and endianness and use it in the pickled array. Ints are simple, and the code for handling floats is already there, for example _PyFloat_Pack8 used by cPickle.

Pickling arrays as lists is probably a decent workaround for the pending release because it's backward and forward compatible (old pickles will work as well as before and new pickles will be correctly read by old Python versions), but for the next release I would prefer to handle this the right way. If there is agreement on this, I can start work on a patch in the following weeks.

___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue2389> ___
[issue2389] Array pickling exposes internal memory representation of elements
Hrvoje Nikšić <[EMAIL PROTECTED]> added the comment: I guess it went unnoticed due to prevalence of little-endian 32-bit machines. With 64-bit architectures becoming more and more popular, this might become a bigger issue. Raymond, why do you think fixing this bug would complicate porting to 2.6/3.0? ___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue2389> ___
[issue2892] improve cElementTree iterparse error handling
New submission from Hrvoje Nikšić <[EMAIL PROTECTED]>: In some cases it is unfortunate that any error in the XML chunk seen by the parser prevents the events generated before the error from being delivered. For example, in some cases valid XML is embedded in a larger file or stream, and it is useful to be able to ignore text that follows the root tag, if any. The iterparse API and expat itself make this possible, but it doesn't work because in case of a parsing exception, iterparse doesn't deliver the events generated before the exception.

A simple change to iterparse makes this possible, however. I would like to share the change with you for possible inclusion in a future release. Note that this change shouldn't affect the semantics of iterparse: the exception is still delivered to the caller; the only difference is that the events generated by expat before the exception are not forgotten. I am attaching a diff between the current implementation of iterparse and a modified one that fixes this problem.

--
components: Extension Modules
files: patch
messages: 66935
nosy: hniksic
severity: normal
status: open
title: improve cElementTree iterparse error handling
type: behavior
versions: Python 2.5
Added file: http://bugs.python.org/file10343/patch

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue2892> __
[issue2389] Array pickling exposes internal memory representation of elements
Hrvoje Nikšić <[EMAIL PROTECTED]> added the comment: Here is an example that directly demonstrates the bug. Pickling on x86_64:

Python 2.5.1 (r251:54863, Mar 21 2008, 13:06:31)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import array, cPickle as pickle
>>> pickle.dumps(array.array('l', [1, 2, 3]))
"carray\narray\np1\n(S'l'\nS'\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x03\\x00\\x00\\x00\\x00\\x00\\x00\\x00'\ntRp2\n."

Unpickling on ia32:

Python 2.5.1 (r251:54863, Oct 5 2007, 13:36:32)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cPickle as pickle
>>> pickle.loads("carray\narray\np1\n(S'l'\nS'\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x03\\x00\\x00\\x00\\x00\\x00\\x00\\x00'\ntRp2\n.")
array('l', [1, 0, 2, 0, 3, 0])

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue2389> __
[issue2389] Array pickling exposes internal memory representation of elements
New submission from Hrvoje Nikšić <[EMAIL PROTECTED]>: It would seem that pickling arrays directly exposes the underlying machine words, making the pickle non-portable to platforms with a different layout of array elements. The guts of array.__reduce__ look like this:

if (array->ob_size > 0) {
    result = Py_BuildValue("O(cs#)O",
                           array->ob_type,
                           array->ob_descr->typecode,
                           array->ob_item,
                           array->ob_size * array->ob_descr->itemsize,
                           dict);
}

The byte string that is pickled is directly created from the array's contents. Unpickling calls array_new, which in turn calls array_fromstring, which ends up memcpying the string data to the new array. As far as I can tell, array pickles created on one platform cannot be unpickled on a platform with different endianness (in the case of integer arrays), wchar_t size (in the case of unicode arrays), or floating-point representation (rare in practice, but possible).

If pickles are supposed to be platform-independent, this should be fixed. Maybe the "typecode" field, when used with the constructor, could be augmented to include information about the elements, such as endianness and floating-point format. Or we should simply punt and pickle the array as a list of the Python objects that comprise it...?

--
components: Extension Modules
messages: 63915
nosy: hniksic
severity: normal
status: open
title: Array pickling exposes internal memory representation of elements
type: behavior
versions: Python 2.5, Python 2.6

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue2389> __
[issue1782] PyModule_AddIntConstant and PyModule_AddStringConstant can leak
Hrvoje Nikšić added the comment: Here is a patch, as per description above. Added file: http://bugs.python.org/file9160/addpatch __ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1782> __
[issue1782] PyModule_AddIntConstant and PyModule_AddStringConstant can leak
Hrvoje Nikšić added the comment: I agree that a leak would very rarely occur in practice, but since there is a straightforward fix, why not apply it? If nothing else, the code in the core should be an example of writing leak-free Python/C code, and a fix will also prevent others from wasting time on this in the future. __ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1782> __
[issue1782] PyModule_AddIntConstant and PyModule_AddStringConstant can leak
New submission from Hrvoje Nikšić:

PyModule_AddObject has somewhat strange reference-counting behavior in that it *conditionally* steals a reference. In case of error it doesn't change the reference to the passed object, but in case of success it steals it. This means that, as written, PyModule_AddIntConstant and PyModule_AddStringConstant can leak the created objects if PyModule_AddObject fails.

As far as I can tell, the correct way to write those functions would be (using PyModule_AddIntConstant as the example):

    int
    PyModule_AddIntConstant(PyObject *m, const char *name, long value)
    {
        PyObject *o = PyInt_FromLong(value);
        if (PyModule_AddObject(m, name, o) == 0)
            return 0;
        Py_XDECREF(o);
        return -1;
    }

PyModule_AddObject was obviously intended to enable writing the "simple" code (it even gracefully handles being passed a NULL object to add) like the one in PyModule_AddIntConstant, but I don't see a way to enable such usage and avoid both leaks and an interface change. Changing the reference-counting behavior of PyModule_AddObject would be backward-incompatible, but it might be a good idea to consider it for Python 3.

If there is agreement that my analysis and the proposed fixes are correct, I will produce a proper patch.

--
components: Interpreter Core
messages: 59662
nosy: hniksic
severity: normal
status: open
title: PyModule_AddIntConstant and PyModule_AddStringConstant can leak
type: resource usage
versions: Python 2.5

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1782> __
[issue1638] %zd configure test fails on Linux
Hrvoje Nikšić added the comment:

Thanks for the quick review. I considered guarding the include with #ifdef as well, but I concluded it's not necessary for the following reasons:

1. a large number of existing tests already simply include <sys/types.h> (the makedev test, sizeof(off_t) test, IPv6-related tests, and various socket tests);

2. if sys/types.h doesn't exist, the test will fail, and Python will conclude that %zd is not available. This conclusion is almost certainly correct, as I doubt that a system without sys/types.h has a working %zd format.

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1638> __
[issue1638] %zd configure test fails on Linux
New submission from Hrvoje Nikšić:

The printf("%zd", ...) configure test fails on Linux, although it supports the %zd format. config.log reveals that the test tests for %zd with Py_ssize_t, which is (within the test) typedeffed to ssize_t. But the appropriate system header is not included by the test, and ssize_t is not defined. This results in Py_ssize_t not being correctly defined, and the test failing.

According to http://tinyurl.com/3dhbbm/, ssize_t is defined in <sys/types.h>. Adding #include <sys/types.h> manually to the test fixes the test for me. A patch like the one attached should fix the problem.

--
files: patch
messages: 58675
nosy: hniksic
severity: normal
status: open
title: %zd configure test fails on Linux
versions: Python 2.5
Added file: http://bugs.python.org/file8966/patch

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1638> __