Re: [Python-Dev] Inconsistency between PEP492, documentation and behaviour of "async with"
> > Can someone please clarify the exact behaviour of "async with"?
>
> "async with" is expected to behave essentially the same way that
> normal "with" does as far as return, break, and continue are concerned
> (i.e. calling __aexit__ without an exception set, so it's more like
> try/finally than it is try/else).
>
> Would you mind filing a documentation bug for that? We clearly missed
> that the semantics described in the new documentation didn't actually
> match the original with statement semantics (even though matching
> those semantics is the intended behaviour).

Ok, bug filed at: http://bugs.python.org/issue30707
[Python-Dev] Inconsistency between PEP492, documentation and behaviour of "async with"
Hi all,

Regarding the behaviour of the "async with" statement: it seems that the
description of it in PEP 492, and in the language documentation, does not
match the behaviour of CPython (v3.6.1). The PEP and the docs here:

https://www.python.org/dev/peps/pep-0492/#asynchronous-context-managers-and-async-with
https://docs.python.org/3/reference/compound_stmts.html#async-with

say that "async with" is equivalent to a particular use of
try/except/else. But the implementation behaves more like a
try/except/finally, because __aexit__ is always executed, even if the try
block contains a return statement (an "else" clause is not executed if
there is a "return" in the "try"); see the sketch at the end of this
message. Also, as with the normal "with" statement, the implementation is
a bit more complex than a plain try/except/finally, because __aexit__ must
not be executed twice if there is an exception in the try block.

Can someone please clarify the exact behaviour of "async with"?

Background: in implementing "async with" in MicroPython, we went by the
PEP/docs, and now our behaviour doesn't match that of CPython.
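For reference, here is the expansion that seems to match the observed
CPython behaviour. It mirrors PEP 343's try/finally-style expansion of
the plain "with" statement, with awaits added; EXPR, VAR and BLOCK are
placeholders, and the variable names are mine, not CPython internals:

    import sys

    mgr = (EXPR)
    aexit = type(mgr).__aexit__
    VAR = await type(mgr).__aenter__(mgr)
    exc = True
    try:
        try:
            BLOCK  # may contain return, break or continue
        except:
            # an exception occurred: call __aexit__ exactly once,
            # and re-raise unless it returns a true value
            exc = False
            if not await aexit(mgr, *sys.exc_info()):
                raise
    finally:
        if exc:
            # reached on normal completion *and* on return/break/continue
            await aexit(mgr, None, None, None)

Cheers,
Damien.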
Re: [Python-Dev] Opcode cache in ceval loop
Hi Yury,

That's great news about the speed improvements with the dict offset cache!

> The cache struct is defined in code.h [2], and is 32 bytes long. When a
> code object becomes hot, it gets an cache offset table allocated for it
> (+1 byte for each opcode) + an array of cache structs.

Ok, so each opcode has a 1-byte cache that sits separately from the actual
bytecode. But a lot of opcodes don't use it, so that leads to some wasted
memory, correct?

But then how do you index the cache -- do you keep a count of the current
opcode number? If I remember correctly, CPython has some opcodes taking 1
byte and some taking 3 bytes, so an offset into the bytecode cannot easily
be mapped to an opcode number.
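One way I could imagine the indexing working (purely a guess on my part,
not necessarily what your patch does) is to size the offset table by
bytecode bytes rather than by opcodes, so it can be indexed directly by
instruction offset and the 1-byte vs 3-byte opcode width never matters:

    class OpcodeCache:
        """Hypothetical sketch: one offset-table byte per bytecode byte."""

        def __init__(self, code_len):
            self.offset_table = bytearray(code_len)  # 0 means "no entry"
            self.entries = []  # the per-opcode cache structs

        def add_entry(self, instr_offset, entry):
            self.entries.append(entry)
            self.offset_table[instr_offset] = len(self.entries)  # 1-based

        def lookup(self, instr_offset):
            idx = self.offset_table[instr_offset]
            return self.entries[idx - 1] if idx else None

Cheers,
Damien.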
Re: [Python-Dev] Speeding up CPython 5-10%
Hi Yury,

> An off-topic: have you ever tried hg.python.org/benchmarks
> or compare MicroPython vs CPython? I'm curious if MicroPython
> is faster -- in that case we'll try to copy some optimization
> ideas.

I've tried a small number of those benchmarks, but not in any rigorous
way, and not enough to compare properly with CPython. Maybe one day I (or
someone) will get to it and report results :)

One thing that makes MicroPython fast is the use of pointer tagging and
the stuffing of small integers within object pointers (see the sketch at
the end of this message). Thus integer arithmetic below 2**30 (on a
32-bit architecture) requires no heap allocation.

> Do you use opcode dictionary caching only for LOAD_GLOBAL-like
> opcodes? Do you have an equivalent of LOAD_FAST, or you use
> dicts to store local variables?

The opcodes that have dict caching are:

    LOAD_NAME
    LOAD_GLOBAL
    LOAD_ATTR
    STORE_ATTR
    LOAD_METHOD (not yet implemented in the mainline repo)

For local variables we use LOAD_FAST and STORE_FAST (and DELETE_FAST).
Actually, there are 16 dedicated opcodes for loading from positions 0-15,
and 16 for storing to those positions, eg:

    LOAD_FAST_0
    LOAD_FAST_1
    ...

Mostly this is done to save RAM, since LOAD_FAST_0 is 1 byte.

> If we change the opcode size, it will probably affect libraries
> that compose or modify code objects. Modules like "dis" will
> also need to be updated. And that's probably just a tip of the
> iceberg.
>
> We can still implement your approach if we add a separate
> private 'unsigned char' array to each code object, so that
> LOAD_GLOBAL can store the key offsets. It should be a bit
> faster than my current patch, since it has one less level
> of indirection. But this way we loose the ability to
> optimize LOAD_METHOD, simply because it requires more memory
> for its cache. In any case, I'll experiment!

The problem with that approach (having a separate array for the
offset_guess) is: how does a given LOAD_GLOBAL opcode know where to look
in that array? The second LOAD_GLOBAL in your bytecode should look at the
second entry in the array, but how does it know that?

I'd love to experiment with implementing my original caching idea in
CPython, but no time!
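To illustrate the tagging trick: the real implementation is C macros in
MicroPython's object representation, but the bit manipulation can be
modelled in a few lines of Python (a simplified sketch of one
representation, not the exact macros):

    # On a 32-bit build, a machine word is either a real pointer
    # (low bit clear, since heap objects are aligned) or a tagged
    # small int (low bit set, value in the upper 31 bits).

    def small_int_to_obj(n):
        return (n << 1) | 1      # tag: mark the word as a small int

    def obj_is_small_int(o):
        return (o & 1) != 0      # check the tag bit

    def obj_to_small_int(o):
        return o >> 1            # arithmetic shift recovers the value

Because a small int lives entirely inside the word, creating one never
touches the heap.

Cheers,
Damien.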
Re: [Python-Dev] Speeding up CPython 5-10%
Hi Yuri,

I think these are great ideas to speed up CPython. They are probably the
simplest yet most effective ways to get performance improvements in the
VM.

MicroPython has had LOAD_METHOD/CALL_METHOD from the start (inspired by
PyPy; the main reason to have it is that you don't need to allocate on
the heap when doing a simple method call). The specific opcodes are:

    LOAD_METHOD         # same behaviour as you propose
    CALL_METHOD         # for calls with positional and/or keyword args
    CALL_METHOD_VAR_KW  # for calls with one or both of */**

We also have LOAD_ATTR, CALL_FUNCTION and CALL_FUNCTION_VAR_KW for
non-method calls.

MicroPython also has dictionary lookup caching, but it's a bit different
to your proposal. We do something much simpler: each opcode that has a
cache ability (eg LOAD_GLOBAL, STORE_GLOBAL, LOAD_ATTR, etc) includes a
single byte in the opcode which is an offset-guess into the dictionary to
find the desired element. Eg for LOAD_GLOBAL we have (pseudo code):

    CASE(LOAD_GLOBAL):
        key = DECODE_KEY;
        offset_guess = DECODE_BYTE;
        if (global_dict[offset_guess].key == key) {
            // found the element straight away
        } else {
            // not found, do a full lookup and save the offset
            offset_guess = dict_lookup(global_dict, key);
            UPDATE_BYTECODE(offset_guess);
        }
        PUSH(global_dict[offset_guess].elem);

We have found that such caching gives a massive performance increase, on
the order of 20%. The issue (for us) is that it increases bytecode size
by a considerable amount, requires writeable bytecode, and can be
non-deterministic in terms of lookup time. Those things are important in
the embedded world, but not so much on the desktop.

Good luck with it!
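As a side note on why LOAD_METHOD pays off: a plain attribute lookup on
obj.foo builds a bound-method object on the heap, which the call then
immediately consumes and discards. The method opcodes push the function
and self separately instead. A rough Python model of the difference (not
actual VM code, just the idea):

    def call_via_load_attr(obj, arg):
        meth = obj.foo          # allocates a bound-method object
        return meth(arg)

    def call_via_load_method(obj, arg):
        func = type(obj).foo    # no bound-method allocation
        return func(obj, arg)   # self is passed explicitly

Regards,
Damien.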
Re: [Python-Dev] Speeding up CPython 5-10%
Hi Yury,

(Sorry for misspelling your name previously!)

> Yes, we'll need to add CALL_METHOD{_VAR|_KW|etc} opcodes to optimize all
> kind of method calls. However, I'm not sure how big the impact will be,
> need to do more benchmarking.

I never did such fine-grained analysis with MicroPython. I don't think
there are enough uses of * and ** for them to be worth optimising, but
there are definitely lots of uses of plain keywords. Also, you'd want to
consider how simple/complex it is to treat all these different opcodes in
the compiler. For us, it's simpler to treat everything the same.
Otherwise the LOAD_METHOD part of your compiler will need to peek deep
into the AST to see what kind of call it is.

> BTW, how do you benchmark MicroPython?

Haha, good question! Well, we use Pystone 1.2 (unmodified) to do basic
benchmarking, and find it to be quite good. We track our code live at:

http://micropython.org/resources/code-dashboard/

You can see there the red line, which is the Pystone result. There was a
big jump around Jan 2015, which is when we introduced opcode dictionary
caching. Since then it's been very gradually increasing due to small
optimisations here and there.

Pystone is actually a great benchmark for embedded systems because it
gives very reliable results there (almost zero variation across runs),
and if we can squeeze 5 more Pystones out with some change then we know
that it's a good optimisation (for efficiency at least).

For us, low RAM usage and small code size are the most important factors,
and we track those meticulously. In fact, smaller code size quite often
correlates with more efficient code, because there's less to execute and
it fits in the CPU cache (at least on the desktop).

We do have some other benchmarks, but they are highly specialised for us.
For example: how fast can you bit-bang a GPIO pin using pure Python code
(see the sketch at the end of this message)? Currently we get around
200kHz on a 168MHz MCU, which shows that pure (Micro)Python code is about
100 times slower than C.

> That's a neat idea! You're right, it does require bytecode to become
> writeable. I considered implementing a similar strategy, but this would
> be a big change for CPython. So I decided to minimize the impact of the
> patch and leave the opcodes untouched.

I think you need to consider "big" changes, especially ones like this
that can have a great (and good) impact. Really, this is a
behind-the-scenes change that *should not* affect end users, and so you
should not have any second thoughts about doing it. One problem I see
with CPython is that it exposes way too much to the user (both the Python
programmer and the C extension writer), and this hurts both language
evolution (you constantly need to provide backwards compatibility) and
the ability to optimise.
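The GPIO benchmark is essentially a loop like the following (a sketch
using the machine.Pin API; the pin name is board-specific and the exact
toggle rate depends on the port):

    from machine import Pin

    pin = Pin("X1", Pin.OUT)   # any output-capable pin will do

    def bitbang(n):
        v = pin.value          # cache the bound method outside the loop
        for _ in range(n):
            v(1)               # drive the pin high
            v(0)               # and low again, as fast as Python allows

Cheers,
Damien.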
[Python-Dev] Clarification of PEP 394 for scripts that run under Python 2 and 3
Hi python-dev,

We have a Python script that runs correctly under Python 2.6, 2.7 and
3.3+. It is executed on a *nix system using the "python" executable (ie
not python2 or python3 specifically). This works just fine for systems
that have Python 2 installed, or 2 and 3, or just 3 with a "python"
symlink to "python3" (eg Arch Linux). But it fails for systems that have
only Python 3 and do not create a "python" symlink, ie only "python3"
exists as an executable.

We thought that PEP 394 would come to the rescue here, but it seems to be
unclear on this point. In particular it says:

- 4th point of the abstract: "so python should be used in the shebang
  line only for scripts that are source compatible with both Python 2
  and 3"

- 6th point of the recommendation section: "One exception to this is
  scripts that are deliberately written to be source compatible with
  both Python 2.x and 3.x. Such scripts may continue to use python on
  their shebang line without affecting their portability"

- 8th point in the migration notes: "If these conventions are adhered
  to, it will become the case that the python command is only executed
  in an interactive manner as a user convenience, or to run scripts that
  are source compatible with both Python 2 and Python 3."

Well, that's pretty clear to me: one can expect the "python" executable
to be available to run scripts that are compatible with versions 2.x and
3.x. The confusion arises because there are systems that install Python 3
without creating a "python" symlink (hence breaking the above). And Guido
said that "'python' should always be the same as 'python2'" (see
https://mail.python.org/pipermail/python-dev/2014-September/136389.html).
Further, Nick Coghlan seemed to agree that "when there's only python3
installed, there should be no /usr/bin/python" (see
https://mail.python.org/pipermail/python-dev/2014-September/136527.html).

My questions are:

1. What is the true intent of PEP 394 when only Python 3 is installed?
   Is "python" available or not to run scripts compatible with 2.x and
   3.x?

2. Is it possible to write a shebang line that supports all variations
   of Python installations on *nix machines?

3. If the answer to 2 is no, then what is the recommended way to support
   all Python installations with one standalone script?

Thanks!
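For what it's worth, the closest thing to an answer for question 2 that
we're aware of is not a shebang line at all but an sh/Python polyglot
header, along these lines (a sketch; the interpreter search order is just
an example):

    #!/bin/sh
    ''':'
    for py in python python3 python2; do
        if command -v "$py" > /dev/null 2>&1; then
            exec "$py" "$0" "$@"
        fi
    done
    echo "error: no Python interpreter found" >&2
    exit 1
    ':'''

    # ... the Python 2/3 compatible script continues here ...

The sh part sees the second line as a no-op, finds an interpreter and
execs it on the script; Python sees the same lines as a harmless string
literal. It works, but it's hardly elegant, hence the questions above.

Regards,
Damien.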
Re: [Python-Dev] async/await in Python; v2
Hi Yury,

In your PEP 492 draft, in the Grammar section, I think you're missing the
modifications to the flow_stmt line.

Cheers,
Damien.