[Python-Dev] Re: NEWLINE sentinel behavior in CPython's PEG grammar
If you look at pegen, which uses the stdlib tokenizer as input, you will see that the object used to implement memoization on top of a token stream simply swallows NL (https://github.com/we-like-parsers/pegen/blob/main/src/pegen/tokenizer.py#L49). This is safe since NL has no syntactic meaning; only NEWLINE does.

Best
Matthieu

On Thu, Oct 27, 2022, 01:59 Matthias Görgens wrote:
> Hi David,
>
> Could you share what you have so far, perhaps on GitHub or so? That way
> it's easier to diagnose your problems. I'm reasonably familiar with Rust.
>
> Perhaps also add a minimal crashing example?
>
> Cheers,
> Matthias.
>
> On Thu, 27 Oct 2022, 04:52 David J W, wrote:
>
>> Pablo,
>> NL and Newline are tokens, but I am interested in NEWLINE's behavior
>> in the Python grammar; note the casing.
>>
>> For example in simple_stmts @
>> https://github.com/python/cpython/blob/main/Grammar/python.gram#L107
>>
>> Is that NEWLINE some sort of built-in rule of the grammar? In my
>> project I am running into problems where the parser crashes any time there
>> is some double like NL & N or Newline & NL, but I want to nail down
>> NEWLINE's behavior in CPython's PEG grammar.
>>
>> On Wed, Oct 26, 2022 at 12:51 PM Pablo Galindo Salgado <
>> pablog...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am not sure I understand exactly what you are asking, but NEWLINE is a
>>> token, not a parser rule. What decides when NEWLINE is emitted is the lexer,
>>> which has nothing to do with PEG. Normally PEG parsers also act as
>>> tokenizers, but the one in CPython does not.
>>>
>>> Also notice that CPython's parser uses a version of the tokeniser
>>> written in C that doesn't share code with the exposed version. You will
>>> find that the tokenize module in the standard library actually behaves
>>> differently regarding what tokens are emitted in new lines and indentations.
>>>
>>> The only way to be sure is to check the code, unfortunately.
>>>
>>> Hope this helps.
>>>
>>> Regards from rainy London,
>>> Pablo Galindo Salgado
>>>
>>> > On 26 Oct 2022, at 19:12, David J W wrote:
>>> >
>>> > I am writing a Rust version of Python for fun and I am at the parser
>>> > stage of development.
>>> >
>>> > I copied and modified a PEG grammar ruleset from another open-source
>>> > project and I've already noticed some problems (e.g. Newline vs NL) with
>>> > how they transcribed things.
>>> >
>>> > I suspect that CPython's grammar NEWLINE is a built-in rule for the
>>> > parser that is something like `(Newline+ | NL+ ) {NOP}`, but I wanted to
>>> > sanity-check whether that is right before I figure out how to hack in a
>>> > NEWLINE rule and update my grammar ruleset.
>>> > ___
>>> > Python-Dev mailing list -- python-dev@python.org
>>> > To unsubscribe send an email to python-dev-le...@python.org
>>> > https://mail.python.org/mailman3/lists/python-dev.python.org/
>>> > Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/NMCMEDMEBKATYKRNZLX2NDGFOB5UHQ5A/
>>> > Code of Conduct: http://python.org/psf/codeofconduct/
>>>
>> ___
>> Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LTDXZ4DS2GLICZRWYZ5PVLPBJHVGQPSS/
>>
> ___
> Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ZZDKWS62QG3BTNIT2NYRCLRI4VJ2HBF6/
>
___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5SPCIOVE5TSZ2DRJT75NKEWQWAKQHKII/
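A minimal sketch of the NL-swallowing idea, using the pure-Python stdlib `tokenize` module. As Pablo notes, this is not the C tokenizer CPython's parser actually consumes, so token streams may differ in edge cases. `token_names` is a hypothetical helper, not pegen's API, and dropping COMMENT alongside NL is an assumption of this sketch:

```python
import io
import tokenize

# Parenthesised continuation, a blank line and a comment all produce NL tokens.
SOURCE = "x = (1,\n     2)\n\n# comment\ny = 3\n"


def token_names(source, skip_nl=False):
    """Return token type names produced by the stdlib tokenizer.

    With skip_nl=True, NL and COMMENT tokens are dropped, mimicking a token
    buffer that never hands them to the parser. Only NEWLINE (a logical line
    end) remains as a line-ending token.
    """
    names = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if skip_nl and tok.type in (tokenize.NL, tokenize.COMMENT):
            continue
        names.append(tokenize.tok_name[tok.type])
    return names


print(token_names(SOURCE))
print(token_names(SOURCE, skip_nl=True))
```

With NL filtered out, a grammar only ever needs to mention NEWLINE, which is why it can appear as a plain token in rules such as `simple_stmts`.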
[Python-Dev] CACHE opcode in Python 3.11 bytecode
Hi all,

I am in the slow process of adding support for Python 3.11 in the bytecode project (https://github.com/MatthieuDartiailh/bytecode). While attempting to update some tests, I stumbled upon the need to include CACHE opcodes to get things to work.

For example, one can use bytecode to manually assemble the bytecode for the function:

    def f():
        return 24 < 42

Under Python 3.10 it would look like:

    from bytecode import Bytecode, Compare, Instr

    f.__code__ = Bytecode(
        [
            Instr("LOAD_CONST", 24),
            Instr("LOAD_CONST", 42),
            Instr("COMPARE_OP", Compare.LT),
            Instr("RETURN_VALUE"),
        ]
    ).to_code()

Under Python 3.11 I had to change it to:

    f.__code__ = Bytecode(
        [
            Instr("RESUME", 0),
            Instr("LOAD_CONST", 24),
            Instr("LOAD_CONST", 42),
            Instr("COMPARE_OP", Compare.LT),
            Instr("CACHE", 0),
            Instr("CACHE", 0),
            Instr("RETURN_VALUE"),
        ]
    ).to_code()

Reading the docs for the dis module, I understand the need for the RESUME instruction. However, the documentation is rather vague regarding CACHE. In particular, when using the first version, the code in the function ends up looking like '\x97\x00d\x00d\x01k\x00\x00\x00\x00\x00' even though bytecode generated '\x97\x00d\x00d\x01k\x00S\x00'. One can "see" that the two caches (\x00\x00\x00\x00) have been added automatically, but the return disappeared.

Is this a bug in 3.11? If not, where can I find more details regarding where one should expect CACHE instructions to be present?

Best
Matthieu C. Dartiailh

PS: I know the mailing list is going to be retired, but I have not yet got everything configured for Discourse.
___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/AGXZVV5XLBNHM3KRNFNTLQU34OKEH4K4/
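A rough way to see where CPython 3.11+ reserves inline CACHE slots, assuming (consistent with the observations above) that `co_code` includes the zeroed cache entries while `dis.get_instructions()` skips them by default. `cache_slots` is a hypothetical helper; the function takes parameters instead of the `24 < 42` constants so that no optimizer version can fold the comparison away. The message reports two caches after COMPARE_OP on 3.11; other versions may reserve a different number:

```python
import dis
import sys


def f(a, b):
    return a < b


def cache_slots(func):
    """Return (opname, n_cache_units) for each instruction of func.

    Sketch: the gap between consecutive instruction offsets, minus the 2 bytes
    of the instruction itself, is taken to be reserved inline-cache space.
    Before 3.11 every instruction is 2 bytes, so the count is always 0.
    """
    code = func.__code__
    instrs = list(dis.get_instructions(code))
    slots = []
    for i, ins in enumerate(instrs):
        # The last instruction runs to the end of co_code.
        end = instrs[i + 1].offset if i + 1 < len(instrs) else len(code.co_code)
        slots.append((ins.opname, (end - ins.offset - 2) // 2))
    return slots


print(sys.version_info[:2])
for opname, n_cache in cache_slots(f):
    print(f"{opname:<24} CACHE entries after it: {n_cache}")
```

If this model holds, an assembler has to emit exactly that many CACHE code units after each such instruction; otherwise the following instruction (here RETURN_VALUE) lands in what the interpreter treats as cache space.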
[Python-Dev] Re: Python 3.11 bytecode and exception table
Hi Irit, hi Patrick,

Thanks for your quick answers.

First, thanks Patrick: it seems I went back to the stable docs at some point without noticing it, and hence I missed the new opcodes.

Thanks Irit for the clarification regarding the use of pseudo-instructions in dis. Regarding nested try/except, I believe we could have two SETUP_* followed by two POP_BLOCK, so I am not sure what issue you see there. However, if we can have exception tables with two rows such as (1, 3, ...) and (2, 4, ...), then yes, I will have an issue. I guess I will have to try implementing something and try to round-trip as many examples as possible.

Would you be interested in being kept posted about my progress?

Best
Matthieu

On 7/5/2022 at 11:01 AM, Irit Katriel wrote:

Hi Matthieu,

The dis output for this function in 3.12 is the same as it is in 3.11. The pseudo-instructions are emitted by the compiler's codegen stage, but never make it to compiled bytecode. They are removed or replaced by real opcodes before the code object is created.

The recent change to the dis module that you mentioned did not change how the disassembly of bytecode gets displayed. Rather, it added the pseudo-instructions to the opcodes list so that we have access to their mnemonics from Python. This is a step towards exposing intermediate compilation steps to Python (for unit tests, etc.).

BTW, part of this will require writing some test utilities for CPython that let us specify and compare opcode sequences, similar to what you have in bytecode.

As for deconstructing the exception table and planting the pseudo-instructions back into the code: it would be nice if dis could do that, but we may need to settle for an approximation because I'm not sure the exact block structure can be reliably reconstructed from the exception table at the moment. I may be wrong.
Having a SETUP_*/POP_BLOCK for each line in the exception table is not going to be correct: there can be nested try-except blocks, for instance, and even without them the compiler can emit the code of an except block in non-contiguous order (in https://github.com/python/cpython/pull/93622 I fixed one of those cases to reduce the size of the exception table, but it wasn't a correctness bug).

Irit

On Tue, Jul 5, 2022 at 9:27 AM Matthieu Dartiailh wrote:
> [...]
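Matthieu's concern about rows such as (1, 3, ...) and (2, 4, ...) comes down to whether the protected ranges form a properly nested family: only then can they be written as matched SETUP_*/POP_BLOCK brackets. A small hypothetical helper (not part of bytecode or dis) that checks this for half-open (start, end) ranges:

```python
from itertools import combinations
from typing import Iterable, Tuple

Range = Tuple[int, int]  # half-open [start, end) range of protected offsets


def can_be_bracketed(ranges: Iterable[Range]) -> bool:
    """True if every pair of ranges is either disjoint or nested.

    A family with this property can be emitted as properly paired
    SETUP_*/POP_BLOCK brackets; two partially overlapping rows cannot.
    """
    for (s1, e1), (s2, e2) in combinations(list(ranges), 2):
        disjoint = e1 <= s2 or e2 <= s1
        nested = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
        if not (disjoint or nested):
            return False
    return True


print(can_be_bracketed([(1, 3), (2, 4)]))  # partial overlap
print(can_be_bracketed([(1, 4), (2, 3)]))  # nested
print(can_be_bracketed([(1, 2), (3, 4)]))  # disjoint
```

Note that passing this check is necessary but, per Irit's remark about non-contiguous except blocks, not sufficient to recover the original block structure.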
[Python-Dev] Python 3.11 bytecode and exception table
Hi all,

I am the current maintainer of bytecode (https://github.com/MatthieuDartiailh/bytecode), a library to perform assembly and disassembly of Python bytecode. The library was created by V. Stinner.

I started looking into Python 3.11 support in bytecode, read Objects/exception_handling_notes.txt, and I have a couple of questions regarding the exception table.

Currently bytecode exposes three levels of abstraction:
- the concrete level, in which one deals with instruction offsets for jumps and explicit indexing into the known constants and names
- the bytecode level, which uses labels for jumps and allows non-integer arguments to instructions
- the cfg level, which provides basic-block delineation over the bytecode level

So my first idea was to directly expose the unpacked exception table (start, stop, target, stack_depth, last_i) at the concrete level and use pseudo-instructions and labels at the bytecode level. At this point in my reflections, I saw https://github.com/python/cpython/commit/c57aad777afc6c0b382981ee9e4bc94c03bf5f68 about adding pseudo-instructions to the dis output in 3.12 and thought it would line up quite nicely.

Reading through, I got curious about how SETUP_WITH handled popping one extra item from the stack, so I went to look at dis results on a couple of small examples. I tried on 3.10 and 3.11b3 (for some reason I cannot compile main at a391b74d on Windows).

I looked at simple things and got a bit surprised. Disassembling:

    def f():
        try:
            a = 1
        except:
            raise

I get on 3.11:

      1           0 RESUME                   0

      2           2 NOP

      3           4 LOAD_CONST               1 (1)
                  6 STORE_FAST               0 (a)
                  8 LOAD_CONST               0 (None)
                 10 RETURN_VALUE
            >>   12 PUSH_EXC_INFO

      4          14 POP_TOP

      5          16 RAISE_VARARGS            0
            >>   18 COPY                     3
                 20 POP_EXCEPT
                 22 RERAISE                  1
    ExceptionTable:
      4 to 6 -> 12 [0]
      12 to 16 -> 18 [1] lasti

On 3.10:

      2           0 SETUP_FINALLY            5 (to 12)

      3           2 LOAD_CONST               1 (1)
                  4 STORE_FAST               0 (a)
                  6 POP_BLOCK
                  8 LOAD_CONST               0 (None)
                 10 RETURN_VALUE

      4     >>   12 POP_TOP
                 14 POP_TOP
                 16 POP_TOP

      5          18 RAISE_VARARGS            0

This surprised me on two levels:
- first, I had never seen the RESUME opcode, and it is currently not documented;
- second, the second entry in the exception table. At first I failed to see why it was needed, but while writing this I realized it corresponds to the explicit handling of exception propagation to the caller.

Since I cannot compile 3.12 at the moment, I am wondering how this plays with pseudo-instructions: in particular, are pseudo-instructions generated for all entries in the exception table?

My initial idea was to have a SETUP_FINALLY/SETUP_CLEANUP - POP_BLOCK pair for each line in the exception table and a label for the jump target. But I realize it means we will have many more such pairs than in 3.10. It is fine by me, but I wondered what choice was made in 3.12 dis and whether this approach made sense.

Best regards
Matthieu
___
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/XZ7KDCI3TXEUERU3YIFKC543GAGIYG6Q/
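A sketch of decoding `co_exceptiontable` into the (start, stop, target, stack_depth, last_i) rows mentioned above. It follows the varint layout described in Objects/exception_handling_notes.txt as I understand it (6 payload bits per byte, 0x40 as continuation bit, 0x80 marking the first byte of an entry, offsets stored in 2-byte code units). All names here are hypothetical; dis relies on a private helper for this, which should not be depended on. The hand-built `SAMPLE` encodes the two ExceptionTable rows from the 3.11 disassembly above:

```python
import sys
from typing import Iterator, List, NamedTuple


class ExceptionTableEntry(NamedTuple):
    start: int    # first protected byte offset (inclusive)
    end: int      # byte offset just past the protected range (exclusive)
    target: int   # byte offset of the handler
    depth: int    # stack depth to restore before jumping to the handler
    lasti: bool   # whether the offset of the raising instruction is also pushed


def _read_varint(it: Iterator[int]) -> int:
    # 6 payload bits per byte, most significant chunk first. Bit 0x40 means
    # "another byte follows"; bit 0x80 only flags the first byte of an entry
    # and is masked out together with 0x40.
    b = next(it)
    value = b & 0x3F
    while b & 0x40:
        b = next(it)
        value = (value << 6) | (b & 0x3F)
    return value


def parse_exception_table(table: bytes) -> List[ExceptionTableEntry]:
    entries: List[ExceptionTableEntry] = []
    it = iter(table)
    while True:
        try:
            start = _read_varint(it) * 2  # stored in 2-byte code units
        except StopIteration:
            return entries  # clean end of the table
        end = start + _read_varint(it) * 2  # the length is stored, not the end
        target = _read_varint(it) * 2
        depth_lasti = _read_varint(it)
        entries.append(
            ExceptionTableEntry(start, end, target, depth_lasti >> 1, bool(depth_lasti & 1))
        )


def format_entry(e: ExceptionTableEntry) -> str:
    # Mimics the 3.11 dis layout, which shows the last covered instruction (end - 2).
    lasti = " lasti" if e.lasti else ""
    return f"{e.start} to {e.end - 2} -> {e.target} [{e.depth}]{lasti}"


SAMPLE = bytes([0x82, 0x02, 0x06, 0x00, 0x86, 0x03, 0x09, 0x03])
for entry in parse_exception_table(SAMPLE):
    print(format_entry(entry))


def demo():
    try:
        a = 1
    except:
        raise


if sys.version_info >= (3, 11):
    for entry in parse_exception_table(demo.__code__.co_exceptiontable):
        print(format_entry(entry))
```

Exposing these decoded tuples at the concrete level and translating them to labels only at the bytecode level keeps the lossy reconstruction problem Irit describes out of the lowest layer.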
[Python-Dev] Re: code.replace() and Python 3.11 exception table
As the maintainer of bytecode (thanks to Victor), I expect that adding support for 3.11 will be challenging, to say the least. However, I hoped that by waiting for the first beta most changes would at least be documented. What would be the best channel to reach people who can clarify how things work starting with 3.11?

Best
Matthieu Dartiailh

On Fri, Apr 1, 2022, 18:34 Mark Shannon wrote:

> Hi Gabriele,
>
> On 01/04/2022 4:50 pm, Gabriele wrote:
> > Does this mean that this line in the bytecode library is likely to fail
> > with 3.11, with no way to fix it?
> >
>
> You can pass the exception table the same way you pass all the other
> arguments.
> The exception table depends on the code, but that is nothing new. The
> bytecode library already recomputes the consts, names, etc.
>
> TBH, calling `types.CodeType` didn't work for earlier versions either.
> It just sort of worked, some of the time.
>
> Cheers,
> Mark.
>
> >
> > https://github.com/MatthieuDartiailh/bytecode/blob/7b0423234b0e999b45a4eb0c58115b284314f46b/bytecode/concrete.py#L398
> >
> > On Fri, 1 Apr 2022, 10:40 Victor Stinner <vstin...@python.org> wrote:
> >
> >     I created https://bugs.python.org/issue47185 to discuss this issue:
> >     either recompute automatically co_exceptiontable, or at least document
> >     the change.
> >
> >     Victor
> >
> >     On Fri, Apr 1, 2022 at 11:21 AM Victor Stinner <vstin...@python.org> wrote:
> >     >
> >     > ("Re: C API: Move PEP 523 "Adding a frame evaluation API to CPython"
> >     > private C API to the internal C API")
> >     >
> >     > On Fri, Apr 1, 2022 at 11:01 AM Chris Angelico <ros...@gmail.com> wrote:
> >     > >
> >     > > On Fri, 1 Apr 2022 at 19:51, Victor Stinner <vstin...@python.org> wrote:
> >     > > > In Python, sadly the types.CodeType type also has a public constructor
> >     > > > and many projects break at each Python release because the API
> >     > > > changes. Hopefully, it seems like the new CodeType.replace() method
> >     > > > added to Python 3.8 mitigated the issue. IMO CodeType.replace() is a
> >     > > > better abstraction and closer to what developers need in practice.
> >     > >
> >     > > It certainly has been for me. When I want to do bytecode hackery, I
> >     > > usually start by creating a function with def/lambda, then construct a
> >     > > modified function using f.__code__.replace(). It's the easiest way to
> >     > > ensure that all the little details are correct.
> >     >
> >     > Python 3.11 added the concept of "exception table"
> >     > (code.co_exceptiontable). You have to build this table, otherwise
> >     > Python can no longer catch exceptions :-)
> >     >
> >     > I don't know how to build this exception table. It seems like
> >     > currently there is no Python function in the stdlib to build this
> >     > table.
> >     >
> >     > Example:
> >     > ---
> >     > def f():
> >     >     try:
> >     >         print("raise")
> >     >         raise ValueError
> >     >     except ValueError:
> >     >         print("except")
> >     >     else:
> >     >         print("else")
> >     >     print("exit func")
> >     >
> >     > def g(): pass
> >     >
> >     > if 1:
> >     >     code = f.__code__
> >     >     g.__code__ = g.__code__.replace(
> >     >         co_code=code.co_code,
> >     >         co_consts=code.co_consts,
> >     >         co_names=code.co_names,
> >     >         co_flags=code.co_flags,
> >     >         co_stacksize=code.co_stacksize)
> >     > else:
> >     >     g.__code__ = f.__code__  # this code path works on Python 3.11
> >     >
> >     > g()
> >     > ---
> >     >
> >     > Output with Python 3.10 (ok):
> >     > ---
> >     > raise
> >     > except
> >     > exit func
> >     > ---
> >     >
> >     > Output with
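Following Mark's and Chris's suggestion of going through `code.replace()`, here is a sketch of Victor's example with the exception table copied along with `co_code` (requires Python 3.8+ for `replace()`; `co_exceptiontable` only exists on 3.11+). Copying `co_linetable` as well is an extra precaution on my part, not something stated in the thread:

```python
import contextlib
import io
import sys


def f():
    try:
        print("raise")
        raise ValueError
    except ValueError:
        print("except")
    else:
        print("else")
    print("exit func")


def g():
    pass


code = f.__code__
changes = dict(
    co_code=code.co_code,
    co_consts=code.co_consts,
    co_names=code.co_names,
    co_flags=code.co_flags,
    co_stacksize=code.co_stacksize,
)
if sys.version_info >= (3, 11):
    # Without this, the copied instructions have no handler entries and the
    # ValueError escapes instead of reaching the except clause.
    changes["co_exceptiontable"] = code.co_exceptiontable
if sys.version_info >= (3, 10):
    # Assumption: keeping line/position info in step with the copied co_code
    # is safer, even though the thread does not mention it.
    changes["co_linetable"] = code.co_linetable

g.__code__ = g.__code__.replace(**changes)

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    g()
print(buf.getvalue().splitlines())
```

The same principle applies to a bytecode assembler: whatever rewrites `co_code` must also regenerate (or copy, when the layout is unchanged) the exception table, just as it already does for consts and names.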
[Python-Dev] CALL_FUNCTION_EX arg and stack_effect
Hi,

I have a question about the use of CALL_FUNCTION_EX in https://github.com/python/cpython/blob/master/Python/compile.c#L3624.

Looking at the code, it appears that the argument will be either 1 or 0 depending on whether or not the function takes keyword arguments (which means that CALL_FUNCTION_EX cannot be used on functions taking no arguments). Executing that opcode removes from the stack the callable, the positional arguments (packed in a tuple) and the potential keyword arguments (packed in a dict), and pushes the return value. So the stack effect will be either -1 or -2 (it could be 0 if it were possible to pass 0 arguments).

Looking at the stack effect computation (https://github.com/python/cpython/blob/master/Python/compile.c#L1047), it appears that the stack effect will be 0 if the argument is 0, -1 if it is either 1 or 2, and -2 if it is 3. This means that, for the code generated at https://github.com/python/cpython/blob/master/Python/compile.c#L3624, the stack_effect function can never compute the right stack effect (it returns 0 or -1 instead of -1 or -2).

I would say that this is a bug, and that the oparg emitted at line 3624 should be 1, plus 2 if keyword arguments are present. I am not sure what consequences this can have on CPython, but it makes the bytecode look weird, as the stack appears to grow during a list comprehension (calling a function f using the * syntax). Columns: instruction, argument, stack effect reported by stack_effect:

    BUILD_LIST          0      1
    LOAD_FAST           .0     1
    FOR_ITER            22     1   ---> after each jump it looks like the stack is higher by one
    STORE_FAST          i     -1
    LOAD_GLOBAL         f      1
    LOAD_DEREF          a      1
    LOAD_FAST           i      1
    BINARY_SUBSCR       None  -1
    CALL_FUNCTION_EX    0      0
    LIST_APPEND         2     -1
    JUMP_ABSOLUTE       4      0
    RETURN_VALUE        None  -1

What do you think? Should I open an issue on https://bugs.python.org/ ?

Best regards
Matthieu

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
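The arithmetic in this message can be checked with a small Python model. `described_stack_effect` transcribes the behaviour of `stack_effect()` as described in the message (it is not the actual C source), `actual_effect` counts what the opcode really pops and pushes, and the loop body reuses the stack effects from the listing with the CALL_FUNCTION_EX entry substituted. All names are hypothetical:

```python
from typing import List, Optional, Tuple

# Stack effects copied from the listing. FOR_ITER is counted on its
# "keep looping" branch (+1); None marks the CALL_FUNCTION_EX slot.
LOOP_BODY: List[Tuple[str, Optional[int]]] = [
    ("FOR_ITER", 1),
    ("STORE_FAST", -1),
    ("LOAD_GLOBAL", 1),
    ("LOAD_DEREF", 1),
    ("LOAD_FAST", 1),
    ("BINARY_SUBSCR", -1),
    ("CALL_FUNCTION_EX", None),
    ("LIST_APPEND", -1),
    ("JUMP_ABSOLUTE", 0),
]


def actual_effect(has_kwargs: bool) -> int:
    # Pops the callable, the args tuple and (optionally) the kwargs dict,
    # then pushes the return value.
    popped = 2 + (1 if has_kwargs else 0)
    return 1 - popped


def described_stack_effect(oparg: int) -> int:
    # Behaviour described in the message: 0 -> 0, 1 or 2 -> -1, 3 -> -2.
    return -int(bool(oparg & 0x01)) - int(bool(oparg & 0x02))


def net_per_iteration(call_ex_effect: int) -> int:
    # Net stack change over one pass through the loop body.
    return sum(call_ex_effect if eff is None else eff for _, eff in LOOP_BODY)


for has_kwargs in (False, True):
    emitted = 1 if has_kwargs else 0         # oparg currently emitted
    proposed = 1 | (2 if has_kwargs else 0)  # oparg proposed: 1, or 1 + 2
    print(
        f"kwargs={has_kwargs}: actual={actual_effect(has_kwargs)}, "
        f"emitted oparg -> {described_stack_effect(emitted)}, "
        f"proposed oparg -> {described_stack_effect(proposed)}"
    )

print("drift per iteration, emitted oparg:", net_per_iteration(described_stack_effect(0)))
print("drift per iteration, actual effect:", net_per_iteration(actual_effect(False)))
```

With the emitted oparg the loop body drifts by +1 per iteration, matching the "higher by one" annotation; with the proposed oparg the formula agrees with the real effect. Changing the formula instead of the oparg would be an equally valid way to reconcile the two.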