[Python-Dev] Re: NEWLINE sentinel behavior in CPython's PEG grammar

2022-10-27 Thread Matthieu Dartiailh
If you look at pegen, which uses the stdlib tokenizer as input, you will see
that the object used to implement memoization on top of the token stream
simply swallows NL tokens (
https://github.com/we-like-parsers/pegen/blob/main/src/pegen/tokenizer.py#L49).
This is safe since NL has no syntactic meaning; only NEWLINE does.
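
As a minimal sketch of the same idea on top of the stdlib tokenize module (this is not pegen's code, and `significant_tokens` is a hypothetical helper name), one can drop NL tokens before handing the stream to a parser:

```python
import io
import tokenize


def significant_tokens(source):
    """Yield stdlib tokens for `source`, skipping NL tokens.

    NL marks a line break with no syntactic meaning (blank lines, breaks
    inside brackets); NEWLINE ends a logical line and is kept.
    """
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.NL:
            continue
        yield tok


# One statement continued inside parentheses, a blank line, one more statement.
source = "x = (1 +\n     2)\n\ny = 3\n"
names = [tokenize.tok_name[tok.type] for tok in significant_tokens(source)]
print(names)
```

On this input the filtered stream still carries one NEWLINE per logical line, while the NL tokens coming from the break inside the parentheses and from the blank line are gone.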

Best

Matthieu

On Thu, Oct 27, 2022, 01:59 Matthias Görgens 
wrote:

> Hi David,
>
> Could you share what you have so far, perhaps on GitHub or so? That way
> it's easier to diagnose your problems. I'm reasonably familiar with Rust.
>
> Perhaps also add a minimal crashing example?
>
> Cheers,
> Matthias.
>
> On Thu, 27 Oct 2022, 04:52 David J W,  wrote:
>
>> Pablo,
>> Nl and Newline are tokens but I am interested in NEWLINE's behavior
>> in the Python grammar, note the casing.
>>
>> For example in simple_stmts @
>> https://github.com/python/cpython/blob/main/Grammar/python.gram#L107
>>
>> Is that NEWLINE some sort of built in rule to the grammar?   In my
>> project I am running into problems where the parser crashes any time there
>> is some double like NL & N or Newline & NL but I want to nail down
>> NEWLINE's behavior in CPython's PEG grammar.
>>
>> On Wed, Oct 26, 2022 at 12:51 PM Pablo Galindo Salgado <
>> pablog...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am not sure I understand exactly what you are asking but NEWLINE is a
>>> token, not a parser rule. What decides when NEWLINE is emitted is the lexer,
>>> which has nothing to do with PEG. Normally PEG parsers also act as
>>> tokenizers but the one in CPython does not.
>>>
>>> Also notice that CPython’s parser uses a version of the tokeniser
>>> written in C that doesn’t share code with the exposed version. You will
>>> find that the tokenizer module in the standard library actually behaves
>>> differently regarding what tokens are emitted in new lines and indentations.
>>>
>>> The only way to be sure is to check the code, unfortunately.
>>>
>>> Hope this helps.
>>>
>>> Regards from rainy London,
>>> Pablo Galindo Salgado
>>>
>>> > On 26 Oct 2022, at 19:12, David J W  wrote:
>>> >
>>> > 
>>> > I am writing a Rust version of Python for fun and I am at the parser
>>> stage of development.
>>> >
>>> > I copied and modified a PEG grammar ruleset from another open source
>>> project and I've already noticed some problems (ex Newline vs NL) with how
>>> they transcribed things.
>>> >
>>> > I am suspecting that CPython's grammar NEWLINE is a builtin rule for
>>> the parser that is something like `(Newline+ | NL+ ) {NOP}` but wanted to
>>> sanity check if that is right before I figure out how to hack in a NEWLINE
>>> rule and update my grammar ruleset.
>>> > ___
>>> > Python-Dev mailing list -- python-dev@python.org
>>> > To unsubscribe send an email to python-dev-le...@python.org
>>> > https://mail.python.org/mailman3/lists/python-dev.python.org/
>>> > Message archived at
>>> https://mail.python.org/archives/list/python-dev@python.org/message/NMCMEDMEBKATYKRNZLX2NDGFOB5UHQ5A/
>>> > Code of Conduct: http://python.org/psf/codeofconduct/
>>>
>>
>


[Python-Dev] CACHE opcode in Python 3.11 bytecode

2022-07-25 Thread Matthieu Dartiailh

Hi all,

I am in the slow process of adding support for Python 3.11 in the 
bytecode project (https://github.com/MatthieuDartiailh/bytecode).


While attempting to update some tests I stumbled upon the need to 
include the CACHE opcode to get things to work. For example, one can use 
bytecode to manually assemble the bytecode for the function:


def f():
    return 24 < 42

Under Python 3.10 it would look like:

from bytecode import Bytecode, Compare, Instr

f.__code__ = Bytecode(
    [
        Instr("LOAD_CONST", 24),
        Instr("LOAD_CONST", 42),
        Instr("COMPARE_OP", Compare.LT),
        Instr("RETURN_VALUE"),
    ]
).to_code()


Under Python 3.11 I had to go to:

f.__code__ = Bytecode(
    [
        Instr("RESUME", 0),
        Instr("LOAD_CONST", 24),
        Instr("LOAD_CONST", 42),
        Instr("COMPARE_OP", Compare.LT),
        Instr("CACHE", 0),
        Instr("CACHE", 0),
        Instr("RETURN_VALUE"),
    ]
).to_code()


Reading the docs for the dis module I understand the need for the RESUME 
instruction. However, the documentation is rather vague with regard to CACHE.


In particular, when using the first version, the code in the function 
ends up looking like '\x97\x00d\x00d\x01k\x00\x00\x00\x00\x00' even 
though bytecode generated '\x97\x00d\x00d\x01k\x00S\x00'. One can "see" 
that the two caches (\x00\x00\x00\x00) have been added automatically but 
the return disappeared. Is this a bug in 3.11 and, if not, where can I 
find more details on where CACHE instructions are expected to be 
present?
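
One way to look at where cache slots sit is to scan the raw co_code of a compiled function. The sketch below is only an illustration, under the assumption (my understanding of 3.11 and later) that co_code exposes each cache slot as a two-byte unit whose opcode byte is CACHE; `count_cache_units` is a hypothetical helper, not part of dis or bytecode, and it simply reports 0 on versions without a CACHE opcode.

```python
import dis


def f():
    return 24 < 42


def count_cache_units(code):
    """Count two-byte code units whose opcode byte is CACHE.

    Assumption: cache slots appear in co_code as units with the CACHE
    opcode. On versions that have no CACHE opcode this returns 0.
    """
    cache = dis.opmap.get("CACHE")
    if cache is None:
        return 0
    raw = code.co_code
    # Opcode bytes sit at even offsets, their arguments at odd offsets.
    return sum(1 for i in range(0, len(raw), 2) if raw[i] == cache)


print(f.__code__.co_code)
print("CACHE units:", count_cache_units(f.__code__))
```

The count depends on the interpreter version, so the output is best compared against the dis listing of the same function on the same build.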


Best

Matthieu C. Dartiailh

PS: I know the mailing list is going to be retired but I have not yet got 
everything configured for Discourse.


[Python-Dev] Re: Python 3.11 bytecode and exception table

2022-07-05 Thread Matthieu Dartiailh

Hi Irit, hi Patrick,

Thanks for your quick answers.

First thanks Patrick, it seems I went back to the stable docs at one 
point without noticing it and hence I missed the new opcodes.


Thanks Irit for the clarification regarding the pseudo-instructions use 
in dis.


Regarding the existence of nested try/except, I believe we could have 2 
SETUP_* followed by 2 POP_BLOCK so I am not sure what issue you see 
there. However, if we can have exception tables with two rows such as (1, 
3, ...) and (2, 4, ...) then yes, I will have an issue. I guess I will 
have to try implementing something and try to roundtrip on as many 
examples as possible. Would you be interested in being kept posted on my 
progress?
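
To make the problematic case concrete, here is a small sketch that checks whether a set of (start, stop) ranges can be expressed as properly nested SETUP_*/POP_BLOCK pairs. Treating the rows as half-open ranges is an assumption made for the illustration, and both function names are hypothetical.

```python
def partially_overlap(a, b):
    """True if half-open ranges a and b overlap without one containing the other."""
    (s1, e1), (s2, e2) = sorted([a, b])
    return s1 < s2 < e1 < e2


def is_properly_nested(ranges):
    """True if every pair of ranges is either disjoint or nested."""
    return not any(
        partially_overlap(a, b)
        for i, a in enumerate(ranges)
        for b in ranges[i + 1:]
    )


print(is_properly_nested([(1, 4), (2, 3)]))  # nested: fine for SETUP_*/POP_BLOCK
print(is_properly_nested([(1, 3), (2, 4)]))  # partial overlap: the case I worry about
```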


Best

Matthieu

Le 7/5/2022 à 11:01 AM, Irit Katriel a écrit :

Hi Matthieu,

The dis output for this function in 3.12 is the same as it is in 3.11.

The pseudo-instructions are emitted by the compiler's codegen stage, 
but never make it to compiled bytecode. They are removed or replaced 
by real opcodes before the code object is created.


The recent change to the dis module that you mentioned did not change 
how the disassembly of bytecode gets displayed. Rather, it added the 
pseudo-instructions to the opcodes list so that we have access to 
their mnemonics from python. This is a step towards exposing 
intermediate compilation steps to python (for unit tests, etc).  BTW - 
part of this will require writing some test utilities for cpython that 
let us specify and compare opcode sequences, similar to what you have 
in bytecode.


As for deconstructing the exception table and planting the pseudo 
instructions back into the code - it would be nice if dis could do 
that, but we may need to settle for an approximation because I'm not 
sure the exact block structure can be reliably reconstructed from the 
exception table at the moment. I may be wrong.


Having a SETUP_*/POP_BLOCK for each line in the exception table is not 
going to be correct - there can be nested try-except blocks, for 
instance, and even without them the compiler can emit the code of an 
except block in non-contiguous order (in 
https://github.com/python/cpython/pull/93622 I fixed one of those 
cases to reduce the size of the exception table, but it wasn't a 
correctness bug).


Irit

On Tue, Jul 5, 2022 at 9:27 AM Matthieu Dartiailh 
 wrote:


Hi all,

I am the current maintainer of bytecode
(https://github.com/MatthieuDartiailh/bytecode) which is a library
to perform assembly and disassembly of Python bytecode. The
library was created by V. Stinner.

I started looking into Python 3.11 support in bytecode, I read
Objects/exception_handling_notes.txt and I have a couple of
questions regarding the exception table:

Currently bytecode exposes three levels of abstraction:
  - the concrete level in which one deals with instruction offsets
for jumps and explicit indexing into the known constants and names
  - the bytecode level which uses labels for jumps and allows
non-integer arguments to instructions
  - the cfg level which provides basic blocks delineation over the
bytecode level

So my first idea was to directly expose the unpacked exception
table (start, stop, target, stack_depth, last_i) at the concrete
level and use pseudo-instructions and labels at the bytecode level.
At this point of my reflections, I saw

https://github.com/python/cpython/commit/c57aad777afc6c0b382981ee9e4bc94c03bf5f68
about adding pseudo-instructions to dis output in 3.12 and thought it
would line up quite nicely. Reading through, I got curious about
how SETUP_WITH handled popping one extra item from the stack so I
went to look at dis results on a couple of small examples. I tried
on 3.10 and 3.11b3 (for some reason I cannot compile main at
a391b74d on Windows).

I looked at simple things and got a bit surprised:

Disassembling:
def f():
    try:
        a = 1
    except:
        raise

I get on 3.11:
  1           0 RESUME                   0

  2           2 NOP

  3           4 LOAD_CONST               1 (1)
              6 STORE_FAST               0 (a)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
        >>   12 PUSH_EXC_INFO

  4          14 POP_TOP

  5          16 RAISE_VARARGS            0
        >>   18 COPY                     3
             20 POP_EXCEPT
             22 RERAISE                  1
ExceptionTable:
  4 to 6 -> 12 [0]
  12 to 16 -> 18 [1] lasti

On 3.10:
  2           0 SETUP_FINALLY            5 (to 12)

  3           2 LOAD_CONST               1 (1)
              4 STORE_FAST               0 (a)
              6 POP_BLOCK
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

  4     >>   12 POP_TOP
             14 POP_TOP
             16 POP_TOP

  5  

[Python-Dev] Python 3.11 bytecode and exception table

2022-07-05 Thread Matthieu Dartiailh

Hi all,

I am the current maintainer of bytecode 
(https://github.com/MatthieuDartiailh/bytecode) which is a library to 
perform assembly and disassembly of Python bytecode. The library was 
created by V. Stinner.


I started looking into Python 3.11 support in bytecode, I read 
Objects/exception_handling_notes.txt and I have a couple of questions 
regarding the exception table:


Currently bytecode exposes three levels of abstraction:
  - the concrete level in which one deals with instruction offsets for 
jumps and explicit indexing into the known constants and names
  - the bytecode level which uses labels for jumps and allows 
non-integer arguments to instructions
  - the cfg level which provides basic blocks delineation over the 
bytecode level


So my first idea was to directly expose the unpacked exception table 
(start, stop, target, stack_depth, last_i) at the concrete level and use 
pseudo-instructions and labels at the bytecode level. At this point of my 
reflections, I saw 
https://github.com/python/cpython/commit/c57aad777afc6c0b382981ee9e4bc94c03bf5f68 
about adding pseudo-instructions to dis output in 3.12 and thought it would 
line up quite nicely. Reading through, I got curious about how 
SETUP_WITH handled popping one extra item from the stack so I went to 
look at dis results on a couple of small examples. I tried on 3.10 and 
3.11b3 (for some reason I cannot compile main at a391b74d on Windows).


I looked at simple things and got a bit surprised:

Disassembling:
def f():
    try:
        a = 1
    except:
        raise

I get on 3.11:
  1           0 RESUME                   0

  2           2 NOP

  3           4 LOAD_CONST               1 (1)
              6 STORE_FAST               0 (a)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
        >>   12 PUSH_EXC_INFO

  4          14 POP_TOP

  5          16 RAISE_VARARGS            0
        >>   18 COPY                     3
             20 POP_EXCEPT
             22 RERAISE                  1
ExceptionTable:
  4 to 6 -> 12 [0]
  12 to 16 -> 18 [1] lasti

On 3.10:
  2           0 SETUP_FINALLY            5 (to 12)

  3           2 LOAD_CONST               1 (1)
              4 STORE_FAST               0 (a)
              6 POP_BLOCK
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

  4     >>   12 POP_TOP
             14 POP_TOP
             16 POP_TOP

  5          18 RAISE_VARARGS            0

This surprised me on two levels:
- first I have never seen the RESUME opcode and it is currently not 
documented
- my second surprise comes from the second entry in the exception table. 
At first I failed to see why it was needed but writing this I realize it 
corresponds to the explicit handling of exception propagation to the 
caller. Since I cannot compile 3.12 ATM I am wondering how this plays 
with pseudo-instructions: in particular, are pseudo-instructions generated 
for all entries in the exception table?
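
For reference, the packed table can be decoded with a few lines of Python. The sketch below follows the varint scheme of Objects/exception_handling_notes.txt as I understand it (6 payload bits per byte, bit 6 as a continuation flag, offsets stored in code units). The function names and the hand-built byte string, chosen to mirror the two rows printed in the 3.11 listing above, are illustrative assumptions and not part of bytecode or dis.

```python
def parse_varint(it):
    """Read one variable-length integer: 6 payload bits per byte, bit 6 = 'more'."""
    b = next(it)
    value = b & 63
    while b & 64:
        b = next(it)
        value = (value << 6) | (b & 63)
    return value


def parse_exception_table(table):
    """Decode a packed table into (start, end, target, depth, lasti) tuples.

    Offsets are converted from code units to bytes; `end` is reported as the
    offset of the last covered instruction, matching the dis listing style.
    """
    it = iter(table)
    entries = []
    while True:
        try:
            start = parse_varint(it) * 2
        except StopIteration:
            return entries
        length = parse_varint(it) * 2
        target = parse_varint(it) * 2
        depth_lasti = parse_varint(it)
        entries.append(
            (start, start + length - 2, target, depth_lasti >> 1, bool(depth_lasti & 1))
        )


# Hand-built bytes (an assumption, not dumped from a real code object):
# bit 7 marks the first byte of each entry.
table = bytes([0x82, 0x02, 0x06, 0x00, 0x86, 0x03, 0x09, 0x03])
for entry in parse_exception_table(table):
    print(entry)
```

On a real 3.11 code object the input would be code.co_exceptiontable instead of the hand-built bytes.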


My initial idea was to have a SETUP_FINALLY/SETUP_CLEANUP - POP_BLOCK 
pair for each line in the exception table and a label for the jump target. 
But I realize it means we will have many more such pairs than in 3.10. It is 
fine by me but I wondered what choice was made in 3.12 dis and if this 
approach made sense.


Best regards

Matthieu


[Python-Dev] Re: code.replace() and Python 3.11 exception table

2022-04-01 Thread Matthieu Dartiailh
As the maintainer of bytecode (thanks to Victor), I expect that adding
support for 3.11 will be challenging at least. However I hoped that by
waiting for the first beta most changes would be at least documented. What
would be the best channel to reach people that may clarify how things work
starting with 3.11 ?
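
For context, Mark's suggestion quoted below (passing the exception table like any other argument) could look roughly like this when applied to a cut-down variant of Victor's example. This is only a sketch under the assumption that copying co_exceptiontable alongside co_code is enough for such a simple function; the `fields` dict and the hasattr guard for older versions are my own additions.

```python
def f():
    try:
        raise ValueError
    except ValueError:
        return "except"


def g():
    pass


code = f.__code__
fields = dict(
    co_code=code.co_code,
    co_consts=code.co_consts,
    co_names=code.co_names,
    co_flags=code.co_flags,
    co_stacksize=code.co_stacksize,
)
# co_exceptiontable only exists on 3.11+; older versions encode handlers
# in the bytecode itself (SETUP_FINALLY and friends).
if hasattr(code, "co_exceptiontable"):
    fields["co_exceptiontable"] = code.co_exceptiontable

g.__code__ = g.__code__.replace(**fields)
print(g())
```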

Best

Matthieu Dartiailh

On Fri, Apr 1, 2022, 18:34 Mark Shannon  wrote:

> Hi Gabriele,
>
> On 01/04/2022 4:50 pm, Gabriele wrote:
> > Does this mean that this line in the bytecode library is likely to fail
> > with 3.11, with no way to fix it?
> >
>
> You can pass the exception table the same way you pass all the other
> arguments.
> The exception table depends on the code, but that is nothing new. The
> bytecode library already recomputes the consts, names, etc.
>
> TBH, calling `types.CodeType` didn't work for earlier versions either.
> It just sort of worked, some of the time.
>
> Cheers,
> Mark.
>
>
> >
> > https://github.com/MatthieuDartiailh/bytecode/blob/7b0423234b0e999b45a4eb0c58115b284314f46b/bytecode/concrete.py#L398
> >
> > On Fri, 1 Apr 2022, 10:40 Victor Stinner <vstin...@python.org> wrote:
> >
> > I created https://bugs.python.org/issue47185 to discuss this issue:
> > either recompute automatically co_exceptiontable, or at least document
> > the change.
> >
> > Victor
> >
> > On Fri, Apr 1, 2022 at 11:21 AM Victor Stinner <vstin...@python.org> wrote:
> >  >
> >  > ("Re: C API: Move PEP 523 "Adding a frame evaluation API to CPython"
> >  > private C API to the internal C API")
> >  >
> >  > On Fri, Apr 1, 2022 at 11:01 AM Chris Angelico <ros...@gmail.com> wrote:
> >  > >
> >  > > On Fri, 1 Apr 2022 at 19:51, Victor Stinner <vstin...@python.org> wrote:
> >  > > > In Python, sadly the types.CodeType type also has a public constructor
> >  > > > and many projects break at each Python release because the API
> >  > > > changes. Hopefully, it seems like the new CodeType.replace() method
> >  > > > added to Python 3.8 mitigated the issue. IMO CodeType.replace() is a
> >  > > > better abstraction and closer to what developers need in practice.
> >  > >
> >  > > It certainly has been for me. When I want to do bytecode hackery, I
> >  > > usually start by creating a function with def/lambda, then construct a
> >  > > modified function using f.__code__.replace(). It's the easiest way to
> >  > > ensure that all the little details are correct.
> >  >
> >  > Python 3.11 added the concept of "exception table"
> >  > (code.co_exceptiontable). You have to build this table, otherwise
> >  > Python can no longer catch exceptions :-)
> >  >
> >  > I don't know how to build this exception table. It seems like
> >  > currently there is no Python function in the stdlib to build this
> >  > table.
> >  >
> >  > Example:
> >  > ---
> >  > def f():
> >  >     try:
> >  >         print("raise")
> >  >         raise ValueError
> >  >     except ValueError:
> >  >         print("except")
> >  >     else:
> >  >         print("else")
> >  >     print("exit func")
> >  >
> >  > def g(): pass
> >  >
> >  > if 1:
> >  >     code = f.__code__
> >  >     g.__code__ = g.__code__.replace(
> >  >         co_code=code.co_code,
> >  >         co_consts=code.co_consts,
> >  >         co_names=code.co_names,
> >  >         co_flags=code.co_flags,
> >  >         co_stacksize=code.co_stacksize)
> >  > else:
> >  >     g.__code__ = f.__code__  # this code path works on Python 3.11
> >  >
> >  > g()
> >  > ---
> >  >
> >  > Output with Python 3.10 (ok):
> >  > ---
> >  > raise
> >  > except
> >  > exit func
> >  > ---
> >  >
> >  > Output with 

[Python-Dev] CALL_FUNCTION_EX arg and stack_effect

2017-02-20 Thread Matthieu Dartiailh

Hi,

I have a question about the use of CALL_FUNCTION_EX in 
https://github.com/python/cpython/blob/master/Python/compile.c#L3624. 
Looking at the code, it appears that the argument will be either 1 or 0 
depending on whether or not the function is taking keyword arguments 
(which means that CALL_FUNCTION_EX cannot be used on a function taking no 
arguments).
Executing that opcode will remove from the stack the function code, the 
positional arguments (packed in a tuple) and the potential keyword 
arguments packed in a dict, and push the return value. So the stack 
effect will be either -1 or -2 (it could be 0 if it were possible to pass 0 
arguments).
Looking at the stack effect computation 
(https://github.com/python/cpython/blob/master/Python/compile.c#L1047), 
it appears that the stack effect will be 0 if the argument is 0, -1 for 
either 1 or 2, and -2 for 3. This means that for the code generated at 
https://github.com/python/cpython/blob/master/Python/compile.c#L3624 the 
stack_effect function can never compute the right stack effect (it 
will return either 0 or -1 instead of -1 and -2).
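
The values reported by a given interpreter can be checked from Python with dis.stack_effect. The snippet below is a small sketch; `stack_effects` is a hypothetical helper, and the numbers it prints depend on the Python version (the opcode may be missing or may reject an argument on some versions, which the helper reports as an empty dict or None).

```python
import dis


def stack_effects(opname, args):
    """Map each oparg to dis.stack_effect(opname, oparg), or None if rejected."""
    op = dis.opmap.get(opname)
    if op is None:
        # The opcode does not exist on this interpreter version.
        return {}
    effects = {}
    for arg in args:
        try:
            effects[arg] = dis.stack_effect(op, arg)
        except ValueError:
            effects[arg] = None
    return effects


print(stack_effects("CALL_FUNCTION_EX", [0, 1]))
```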


I would say that this is a bug and that the oparg should be 1 + 2 if 
keyword arguments are present at line 3624.


I am not sure what consequences this can have on CPython but it means the 
bytecode becomes weird, as the stack looks like it can grow during a 
list comprehension (calling a function f using * syntax):

 Instruction        Arg    Stack effect
 BUILD_LIST         0       1
 LOAD_FAST          .0      1
 FOR_ITER           22      1   ---> after each jump it looks like the stack is higher by one
 STORE_FAST         i      -1
 LOAD_GLOBAL        f       1
 LOAD_DEREF         a       1
 LOAD_FAST          i       1
 BINARY_SUBSCR      None   -1
 CALL_FUNCTION_EX   0       0
 LIST_APPEND        2      -1
 JUMP_ABSOLUTE      4       0
 RETURN_VALUE       None   -1

What do you think? Should I open an issue on https://bugs.python.org/?

Best regards

Matthieu
