Re: [PATCH] decodetree: Allow 'dot' in opcode names
On 1/12/21 11:28 PM, Eduardo Habkost wrote: > On Tue, Jan 12, 2021 at 11:15:38PM +0100, Philippe Mathieu-Daudé wrote: >> On 1/12/21 9:44 PM, Richard Henderson wrote: >>> On 1/12/21 8:41 AM, Philippe Mathieu-Daudé wrote: Some ISA use a dot in their opcodes. Allow the decodetree script to process them. The dot is replaced by an underscore in the generated code. >>> >>> Given that you then have to remember to use '_' on the C side, what >>> advantage >>> does this give? >> >> The direct advantage is you can copy/paste the opcode in a PDF viewer >> without having to edit it :) >> >> See i.e. some Loongson opcodes [*]: >> >> MULT.G 011100 . . . 0 01 @rs_rt_rd >> DMULT.G 011100 . . . 0 010001 @rs_rt_rd >> MULTU.G 011100 . . . 0 010010 @rs_rt_rd >> DMULTU.G011100 . . . 0 010011 @rs_rt_rd >> >> DIV.G 011100 . . . 0 010100 @rs_rt_rd >> DDIV.G 011100 . . . 0 010101 @rs_rt_rd >> DIVU.G 011100 . . . 0 010110 @rs_rt_rd >> DDIVU.G 011100 . . . 0 010111 @rs_rt_rd >> >> MOD.G 011100 . . . 0 011100 @rs_rt_rd >> DMOD.G 011100 . . . 0 011101 @rs_rt_rd >> MODU.G 011100 . . . 0 00 @rs_rt_rd >> DMODU.G 011100 . . . 0 01 @rs_rt_rd >> >> The other - remote - advantage I see is when using a disassembler >> based on decodetree (as AVR does), the opcode displayed also matches >> the specs. We are not yet there with MIPS, but I have something in >> progress... > > Interesting. So, the decodetree format is not used exclusively > inside the QEMU source tree, but also by other projects? Is > there a specification somewhere else? "as AVR does in QEMU", see commit 9d8caa67a24 ("target/avr: Add support for disassembling via option '-d in_asm'"). What seduces me with decodetree is we don't need to match QEMU instruction class with each CPU capabilities. IOW we can use the same decoder for TCG and disassembly, and the disassembly matches the instruction set of the CPU (with all the specific instructions). Currently some specific opcodes are displayed as generic ones (or as unknown via hexadecimal value). Unfortunately not something we can show with QEMU AVR target because the ISA is very simple. Regards, Phil.
Re: [PATCH] decodetree: Allow 'dot' in opcode names
On Tue, Jan 12, 2021 at 11:19:49PM +0100, Philippe Mathieu-Daudé wrote: > On 1/12/21 10:05 PM, Eduardo Habkost wrote: [...] > >> diff --git a/scripts/decodetree.py b/scripts/decodetree.py > >> index 47aa9caf6d1..b7572589e64 100644 > >> --- a/scripts/decodetree.py > >> +++ b/scripts/decodetree.py > >> @@ -49,7 +49,7 @@ > >> re_arg_ident = '&[a-zA-Z0-9_]*' > >> re_fld_ident = '%[a-zA-Z0-9_]*' > >> re_fmt_ident = '@[a-zA-Z0-9_]*' > >> -re_pat_ident = '[a-zA-Z0-9_]*' > >> +re_pat_ident = '[a-zA-Z0-9_.]*' > > > > If pattern identifiers are going to follow different rules, > > doesn't this need to be documented at docs/devel/decodetree.rst? > > I checked and luckily for me the opcode pattern identifiers is > not documented <:) The format is not documented, but the specification grammar implies the same rules apply to all identifiers. -- Eduardo
Re: [PATCH] decodetree: Allow 'dot' in opcode names
On Tue, Jan 12, 2021 at 11:15:38PM +0100, Philippe Mathieu-Daudé wrote: > On 1/12/21 9:44 PM, Richard Henderson wrote: > > On 1/12/21 8:41 AM, Philippe Mathieu-Daudé wrote: > >> Some ISA use a dot in their opcodes. Allow the decodetree > >> script to process them. The dot is replaced by an underscore > >> in the generated code. > > > > Given that you then have to remember to use '_' on the C side, what > > advantage > > does this give? > > The direct advantage is you can copy/paste the opcode in a PDF viewer > without having to edit it :) > > See i.e. some Loongson opcodes [*]: > > MULT.G 011100 . . . 0 01 @rs_rt_rd > DMULT.G 011100 . . . 0 010001 @rs_rt_rd > MULTU.G 011100 . . . 0 010010 @rs_rt_rd > DMULTU.G011100 . . . 0 010011 @rs_rt_rd > > DIV.G 011100 . . . 0 010100 @rs_rt_rd > DDIV.G 011100 . . . 0 010101 @rs_rt_rd > DIVU.G 011100 . . . 0 010110 @rs_rt_rd > DDIVU.G 011100 . . . 0 010111 @rs_rt_rd > > MOD.G 011100 . . . 0 011100 @rs_rt_rd > DMOD.G 011100 . . . 0 011101 @rs_rt_rd > MODU.G 011100 . . . 0 00 @rs_rt_rd > DMODU.G 011100 . . . 0 01 @rs_rt_rd > > The other - remote - advantage I see is when using a disassembler > based on decodetree (as AVR does), the opcode displayed also matches > the specs. We are not yet there with MIPS, but I have something in > progress... Interesting. So, the decodetree format is not used exclusively inside the QEMU source tree, but also by other projects? Is there a specification somewhere else? > > [*] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg02509.html > -- Eduardo
Re: [PATCH] decodetree: Allow 'dot' in opcode names
On 1/12/21 10:05 PM, Eduardo Habkost wrote: > On Tue, Jan 12, 2021 at 07:41:56PM +0100, Philippe Mathieu-Daudé wrote: >> Some ISA use a dot in their opcodes. Allow the decodetree >> script to process them. The dot is replaced by an underscore >> in the generated code. > > Will something break if we just use underscores instead of dots > in the input file? No, but then the opcode doesn't really match the spec. > >> >> Signed-off-by: Philippe Mathieu-Daudé >> --- >> scripts/decodetree.py | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/scripts/decodetree.py b/scripts/decodetree.py >> index 47aa9caf6d1..b7572589e64 100644 >> --- a/scripts/decodetree.py >> +++ b/scripts/decodetree.py >> @@ -49,7 +49,7 @@ >> re_arg_ident = '&[a-zA-Z0-9_]*' >> re_fld_ident = '%[a-zA-Z0-9_]*' >> re_fmt_ident = '@[a-zA-Z0-9_]*' >> -re_pat_ident = '[a-zA-Z0-9_]*' >> +re_pat_ident = '[a-zA-Z0-9_.]*' > > If pattern identifiers are going to follow different rules, > doesn't this need to be documented at docs/devel/decodetree.rst? I checked and luckily for me the opcode pattern identifiers is not documented <:) > >> >> def error_with_file(file, lineno, *args): >> """Print an error message from file:line and args and exit.""" >> @@ -1082,6 +1082,7 @@ def parse_file(f, parent_pat): >> elif re.fullmatch(re_fmt_ident, name): >> parse_generic(start_lineno, None, name[1:], toks) >> elif re.fullmatch(re_pat_ident, name): >> +name = name.replace('.', '_') >> parse_generic(start_lineno, parent_pat, name, toks) > > Do we want error messages generated by the script to use the > modified identifier with underscores, or the original identifier > with dots? (This patch does the former) You are right, we want the former in the error message (the input format). Thanks, Phil.
Re: [PATCH] decodetree: Allow 'dot' in opcode names
On 1/12/21 9:44 PM, Richard Henderson wrote: > On 1/12/21 8:41 AM, Philippe Mathieu-Daudé wrote: >> Some ISA use a dot in their opcodes. Allow the decodetree >> script to process them. The dot is replaced by an underscore >> in the generated code. > > Given that you then have to remember to use '_' on the C side, what advantage > does this give? The direct advantage is you can copy/paste the opcode in a PDF viewer without having to edit it :) See i.e. some Loongson opcodes [*]: MULT.G 011100 . . . 0 01 @rs_rt_rd DMULT.G 011100 . . . 0 010001 @rs_rt_rd MULTU.G 011100 . . . 0 010010 @rs_rt_rd DMULTU.G011100 . . . 0 010011 @rs_rt_rd DIV.G 011100 . . . 0 010100 @rs_rt_rd DDIV.G 011100 . . . 0 010101 @rs_rt_rd DIVU.G 011100 . . . 0 010110 @rs_rt_rd DDIVU.G 011100 . . . 0 010111 @rs_rt_rd MOD.G 011100 . . . 0 011100 @rs_rt_rd DMOD.G 011100 . . . 0 011101 @rs_rt_rd MODU.G 011100 . . . 0 00 @rs_rt_rd DMODU.G 011100 . . . 0 01 @rs_rt_rd The other - remote - advantage I see is when using a disassembler based on decodetree (as AVR does), the opcode displayed also matches the specs. We are not yet there with MIPS, but I have something in progress... [*] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg02509.html
Re: [PATCH] decodetree: Allow 'dot' in opcode names
On Tue, Jan 12, 2021 at 07:41:56PM +0100, Philippe Mathieu-Daudé wrote: > Some ISA use a dot in their opcodes. Allow the decodetree > script to process them. The dot is replaced by an underscore > in the generated code. Will something break if we just use underscores instead of dots in the input file? > > Signed-off-by: Philippe Mathieu-Daudé > --- > scripts/decodetree.py | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/scripts/decodetree.py b/scripts/decodetree.py > index 47aa9caf6d1..b7572589e64 100644 > --- a/scripts/decodetree.py > +++ b/scripts/decodetree.py > @@ -49,7 +49,7 @@ > re_arg_ident = '&[a-zA-Z0-9_]*' > re_fld_ident = '%[a-zA-Z0-9_]*' > re_fmt_ident = '@[a-zA-Z0-9_]*' > -re_pat_ident = '[a-zA-Z0-9_]*' > +re_pat_ident = '[a-zA-Z0-9_.]*' If pattern identifiers are going to follow different rules, doesn't this need to be documented at docs/devel/decodetree.rst? > > def error_with_file(file, lineno, *args): > """Print an error message from file:line and args and exit.""" > @@ -1082,6 +1082,7 @@ def parse_file(f, parent_pat): > elif re.fullmatch(re_fmt_ident, name): > parse_generic(start_lineno, None, name[1:], toks) > elif re.fullmatch(re_pat_ident, name): > +name = name.replace('.', '_') > parse_generic(start_lineno, parent_pat, name, toks) Do we want error messages generated by the script to use the modified identifier with underscores, or the original identifier with dots? (This patch does the former) > else: > error(lineno, 'invalid token "{0}"'.format(name)) > -- > 2.26.2 > -- Eduardo
Re: [PATCH] decodetree: Allow 'dot' in opcode names
On 1/12/21 8:41 AM, Philippe Mathieu-Daudé wrote: > Some ISA use a dot in their opcodes. Allow the decodetree > script to process them. The dot is replaced by an underscore > in the generated code. Given that you then have to remember to use '_' on the C side, what advantage does this give? r~
[PATCH] decodetree: Allow 'dot' in opcode names
Some ISA use a dot in their opcodes. Allow the decodetree script to process them. The dot is replaced by an underscore in the generated code. Signed-off-by: Philippe Mathieu-Daudé --- scripts/decodetree.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/scripts/decodetree.py b/scripts/decodetree.py index 47aa9caf6d1..b7572589e64 100644 --- a/scripts/decodetree.py +++ b/scripts/decodetree.py @@ -49,7 +49,7 @@ re_arg_ident = '&[a-zA-Z0-9_]*' re_fld_ident = '%[a-zA-Z0-9_]*' re_fmt_ident = '@[a-zA-Z0-9_]*' -re_pat_ident = '[a-zA-Z0-9_]*' +re_pat_ident = '[a-zA-Z0-9_.]*' def error_with_file(file, lineno, *args): """Print an error message from file:line and args and exit.""" @@ -1082,6 +1082,7 @@ def parse_file(f, parent_pat): elif re.fullmatch(re_fmt_ident, name): parse_generic(start_lineno, None, name[1:], toks) elif re.fullmatch(re_pat_ident, name): +name = name.replace('.', '_') parse_generic(start_lineno, parent_pat, name, toks) else: error(lineno, 'invalid token "{0}"'.format(name)) -- 2.26.2