Re: [PATCH] decodetree: Allow 'dot' in opcode names

2021-01-12 Thread Philippe Mathieu-Daudé
On 1/12/21 11:28 PM, Eduardo Habkost wrote:
> On Tue, Jan 12, 2021 at 11:15:38PM +0100, Philippe Mathieu-Daudé wrote:
>> On 1/12/21 9:44 PM, Richard Henderson wrote:
>>> On 1/12/21 8:41 AM, Philippe Mathieu-Daudé wrote:
 Some ISA use a dot in their opcodes. Allow the decodetree
 script to process them. The dot is replaced by an underscore
 in the generated code.
>>>
>>> Given that you then have to remember to use '_' on the C side, what 
>>> advantage
>>> does this give?
>>
>> The direct advantage is you can copy/paste the opcode in a PDF viewer
>> without having to edit it :)
>>
>> See i.e. some Loongson opcodes [*]:
>>
>> MULT.G  011100 . . . 0 01   @rs_rt_rd
>> DMULT.G 011100 . . . 0 010001   @rs_rt_rd
>> MULTU.G 011100 . . . 0 010010   @rs_rt_rd
>> DMULTU.G011100 . . . 0 010011   @rs_rt_rd
>>
>> DIV.G   011100 . . . 0 010100   @rs_rt_rd
>> DDIV.G  011100 . . . 0 010101   @rs_rt_rd
>> DIVU.G  011100 . . . 0 010110   @rs_rt_rd
>> DDIVU.G 011100 . . . 0 010111   @rs_rt_rd
>>
>> MOD.G   011100 . . . 0 011100   @rs_rt_rd
>> DMOD.G  011100 . . . 0 011101   @rs_rt_rd
>> MODU.G  011100 . . . 0 00   @rs_rt_rd
>> DMODU.G 011100 . . . 0 01   @rs_rt_rd
>>
>> The other - remote - advantage I see is when using a disassembler
>> based on decodetree (as AVR does), the opcode displayed also matches
>> the specs. We are not yet there with MIPS, but I have something in
>> progress...
> 
> Interesting.  So, the decodetree format is not used exclusively
> inside the QEMU source tree, but also by other projects?  Is
> there a specification somewhere else?

"as AVR does in QEMU", see commit 9d8caa67a24
("target/avr: Add support for disassembling via option '-d in_asm'").

What seduces me with decodetree is we don't need to match QEMU
instruction class with each CPU capabilities. IOW we can use the
same decoder for TCG and disassembly, and the disassembly matches
the instruction set of the CPU (with all the specific instructions).

Currently some specific opcodes are displayed as generic ones (or
as unknown via hexadecimal value). Unfortunately not something we
can show with QEMU AVR target because the ISA is very simple.

Regards,

Phil.



Re: [PATCH] decodetree: Allow 'dot' in opcode names

2021-01-12 Thread Eduardo Habkost
On Tue, Jan 12, 2021 at 11:19:49PM +0100, Philippe Mathieu-Daudé wrote:
> On 1/12/21 10:05 PM, Eduardo Habkost wrote:
[...]
> >> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> >> index 47aa9caf6d1..b7572589e64 100644
> >> --- a/scripts/decodetree.py
> >> +++ b/scripts/decodetree.py
> >> @@ -49,7 +49,7 @@
> >>  re_arg_ident = '&[a-zA-Z0-9_]*'
> >>  re_fld_ident = '%[a-zA-Z0-9_]*'
> >>  re_fmt_ident = '@[a-zA-Z0-9_]*'
> >> -re_pat_ident = '[a-zA-Z0-9_]*'
> >> +re_pat_ident = '[a-zA-Z0-9_.]*'
> > 
> > If pattern identifiers are going to follow different rules,
> > doesn't this need to be documented at docs/devel/decodetree.rst?
> 
> I checked and luckily for me the opcode pattern identifiers is
> not documented <:)

The format is not documented, but the specification grammar
implies the same rules apply to all identifiers.

-- 
Eduardo




Re: [PATCH] decodetree: Allow 'dot' in opcode names

2021-01-12 Thread Eduardo Habkost
On Tue, Jan 12, 2021 at 11:15:38PM +0100, Philippe Mathieu-Daudé wrote:
> On 1/12/21 9:44 PM, Richard Henderson wrote:
> > On 1/12/21 8:41 AM, Philippe Mathieu-Daudé wrote:
> >> Some ISA use a dot in their opcodes. Allow the decodetree
> >> script to process them. The dot is replaced by an underscore
> >> in the generated code.
> > 
> > Given that you then have to remember to use '_' on the C side, what 
> > advantage
> > does this give?
> 
> The direct advantage is you can copy/paste the opcode in a PDF viewer
> without having to edit it :)
> 
> See i.e. some Loongson opcodes [*]:
> 
> MULT.G  011100 . . . 0 01   @rs_rt_rd
> DMULT.G 011100 . . . 0 010001   @rs_rt_rd
> MULTU.G 011100 . . . 0 010010   @rs_rt_rd
> DMULTU.G011100 . . . 0 010011   @rs_rt_rd
> 
> DIV.G   011100 . . . 0 010100   @rs_rt_rd
> DDIV.G  011100 . . . 0 010101   @rs_rt_rd
> DIVU.G  011100 . . . 0 010110   @rs_rt_rd
> DDIVU.G 011100 . . . 0 010111   @rs_rt_rd
> 
> MOD.G   011100 . . . 0 011100   @rs_rt_rd
> DMOD.G  011100 . . . 0 011101   @rs_rt_rd
> MODU.G  011100 . . . 0 00   @rs_rt_rd
> DMODU.G 011100 . . . 0 01   @rs_rt_rd
> 
> The other - remote - advantage I see is when using a disassembler
> based on decodetree (as AVR does), the opcode displayed also matches
> the specs. We are not yet there with MIPS, but I have something in
> progress...

Interesting.  So, the decodetree format is not used exclusively
inside the QEMU source tree, but also by other projects?  Is
there a specification somewhere else?

> 
> [*] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg02509.html
> 

-- 
Eduardo




Re: [PATCH] decodetree: Allow 'dot' in opcode names

2021-01-12 Thread Philippe Mathieu-Daudé
On 1/12/21 10:05 PM, Eduardo Habkost wrote:
> On Tue, Jan 12, 2021 at 07:41:56PM +0100, Philippe Mathieu-Daudé wrote:
>> Some ISA use a dot in their opcodes. Allow the decodetree
>> script to process them. The dot is replaced by an underscore
>> in the generated code.
> 
> Will something break if we just use underscores instead of dots
> in the input file?

No, but then the opcode doesn't really match the spec.

> 
>>
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  scripts/decodetree.py | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
>> index 47aa9caf6d1..b7572589e64 100644
>> --- a/scripts/decodetree.py
>> +++ b/scripts/decodetree.py
>> @@ -49,7 +49,7 @@
>>  re_arg_ident = '&[a-zA-Z0-9_]*'
>>  re_fld_ident = '%[a-zA-Z0-9_]*'
>>  re_fmt_ident = '@[a-zA-Z0-9_]*'
>> -re_pat_ident = '[a-zA-Z0-9_]*'
>> +re_pat_ident = '[a-zA-Z0-9_.]*'
> 
> If pattern identifiers are going to follow different rules,
> doesn't this need to be documented at docs/devel/decodetree.rst?

I checked and luckily for me the opcode pattern identifiers is
not documented <:)

> 
>>  
>>  def error_with_file(file, lineno, *args):
>>  """Print an error message from file:line and args and exit."""
>> @@ -1082,6 +1082,7 @@ def parse_file(f, parent_pat):
>>  elif re.fullmatch(re_fmt_ident, name):
>>  parse_generic(start_lineno, None, name[1:], toks)
>>  elif re.fullmatch(re_pat_ident, name):
>> +name = name.replace('.', '_')
>>  parse_generic(start_lineno, parent_pat, name, toks)
> 
> Do we want error messages generated by the script to use the
> modified identifier with underscores, or the original identifier
> with dots?  (This patch does the former)

You are right, we want the former in the error message (the input
format).

Thanks,

Phil.



Re: [PATCH] decodetree: Allow 'dot' in opcode names

2021-01-12 Thread Philippe Mathieu-Daudé
On 1/12/21 9:44 PM, Richard Henderson wrote:
> On 1/12/21 8:41 AM, Philippe Mathieu-Daudé wrote:
>> Some ISA use a dot in their opcodes. Allow the decodetree
>> script to process them. The dot is replaced by an underscore
>> in the generated code.
> 
> Given that you then have to remember to use '_' on the C side, what advantage
> does this give?

The direct advantage is you can copy/paste the opcode in a PDF viewer
without having to edit it :)

See i.e. some Loongson opcodes [*]:

MULT.G  011100 . . . 0 01   @rs_rt_rd
DMULT.G 011100 . . . 0 010001   @rs_rt_rd
MULTU.G 011100 . . . 0 010010   @rs_rt_rd
DMULTU.G011100 . . . 0 010011   @rs_rt_rd

DIV.G   011100 . . . 0 010100   @rs_rt_rd
DDIV.G  011100 . . . 0 010101   @rs_rt_rd
DIVU.G  011100 . . . 0 010110   @rs_rt_rd
DDIVU.G 011100 . . . 0 010111   @rs_rt_rd

MOD.G   011100 . . . 0 011100   @rs_rt_rd
DMOD.G  011100 . . . 0 011101   @rs_rt_rd
MODU.G  011100 . . . 0 00   @rs_rt_rd
DMODU.G 011100 . . . 0 01   @rs_rt_rd

The other - remote - advantage I see is when using a disassembler
based on decodetree (as AVR does), the opcode displayed also matches
the specs. We are not yet there with MIPS, but I have something in
progress...

[*] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg02509.html



Re: [PATCH] decodetree: Allow 'dot' in opcode names

2021-01-12 Thread Eduardo Habkost
On Tue, Jan 12, 2021 at 07:41:56PM +0100, Philippe Mathieu-Daudé wrote:
> Some ISA use a dot in their opcodes. Allow the decodetree
> script to process them. The dot is replaced by an underscore
> in the generated code.

Will something break if we just use underscores instead of dots
in the input file?

> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  scripts/decodetree.py | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> index 47aa9caf6d1..b7572589e64 100644
> --- a/scripts/decodetree.py
> +++ b/scripts/decodetree.py
> @@ -49,7 +49,7 @@
>  re_arg_ident = '&[a-zA-Z0-9_]*'
>  re_fld_ident = '%[a-zA-Z0-9_]*'
>  re_fmt_ident = '@[a-zA-Z0-9_]*'
> -re_pat_ident = '[a-zA-Z0-9_]*'
> +re_pat_ident = '[a-zA-Z0-9_.]*'

If pattern identifiers are going to follow different rules,
doesn't this need to be documented at docs/devel/decodetree.rst?

>  
>  def error_with_file(file, lineno, *args):
>  """Print an error message from file:line and args and exit."""
> @@ -1082,6 +1082,7 @@ def parse_file(f, parent_pat):
>  elif re.fullmatch(re_fmt_ident, name):
>  parse_generic(start_lineno, None, name[1:], toks)
>  elif re.fullmatch(re_pat_ident, name):
> +name = name.replace('.', '_')
>  parse_generic(start_lineno, parent_pat, name, toks)

Do we want error messages generated by the script to use the
modified identifier with underscores, or the original identifier
with dots?  (This patch does the former)

>  else:
>  error(lineno, 'invalid token "{0}"'.format(name))
> -- 
> 2.26.2
> 

-- 
Eduardo




Re: [PATCH] decodetree: Allow 'dot' in opcode names

2021-01-12 Thread Richard Henderson
On 1/12/21 8:41 AM, Philippe Mathieu-Daudé wrote:
> Some ISA use a dot in their opcodes. Allow the decodetree
> script to process them. The dot is replaced by an underscore
> in the generated code.

Given that you then have to remember to use '_' on the C side, what advantage
does this give?


r~



[PATCH] decodetree: Allow 'dot' in opcode names

2021-01-12 Thread Philippe Mathieu-Daudé
Some ISA use a dot in their opcodes. Allow the decodetree
script to process them. The dot is replaced by an underscore
in the generated code.

Signed-off-by: Philippe Mathieu-Daudé 
---
 scripts/decodetree.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/scripts/decodetree.py b/scripts/decodetree.py
index 47aa9caf6d1..b7572589e64 100644
--- a/scripts/decodetree.py
+++ b/scripts/decodetree.py
@@ -49,7 +49,7 @@
 re_arg_ident = '&[a-zA-Z0-9_]*'
 re_fld_ident = '%[a-zA-Z0-9_]*'
 re_fmt_ident = '@[a-zA-Z0-9_]*'
-re_pat_ident = '[a-zA-Z0-9_]*'
+re_pat_ident = '[a-zA-Z0-9_.]*'
 
 def error_with_file(file, lineno, *args):
 """Print an error message from file:line and args and exit."""
@@ -1082,6 +1082,7 @@ def parse_file(f, parent_pat):
 elif re.fullmatch(re_fmt_ident, name):
 parse_generic(start_lineno, None, name[1:], toks)
 elif re.fullmatch(re_pat_ident, name):
+name = name.replace('.', '_')
 parse_generic(start_lineno, parent_pat, name, toks)
 else:
 error(lineno, 'invalid token "{0}"'.format(name))
-- 
2.26.2