Re: [PATCH v2 08/10] target/hexagon: import parser for idef-parser

2021-03-23 Thread Paolo Montesel via
Thanks for the feedback, it helped us improve the implementation quite a bit.

> > +| rvalue QMARK rvalue COLON rvalue
> > +{
> > +@1.last_column = @5.last_column;
> > +bool is_64bit = ($3.bit_width == 64) || ($5.bit_width == 64);
> > +int bit_width = (is_64bit) ? 64 : 32;
> > +if (is_64bit) {
> > +$1 = rvalue_extend(c, &@1, &$1);
> > +$3 = rvalue_extend(c, &@1, &$3);
> > +$5 = rvalue_extend(c, &@1, &$5);
> > +} else {
> > +$1 = rvalue_truncate(c, &@1, &$1);
> > +}
> > +$1 = rvalue_materialize(c, &@1, &$1);
> > +$3 = rvalue_materialize(c, &@1, &$3);
> > +$5 = rvalue_materialize(c, &@1, &$5);
> > +HexValue res = gen_local_tmp(c, &@1, bit_width);
> > +HexValue zero = gen_tmp_value(c, &@1, "0", bit_width);
> > +OUT(c, &@1, "tcg_gen_movcond_i", &bit_width);
> > +OUT(c, &@1, "(TCG_COND_NE, ", &res, ", ", &$1, ", ", &zero);
>
> It would be better if you parsed conditions differently.
> Retain the two arguments and the condition, so that you can fold that into the
> movcond directly.
>
> E.g. instead of
>
> tcg_gen_setcond_i32(cond, t, x, y)
> tcg_gen_movcond_i32(TCG_COND_NE, dest, t, zero, src1, src2);
>
> you'd be able to do
>
> tcg_gen_movcond_i32(cond, dest, x, y, src1, src2);
>
> This would be trivial with a non-terminal "cond", used here and with IF.  
> You'd
> include cond as an alternative of rvalue, which would perform the reduction to
> boolean with setcond.

This would save us from emitting some tcg ops but would increase the
complexity of the parser, which doesn't seem worth it imho.

> > +case VALUE:
> > +EMIT(c, "((int64_t)%" PRIu64 "ULL)", (int64_t)imm->value);
>
> Why are you using ull then casting to signed?  Just use ll.

We have a case in which we would print `-9223372036854775808LL` (64
bit integer with sign bit set) and gcc would complain with `warning:
integer constant is so large that it is unsigned`.
That's the reason for using ULL and then casting.
I'm open to other solutions.

> > +switch (op_types) {
> > +case IMM_IMM:
> > +{
> > +OUT(c, locp, "tcg_gen_movi_", bit_suffix,
> > +"(", &res, ", ", &op1, " == ", &op2, ");\n");
> > +break;
> > +}
>
> Drop useless braces like this.
>
> Do you really see any IMM_IMM operations?  There are some typos in this
> section, so certainly all operators are not represented.  It might be worth
> folding all of these inside the parser, and not deferring to the C compiler, 
> so
> that you can be certain of having a real value for any IMMEDIATE.  Which will
> help when it comes to shift below.

Maybe not for all bin ops, but we do see IMM_IMM.
I think we can't always fold IMMEDIATES, because they include "normal"
C variables (e.g.: `int32_t uiV` in function arguments) that depend on
instruction bytes at runtime.
That's true also for the IMM_IMM case.

> I'm thinking that this code could really benefit from tcg_constant_{i32,i64}.
> It produces a hashed temp that need not be freed, and it's what many of the
> tcg_gen_fooi functions use in the backend.

We can technically convert all the IMMs to tcg_constant before using
them, but since we can't constant fold in the parser that would
probably decrease performance quite a bit.

> > +static void gen_div_op(Context *c, YYLTYPE *locp, HexValue *res,
> > +   enum OpTypes op_types, HexValue *op1, HexValue *op2)
> > +{
> > +switch (op_types) {
> > +case IMM_IMM:
> > +OUT(c, locp, "int64_t ", res, " = ", op1, " / ", op2, ";\n");
> > +break;
> > +case IMM_REG:
> > +case REG_IMM:
> > +case REG_REG:
> > +OUT(c, locp, res, " = gen_helper_divu("
> > +"cpu_env, ", op1, ", ", op2, ");\n");
>
> Are we trusting that div-by-zero has already been tested for?

Turns out div is not even used by the instructions we currently
support (: so we can just delete this

~Paolo



Re: [PATCH v2 08/10] target/hexagon: import parser for idef-parser

2021-03-01 Thread Alessandro Di Federico via
On Thu, 25 Feb 2021 19:30:14 -0800
Richard Henderson  wrote:

> > +}
> > +}
> > +code
> > +{
> > +if (c->inst.error_count != 0) {
> > +fprintf(stderr,
> > +"Parsing of instruction %s generated %d errors!\n",
> > +c->inst.name,
> > +c->inst.error_count);
> > +EMIT(c, "assert(false && \"This instruction is not
> > implemented!\");");  
> 
> What's the point of assert(false) above abort()?
> 
> Is there any point in emitting anything at all, since I assume the
> idef-parser program itself will exit with error, stopping the build
> process?

This is a leftover, that string will never be written to disk (`commit`
is not invoked).

> > +| pre ASSIGN rvalue
> > +{
> > +@1.last_column = @3.last_column;  
> 
> Do you really find any value in this column manipulation, given that
> the input is not the original file, but the output of cpp?

The output of `cpp` is quite readable. We use it a lot for debugging
and it's very helpful.

> IMO this is another reason to *not* preprocess with macros.inc, nor
> sed the output as a workaround for your parsing troubles.

Yes, `sed` is a workaround I really don't like too. But preprocessing
with `cpp` saves us from having to handle a larger, redundant language.
After all, the input language is designed to be expanded through the
preprocessor, although with a different set of macros. I'd keep that
part.

-- 
Alessandro Di Federico
rev.ng



Re: [PATCH v2 08/10] target/hexagon: import parser for idef-parser

2021-02-25 Thread Richard Henderson
On 2/25/21 7:18 AM, Alessandro Di Federico wrote:
> +instructions : instruction instructions
> +| %empty
> +;

I have never seen bison written flush-left like this, and I find it really hard
to read, especially with some of the larger non-terminals.

I'm also not a fan of large blocks of code within non-terminals.  IMO they
should pretty much be limited to calling a function, the result of which is
assigned to $$ as needed.

> +instruction : INAME
> +{
> +/* Early-free if the parser failed on the previous instruction */
> +free_instruction(c);

Surely this belongs in the error alternative.

> +for (int i = 0; i < c->inst.init_count; i++) {
> +bool is64 = c->inst.init_list[i].bit_width == 64;
> +const char *type = is64 ? "i64" : "i32";

Why would you not use "i%d" in the printf format.

> +if (c->inst.init_list[i].type == REGISTER) {
> +OUT(c, &@1, "tcg_gen_movi_", type,
> +"(", &(c->inst.init_list[i]), ", 0);\n");
> +} else if (c->inst.init_list[i].type == PREDICATE) {
> +OUT(c, &@1, "tcg_gen_movi_", type,
> +"(", &(c->inst.init_list[i]), ", 0);\n");
> +}

These are identical.

> +}
> +}
> +code
> +{
> +if (c->inst.error_count != 0) {
> +fprintf(stderr,
> +"Parsing of instruction %s generated %d errors!\n",
> +c->inst.name,
> +c->inst.error_count);
> +EMIT(c, "assert(false && \"This instruction is not 
> implemented!\");");

What's the point of assert(false) above abort()?

Is there any point in emitting anything at all, since I assume the idef-parser
program itself will exit with error, stopping the build process?

> +arguments : LPAR RPAR

Is there any reason you're not spelling these '(' and ')'?
The same goes for every other single-character token.

> +decl : REG
> +{
> +emit_arg(c, &@1, &$1);
> +/* Enqueue register into initialization list */
> +yyassert(c, &@1, c->inst.init_count < INIT_LIST_LEN,
> + "init_count overflow");

Use a data structure that grows.  There are plenty to choose from.

> +| pre ASSIGN rvalue
> +{
> +@1.last_column = @3.last_column;

Do you really find any value in this column manipulation, given that the input
is not the original file, but the output of cpp?

IMO this is another reason to *not* preprocess with macros.inc, nor sed the
output as a workaround for your parsing troubles.

> +| SETBITS LPAR rvalue COMMA rvalue COMMA rvalue COMMA rvalue RPAR
> +{
> +@1.last_column = @10.last_column;
> +yyassert(c, &@1, $3.type == IMMEDIATE &&
> + $3.imm.type == VALUE &&
> + $5.type == IMMEDIATE &&
> + $5.imm.type == VALUE,
> + "Range deposit needs immediate values!\n");
> +int i;
> +$9 = rvalue_truncate(c, &@1, &$9);
> +for (i = $5.imm.value; i <= $3.imm.value; ++i) {
> +OUT(c, &@1, "gen_set_bit(", &i, ", ", &$7, ", ", &$9, ");\n");

A loop, really?  Surely this is just

  deposit($7, $7, -$9, $5, $3 - $5 + 1)

> +control_statement : frame_check  { /* does nothing */ }
> +| cancel_statement { /* does nothing */ }
> +| if_statement { /* does nothing */ }
> +| for_statement{ /* does nothing */ }
> +| fpart1_statement { /* does nothing */ }
> +;

You can drop all the empty code blocks.

> +frame_check : FCHK LPAR rvalue COMMA rvalue RPAR SEMI  {

Do not start code blocks on the previous line like this.
What goes for our C coding style does not apply to yacc grammar.  ;-)

> +for_statement : FOR LPAR IMM ASSIGN IMM SEMI IMM LT IMM SEMI IMM PLUSPLUS 
> RPAR
> +{
> +@1.last_column = @13.last_column;
> +OUT(c, &@1, "for (int ", &$3, " = ", &$5, "; ", &$7, " < ", &$9);
> +OUT(c, &@1, "; ", &$11, "++) {\n");
> +}
> +code_block
> +{
> +OUT(c, &@1, "}\n");
> +}
> +;
> +
> +for_statement : FOR LPAR IMM ASSIGN IMM SEMI IMM LT IMM SEMI IMM INC IMM RPAR

Why separate these productions, and not use |?

> +if_stmt : IF
> +{
> +/* Generate an end label, if false branch to that label */
> +OUT(c, &@1, "TCGLabel *if_label_", &c->inst.if_count,
> +" = gen_new_label();\n");
> +}

Do you actually use the label before...

> +LPAR rvalue RPAR
> +{
> +@1.last_column = @3.last_column;
> +$4 = rvalue_materialize(c, &@1, &$4);
> +const char *bit_suffix = ($4.bit_width == 64) ? "i64" : "i32";
> +OUT(c, &@1, "tcg_gen_brcondi_", bit_suffix, "(TCG_COND_EQ, ", &$4,
> +", 0, if_label_", &c->inst.if_count, ");\n");

... here?  I don't see why you need to create it early.


> +rvalue : FAIL
> +{
> +@1.last_column = @1.last_column;

Useless assignment.

> +| PC
> +{
> +/* Read PC from the CR */
> +$$ = gen_tmp(c, &@1, 32);
> +OUT(c, &@1, "tcg_gen_mov_i32(", &$$, ", hex_gpr[HEX_REG_PC]);\n");
> +}
> +| NPC
> +{
> +/* NPC is only read from CALLs, so we can hardcode it at translation 
> time */
> +$$ = gen_tmp(c, &@1, 32);
> +OUT(c,

[PATCH v2 08/10] target/hexagon: import parser for idef-parser

2021-02-25 Thread Alessandro Di Federico via
From: Paolo Montesel 

Signed-off-by: Alessandro Di Federico 
Signed-off-by: Paolo Montesel 
---
 target/hexagon/idef-parser/idef-parser.y  | 1250 +++
 target/hexagon/idef-parser/parser-helpers.c   | 1925 +
 target/hexagon/idef-parser/parser-helpers.h   |  293 +++
 target/hexagon/meson.build|   23 +-
 tests/docker/dockerfiles/alpine.docker|1 +
 tests/docker/dockerfiles/centos7.docker   |1 +
 tests/docker/dockerfiles/centos8.docker   |1 +
 tests/docker/dockerfiles/debian10.docker  |1 +
 .../dockerfiles/fedora-i386-cross.docker  |1 +
 .../dockerfiles/fedora-win32-cross.docker |1 +
 .../dockerfiles/fedora-win64-cross.docker |1 +
 tests/docker/dockerfiles/fedora.docker|1 +
 tests/docker/dockerfiles/opensuse-leap.docker |1 +
 tests/docker/dockerfiles/ubuntu.docker|1 +
 tests/docker/dockerfiles/ubuntu1804.docker|1 +
 tests/docker/dockerfiles/ubuntu2004.docker|3 +-
 16 files changed, 3503 insertions(+), 2 deletions(-)
 create mode 100644 target/hexagon/idef-parser/idef-parser.y
 create mode 100644 target/hexagon/idef-parser/parser-helpers.c
 create mode 100644 target/hexagon/idef-parser/parser-helpers.h

diff --git a/target/hexagon/idef-parser/idef-parser.y 
b/target/hexagon/idef-parser/idef-parser.y
new file mode 100644
index 00..d9dd81bd6f
--- /dev/null
+++ b/target/hexagon/idef-parser/idef-parser.y
@@ -0,0 +1,1250 @@
+%{
+/*
+ * Copyright(c) 2019-2020 rev.ng Srls. All Rights Reserved.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; withOUT even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include "idef-parser.h"
+#include "parser-helpers.h"
+#include "idef-parser.tab.h"
+#include "idef-parser.yy.h"
+
+/* Uncomment this to disable yyasserts */
+/* #define NDEBUG */
+
+#define ERR_LINE_CONTEXT 40
+
+%}
+
+%lex-param {void *scanner}
+%parse-param {void *scanner}
+%parse-param {Context *c}
+
+/* Uncomment this for better errors in recent bison versions */
+/* %define parse.error detailed */
+%define parse.error verbose
+%define parse.lac full
+%define api.pure full
+
+%locations
+
+%union {
+char *string;
+HexValue rvalue;
+HexSat sat;
+HexCast cast;
+HexExtract extract;
+HexMpy mpy;
+bool is_unsigned;
+int index;
+}
+
+/* Tokens */
+%start input
+
+%expect 1
+
+%token INAME DREG DIMM DPRE DEA RREG WREG FREG FIMM RPRE WPRE FPRE FWRAP FEA
+%token VAR LBR RBR LPAR RPAR LSQ RSQ SEMI COLON PLUS MINUS MUL POW DIV MOD ABS
+%token CROUND ROUND CIRCADD COUNTONES AND OR XOR NOT ASSIGN INC DEC ANDA ORA
+%token XORA PLUSPLUS LT GT ASL ASR LSR EQ NEQ LTE GTE MIN MAX ANDL ORL NOTL
+%token COMMA FOR ICIRC IF MUN FSCR FCHK SXT ZXT NEW CONSTEXT LOCNT BREV SIGN
+%token LOAD STORE CONSTLL CONSTULL PC NPC LPCFG CANC QMARK IDENTITY PART1
+%token BREV_4 BREV_8 ROTL INSBITS SETBITS EXTBITS EXTRANGE CAST4_8U SETOVF FAIL
+%token DEINTERLEAVE INTERLEAVE
+
+%token  REG IMM PRE
+%token  ELSE
+%token  MPY
+%token  SAT
+%token  CAST DEPOSIT SETHALF
+%token  EXTRACT
+%type  INAME
+%type  rvalue lvalue VAR assign_statement pre
+%type  DREG DIMM DPRE RREG RPRE FAIL
+%type  if_stmt IF
+%type  SIGN
+
+/* Operator Precedences */
+%left MIN MAX
+%left LPAR
+%left COMMA
+%left ASSIGN
+%right CIRCADD
+%right INC DEC ANDA ORA XORA
+%left QMARK COLON
+%left ORL
+%left ANDL
+%left OR
+%left XOR ANDOR
+%left AND
+%left EQ NEQ
+%left LT GT LTE GTE
+%left ASL ASR LSR
+%right ABS
+%left MINUS PLUS
+%left POW
+%left MUL DIV MOD MPY
+%right NOT NOTL
+%left LSQ
+%left NEW
+%right CAST
+%right LOCNT BREV
+
+/* Bison Grammar */
+%%
+
+/* Input file containing the description of each hexagon instruction */
+input : instructions
+{
+YYACCEPT;
+}
+;
+
+instructions : instruction instructions
+| %empty
+;
+
+instruction : INAME
+{
+/* Early-free if the parser failed on the previous instruction */
+free_instruction(c);
+
+c->total_insn++;
+c->inst.name = $1;
+emit_header(c);
+}
+arguments
+{
+EMIT_SIG(c, ")");
+EMIT_HEAD(c, "{\n");
+
+/* Initialize declared but uninitialized registers, but only for */
+/* non-conditional instructions */
+for (int i = 0; i < c->inst.init_count; i++) {
+bool is64 = c->inst.init_list[i].bit_width == 64;
+const char *type = is64 ? "i64" : "i32";
+if (c->inst.init_list[i].type == REGISTER) {
+OUT(c,