Re: [PATCH 06/15] arc: TCG instruction definitions

2021-01-15 Thread Cupertino Miranda
>> +
>> +assert(ctx->insn.limm_p == 0 && !in_delay_slot);
>> +
>> +if (ctx->insn.limm_p == 0 && !in_delay_slot) {
>> +in_delay_slot = true;
>> +uint32_t cpc = ctx->cpc;
>> +uint32_t pcl = ctx->pcl;
>> +insn_t insn = ctx->insn;
>> +
>> +ctx->cpc = ctx->npc;
>> +ctx->pcl = ctx->cpc & 0xfffc;
>> +
>> +++ctx->ds;
>> +
>> +TCGLabel *do_not_set_bta_and_de = gen_new_label();
>> +tcg_gen_brcondi_i32(TCG_COND_NE, take_branch, 1, 
>> do_not_set_bta_and_de);
>> +/*
>> + * In case an exception should be raised during the execution
>> + * of delay slot, bta value is used to set erbta.
>> + */
>> +tcg_gen_mov_tl(cpu_bta, bta);
>> +/* We are in a delay slot */
>> +tcg_gen_mov_tl(cpu_DEf, take_branch);
>> +gen_set_label(do_not_set_bta_and_de);
>> +
>> +tcg_gen_movi_tl(cpu_is_delay_slot_instruction, 1);
>> +
>> +/* Set the pc to the next pc */
>> +tcg_gen_movi_tl(cpu_pc, ctx->npc);
>> +/* Necessary for the likely call to restore_state_to_opc() */
>> +tcg_gen_insn_start(ctx->npc);
> 
> Wow, this looks wrong.
> 
> It doesn't work with icount or single-stepping.  You need to be able to begin 
> a
> TB at any instruction, including a delay slot.
> 
> You should have a look at some of the other targets that handle this, e.g.
> openrisc or sparc.  Since you've got NPC already, for looping, sparc is
> probably the better match.
> 
> There should be no reason why you'd need to emit insn_start outside of
> arc_tr_insn_start.
> 

We are also able to start TB at any instruction, that is not what is 
being validated here. It is just verifying if a delayslot instruction 
does not have the delayslot flag set, and that the delayslot instruction 
does not have a limm.

The way we encode delayslot instructions is a little peculiar. When some 
instruction defines a delayslot, it calls this function and the 
instruction is decoded inline.
As far as I remember the change of instruction context, with 
tcg_gen_insns_start, allowed us to properly make gdb jump from branch 
instruction to delay slot instruction and then back.

The asser was used to guarantee that those conditions where never broken 
internally. The condition becomes irrelevant.


>> + ***
>> + * Statically inferred return function *
>> + ***
>> + */
>> +
>> +TCGv arc_gen_next_reg(const DisasCtxt *ctx, TCGv reg)
>> +{
>> +int i;
>> +for (i = 0; i < 64; i += 2) {
>> +if (reg == cpu_r[i]) {
>> +return cpu_r[i + 1];
>> +}
>> +}
>> +/* Check if REG is an odd register. */
>> +for (i = 1; i < 64; i += 2) {
>> +/* If so, that is unsanctioned. */
>> +if (reg == cpu_r[i]) {
>> +arc_gen_excp(ctx, EXCP_INST_ERROR, 0, 0);
>> +return NULL;
>> +}
>> +}
> 
> Wow, really?  Can't you grab this directly from the operand decoding?  Where
> surely we have already ensured that the relevant regnos are not odd.

Indeed, we can grab it from operand decoding. Please allow us to do this 
change once we rework binutils code.


Re: [PATCH 06/15] arc: TCG instruction definitions

2020-12-03 Thread Cupertino Miranda
Seems perfectly doable, no objections.
It will probably take me longer to integrate it in the build system
then to get the scripts ready.
I will start by placing the ruby tool and documentation there, and
later on, integrate it in the build system.

Hope that you get re-motivated to review our patches. No pressure though ;-)
Very valuable comments, lots of improvements happening here.

On Thu, Dec 3, 2020 at 7:34 PM Richard Henderson
 wrote:
>
> On 12/3/20 10:54 AM, Cupertino Miranda wrote:
> > Our generation tool has different levels of verbosity, expressing
> > instruction semantics from a pattern level up to what it is shown in
> >  as comments, which is later converted to TCG format.
> > For QEMU purposes I would say that input format should be what is
> > shown as comments in  file.
>
> That seems reasonable.
>
> > Also, as is, the generator is done in Ruby, and to be honest, would
> > not be very easy to redo in some other language. Would this be
> > considered a problem if we would include it as Ruby code ?
> > IMO execution of these scripts should not be a step of build process
> > but rather a development one, such that Ruby does not become a
> > requirement to build QEmu.
>
> It's not ideal -- I would have preferred python or C -- but I won't object.
>
> At minimum, I would expect build system changes that detects ruby support in
> the system, and a manual build rule that rebuilds the generated files.  This
> build + check-in process would want documenting in target/arc/README or
> something.  If there are any ruby packages required apart from the base
> language, this should be documented as well (I know nothing about ruby myself,
> just guessing based on what happens with python and perl).
>
> Even better would be build system changes that, if ruby is installed runs the
> generator, and only fall-back to the checked-in files if ruby is missing.
>
> In this way, anyone who wants to modify the code generator would merely have 
> to
> install the ruby packages on their system, but they would not be required for 
> a
> non-ARC developer to build.
>
>
> r~



Re: [PATCH 06/15] arc: TCG instruction definitions

2020-12-03 Thread Richard Henderson
On 12/3/20 10:54 AM, Cupertino Miranda wrote:
> Our generation tool has different levels of verbosity, expressing
> instruction semantics from a pattern level up to what it is shown in
>  as comments, which is later converted to TCG format.
> For QEMU purposes I would say that input format should be what is
> shown as comments in  file.

That seems reasonable.

> Also, as is, the generator is done in Ruby, and to be honest, would
> not be very easy to redo in some other language. Would this be
> considered a problem if we would include it as Ruby code ?
> IMO execution of these scripts should not be a step of build process
> but rather a development one, such that Ruby does not become a
> requirement to build QEmu.

It's not ideal -- I would have preferred python or C -- but I won't object.

At minimum, I would expect build system changes that detects ruby support in
the system, and a manual build rule that rebuilds the generated files.  This
build + check-in process would want documenting in target/arc/README or
something.  If there are any ruby packages required apart from the base
language, this should be documented as well (I know nothing about ruby myself,
just guessing based on what happens with python and perl).

Even better would be build system changes that, if ruby is installed runs the
generator, and only fall-back to the checked-in files if ruby is missing.

In this way, anyone who wants to modify the code generator would merely have to
install the ruby packages on their system, but they would not be required for a
non-ARC developer to build.


r~



Re: [PATCH 06/15] arc: TCG instruction definitions

2020-12-03 Thread Cupertino Miranda
Hi Richard,

Thanks for your quick reply, and again, thanks for the reviews.
We have already started to make very significant changes to the port
based on your comments. ;-)

Our generation tool has different levels of verbosity, expressing
instruction semantics from a pattern level up to what it is shown in
 as comments, which is later converted to TCG format.
For QEMU purposes I would say that input format should be what is
shown as comments in  file.

I believe that I can in relative short time isolate the TCG generator
and prepare a tool to process those comments and update the respective
TCG definitions.
Also, as is, the generator is done in Ruby, and to be honest, would
not be very easy to redo in some other language. Would this be
considered a problem if we would include it as Ruby code ?
IMO execution of these scripts should not be a step of build process
but rather a development one, such that Ruby does not become a
requirement to build QEmu.

Thanks,
Cupertino


On Thu, Dec 3, 2020 at 4:08 PM Richard Henderson
 wrote:
>
> On 12/2/20 6:55 AM, Cupertino Miranda wrote:
> > I totally understand your concerns with generated code.
> >
> > To explain our decision, we have an internal database that we are able
> > to describe the architecture and map encoding with hw semantics, and
> > for the sake of saving us debug time generating each and every
> > instruction, we use it to generate both the TCG and instruction
> > decoding tables that you have already reviewed.
> > This tool is not only used in QEmu but through all our tools code,
> > allowing us to cross validate the content of the database.
> >
> > Considering our situation and current state of the port, what would
> > you think is a reasonable compromise?
>
> In some respects you're in the same situation as the hexagon target that's
> currently in flight on the list -- both of you are wanting to generate tcg 
> from
> a company-specific canonical source.
>
> In the case of hexagon, the target/hexagon/imported/ subdirectory contains a
> bunch of stuff exported from Qualcomm's specification.  It's not fantastically
> readable, but it's not bad.  These files are then massaged in various ways to
> produce either (1) out-of-line helpers or (2) inline tcg stuff.
>
> Without knowing what form the Synopsys database takes, how easy would it be to
> export something mostly human-readable, which could then be processed by a 
> tool
> that is included in the qemu source?
>
> Future qemu maintainence is thus on the tool, and not on the auto-generated
> code.  There's also clear separation on what needs tcg review and what's 
> simply
> a spec update.
>
>
> r~



Re: [PATCH 06/15] arc: TCG instruction definitions

2020-12-03 Thread Richard Henderson
On 12/2/20 6:55 AM, Cupertino Miranda wrote:
> I totally understand your concerns with generated code.
> 
> To explain our decision, we have an internal database that we are able
> to describe the architecture and map encoding with hw semantics, and
> for the sake of saving us debug time generating each and every
> instruction, we use it to generate both the TCG and instruction
> decoding tables that you have already reviewed.
> This tool is not only used in QEmu but through all our tools code,
> allowing us to cross validate the content of the database.
> 
> Considering our situation and current state of the port, what would
> you think is a reasonable compromise?

In some respects you're in the same situation as the hexagon target that's
currently in flight on the list -- both of you are wanting to generate tcg from
a company-specific canonical source.

In the case of hexagon, the target/hexagon/imported/ subdirectory contains a
bunch of stuff exported from Qualcomm's specification.  It's not fantastically
readable, but it's not bad.  These files are then massaged in various ways to
produce either (1) out-of-line helpers or (2) inline tcg stuff.

Without knowing what form the Synopsys database takes, how easy would it be to
export something mostly human-readable, which could then be processed by a tool
that is included in the qemu source?

Future qemu maintainence is thus on the tool, and not on the auto-generated
code.  There's also clear separation on what needs tcg review and what's simply
a spec update.


r~



Re: [PATCH 06/15] arc: TCG instruction definitions

2020-12-02 Thread Cupertino Miranda
Hi Richard,

Thank you so much for your reviews.
I will start working on improving the code straight away to try to
minimize waiting times.

However lets just address the elephant in the room.
Indeed we have the TCG definitions being generated.
However we are very serious about wanting to upstream our port and we
will certainly maintain this code.

I totally understand your concerns with generated code.

To explain our decision, we have an internal database that we are able
to describe the architecture and map encoding with hw semantics, and
for the sake of saving us debug time generating each and every
instruction, we use it to generate both the TCG and instruction
decoding tables that you have already reviewed.
This tool is not only used in QEmu but through all our tools code,
allowing us to cross validate the content of the database.

Considering our situation and current state of the port, what would
you think is a reasonable compromise?

Once again thanks for the reviews.

Best regards,
Cupertino


On Tue, Dec 1, 2020 at 11:09 PM Richard Henderson
 wrote:
>
> On 11/11/20 10:17 AM, cupertinomira...@gmail.com wrote:
> > +case 0x09:
> > +/* (N & V & !Z) | (!N & !V & !Z) */
>
> This is xnor(N, V) & !Z, and since as you now know xnor = eqv, you can perform
> this in just two steps.
>
> tcg_gen_eqv_tl(ret, cpu_Nf, cpu_Vf);
> tcg_gen_andc_tl(ret, ret, cpu_Zf);
>
> > +/* GE */
> > +case 0x0A:
> > +/* (N & V) | (!N & !V) */
>
> xnor(N, V)
>
> > +/* LT */
> > +case 0x0B:
> > +/* (N & !V) | (!N & V) */
>
> xor(N, V)
>
> > +/* LE */
> > +case 0x0C:
> > +/* Z | (N & !V) | (!N & V) */
>
> xor(N, V) | Z
>
> > +/* HI */
> > +case 0x0D:
> > +/* !C & !Z */
> > +tcg_gen_xori_tl(nC, cpu_Cf, 1);
> > +tcg_gen_xori_tl(nZ, cpu_Zf, 1);
> > +
> > +tcg_gen_and_tl(ret, nC, nZ);
>
> !A & !B -> !(A | B), so one fewer xor.
>
> > +/* PNZ */
> > +case 0x0F:
> > +/* !N & !Z */
>
> Likewise.
>
> > +void arc_gen_set_memory(DisasCtxt *ctx, TCGv vaddr, int size,
> > +TCGv src, bool sign_extend)
> > +{
> > +switch (size) {
> > +case 0x00:
> > +tcg_gen_qemu_st_tl(src, vaddr, MEMIDX, MO_UL);
> > +break;
>
> Surely you'd put all of this size+sign-extend mapping into a table and issue
> just one tcg_gen_qemu_st_tl.
>
> > +void arc_gen_get_memory(DisasCtxt *ctx, TCGv dest, TCGv vaddr,
> > +int size, bool sign_extend)
> > +{
>
> And re-use the same table here, it would appear.
>
> > +void arc_gen_no_further_loads_pending(DisasCtxt *ctx, TCGv ret)
> > +{
> > +tcg_gen_movi_tl(ret, 1);
> > +}
>
> Huh?  A missing TODO, perhaps?
>
> > +extern bool enabled_interrupts;
>
> Race condition.  What is a global variable doing being set by a translation 
> thread?
>
>
> > +void
> > +arc_gen_execute_delayslot(DisasCtxt *ctx, TCGv bta, TCGv take_branch)
> > +{
> > +static int in_delay_slot = false;
>
> And another one.  Surely these belong in DisasContext.
>
> > +
> > +assert(ctx->insn.limm_p == 0 && !in_delay_slot);
> > +
> > +if (ctx->insn.limm_p == 0 && !in_delay_slot) {
> > +in_delay_slot = true;
> > +uint32_t cpc = ctx->cpc;
> > +uint32_t pcl = ctx->pcl;
> > +insn_t insn = ctx->insn;
> > +
> > +ctx->cpc = ctx->npc;
> > +ctx->pcl = ctx->cpc & 0xfffc;
> > +
> > +++ctx->ds;
> > +
> > +TCGLabel *do_not_set_bta_and_de = gen_new_label();
> > +tcg_gen_brcondi_i32(TCG_COND_NE, take_branch, 1, 
> > do_not_set_bta_and_de);
> > +/*
> > + * In case an exception should be raised during the execution
> > + * of delay slot, bta value is used to set erbta.
> > + */
> > +tcg_gen_mov_tl(cpu_bta, bta);
> > +/* We are in a delay slot */
> > +tcg_gen_mov_tl(cpu_DEf, take_branch);
> > +gen_set_label(do_not_set_bta_and_de);
> > +
> > +tcg_gen_movi_tl(cpu_is_delay_slot_instruction, 1);
> > +
> > +/* Set the pc to the next pc */
> > +tcg_gen_movi_tl(cpu_pc, ctx->npc);
> > +/* Necessary for the likely call to restore_state_to_opc() */
> > +tcg_gen_insn_start(ctx->npc);
>
> Wow, this looks wrong.
>
> It doesn't work with icount or single-stepping.  You need to be able to begin 
> a
> TB at any instruction, including a delay slot.
>
> You should have a look at some of the other targets that handle this, e.g.
> openrisc or sparc.  Since you've got NPC already, for looping, sparc is
> probably the better match.
>
> There should be no reason why you'd need to emit insn_start outside of
> arc_tr_insn_start.
>
> > +void arc_gen_set_register(enum arc_registers reg, TCGv value)
> > +{
> > +switch (reg) {
> > +case R_SP:
> > +tcg_gen_mov_i32(cpu_sp, value);
> > +break;
> > +case R_STATUS32:
> > +gen_helper_set_status32(cpu_env, value);
>
> Lots of changes to status imply a significant change to process

Re: [PATCH 06/15] arc: TCG instruction definitions

2020-12-01 Thread Richard Henderson
On 11/11/20 10:17 AM, cupertinomira...@gmail.com wrote:
> +case 0x09:
> +/* (N & V & !Z) | (!N & !V & !Z) */

This is xnor(N, V) & !Z, and since as you now know xnor = eqv, you can perform
this in just two steps.

tcg_gen_eqv_tl(ret, cpu_Nf, cpu_Vf);
tcg_gen_andc_tl(ret, ret, cpu_Zf);

> +/* GE */
> +case 0x0A:
> +/* (N & V) | (!N & !V) */

xnor(N, V)

> +/* LT */
> +case 0x0B:
> +/* (N & !V) | (!N & V) */

xor(N, V)

> +/* LE */
> +case 0x0C:
> +/* Z | (N & !V) | (!N & V) */

xor(N, V) | Z

> +/* HI */
> +case 0x0D:
> +/* !C & !Z */
> +tcg_gen_xori_tl(nC, cpu_Cf, 1);
> +tcg_gen_xori_tl(nZ, cpu_Zf, 1);
> +
> +tcg_gen_and_tl(ret, nC, nZ);

!A & !B -> !(A | B), so one fewer xor.

> +/* PNZ */
> +case 0x0F:
> +/* !N & !Z */

Likewise.

> +void arc_gen_set_memory(DisasCtxt *ctx, TCGv vaddr, int size,
> +TCGv src, bool sign_extend)
> +{
> +switch (size) {
> +case 0x00:
> +tcg_gen_qemu_st_tl(src, vaddr, MEMIDX, MO_UL);
> +break;

Surely you'd put all of this size+sign-extend mapping into a table and issue
just one tcg_gen_qemu_st_tl.

> +void arc_gen_get_memory(DisasCtxt *ctx, TCGv dest, TCGv vaddr,
> +int size, bool sign_extend)
> +{

And re-use the same table here, it would appear.

> +void arc_gen_no_further_loads_pending(DisasCtxt *ctx, TCGv ret)
> +{
> +tcg_gen_movi_tl(ret, 1);
> +}

Huh?  A missing TODO, perhaps?

> +extern bool enabled_interrupts;

Race condition.  What is a global variable doing being set by a translation 
thread?


> +void
> +arc_gen_execute_delayslot(DisasCtxt *ctx, TCGv bta, TCGv take_branch)
> +{
> +static int in_delay_slot = false;

And another one.  Surely these belong in DisasContext.

> +
> +assert(ctx->insn.limm_p == 0 && !in_delay_slot);
> +
> +if (ctx->insn.limm_p == 0 && !in_delay_slot) {
> +in_delay_slot = true;
> +uint32_t cpc = ctx->cpc;
> +uint32_t pcl = ctx->pcl;
> +insn_t insn = ctx->insn;
> +
> +ctx->cpc = ctx->npc;
> +ctx->pcl = ctx->cpc & 0xfffc;
> +
> +++ctx->ds;
> +
> +TCGLabel *do_not_set_bta_and_de = gen_new_label();
> +tcg_gen_brcondi_i32(TCG_COND_NE, take_branch, 1, 
> do_not_set_bta_and_de);
> +/*
> + * In case an exception should be raised during the execution
> + * of delay slot, bta value is used to set erbta.
> + */
> +tcg_gen_mov_tl(cpu_bta, bta);
> +/* We are in a delay slot */
> +tcg_gen_mov_tl(cpu_DEf, take_branch);
> +gen_set_label(do_not_set_bta_and_de);
> +
> +tcg_gen_movi_tl(cpu_is_delay_slot_instruction, 1);
> +
> +/* Set the pc to the next pc */
> +tcg_gen_movi_tl(cpu_pc, ctx->npc);
> +/* Necessary for the likely call to restore_state_to_opc() */
> +tcg_gen_insn_start(ctx->npc);

Wow, this looks wrong.

It doesn't work with icount or single-stepping.  You need to be able to begin a
TB at any instruction, including a delay slot.

You should have a look at some of the other targets that handle this, e.g.
openrisc or sparc.  Since you've got NPC already, for looping, sparc is
probably the better match.

There should be no reason why you'd need to emit insn_start outside of
arc_tr_insn_start.

> +void arc_gen_set_register(enum arc_registers reg, TCGv value)
> +{
> +switch (reg) {
> +case R_SP:
> +tcg_gen_mov_i32(cpu_sp, value);
> +break;
> +case R_STATUS32:
> +gen_helper_set_status32(cpu_env, value);

Lots of changes to status imply a significant change to processor state, and so
you need to exit to the main loop.

> +/* TODO: Get this from props ... */
> +void arc_has_interrupts(DisasCtxt *ctx, TCGv ret)
> +{
> +tcg_gen_movi_i32(ret, 1);
> +}
> +
> +/*
> + ***
> + * Statically inferred return function *
> + ***
> + */
> +
> +TCGv arc_gen_next_reg(const DisasCtxt *ctx, TCGv reg)
> +{
> +int i;
> +for (i = 0; i < 64; i += 2) {
> +if (reg == cpu_r[i]) {
> +return cpu_r[i + 1];
> +}
> +}
> +/* Check if REG is an odd register. */
> +for (i = 1; i < 64; i += 2) {
> +/* If so, that is unsanctioned. */
> +if (reg == cpu_r[i]) {
> +arc_gen_excp(ctx, EXCP_INST_ERROR, 0, 0);
> +return NULL;
> +}
> +}

Wow, really?  Can't you grab this directly from the operand decoding?  Where
surely we have already ensured that the relevant regnos are not odd.

> +/* REG was not a register after all. */
> +g_assert_not_reached();
> +
> +/* We never get here, but to accommodate -Werror ... */
> +return NULL;

The g_assert_not reached should have been sufficient.

> @@ -0,0 +1,280 @@
> +/*
> + * QEMU ARC CPU
> + *
> + * Copyright (c) 2020 Synppsys Inc.
> + * Contributed by Cupertino Miranda 
> + *
> +