Re: [PATCH] [aarch64] Revert support for ARMv8.2 in tsv110

2018-12-19 Thread Zhangshaokun
Hi Richard,

On 2018/12/19 18:12, Richard Earnshaw (lists) wrote:
> On 19/12/2018 03:11, Shaokun Zhang wrote:
>> HiSilicon's tsv110 CPU core supports some v8_4A features, but some
>> mandatory features are not implemented. Revert to ARMv8.2, for which
>> all mandatory features are supported.
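
The reasoning in the description can be sketched as a small check: a core may only claim an architecture level whose mandatory feature set it fully implements. Every name below (the FL_* bits, mandatory_for_minor, covers_level, highest_arch_minor) is hypothetical and chosen for illustration; GCC's real AARCH64_FL_* definitions are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only.  */
#define FL_BASE    (UINT32_C (1) << 0)  /* baseline ARMv8-A */
#define FL_V8_1    (UINT32_C (1) << 1)  /* mandatory from v8.1 */
#define FL_V8_2    (UINT32_C (1) << 2)  /* mandatory from v8.2 */
#define FL_V8_3    (UINT32_C (1) << 3)  /* mandatory from v8.3 */
#define FL_V8_4_X  (UINT32_C (1) << 4)  /* mandatory from v8.4 */
#define FL_V8_4_Y  (UINT32_C (1) << 5)  /* mandatory from v8.4 */

#define NUM_LEVELS 5

/* Mandatory set per minor version; each level includes the ones below.  */
static const uint32_t mandatory_for_minor[NUM_LEVELS] = {
  FL_BASE,
  FL_BASE | FL_V8_1,
  FL_BASE | FL_V8_1 | FL_V8_2,
  FL_BASE | FL_V8_1 | FL_V8_2 | FL_V8_3,
  FL_BASE | FL_V8_1 | FL_V8_2 | FL_V8_3 | FL_V8_4_X | FL_V8_4_Y
};

/* Nonzero if FEATURES covers every mandatory bit of ARMv8.MINOR.  */
int covers_level (int minor, uint32_t features)
{
  assert (minor >= 0 && minor < NUM_LEVELS);
  return (features & mandatory_for_minor[minor]) == mandatory_for_minor[minor];
}

/* Highest minor version whose mandatory set is fully implemented,
   or -1 if not even the baseline is covered.  */
int highest_arch_minor (uint32_t features)
{
  for (int minor = NUM_LEVELS - 1; minor >= 0; minor--)
    if (covers_level (minor, features))
      return minor;
  return -1;
}
```

In this model a core with FL_BASE | FL_V8_1 | FL_V8_2 | FL_V8_4_X has one v8.4 feature but still resolves to level 2, which mirrors moving tsv110 from 8_4A to 8_2A.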
>>
> 
> Thanks, I've put this in.
> 

Thanks.

> I've modified the ChangeLog entry slightly - we normally use 'revert' in
> the specific sense of completely removing an existing patch.
> 

I have checked the modified ChangeLog and it is precise. Thanks for the
further explanation of 'revert'; got it.

> Also, when sending patches, please do not send ChangeLog entries as part
> of the patch file.  Because the file is always updated at the head, the
> patch hunk is rarely going to apply cleanly.  Instead, include the
> ChangeLog text as part of your email description; that way we can then

Sure, I will follow that. At first I also wondered how a patch could be
applied directly if everyone updates the ChangeLog when upstreaming and the
ChangeLog file then conflicts. Your detailed description made it clear.

Thanks,
Shaokun

> paste it directly into the ChangeLog file itself and simply correct the
> date.
> 
> R.
> 
>> ---
>>  gcc/ChangeLog                        | 5 +
>>  gcc/config/aarch64/aarch64-cores.def | 6 +++---
>>  2 files changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index e9f5baa6557c..842876b0ae90 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,8 @@
>> +2018-12-19 Shaokun Zhang  
>> +
>> +* config/aarch64/aarch64-cores.def (tsv110) : Revert support for ARMv8.2
>> +in tsv110.
>> +
>>  2018-12-18  Vladimir Makarov  
>>  
>>  PR rtl-optimization/87759
>> diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
>> index 74be5dbf2595..20f4924e084d 100644
>> --- a/gcc/config/aarch64/aarch64-cores.def
>> +++ b/gcc/config/aarch64/aarch64-cores.def
>> @@ -96,10 +96,10 @@ AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2
>>  AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD, cortexa72, 0x41, 0xd0b, -1)
>>  AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, cortexa72, 0x41, 0xd0c, -1)
>>  
>> -/* ARMv8.4-A Architecture Processors.  */
>> -
>>  /* HiSilicon ('H') cores. */
>> -AARCH64_CORE("tsv110", tsv110,cortexa57,8_4A, AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
>> +AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
>> +
>> +/* ARMv8.4-A Architecture Processors.  */
>>  
>>  /* Qualcomm ('Q') cores. */
>>  AARCH64_CORE("saphira", saphira,saphira,8_4A,  AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 0xC01, -1)
>>



Re: [PATCH v4] [aarch64] Add HiSilicon tsv110 CPU support

2018-09-26 Thread Zhangshaokun
Hi Kyrill,

Thanks for your reply.

On 2018/9/26 19:20, Kyrill Tkachov wrote:
> Hi Shaokun,
> 
> On 25/09/18 14:40, Zhangshaokun wrote:
>> Hi ARM maintainers,
>>
>> Is there any plan to support CTR_EL0.DIC and CTR_EL0.IDC in GCC?
>> I saw that they are already supported in Linux mainline (on Mar 7).
>> Patch link:
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2018-March/565090.html
>> Kernel link:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/cpufeature.c?h=v4.19-rc5
>>  +205
> 
> Do you mean implementing the data cache clearing elision in __clear_cache as 
> discussed in June [1]?

Yes.

> I am not aware of any plans to implement that support yet as we'd need 
> hardware to test this properly on.
> 

Okay, got it.

> If you can implement and test it and post it to the list I'm sure the 
> maintainers would be happy to review such patches though.
> 

Sure, I will double-check and test it again on a HiSilicon platform.

Thanks,
Shaokun

> Thanks,
> Kyrill
> 
> [1] https://gcc.gnu.org/ml/gcc-patches/2018-06/msg00307.html
>> Thanks,
>> Shaokun
>>
>> On 2018/9/20 22:22, James Greenhalgh wrote:
>>> On Wed, Sep 19, 2018 at 04:53:52AM -0500, Shaokun Zhang wrote:
>>>> This patch adds an mcpu for HiSilicon's tsv110, which supports v8_4A.
>>>> It has been tested on aarch64 with no regressions.
>>> This patch is OK for Trunk.
>>>
>>> Do you need someone to commit it on your behalf?
>>>
>>> Thanks,
>>> James
>>>
>>>> ---
>>>>   gcc/ChangeLog                            |   9 +++
>>>>   gcc/config/aarch64/aarch64-cores.def     |   3 +
>>>>   gcc/config/aarch64/aarch64-cost-tables.h | 104 +++
>>>>   gcc/config/aarch64/aarch64-tune.md       |   2 +-
>>>>   gcc/config/aarch64/aarch64.c             |  82
>>>>   gcc/doc/invoke.texi                      |   2 +-
>>>>   6 files changed, 200 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>>>> index 69e2e14..a040daa 100644
>>>> --- a/gcc/ChangeLog
>>>> +++ b/gcc/ChangeLog
>>>> @@ -1,3 +1,12 @@
>>>> +2018-09-19  Shaokun Zhang  
>>>> +Bo Zhou  
>>>> +
>>>> +* config/aarch64/aarch64-cores.def (tsv110): New CPU.
>>>> +* config/aarch64/aarch64-tune.md: Regenerated.
>>>> +* doc/invoke.texi (AArch64 Options/-mtune): Add "tsv110".
>>>> +* config/aarch64/aarch64.c (tsv110_tunings): New tuning table.
>>>> +* config/aarch64/aarch64-cost-tables.h: Add "tsv110" extra costs.
>>>> +
>>>>   2018-09-18  Marek Polacek  
>>>> P1064R0 - Allowing Virtual Function Calls in Constant Expressions



Re: [PATCH v4] [aarch64] Add HiSilicon tsv110 CPU support

2018-09-25 Thread Zhangshaokun
Hi ARM maintainers,

Is there any plan to support CTR_EL0.DIC and CTR_EL0.IDC in GCC?
I saw that they are already supported in Linux mainline (on Mar 7).
Patch link:
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-March/565090.html
Kernel link:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/kernel/cpufeature.c?h=v4.19-rc5
 +205

Thanks,
Shaokun

On 2018/9/20 22:22, James Greenhalgh wrote:
> On Wed, Sep 19, 2018 at 04:53:52AM -0500, Shaokun Zhang wrote:
>> This patch adds an mcpu for HiSilicon's tsv110, which supports v8_4A.
>> It has been tested on aarch64 with no regressions.
> 
> This patch is OK for Trunk.
> 
> Do you need someone to commit it on your behalf?
> 
> Thanks,
> James
> 
>>
>> ---
>>  gcc/ChangeLog                            |   9 +++
>>  gcc/config/aarch64/aarch64-cores.def     |   3 +
>>  gcc/config/aarch64/aarch64-cost-tables.h | 104 +++
>>  gcc/config/aarch64/aarch64-tune.md       |   2 +-
>>  gcc/config/aarch64/aarch64.c             |  82
>>  gcc/doc/invoke.texi                      |   2 +-
>>  6 files changed, 200 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index 69e2e14..a040daa 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,12 @@
>> +2018-09-19  Shaokun Zhang  
>> +Bo Zhou  
>> +
>> +* config/aarch64/aarch64-cores.def (tsv110): New CPU.
>> +* config/aarch64/aarch64-tune.md: Regenerated.
>> +* doc/invoke.texi (AArch64 Options/-mtune): Add "tsv110".
>> +* config/aarch64/aarch64.c (tsv110_tunings): New tuning table.
>> +* config/aarch64/aarch64-cost-tables.h: Add "tsv110" extra costs.
>> +
>>  2018-09-18  Marek Polacek  
>>  
>>  P1064R0 - Allowing Virtual Function Calls in Constant Expressions



Re: [PATCH v4] [aarch64] Add HiSilicon tsv110 CPU support

2018-09-21 Thread Zhangshaokun
Hi Kyrill,

On 2018/9/21 20:25, Kyrill Tkachov wrote:
> Hi Shaokun,
> 
> On 20/09/18 15:54, Zhangshaokun wrote:
>> Hi James,
>>
>> On 2018/9/20 22:22, James Greenhalgh wrote:
>>> On Wed, Sep 19, 2018 at 04:53:52AM -0500, Shaokun Zhang wrote:
>>>> This patch adds an mcpu for HiSilicon's tsv110, which supports v8_4A.
>>>> It has been tested on aarch64 with no regressions.
>>> This patch is OK for Trunk.
>>>
>>> Do you need someone to commit it on your behalf?
>>>
>> Yes, please; that would be great.
> 
> I've committed this on your behalf with revision 264470.
> Thank you for your patience and persistence.
> 

That is great. I appreciate the professional comments and guidance from you
and the other maintainers; it is really kind and helpful.

Thanks,
Shaokun

> Kyrill
> 
>> Thanks in advance,
>> Shaokun
>>
>>> Thanks,
>>> James
>>>
>>>> ---
>>>>   gcc/ChangeLog                            |   9 +++
>>>>   gcc/config/aarch64/aarch64-cores.def     |   3 +
>>>>   gcc/config/aarch64/aarch64-cost-tables.h | 104 +++
>>>>   gcc/config/aarch64/aarch64-tune.md       |   2 +-
>>>>   gcc/config/aarch64/aarch64.c             |  82
>>>>   gcc/doc/invoke.texi                      |   2 +-
>>>>   6 files changed, 200 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>>>> index 69e2e14..a040daa 100644
>>>> --- a/gcc/ChangeLog
>>>> +++ b/gcc/ChangeLog
>>>> @@ -1,3 +1,12 @@
>>>> +2018-09-19  Shaokun Zhang  
>>>> +Bo Zhou  
>>>> +
>>>> +* config/aarch64/aarch64-cores.def (tsv110): New CPU.
>>>> +* config/aarch64/aarch64-tune.md: Regenerated.
>>>> +* doc/invoke.texi (AArch64 Options/-mtune): Add "tsv110".
>>>> +* config/aarch64/aarch64.c (tsv110_tunings): New tuning table.
>>>> +* config/aarch64/aarch64-cost-tables.h: Add "tsv110" extra costs.
>>>> +
>>>>   2018-09-18  Marek Polacek  
>>>> P1064R0 - Allowing Virtual Function Calls in Constant Expressions



Re: [PATCH v4] [aarch64] Add HiSilicon tsv110 CPU support

2018-09-20 Thread Zhangshaokun
Hi James,

On 2018/9/20 22:22, James Greenhalgh wrote:
> On Wed, Sep 19, 2018 at 04:53:52AM -0500, Shaokun Zhang wrote:
>> This patch adds an mcpu for HiSilicon's tsv110, which supports v8_4A.
>> It has been tested on aarch64 with no regressions.
> 
> This patch is OK for Trunk.
> 
> Do you need someone to commit it on your behalf?
> 

Yes, please; that would be great.

Thanks in advance,
Shaokun

> Thanks,
> James
> 
>>
>> ---
>>  gcc/ChangeLog                            |   9 +++
>>  gcc/config/aarch64/aarch64-cores.def     |   3 +
>>  gcc/config/aarch64/aarch64-cost-tables.h | 104 +++
>>  gcc/config/aarch64/aarch64-tune.md       |   2 +-
>>  gcc/config/aarch64/aarch64.c             |  82
>>  gcc/doc/invoke.texi                      |   2 +-
>>  6 files changed, 200 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index 69e2e14..a040daa 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,12 @@
>> +2018-09-19  Shaokun Zhang  
>> +Bo Zhou  
>> +
>> +* config/aarch64/aarch64-cores.def (tsv110): New CPU.
>> +* config/aarch64/aarch64-tune.md: Regenerated.
>> +* doc/invoke.texi (AArch64 Options/-mtune): Add "tsv110".
>> +* config/aarch64/aarch64.c (tsv110_tunings): New tuning table.
>> +* config/aarch64/aarch64-cost-tables.h: Add "tsv110" extra costs.
>> +
>>  2018-09-18  Marek Polacek  
>>  
>>  P1064R0 - Allowing Virtual Function Calls in Constant Expressions



Re: [PATCH v3] [aarch64] Add HiSilicon tsv110 CPU support

2018-07-08 Thread Zhangshaokun
Hi maintainers,

A gentle ping.

Thanks,
Shaokun

On 2018/6/21 19:13, Shaokun Zhang wrote:
> This patch adds an mcpu for HiSilicon's tsv110, which supports v8_4A.
> It has been tested on aarch64 with no regressions.
> 
> ---
>  gcc/ChangeLog                            |   8 +++
>  gcc/config/aarch64/aarch64-cores.def     |   3 +
>  gcc/config/aarch64/aarch64-cost-tables.h | 103 +++
>  gcc/config/aarch64/aarch64-tune.md       |   2 +-
>  gcc/config/aarch64/aarch64.c             |  82
>  gcc/doc/invoke.texi                      |   2 +-
>  6 files changed, 198 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index d9fbc0c..f5538f7 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,11 @@
> +2018-06-21  Shaokun Zhang  
> +Bo Zhou  
> +   * config/aarch64/aarch64-cores.def (tsv110): New CPU.
> +   * config/aarch64/aarch64-tune.md: Regenerated.
> +   * doc/invoke.texi (AArch64 Options/-mtune): Add "tsv110".
> +   * config/aarch64/aarch64.c (tsv110_tunings): New tuning table.
> +   * config/aarch64/aarch64-cost-tables.h: Add "tsv110" extra costs.
> +
>  2018-06-21  Richard Biener  
>  
>   * tree-data-ref.c (dr_step_indicator): Handle NULL DR_STEP.
> diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
> index e64d831..e6ebf02 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -88,6 +88,9 @@ AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2
>  
>  /* ARMv8.4-A Architecture Processors.  */
>  
> +/* HiSilicon ('H') cores. */
> +AARCH64_CORE("tsv110", tsv110,cortexa57,8_4A, AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
> +
>  /* Qualcomm ('Q') cores. */
>  AARCH64_CORE("saphira", saphira,falkor,8_4A,  AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   0x51, 0xC01, -1)
>  
> diff --git a/gcc/config/aarch64/aarch64-cost-tables.h b/gcc/config/aarch64/aarch64-cost-tables.h
> index a455c62..44095ce 100644
> --- a/gcc/config/aarch64/aarch64-cost-tables.h
> +++ b/gcc/config/aarch64/aarch64-cost-tables.h
> @@ -334,4 +334,107 @@ const struct cpu_cost_table thunderx2t99_extra_costs =
>}
>  };
>  
> +const struct cpu_cost_table tsv110_extra_costs =
> +{
> +  /* ALU */
> +  {
> +0, /* arith.  */
> +0, /* logical.  */
> +0, /* shift.  */
> +0, /* shift_reg.  */
> +COSTS_N_INSNS (1), /* arith_shift.  */
> +COSTS_N_INSNS (1), /* arith_shift_reg.  */
> +COSTS_N_INSNS (1), /* log_shift.  */
> +COSTS_N_INSNS (1), /* log_shift_reg.  */
> +0, /* extend.  */
> +COSTS_N_INSNS (1), /* extend_arith.  */
> +0, /* bfi.  */
> +0, /* bfx.  */
> +0, /* clz.  */
> +0,  /* rev.  */
> +0, /* non_exec.  */
> +true   /* non_exec_costs_exec.  */
> +  },
> +  {
> +/* MULT SImode */
> +{
> +  COSTS_N_INSNS (2),   /* simple.  */
> +  COSTS_N_INSNS (2),   /* flag_setting.  */
> +  COSTS_N_INSNS (2),   /* extend.  */
> +  COSTS_N_INSNS (2),   /* add.  */
> +  COSTS_N_INSNS (2),   /* extend_add.  */
> +  COSTS_N_INSNS (11)   /* idiv.  */
> +},
> +/* MULT DImode */
> +{
> +  COSTS_N_INSNS (3),   /* simple.  */
> +  0,   /* flag_setting (N/A).  */
> +  COSTS_N_INSNS (3),   /* extend.  */
> +  COSTS_N_INSNS (3),   /* add.  */
> +  COSTS_N_INSNS (3),   /* extend_add.  */
> +  COSTS_N_INSNS (19)   /* idiv.  */
> +}
> +  },
> +  /* LD/ST */
> +  {
> +COSTS_N_INSNS (3), /* load.  */
> +COSTS_N_INSNS (4), /* load_sign_extend.  */
> +COSTS_N_INSNS (3), /* ldrd.  */
> +COSTS_N_INSNS (3), /* ldm_1st.  */
> +1, /* ldm_regs_per_insn_1st.  */
> +2, /* ldm_regs_per_insn_subsequent.  */
> +COSTS_N_INSNS (4), /* loadf.  */
> +COSTS_N_INSNS (4), /* loadd.  */
> +COSTS_N_INSNS (4), /* load_unaligned.  */
> +0, /* store.  */
> +0, /* strd.  */
> +0, /* stm_1st.  */
> +1, /* stm_regs_per_insn_1st.  */
> +2, /* stm_regs_per_insn_subsequent.  */
> +0, /* storef.  */
> +0, /* stored.  */
> +COSTS_N_INSNS (1), /* store_unaligned.  */
> +COSTS_N_INSNS (4), /* loadv.  */
> +COSTS_N_INSNS (4)  /* storev.  */
> +  },
> +  {
> +/* FP SFmode */
> +{
> +  

Re: [PATCH v2] [aarch64] Add HiSilicon tsv110 CPU support

2018-06-21 Thread Zhangshaokun
Hi Kyrill,

On 2018/6/21 20:56, Kyrill Tkachov wrote:
> Hi Shaokun,
> 
> On 21/06/18 12:07, Zhangshaokun wrote:
>> Hi Kyrill,
>>
>> It was the Dragon Boat Festival, a short holiday in China; sorry for the
>> late reply.
>>
>> On 2018/6/14 15:58, Kyrill Tkachov wrote:
>>> Hi Shaokun,
>>>
>>> On 14/06/18 02:09, Shaokun Zhang wrote:
>>>> This patch adds an mcpu for HiSilicon's tsv110, which supports v8_4A.
>>>>
>>>> ---
>>>>   gcc/ChangeLog                            |   8 +++
>>>>   gcc/config/aarch64/aarch64-cores.def     |   3 +
>>>>   gcc/config/aarch64/aarch64-cost-tables.h | 103 +++
>>>>   gcc/config/aarch64/aarch64-tune.md       |   2 +-
>>>>   gcc/config/aarch64/aarch64.c             |  80 +++-
>>>>   gcc/doc/invoke.texi                      |   2 +-
>>>>   6 files changed, 195 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>>>> index 9c90875..e376714 100644
>>>> --- a/gcc/ChangeLog
>>>> +++ b/gcc/ChangeLog
>>>> @@ -1,3 +1,11 @@
>>>> +2018-06-12  Shaokun Zhang  
>>>> +Bo Zhou  
>>>> +* config/aarch64/aarch64-cores.def (tsv110): New CPU.
>>>> +* config/aarch64/aarch64-tune.md: Regenerated.
>>>> +* doc/invoke.texi (AArch64 Options/-mtune): Add "tsv110".
>>>> +* config/aarch64/aarch64.c (tsv110_tunings): New tuning table.
>>>> +* config/aarch64/aarch64-cost-tables.h: Add "tsv110" extra costs.
>>>> +
>>> Can you confirm that you've run a bootstrap and test run with this patch
>>> to check there are no regressions?
>>>
>> I have tested this patch (with some typos fixed) on aarch64 and didn't get
>> any regressions.
>>
>> However, there is an issue on the master branch:
>> ../.././gcc/bitmap.c: In function ‘unsigned int 
>> bitmap_last_set_bit(const_bitmap)’:
>> ../.././gcc/bitmap.c:841:26: error: array subscript -1 is below array bounds 
>> of ‘const BITMAP_WORD [2]’ {aka ‘const long unsigned int [2]’} 
>> [-Werror=array-bounds]
>> word = elt->bits[ix];
>>^
>> cc1plus: all warnings being treated as errors
>> Makefile:1110: recipe for target 'bitmap.o' failed
>> make[3]: *** [bitmap.o] Error 1
> 
> I don't see that error with the current trunk based off r261832 (today).

I got it on a tree based on fa681b4 (also today).

> Can you make sure the bootstrap passes with your patch on top of the recent 
> trunk?

As for this patch, it was my mistake: there were some typos. I have fixed
them and sent patch v3; please review.

Thanks,
Shaokun

> 
> Thanks,
> Kyrill
> 
>> My gcc version is: gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609.
>> Would you be happy to fix it? I fixed it in my local tree, but I am not sure
>> the fix is OK.
>>
>>> This version looks good to me but you'll need final approval from the 
>>> maintainers.
>>>
>> I will update the patch against the latest branch code today.
>> Hopefully you and the maintainers will be happy with v3.
>>
>> Thanks,
>> Shaokun.
>>
>>> Thanks,
>>> Kyrill
>>>
>>>>2018-06-12  Eric Botcazou  
>>>>  * gcc.c: Document new %@{...} sequence.
>>>> diff --git a/gcc/config/aarch64/aarch64-cores.def 
>>>> b/gcc/config/aarch64/aarch64-cores.def
>>>> index e64d831..e6ebf02 100644
>>>> --- a/gcc/config/aarch64/aarch64-cores.def
>>>> +++ b/gcc/config/aarch64/aarch64-cores.def
>>>> @@ -88,6 +88,9 @@ AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  
>>>> AARCH64_FL_FOR_ARCH8_2
>>>>  /* ARMv8.4-A Architecture Processors.  */
>>>>+/* HiSilicon ('H') cores. */
>>>> +AARCH64_CORE("tsv110", tsv110,cortexa57,8_4A, 
>>>> AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | 
>>>> AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
>>>> +
>>>>/* Qualcomm ('Q') cores. */
>>>>AARCH64_CORE("saphira", saphira,falkor,8_4A,  
>>>> AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   
>>>> 0x51, 0xC01, -1)
>>>>diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
>>>> b/gcc/config/aarch64/aarch64-cost-tables.h

Re: [PATCH v2] [aarch64] Add HiSilicon tsv110 CPU support

2018-06-21 Thread Zhangshaokun
Hi Kyrill,

It was the Dragon Boat Festival, a short holiday in China; sorry for the
late reply.

On 2018/6/14 15:58, Kyrill Tkachov wrote:
> Hi Shaokun,
> 
> On 14/06/18 02:09, Shaokun Zhang wrote:
>> This patch adds an mcpu for HiSilicon's tsv110, which supports v8_4A.
>>
>> ---
>>   gcc/ChangeLog                            |   8 +++
>>   gcc/config/aarch64/aarch64-cores.def     |   3 +
>>   gcc/config/aarch64/aarch64-cost-tables.h | 103 +++
>>   gcc/config/aarch64/aarch64-tune.md       |   2 +-
>>   gcc/config/aarch64/aarch64.c             |  80 +++-
>>   gcc/doc/invoke.texi                      |   2 +-
>>   6 files changed, 195 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index 9c90875..e376714 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,11 @@
>> +2018-06-12  Shaokun Zhang  
>> +Bo Zhou  
>> +* config/aarch64/aarch64-cores.def (tsv110): New CPU.
>> +* config/aarch64/aarch64-tune.md: Regenerated.
>> +* doc/invoke.texi (AArch64 Options/-mtune): Add "tsv110".
>> +* config/aarch64/aarch64.c (tsv110_tunings): New tuning table.
>> +* config/aarch64/aarch64-cost-tables.h: Add "tsv110" extra costs.
>> +
> 
> Can you confirm that you've run a bootstrap and test run with this patch
> to check there are no regressions?
> 

I have tested this patch (with some typos fixed) on aarch64 and didn't get
any regressions.

However, there is an issue on the master branch:
../.././gcc/bitmap.c: In function ‘unsigned int 
bitmap_last_set_bit(const_bitmap)’:
../.././gcc/bitmap.c:841:26: error: array subscript -1 is below array bounds of 
‘const BITMAP_WORD [2]’ {aka ‘const long unsigned int [2]’} 
[-Werror=array-bounds]
   word = elt->bits[ix];
  ^
cc1plus: all warnings being treated as errors
Makefile:1110: recipe for target 'bitmap.o' failed
make[3]: *** [bitmap.o] Error 1

My gcc version is: gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609.
Would you be happy to fix it? I fixed it in my local tree, but I am not sure
the fix is OK.

> This version looks good to me but you'll need final approval from the 
> maintainers.
> 

I will update the patch against the latest branch code today.
Hopefully you and the maintainers will be happy with v3.

Thanks,
Shaokun.

> Thanks,
> Kyrill
> 
>>   2018-06-12  Eric Botcazou  
>> * gcc.c: Document new %@{...} sequence.
>> diff --git a/gcc/config/aarch64/aarch64-cores.def 
>> b/gcc/config/aarch64/aarch64-cores.def
>> index e64d831..e6ebf02 100644
>> --- a/gcc/config/aarch64/aarch64-cores.def
>> +++ b/gcc/config/aarch64/aarch64-cores.def
>> @@ -88,6 +88,9 @@ AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  
>> AARCH64_FL_FOR_ARCH8_2
>> /* ARMv8.4-A Architecture Processors.  */
>>   +/* HiSilicon ('H') cores. */
>> +AARCH64_CORE("tsv110", tsv110,cortexa57,8_4A, 
>> AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES 
>> | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
>> +
>>   /* Qualcomm ('Q') cores. */
>>   AARCH64_CORE("saphira", saphira,falkor,8_4A,  
>> AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   
>> 0x51, 0xC01, -1)
>>   diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
>> b/gcc/config/aarch64/aarch64-cost-tables.h
>> index a455c62..b6890d6 100644
>> --- a/gcc/config/aarch64/aarch64-cost-tables.h
>> +++ b/gcc/config/aarch64/aarch64-cost-tables.h
>> @@ -334,4 +334,107 @@ const struct cpu_cost_table thunderx2t99_extra_costs =
>> }
>>   };
>>   +const struct cpu_cost_table tsv110_extra_costs =
>> +{
>> +  /* ALU */
>> +  {
>> +0, /* arith.  */
>> +0, /* logical.  */
>> +0, /* shift.  */
>> +0, /* shift_reg.  */
>> +COSTS_N_INSNS (1), /* arith_shift.  */
>> +COSTS_N_INSNS (1), /* arith_shift_reg.  */
>> +COSTS_N_INSNS (1), /* log_shift.  */
>> +COSTS_N_INSNS (1), /* log_shift_reg.  */
>> +0, /* extend.  */
>> +COSTS_N_INSNS (1), /* extend_arith.  */
>> +0, /* bfi.  */
>> +0, /* bfx.  */
>> +0, /* clz.  */
>> +0,   /* rev.  */
>> +0, /* non_exec.  */
>> +true   /* non_exec_costs_exec.  */
>> +  },
>> +  {
>> +/* MULT SImode */
>> +{
>> +  COSTS_N_INSNS (2),   /* simple.  */
>> +  COSTS_N_INSNS (2),   /* flag_setting.  */
>> +  COSTS_N_INSNS (2),   /* extend.  */
>> +  COSTS_N_INSNS (2),   /* add.  */
>> +  COSTS_N_INSNS (2),   /* extend_add.  */
>> +  COSTS_N_INSNS (11)   /* idiv.  */
>> +},
>> +/* MULT DImode */
>> +{
>> +  COSTS_N_INSNS (3),   /* simple.  */
>> +  0,   /* flag_setting (N/A).  */
>> +  COSTS_N_INSNS (3),   /* extend.  */
>> +  COSTS_N_INSNS (3),   /* add.  */
>> +  COSTS_N_INSNS (3

Re: [RFC] [aarch64] Add HiSilicon tsv110 CPU support

2018-06-07 Thread Zhangshaokun
Hi Kyrill,

On 2018/6/6 22:51, Kyrill Tkachov wrote:
> Hi Shaokun,
> 
> On 01/06/18 10:56, Zhangshaokun wrote:
>> Hi Ramana,
>>
>> Sorry for the late reply; I was on a short leave.
>>
>> On 2018/5/23 18:41, Ramana Radhakrishnan wrote:
>>>
>>> On 23/05/2018 03:50, Zhangshaokun wrote:
>>>> Hi Ramana,
>>>>
>>>> On 2018/5/22 18:28, Ramana Radhakrishnan wrote:
>>>>> On Tue, May 22, 2018 at 9:40 AM, Shaokun Zhang
>>>>>  wrote:
>>>>>> tsv110 is designed by HiSilicon and supports v8_4A. Its L1 Icache is
>>>>>> also optimized so that it can access the L1 Dcache.
>>>>>> Therefore, DC CVAU is not necessary in __aarch64_sync_cache_range for
>>>>>> tsv110. Is there a good way to skip the DC CVAU operation for tsv110?
>>>>> A solution would be to use an ifunc but on a cpu variant.
>>>>>
>>>> Could you explain ifunc further?
>>>> As for CPU variants, HiSilicon tsv110 has two versions, with CPU variants
>>>> 0 and 1. Both are expected to skip the DC CVAU operation when syncing the
>>>> icache and dcache.
>>>>> Since DC CVAU is not needed to sync the icache and dcache, skipping the
>>>>> redundant DC CVAU and doing only IC IVAU benefits performance.
>>>>> The JVM calls __clear_cache many times.
>>>>>
>>> Thanks for some more detail as to where you think you want to use this. 
>>> Have you investigated whether the jvm can actually elide such a call rather 
>>> than trying to fix this in the toolchain ?
>>>
>> In fact, we (HiSilicon) want to optimize DC CVAU not only in the toolchain,
>> but also in LLVM and others.
>>
>>> If you really need to think about solutions in the toolchain -
>>>
>>> The simplest first step would be to implement the changes hinted at by the 
>>> comment in aarch64.h .
>>>
>>>   If you read the comment above CLEAR_INSN_CACHE in aarch64.h you would see 
>>> that
>>>
>>> /* This definition should be relocated to aarch64-elf-raw.h.  This macro
>>>    should be undefined in aarch64-linux.h and a clear_cache pattern
>>>    implmented to emit either the call to __aarch64_sync_cache_range()
>>>    directly or preferably the appropriate sycall or cache clear
>>>    instructions inline.  */
>>> #define CLEAR_INSN_CACHE(beg, end)  \
>>>   extern void  __aarch64_sync_cache_range (void *, void *); \
>>>   __aarch64_sync_cache_range (beg, end)
>>>
>>> Thus I would expect that by implementing the clear_cache pattern and 
>>> deciding whether to put out the call to the __aarch64_sync_cache_range 
>>> function or not depending on whether you had the tsv110 chosen on the 
>>> command line would allow you to have an idea of what the performance gain 
>>> actually is by compiling the jvm with -mcpu=tsv110 vs -march=armv8-a. You 
>>> probably also want to clean up the trampoline_init code while you are here.
>>>
>> Thanks for your nice explanation and guidance.
>> For our next-generation CPU core, tsv200, we will also optimize IC IVAU so
>> that there is no need to flush the Icache, which leaves clear_cache as a NOP
>> function. Shall I take this into account? Or maybe I have missed something
>> in what you said.
> 
> I've had a look at the __clear_cache implementation and investigated these 
> cache coherency bits.
> If clearing the instruction cache means you don't need to explicitly clear 
> the data cache then
> the IDC bit of the CTR_EL0 register will be set to 1. This is how you can 
> identify that you can

Thanks for your guidance. I checked again: the IDC bit was added to CTR_EL0
in ARMv8.5. It is a pity that our tsv110 core, which supports v8.4, does not
set this bit even though it implements the DC CVAU elision feature. For
HiSilicon tsv200, IDC and DIC will be enabled.

> avoid the explicit "DC CVAU" in __clear_cache.
> Have a look at the D10.2.33 section in the Arm Architecture Reference Manual 
> Issue C.a [1]
> for more documentation.
> 
> To implement this elision in libgcc you'd need to extend
> __aarch64_sync_cache_range in config/aarch64/sync_cache.c to read the IDC
> bit from CTR_EL0.
> The code there already reads CTR_EL0 and caches its value so you just need
> to extract that bit and use it to decide whether to perform the "DC CVAU"
> loop.
> 
> But that should be a patch on its own.
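
A minimal sketch of the bit extraction described above, assuming the CTR_EL0 layout from the Arm Architecture Reference Manual (IminLine in bits [3:0], DminLine in bits [19:16], IDC at bit 28, DIC at bit 29). The helper names are hypothetical and this is not the libgcc code; the register value is passed in as a plain integer so the decoding can be exercised on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CTR_EL0 field positions (assumed from the Arm ARM register description).  */
#define CTR_IMINLINE_SHIFT  0   /* bits [3:0]   */
#define CTR_DMINLINE_SHIFT 16   /* bits [19:16] */
#define CTR_IDC_SHIFT      28
#define CTR_DIC_SHIFT      29

/* Convert a 4-bit log2(words) field into a line size in bytes
   (one word is 4 bytes).  */
unsigned ctr_line_bytes (unsigned field)
{
  assert (field <= 0xf);
  return 4u << field;
}

unsigned ctr_dcache_line_bytes (uint64_t ctr)
{
  return ctr_line_bytes ((unsigned) ((ctr >> CTR_DMINLINE_SHIFT) & 0xf));
}

unsigned ctr_icache_line_bytes (uint64_t ctr)
{
  return ctr_line_bytes ((unsigned) ((ctr >> CTR_IMINLINE_SHIFT) & 0xf));
}

/* IDC == 1 means the data cache clean ("DC CVAU" loop) can be skipped.  */
bool ctr_needs_dc_cvau (uint64_t ctr)
{
  return ((ctr >> CTR_IDC_SHIFT) & 1) == 0;
}

/* DIC == 1 means the instruction cache invalidate ("IC IVAU" loop)
   can be skipped.  */
bool ctr_needs_ic_ivau (uint64_t ctr)
{
  return ((ctr >> CTR_DIC_SHIFT) & 1) == 0;
}
```

On real hardware the value would come from reading CTR_EL0 once and caching it, as the quoted text says the existing code already does; the two predicates would then guard the respective loops.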

Got it, both IDC and DIC shall be checked in this function. It seems pretty 
good fo

Re: [RFC] [aarch64] Add HiSilicon tsv110 CPU support

2018-06-01 Thread Zhangshaokun
Hi Ramana,

Sorry for the late reply; I was on a short leave.

On 2018/5/23 18:41, Ramana Radhakrishnan wrote:
> 
> 
> On 23/05/2018 03:50, Zhangshaokun wrote:
>> Hi Ramana,
>>
>> On 2018/5/22 18:28, Ramana Radhakrishnan wrote:
>>> On Tue, May 22, 2018 at 9:40 AM, Shaokun Zhang
>>>  wrote:
>>>> tsv110 is designed by HiSilicon and supports v8_4A. Its L1 Icache is
>>>> also optimized so that it can access the L1 Dcache.
>>>> Therefore, DC CVAU is not necessary in __aarch64_sync_cache_range for
>>>> tsv110. Is there a good way to skip the DC CVAU operation for tsv110?
>>>
>>> A solution would be to use an ifunc but on a cpu variant.
>>>
>>
>> Could you explain ifunc further?
>> As for CPU variants, HiSilicon tsv110 has two versions, with CPU variants
>> 0 and 1. Both are expected to skip the DC CVAU operation when syncing the
>> icache and dcache.
> 
>>> Since DC CVAU is not needed to sync the icache and dcache, skipping the
>>> redundant DC CVAU and doing only IC IVAU benefits performance.
>>> The JVM calls __clear_cache many times.
>>>
> 
> Thanks for some more detail as to where you think you want to use this. Have 
> you investigated whether the jvm can actually elide such a call rather than 
> trying to fix this in the toolchain ?
> 

In fact, we (HiSilicon) want to optimize DC CVAU not only in the toolchain,
but also in LLVM and others.

> If you really need to think about solutions in the toolchain -
> 
> The simplest first step would be to implement the changes hinted at by the 
> comment in aarch64.h .
> 
>  If you read the comment above CLEAR_INSN_CACHE in aarch64.h you would see 
> that
> 
> /* This definition should be relocated to aarch64-elf-raw.h.  This macro
>    should be undefined in aarch64-linux.h and a clear_cache pattern
>    implmented to emit either the call to __aarch64_sync_cache_range()
>    directly or preferably the appropriate sycall or cache clear
>    instructions inline.  */
> #define CLEAR_INSN_CACHE(beg, end)  \
>   extern void  __aarch64_sync_cache_range (void *, void *); \
>   __aarch64_sync_cache_range (beg, end)
> 
> Thus I would expect that by implementing the clear_cache pattern and deciding 
> whether to put out the call to the __aarch64_sync_cache_range function or not 
> depending on whether you had the tsv110 chosen on the command line would 
> allow you to have an idea of what the performance gain actually is by 
> compiling the jvm with -mcpu=tsv110 vs -march=armv8-a. You probably also want 
> to clean up the trampoline_init code while you are here.
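
For context, a minimal sketch of the caller side of this discussion: how a JIT-style program (such as the JVM case mentioned in the thread) would typically request the cache synchronisation through the GCC builtin __builtin___clear_cache. The function name publish_code is hypothetical. How the builtin expands on AArch64 (a call to __aarch64_sync_cache_range or inline instructions) is exactly what the quoted comment leaves to the back end, so nothing here depends on it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy LEN bytes of freshly generated code from SRC into DST, then ask the
   compiler runtime to make the new bytes visible to instruction fetch.  */
void publish_code (void *dst, const void *src, size_t len)
{
  assert (dst != NULL && src != NULL);
  memcpy (dst, src, len);
  /* The range is [begin, end); on targets with coherent instruction and
     data caches this may expand to nothing.  */
  __builtin___clear_cache ((char *) dst, (char *) dst + len);
}
```

Each such call is where a redundant "DC CVAU" loop would be paid for, which is why the number of calls made by the workload matters for the performance question raised above.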
> 

Thanks for your nice explanation and guidance.
For our next-generation CPU core, tsv200, we will also optimize IC IVAU so
that there is no need to flush the Icache, which leaves clear_cache as a NOP
function. Shall I take this into account? Or maybe I have missed something in
what you said.

Thanks,
Shaokun

> I do think that's something that should be easy enough to do and the subject 
> of a patch series in its own right. If your users can rebuild the world for 
> tsv110 then this is sufficient.
> 
> If you want to have a single jvm binary without any run time checks, then you 
> need to investigate the use of ifuncs which are a mechanism in the GNU 
> toolchain for some of this kind of stuff. We tend not to use ifuncs on a per
> CPU basis unless there is a very good reason and the performance improvement
> is worth it (but probably more on a per architecture or per architectural
> basis) and you will need to make the case for it including what sort of
> performance benefits it gives. Some introduction about this feature can be
> found here.
> https://sourceware.org/glibc/wiki/GNU_IFUNC
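
The selection an ifunc resolver performs can be sketched portably with an ordinary function pointer. This is not a real ifunc: with GNU ifunc the binding would happen once at dynamic-link time, via GCC's ifunc function attribute naming a resolver. All function names below are hypothetical, the two sync routines are stand-ins that only report which maintenance steps they represent, and the 0x48 / 0xd01 pair is the tsv110 implementer and part ID taken from the aarch64-cores.def entry quoted in this thread. Whether tsv110 can really skip DC CVAU is the poster's claim, not something this sketch verifies.

```c
#include <assert.h>
#include <stddef.h>

/* Markers for which cache maintenance steps a routine stands for.  */
enum { DID_DC_CVAU = 1, DID_IC_IVAU = 2 };

typedef int (*sync_fn) (void *begin, void *end);

/* Stand-in for the generic routine: clean D-cache, then invalidate I-cache.  */
int sync_full (void *begin, void *end)
{
  (void) begin;
  (void) end;
  return DID_DC_CVAU | DID_IC_IVAU;
}

/* Stand-in for a variant that skips the D-cache clean.  */
int sync_icache_only (void *begin, void *end)
{
  (void) begin;
  (void) end;
  return DID_IC_IVAU;
}

/* Plays the role of the resolver: pick an implementation from the CPU
   identification.  */
sync_fn resolve_sync (unsigned implementer, unsigned part)
{
  if (implementer == 0x48 && part == 0xd01)
    return sync_icache_only;
  return sync_full;
}

/* Dispatch through the selected implementation.  */
int sync_cache_range (unsigned implementer, unsigned part,
                      void *begin, void *end)
{
  sync_fn fn = resolve_sync (implementer, part);
  assert (fn != NULL);
  return fn (begin, end);
}
```

A real resolver would run once and callers would bind directly to the chosen routine, so the per-call check in sync_cache_range disappears; that is the "without any run time checks" property mentioned above.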
> 
> regards
> Ramana
> 
>>
>> Hi ARM guys,
>> are you happy to share your ideas about this?
>>
>>> Is this really that important for performance and on what workloads ?
>>>
>>
>> Since DC CVAU is not needed to sync the icache and dcache, skipping the
>> redundant DC CVAU and doing only IC IVAU benefits performance.
>> The JVM calls __clear_cache many times.
>>
>> Thanks,
>> Shaokun
>>
>>> regards
>>> Ramana
>>>
>>>>
>>>> Any thoughts and ideas are welcome.
>>>>
>>>> Shaokun Zhang (1):
>>>>   [aarch64] Add HiSilicon tsv110 CPU support.
>>>>
>>>>   gcc/ChangeLog                            |   9 +++
>>>>   gcc/config/aarch64/aarch64-cores.def     |   5 ++
>>>>   gcc/config/aarch64/aarch64-cost-tables.h | 103 +++
>>>>   gcc/config/aarch64/aarch64-tune.md       |   2 +-
>>>>   gcc/config/aarch64/aarch64.c             |  79
>>>>   gcc/doc/invoke.texi                      |   2 +-
>>>>   6 files changed, 198 insertions(+), 2 deletions(-)
>>>>
>>>> -- 
>>>> 2.7.4
>>>>
>>>
>>>
>>
> 
> .
> 



Re: [RFC] [aarch64] Add HiSilicon tsv110 CPU support.

2018-05-23 Thread Zhangshaokun
Hi Kyrill,

On 2018/5/23 16:08, Kyrill Tkachov wrote:
> 
> On 23/05/18 05:54, Zhangshaokun wrote:
>> Hi Kyrill,
>>
>> On 2018/5/22 18:52, Kyrill Tkachov wrote:
>>> Hi Shaokun,
>>>
>>> On 22/05/18 09:40, Shaokun Zhang wrote:
>>>> This patch adds a HiSilicon mcpu: tsv110.
>>>>
>>>> ---
>>>>   gcc/ChangeLog                            |   9 +++
>>>>   gcc/config/aarch64/aarch64-cores.def     |   5 ++
>>>>   gcc/config/aarch64/aarch64-cost-tables.h | 103 +++
>>>>   gcc/config/aarch64/aarch64-tune.md       |   2 +-
>>>>   gcc/config/aarch64/aarch64.c             |  79
>>>>   gcc/doc/invoke.texi                      |   2 +-
>>>>   6 files changed, 198 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>>>> index cec2892..5d44966 100644
>>>> --- a/gcc/ChangeLog
>>>> +++ b/gcc/ChangeLog
>>>> @@ -1,3 +1,12 @@
>>>> +2018-05-22  Shaokun Zhang 
>>>> +Bo Zhou  
>>>> +
>>>> +   * config/aarch64/aarch64-cores.def (tsv110): New CPU.
>>>> +   * config/aarch64/aarch64-tune.md: Regenerated.
>>>> +   * doc/invoke.texi (AArch61 Options/-mtune): Add "tsv110".
>>> typo: AArch64.
>>>
>> Good catch, my mistake.
>>
>>>> +   * gcc/config/aarch64/aarch64.c (tsv110_tunings): New tuning table.
>>>> +   * gcc/config/aarch64/aarch64-cost-tables.h: Add "tsv110" extra 
>>>> costs.
>>> Please start the path with config/.
>>>
>> Sure, I will remove gcc/ in the next version.
>>
>>>> +
>>>>   2018-05-21  Michael Meissner 
>>>>
>>>>   PR target/85657
>>>> diff --git a/gcc/config/aarch64/aarch64-cores.def 
>>>> b/gcc/config/aarch64/aarch64-cores.def
>>>> index 33b96ca..db7a412 100644
>>>> --- a/gcc/config/aarch64/aarch64-cores.def
>>>> +++ b/gcc/config/aarch64/aarch64-cores.def
>>>> @@ -91,6 +91,11 @@ AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A, 
>>>>  AARCH64_FL_FOR_ARCH8_2
>>>>   /* Qualcomm ('Q') cores. */
>>>>   AARCH64_CORE("saphira", saphira,falkor,8_3A, 
>>>> AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   
>>>> 0x51, 0xC01, -1)
>>>>
>>>> +/* ARMv8.4-A Architecture Processors.  */
>>>> +
>>>> +/* HiSilicon ('H') cores. */
>>>> +AARCH64_CORE("tsv110", tsv110,tsv110,8_4A, 
>>>> AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | 
>>>> AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
>>>> +
>>> The third field is the scheduler model to use when optimising.
>>> Since there is no tsv110 scheduling model, using the name "tsv110"
>>> in the third field will generally give pretty poor schedules.
>>> I recommend you specify a scheduling model that most closely matches your
>>> core for the time being. But I don't think it's required and I wouldn't let
>>> it hold
>> I checked again; cortexa57 most closely matches tsv110. Thanks for your
>> suggestion.
>> If I choose cortexa57, can I still add tsv110_tunings, which will use tsv110's
>> pipeline features, as in the rest of the patch below, or should I use only the
>> generic features?
> 
> If you use cortexa57 for the scheduling model (the 3rd field) you should still
> use tsv110_tunings in the 6th field as this will specify other important
> parameters like instruction selection costs, fusion capabilities, alignment
> requirements etc.
> 
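
To make the role of the individual fields concrete, here is a minimal C sketch
of the X-macro technique that a cores.def-style file relies on. It is a
simplified illustration, not GCC's actual code: CORE_LIST, struct core_info and
find_core are hypothetical names, the entries keep only five of the fields, and
the scheduling model and tuning table are plain strings. The tsv110 row follows
the advice above (cortexa57 for scheduling, tsv110_tunings for tuning); the
implementer and part numbers are the ones quoted in the patch.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a cores.def-style list.  Fields kept here:
   name, scheduling model, tuning table, implementer id, part number.  */
#define CORE_LIST(X)                                                  \
  X ("cortex-a57", "cortexa57", "cortexa57_tunings", 0x41, 0xd07)     \
  X ("tsv110",     "cortexa57", "tsv110_tunings",    0x48, 0xd01)

struct core_info
{
  const char *name;     /* Name accepted on the command line.  */
  const char *sched;    /* Scheduling model to reuse.  */
  const char *tune;     /* Tuning table (costs, fusion, alignment).  */
  unsigned implementer; /* Implementer id.  */
  unsigned part;        /* Part number.  */
};

/* Expand every list entry into one initializer of the table.  */
#define AS_ENTRY(NAME, SCHED, TUNE, IMP, PART) { NAME, SCHED, TUNE, IMP, PART },
static const struct core_info all_cores[] = { CORE_LIST (AS_ENTRY) };
#undef AS_ENTRY

/* Look a core up by name; returns NULL when the name is unknown.  */
static const struct core_info *
find_core (const char *name)
{
  for (size_t i = 0; i < sizeof all_cores / sizeof all_cores[0]; i++)
    if (strcmp (all_cores[i].name, name) == 0)
      return &all_cores[i];
  return NULL;
}
```

The point of the layout is that the scheduling model and the tuning table are
independent columns, so a core can borrow another core's scheduler while
keeping its own costs.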

Thanks for your comments. I will wait for the other maintainers' comments and
prepare the next version.
One more question: any thoughts on the issue in my cover letter, namely skipping
DC CVAU for HiSilicon tsv110 when syncing the icache and dcache?

Thanks,
Shaokun

> Thanks,
> Kyrill
> 
>>
>>> up the patch.
>>>
>>> You'll need approval from an aarch64 maintainer (cc'ed some for you).
>>>
>> Good, thanks for your nice guidance.
>>
>> Thanks,
>> Shaokun
>>
>>> Thanks,
>>> Kyrill
>>>
>>>>   /* ARMv8-A big.LITTLE implementations.  */
>>>>
>>>>   AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 
>>>> 8A,  AARCH64_FL_FO

Re: [RFC] [aarch64] Add HiSilicon tsv110 CPU support.

2018-05-22 Thread Zhangshaokun

Hi Kyrill,

On 2018/5/22 18:52, Kyrill Tkachov wrote:
> Hi Shaokun,
> 
> On 22/05/18 09:40, Shaokun Zhang wrote:
>> This patch adds a HiSilicon mcpu: tsv110.
>>
>> ---
>>  gcc/ChangeLog                            |   9 +++
>>  gcc/config/aarch64/aarch64-cores.def     |   5 ++
>>  gcc/config/aarch64/aarch64-cost-tables.h | 103 +++
>>  gcc/config/aarch64/aarch64-tune.md       |   2 +-
>>  gcc/config/aarch64/aarch64.c             |  79
>>  gcc/doc/invoke.texi                      |   2 +-
>>  6 files changed, 198 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index cec2892..5d44966 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,12 @@
>> +2018-05-22  Shaokun Zhang 
>> +Bo Zhou  
>> +
>> +   * config/aarch64/aarch64-cores.def (tsv110): New CPU.
>> +   * config/aarch64/aarch64-tune.md: Regenerated.
>> +   * doc/invoke.texi (AArch61 Options/-mtune): Add "tsv110".
> 
> typo: AArch64.
> 

Good catch, my mistake.

>> +   * gcc/config/aarch64/aarch64.c (tsv110_tunings): New tuning table.
>> +   * gcc/config/aarch64/aarch64-cost-tables.h: Add "tsv110" extra costs.
> 
> Please start the path with config/.
> 

Sure, I will remove gcc/ in the next version.

>> +
>>  2018-05-21  Michael Meissner 
>>
>>  PR target/85657
>> diff --git a/gcc/config/aarch64/aarch64-cores.def 
>> b/gcc/config/aarch64/aarch64-cores.def
>> index 33b96ca..db7a412 100644
>> --- a/gcc/config/aarch64/aarch64-cores.def
>> +++ b/gcc/config/aarch64/aarch64-cores.def
>> @@ -91,6 +91,11 @@ AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  
>> AARCH64_FL_FOR_ARCH8_2
>>  /* Qualcomm ('Q') cores. */
>>  AARCH64_CORE("saphira", saphira,falkor,8_3A, 
>> AARCH64_FL_FOR_ARCH8_3 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   
>> 0x51, 0xC01, -1)
>>
>> +/* ARMv8.4-A Architecture Processors.  */
>> +
>> +/* HiSilicon ('H') cores. */
>> +AARCH64_CORE("tsv110", tsv110,tsv110,8_4A, 
>> AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES 
>> | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
>> +
> 
> The third field is the scheduler model to use when optimising.
> Since there is no tsv110 scheduling model, using the name "tsv110"
> in the third field will generally give pretty poor schedules.
> I recommend you specify a scheduling model that most closely matches your
> core for the time being. But I don't think it's required and I wouldn't let it
> hold

I checked again; cortexa57 most closely matches tsv110. Thanks for your
suggestion.
If I choose cortexa57, can I still add tsv110_tunings, which will use tsv110's
pipeline features, as in the rest of the patch below, or should I use only the
generic features?

> up the patch.
> 
> You'll need approval from an aarch64 maintainer (cc'ed some for you).
> 

Good, thanks for your nice guidance.

Thanks,
Shaokun

> Thanks,
> Kyrill
> 
>>  /* ARMv8-A big.LITTLE implementations.  */
>>
>>  AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8A,  
>> AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE 
>> (0xd07, 0xd03), -1)
>> diff --git a/gcc/config/aarch64/aarch64-cost-tables.h 
>> b/gcc/config/aarch64/aarch64-cost-tables.h
>> index a455c62..b6890d6 100644
>> --- a/gcc/config/aarch64/aarch64-cost-tables.h
>> +++ b/gcc/config/aarch64/aarch64-cost-tables.h
>> @@ -334,4 +334,107 @@ const struct cpu_cost_table thunderx2t99_extra_costs =
>>}
>>  };
>>
>> +const struct cpu_cost_table tsv110_extra_costs =
>> +{
>> +  /* ALU */
>> +  {
>> +0, /* arith.  */
>> +0, /* logical.  */
>> +0, /* shift.  */
>> +0, /* shift_reg.  */
>> +COSTS_N_INSNS (1), /* arith_shift.  */
>> +COSTS_N_INSNS (1), /* arith_shift_reg.  */
>> +COSTS_N_INSNS (1), /* log_shift.  */
>> +COSTS_N_INSNS (1), /* log_shift_reg.  */
>> +0, /* extend.  */
>> +COSTS_N_INSNS (1), /* extend_arith.  */
>> +0, /* bfi.  */
>> +0, /* bfx.  */
>> +0, /* clz.  */
>> +0,/* rev.  */
>> +0, /* non_exec.  */
>> +true   /* non_exec_costs_exec.  */
>> +  },
>> +  {
>> +/* MULT SImode */
>> +{
>> +  COSTS_N_INSNS (2),   /* simple.  */
>> +  COSTS_N_INSNS (2),   /* flag_setting.  */
>> +  COSTS_N_INSNS (2),   /* extend.  */
>> +  COSTS_N_INSNS (2),   /* add.  */
>> +  COSTS_N_INSNS (2),   /* extend_add.  */
>> +  COSTS_N_INSNS (11)   /* idiv.  */
>> +},
>> +/* MULT DImode */
>> +{
>> +  COSTS_N_INSNS (3),   /* simple.  */
>> +  0,   /* flag_setting (N/A).  */
>> +  COSTS_N_INSNS (3),   /* extend.  */
>> +  COSTS_N_INSNS (3),   /* add.  */
>> +  COSTS_N_INSNS (3),   /* extend_add.  */
>> +  COSTS_N_INS

Re: [RFC] [aarch64] Add HiSilicon tsv110 CPU support

2018-05-22 Thread Zhangshaokun
Hi Ramana,

On 2018/5/22 18:28, Ramana Radhakrishnan wrote:
> On Tue, May 22, 2018 at 9:40 AM, Shaokun Zhang
>  wrote:
>> tsv110 is designed by HiSilicon and supports v8_4A. It also optimizes the
>> L1 Icache so that it can access the L1 Dcache.
>> Therefore, DC CVAU is not necessary in __aarch64_sync_cache_range for
>> tsv110. Is there a good way to skip the DC CVAU operation for tsv110?
> 
> A solution would be to use an ifunc but on a cpu variant.
> 

Could you explain the ifunc approach further?
As for the CPU variant, HiSilicon tsv110 has two versions, with CPU variants
0 and 1. Both are expected to skip the DC CVAU operation when syncing the icache
and dcache.

Hi ARM guys,
are you happy to share your ideas about this?

> Is this really that important for performance and on what workloads ?
> 

Since DC CVAU is not necessary when syncing the icache and dcache, it is
beneficial for performance to skip the redundant DC CVAU and do IC IVAU only.
For the JVM, __clear_cache is called many times.
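
To give a feel for the amount of work involved, the C sketch below counts the
per-line maintenance operations for a given address range, with and without
the DC CVAU pass. It is only a back-of-the-envelope model with hypothetical
helper names (lines_in_range, ops_full_sync, ops_icache_only); it issues no
cache instructions, ignores barriers, and takes the cache line sizes as
parameters, whereas real code would read them from CTR_EL0.

```c
#include <stdint.h>

/* Number of cache lines of size LINE (a power of two) that overlap the
   half-open address range [start, end).  */
static uint64_t
lines_in_range (uint64_t start, uint64_t end, uint64_t line)
{
  if (end <= start)
    return 0;
  uint64_t first = start & ~(line - 1);     /* Line holding the first byte.  */
  uint64_t last = (end - 1) & ~(line - 1);  /* Line holding the last byte.  */
  return (last - first) / line + 1;
}

/* Conventional sync: one DC CVAU per D-cache line, then one IC IVAU per
   I-cache line.  */
static uint64_t
ops_full_sync (uint64_t start, uint64_t end, uint64_t dline, uint64_t iline)
{
  return lines_in_range (start, end, dline)
	 + lines_in_range (start, end, iline);
}

/* Variant proposed in this thread: IC IVAU only.  */
static uint64_t
ops_icache_only (uint64_t start, uint64_t end, uint64_t iline)
{
  return lines_in_range (start, end, iline);
}
```

For a 256-byte range and an assumed 64-byte line size in both caches, the
model gives 8 operations for the full sync and 4 for the IC IVAU-only variant.
Whether that translates into a measurable gain is exactly the question raised
above and would need benchmarking.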

Thanks,
Shaokun

> regards
> Ramana
> 
>>
>> Any thoughts and ideas are welcome.
>>
>> Shaokun Zhang (1):
>>   [aarch64] Add HiSilicon tsv110 CPU support.
>>
>>  gcc/ChangeLog                            |   9 +++
>>  gcc/config/aarch64/aarch64-cores.def     |   5 ++
>>  gcc/config/aarch64/aarch64-cost-tables.h | 103 +++
>>  gcc/config/aarch64/aarch64-tune.md       |   2 +-
>>  gcc/config/aarch64/aarch64.c             |  79
>>  gcc/doc/invoke.texi                      |   2 +-
>>  6 files changed, 198 insertions(+), 2 deletions(-)
>>
>> --
>> 2.7.4
>>
> 
>