Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769

2014-10-10 Thread Andrew Pinski
On Fri, Oct 10, 2014 at 3:53 AM, Kyrill Tkachov  wrote:
> Hi all,
>
>
> Some early revisions of the Cortex-A53 have an erratum (835769) whereby
> it is possible for a 64-bit multiply-accumulate instruction in
> AArch64 state to generate an incorrect result.  The details are quite
> complex and hard to determine statically, since branches in the code
> may exist in some circumstances, but all cases end with a memory
> (load, store, or prefetch) instruction followed immediately by the
> multiply-accumulate operation.
>
> The safest work-around for this issue is to make the compiler avoid
> emitting multiply-accumulate instructions immediately after memory
> instructions, and the simplest way to do this is to insert a NOP. A
> linker patching technique can also be used, by moving each potentially
> affected multiply-accumulate instruction into a patch region and
> replacing the original instruction with a branch to the patch.
>
> This patch achieves the compiler part by using the final prescan pass.
> The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH
> so that conditional branches work properly.
>
> The fix is disabled by default and can be turned on by the
> -mfix-cortex-a53-835769 compiler option.
>
> I'm attaching a trunk and a 4.9 version of the patch.
> The 4.9 version is different only in that rtx_insn* is replaced by rtx.
>
> Tested on aarch64-none-linux-gnu (and bootstrap with the option) and
> built various large benchmarks with it.
>
> Ok?

A few comments:
This:
+static int
+is_mem_p (rtx *x, void *data ATTRIBUTE_UNUSED)
+{
+  return MEM_P (*x);
+}
+
+static bool
+is_memory_op (rtx_insn *mem_insn)
+{
+   rtx pattern = PATTERN (mem_insn);
+   return for_each_rtx (&pattern, is_mem_p, NULL);
+}

Should be using the new FOR_EACH_SUBRTX instead.  This should simplify
the code to something like:

static bool
is_memory_op (rtx_insn *mem_insn)
{
  subrtx_iterator::array_type array;
  FOR_EACH_SUBRTX (iter, array, PATTERN (mem_insn), ALL)
    if (MEM_P (*iter))
      return true;
  return false;
}

Also, this should be a has_ function rather than an is_ function.

I think you should have ADJUST_INSN_LENGTH call a function rather than
inline the logic there, or at least wrap it in do { } while (0), which
I think is a documented part of the coding style.

Also, you added no comment before each function.  The coding style says
each function needs a comment describing what the function does.

Thanks,
Andrew Pinski

>
> Thanks,
> Kyrill
>
> 2014-10-10  Kyrylo Tkachov
> Ramana Radhakrishnan
>
>  * config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define.
>  (ADJUST_INSN_LENGTH): Define.
>  * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option.
>  * config/aarch64/aarch64.c (is_mem_p): New function.
>  (is_memory_op): Likewise.
>  (aarch64_prev_real_insn): Likewise.
>  (is_madd_op): Likewise.
>  (dep_between_memop_and_next): Likewise.
>  (aarch64_madd_needs_nop): Likewise.
>  (aarch64_final_prescan_insn): Likewise.
>  * doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-835769
>  and -mno-fix-cortex-a53-835769 options.


Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769

2014-10-10 Thread Richard Earnshaw
On 10/10/14 15:22, Kyrill Tkachov wrote:
> 
> On 10/10/14 15:18, Richard Earnshaw wrote:
>> On 10/10/14 11:53, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>>
>>> Some early revisions of the Cortex-A53 have an erratum (835769) whereby
>>> it is possible for a 64-bit multiply-accumulate instruction in
>>> AArch64 state to generate an incorrect result.  The details are quite
>>> complex and hard to determine statically, since branches in the code
>>> may exist in some circumstances, but all cases end with a memory
>>> (load, store, or prefetch) instruction followed immediately by the
>>> multiply-accumulate operation.
>>>
>>> The safest work-around for this issue is to make the compiler avoid
>>> emitting multiply-accumulate instructions immediately after memory
>>> instructions, and the simplest way to do this is to insert a NOP. A
>>> linker patching technique can also be used, by moving each potentially
>>> affected multiply-accumulate instruction into a patch region and
>>> replacing the original instruction with a branch to the patch.
>>>
>>> This patch achieves the compiler part by using the final prescan pass.
>>> The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH
>>> so that conditional branches work properly.
>>>
>>> The fix is disabled by default and can be turned on by the
>>> -mfix-cortex-a53-835769 compiler option.
>>>
>>> I'm attaching a trunk and a 4.9 version of the patch.
>>> The 4.9 version is different only in that rtx_insn* is replaced by rtx.
>>>
>>> Tested on aarch64-none-linux-gnu (and bootstrap with the option) and
>>> built various large benchmarks with it.
>>>
>>> Ok?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>> 2014-10-10  Kyrylo Tkachov
>>>   Ramana Radhakrishnan
>>>
>>>* config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define.
>>>(ADJUST_INSN_LENGTH): Define.
>>>* config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option.
>>>* config/aarch64/aarch64.c (is_mem_p): New function.
>>>(is_memory_op): Likewise.
>>>(aarch64_prev_real_insn): Likewise.
>>>(is_madd_op): Likewise.
>>>(dep_between_memop_and_next): Likewise.
>>>(aarch64_madd_needs_nop): Likewise.
>>>(aarch64_final_prescan_insn): Likewise.
>>>* doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-835769
>>>and -mno-fix-cortex-a53-835769 options.
>>>
>>>
>>> a53-workaround-new.patch
>>>
>>>
>>> commit 6ee2bec04fbce15b13b59b54ef4fe11aeda74ff4
>>> Author: Kyrill Tkachov 
>>> Date:   Wed Oct 8 12:48:34 2014 +
>>>
>>>  [AArch64] Add final prescan workaround for Cortex-A53 erratum.
>>>
>>> diff --git a/gcc/config/aarch64/aarch64-protos.h 
>>> b/gcc/config/aarch64/aarch64-protos.h
>>> index b5f53d2..c57a467 100644
>>> --- a/gcc/config/aarch64/aarch64-protos.h
>>> +++ b/gcc/config/aarch64/aarch64-protos.h
>>> @@ -308,6 +308,8 @@ aarch64_builtin_vectorized_function (tree fndecl,
>>>   
>>>   extern void aarch64_split_combinev16qi (rtx operands[3]);
>>>   extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx 
>>> sel);
>>> +extern bool aarch64_madd_needs_nop (rtx_insn *);
>>> +extern void aarch64_final_prescan_insn (rtx_insn *);
>>>   extern bool
>>>   aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
>>>   void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index 5144c35..76a2480 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -7586,6 +7586,128 @@ aarch64_mangle_type (const_tree type)
>>> return NULL;
>>>   }
>>>   
>>> +static int
>>> +is_mem_p (rtx *x, void *data ATTRIBUTE_UNUSED)
>>> +{
>>> +  return MEM_P (*x);
>>> +}
>>> +
>>> +static bool
>>> +is_memory_op (rtx_insn *mem_insn)
>>> +{
>>> +   rtx pattern = PATTERN (mem_insn);
>>> +   return for_each_rtx (&pattern, is_mem_p, NULL);
>>> +}
>>> +
>>> +/* Find the first rtx_insn before insn that will generate an assembly
>>> +   instruction.  */
>>> +
>>> +static rtx_insn *
>>> +aarch64_prev_real_insn (rtx_insn *insn)
>>> +{
>>> +  if (!insn)
>>> +return NULL;
>>> +
>>> +  do
>>> +{
>>> +  insn = prev_real_insn (insn);
>>> +}
>>> +  while (insn && recog_memoized (insn) < 0);
>>> +
>>> +  return insn;
>>> +}
>>> +
>>> +static bool
>>> +is_madd_op (enum attr_type t1)
>>> +{
>>> +  unsigned int i;
>>> +  /* A number of these may be AArch32 only.  */
>>> +  enum attr_type mlatypes[] = {
>>> +TYPE_MLA, TYPE_MLAS, TYPE_SMLAD, TYPE_SMLADX, TYPE_SMLAL, TYPE_SMLALD,
>>> +TYPE_SMLALS, TYPE_SMLALXY, TYPE_SMLAWX, TYPE_SMLAWY, TYPE_SMLAXY,
>>> +TYPE_SMMLA, TYPE_UMLAL, TYPE_UMLALS,TYPE_SMLSD, TYPE_SMLSDX, 
>>> TYPE_SMLSLD
>>> +  };
>>> +
>>> +  for (i = 0; i < sizeof (mlatypes) / sizeof (enum attr_type); i++)
>>> +{
>>> +  if (t1 == mlatypes[i])
>>> +   return true;
>>> +}
>>> +
>>> +  return false;
>>> +}
>>> +
>>> +/* Check if there is a register depende

Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769

2014-10-10 Thread Kyrill Tkachov


On 10/10/14 15:18, Richard Earnshaw wrote:

On 10/10/14 11:53, Kyrill Tkachov wrote:

Hi all,


Some early revisions of the Cortex-A53 have an erratum (835769) whereby
it is possible for a 64-bit multiply-accumulate instruction in
AArch64 state to generate an incorrect result.  The details are quite
complex and hard to determine statically, since branches in the code
may exist in some circumstances, but all cases end with a memory
(load, store, or prefetch) instruction followed immediately by the
multiply-accumulate operation.

The safest work-around for this issue is to make the compiler avoid
emitting multiply-accumulate instructions immediately after memory
instructions, and the simplest way to do this is to insert a NOP. A
linker patching technique can also be used, by moving each potentially
affected multiply-accumulate instruction into a patch region and
replacing the original instruction with a branch to the patch.

This patch achieves the compiler part by using the final prescan pass.
The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH
so that conditional branches work properly.

The fix is disabled by default and can be turned on by the
-mfix-cortex-a53-835769 compiler option.

I'm attaching a trunk and a 4.9 version of the patch.
The 4.9 version is different only in that rtx_insn* is replaced by rtx.

Tested on aarch64-none-linux-gnu (and bootstrap with the option) and
built various large benchmarks with it.

Ok?

Thanks,
Kyrill

2014-10-10  Kyrylo Tkachov
  Ramana Radhakrishnan

   * config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define.
   (ADJUST_INSN_LENGTH): Define.
   * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option.
   * config/aarch64/aarch64.c (is_mem_p): New function.
   (is_memory_op): Likewise.
   (aarch64_prev_real_insn): Likewise.
   (is_madd_op): Likewise.
   (dep_between_memop_and_next): Likewise.
   (aarch64_madd_needs_nop): Likewise.
   (aarch64_final_prescan_insn): Likewise.
   * doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-835769
   and -mno-fix-cortex-a53-835769 options.


a53-workaround-new.patch


commit 6ee2bec04fbce15b13b59b54ef4fe11aeda74ff4
Author: Kyrill Tkachov 
Date:   Wed Oct 8 12:48:34 2014 +

 [AArch64] Add final prescan workaround for Cortex-A53 erratum.

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index b5f53d2..c57a467 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -308,6 +308,8 @@ aarch64_builtin_vectorized_function (tree fndecl,
  
   extern void aarch64_split_combinev16qi (rtx operands[3]);
   extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
+extern bool aarch64_madd_needs_nop (rtx_insn *);
+extern void aarch64_final_prescan_insn (rtx_insn *);
  extern bool
  aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
  void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5144c35..76a2480 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7586,6 +7586,128 @@ aarch64_mangle_type (const_tree type)
return NULL;
  }
  
+static int
+is_mem_p (rtx *x, void *data ATTRIBUTE_UNUSED)
+{
+  return MEM_P (*x);
+}
+
+static bool
+is_memory_op (rtx_insn *mem_insn)
+{
+   rtx pattern = PATTERN (mem_insn);
+   return for_each_rtx (&pattern, is_mem_p, NULL);
+}
+
+/* Find the first rtx_insn before insn that will generate an assembly
+   instruction.  */
+
+static rtx_insn *
+aarch64_prev_real_insn (rtx_insn *insn)
+{
+  if (!insn)
+return NULL;
+
+  do
+{
+  insn = prev_real_insn (insn);
+}
+  while (insn && recog_memoized (insn) < 0);
+
+  return insn;
+}
+
+static bool
+is_madd_op (enum attr_type t1)
+{
+  unsigned int i;
+  /* A number of these may be AArch32 only.  */
+  enum attr_type mlatypes[] = {
+TYPE_MLA, TYPE_MLAS, TYPE_SMLAD, TYPE_SMLADX, TYPE_SMLAL, TYPE_SMLALD,
+TYPE_SMLALS, TYPE_SMLALXY, TYPE_SMLAWX, TYPE_SMLAWY, TYPE_SMLAXY,
+TYPE_SMMLA, TYPE_UMLAL, TYPE_UMLALS,TYPE_SMLSD, TYPE_SMLSDX, TYPE_SMLSLD
+  };
+
+  for (i = 0; i < sizeof (mlatypes) / sizeof (enum attr_type); i++)
+{
+  if (t1 == mlatypes[i])
+   return true;
+}
+
+  return false;
+}
+
+/* Check if there is a register dependency between a load and the insn
+   for which we hold recog_data.  */
+
+static bool
+dep_between_memop_and_curr (rtx memop)
+{
+  rtx load_reg;
+  int opno;
+
+  if (!memop)
+return false;
+
+  if (!REG_P (SET_DEST (memop)))
+return false;
+
+  load_reg = SET_DEST (memop);
+  for (opno = 0; opno < recog_data.n_operands; opno++)
+{
+  rtx operand = recog_data.operand[opno];
+  if (REG_P (operand)
+  && reg_overlap_mentioned_p (load_reg, operand))
+return true;
+
+}
+  return false;
+}
+
+bool
+aarch64_madd_needs_nop (rtx_insn* ins

Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769

2014-10-10 Thread Richard Earnshaw
On 10/10/14 11:53, Kyrill Tkachov wrote:
> Hi all,
> 
> 
> Some early revisions of the Cortex-A53 have an erratum (835769) whereby
> it is possible for a 64-bit multiply-accumulate instruction in
> AArch64 state to generate an incorrect result.  The details are quite
> complex and hard to determine statically, since branches in the code
> may exist in some circumstances, but all cases end with a memory
> (load, store, or prefetch) instruction followed immediately by the
> multiply-accumulate operation.
> 
> The safest work-around for this issue is to make the compiler avoid
> emitting multiply-accumulate instructions immediately after memory
> instructions, and the simplest way to do this is to insert a NOP. A
> linker patching technique can also be used, by moving each potentially
> affected multiply-accumulate instruction into a patch region and
> replacing the original instruction with a branch to the patch.
> 
> This patch achieves the compiler part by using the final prescan pass.
> The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH
> so that conditional branches work properly.
> 
> The fix is disabled by default and can be turned on by the
> -mfix-cortex-a53-835769 compiler option.
> 
> I'm attaching a trunk and a 4.9 version of the patch.
> The 4.9 version is different only in that rtx_insn* is replaced by rtx.
> 
> Tested on aarch64-none-linux-gnu (and bootstrap with the option) and
> built various large benchmarks with it.
> 
> Ok?
> 
> Thanks,
> Kyrill
> 
> 2014-10-10  Kyrylo Tkachov
>  Ramana Radhakrishnan
> 
>   * config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define.
>   (ADJUST_INSN_LENGTH): Define.
>   * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option.
>   * config/aarch64/aarch64.c (is_mem_p): New function.
>   (is_memory_op): Likewise.
>   (aarch64_prev_real_insn): Likewise.
>   (is_madd_op): Likewise.
>   (dep_between_memop_and_next): Likewise.
>   (aarch64_madd_needs_nop): Likewise.
>   (aarch64_final_prescan_insn): Likewise.
>   * doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-835769
>   and -mno-fix-cortex-a53-835769 options.
> 
> 
> a53-workaround-new.patch
> 
> 
> commit 6ee2bec04fbce15b13b59b54ef4fe11aeda74ff4
> Author: Kyrill Tkachov 
> Date:   Wed Oct 8 12:48:34 2014 +
> 
> [AArch64] Add final prescan workaround for Cortex-A53 erratum.
> 
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index b5f53d2..c57a467 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -308,6 +308,8 @@ aarch64_builtin_vectorized_function (tree fndecl,
>  
>  extern void aarch64_split_combinev16qi (rtx operands[3]);
>  extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
> +extern bool aarch64_madd_needs_nop (rtx_insn *);
> +extern void aarch64_final_prescan_insn (rtx_insn *);
>  extern bool
>  aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
>  void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5144c35..76a2480 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -7586,6 +7586,128 @@ aarch64_mangle_type (const_tree type)
>return NULL;
>  }
>  
> +static int
> +is_mem_p (rtx *x, void *data ATTRIBUTE_UNUSED)
> +{
> +  return MEM_P (*x);
> +}
> +
> +static bool
> +is_memory_op (rtx_insn *mem_insn)
> +{
> +   rtx pattern = PATTERN (mem_insn);
> +   return for_each_rtx (&pattern, is_mem_p, NULL);
> +}
> +
> +/* Find the first rtx_insn before insn that will generate an assembly
> +   instruction.  */
> +
> +static rtx_insn *
> +aarch64_prev_real_insn (rtx_insn *insn)
> +{
> +  if (!insn)
> +return NULL;
> +
> +  do
> +{
> +  insn = prev_real_insn (insn);
> +}
> +  while (insn && recog_memoized (insn) < 0);
> +
> +  return insn;
> +}
> +
> +static bool
> +is_madd_op (enum attr_type t1)
> +{
> +  unsigned int i;
> +  /* A number of these may be AArch32 only.  */
> +  enum attr_type mlatypes[] = {
> +TYPE_MLA, TYPE_MLAS, TYPE_SMLAD, TYPE_SMLADX, TYPE_SMLAL, TYPE_SMLALD,
> +TYPE_SMLALS, TYPE_SMLALXY, TYPE_SMLAWX, TYPE_SMLAWY, TYPE_SMLAXY,
> +TYPE_SMMLA, TYPE_UMLAL, TYPE_UMLALS,TYPE_SMLSD, TYPE_SMLSDX, TYPE_SMLSLD
> +  };
> +
> +  for (i = 0; i < sizeof (mlatypes) / sizeof (enum attr_type); i++)
> +{
> +  if (t1 == mlatypes[i])
> + return true;
> +}
> +
> +  return false;
> +}
> +
> +/* Check if there is a register dependency between a load and the insn
> +   for which we hold recog_data.  */
> +
> +static bool
> +dep_between_memop_and_curr (rtx memop)
> +{
> +  rtx load_reg;
> +  int opno;
> +
> +  if (!memop)
> +return false;
> +
> +  if (!REG_P (SET_DEST (memop)))
> +return false;
> +
> +  load_reg = SET_DEST (memop);
> +  for (opno = 0; opno < recog_data.n_operands; opno++)
> +

Re: [PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769

2014-10-10 Thread Marcus Shawcroft
On 10 October 2014 11:53, Kyrill Tkachov  wrote:
> Hi all,
>
>
> Some early revisions of the Cortex-A53 have an erratum (835769) whereby
> it is possible for a 64-bit multiply-accumulate instruction in
> AArch64 state to generate an incorrect result.  The details are quite
> complex and hard to determine statically, since branches in the code
> may exist in some circumstances, but all cases end with a memory
> (load, store, or prefetch) instruction followed immediately by the
> multiply-accumulate operation.
>
> The safest work-around for this issue is to make the compiler avoid
> emitting multiply-accumulate instructions immediately after memory
> instructions, and the simplest way to do this is to insert a NOP. A
> linker patching technique can also be used, by moving each potentially
> affected multiply-accumulate instruction into a patch region and
> replacing the original instruction with a branch to the patch.
>
> This patch achieves the compiler part by using the final prescan pass.
> The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH
> so that conditional branches work properly.
>
> The fix is disabled by default and can be turned on by the
> -mfix-cortex-a53-835769 compiler option.
>
> I'm attaching a trunk and a 4.9 version of the patch.
> The 4.9 version is different only in that rtx_insn* is replaced by rtx.
>
> Tested on aarch64-none-linux-gnu (and bootstrap with the option) and
> built various large benchmarks with it.
>
> Ok?


OK /Marcus


[PATCH][AArch64] Implement workaround for ARM Cortex-A53 erratum 835769

2014-10-10 Thread Kyrill Tkachov

Hi all,


Some early revisions of the Cortex-A53 have an erratum (835769) whereby
it is possible for a 64-bit multiply-accumulate instruction in
AArch64 state to generate an incorrect result.  The details are quite
complex and hard to determine statically, since branches in the code
may exist in some circumstances, but all cases end with a memory
(load, store, or prefetch) instruction followed immediately by the
multiply-accumulate operation.

The safest work-around for this issue is to make the compiler avoid
emitting multiply-accumulate instructions immediately after memory
instructions, and the simplest way to do this is to insert a NOP. A
linker patching technique can also be used, by moving each potentially
affected multiply-accumulate instruction into a patch region and
replacing the original instruction with a branch to the patch.

This patch achieves the compiler part by using the final prescan pass.
The insn length attribute is updated accordingly using ADJUST_INSN_LENGTH
so that conditional branches work properly.

The fix is disabled by default and can be turned on by the
-mfix-cortex-a53-835769 compiler option.

I'm attaching a trunk and a 4.9 version of the patch.
The 4.9 version is different only in that rtx_insn* is replaced by rtx.

Tested on aarch64-none-linux-gnu (and bootstrapped with the option) and
built various large benchmarks with it.

Ok?

Thanks,
Kyrill

2014-10-10  Kyrylo Tkachov
	    Ramana Radhakrishnan

 * config/aarch64/aarch64.h (FINAL_PRESCAN_INSN): Define.
 (ADJUST_INSN_LENGTH): Define.
 * config/aarch64/aarch64.opt (mfix-cortex-a53-835769): New option.
 * config/aarch64/aarch64.c (is_mem_p): New function.
 (is_memory_op): Likewise.
 (aarch64_prev_real_insn): Likewise.
 (is_madd_op): Likewise.
 (dep_between_memop_and_next): Likewise.
 (aarch64_madd_needs_nop): Likewise.
 (aarch64_final_prescan_insn): Likewise.
 * doc/invoke.texi (AArch64 Options): Document -mfix-cortex-a53-835769
 and -mno-fix-cortex-a53-835769 options.
commit 6ee2bec04fbce15b13b59b54ef4fe11aeda74ff4
Author: Kyrill Tkachov 
Date:   Wed Oct 8 12:48:34 2014 +

[AArch64] Add final prescan workaround for Cortex-A53 erratum.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b5f53d2..c57a467 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -308,6 +308,8 @@ aarch64_builtin_vectorized_function (tree fndecl,
 
 extern void aarch64_split_combinev16qi (rtx operands[3]);
 extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
+extern bool aarch64_madd_needs_nop (rtx_insn *);
+extern void aarch64_final_prescan_insn (rtx_insn *);
 extern bool
 aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
 void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5144c35..76a2480 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7586,6 +7586,128 @@ aarch64_mangle_type (const_tree type)
   return NULL;
 }
 
+static int
+is_mem_p (rtx *x, void *data ATTRIBUTE_UNUSED)
+{
+  return MEM_P (*x);
+}
+
+static bool
+is_memory_op (rtx_insn *mem_insn)
+{
+   rtx pattern = PATTERN (mem_insn);
+   return for_each_rtx (&pattern, is_mem_p, NULL);
+}
+
+/* Find the first rtx_insn before insn that will generate an assembly
+   instruction.  */
+
+static rtx_insn *
+aarch64_prev_real_insn (rtx_insn *insn)
+{
+  if (!insn)
+return NULL;
+
+  do
+{
+  insn = prev_real_insn (insn);
+}
+  while (insn && recog_memoized (insn) < 0);
+
+  return insn;
+}
+
+static bool
+is_madd_op (enum attr_type t1)
+{
+  unsigned int i;
+  /* A number of these may be AArch32 only.  */
+  enum attr_type mlatypes[] = {
+TYPE_MLA, TYPE_MLAS, TYPE_SMLAD, TYPE_SMLADX, TYPE_SMLAL, TYPE_SMLALD,
+TYPE_SMLALS, TYPE_SMLALXY, TYPE_SMLAWX, TYPE_SMLAWY, TYPE_SMLAXY,
+TYPE_SMMLA, TYPE_UMLAL, TYPE_UMLALS,TYPE_SMLSD, TYPE_SMLSDX, TYPE_SMLSLD
+  };
+
+  for (i = 0; i < sizeof (mlatypes) / sizeof (enum attr_type); i++)
+{
+  if (t1 == mlatypes[i])
+	return true;
+}
+
+  return false;
+}
+
+/* Check if there is a register dependency between a load and the insn
+   for which we hold recog_data.  */
+
+static bool
+dep_between_memop_and_curr (rtx memop)
+{
+  rtx load_reg;
+  int opno;
+
+  if (!memop)
+return false;
+
+  if (!REG_P (SET_DEST (memop)))
+return false;
+
+  load_reg = SET_DEST (memop);
+  for (opno = 0; opno < recog_data.n_operands; opno++)
+{
+  rtx operand = recog_data.operand[opno];
+  if (REG_P (operand)
+  && reg_overlap_mentioned_p (load_reg, operand))
+return true;
+
+}
+  return false;
+}
+
+bool
+aarch64_madd_needs_nop (rtx_insn* insn)
+{
+  enum attr_type attr_type;
+  rtx_insn *prev;
+  rtx body;
+
+  if (!aarch64_fix_a53_err835769)
+return false;
+
+  if (recog_memoized (insn) < 0)
+