date:20210418

Re: [PATCH] docs: Remove empty table column.

2021-04-18 Thread Paul Richard Thomas via Gcc-patches

Hi Martin,

It looks good to me - please push before release.

Thanks

Paul


On Mon, 12 Apr 2021 at 16:59, Martin Liška  wrote:

> A column with empty values seems suspicious.
>
> Ready to be installed?
> Thanks,
> Martin
>
> gcc/fortran/ChangeLog:
>
> * intrinsic.texi: The table has first column empty and it makes
> trouble when processing makeinfo --xml output.
> ---
>  gcc/fortran/intrinsic.texi | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
> index 73baa34104e..c74a7f56c60 100644
> --- a/gcc/fortran/intrinsic.texi
> +++ b/gcc/fortran/intrinsic.texi
> @@ -4764,15 +4764,15 @@ Unavailable time and date parameters return blanks.
>
>  @var{VALUES} is @code{INTENT(OUT)} and provides the following:
>
> -@multitable @columnfractions .15 .30 .40
> -@item @tab @code{VALUE(1)}: @tab The year
> -@item @tab @code{VALUE(2)}: @tab The month
> -@item @tab @code{VALUE(3)}: @tab The day of the month
> -@item @tab @code{VALUE(4)}: @tab Time difference with UTC in minutes
> -@item @tab @code{VALUE(5)}: @tab The hour of the day
> -@item @tab @code{VALUE(6)}: @tab The minutes of the hour
> -@item @tab @code{VALUE(7)}: @tab The seconds of the minute
> -@item @tab @code{VALUE(8)}: @tab The milliseconds of the second
> +@multitable @columnfractions .15 .70
> +@item @code{VALUE(1)}: @tab The year
> +@item @code{VALUE(2)}: @tab The month
> +@item @code{VALUE(3)}: @tab The day of the month
> +@item @code{VALUE(4)}: @tab Time difference with UTC in minutes
> +@item @code{VALUE(5)}: @tab The hour of the day
> +@item @code{VALUE(6)}: @tab The minutes of the hour
> +@item @code{VALUE(7)}: @tab The seconds of the minute
> +@item @code{VALUE(8)}: @tab The milliseconds of the second
>  @end multitable
>
>  @item @emph{Standard}:
> --
> 2.31.1
>
>

-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein

Re: [Bug libstdc++/99402] [10 Regression] std::copy creates _GLIBCXX_DEBUG false positive for attempt to subscript a dereferenceable (start-of-sequence) iterator

2021-04-18 Thread François Dumont via Gcc-patches


Hi

    Ok to backport this to gcc-10 branch ?

    Tested under Linux x86_64.

François


On 13/04/21 10:51 pm, redi at gcc dot gnu.org wrote:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99402

Jonathan Wakely  changed:

What|Removed |Added

 Summary|[10/11 Regression]  |[10 Regression] std::copy
|std::copy creates   |creates _GLIBCXX_DEBUG
|_GLIBCXX_DEBUG false|false positive for attempt
|positive for attempt to |to subscript a
|subscript a dereferenceable |dereferenceable
|(start-of-sequence) |(start-of-sequence)
|iterator|iterator

--- Comment #10 from Jonathan Wakely  ---
This was fixed on trunk by r11-8100:

libstdc++: [_GLIBCXX_DEBUG] Fix management of __dp_sign_max_size [PR 99402]

__dp_sign precision indicates that we found out what iterator comes first or
last in the range. __dp_sign_max_size is the same plus it gives the information
of the max size of the range that is to say the max_size value such that
distance(lhs, rhs) < max_size.
Thanks to this additional information we are able to tell when a copy of n
elements
to that range will fail even if we do not know exactly how large it is.

This patch makes sure that we are properly using this information.

libstdc++-v3/ChangeLog:

 PR libstdc++/99402
 * include/debug/helper_functions.h (__can_advance(_InputIterator,
 const std::pair<_Diff, _Distance_precision>&, int)): New.
 (__can_advance(const _Safe_iterator<>&,
 const std::pair<_Diff, _Distance_precision>&, int)): New.
 * include/debug/macros.h (__glibcxx_check_can_increment_dist): New,
 use latter.
 (__glibcxx_check_can_increment_range): Adapt to use latter.
 (__glibcxx_check_can_decrement_range): Likewise.
 * include/debug/safe_iterator.h
 (_Safe_iterator<>::_M_can_advance(const std::pair<_Diff,
_Distance_precision>&,
 int)): New.
 (__can_advance(const _Safe_iterator<>&,
 const std::pair<_Diff, _Distance_precision>&, int)): New.
 * include/debug/safe_iterator.tcc
 (_Safe_iterator<>::_M_can_advance(const std::pair<_Diff,
_Distance_precision>&,
 int)): New.
 (_Safe_iterator<>::_M_valid_range(const _Safe_iterator<>&,
 std::pair&, bool)): Adapt for
 __dp_sign_max_size.
 (__copy_move_a): Adapt to use __glibcxx_check_can_increment_dist.
 (__copy_move_backward_a): Likewise.
 (__equal_aux): Likewise.
 * include/debug/stl_iterator.h (__can_advance(const
std::reverse_iterator<>&,
 const std::pair<_Diff, _Distance_precision>&, int)): New.
 (__can_advance(const std::move_iterator<>&,
 const std::pair<_Diff, _Distance_precision>&, int)): New.
 * testsuite/25_algorithms/copy/debug/99402.cc: New test.



diff --git a/libstdc++-v3/include/debug/helper_functions.h b/libstdc++-v3/include/debug/helper_functions.h
index 62d5309257f..9e2d206f06c 100644
--- a/libstdc++-v3/include/debug/helper_functions.h
+++ b/libstdc++-v3/include/debug/helper_functions.h
@@ -64,7 +64,7 @@ namespace __gnu_debug
 {
 private:
   typedef
-  typename std::iterator_traits<_Iterator>::difference_type _ItDiffType;
+	typename std::iterator_traits<_Iterator>::difference_type _ItDiffType;
 
   template::__type>
@@ -287,6 +287,18 @@ namespace __gnu_debug
 __can_advance(const _Safe_iterator<_Iterator, _Sequence, _Category>&,
 		  _Size);
 
+  template
+_GLIBCXX_CONSTEXPR
+inline bool
+__can_advance(_InputIterator, const std::pair<_Diff, _Distance_precision>&, int)
+{ return true; }
+
+  template
+bool
+__can_advance(const _Safe_iterator<_Iterator, _Sequence, _Category>&,
+		  const std::pair<_Diff, _Distance_precision>&, int);
+
   /** Helper function to extract base iterator of random access safe iterator
*  in order to reduce performance impact of debug mode.  Limited to random
*  access iterator because it is the only category for which it is possible
diff --git a/libstdc++-v3/include/debug/macros.h b/libstdc++-v3/include/debug/macros.h
index 73fb50d0cbd..1f45f8e0adc 100644
--- a/libstdc++-v3/include/debug/macros.h
+++ b/libstdc++-v3/include/debug/macros.h
@@ -104,6 +104,12 @@ _GLIBCXX_DEBUG_VERIFY(__gnu_debug::__can_advance(_First, _Size),	\
 		  ._M_iterator(_First, #_First)			\
 		  ._M_integer(_Size, #_Size))
 
+#define __glibcxx_check_can_increment_dist(_First,_Dist,_Way)		\
+  _GLIBCXX_DEBUG_VERIFY(__gnu_debug::__can_advance(_First, _Dist, _Way), \
+		  _M_message(__gnu_debug::__msg_iter_subscript_oob)	\
+		  ._M_iterator(_First, #_First)			\
+		  ._M_integer(_Way * _Dist.first, #_Dist))
+
 #define __glibcxx_check_can_increment_range(_First1

Re: [Bug libstdc++/99402] [10 Regression] std::copy creates _GLIBCXX_DEBUG false positive for attempt to subscript a dereferenceable (start-of-sequence) iterator

2021-04-18 Thread Jonathan Wakely via Gcc-patches

On Sun, 18 Apr 2021, 15:01 François Dumont via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

> Hi
>
>  Ok to backport this to gcc-10 branch ?
>


Yes please, thanks.



>  Tested under Linux x86_64.
>
> François
>
>
> On 13/04/21 10:51 pm, redi at gcc dot gnu.org wrote:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99402
> >
> > Jonathan Wakely  changed:
> >
> > What|Removed |Added
> >
> 
> >  Summary|[10/11 Regression]  |[10 Regression]
> std::copy
> > |std::copy creates   |creates _GLIBCXX_DEBUG
> > |_GLIBCXX_DEBUG false|false positive for
> attempt
> > |positive for attempt to |to subscript a
> > |subscript a dereferenceable |dereferenceable
> > |(start-of-sequence) |(start-of-sequence)
> > |iterator|iterator
> >
> > --- Comment #10 from Jonathan Wakely  ---
> > This was fixed on trunk by r11-8100:
> >
> > libstdc++: [_GLIBCXX_DEBUG] Fix management of __dp_sign_max_size [PR
> 99402]
> >
> > __dp_sign precision indicates that we found out what iterator comes
> first or
> > last in the range. __dp_sign_max_size is the same plus it gives the
> information
> > of the max size of the range that is to say the max_size value such that
> > distance(lhs, rhs) < max_size.
> > Thanks to this additional information we are able to tell when a copy of
> n
> > elements
> > to that range will fail even if we do not know exactly how large it is.
> >
> > This patch makes sure that we are properly using this information.
> >
> > libstdc++-v3/ChangeLog:
> >
> >  PR libstdc++/99402
> >  * include/debug/helper_functions.h
> (__can_advance(_InputIterator,
> >  const std::pair<_Diff, _Distance_precision>&, int)): New.
> >  (__can_advance(const _Safe_iterator<>&,
> >  const std::pair<_Diff, _Distance_precision>&, int)): New.
> >  * include/debug/macros.h (__glibcxx_check_can_increment_dist):
> New,
> >  use latter.
> >  (__glibcxx_check_can_increment_range): Adapt to use latter.
> >  (__glibcxx_check_can_decrement_range): Likewise.
> >  * include/debug/safe_iterator.h
> >  (_Safe_iterator<>::_M_can_advance(const std::pair<_Diff,
> > _Distance_precision>&,
> >  int)): New.
> >  (__can_advance(const _Safe_iterator<>&,
> >  const std::pair<_Diff, _Distance_precision>&, int)): New.
> >  * include/debug/safe_iterator.tcc
> >  (_Safe_iterator<>::_M_can_advance(const std::pair<_Diff,
> > _Distance_precision>&,
> >  int)): New.
> >  (_Safe_iterator<>::_M_valid_range(const _Safe_iterator<>&,
> >  std::pair&, bool)): Adapt
> for
> >  __dp_sign_max_size.
> >  (__copy_move_a): Adapt to use
> __glibcxx_check_can_increment_dist.
> >  (__copy_move_backward_a): Likewise.
> >  (__equal_aux): Likewise.
> >  * include/debug/stl_iterator.h (__can_advance(const
> > std::reverse_iterator<>&,
> >  const std::pair<_Diff, _Distance_precision>&, int)): New.
> >  (__can_advance(const std::move_iterator<>&,
> >  const std::pair<_Diff, _Distance_precision>&, int)): New.
> >  * testsuite/25_algorithms/copy/debug/99402.cc: New test.
> >
>
>

[PATCH] combine: Don't create REG_UNUSED notes if the reg already died (PR99927)

2021-04-18 Thread Segher Boessenkool

If the register named in an existing REG_UNUSED note dies somewhere
between where the note used to be and I3, we should just drop it.

2021-04-21  Segher Boessenkool  

PR rtl-optimization/99927
* combine.c (distribute_notes) [REG_UNUSED]: If the register already
is dead, just drop it.
---

Committed to trunk.  This will need backports to all open branches.


Segher


 gcc/combine.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/combine.c b/gcc/combine.c
index 9063a07bd009..62bf4aeaabae 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -14366,6 +14366,11 @@ distribute_notes (rtx notes, rtx_insn *from_insn, 
rtx_insn *i3, rtx_insn *i2,
 we keep notes from i2 or i1 if they will turn into REG_DEAD
 notes.  */
 
+ /* If this register is set or clobbered between FROM_INSN and I3,
+we should not create a note for it.  */
+ if (reg_set_between_p (XEXP (note, 0), from_insn, i3))
+   break;
+
  /* If this register is set or clobbered in I3, put the note there
 unless there is one already.  */
  if (reg_set_p (XEXP (note, 0), PATTERN (i3)))
-- 
1.8.3.1

[Patch, fortran] PR fortran/100136 - ICE, regression, using flag -fcheck=pointer

2021-04-18 Thread José Rui Faustino de Sousa via Gcc-patches


Hi All!

Proposed patch to:

PR100136 - ICE, regression, using flag -fcheck=pointer

Patch tested only on x86_64-pc-linux-gnu.

Add handling for pointer expressions.

Thank you very much.

Best regards,
José Rui

Fortran: Fix ICE with -fcheck=pointer [PR100136]

gcc/fortran/ChangeLog:

PR fortran/100136
* trans-expr.c (gfc_conv_procedure_call): Add handling of pointer
expressions.

gcc/testsuite/ChangeLog:

PR fortran/100136
* gfortran.dg/PR100136.f90: New test.

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 213f32b0a67..249b3904cdb 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -6782,16 +6782,15 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 	  else
 		goto end_pointer_check;
 
+	  tmp = parmse.expr;
 	  if (fsym && fsym->ts.type == BT_CLASS)
 		{
-		  tmp = build_fold_indirect_ref_loc (input_location,
-		  parmse.expr);
+		  if (POINTER_TYPE_P (tmp))
+		tmp = build_fold_indirect_ref_loc (input_location, tmp);
 		  tmp = gfc_class_data_get (tmp);
 		  if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (tmp)))
 		tmp = gfc_conv_descriptor_data_get (tmp);
 		}
-	  else
-		tmp = parmse.expr;
 
 	  /* If the argument is passed by value, we need to strip the
 		 INDIRECT_REF.  */
diff --git a/gcc/testsuite/gfortran.dg/PR100136.f90 b/gcc/testsuite/gfortran.dg/PR100136.f90
new file mode 100644
index 000..931a4796846
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR100136.f90
@@ -0,0 +1,40 @@
+! { dg-do run }
+! { dg-options "-fcheck=pointer" }
+! { dg-shouldfail "Argument not allocated" }
+! { dg-output "Fortran runtime error: Allocatable actual argument 'c_init2' is not allocated" }
+!
+! Tests fix for PR100136
+! 
+! Test cut down from PR58586
+! 
+
+module test_pr58586_mod
+  implicit none
+
+  type :: a
+  end type
+
+  type :: c
+ type(a), allocatable :: a
+  end type
+
+contains
+
+  subroutine add_class_c (d)
+class(c), value :: d
+  end subroutine
+
+  class(c) function c_init2()
+allocatable :: c_init2
+  end function
+
+end module test_pr58586_mod
+
+program test_pr58586
+  use test_pr58586_mod
+
+  ! This needs to execute, to see whether the segfault at runtime is resolved
+  call add_class_c(c_init2())
+
+end program
+

Re: [PATCH] combine: Don't create REG_UNUSED notes if the reg already died (PR99927)

2021-04-18 Thread Jakub Jelinek via Gcc-patches

On Sun, Apr 18, 2021 at 03:03:07PM +, Segher Boessenkool wrote:
> If the register named in an existing REG_UNUSED note dies somewhere
> between where the note used to be and I3, we should just drop it.
> 
> 2021-04-21  Segher Boessenkool  
> 
>   PR rtl-optimization/99927
>   * combine.c (distribute_notes) [REG_UNUSED]: If the register already
>   is dead, just drop it.
> ---
> 
> Committed to trunk.  This will need backports to all open branches.

Thanks for working on this.  Just some nits but note that I don't know much
about the combiner...

>  gcc/combine.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 9063a07bd009..62bf4aeaabae 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -14366,6 +14366,11 @@ distribute_notes (rtx notes, rtx_insn *from_insn, 
> rtx_insn *i3, rtx_insn *i2,
>we keep notes from i2 or i1 if they will turn into REG_DEAD
>notes.  */
>  
> +   /* If this register is set or clobbered between FROM_INSN and I3,
> +  we should not create a note for it.  */
> +   if (reg_set_between_p (XEXP (note, 0), from_insn, i3))
> + break;
> +
> /* If this register is set or clobbered in I3, put the note there
>unless there is one already.  */
> if (reg_set_p (XEXP (note, 0), PATTERN (i3)))

Doesn't this make the
  if (from_insn != i3 && i2 && INSN_P (i2)
  && reg_referenced_p (XEXP (note, 0), PATTERN (i2)))
{
  if (!reg_set_p (XEXP (note, 0), PATTERN (i2)))
PUT_REG_NOTE_KIND (note, REG_DEAD);
  if (! (REG_P (XEXP (note, 0))
 ? find_regno_note (i2, REG_NOTE_KIND (note),
REGNO (XEXP (note, 0)))
 : find_reg_note (i2, REG_NOTE_KIND (note),
  XEXP (note, 0
place = i2;
}
case unreachable (the reg_set_p stuff at least; I mean if
reg_set_p is true on i2 and i2 is in between from_ins and i3, then
reg_set_between_p will surely be true)?
And, shouldn't the
  record_value_for_reg (XEXP (note, 0), NULL, NULL_RTX);
be called in some cases?

To me it would make more sense to add the if (reg_set_between_p (...)) break;
to the individual cases later, so before
  if (! (REG_P (XEXP (note, 0))
 ? find_regno_note (i3, REG_UNUSED, REGNO (XEXP (note, 0)))
 : find_reg_note (i3, REG_UNUSED, XEXP (note, 0
place = i3;
and before
  PUT_REG_NOTE_KIND (note, REG_DEAD);
  place = i3;
and into the
  if (from_insn != i3 && i2 && INSN_P (i2)
  && reg_referenced_p (XEXP (note, 0), PATTERN (i2)))
but there just checking if it isn't set in between from_insn and i2

Jakub

Re: [PATCH] combine: Don't create REG_UNUSED notes if the reg already died (PR99927)

2021-04-18 Thread Segher Boessenkool

Hi!

On Sun, Apr 18, 2021 at 05:24:50PM +0200, Jakub Jelinek wrote:
> On Sun, Apr 18, 2021 at 03:03:07PM +, Segher Boessenkool wrote:
> > If the register named in an existing REG_UNUSED note dies somewhere
> > between where the note used to be and I3, we should just drop it.
> > 
> > 2021-04-21  Segher Boessenkool  
> > 
> > PR rtl-optimization/99927
> > * combine.c (distribute_notes) [REG_UNUSED]: If the register already
> > is dead, just drop it.
> > ---
> > 
> > Committed to trunk.  This will need backports to all open branches.
> 
> Thanks for working on this.  Just some nits but note that I don't know much
> about the combiner...
> 
> >  gcc/combine.c | 5 +
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/gcc/combine.c b/gcc/combine.c
> > index 9063a07bd009..62bf4aeaabae 100644
> > --- a/gcc/combine.c
> > +++ b/gcc/combine.c
> > @@ -14366,6 +14366,11 @@ distribute_notes (rtx notes, rtx_insn *from_insn, 
> > rtx_insn *i3, rtx_insn *i2,
> >  we keep notes from i2 or i1 if they will turn into REG_DEAD
> >  notes.  */
> >  
> > + /* If this register is set or clobbered between FROM_INSN and I3,
> > +we should not create a note for it.  */
> > + if (reg_set_between_p (XEXP (note, 0), from_insn, i3))
> > +   break;
> > +
> >   /* If this register is set or clobbered in I3, put the note there
> >  unless there is one already.  */
> >   if (reg_set_p (XEXP (note, 0), PATTERN (i3)))
> 
> Doesn't this make the
>   if (from_insn != i3 && i2 && INSN_P (i2)
>   && reg_referenced_p (XEXP (note, 0), PATTERN (i2)))
> {
>   if (!reg_set_p (XEXP (note, 0), PATTERN (i2)))
> PUT_REG_NOTE_KIND (note, REG_DEAD);
>   if (! (REG_P (XEXP (note, 0))
>  ? find_regno_note (i2, REG_NOTE_KIND (note),
> REGNO (XEXP (note, 0)))
>  : find_reg_note (i2, REG_NOTE_KIND (note),
>   XEXP (note, 0
> place = i2;
> }
> case unreachable (the reg_set_p stuff at least; I mean if
> reg_set_p is true on i2 and i2 is in between from_ins and i3, then
> reg_set_between_p will surely be true)?

reg_set_between_p is exclusive of the endpoints, so say for example
from_insn is i2, and that is where the set is, then your quoted code can
fire, while the new code doesn't normally.

> And, shouldn't the
>   record_value_for_reg (XEXP (note, 0), NULL, NULL_RTX);
> be called in some cases?

I don't see why?  We don't remove any clobber or set, just the REG_UNUSED
note?

> To me it would make more sense to add the if (reg_set_between_p (...)) break;
> to the individual cases later, so before
>   if (! (REG_P (XEXP (note, 0))
>  ? find_regno_note (i3, REG_UNUSED, REGNO (XEXP (note, 
> 0)))
>  : find_reg_note (i3, REG_UNUSED, XEXP (note, 0
> place = i3;
> and before
>   PUT_REG_NOTE_KIND (note, REG_DEAD);
>   place = i3;
> and into the
>   if (from_insn != i3 && i2 && INSN_P (i2)
>   && reg_referenced_p (XEXP (note, 0), PATTERN (i2)))
> but there just checking if it isn't set in between from_insn and i2

But the REG_UNUSED note should just be dropped in all these cases, so it
is much simpler code to do it like this.  Or am I missing something?


Ideally we can revamp all of this to just use DF directly, instead of
using notes etc. for it, but that is hard to do.  My plan is to attack
this from the other direction first: replace reg_stat with something
that works better, is more modern, much simpler, much less maintenace,
and as a bonus can be used everywhere, not just in combine.  We'll see
how well that works :-)


Segher

Re: [PATCH] libstdc++: Update some baseline_symbols.txt

2021-04-18 Thread H.J. Lu via Gcc-patches

On Fri, Apr 16, 2021 at 9:25 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> Hi!
>
> As we have only one P1 left right now, I think it is the right time
> to update abi list files in libstdc++.
>
> Attached are two patches, one is update for x86_64/i?86/s390x/ppc64
> linux (aarch64 seems to be correct already), the other one is the
> ppc64 -> ppc64le diff.  I have yet to test what exactly it will do
> on ppc64, whether we can share the same baseline_symbols.txt and it
> will ignore the IEEE128 symbols like it should be ignoring LDBL symbols
> when double double is not configured in, or whether we need
> to have separate powerpc64le-linux-gnu directory.
>
> Jakub

Here is the corresponding patch for x32.  OK for master?

Thanks.

-- 
H.J.


0001-libstdc-Update-some-baseline_symbols.txt-x32.patch
Description: Binary data

Re: [PATCH,FORTRAN 00/29] Move towards stringpool, part 1

2021-04-18 Thread Bernhard Reutner-Fischer via Gcc-patches

On Fri, 7 Sep 2018 10:30:30 +0200
Bernhard Reutner-Fischer  wrote:

> On Thu, 6 Sep 2018 at 03:25, Jerry DeLisle  wrote:
> >
> > On 09/05/2018 07:57 AM, Bernhard Reutner-Fischer wrote:  
> > > Hi,
> > >
> > > The fortran frontend still uses stack-based handling of (symbol) names
> > > with fixed-sized buffers. Furthermore these buffers often are too small
> > > when dealing with F2003 identifiers which can be up to, including 63
> > > bytes long.
> > >
> > > Other frontends use the stringpool since many years.
> > > This janitorial series is a first step towards using the stringpool in
> > > the frontend.
> > > Consequently this allows us to use pointer-comparison to see if two
> > > given "names" are identical instead of doing lots and lots of string
> > > comparisons.
> > >
> > >
> > > Part 1 switches most of the fortran FE. An eventual part 2 would
> > > continue to switch the few remaining stack-based identifier
> > > manipulations to use the stringpool. My initial plan was to also see if
> > > switching gfc_symtree from treap to a hash_map would bring us any
> > > measurable benefit, but that, too, is left for an eventual part 2.
> > >
> > > Bootstrapped and regtested on x86_64-foo-linux.
> > >
> > > I'd appreciate if someone could double check for regressions on other
> > > setups. Git branch:
> > > https://gcc.gnu.org/git/?p=gcc.git;a=log;h=refs/heads/aldot/fortran-fe-stringpool
> > >  
> >
> > I am not so git savvy as I would like. If you could send me a single
> > patch to trunk I will try to test on a FreeBSD system. It is usually a
> > bit more picky than linux..  
> 
> I've just encountered a regression in module writing, let me have a
> looksie at that first so i don't waste your time.
> 
> PS: You'd:
> git clone git://gcc.gnu.org/git/gcc.git ~/src/gcc-9.0-stringpool
> cd ~/src/gcc-9.0-stringpool
> git checkout aldot/fortran-fe-stringpool
> # this should output:
> # Branch aldot/fortran-fe-stringpool set up to track remote branch
> aldot/fortran-fe-stringpool from origin.
> # if your git client is too old, then do it manually:
> # git checkout -b aldot/fortran-fe-stringpool --track
> origin/aldot/fortran-fe-stringpool
> mkdir -p ~/obj/gcc-9.0-stringpool
> cd !$
> ../../src/gcc-9.0-stringpool/configure --prefix /opt/. && make
> install && make -k check;# as usual
> 
> I'll send you a full patch when i had a chance to track down the
> module writing bug.

IIRC i fixed the abovementioned bug and pushed it to the branch.

Unfortunately the abovementioned personal git branch somehow was
deleted in the meantime.

Now i was about to rebase my local tree in the light of the upcoming
GCC-11 release and i'm confronted with a couple of conflicts that stem
from /me touching all these spots that are troublesome _and_ slow (due
to gazillions of strcmp calls to match identifiers; Part of the .plan
was to see if sacrificing a little bit of memory to maintain hash_map of
identifier_pointers would pay off WRT way faster comparison -- pointer
cmp versus strcmp as noted above. ISTR that's what other frontends do
since decades and i think it would make sense for the fortran FE too).

Iff there is any interest in eventually accepting abovementioned
direction into the tree, either part1 only which has merity on his
own - as you can see from recent asan induced tweaks to our identifier
handling - or also part 2 which encompasses the endavour to speedup the
actual matching of identifiers and hence speed up some small parts of
the frontend then i'm willing to rebase the series.

Of course if nobody thinks that the proposed direction is the right
thing to do then i will happily delete the effort so far. Saves quite
some of my sparetime.
For posteriority, a previous series of part1 was archived here:
https://gcc.gnu.org/pipermail/gcc-patches/2018-September/thread.html#506111

As can bee seen in the series and was in part explained in
https://gcc.gnu.org/pipermail/gcc-patches/2018-September/506849.html
this series changes module format a little bit. Specifically, several
spots do (or did back then, at least) use NULL module names in certain
situations in a non-obvious way (i wouldn't dare to say hackish for
i do not remember the gory details offhand i admit). I believe i
initially wrote the series even before submodules but AFAIR they did
not interfere in any noticable way. Modules were misbehaving but
submodules not, IIRC.

The fact that i need to change the module content was the main reason
why i did not apply the batch back then, even if Jerry was kind
enough to OK the submission as is, as you can see in above thread.
So i again reach out for consensus so to maybe get this improvement
into GCC-12 iff deemed appropriate and useful. Or i'll just drop it for
good.

thanks,

[PATCH] wwwdocs: Do not rewrite the page titles

2021-04-18 Thread Jonathan Wakely via Gcc-patches

Remove GNU and FSF attribution from HTML page titles.


I don't see why we should have to "comply with the GNU style" if we're
truly an independent project run by the GCC developers and aided by
the steering committee.

OK for wwwdocs?


commit d157082f49725510560cb83f2f1c045e2968ad3b
Author: Jonathan Wakely 
Date:   Sun Apr 18 23:42:26 2021 +0100

Do not rewrite the page titles

Remove GNU and FSF attribution from HTML page titles.

diff --git a/htdocs/style.mhtml b/htdocs/style.mhtml
index 677fd05a..eb7c353d 100644
--- a/htdocs/style.mhtml
+++ b/htdocs/style.mhtml
@@ -32,14 +32,6 @@
  
 
 
-;;; Redefine the  tag to comply with the GNU style.
-
-
-
-%body
-- GNU Project - Free Software Foundation (FSF)
-
-
 ;;; Redefine the  tag, adding navigation and a standard footer.

Re: [PATCH 1/3] openacc: Add support for gang local storage allocation in shared memory

2021-04-18 Thread Andrew Stubbs


On 16/04/2021 18:30, Thomas Schwinge wrote:

Hi!

On 2021-04-16T17:05:24+0100, Andrew Stubbs  wrote:

On 15/04/2021 18:26, Thomas Schwinge wrote:

and optimisation, since shared memory might be faster than
the main memory on a GPU.


Do we potentially have a problem that making more use of (scarce)
gang-private memory may negatively affect peformance, because potentially
fewer OpenACC gangs may then be launched to the GPU hardware in parallel?
(Of course, OpenACC semantics conformance firstly is more important than
performance, but there may be ways to be conformant and performant;
"quality of implementation".)  Have you run any such performance testing
with the benchmarking codes that we've got set up?

(As I'm more familiar with that, I'm using nvptx offloading examples in
the following, whilst assuming that similar discussion may apply for GCN
offloading, which uses similar hardware concepts, as far as I remember.)


Yes, that could happen.


Thanks for sharing the GCN perspective.


However, there's space for quite a lot of
scalars before performance is affected: 64KB of LDS memory shared by a
hardware-defined maximum of 40 threads


(Instead of threads, something like thread blocks, I suppose?)


Workers. Wavefronts. The terminology is so confusing for these cases! 
They look like CPU threads running SIMD instructions, at least on GCN. 
OpenMP calls them threads.


Each GCN compute unit can run up to 40 of them. A gang can have up to 16 
workers (in AMD terminology, a work group can have up 16 wavefronts), so 
each compute unit will usually have at least two gangs, meaning each 
gang would get 32KB local memory. If there are no worker loops then you 
get 40 gangs (of one worker each) per compute unit, hence the minimum of 
1.5KB per gang.


The local memory is specific to the compute unit and gangs launched 
there will stay there until they're done, so the 40 gangs really is the 
limit for memory division. If you launch more gangs than there are 
resources then they get queued, so the memory doesn't get divided any more.



gives about 1.5KB of space for
worker-reduction variables and gang-private variables.


PTX, as I understand this, may generally have a lot of Thread Blocks in
flight: all for the same GPU kernel as well as any GPU kernels running
asynchronously/generally concurrently (system-wide), and libgomp does try
launching a high number of Thread Blocks ('num_gangs') (for purposes of
hiding memory access latency?).  Random example:

 nvptx_exec: kernel t0_r$_omp_fn$0: launch gangs=1920, workers=32, 
vectors=32

With that, PTX's 48 KiB of '.shared' memory per SM (processor) are then
not so much anymore: just '48 * 1024 / 1920 = 25' bytes of gang-private
memory available for each of the 1920 gangs: 'double x, y, z'?  (... for
the simple case where just one GPU kernel is executing.)


Your maths feels way off to me. That's not enough memory for any use, 
and it's not the only resource that will be stretched thin: how many GPU 
registers does an SM have? (I doubt that register contents are getting 
paged in and out.)


For comparison, with the maximum num_workers(16) GCN can run only 2 
gangs on each compute unit. Each compute unit can run 40 gangs 
simultaneously with num_workers(1), but that is the limit. If you launch 
more gangs than that then they are queued; even if you launch 100,000 
single-worker gangs, each one will still get 1/40th of the resources.


I doubt that NVPTX is magically running 1920 gangs of 32 workers on one 
SM without any queueing and with the gang resources split 1920 ways (and 
the worker resources split 61440 ways).



(I suppose that calculation is valid for a GPU hardware variant where
there is just one SM.  If there are several (typically in the order of a
few dozens?), I suppose the Thread Blocks launched will be distributed
over all these, thus improving the situation correspondingly.)

(And of course, there are certainly other factors that also limit the
number of Thread Blocks that are actually executing in parallel.)


We might have a
problem if there are large private arrays.


Yes, that's understood.

Also, directly related, the problem that comes with supporting
worker-private memory, which basically calculates to the amount necessary
for gang-private memory multiplied by the number of workers?  (Out of
scope at present.)


GCN just uses the stack space for that, which lives in main memory. 
That's limited resource, of course, but it's not architectural. I don't 
know what NVPTX does here.



I believe we have a "good enough" solution for the usual case


So you believe that.  ;-)

It's certainly what I'd hope, too!  But we don't know yet whether there's
any noticeable performance impact if we run with (potentially) lesser
parallelism, hence my question whether this patch has been run through
performance testing.


Well, indeed I don't know the comparative situation with benchmark 
results because the benchmarks couldn't run at full occupancy, on GCN

[PATCH] or1k: Add mcmodel option to handle large GOTs

2021-04-18 Thread Stafford Horne via Gcc-patches

When building libgeos we get an error with:

linux-uclibc/9.3.0/crtbeginS.o: in function `__do_global_dtors_aux':
crtstuff.c:(.text+0x118): relocation truncated to fit: R_OR1K_GOT16 against 
symbol `__cxa_finalize' defined in .text section in

/home/shorne/work/openrisc/3eb9f9d0f6d8274b2d19753c006bd83f7d536e3c/output/host/or1k-buildroot-linux-uclibc/sysroot/lib/libc.so.

This is caused by GOT code having a limit of 64k.  In OpenRISC this
looks to be the only relocation code pattern to be limited to 64k.

This patch allows specifying a new option -mcmodel=large which can be
used to generate 2 more instructions to construct 32-bit addresses for
up to 4G GOTs.

gcc/ChangeLog:

PR 99783
* config/or1k/or1k-opts.h: New file.
* config/or1k/or1k.c (or1k_legitimize_address_1, print_reloc):
Support generating gotha relocations if -mcmodel=large is
specified.
* config/or1k/or1k.h (TARGET_CMODEL_SMALL, TARGET_CMODEL_LARGE):
New macros.
* config/or1k/or1k.opt (mcmodel=): New option.
* doc/invoke.text (OpenRISC Options): Document mcmodel.
---

This depends on the binutils-gdb patch sent here:
 - https://sourceware.org/pipermail/binutils/2021-April/116155.html

 gcc/config/or1k/or1k-opts.h | 30 ++
 gcc/config/or1k/or1k.c  | 11 +--
 gcc/config/or1k/or1k.h  |  7 +++
 gcc/config/or1k/or1k.opt| 19 +++
 gcc/doc/invoke.texi | 12 +++-
 5 files changed, 76 insertions(+), 3 deletions(-)
 create mode 100644 gcc/config/or1k/or1k-opts.h

diff --git a/gcc/config/or1k/or1k-opts.h b/gcc/config/or1k/or1k-opts.h
new file mode 100644
index 000..f791b894fdd
--- /dev/null
+++ b/gcc/config/or1k/or1k-opts.h
@@ -0,0 +1,30 @@
+/* Definitions for option handling for OpenRISC.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   Contributed by Stafford Horne.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+#ifndef GCC_OR1K_OPTS_H
+#define GCC_OR1K_OPTS_H
+
+/* The OpenRISC code generation models available.  */
+enum or1k_cmodel_type {
+  CMODEL_SMALL,
+  CMODEL_LARGE
+};
+
+#endif /* GCC_OR1K_OPTS_H */
diff --git a/gcc/config/or1k/or1k.c b/gcc/config/or1k/or1k.c
index e772a7addea..27d3fa17995 100644
--- a/gcc/config/or1k/or1k.c
+++ b/gcc/config/or1k/or1k.c
@@ -750,7 +750,14 @@ or1k_legitimize_address_1 (rtx x, rtx scratch)
{
  base = gen_sym_unspec (base, UNSPEC_GOT);
  crtl->uses_pic_offset_table = 1;
- t2 = gen_rtx_LO_SUM (Pmode, pic_offset_table_rtx, base);
+ if (TARGET_CMODEL_LARGE)
+   {
+ emit_insn (gen_rtx_SET (t1, gen_rtx_HIGH (Pmode, base)));
+ emit_insn (gen_add3_insn (t1, t1, pic_offset_table_rtx));
+ t2 = gen_rtx_LO_SUM (Pmode, t1, base);
+   }
+ else
+   t2 = gen_rtx_LO_SUM (Pmode, pic_offset_table_rtx, base);
  t2 = gen_const_mem (Pmode, t2);
  emit_insn (gen_rtx_SET (t1, t2));
  base = t1;
@@ -1089,7 +1096,7 @@ print_reloc (FILE *stream, rtx x, HOST_WIDE_INT add, 
reloc_kind kind)
  no special markup.  */
   static const char * const relocs[RKIND_MAX][RTYPE_MAX] = {
 { "lo", "got", "gotofflo", "tpofflo", "gottpofflo", "tlsgdlo" },
-{ "ha", NULL,  "gotoffha", "tpoffha", "gottpoffha", "tlsgdhi" },
+{ "ha", "gotha", "gotoffha", "tpoffha", "gottpoffha", "tlsgdhi" },
   };
   reloc_type type = RTYPE_DIRECT;
 
diff --git a/gcc/config/or1k/or1k.h b/gcc/config/or1k/or1k.h
index fe01ab81ead..669907e7e74 100644
--- a/gcc/config/or1k/or1k.h
+++ b/gcc/config/or1k/or1k.h
@@ -21,6 +21,8 @@
 #ifndef GCC_OR1K_H
 #define GCC_OR1K_H
 
+#include "config/or1k/or1k-opts.h"
+
 /* Names to predefine in the preprocessor for this target machine.  */
 #define TARGET_CPU_CPP_BUILTINS()  \
   do   \
@@ -37,6 +39,11 @@
 }  \
   while (0)
 
+#define TARGET_CMODEL_SMALL \
+  (or1k_code_model == CMODEL_SMALL)
+#define TARGET_CMODEL_LARGE \
+  (or1k_code_model == CMODEL_LARGE)
+
 /* Storage layout.  */
 
 #define DEFAULT_SIGNED_CHAR 1
diff --git a/gcc/config/or1k/or1k.opt b/gcc/config/or1k/or1k.opt
index 6bd0f3eee6d..cc23e3b8856 100644
--- a/gcc/config/or1k/or1k.opt
+++

[PATCH] PR tree-optimization/100081 - Limit depth of logical expression windback.

2021-04-18 Thread Andrew MacLeod via Gcc-patches

The code presented at the time wrestrict was invoking ranger has a basic 
block with about 7200 statements, all of them calculations which fed 
into 2614 logical ORs and 1480 logical ANDs.


the GORI component which calculates outgoing ranges starts at the exit 
of the block and works it way back  thru the basic block to the ssa_name 
requested, solving the equations along the way.


Logical expressions require a TRUE and FALSE results to be calculated 
for the ssa_name for each of two condition register operands on the 
logical expression.  Sometimes we can shortcut it, but frequently it its 
requires all 4 to properly evaluate.


When logical operations feed each other, this can become exponential.. 
and that was the situation here.


We had introduced a logical cache to attempt to reduce the number of 
calculations, but it was insufficient in this case.  We had previously 
discussed among ourselves that limiting the depth was probably 
sufficient, since the odds of being able to extract a meaningful range 
thru a large series of logical operations becomes highly unlikely.


This patch implements the depth limiter for logical statements and sets 
it at  6.  If there are more than 6 logical statements feeding each 
other we decide its not worth looking back any further...  and say this 
edge does not compute a range for that SSA_NAME.


This is different than the original patch I proposed in the PR, but 
accomplishes the same thing.  The original patch did the depth limit 
check at the time of evaluating an outgoing range.  This patch moves the 
check earlier when building the def chains, so the names never even 
appear as possible exports. The previous patch was still building these 
def chains a few thousand SSA names deep, but never using them due to 
the cutoff.   This patch will also therefore reduce the size of those 
dependency bitmaps which are not going to be used in either case.


This passes bootstrap and regression testing on x86_64-pc-linux-gnu.  I 
also ran it against the unlimited depth version on a full set of GCC 
source files, and there was 0 difference between this and the original 
in the cases ranger gets.


OK for trunk?

Andrew


PS. Second try with a new mailer system...  hopefully this time...

commit a36a130a1305bc2c103e47955354c0f0edda47f4
Author: Andrew MacLeod 
Date:   Fri Apr 16 17:08:51 2021 -0400

Limit depth of logical expression windback.

Limit how many logical expressions GORI will look back through when
evaluating outgoing edge range.

PR tree-optimization/100081
* gimple_range_gori.cc (LOGICAL_LIMIT): New local definition.
(is_gimple_logical_p): Move to top of file.
(range_def_chain::m_logical_depth): New member.
(range_def_chain::range_def_chain): Initialize m_logical_depth.
(range_def_chain::get_def_chain): Don't build defchains through more
than LOGICAL_LIMIT logical expressions.

diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index 7f7f3dc0d69..4be49369bbd 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -29,6 +29,36 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-pretty-print.h"
 #include "gimple-range.h"
 
+// Limit the nested depth thru logical expressions which GORI will build
+// def chains.
+#define LOGICAL_LIMIT	6
+
+// Return TRUE if GS is a logical && or || expression.
+
+static inline bool
+is_gimple_logical_p (const gimple *gs)
+{
+  // Look for boolean and/or condition.
+  if (gimple_code (gs) == GIMPLE_ASSIGN)
+switch (gimple_expr_code (gs))
+  {
+	case TRUTH_AND_EXPR:
+	case TRUTH_OR_EXPR:
+	  return true;
+
+	case BIT_AND_EXPR:
+	case BIT_IOR_EXPR:
+	  // Bitwise operations on single bits are logical too.
+	  if (types_compatible_p (TREE_TYPE (gimple_assign_rhs1 (gs)),
+  boolean_type_node))
+	return true;
+	  break;
+
+	default:
+	  break;
+  }
+  return false;
+}
 
 /* RANGE_DEF_CHAIN is used to determine what SSA names in a block can
have range information calculated for them, and what the
@@ -76,6 +106,7 @@ public:
 private:
   vec m_def_chain;	// SSA_NAME : def chain components.
   void build_def_chain (tree name, bitmap result, basic_block bb);
+  int m_logical_depth;
 };
 
 
@@ -85,6 +116,7 @@ range_def_chain::range_def_chain ()
 {
   m_def_chain.create (0);
   m_def_chain.safe_grow_cleared (num_ssa_names);
+  m_logical_depth = 0;
 }
 
 // Destruct a range_def_chain.
@@ -157,6 +189,7 @@ range_def_chain::get_def_chain (tree name)
 {
   tree ssa1, ssa2, ssa3;
   unsigned v = SSA_NAME_VERSION (name);
+  bool is_logical = false;
 
   // If it has already been processed, just return the cached value.
   if (has_def_chain (name))
@@ -169,6 +202,15 @@ range_def_chain::get_def_chain (tree name)
   gimple *stmt = SSA_NAME_DEF_STMT (name);
   if (gimple_range_handler (stmt))
 {
+  is_logical = is_gimple_logical_p (stmt);
+  // Terminate the def chains

[PATCH] [i386] Fix a small performance regression. [PR98348]

2021-04-18 Thread Hongtao Liu via Gcc-patches

Hi:
  This patch is about to add a pre-reload splitter to transform
vpcmpeqd with a zero operand to vptestnmd, which could save a vpxor
instruction.

.i.e
-   vpxor   %xmm1, %xmm1, %xmm1
-   vpcmpd  $0, %zmm1, %zmm0, %k0
+   vptestnmd   %zmm0, %zmm0, %k0
vpmovm2d zmm0, k0

Bootstrapped and regtested on x86-64_iinux-gnu{-m32,}.

gcc/ChangeLog:

PR target/98348
* config/i386/sse.md (*_eq3): New pre_reload
splitter.
(*_eq3): Changed from combine splitter to
pre-reload splitter.

gcc/testsuite/ChangeLog:

PR target/98348
* gcc.target/i386/pr98348.c: New test.


-- 
BR,
Hongtao
From 7e59a44e079f6f4c9d39d625aaa3c5023cb545aa Mon Sep 17 00:00:00 2001
From: liuhongt 
Date: Fri, 16 Apr 2021 15:32:14 +0800
Subject: [PATCH] [i386] Add pre-reload splitter for blow transformation.

-   vpxor   %xmm1, %xmm1, %xmm1
-   vpcmpd  $0, %zmm1, %zmm0, %k0
+   vptestnmd   %zmm0, %zmm0, %k0
	vpmovm2d zmm0, k0

gcc/ChangeLog:

	PR target/98348
	* config/i386/sse.md (*_eq3): New pre_reload
	splitter.
	(*_eq3): Changed from combine splitter to
	pre-reload splitter.

gcc/testsuite/ChangeLog:

	PR target/98348
	* gcc.target/i386/pr98348.c: New test.
---
 gcc/config/i386/sse.md  | 23 +-
 gcc/testsuite/gcc.target/i386/pr98348.c | 57 +
 2 files changed, 78 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr98348.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 9d3728d1cb0..f2c16db1c1a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -3273,7 +3273,7 @@ (define_int_attr pcmp_signed_mask
 ;; For signed comparison, handle EQ 0: NEQ 4,
 ;; for unsigned comparison extra handle LE:2, NLE:6, equivalent to EQ and NEQ.
 
-(define_split
+(define_insn_and_split "*_ucmp3"
   [(set (match_operand: 0 "register_operand")
 	(unspec:
 	  [(us_minus:VI12_AVX512VL
@@ -3282,9 +3282,11 @@ (define_split
 	   (match_operand:VI12_AVX512VL 3 "const0_operand")
 	   (match_operand:SI 4 "const_0_to_7_operand")]
 	  UNSPEC_PCMP_ITER))]
-  "TARGET_AVX512BW
+  "TARGET_AVX512BW && ix86_pre_reload_split ()
   && ix86_binary_operator_ok (US_MINUS, mode, operands)
   && (INTVAL (operands[4]) & ) == 0"
+  "#"
+  "&& 1"
   [(const_int 0)]
   {
 /* LE: 2, NLT: 5, NLE: 6, LT: 1  */
@@ -13026,6 +13028,23 @@ (define_insn "_eq3_1"
(set_attr "prefix" "evex")
(set_attr "mode" "")])
 
+(define_insn_and_split "*_eq3"
+  [(set (match_operand: 0 "register_operand")
+	(unspec:
+	  [(match_operand:VI_AVX512BW 1 "nonimm_or_0_operand")
+	   (match_operand:VI_AVX512BW 2 "nonimm_or_0_operand")
+	   (const_int 0)]
+	  UNSPEC_PCMP_ITER))]
+  "TARGET_AVX512F && ix86_pre_reload_split ()
+  && (operands[1] == CONST0_RTX (mode)
+  || operands[2] == CONST0_RTX (mode))"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+  	(unspec:
+	  [(match_dup 1) (match_dup 2)]
+	  UNSPEC_MASKED_EQ))])
+
 (define_insn "*sse4_1_eqv2di3"
   [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x")
 	(eq:V2DI
diff --git a/gcc/testsuite/gcc.target/i386/pr98348.c b/gcc/testsuite/gcc.target/i386/pr98348.c
new file mode 100644
index 000..3c0259f8ca7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr98348.c
@@ -0,0 +1,57 @@
+/* PR target/98348  */
+/* { dg-do compile } */
+/* { dg-options "-mavx512bw -O2" } */
+/* { dg-final { scan-assembler-times "vptestnm\[bwdq\]" "8" } } */
+/* { dg-final { scan-assembler-not "pcmp" } } */
+/* { dg-final { scan-assembler-not "pxor" } } */
+
+
+typedef int v16si __attribute__ ((vector_size(64)));
+typedef short v32hi __attribute__ ((vector_size(64)));
+typedef char v64qi __attribute__ ((vector_size(64)));
+typedef long long v8di __attribute__ ((vector_size(64)));
+typedef unsigned int v16usi __attribute__ ((vector_size(64)));
+typedef unsigned short v32uhi __attribute__ ((vector_size(64)));
+typedef unsigned char v64uqi __attribute__ ((vector_size(64)));
+typedef unsigned long long v8udi __attribute__ ((vector_size(64)));
+
+
+v8di
+compareq(v8di x) {
+  return x == 0;
+}
+
+v16si
+compared(v16si x) {
+  return x == 0;
+}
+
+v32hi
+comparew(v32hi x) {
+  return x == 0;
+}
+
+v64qi
+compareb(v64qi x) {
+  return x == 0;
+}
+
+v8udi
+compareuq(v8udi x) {
+  return x == 0;
+}
+
+v16usi
+compareud(v16usi x) {
+  return x == 0;
+}
+
+v32uhi
+compareuw(v32uhi x) {
+  return x == 0;
+}
+
+v64uqi
+compareub(v64uqi x) {
+  return x == 0;
+}
-- 
2.18.1

Re: [PATCH] docs: Remove empty table column.

Re: [Bug libstdc++/99402] [10 Regression] std::copy creates _GLIBCXX_DEBUG false positive for attempt to subscript a dereferenceable (start-of-sequence) iterator

Re: [Bug libstdc++/99402] [10 Regression] std::copy creates _GLIBCXX_DEBUG false positive for attempt to subscript a dereferenceable (start-of-sequence) iterator

[PATCH] combine: Don't create REG_UNUSED notes if the reg already died (PR99927)

[Patch, fortran] PR fortran/100136 - ICE, regression, using flag -fcheck=pointer

Re: [PATCH] combine: Don't create REG_UNUSED notes if the reg already died (PR99927)

Re: [PATCH] combine: Don't create REG_UNUSED notes if the reg already died (PR99927)

Re: [PATCH] libstdc++: Update some baseline_symbols.txt

Re: [PATCH,FORTRAN 00/29] Move towards stringpool, part 1

[PATCH] wwwdocs: Do not rewrite the page titles

Re: [PATCH 1/3] openacc: Add support for gang local storage allocation in shared memory

[PATCH] or1k: Add mcmodel option to handle large GOTs

[PATCH] PR tree-optimization/100081 - Limit depth of logical expression windback.

[PATCH] [i386] Fix a small performance regression. [PR98348]

14 matches

Site Navigation

Mail list logo

Footer information