Re: [patch, fortran] PR66461 ICE on missing end program in fixed source

2016-05-22 Thread Jerry DeLisle
On 05/22/2016 04:53 AM, Andre Vehreschild wrote:
> Hi Jerry,
> 
> I have tested your patch and gave it a review and the only thing I like
> to have is a testcase. Can you provide one from the PR? With a testcase
> I say the patch is ok for trunk and thanks for the patch.
> 
> Please note, I don't have review rights in the area the patch
> addresses, although I am familiar with the matcher having worked in it.
> This "review" is just a helper for an official reviewer to "second" my
> opinion, hoping to get your patch faster into trunk.
> 
> Regards and thanks for the patch,
>   Andre

Thanks Andre, I am going to hold off just a bit after some further info from 
Mikael.

We are obviously intercepting the problem in different ways, but the root
problem still exists.

Regards,

Jerry


Re: [RFC] Type promotion pass and elimination of zext/sext

2016-05-22 Thread Kugan Vivekanandarajah
Hi Richard,

> So what does this mean for this pass?  It means that we need to think
> about the immediate goal we want to fulfil - which might be to just
> promote things that we can fully promote, avoiding the necessity to
> prevent passes from undoing our work.  That said - we need a set of
> testcases the pass should enable to being optimized better than without it
> (I myself see the idea of promoting on GIMPLE according to PROMOTE_MODE
> as good design cleanup towards pushing GIMPLE farther out).

I will appreciate any test-cases you think that  think should work (optimized).

I will also try to gather test-cases based on testing/benchmarking.

Thanks,
Kugan


Re: [RFC] Type promotion pass and elimination of zext/sext

2016-05-22 Thread Kugan Vivekanandarajah
Hi Jeff,

On 20 May 2016 at 04:17, Jeff Law  wrote:
> On 05/15/2016 06:45 PM, Kugan Vivekanandarajah wrote:
>>
>> Hi Richard,
>>
>> Now that stage1 is open, I would like to get the type promotion passes
>> reviewed again. I have tested the patches on aarch64, x86-64, and
>> ppc64le without any new execution failures. There some test-cases that
>> fails for patterns. I will address them after getting feedback on the
>> basic structure.
>
> I find myself wondering if this will eliminate some of the cases where Kai's
> type casting motion was useful.  And just to be clear, that would be a good
> thing.

Let me try with the above patch.
However:

aarch64 testsuite diff is:
# Comparing 11 common sum files
## /bin/sh ./gcc/contrib/compare_tests  /tmp/gxx-sum1.20074 /tmp/gxx-sum2.20074
Tests that now fail, but worked before:

gcc.dg/tree-ssa/pr54245.c scan-tree-dump-times slsr "Inserting initializer" 0
gcc.dg/tree-ssa/pr69270-3.c scan-tree-dump-times uncprop1 ", 1" 1

New tests that PASS:

gcc.c-torture/unsorted/dump-noaddr.c.*t.promotion,  -O1  comparison
gcc.c-torture/unsorted/dump-noaddr.c.*t.promotion,  -O2  comparison
gcc.c-torture/unsorted/dump-noaddr.c.*t.promotion,  -O2 -flto
-fno-use-linker-plugin -flto-partition=none  comparison
gcc.c-torture/unsorted/dump-noaddr.c.*t.promotion,  -O3
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer
-finline-functions  comparison
gcc.c-torture/unsorted/dump-noaddr.c.*t.promotion,  -O3 -g  comparison
gcc.c-torture/unsorted/dump-noaddr.c.*t.promotion,  -Os  comparison

## Differences found:
# 1 differences in 11 common sum files found


ppc64 had more.

I see improvements and some regressions too;

For example:

short unPack( unsigned char c )
{
c = c & (unsigned char)0x0F ;

if( c > 7 ) {
return( ( short )( c - 8 ) ) ;
}
else
{
return( ( short )c ) ;
}
}

asm differnce:
 andw0, w0, 15
-cmpw0, 7
-bhi.L5
-sxthw0, w0
-ret
-.p2align 3
-.L5:
-subw0, w0, #8
-sxthw0, w0
+subw1, w0, #8
+cmpw0, 8
+cselw0, w1, w0, cs
 ret


short bar (short y);
int foo (short x)
{
  short y = bar (x) + 15;
  return y;
}

Base - optimized gimple
foo (short int x)
{
  short int y;
  short int _1;
  unsigned short _2;
  unsigned short _3;
  int _8;

  :
  _1 = bar (x_5(D));
  _2 = (unsigned short) _1;
  _3 = _2 + 15;
  y_7 = (short int) _3;
  _8 = (int) y_7;
  return _8;

}

With type promotion - optimized gimple
foo (short int x)
{
  unsigned int _2;
  unsigned int _3;
  signed int _7;
  int _8;
  short int _10;
  unsigned int _12;

  :
  _10 = bar (x_9(D));
  _2 = (unsigned int) _10;
  _3 = _2 + 15;
  _12 = _3 & 65535;
  _7 = (signed int) _12;
  _8 = (_7) sext (16);
  return _8;

}

ASM difference:

 stpx29, x30, [sp, -16]!
 addx29, sp, 0
 blbar
+sxthw0, w0
 addw0, w0, 15
 ldpx29, x30, [sp], 16
 sxthw0, w0


Thanks,
Kugan


>
>>
>> 1. When we promote SSA as part of promote_ssa, we either promote the
>> definition. Or create a copy stmt that is inserted after the stmt that
>> define it. i.e, we want to promote the SSA and reflect the promotion
>> on all the uses (we promote in place). We do this because, we don’t
>> want to change all the uses.
>>
>> +/* Promote definition DEF to promoted type.  If the stmt that defines def
>> +   is def_stmt, make the type of def promoted type.  If the stmt is such
>> +   that, result of the def_stmt cannot be of promoted type, create a
>> new_def
>> +   of the original_type and make the def_stmt assign its value to newdef.
>> +   Then, create a NOP_EXPR to convert new_def to def of promoted type.
>> +
>> +   For example, for stmt with original_type char and promoted_type int:
>> +char _1 = mem;
>> +becomes:
>> +char _2 = mem;
>> +int _1 = (int)_2;
>
> When does this case happen, and how is this any better than PRE or other
> elimination/code motion algorithms in improving the generated code?
>
> I would hazard a guess that it could happen if you still needed the char
> sized used in a small number of cases, but generally wanted to promote most
> uses to int?
>
>> +
>>
>> However, if the defining stmt has to be the last stmt in the basic
>> block (eg, stmt that can throw), and if there is more than one normal
>> edges where we use this value, we cant insert the copy in all the
>> edges. Please note that the copy stmt copes the value to promoted SSA
>> with the same name.
>>
>> Therefore I had to return false in this case for promote_ssa and fixup
>> uses. I ran into this while testing ppc64le. I am sure it can happen
>> in other cases.
>
> Right.
>
> Jeff


Re: [PATCH] Fix ICE with x87 asm operands (PR inline-asm/68843)

2016-05-22 Thread Uros Bizjak
On Sun, May 22, 2016 at 9:00 AM, Bernd Edlinger
 wrote:
> Hi!
>
> as described in the PR there are several non-intuitive rules that
> one has to follow to avoid ICEs with x87 asm operands.
>
> This patch adds an explicit rule, that avoids ICE in the first test case and
> removes an unnecessary error message in the second test case.
>
>
> Boot-strapped and regression-tested on x86_64-pc-linux-gnu.
> OK for trunk?

This patch is actually dealing with two separate problems

This part:

@@ -607,7 +631,7 @@ check_asm_stack_operands (rtx_insn *insn)
  record any earlyclobber.  */

   for (i = n_outputs; i < n_outputs + n_inputs; i++)
-if (op_alt[i].matches == -1)
+if (op_alt[i].matches == -1 && STACK_REG_P (recog_data.operand[i]))
   {
  int j;

is OK, although, I'd written it as:


+if (STACK_REG_P (recog_data.operand[i]) && op_alt[i].matches == -1)

with slightly simplified testcase:

--cut here--
int
__attribute__((noinline, noclone))
test (double y)
{
  int a, b;
  asm ("fistpl (%1)\n\t"
   "movl (%1), %0"
   : "=r" (a)
   : "r" (), "t" (y)
   : "st");
  return a;
}

int
main ()
{
  int t = -10;

  if (test (t) != t)
__builtin_abort ();
  return 0;
}
--cut here--

BTW: It looks to me you also don't need all-memory clobber here.

This part is OK, with a testcase you provided it borders on obvious.
However, you will need rtl-optimization approval for the other
problem.

Uros.


[C++ Patch] PR 69095

2016-05-22 Thread Paolo Carlini

Hi,

finally sending a patch for this issue. As noticed by submitter himself, 
it appears to boil down to a rather straightforward case of not 
rejecting unexpanded parameter packs in default arguments. In order to 
handle all the combinations (in/out of class, template 
parameter/function parameter) I added calls of 
check_for_bare_parameter_packs both to cp_parser_default_argument and 
cp_parser_late_parsing_default_args (to have a meaningful location for 
the latter, the patchlet which I sent earlier today is a must). Tested 
x86_64-linux.


Thanks,
Paolo.

//
/cp
2016-05-22  Paolo Carlini  

PR c++/69095
* parser.c (cp_parser_default_argument): Call
check_for_bare_parameter_packs.
(cp_parser_late_parsing_default_args): Likewise.

/testsuite
2016-05-22  Paolo Carlini  

PR c++/69095
* g++.dg/cpp0x/variadic168.C: New.
Index: cp/parser.c
===
--- cp/parser.c (revision 236569)
+++ cp/parser.c (working copy)
@@ -20673,6 +20673,8 @@ cp_parser_default_argument (cp_parser *parser, boo
 }
   if (BRACE_ENCLOSED_INITIALIZER_P (default_argument))
 maybe_warn_cpp0x (CPP0X_INITIALIZER_LISTS);
+  if (check_for_bare_parameter_packs (default_argument))
+default_argument = error_mark_node;
   if (template_parm_p)
 pop_deferring_access_checks ();
   parser->greater_than_is_operator_p = saved_greater_than_is_operator_p;
@@ -26403,6 +26405,9 @@ cp_parser_late_parsing_default_args (cp_parser *pa
  continue;
}
 
+  if (check_for_bare_parameter_packs (parsed_arg))
+   parsed_arg = error_mark_node;
+
   TREE_PURPOSE (parm) = parsed_arg;
 
   /* Update any instantiations we've already created.  */
Index: testsuite/g++.dg/cpp0x/variadic168.C
===
--- testsuite/g++.dg/cpp0x/variadic168.C(revision 0)
+++ testsuite/g++.dg/cpp0x/variadic168.C(working copy)
@@ -0,0 +1,18 @@
+// PR c++/69095
+// { dg-do compile { target c++11 } }
+
+struct B1 {
+  template  // { 
dg-error "parameter packs not expanded" }
+  void insert(Ret);
+};
+
+struct B2 {
+  template 
+  void insert(Ret, unsigned = sizeof(Args)); // { dg-error "parameter packs 
not expanded" }
+};
+
+template  // { 
dg-error "parameter packs not expanded" }
+void insert1(Ret);
+
+template 
+void insert2(Ret, unsigned = sizeof(Args)); // { dg-error "parameter packs not 
expanded" }


match.pd: Relax some tree_nop_conversion_p

2016-05-22 Thread Marc Glisse

Hello,

this patch replaces some tree_nop_conversion_p tests with less restrictive 
conditions. In some cases I checked the transformation automatically (of 
course I could have messed up the checker, or the translation). I didn't 
always put the laxest possible check. For instance the transformation for 
(~x & ~y) is valid with sign extension, but the gain is less obvious in 
that case. ~(~X >> Y) also seems valid in some odd cases involving boolean 
types, not worth the complication. The bad case for a * (1 << b) is when 
1

gcc/
* match.pd (a * (1 << b), ~x & ~y, ~X ^ ~Y, (X ^ Y) ^ Y, ~ (-A),
~ (A - 1), ~(~X >> Y), ~(~X >>r Y)): Relax constraints.

gcc/testsuite/
* gcc.dg/fold-notshift-2.c: Adjust.

--
Marc GlisseIndex: gcc/match.pd
===
--- gcc/match.pd	(revision 236488)
+++ gcc/match.pd	(working copy)
@@ -447,21 +447,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (for ops (conj negate)
  (for cabss (CABS)
   (simplify
(cabss (ops @0))
(cabss @0
 
 /* Fold (a * (1 << b)) into (a << b)  */
 (simplify
  (mult:c @0 (convert? (lshift integer_onep@1 @2)))
   (if (! FLOAT_TYPE_P (type)
-   && tree_nop_conversion_p (type, TREE_TYPE (@1)))
+   && (element_precision (type) <= element_precision (TREE_TYPE (@1))
+	   || TYPE_UNSIGNED (TREE_TYPE (@1
(lshift @0 @2)))
 
 /* Fold (C1/X)*C2 into (C1*C2)/X.  */
 (simplify
  (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
   (if (flag_associative_math
&& single_use (@3))
(with
 { tree tem = const_binop (MULT_EXPR, type, @0, @2); }
 (if (tem)
@@ -648,22 +649,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (bit_and:c (bit_ior:c @0 @1) (bit_xor:c @1 (bit_not @0)))
  (bit_and @0 @1))
 
 /* ~x & ~y -> ~(x | y)
~x | ~y -> ~(x & y) */
 (for op (bit_and bit_ior)
  rop (bit_ior bit_and)
  (simplify
   (op (convert1? (bit_not @0)) (convert2? (bit_not @1)))
-  (if (tree_nop_conversion_p (type, TREE_TYPE (@0))
-   && tree_nop_conversion_p (type, TREE_TYPE (@1)))
+  (if (element_precision (type) <= element_precision (TREE_TYPE (@0))
+   && element_precision (type) <= element_precision (TREE_TYPE (@1)))
(bit_not (rop (convert @0) (convert @1))
 
 /* If we are XORing or adding two BIT_AND_EXPR's, both of which are and'ing
with a constant, and the two constants have no bits in common,
we should treat this as a BIT_IOR_EXPR since this may produce more
simplifications.  */
 (for op (bit_xor plus)
  (simplify
   (op (convert1? (bit_and@4 @0 INTEGER_CST@1))
   (convert2? (bit_and@5 @2 INTEGER_CST@3)))
@@ -674,22 +675,22 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* (X | Y) ^ X -> Y & ~ X*/
 (simplify
  (bit_xor:c (convert? (bit_ior:c @0 @1)) (convert? @0))
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
   (convert (bit_and @1 (bit_not @0)
 
 /* Convert ~X ^ ~Y to X ^ Y.  */
 (simplify
  (bit_xor (convert1? (bit_not @0)) (convert2? (bit_not @1)))
- (if (tree_nop_conversion_p (type, TREE_TYPE (@0))
-  && tree_nop_conversion_p (type, TREE_TYPE (@1)))
+ (if (element_precision (type) <= element_precision (TREE_TYPE (@0))
+  && element_precision (type) <= element_precision (TREE_TYPE (@1)))
   (bit_xor (convert @0) (convert @1
 
 /* Convert ~X ^ C to X ^ ~C.  */
 (simplify
  (bit_xor (convert? (bit_not @0)) INTEGER_CST@1)
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
   (bit_xor (convert @0) (bit_not @1
 
 /* Fold (X & Y) ^ Y and (X ^ Y) & Y as ~X & Y.  */
 (for opo (bit_and bit_xor)
@@ -715,22 +716,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* Some simple reassociation for bit operations, also handled in reassoc.  */
 /* (X & Y) & Y -> X & Y
(X | Y) | Y -> X | Y  */
 (for op (bit_and bit_ior)
  (simplify
   (op:c (convert?@2 (op:c @0 @1)) (convert? @1))
   @2))
 /* (X ^ Y) ^ Y -> X  */
 (simplify
  (bit_xor:c (convert? (bit_xor:c @0 @1)) (convert? @1))
- (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
-  (convert @0)))
+ (convert @0))
 /* (X & Y) & (X & Z) -> (X & Y) & Z
(X | Y) | (X | Z) -> (X | Y) | Z  */
 (for op (bit_and bit_ior)
  (simplify
   (op:c (convert1?@3 (op:c@4 @0 @1)) (convert2?@5 (op:c@6 @0 @2)))
   (if (tree_nop_conversion_p (type, TREE_TYPE (@1))
&& tree_nop_conversion_p (type, TREE_TYPE (@2)))
(if (single_use (@5) && single_use (@6))
 (op @3 (convert @2))
 (if (single_use (@3) && single_use (@4))
@@ -908,31 +908,34 @@ 

Re: [PATCH] Fix up a few i386 tests

2016-05-22 Thread Uros Bizjak
On Sat, May 21, 2016 at 7:16 PM, Jakub Jelinek  wrote:
> Hi!
>
> While trying to look for bugs using the
> https://sourceware.org/ml/binutils/2016-05/msg00328.html
> hacks, in order to achive more testing I've also turned all
> dg-do compile tests into dg-do assemble, so that they would be assembled and
> I could watch out diagnostics.  There are about 2 tests that use complete
> garbage in inline asm, which is fine, but I guess the following ones are
> unintended that it isn't valid assembly.
> The first one, for -m64 it would be enough to use "r" (1LL) or "r" (2LL),
> but for -m32 kmovq supports just loading from m64.
> The second one has -mavx512f only enabled, so I've replaced the avx512dq
> instruction used in there with an avx512f one.
> And the third one contains asm template that is only valid for 32-bit code.
>
> Tested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-05-21  Jakub Jelinek  
>
> * gcc.target/i386/avx512bw-kunpckdq-1.c (avx512bw_test): Use "m"
> constraint instead of "r".
> * gcc.target/i386/avx512f-additional-reg-names.c (foo): Use vpxord
> insn instead of vxorpd.
> * gcc.target/i386/strinline.c (__mempcpy_by2): Use empty asm template
> string for x86_64.

strinline.c is a testcase for 32bit x86 compile failure (c.f.
PR33552). Please just mark it ia32 with:

/* { dg-require-effective-target ia32 } */

OK with this change.

Thanks,
Uros.

> --- gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-1.c.jj  2014-12-03 
> 15:06:06.469866209 +0100
> +++ gcc/testsuite/gcc.target/i386/avx512bw-kunpckdq-1.c 2016-05-21 
> 18:35:34.269533825 +0200
> @@ -8,9 +8,10 @@ void
>  avx512bw_test () {
>__mmask64 k1, k2, k3;
>volatile __m512i x;
> +  long long one = 1, two = 2;
>
> -  __asm__( "kmovq %1, %0" : "=k" (k1) : "r" (1) );
> -  __asm__( "kmovq %1, %0" : "=k" (k2) : "r" (2) );
> +  __asm__( "kmovq %1, %0" : "=k" (k1) : "m" (one) );
> +  __asm__( "kmovq %1, %0" : "=k" (k2) : "m" (two) );
>
>k3 = _mm512_kunpackd (k1, k2);
>x = _mm512_mask_avg_epu8 (x, k3, x, x);
> --- gcc/testsuite/gcc.target/i386/avx512f-additional-reg-names.c.jj 
> 2014-10-01 16:27:25.838134349 +0200
> +++ gcc/testsuite/gcc.target/i386/avx512f-additional-reg-names.c
> 2016-05-21 18:37:46.505781090 +0200
> @@ -5,5 +5,5 @@ void foo ()
>  {
>register int zmm_var asm ("zmm6") __attribute__((unused));
>
> -  __asm__ __volatile__("vxorpd %%zmm0, %%zmm0, %%zmm7\n" : : : "zmm7" );
> +  __asm__ __volatile__("vpxord %%zmm0, %%zmm0, %%zmm7\n" : : : "zmm7" );
>  }
> --- gcc/testsuite/gcc.target/i386/strinline.c.jj2014-09-25 
> 15:02:06.703336175 +0200
> +++ gcc/testsuite/gcc.target/i386/strinline.c   2016-05-21 18:37:07.454298661 
> +0200
> @@ -8,7 +8,11 @@ __mempcpy_by2 (char *__dest, __const cha
>register char *__tmp = __dest;
>register unsigned long int __d0, __d1;
>__asm__ __volatile__
> -("shrl  $1,%3\n\t"
> +(
> +#ifdef __x86_64__
> + ""
> +#else
> + "shrl  $1,%3\n\t"
>   "jz2f\n"
>   "1:\n\t"
>   "movl  (%2),%0\n\t"
> @@ -20,6 +24,7 @@ __mempcpy_by2 (char *__dest, __const cha
>   "2:\n\t"
>   "movw  (%2),%w0\n\t"
>   "movw  %w0,(%1)"
> +#endif
>   : "=" (__d0), "=r" (__tmp), "=" (__src), "=" (__d1),
> "=m" ( *(struct { __extension__ char __x[__srclen]; } *)__dest)
>   : "1" (__tmp), "2" (__src), "3" (__srclen / 2),
>
> Jakub


Re: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-22 Thread Uros Bizjak
On Sat, May 21, 2016 at 9:48 AM, Uros Bizjak  wrote:
> On Fri, May 20, 2016 at 7:49 PM, H.J. Lu  wrote:
>> On Fri, May 20, 2016 at 10:15 AM, Rainer Orth
>>  wrote:
>>> "H.J. Lu"  writes:
>>>
 On Thu, May 12, 2016 at 10:54 AM, H.J. Lu  wrote:
>>> Here is a patch to add
>>> -mgeneral-regs-only option to x86 backend.   We can update
>>> spec for interrupt handle to recommend compiling interrupt handler
>>> with -mgeneral-regs-only option and add a note for compiler
>>> implementers.
>>>
>>> OK for trunk if there is no regression?
>>
>>
>> I can't comment on the code patch, but for the documentation part:
>>
>>> @@ -24242,6 +24242,12 @@ opcodes, to mitigate against certain forms of
>>> attack. At the moment,
>>>  this option is limited in what it can do and should not be relied
>>>  on to provide serious protection.
>>>
>>> +@item -mgeneral-regs-only
>>> +@opindex mgeneral-regs-only
>>> +Generate code which uses only the general-purpose registers.  This will
>>
>>
>> s/which/that/
>>
>>> +prevent the compiler from using floating-point, vector, mask and bound
>>
>>
>> s/will prevent/prevents/
>>
>>> +registers, but will not impose any restrictions on the assembler.
>>
>>
>> Maybe you mean to say "does not restrict use of those registers in inline
>> assembly code"?  In any case, please get rid of the future tense here, 
>> too.
>
> I changed it to
>
> ---
> @item -mgeneral-regs-only
> @opindex mgeneral-regs-only
> Generate code that uses only the general-purpose registers.  This
> prevents the compiler from using floating-point, vector, mask and bound
> registers.
> ---
>

 Here is the updated patch.  Tested on x86-64.  OK for trunk?
>>>
>>> This patch broke {i386,x86_64}-apple-darwin15.5.0 bootstrap:
>>>
>>> In file included from ./tm.h:16:0,
>>>  from /vol/gcc/src/hg/trunk/local/gcc/genattrtab.c:108:
>>> ./options.h:5443:2: error: #error too many target masks
>>>  #error too many target masks
>>>   ^
>>> Makefile:2497: recipe for target 'build/genattrtab.o' failed
>>> make[3]: *** [build/genattrtab.o] Error 1
>>>
>>> options.h has
>>>
>>> #define OPTION_MASK_ISA_XSAVES (HOST_WIDE_INT_1 << 62)
>>> #error too many target masks
>>>
>>> The tree bootstraps just fine at the previous revision.
>>>
>>
>> Tested on x86-64.  OK for trunk?
>
> No, this is a flag, not a variable. Let's figure out how to extend
> target flags to more than 63 flags first.
>
> Please revert the original patch in the mean time.

HJ is away from keyboard for a couple of days, I have reverted the
patch on his request.

Uros.


[fortran, patch, pr69659, v1] [6/7 Regression] ICE on using option -frepack-arrays, in gfc_conv_descriptor_data_get

2016-05-22 Thread Andre Vehreschild
Hi all,

attached patch fixes a regression that occurred on some testcases when
doing a validation run with -frepack-arrays. The issue here was that
for class arrays the array descriptor is in the _data component and not
directly at the address of the class_array. The patch fixes this issue
for pr69659 on trunk 7 and gcc-6-branch.

Ok for trunk and gcc-6?

Bootstrapped and regtested on x86_64-linux-gnu.

Regards,
Andre
-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 7be301d..403ce3a 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -6386,7 +6386,12 @@ gfc_trans_dummy_array_bias (gfc_symbol * sym, tree tmpdesc,
   stmtCleanup = gfc_finish_block ();
 
   /* Only do the cleanup if the array was repacked.  */
-  tmp = build_fold_indirect_ref_loc (input_location, dumdesc);
+  if (is_classarray)
+	/* For a class array the dummy array descriptor is in the _class
+	   component.  */
+	tmp = gfc_class_data_get (dumdesc);
+  else
+	tmp = build_fold_indirect_ref_loc (input_location, dumdesc);
   tmp = gfc_conv_descriptor_data_get (tmp);
   tmp = fold_build2_loc (input_location, NE_EXPR, boolean_type_node,
 			 tmp, tmpdesc);
diff --git a/gcc/testsuite/gfortran.dg/class_array_22.f03 b/gcc/testsuite/gfortran.dg/class_array_22.f03
new file mode 100644
index 000..9410741
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/class_array_22.f03
@@ -0,0 +1,25 @@
+! { dg-do compile }
+! { dg-options "-frepack-arrays " }
+!
+! Original class_array_11.f03 but with -frepack-arrays a new
+! ICE was produced reported in
+! PR fortran/69659
+!
+! Original testcase by Ian Harvey 
+! Reduced by Janus Weil 
+
+  IMPLICIT NONE
+
+  TYPE :: ParentVector
+INTEGER :: a
+  END TYPE ParentVector
+
+CONTAINS
+
+  SUBROUTINE vector_operation(pvec)
+CLASS(ParentVector), INTENT(INOUT) :: pvec(:)
+print *,pvec(1)%a
+  END SUBROUTINE
+
+END
+
gcc/testsuite/ChangeLog:

2016-05-22  Andre Vehreschild  

PR fortran/69659
* gfortran.dg/class_array_22.f03: New test.


gcc/fortran/ChangeLog:

2016-05-22  Andre Vehreschild  

PR fortran/69659
* trans-array.c (gfc_trans_dummy_array_bias): For class arrays use
the address of the _data component to reference the arrays data
component.



[C++ Patch] Improve check_for_bare_parameter_packs location

2016-05-22 Thread Paolo Carlini

Hi,

I'm finally completing a candidate fix for c++/69095 and I noticed that 
in order to get function parameters (not just template parameters, as 
originally reported) right it's important that the location used by 
check_for_bare_parameter_packs is (more) accurate. But the below seems 
conceptually independent anyway. Tested x86_64-linux.


Thanks,
Paolo.

/
/cp
2016-05-23  Paolo Carlini  

* pt.c (check_for_bare_parameter_packs): Improve error message
location.

/testsuite
2016-05-23  Paolo Carlini  

* g++.dg/cpp0x/pr31445.C: Test column number too.
* g++.dg/cpp0x/pr32253.C: Likewise.
* g++.dg/cpp0x/variadic-ex13.C: Likewise.
* g++.dg/cpp0x/variadic36.C: Likewise.
Index: cp/pt.c
===
--- cp/pt.c (revision 236569)
+++ cp/pt.c (working copy)
@@ -3761,7 +3761,8 @@ check_for_bare_parameter_packs (tree t)
 
   if (parameter_packs) 
 {
-  error ("parameter packs not expanded with %<...%>:");
+  location_t loc = EXPR_LOC_OR_LOC (t, input_location);
+  error_at (loc, "parameter packs not expanded with %<...%>:");
   while (parameter_packs)
 {
   tree pack = TREE_VALUE (parameter_packs);
@@ -3776,9 +3777,9 @@ check_for_bare_parameter_packs (tree t)
 name = DECL_NAME (pack);
 
  if (name)
-   inform (input_location, "%qD", name);
+   inform (loc, "%qD", name);
  else
-   inform (input_location, "");
+   inform (loc, "");
 
   parameter_packs = TREE_CHAIN (parameter_packs);
 }
Index: testsuite/g++.dg/cpp0x/pr31445.C
===
--- testsuite/g++.dg/cpp0x/pr31445.C(revision 236569)
+++ testsuite/g++.dg/cpp0x/pr31445.C(working copy)
@@ -2,7 +2,7 @@
 template  struct A
 {
   void foo(T...);
-  A(T... t) { foo(t); } // { dg-error "parameter packs|t" }
+  A(T... t) { foo(t); } // { dg-error "18:parameter packs|t" }
 };
 
 A a(0);
Index: testsuite/g++.dg/cpp0x/pr32253.C
===
--- testsuite/g++.dg/cpp0x/pr32253.C(revision 236569)
+++ testsuite/g++.dg/cpp0x/pr32253.C(working copy)
@@ -1,7 +1,7 @@
 // { dg-do compile { target c++11 } }
 template struct A
 {
-  A() { fp(); } // { dg-error "not expanded|fp" }
+  A() { fp(); } // { dg-error "11:parameter packs not expanded|fp" }
 };
 
 void foo();
Index: testsuite/g++.dg/cpp0x/variadic-ex13.C
===
--- testsuite/g++.dg/cpp0x/variadic-ex13.C  (revision 236569)
+++ testsuite/g++.dg/cpp0x/variadic-ex13.C  (working copy)
@@ -33,7 +33,7 @@ template void g(Args... args)
 {
f(const_cast()...); // okay: ``Args'' and ``args'' are 
expanded
f(5 ...); // { dg-error "contains no argument packs" }
-   f(args); // { dg-error "parameter packs not expanded" }
+   f(args); // { dg-error "5:parameter packs not expanded" }
// { dg-message "args" "note" { target *-*-* } 36 }
f(h(args...) + args...); // okay: first ``args'' expanded within h, second 
``args'' expanded within f.
 }
Index: testsuite/g++.dg/cpp0x/variadic36.C
===
--- testsuite/g++.dg/cpp0x/variadic36.C (revision 236569)
+++ testsuite/g++.dg/cpp0x/variadic36.C (working copy)
@@ -2,7 +2,7 @@
 template
 void f(const T&, const Args&... args)
 {
-  f(args); // { dg-error "packs not expanded" }
+  f(args); // { dg-error "4:parameter packs not expanded" }
 }
 
 template


Debug iterator cleanup

2016-05-22 Thread François Dumont

Hi

I just want to make sure that you agree that I can remove the @todo 
by implementing operator-> this way.


* include/debug/safe_iterator.h
(_Safe_iterator<>::operator->()): Implement using underlying iterator
operator ->.
* include/debug/safe_local_iterator.h
(_Safe_local_iterator<>::operator->()): Likewise.

François
diff --git a/libstdc++-v3/include/debug/safe_iterator.h b/libstdc++-v3/include/debug/safe_iterator.h
index 5368f3b..03c0263 100644
--- a/libstdc++-v3/include/debug/safe_iterator.h
+++ b/libstdc++-v3/include/debug/safe_iterator.h
@@ -274,7 +274,6 @@ namespace __gnu_debug
   /**
*  @brief Iterator dereference.
*  @pre iterator is dereferenceable
-   *  @todo Make this correct w.r.t. iterators that return proxies
*/
   pointer
   operator->() const _GLIBCXX_NOEXCEPT
@@ -282,7 +281,7 @@ namespace __gnu_debug
 	_GLIBCXX_DEBUG_VERIFY(this->_M_dereferenceable(),
 			  _M_message(__msg_bad_deref)
 			  ._M_iterator(*this, "this"));
-	return std::__addressof(*base());
+	return base().operator->();
   }
 
   // -- Input iterator requirements --
diff --git a/libstdc++-v3/include/debug/safe_local_iterator.h b/libstdc++-v3/include/debug/safe_local_iterator.h
index 4fcc05e..70e14697 100644
--- a/libstdc++-v3/include/debug/safe_local_iterator.h
+++ b/libstdc++-v3/include/debug/safe_local_iterator.h
@@ -236,7 +236,6 @@ namespace __gnu_debug
   /**
*  @brief Iterator dereference.
*  @pre iterator is dereferenceable
-   *  @todo Make this correct w.r.t. iterators that return proxies
*/
   pointer
   operator->() const
@@ -244,7 +243,7 @@ namespace __gnu_debug
 	_GLIBCXX_DEBUG_VERIFY(this->_M_dereferenceable(),
 			  _M_message(__msg_bad_deref)
 			  ._M_iterator(*this, "this"));
-	return std::__addressof(*base());
+	return base().operator->();
   }
 
   // -- Input iterator requirements --


PR 71181 Avoid rehash after reserve

2016-05-22 Thread François Dumont

Hi

To fix 71181 problem I propose to change how we deal with reserve 
called with pivot values that is to say prime numbers. Now _M_next_bkt 
always return a value higher than the input value. This way when 
reverse(97) is called we end up with 199 buckets and so enough space to 
store 97 values without rehashing.


I have integrated in this patch several other enhancements on the 
same subject. Improvement of _M_next_resize management when reaching 
highest bucket number. Remove sentinel value in __prime_list, just need 
to limit range when calling lower_bound.


PR libstdc++/71181
* include/tr1/hashtable_policy.h (_S_n_primes): Minus 1 to avoid check
on lower_bound call.
* src/c++11/hashtable_c++0x.cc (_Prime_rehash_policy::_M_next_bkt):
Always return a value greater than input value. Set _M_next_resize to
value when reaching highest prime number.
* src/shared/hashtable-aux.cc (__prime_list): Remove sentinel value.
* testsuite/23_containers/unordered_set/hash_policy/71181.cc: New.
* testsuite/23_containers/unordered_set/max_load_factor/robustness.cc:
Adapt.

Tested under Linux x86_64, ok to commit ?

François

diff --git a/libstdc++-v3/include/tr1/hashtable_policy.h b/libstdc++-v3/include/tr1/hashtable_policy.h
index 4ee6d45..67a1339 100644
--- a/libstdc++-v3/include/tr1/hashtable_policy.h
+++ b/libstdc++-v3/include/tr1/hashtable_policy.h
@@ -403,7 +403,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _M_need_rehash(std::size_t __n_bkt, std::size_t __n_elt,
 		   std::size_t __n_ins) const;
 
-enum { _S_n_primes = sizeof(unsigned long) != 8 ? 256 : 256 + 48 };
+enum { _S_n_primes = (sizeof(unsigned long) != 8 ? 256 : 256 + 48) - 1 };
 
 float_M_max_load_factor;
 float_M_growth_factor;
diff --git a/libstdc++-v3/src/c++11/hashtable_c++0x.cc b/libstdc++-v3/src/c++11/hashtable_c++0x.cc
index a5e6520..dc0dab5 100644
--- a/libstdc++-v3/src/c++11/hashtable_c++0x.cc
+++ b/libstdc++-v3/src/c++11/hashtable_c++0x.cc
@@ -46,10 +46,10 @@ namespace __detail
   {
 // Optimize lookups involving the first elements of __prime_list.
 // (useful to speed-up, eg, constructors)
-static const unsigned char __fast_bkt[12]
-  = { 2, 2, 2, 3, 5, 5, 7, 7, 11, 11, 11, 11 };
+static const unsigned char __fast_bkt[13]
+  = { 2, 2, 3, 5, 5, 7, 7, 11, 11, 11, 11, 13, 13 };
 
-if (__n <= 11)
+if (__n <= 12)
   {
 	_M_next_resize =
 	  __builtin_ceil(__fast_bkt[__n] * (long double)_M_max_load_factor);
@@ -58,10 +58,22 @@ namespace __detail
 
 constexpr auto __n_primes
   = sizeof(__prime_list) / sizeof(unsigned long) - 1;
+constexpr auto __prime_list_end = __prime_list + __n_primes;
 const unsigned long* __next_bkt =
-  std::lower_bound(__prime_list + 5, __prime_list + __n_primes, __n);
-_M_next_resize =
-  __builtin_ceil(*__next_bkt * (long double)_M_max_load_factor);
+  std::lower_bound(__prime_list + 6, __prime_list_end, __n);
+
+if (*__next_bkt == __n && __next_bkt != __prime_list_end)
+  ++__next_bkt;
+
+if (__next_bkt == __prime_list_end)
+  // Set next resize to the max value so that we never try to rehash again
+  // as we already reach the biggest possible bucket number.
+  // Note that it might result in max_load_factor not being respected.
+  _M_next_resize = std::size_t(-1);
+else
+  _M_next_resize =
+	__builtin_ceil(*__next_bkt * (long double)_M_max_load_factor);
+
 return *__next_bkt;
   }
 
diff --git a/libstdc++-v3/src/shared/hashtable-aux.cc b/libstdc++-v3/src/shared/hashtable-aux.cc
index 081bb12..7623b46 100644
--- a/libstdc++-v3/src/shared/hashtable-aux.cc
+++ b/libstdc++-v3/src/shared/hashtable-aux.cc
@@ -25,7 +25,7 @@
 namespace __detail
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
-  extern const unsigned long __prime_list[] = // 256 + 1 or 256 + 48 + 1
+  extern const unsigned long __prime_list[] = // 256 or 256 + 48
   {
 2ul, 3ul, 5ul, 7ul, 11ul, 13ul, 17ul, 19ul, 23ul, 29ul, 31ul,
 37ul, 41ul, 43ul, 47ul, 53ul, 59ul, 61ul, 67ul, 71ul, 73ul, 79ul,
@@ -66,13 +66,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 1259520799ul, 1362662261ul, 1474249943ul, 1594975441ul, 1725587117ul,
 1866894511ul, 2019773507ul, 2185171673ul, 2364114217ul, 2557710269ul,
 2767159799ul, 2993761039ul, 3238918481ul, 3504151727ul, 3791104843ul,
-4101556399ul, 4294967291ul,
-// Sentinel, so we don't have to test the result of lower_bound,
-// or, on 64-bit machines, rest of the table.
-#if __SIZEOF_LONG__ != 8
-4294967291ul
-#else
-6442450933ul, 8589934583ul, 12884901857ul, 17179869143ul,
+4101556399ul, 4294967291ul
+#if __SIZEOF_LONG__ >= 8
+,6442450933ul, 8589934583ul, 12884901857ul, 17179869143ul,
 25769803693ul, 34359738337ul, 51539607367ul, 68719476731ul,
 103079215087ul, 137438953447ul, 206158430123ul, 274877906899ul,
 412316860387ul, 549755813881ul, 824633720731ul, 1099511627689ul,
@@ -86,7 

Re: [patch, fortran] PR66461 ICE on missing end program in fixed source

2016-05-22 Thread Andre Vehreschild
Hi Jerry,

I have tested your patch and gave it a review and the only thing I like
to have is a testcase. Can you provide one from the PR? With a testcase
I say the patch is ok for trunk and thanks for the patch.

Please note, I don't have review rights in the area the patch
addresses, although I am familiar with the matcher having worked in it.
This "review" is just a helper for an official reviewer to "second" my
opinion, hoping to get your patch faster into trunk.

Regards and thanks for the patch,
Andre

On Wed, 18 May 2016 11:25:57 -0700
Jerry DeLisle  wrote:

> Hi all,
> 
> The following patch regression tested on x86-64.  The ICE is from an attempt 
> to
> free a bad expression after a MATCH_ERROR is returned. I have not been able to
> identify an exact cause, there being numerous matchers involved attempting to
> match the logical expression.
> 
> Regardless, it is an error on invalid so I suggest we commit this patch and
> close the PR.  I dont think its a regression as marked in bugzilla.  I see the
> the internal error as far back as 4.5.  If someone has an earlier build and 
> can
> see where this does not occur, please let me know. (In case I missed 
> something.
> 
> The results of the patch gives the following:
> 
> $ gfc s.f
> s.f:4:9:
> 
>   if ( x(1) < 0 .or.
>  1
> Error: Can not process after the IF statement shown at (1)
> f951: Error: Unexpected end of file in ‘s.f’
> 
> 
> OK for trunk?
> 
> Regards,
> 
> Jerry
> 
> 2016-05-18  Jerry DeLisle  
> 
>   PR fortran/66461
>   * match.c (gfc_match_if): Catch unxpected MATCH_ERROR and issue an error
>   message.
> 
> 
> diff --git a/gcc/fortran/match.c b/gcc/fortran/match.c
> index f3a4a43..85e6f92 100644
> --- a/gcc/fortran/match.c
> +++ b/gcc/fortran/match.c
> @@ -1560,7 +1560,16 @@ gfc_match_if (gfc_statement *if_type)
>if (m == MATCH_ERROR)
>  return MATCH_ERROR;
> 
> -  gfc_match (" if ( %e ) ", );/* Guaranteed to match.  */
> +  m = gfc_match (" if ( %e ) ", );  /* Not always guaranteed to match.  
> */
> +
> +  if (m == MATCH_ERROR)
> +{
> +  /* Under some invalid conditions like unexpected end of file, one
> +can get an error in the match. We bail out here and hope for
> +the best (the best being an error reported somewhere else).  */
> +  gfc_error ("Can not process after the IF statement shown at %C");
> +  return MATCH_ERROR;
> +}
> 
>m = gfc_match_pointer_assignment ();
>if (m == MATCH_YES)
> 
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [PATCH] Make basic asm implicitly clobber memory

2016-05-22 Thread Andrew Haley
On 05/20/2016 07:50 AM, David Wohlferd wrote:

> At a minimum, suddenly forcing an unexpected/unneeded memory clobber
> can adversely impact the optimization of surrounding code.  This can
> be particularly annoying if the reason for the asm was to improve
> performance.  And adding a memory clobber does add a dependency of
> sorts, which might cause the location of the asm to shift in an
> unfortunate way.  And there's always the long-shot possibility that
> some weird quirk or (very) badly-written code will cause the asm to
> flat out fail when used with a memory clobber.  And if this change
> does produce any of these problems, I feel pity for whoever has to
> track it down.

OTOH, if a memory clobber does change code gen it probably changes it
in a way which better fits user expectations, and perhaps it fixes a
bug.  That's a win, and it is far, far more important than any other
consideration.

Given that a basic asm statements has neither inputs nor outputs, it
must have side effects to be useful.  All this patch does is recognize
that fact.  I'm not saying your scenario won't occur, but it won't in
the majority of cases.

> I realize deprecation/removal is drastic.  Especially since basic
> asm (mostly) works as is.  But fixing memory clobbers while leaving
> the rest broken feels like half a solution, meaning that some day
> we're going to have to fiddle with this again.

Yes, we will undoubtedly have to fiddle with basic asm again.  We
should plan for deprecation.

But I think you're close to the all-or-nothing fallacy: that because
this patch doesn't solve all the problems with basic asm, it isn't
worth having.

Andrew.


Re: [PATCH] Fix up vec_set_* for -mavx512vl -mno-avx512dq

2016-05-22 Thread Kirill Yukhin
Hello,
On 18 May 23:03, Jakub Jelinek wrote:
> Hi!
> 
> The vinsert[if]64x2 instructions are AVX512VL & AVX512DQ, so
> if only AVX512VL is on, we should emit the other insns - 32x4,
> which without masking do the same thing.
> With masking, we have to require TARGET_AVX512DQ.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK.
> 
> 2016-05-18  Jakub Jelinek  
> 
>   * config/i386/sse.md (vec_set_lo_,
>   vec_set_hi_): Add && 
>   condition.  For !TARGET_AVX512DQ, emit 32x4 instruction instead
>   of 64x2.
> 
>   * gcc.target/i386/avx512dq-vinsert-1.c: New test.
>   * gcc.target/i386/avx512vl-vinsert-1.c: New test.

--
Thanks, K


Re: [PATCH] Improve XMM16+ handling in vec_set*

2016-05-22 Thread Kirill Yukhin
Hello,
On 18 May 23:01, Jakub Jelinek wrote:
> Hi!
> 
> vinserti32x4 is in AVX512VL.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK.

> 
> 2016-05-18  Jakub Jelinek  
> 
>   * config/i386/sse.md (vec_set_lo_v16hi, vec_set_hi_v16hi,
>   vec_set_lo_v32qi, vec_set_hi_v32qi): Add alternative with
>   v constraint instead of x and vinserti32x4 insn.
> 
>   * gcc.target/i386/avx512vl-vinserti32x4-3.c: New test.

--
Thanks, K


Re: [PATCH] Improve 128-bit to 256-bit broadcasts

2016-05-22 Thread Kirill Yukhin
Hi Jakub,
On 18 May 23:00, Jakub Jelinek wrote:
> Hi!
> 
> vbroadcast[fi]32x4 and vinsert[fi]32x4 are in AVX512VL,
> vbroadcast[fi]64x2 and vinsert[fi]64x2 are in AVX512VL & AVX512DQ.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK.
> 
> 2016-05-18  Jakub Jelinek  
> 
>   * config/i386/sse.md (i128vldq): New mode iterator.
>   (avx2_vbroadcasti128_, avx_vbroadcastf128_): Add
>   avx512dq and avx512vl altenratives.
> 
>   * gcc.target/i386/avx512dq-vbroadcast-2.c: New test.
>   * gcc.target/i386/avx512vl-vbroadcast-2.c: New test.

--
Thanks, K


Re: [PATCH] Improve XMM16-31 handling in various *vec_dup* patterns

2016-05-22 Thread Kirill Yukhin
Hello Jakub,
On 18 May 22:58, Jakub Jelinek wrote:
> Hi!
> 
> These instructions are available in AVX512VL, so we can use
> XMM16+ in there.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK.
> 
> 2016-05-18  Jakub Jelinek  
> 
>   * config/i386/sse.md (avx2_vec_dupv4df): Use v instead of x
>   constraint, use maybe_evex prefix instead of vex.
>   (vec_dupv4sf): Use v constraint instead of x for output
>   operand except for noavx alternative, use Yv constraint
>   instead of x for input.  Use maybe_evex prefix instead of vex.
>   (*vec_dupv4si): Likewise.
>   (*vec_dupv2di): Likewise.
> 
>   * gcc.target/i386/avx512vl-vbroadcast-1.c: New test.

--
Thanks, K


[PATCH] Fix ICE with x87 asm operands (PR inline-asm/68843)

2016-05-22 Thread Bernd Edlinger
Hi!

as described in the PR there are several non-intuitive rules that
one has to follow to avoid ICEs with x87 asm operands.

This patch adds an explicit rule, that avoids ICE in the first test case and
removes an unnecessary error message in the second test case.


Boot-strapped and regression-tested on x86_64-pc-linux-gnu.
OK for trunk?


Thanks
Bernd.gcc:
2016-05-22  Bernd Edlinger  

PR inline-asm/68843
* reg-stack.c (check_asm_stack_operands): Explicit input arguments
must be grouped on top of stack.  Don't force early clobber
on ordinary reg outputs.

testsuite:
2016-05-22  Bernd Edlinger  

PR inline-asm/68843
* gcc.target/i386/pr68843-1.c: New test.
* gcc.target/i386/pr68843-2.c: New test.
Index: gcc/reg-stack.c
===
--- gcc/reg-stack.c	(revision 231598)
+++ gcc/reg-stack.c	(working copy)
@@ -97,6 +97,9 @@
 	All implicitly popped input regs must be closer to the top of
 	the reg-stack than any input that is not implicitly popped.
 
+	All explicitly referenced input operands may not "skip" a reg.
+	Otherwise we can have holes in the stack.
+
3. It is possible that if an input dies in an insn, reload might
   use the input reg for an output reload.  Consider this example:
 
@@ -461,6 +464,7 @@ check_asm_stack_operands (rtx_insn *insn)
 
   char reg_used_as_output[FIRST_PSEUDO_REGISTER];
   char implicitly_dies[FIRST_PSEUDO_REGISTER];
+  char explicitly_used[FIRST_PSEUDO_REGISTER];
 
   rtx *clobber_reg = 0;
   int n_inputs, n_outputs;
@@ -568,6 +572,7 @@ check_asm_stack_operands (rtx_insn *insn)
  popped.  */
 
   memset (implicitly_dies, 0, sizeof (implicitly_dies));
+  memset (explicitly_used, 0, sizeof (explicitly_used));
   for (i = n_outputs; i < n_outputs + n_inputs; i++)
 if (STACK_REG_P (recog_data.operand[i]))
   {
@@ -581,6 +586,8 @@ check_asm_stack_operands (rtx_insn *insn)
 
 	if (j < n_clobbers || op_alt[i].matches >= 0)
 	  implicitly_dies[REGNO (recog_data.operand[i])] = 1;
+	else if (reg_class_size[(int) op_alt[i].cl] == 1)
+	  explicitly_used[REGNO (recog_data.operand[i])] = 1;
   }
 
   /* Search for first non-popped reg.  */
@@ -600,6 +607,23 @@ check_asm_stack_operands (rtx_insn *insn)
   malformed_asm = 1;
 }
 
+  /* Search for first not-explicitly used reg.  */
+  for (i = FIRST_STACK_REG; i < LAST_STACK_REG + 1; i++)
+if (! implicitly_dies[i] && ! explicitly_used[i])
+  break;
+
+  /* If there are any other explicitly used regs, that's an error.  */
+  for (; i < LAST_STACK_REG + 1; i++)
+if (explicitly_used[i])
+  break;
+
+  if (i != LAST_STACK_REG + 1)
+{
+  error_for_asm (insn,
+		 "explicitly used regs must be grouped at top of stack");
+  malformed_asm = 1;
+}
+
   /* Enforce rule #3: If any input operand uses the "f" constraint, all
  output constraints must use the "&" earlyclobber.
 
@@ -607,7 +631,7 @@ check_asm_stack_operands (rtx_insn *insn)
  record any earlyclobber.  */
 
   for (i = n_outputs; i < n_outputs + n_inputs; i++)
-if (op_alt[i].matches == -1)
+if (op_alt[i].matches == -1 && STACK_REG_P (recog_data.operand[i]))
   {
 	int j;
 
Index: gcc/testsuite/gcc.target/i386/pr68843-1.c
===
--- gcc/testsuite/gcc.target/i386/pr68843-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/i386/pr68843-1.c	(working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+double
+test ()
+{
+  double x = 1.0;
+  asm ("fld %1" /* { dg-error "explicitly used regs must be grouped at top of stack" } */
+   : "=" (x)
+   : "u" (x));
+  return x;
+}
Index: gcc/testsuite/gcc.target/i386/pr68843-2.c
===
--- gcc/testsuite/gcc.target/i386/pr68843-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/i386/pr68843-2.c	(working copy)
@@ -0,0 +1,24 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+int
+test (double y)
+{
+  int a, b;
+  asm ("fistpl (%1)\n\t"
+   "movl (%1), %0"
+   : "=r" (a)
+   : "r" (), "t" (y)
+   : "st", "memory");
+  return a;
+}
+
+int 
+main ()
+{
+  int t;
+  for (t = -10; t < 10; t++)
+if (test ((double)t) != t)
+  __builtin_abort ();  
+  return 0;
+}