Re: improved RTL-level if conversion using scratchpads [half-hammock edition]

2015-11-11 Thread Bernd Schmidt

On 11/10/2015 10:35 PM, Abe wrote:

I wrote:

What I'm saying is I don't see a reason for a "definitely always
unsafe" state.
Why would any access not be made safe if a scratchpad is used?


Because the RTL if-converter doesn`t "know" how to convert
{everything that can be made safe using a scratchpad and is unsafe
otherwise}. My patch is only designed to enable the conversion of
half-hammocks with a single may-trap store.


Yeah, but how is that relevant to the question of whether a MEM is safe? 
The logic should be

 if mem is safe and we are allowed to speculate -> just do it
 otherwise mem is unsafe, so
   if we have prereqs like conditional moves -> use scratchpads
   otherwise fail

I don't see how a three-state property for a single MEM is necessary or 
helpful, and the implementation in the patch just roughly distinguishes 
between two different types of trap (invalid address vs. readonly 
memory). That seems irrelevant both to the question of whether something 
is safe or not, and to the question of whether we know how to perform 
the conversion.


You might argue that something that is known readonly will always fail 
if written to at runtime, but that's no different from any other kind of 
invalid address, and using a scratchpad prevents the write unless it 
would have happened without if-conversion.



In summary, the 3 possible analysis outcomes are something like this:

   * safe even without a scratchpad

   * only safe witha scratchpad, and we _do_ know how to convert it
safely

   * currently unsafe because we don`t yet   know how to convert it
safely


This could be seen as a property of the block being converted, and is 
kind of implicit in the algorithm sketched above, but I don't see how it 
would be a property of the MEM that represents the store.



Do you have performance numbers anywhere?


I think my analysis work so far on this project is not yet complete
enough for public review, mainly because it does not include the
benefit of  profiling.


I think performance numbers are a fairly important part of a submission 
like this where the transformation isn't an obvious improvement (as 
opposed to removing an instruction or suchlike).



Bernd


Re: improved RTL-level if conversion using scratchpads [half-hammock edition]

2015-11-07 Thread Bernhard Reutner-Fischer
On November 6, 2015 5:27:44 PM GMT+01:00, Bernd Schmidt 
 wrote:
>On 11/06/2015 04:52 PM, Sebastian Pop wrote:
>
>>> opinion). If you want a half-finished redzone allocator, I can send
>you a
>>> patch.
>>
>> Yes please.  Let's get it work.
>
>Here you go. This is incomplete and does not compile, but it shows the 
>direction I have in mind and isn't too far off. I had a similar patch 

--- a/gcc/function.c
+++ b/gcc/function.c
@@ -217,10 +217,10 @@ free_after_compilation (struct function *f)
 HOST_WIDE_INT
 get_frame_size (void)
 {
-  if (FRAME_GROWS_DOWNWARD)
-return -frame_offset;
+  if (-crtl->frame.grows_downward)
+return -crtl->frame.frame_offset;
   else
-return frame_offset;
+return crtl->frame.frame_offset;
 }
 
frame.grows_downward is a bool it seems and as such I wonder what the minus in 
the condition means or is supposed to achieve?
Something we (should?) warn about?

Just curious..
Cheers,




Re: improved RTL-level if conversion using scratchpads [half-hammock edition]

2015-11-06 Thread Jeff Law

On 11/06/2015 07:10 AM, Sebastian Pop wrote:

On Fri, Nov 6, 2015 at 2:56 AM, Bernd Schmidt  wrote:

Formatting problem, here and in a few other places. I didn't fully read the
patch this time around.

I'm probably not reviewing further patches because I don't see this
progressing to a state where it's acceptable. Others may do so, but as far
as I'm concerned the patch is rejected.


Bernd,
I would like to ask you to focus on the technical part, and provide a
review only based on technical reasons.
Please ignore all formatting changes: I will help address all those changes.
I will send a patch addressing all the comments you had in the current review.
A note that sometimes bad formatting make it fairly painful to review 
patches.  IMHO I strongly prefer to the submitter, even for an RFC, to 
make an honest attempt to have the code properly formatted.


Jeff



Re: improved RTL-level if conversion using scratchpads [half-hammock edition]

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 04:52 PM, Sebastian Pop wrote:


opinion). If you want a half-finished redzone allocator, I can send you a
patch.


Yes please.  Let's get it work.


Here you go. This is incomplete and does not compile, but it shows the 
direction I have in mind and isn't too far off. I had a similar patch 
once for a machine that had two stack pointers (don't ask), and I 
started to recreate that for the given problem last week.


The temp slot handling code in function.c needs more frame arguments, 
but I got halfway through them and started wondering whether they should 
be member functions of struct frame_info instead.


The bits in cfgexpand and function, once complete, are essentially all 
that's necessary to support a second frame, but for this to work as a 
redzone allocator it needs to be integrated with target (i.e. i386) 
frame layout code. For purposes of optimizing we may also want to 
establish a maximum frame size for the rz_frame.


Bonus points if reload/lra use this for spilled pseudos that don't live 
across calls, but I can have a go at that if you don't feel like 
tackling it.



Bernd

diff --git a/gcc/caller-save.c b/gcc/caller-save.c
index 084d079..c3a5256 100644
--- a/gcc/caller-save.c
+++ b/gcc/caller-save.c
@@ -654,7 +654,7 @@ setup_save_areas (void)
 		{
 		  saved_reg->slot
 		= assign_stack_local_1
-		  (regno_save_mode[regno][1],
+		  (&crtl->frame, regno_save_mode[regno][1],
 		   GET_MODE_SIZE (regno_save_mode[regno][1]), 0,
 		   ASLK_REDUCE_ALIGN);
 		  if (dump_file != NULL)
@@ -712,7 +712,8 @@ setup_save_areas (void)
 	   when we restore and save the hard register in
 	   insert_restore and insert_save.  */
 	regno_save_mem[i][j]
-	  = assign_stack_local_1 (regno_save_mode[i][j],
+	  = assign_stack_local_1 (&crtl->frame,
+  regno_save_mode[i][j],
   GET_MODE_SIZE (regno_save_mode[i][j]),
   0, ASLK_REDUCE_ALIGN);
 
diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index bfbc958..8825217 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -335,11 +335,6 @@ static bitmap_obstack stack_var_bitmap_obstack;
is non-decreasing.  */
 static size_t *stack_vars_sorted;
 
-/* The phase of the stack frame.  This is the known misalignment of
-   virtual_stack_vars_rtx from PREFERRED_STACK_BOUNDARY.  That is,
-   (frame_offset+frame_phase) % PREFERRED_STACK_BOUNDARY == 0.  */
-static int frame_phase;
-
 /* Used during expand_used_vars to remember if we saw any decls for
which we'd like to enable stack smashing protection.  */
 static bool has_protected_decls;
@@ -375,32 +370,34 @@ align_base (HOST_WIDE_INT base, unsigned HOST_WIDE_INT align, bool align_up)
   return align_up ? (base + align - 1) & -align : base & -align;
 }
 
-/* Allocate SIZE bytes at byte alignment ALIGN from the stack frame.
+/* Allocate SIZE bytes at byte alignment ALIGN from the stack frame FRAME.
Return the frame offset.  */
 
 static HOST_WIDE_INT
-alloc_stack_frame_space (HOST_WIDE_INT size, unsigned HOST_WIDE_INT align)
+alloc_stack_frame_space (frame_info *frame, HOST_WIDE_INT size,
+			 unsigned HOST_WIDE_INT align)
 {
   HOST_WIDE_INT offset, new_frame_offset;
 
-  if (FRAME_GROWS_DOWNWARD)
+  if (frame->grows_downward)
 {
-  new_frame_offset
-	= align_base (frame_offset - frame_phase - size,
-		  align, false) + frame_phase;
+  new_frame_offset = align_base (frame->frame_offset - frame->phase - size,
+ align, false);
+  new_frame_offset += frame->phase;
   offset = new_frame_offset;
 }
   else
 {
-  new_frame_offset
-	= align_base (frame_offset - frame_phase, align, true) + frame_phase;
+  new_frame_offset = align_base (frame->frame_offset - frame->phase,
+ align, true);
+  new_frame_offset += frame->phase;
   offset = new_frame_offset;
   new_frame_offset += size;
 }
-  frame_offset = new_frame_offset;
+  frame->frame_offset = new_frame_offset;
 
-  if (frame_offset_overflow (frame_offset, cfun->decl))
-frame_offset = offset = 0;
+  if (frame_offset_overflow (frame->frame_offset, cfun->decl))
+frame->frame_offset = offset = 0;
 
   return offset;
 }
@@ -965,11 +962,11 @@ dump_stack_var_partition (void)
 }
 }
 
-/* Assign rtl to DECL at BASE + OFFSET.  */
+/* Assign rtl to DECL at BASE + OFFSET in frame FRAME.  */
 
 static void
-expand_one_stack_var_at (tree decl, rtx base, unsigned base_align,
-			 HOST_WIDE_INT offset)
+expand_one_stack_var_at (frame_info *frame, tree decl, rtx base,
+			 unsigned base_align, HOST_WIDE_INT offset)
 {
   unsigned align;
   rtx x;
@@ -988,7 +985,7 @@ expand_one_stack_var_at (tree decl, rtx base, unsigned base_align,
  If it is we generate stack slots only accidentally so it isn't as
 	 important, we'll simply use the alignment that is already set.  */
   if (base == virtual_stack_vars_rtx)
-	offset -= frame_phase;
+	offset -= frame->phase;
   align = offset & -offset;
   align *= BITS_PER_UNIT;

Re: improved RTL-level if conversion using scratchpads [half-hammock edition]

2015-11-06 Thread Sebastian Pop
On Fri, Nov 6, 2015 at 6:32 AM, Bernd Schmidt  wrote:
> On 11/06/2015 03:10 PM, Sebastian Pop wrote:
>>
>> On Fri, Nov 6, 2015 at 2:56 AM, Bernd Schmidt  wrote:
>>>
>>> Formatting problem, here and in a few other places. I didn't fully read
>>> the
>>> patch this time around.
>>>
>>> I'm probably not reviewing further patches because I don't see this
>>> progressing to a state where it's acceptable. Others may do so, but as
>>> far
>>> as I'm concerned the patch is rejected.
>>
>>
>> Bernd,
>> I would like to ask you to focus on the technical part, and provide a
>> review only based on technical reasons.
>> Please ignore all formatting changes: I will help address all those
>> changes.
>> I will send a patch addressing all the comments you had in the current
>> review.
>
>
> As long as this just has allocation from the normal stack frame as its only
> strategy, I consider it unacceptable (and I think Richard B voiced the same

Understood.

> opinion). If you want a half-finished redzone allocator, I can send you a
> patch.

Yes please.  Let's get it work.

Thanks,
Sebastian


Re: improved RTL-level if conversion using scratchpads [half-hammock edition]

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 03:10 PM, Sebastian Pop wrote:

On Fri, Nov 6, 2015 at 2:56 AM, Bernd Schmidt  wrote:

Formatting problem, here and in a few other places. I didn't fully read the
patch this time around.

I'm probably not reviewing further patches because I don't see this
progressing to a state where it's acceptable. Others may do so, but as far
as I'm concerned the patch is rejected.


Bernd,
I would like to ask you to focus on the technical part, and provide a
review only based on technical reasons.
Please ignore all formatting changes: I will help address all those changes.
I will send a patch addressing all the comments you had in the current review.


As long as this just has allocation from the normal stack frame as its 
only strategy, I consider it unacceptable (and I think Richard B voiced 
the same opinion). If you want a half-finished redzone allocator, I can 
send you a patch.



Bernd



Re: improved RTL-level if conversion using scratchpads [half-hammock edition]

2015-11-06 Thread Sebastian Pop
On Fri, Nov 6, 2015 at 2:56 AM, Bernd Schmidt  wrote:
> Formatting problem, here and in a few other places. I didn't fully read the
> patch this time around.
>
> I'm probably not reviewing further patches because I don't see this
> progressing to a state where it's acceptable. Others may do so, but as far
> as I'm concerned the patch is rejected.

Bernd,
I would like to ask you to focus on the technical part, and provide a
review only based on technical reasons.
Please ignore all formatting changes: I will help address all those changes.
I will send a patch addressing all the comments you had in the current review.

Thanks,
Sebastian


Re: improved RTL-level if conversion using scratchpads [half-hammock edition]

2015-11-06 Thread Bernd Schmidt

On 11/06/2015 12:43 AM, Abe wrote:

Feedback from Bernd has also been applied.


But inconsistently, and I think not quite in the way I meant it in one case.


-/* Return true if a write into MEM may trap or fault.  */
-
  static bool
  noce_mem_write_may_trap_or_fault_p (const_rtx mem)
  {
-  rtx addr;
-
if (MEM_READONLY_P (mem))
  return true;

-  if (may_trap_or_fault_p (mem))
-return true;
-
+  rtx addr;
addr = XEXP (mem, 0);

/* Call target hook to avoid the effects of -fpic etc  */
@@ -2881,6 +2883,18 @@ noce_mem_write_may_trap_or_fault_p (const_rtx mem)
return false;
  }

+/* Return true if a write into MEM may trap or fault
+   without scratchpad support.  */
+
+static bool
+unsafe_address_p (const_rtx mem)
+{
+  if (may_trap_or_fault_p (mem))
+return true;
+
+  return noce_mem_write_may_trap_or_fault_p (mem);
+}


The naming seems backwards from what I suggested in terms of naming. You 
still haven't explained why you want to modify this function, or call 
the limited one even when generating scratchpads. I asked about this 
last time.



  static basic_block
-find_if_header (basic_block test_bb, int pass)
+find_if_header (basic_block test_bb, int pass, bool just_sz_spad)
  {


Arguments need documentation.


+DEFPARAM (PARAM_FORCE_ENABLE_RTL_IFCVT_SPADS,
+ "force-enable-rtl-ifcvt-spads",
+ "Force-enable the use of scratchpads in RTL if conversion, "
+ "overriding the target and the profile data or lack thereof.",
+ 0, 0, 1)
+

+DEFHOOKPOD
+(rtl_ifcvt_scratchpad_control,
+"*",
+enum rtl_ifcvt_spads_ctl_enum, rtl_ifcvt_spads_as_per_profile)
+
+DEFHOOK
+(rtl_ifcvt_get_spad,
+ "*",
+ rtx, (unsigned short size),
+ default_rtl_ifcvt_get_spad)


That moves the problematic bit in a target hook rather than fixing it. 
Two target hooks and a param is probably a bit much for a change like 
this. Which target do you actually want this for? It would help to 
understand why you're doing all this.



+enum rtl_ifcvt_spads_ctl_enum {


Names are still too verbose ("enum" shouldn't be part of it).


+rtx default_rtl_ifcvt_get_spad (unsigned short size)
+{
+  return assign_stack_local (BLKmode, size, 0);
+}


Formatting problem, here and in a few other places. I didn't fully read 
the patch this time around.


I'm probably not reviewing further patches because I don't see this 
progressing to a state where it's acceptable. Others may do so, but as 
far as I'm concerned the patch is rejected.



Bernd