On 19/01/15 18:14, Maxim Kuvyrkov wrote:
On Jan 19, 2015, at 6:05 PM, Richard Earnshaw <rearn...@arm.com> wrote:

On 16/01/15 15:06, Maxim Kuvyrkov wrote:
@@ -1874,7 +1889,8 @@ const struct tune_params arm_cortex_a15_tune =
   true, true,                                   /* Prefer 32-bit encodings.  */
   true,                                                /* Prefer Neon for 
stringops.  */
   8,                                           /* Maximum insns to inline 
memset.  */
-  ARM_FUSE_NOTHING                             /* Fuseable pairs of 
instructions.  */
+  ARM_FUSE_NOTHING,                            /* Fuseable pairs of 
instructions.  */
+  max_insn_queue_index + 1                     /* Sched L2 autopref depth.  */
};


Hmm, two issues here:
1) This requires a static constructor for the tuning table entry (since
the value of max_insn_queue_index has to be looked up at run time.

Are you sure?  I didn't check the object files, but, since max_insn_queue_index is a 
"const int", I would expect a relocation that would be resolved at link time, 
not a constructor.

I did miss this during the original review, sorry about that.

In theory yes, but in practice apparently not. Probably something LTO might be able to handle, but in the absence of defaults to LTO I don't see how we can guarantee something like this.

I do see a static constructor being put in for this on a cross compiler build.


0000000000d79338 <_Z41__static_initialization_and_destruction_0ii>:

d79338:       55                      push   %rbp
  d79339:       48 89 e5                mov    %rsp,%rbp
  d7933c:       89 7d fc                mov    %edi,-0x4(%rbp)
  d7933f:       89 75 f8                mov    %esi,-0x8(%rbp)
  d79342:       83 7d fc 01             cmpl   $0x1,-0x4(%rbp)
d79346: 75 27 jne d7936f <_Z41__static_initialization_and_destruction_0ii+0x37>
  d79348:       81 7d f8 ff ff 00 00    cmpl   $0xffff,-0x8(%rbp)
d7934f: 75 1e jne d7936f <_Z41__static_initialization_and_destruction_0ii+0x37> d79351: 8b 05 f1 43 48 00 mov 0x4843f1(%rip),%eax # 11fd748 <max_insn_queue_index>
  d79357:       83 c0 01                add    $0x1,%eax
d7935a: 89 05 b4 27 a9 00 mov %eax,0xa927b4(%rip) # 180bb14 <_ZL19arm_cortex_a15_tune+0x54> d79360: 8b 05 e2 43 48 00 mov 0x4843e2(%rip),%eax # 11fd748 <max_insn_queue_index>
  d79366:       83 c0 01                add    $0x1,%eax
d79369: 89 05 05 28 a9 00 mov %eax,0xa92805(%rip) # 180bb74 <_ZL19arm_cortex_a57_tune+0x54>
  d7936f:       5d                      pop    %rbp
  d79370:       c3                      retq


Is it a problem to have a static constructor for the tables?


2) Doesn't this mean that the depth of searching will depend on
properties of the automata rather than some machine specific values (so
that potentially adding or removing unrelated scheduler rules could
change the behaviour of the compiler)?

No.  The extra queue entries that will appear from extending an unrelated 
automaton will be empty, so the search will check them, but won't find anything.


In general, how should someone tuning the compiler for this parameter
select a value that isn't one of (-1, m_i_q_d+1)?

 From my experiments it seems there are 4 reasonable values for the parameter: 
(-1) autopref turned off, (0) turned on in rank_for_schedule, (m_i_q_d+1) 
turned on everywhere.  If there is a static constructor generated for tune 
tables and it is a problem to have it -- I can shrink acceptable values to 
these 3 and call it a day.

There doesn't seem to be anything documented in the coding conventions for this specifically. In the absence of any such documentation, I'd rather not grow static constructors for something like this.

Can we just shrink the acceptable values to 3 and just call it a day here ?


regards
Ramana


--
Maxim Kuvyrkov
www.linaro.org




Reply via email to