[PATCH] D70157: Align branches within 32-Byte boundary

Kan Shengchen via Phabricator via cfe-commits Thu, 05 Dec 2019 07:43:33 -0800

skan added a comment.

In D70157#1769932 <https://reviews.llvm.org/D70157#1769932>, @MaskRay wrote:


> I find another deficiency (infinite loop) with the current approach.
>
> Say, there is a `je 0` (0x0F 0x84 0x00 0x00 0x00 0x00) at byte 0x90. 
> (0x90+6)%32 == 0, so it ends on a 32-byte boundary.
>  MF.getMaxPrefixSize() is 4, so the size of MCMachineDependentFragment may 
> vary from 0 to 4.
>  If there are other MCMachineDependentFragment's in the section, some may 
> shrink while some may expand.
>  In some cases the following loop will not converge
>
>   bool MCAssembler::layoutOnce(MCAsmLayout &Layout) {
>     ++stats::RelaxationSteps;
>  
>     bool WasRelaxed = false;
>     for (iterator it = begin(), ie = end(); it != ie; ++it) {
>       MCSection &Sec = *it;
>       while (layoutSectionOnce(Layout, Sec)) ///
>         WasRelaxed = true;
>     }
>  
>     return WasRelaxed;
>   }
>   
>   // In MCAssembler::layoutSectionOnce,
>     case MCFragment::FT_MachineDependent:
>       RelaxedFrag =
>           relaxMachineDependent(Layout, *cast<MCMachineDependentFragment>(I));
>       break;
>   
>
> To give a concrete example, `clang++ -fsanitize=memory 
> compiler-rt/test/msan/cxa_atexit.cpp -mbranches-within-32B-boundaries` does 
> not converge. You may also try dtor-*.cpp in that directory.
>
> A simple iterative algorithm is not guaranteed to converge. We probably can 
> solve the layout problem with a dynamic programming algorithm:
>
> f[i][j] = the minimum cost that layouts the first i instructions with j extra 
> bytes (via NOPs or prefixes)
> or
>  g[i][j] = the minimum inserted bytes that layouts the first i instructions 
> with cost j
>
> I am not clear which one is better. A simple greedy approach is to set an 
> upper limit on the number of iterations when the section contains at least 
> one MCMachineDependentFragment, i.e.
>
>   bool HasMCMachineDependentFragment = false;
>   int count = 5; // arbitrary. Please find an appropriate value.
>   while (layoutSectionOnce(Layout, Sec, HasMCMachineDependentFragment) && 
> count > 0) {
>     WasRelaxed = true;
>     if (HasMCMachineDependentFragment)
>       count--;
>   }


I guess you originally wanted to say (90 + 6)%32 == 0 instead of (0x90 +6)%32 
== 0. With the last patch, the command

`clang++ -fsanitize=memory compiler-rt/test/msan/cxa_atexit.cpp 
-mbranches-within-32B-boundaries` does not converge, but this is not the fault 
of this simple iterative algorithm.

Let me briefly introduce this algorithm. Regardless of whether the prefix or 
nop is filled in, `MCMachineDependentFragment` is used to occupy a certain size 
of space to align the branch, and it has a pointer to the branch it is 
responsible for. Multiple MCMachineDependentFragments can work together to 
align a branch. For example, there are two MCMachineDependentFragment, M1 
<https://reviews.llvm.org/M1> and M2 <https://reviews.llvm.org/M2> to align 
branch J1 <https://reviews.llvm.org/J1>. In the first iteration, J1 
<https://reviews.llvm.org/J1> needs 7 bytes of padding, so the size of M1 
<https://reviews.llvm.org/M1> will grow to 7 as much as possible. However, M1 
<https://reviews.llvm.org/M1> can only grow to 5 or smaller from some reason. 
At this time, M2 <https://reviews.llvm.org/M2> will grow as much as possible to 
2. In the second iteration, when it's M1 <https://reviews.llvm.org/M1>'s turn 
to relax, we will subtract the size of M1 <https://reviews.llvm.org/M1> and M2 
<https://reviews.llvm.org/M2> from the current address of J1 
<https://reviews.llvm.org/J1> to get the size that J1 
<https://reviews.llvm.org/J1> needs to padding, assuming it is 6, and then M1 
<https://reviews.llvm.org/M1> will keep the size to 5 and the size of M2 
<https://reviews.llvm.org/M2> will be reduced to 1. It is not difficult to see 
that among the MCFragment from M1 <https://reviews.llvm.org/M1> to J1 
<https://reviews.llvm.org/J1>, as long as the size of MCFragment other than M1 
<https://reviews.llvm.org/M1> and M2 <https://reviews.llvm.org/M2> is fixed 
within a limited number of iterations, the sizes of M1 
<https://reviews.llvm.org/M1> and M2 <https://reviews.llvm.org/M2> will be 
fixed within a limited number of iterations, that is, the iteration will 
converge.

As far as I know, the `MCFragment` used to store instructions includes 
`MCDataFragment`, `MCRelaxableFragment` and `MCCompactEncodedFragment`.  
`MCDataFragment` and `MCCompactEncodedFragment` have fixed sizes. 
`MCRelaxableFragment` is used to store instructions of variable size and will 
only grow and not shrink, as a result, its size will be fixed within a limited 
number of cycles . So as long as we ensure that there is only MCFragment that 
stores instructions between the  instruction to be prefixed and the branch to 
be aligned, this iterative algorithm will converge.

The file cxa_atexit.s  has more than one text section. The instruction to be 
prefixed and the branch to align may be in two different text sections. I 
forgot to check if there is a `MCFragment` not storing instruction between the 
them, which results in non-convergence. This bug has been fixed in the latest 
patch.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D70157: Align branches within 32-Byte boundary

Reply via email to