[PATCH] D75017: [Driver][X86] Add helptext for malign-branch*, mbranches-within-32B-boundaries

2020-02-23 Thread annita.zhang via Phabricator via cfe-commits
annita.zhang accepted this revision.
annita.zhang added a comment.
This revision is now accepted and ready to land.

LGTM


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75017/new/

https://reviews.llvm.org/D75017



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D75017: [Driver][X86] Add helptext for malign-branch*, mbranches-within-32B-boundaries

2020-02-23 Thread annita.zhang via Phabricator via cfe-commits
annita.zhang added inline comments.



Comment at: clang/include/clang/Driver/Options.td:2175
+def mbranches_within_32B_boundaries : Flag<["-"], 
"mbranches-within-32B-boundaries">, Flags<[DriverOption]>, Group,
+  HelpText<"Align branches within 32-byte boundary">;
 def mfancy_math_387 : Flag<["-"], "mfancy-math-387">, 
Group;

HelpText<"Align selected branches (fused/jcc/jmp) within 32-byte boundary. The 
branch types and boundary size can be overwritten by other specific options">;


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75017/new/

https://reviews.llvm.org/D75017



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72463: [Driver][X86] Add -malign-branch* and -mbranches-within-32B-boundaries

2020-01-17 Thread annita.zhang via Phabricator via cfe-commits
annita.zhang added a comment.

In D72463#1826821 , @MaskRay wrote:

> In D72463#1826499 , @annita.zhang 
> wrote:
>
> > @MaskRay Did you merge it to LLVM 10 branch?
>
>
> It is included in the release branch.
>
> `git branch origin/release/10.x --contains 
> 5ca24d09aefaedf8e4148c7fce4b4ab0c4ecc72a  # suceeded`


Thank you!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72463/new/

https://reviews.llvm.org/D72463



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72463: [Driver][X86] Add -malign-branch* and -malign-branch-within-32B-boundaries

2020-01-17 Thread annita.zhang via Phabricator via cfe-commits
annita.zhang added a comment.

@MaskRay Did you merge it to LLVM 10 branch?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72463/new/

https://reviews.llvm.org/D72463



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72227: Add options for clang to align branches within 32B boundary

2020-01-06 Thread annita.zhang via Phabricator via cfe-commits
annita.zhang added inline comments.



Comment at: clang/lib/Driver/ToolChains/Clang.cpp:2052
+CmdArgs.push_back("-mllvm");
+CmdArgs.push_back(Args.MakeArgString("-x86-align-branch-prefix-size=4"));
+  }

CmdArgs.push_back(Args.MakeArgString("-x86-align-branch-prefix-size=5"));


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72227/new/

https://reviews.llvm.org/D72227



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72227: Add options for clang to align branches within 32B boundary

2020-01-06 Thread annita.zhang via Phabricator via cfe-commits
annita.zhang added inline comments.



Comment at: clang/include/clang/Driver/Options.td:2198
+   "should be a power of 2 and no less than 32. Branches will be "
+   "aligned within the boundary of specified size. "
+   "-x86-align-branch-boundary=0 doesn't align branches.">;

"aligned to prevent from being across or against the boundary of specified 
size."



Comment at: clang/include/clang/Driver/Options.td:2206
+  "Specify types of branches to align (plus separated list of "
+  "types). The branches's types is combination of jcc, fused, "
+  "jmp, call, ret, indirect."

"types). The branches' types are combination of jcc, fused, "



Comment at: clang/include/clang/Driver/Options.td:2208-2211
+  " jcc, which aligns conditional jumps; fused, which aligns fused "
+  "conditional jumps; jmp, which aligns unconditional jumps; call, "
+  "which aligns calls; ret, which aligns rets; indirect, which "
+  "aligns indirect jumps.">;

"jcc indicates conditional jumps, fused indicates fused conditional jumps,"
"jmp indicates unconditional jumps, call indicates direct and indirect calls,"
"ret indicates rets, indirect indicates indirect jumps."



Comment at: clang/include/clang/Driver/Options.td:2217
+  HelpText<"Specify the maximum number of prefixes on an instruction to "
+   "align branches. The number should be between 0 and 4.">;
+def mbranches_within_32B_boundaries

Is there an upbound for this parameter?



Comment at: clang/include/clang/Driver/Options.td:2227
+  "instruction. It is equivalent to -malign-branch-boundary=32, "
+  "-malign-branch=fused+jcc+jmp, -malign-branch-prefix-size=4.">;
+def mno_branches_within_32B_boundaries

-malign-branch-prefix-size=5.">;


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72227/new/

https://reviews.llvm.org/D72227



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D72227: Add options for clang to align branches within 32B boundary

2020-01-06 Thread annita.zhang via Phabricator via cfe-commits
annita.zhang requested changes to this revision.
annita.zhang added a comment.
This revision now requires changes to proceed.

A general comment. All the tests have been done with 
-malign-branch-prefix-size=5. I don't see any explicit reason to change the 
default prefix size from 5 to 4.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72227/new/

https://reviews.llvm.org/D72227



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70157: Align branches within 32-Byte boundary(NOP padding)

2019-12-20 Thread annita.zhang via Phabricator via cfe-commits
annita.zhang added a comment.

In D70157#1793280 , @reames wrote:

> I've gone ahead and landed the patch so that we can iterate in tree.  See 
> commit 14fc20ca62821b5f85582bf76a467d412248c248 
> .
>
> I've also landed a couple of follow up patches to address issues which would 
> have otherwise required iteration on the review.  See commits 
> c148e2e2ef86f53391be459752511684424f331b 
> , 
> 4024d49edc1598a6f8017df541147b38bf1e2818 
> , and 
> 8b725f0459eee468ed7f9935fba3278fcb4997b1 
> .
>
> I still see some room for further cleanup (i.e. the fragment range scheme and 
> tests), but what's in is of reasonable quality.
>
> There's a couple follow up patches which are probably called for, but I think 
> we can work on these in parallel now.
>
> 1. We need to settle on assembler syntax.
> 2. We need a patch for the x86 MI to MC translation to mark regions unsafe to 
> pad.  (Probably best to separate from the above for the moment.)
> 3. We can incrementally add support for prefix padding.


Thanks for landing it! And happy holiday to all!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70157: Align branches within 32-Byte boundary(NOP padding)

2019-12-19 Thread annita.zhang via Phabricator via cfe-commits
annita.zhang added a comment.

> To be concrete, I propose:
>  ".autopad", ".noautopad": allow/disallow the assembler to emit padding via 
> inserting a nop or prefix before any instruction, as needed.
>  ".align_branch_boundary [N,] [instruction,]": Enable branch-boundary padding 
> (per previous description).

I had thought I sent the comments yesterday, but I didn't. :( I think my 
comments are aligned with the conclusion that Philip and Jame got. Wait for 
Jame's summary.

In general, I think it's a good idea to have generic directives to control the 
padding behavior in assembler. ".autopad", ".noautopad", 
".align_branch_boundary" looks good except it can't handle the nested 
scenarios. I can imagine the nested cases in assembly file which includes 
another assembly file. If we want to handle it, we need to have a pair of 
directives for each above. It will make the semantics complicated. We need 
elaborately design it.
I'd echo what Philip said. We want a more focused and basic implementation 
here. We're very open to have more discussion on generic directives. However, 
I'd prefer it's a separate topic from this one.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70157: Align branches within 32-Byte boundary

2019-12-17 Thread annita.zhang via Phabricator via cfe-commits
annita.zhang added a comment.

In D70157#1787160 , @MaskRay wrote:

>




> There is a precedant: .pushsection/.popsection (MCStreamer::SectionStack). 
> With .push_align_branch/.pop_align_branch, we probably don't need the 
> 'switch-to-default' action.
> 
> I don't know how likely we may ever need nested states (e.g. an `.include` 
> directive inside an .align_branch region where the included file has own idea 
> about branch alignment), but .push/.pop does not seem to be more complex than 
> disable/enable/default.

I rethink about the directives and prefer the .push/.pop pair as @MaskRay 
suggested. To be specified, I'd suggest to use .push_align_branch_boundary and 
.pop_align_branch_boundary to align with MC command line options. They will 
cowork with the command line options and overwrite the options if both are 
existing.

To be clarified, I described the behavior of the directives from my 
understanding. Feel free to speak if you have difference opinion.

.push_align_branch_boundary [N,] [instruction,]*

  This directive specifies the beginning of a region which will overwrite the 
value set by the command line or by the previous directive. It can represent 
either an enabling or disabling directive controlled by parameter N. 
  N indicates to align the branches within N byte boundary. The default value 
is 32. If N is 0, it means the branch alignment is off within this region. 
  Instruction specifies types of branches to align. The value is one or 
multiple values from fused, jcc, jmp, call, ret and indirect. The default value 
is fused, jcc and jmp. (may change later)

.pop_align_branch_boundary

  This directive specifies the end of a region to align branch boundary. The 
status will be back to which was set by the previous directive or the one set 
by the command line if there's no previous directive existing.

I will coordinate with GNU binutils community once we discuss and have 
agreement with the directives.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70157: Align branches within 32-Byte boundary

2019-12-16 Thread annita.zhang via Phabricator via cfe-commits
annita.zhang added a comment.

In D70157#1787139 , @reames wrote:

> Here are the minutes from our phone call a few minutes ago.


Thanks for coordinating the meeting and having a clear summary. It helps a lot 
to accelerate the patch review. I really appreciate it!

> Annita will refresh current patch with two key changes.  1) Drop prefix 
> support and simplify and 2) drop clang driver support for now.  Desire is to 
> minimize cycle time before next iteration so that feedback on approach can be 
> given while reviewers are still around.

Yes, we are working on it right now. Hopefully we can submit a new patch today 
or tomorrow.

> Philip will prototype directive parsing support.  Annita and Yuo (??) to 
> handle coordination on syntax.

I suppose it's Annita and Fangrui

> Side note to Annita: For you to remove "hard code", you'll have to have a 
> placeholder for the enable/disable interface.  That should probably be split 
> and rebased in my patch.

Let's do it in your directive patch.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70157: Align branches within 32-Byte boundary

2019-12-16 Thread annita.zhang via Phabricator via cfe-commits
annita.zhang added a comment.

> More performance data was posted on 
> http://lists.llvm.org/pipermail/llvm-dev/2019-December/137609.html and 
> http://lists.llvm.org/pipermail/llvm-dev/2019-December/137610.html. Let's 
> move on based on the data.

Based on the SPEC data, we observed 2.6% and 1.3% performance effect in INTRATE 
and FPRATE geomean respectively. Performance effect on individual components 
were observed up to 5.1%.

The tool SW mitigation can recover the geomean to within 99% of the original 
performance with prefix padding to jcc+jmp+fused. The maximum performance loss 
was reduced to within 2.2% of the original one.

The prefix padding can provide better performance as 0.3%~0.5% in geomean than 
nop padding on system with micro update. In individual cases, we observed up to 
1.4% performance improvement in prefix padding. On a system w/o micro update, 
we observed 0.7% better performance of prefix padding on INTRATE geomean.

In this SPEC test, the prefix padding to jcc+jmp+fused and prefix padding to 
all branches has almost the same performance. However, we observed the latter 
prefix padding had a little bit better performance than the previous one in 
some cases at the cost of code size.

Since the performance delta in prefix padding and nop padding is incremental, 
starting from nop padding may be easier to implement as a first step, with 
additional prefix padding options to explore for additional performance 
optimizations.

And since the current update enables a relatively simple and general solution 
to mitigate JCC microcode update (MCU) performance effects to both C/C++ and 
Assembly without any source code change, we recommend it as a starting point. 
Then, the proposal for compiler to generate directives could enable further 
enhancements.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70157: Align branches within 32-Byte boundary

2019-12-16 Thread annita.zhang via Phabricator via cfe-commits
annita.zhang added a comment.

In D70157#1768338 , @annita.zhang 
wrote:

> In D70157#1768319 , @chandlerc wrote:
>
> > I'm seeing lots of updates to fix bugs, but no movement for many days on 
> > both my meta comments and (in some ways more importantly) James's meta 
> > comments. (And thanks Philip for chiming in too!)
> >
> > Meanwhile, we really, really need to get this functionality in place. The 
> > entire story for minimizing the new microcode performance hit hinges on 
> > these patches, and I'm really worried by how little progress we're seeing 
> > here.
>
>
> Sorry for belated response. We're working hard to go through some paper work 
> to get the performance data ready. I think maybe it's better to open a 
> mailing thread in llvm-dev to post those performance data and discuss those 
> suggestions.
>
> The first data was posted in 
> http://lists.llvm.org/pipermail/llvm-dev/2019-December/137413.html.
>
> Thanks,
> Annita


More performance data was posted on 
http://lists.llvm.org/pipermail/llvm-dev/2019-December/137609.html and 
http://lists.llvm.org/pipermail/llvm-dev/2019-December/137610.html. Let's move 
on based on the data.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70157: Align branches within 32-Byte boundary

2019-12-09 Thread annita.zhang via Phabricator via cfe-commits
annita.zhang added a comment.

> The point is that we have explicit requirement at the start and we have a 
> lowering into 16-byte sequence that we need to be preserved exactly as it is.
>  Essentially what we need is  a "protection" for this sequence from any 
> changes by machinery that generates the binary code.
>  How can we protect a particular byte sequence from being changed by this 
> branch aligner?

No, we can't. The current solution is based on assembler to insert prefix or 
nop before the cross (or against) boundary branches. It can only ensure the 
explicit alignment specified by directive, but not any implicit alignment. I 
don't think any fixup based on assembler can do it. On the other hand, any code 
sequence after the alignment directive or even just in a function has some kind 
of implicit alignment. It's hard for assembler to tell which implicit alignment 
to preserve. The preferred way is to use explicit alignment directive to 
specify it.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70157: Align branches within 32-Byte boundary

2019-12-07 Thread annita.zhang via Phabricator via cfe-commits
annita.zhang added a comment.



In D70157#1760278 , @jyknight wrote:

>




> Branch alignment
> 
>  ---
> 
> The primary goal of this patch, restricting the placement of branch 
> instructions, is a performance optimization. Similar to loop alignment, the 
> desire is to increase speed, at the cost of code-size. However, the way this 
> feature has been designed is a global assembler flag. I find that not ideal, 
> because it cannot take into account hotness of a block/function, as for 
> example loop alignment code does. Basic-block alignment of loops is 
> explicitly opt-in on an block-by-block basis -- the compiler simply emits a 
> p2align directive where it needs, and the assembler honors that. And so, 
> MachineBlockPlacement::alignBlocks has a bunch of conditions under which it 
> will avoid emitting a p2align. This seems like a good model -- the assembler 
> does what it's told by the compiler (or assembly-writer). Making the 
> branch-instruction-alignment work similarly seems like it would be good.
> 
> IMO it would be nicest if there could be a directive that requests to 
> specially-align the next instruction. However, the fused-jcc case makes that 
> quite tricky, so perhaps this ought to also be a mode which can be 
> enabled/disabled on a region as well.

Yes, the primary goal of this patch is a performance optimization or 
mitigation. The intention is to provide a simple method for users to mitigate 
the performance impact of JCC MCU with less effort. We also provide users 
several options to tune the performance. But the basic idea is to make it easy 
for users to mitigate it and improve the performance.

Your proposal is a good idea. But I'm afraid it may not cover all the 
scenarios. Firstly, the proposal replies on compiler to detect the hotspots. 
But the compiler needs LTO and/or PGO to get the precise hot spots. Otherwise, 
if the compiler misses the hot spots which impact the application performance, 
the users have no way but have to insert the directives manually. Secondly, for 
the existing codes written in assembly, the compiler can't handle it. The users 
have to insert the directives by hand, which are pretty much work.

I think the current patch wants to give a simple and general solution to 
mitigate JCC MCU performance impact to both C/C++ and Assembly. And it doesn't 
need any source code change. I think your proposal will be a good enhancement 
on top of it.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70157: Align branches within 32-Byte boundary

2019-12-06 Thread annita.zhang via Phabricator via cfe-commits
annita.zhang added a comment.

> 
> 
>>> Third, I have not see a justification for why complexity for instruction 
>>> prefix padding is necessary.  All the effected CPUs support multi-byte 
>>> nops, so we're talking about a *single micro op* difference between the nop 
>>> form and prefix form.  Can anyone point to a performance delta due to this? 
>>>  If not, I'd suggest we should start with the nop form, and then build the 
>>> prefix form in a generic manner for all alignment varieties.
>> 
>> +1.
> 
> +1. Starting from just NOP padding sounds a simple and good first step. We 
> can explore segment override prefixes in the future.

I think it's a good suggestion to start with NOP padding as the first step. In 
our previous experiment, we saw that the prefix padding was slight better than 
NOP padding, but not much. We will retest the NOP padding and go back to you.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70157: Align branches within 32-Byte boundary

2019-12-03 Thread annita.zhang via Phabricator via cfe-commits
annita.zhang added a comment.

In D70157#1768319 , @chandlerc wrote:

> I'm seeing lots of updates to fix bugs, but no movement for many days on both 
> my meta comments and (in some ways more importantly) James's meta comments. 
> (And thanks Philip for chiming in too!)
>
> Meanwhile, we really, really need to get this functionality in place. The 
> entire story for minimizing the new microcode performance hit hinges on these 
> patches, and I'm really worried by how little progress we're seeing here.


Sorry for belated response. We're working hard to go through some paper work to 
get the performance data ready. I think maybe it's better to open a mailing 
thread in llvm-dev to post those performance data and discuss those suggestions.

Thanks,
Annita


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D70157: Align branches within 32-Byte boundary

2019-11-14 Thread annita.zhang via Phabricator via cfe-commits
annita.zhang added a comment.

In D70157#1744243 , @chandlerc wrote:

> Thanks for sending this out!
>
> I've made a minor comment on the flag structure, but my bigger question would 
> be: can you summarize the performance hit and code size hit you've seen 
> across some benchmarks? SPEC and the LLVM test suite would be nice, and maybe 
> code size alone for some large binary like Chrome, Firefox, or Safari?
>
> I'd be particularly interested in comparing the performance & code size hit 
> incurred by the suggested "fused,jcc,jmp" set compared to all (adding call, 
> ret, and indirect). If there is a big drop in either, I'd love to know which 
> of these incurs the large drop.
>
> Thanks,
> -Chandler


We are working on the performance and code size evaluation. Will update once 
it's available.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70157/new/

https://reviews.llvm.org/D70157



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits