Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Alexander Monakov via Gcc-patches
On Tue, 11 Jul 2023, Michael Matz wrote: > Hey, > > On Tue, 11 Jul 2023, Alexander Monakov via Gcc-patches wrote: > > > > > > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered > > > > > > > > This is t

Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Alexander Monakov via Gcc-patches
On Tue, 11 Jul 2023, Michael Matz wrote: > > > To that end I introduce actually two related attributes (for naming > > > see below): > > > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered > > > > This is the weak/active form; I'd suggest "preserve_high_sse". > > But it

Re: [x86-64] RFC: Add nosse abi attribute

2023-07-11 Thread Alexander Monakov via Gcc-patches
On Tue, 11 Jul 2023, Richard Biener wrote: > > > If a function contains calls then GCC can't know which > > > parts of the XMM regset is clobbered by that, it may be parts > > > which don't even exist yet (say until avx2048 comes out), so we must > > > restrict ourself to only save/restore the

Re: [x86-64] RFC: Add nosse abi attribute

2023-07-10 Thread Alexander Monakov via Gcc-patches
On Mon, 10 Jul 2023, Alexander Monakov wrote: > > I chose to make it possible to write function definitions with that > > attribute with GCC adding the necessary callee save/restore code in > > the xlogue itself. > > But you can't trivially restore if the callee is sibcalling — what > happens

Re: [x86-64] RFC: Add nosse abi attribute

2023-07-10 Thread Alexander Monakov via Gcc-patches
On Mon, 10 Jul 2023, Michael Matz via Gcc-patches wrote: > Hello, > > the ELF psABI for x86-64 doesn't have any callee-saved SSE > registers (there were actual reasons for that, but those don't > matter anymore). This starts to hurt some uses, as it means that > as soon as you have a call

Re: [PATCH] Break false dependence for vpternlog by inserting vpxor or setting constraint of input operand to '0'

2023-07-10 Thread Alexander Monakov via Gcc-patches
On Mon, 10 Jul 2023, liuhongt via Gcc-patches wrote: > False dependency happens when destination is only updated by > pternlog. There is no false dependency when destination is also used > in source. So either a pxor should be inserted, or input operand > should be set with constraint '0'. > >

Re: [PATCH] c-family: implement -ffp-contract=on

2023-06-19 Thread Alexander Monakov via Gcc-patches
Ping. OK for trunk? On Mon, 5 Jun 2023, Alexander Monakov wrote: > Ping for the front-end maintainers' input. > > On Mon, 22 May 2023, Richard Biener wrote: > > > On Thu, May 18, 2023 at 11:04 PM Alexander Monakov via Gcc-patches > > wrote: > > > > >

Re: [PATCH] c-family: implement -ffp-contract=on

2023-06-05 Thread Alexander Monakov via Gcc-patches
Ping for the front-end maintainers' input. On Mon, 22 May 2023, Richard Biener wrote: > On Thu, May 18, 2023 at 11:04 PM Alexander Monakov via Gcc-patches > wrote: > > > > Implement -ffp-contract=on for C and C++ without changing default > > behavior (=off for -std=cNN

Re: [PATCH] doc: clarify semantics of vector bitwise shifts

2023-06-02 Thread Alexander Monakov via Gcc-patches
On Fri, 2 Jun 2023, Matthias Kretz wrote: > > Okay, I see opinions will vary here. I was thinking about our immintrin.h > > which is partially implemented in terms of generic vectors. Imagine we > > extend UBSan to trap on signed overflow for vector types. I expect that > > will blow up on

Re: [PATCH] doc: clarify semantics of vector bitwise shifts

2023-06-02 Thread Alexander Monakov via Gcc-patches
On Fri, 2 Jun 2023, Matthias Kretz wrote: > On Thursday, 1 June 2023 20:25:14 CEST Alexander Monakov wrote: > > On Wed, 31 May 2023, Richard Biener wrote: > > > So yes, we probably should clarify the semantics to match the > > > implementation (since we have two targets doing things differently

Re: [PATCH] doc: clarify semantics of vector bitwise shifts

2023-06-01 Thread Alexander Monakov via Gcc-patches
On Wed, 31 May 2023, Richard Biener wrote: > On Tue, May 30, 2023 at 4:49 PM Alexander Monakov wrote: > > > > > > On Thu, 25 May 2023, Richard Biener wrote: > > > > > On Wed, May 24, 2023 at 8:36 PM Alexander Monakov > > > wrote: > > > > > > > > > > > > On Wed, 24 May 2023, Richard Biener

Re: [PATCH] doc: clarify semantics of vector bitwise shifts

2023-05-30 Thread Alexander Monakov via Gcc-patches
On Thu, 25 May 2023, Richard Biener wrote: > On Wed, May 24, 2023 at 8:36 PM Alexander Monakov wrote: > > > > > > On Wed, 24 May 2023, Richard Biener via Gcc-patches wrote: > > > > > I’d have to check the ISAs what they actually do here - it of course > > > depends > > > on RTL semantics as

Re: [PATCH] doc: clarify semantics of vector bitwise shifts

2023-05-24 Thread Alexander Monakov via Gcc-patches
On Wed, 24 May 2023, Richard Biener via Gcc-patches wrote: > I’d have to check the ISAs what they actually do here - it of course depends > on RTL semantics as well but as you say those are not strictly defined here > either. Plus, we can add the following executable test to the testsuite:

Re: [PATCH] doc: clarify semantics of vector bitwise shifts

2023-05-24 Thread Alexander Monakov via Gcc-patches
On Wed, 24 May 2023, Richard Biener wrote: > On Wed, May 24, 2023 at 2:54 PM Alexander Monakov via Gcc-patches > wrote: > > > > Explicitly say that bitwise shifts for narrow types work similar to > > element-wise C shifts with integer promotions, which coincides w

[PATCH] doc: clarify semantics of vector bitwise shifts

2023-05-24 Thread Alexander Monakov via Gcc-patches
Explicitly say that bitwise shifts for narrow types work similar to element-wise C shifts with integer promotions, which coincides with OpenCL semantics. gcc/ChangeLog: * doc/extend.texi (Vector Extensions): Clarify bitwise shift semantics. --- gcc/doc/extend.texi | 7 ++- 1

Re: [PATCH] c-family: implement -ffp-contract=on

2023-05-23 Thread Alexander Monakov via Gcc-patches
On Tue, 23 May 2023, Richard Biener wrote: > > Ah, no, I deliberately decided against that, because that way we would go > > via gimplify_arg, which would emit all side effects in *pre_p. That seems > > wrong if arguments had side-effects that should go in *post_p. > > Ah, true - that warrants

Re: [PATCH] c-family: implement -ffp-contract=on

2023-05-22 Thread Alexander Monakov via Gcc-patches
On Mon, 22 May 2023, Richard Biener wrote: > On Thu, May 18, 2023 at 11:04 PM Alexander Monakov via Gcc-patches > wrote: > > > > Implement -ffp-contract=on for C and C++ without changing default > > behavior (=off for -std=cNN, =fast for C++ and -std=gnuNN). >

[PATCH] c-family: implement -ffp-contract=on

2023-05-18 Thread Alexander Monakov via Gcc-patches
Implement -ffp-contract=on for C and C++ without changing default behavior (=off for -std=cNN, =fast for C++ and -std=gnuNN). gcc/c-family/ChangeLog: * c-gimplify.cc (fma_supported_p): New helper. (c_gimplify_expr) [PLUS_EXPR, MINUS_EXPR]: Implement FMA contraction.

[committed] tree-ssa-math-opts: correct -ffp-contract= check

2023-05-17 Thread Alexander Monakov via Gcc-patches
Since tree-ssa-math-opts may freely contract across statement boundaries we should enable it only for -ffp-contract=fast instead of disabling it for -ffp-contract=off. No functional change, since -ffp-contract=on is not exposed yet. gcc/ChangeLog: * tree-ssa-math-opts.cc

Re: [PATCH] MATCH: Add pattern for `signbit(x) ? x : -x` into abs (and swapped)

2023-05-14 Thread Alexander Monakov via Gcc-patches
On Sun, 14 May 2023, Andrew Pinski wrote: > It is NOT a dummy iterator. SIGNBIT is a operator list that expands to > "BUILT_IN_SIGNBITF BUILT_IN_SIGNBIT BUILT_IN_SIGNBITL IFN_SIGNBIT". Ah, it's in cfn-operators.pd in the build tree, not the source tree. > > On the other hand, the following

Re: [PATCH] MATCH: Add pattern for `signbit(x) ? x : -x` into abs (and swapped)

2023-05-14 Thread Alexander Monakov via Gcc-patches
On Sun, 14 May 2023, Alexander Monakov wrote: > On Sat, 13 May 2023, Andrew Pinski via Gcc-patches wrote: > > > +/* signbit(x) != 0 ? -x : x -> abs(x) > > + signbit(x) == 0 ? -x : x -> -abs(x) */ > > +(for sign (SIGNBIT) > > Surprised to see a dummy iterator here. Was this meant to include

Re: [PATCH] MATCH: Add pattern for `signbit(x) ? x : -x` into abs (and swapped)

2023-05-14 Thread Alexander Monakov via Gcc-patches
On Sat, 13 May 2023, Andrew Pinski via Gcc-patches wrote: > +/* signbit(x) != 0 ? -x : x -> abs(x) > + signbit(x) == 0 ? -x : x -> -abs(x) */ > +(for sign (SIGNBIT) Surprised to see a dummy iterator here. Was this meant to include float and long double versions of the builtin too (SIGNBITF

[PATCH 1/3] genmatch: clean up emit_func

2023-05-08 Thread Alexander Monakov via Gcc-patches
Eliminate boolean parameters of emit_func. The first ('open') just prints 'extern' to generated header, which is unnecessary. Introduce a separate function to use when finishing a declaration in place of the second ('close'). Rename emit_func to 'fp_decl' (matching 'fprintf' in length) to unbreak

[PATCH 3/3] genmatch: fixup get_out_file

2023-05-08 Thread Alexander Monakov via Gcc-patches
get_out_file did not follow the coding conventions (mixing three-space and two-space indentation, missing linebreak before function name). Take that as an excuse to reimplement it in a more terse manner and rename as 'choose_output', which is hopefully more descriptive. gcc/ChangeLog: *

[PATCH 2/3] genmatch: clean up showUsage

2023-05-08 Thread Alexander Monakov via Gcc-patches
Display usage more consistently and get rid of camelCase. gcc/ChangeLog: * genmatch.cc (showUsage): Reimplement as ... (usage): ...this. Adjust all uses. (main): Print usage when no arguments. Add missing 'return 1'. --- gcc/genmatch.cc | 21 ++--- 1

[PATCH 0/3] Trivial cleanups for genmatch

2023-05-08 Thread Alexander Monakov via Gcc-patches
I'm trying to study match.pd/genmatch with the eventual goal of improving match-and-simplify code generation. Here's some trivial cleanups for the recent refactoring in the meantime. Alexander Monakov (3): genmatch: clean up emit_func genmatch: clean up showUsage genmatch: fixup

RE: [PATCH] Makefile.in: clean up match.pd-related dependencies

2023-05-08 Thread Alexander Monakov via Gcc-patches
On Fri, 5 May 2023, Alexander Monakov wrote: > > > gimple-head-export.cc does not exist. > > > > > > gimple-match-exports.cc is not a generated file. It's under source > > > control and > > > edited independently from genmatch.cc. It is compiled separately, > > > producing > > >

RE: [PATCH] Makefile.in: clean up match.pd-related dependencies

2023-05-05 Thread Alexander Monakov via Gcc-patches
ean up match.pd-related dependencies > > > > > > On Fri, 5 May 2023, Tamar Christina wrote: > > > > > > > Am 05.05.2023 um 19:03 schrieb Alexander Monakov via Gcc-patches > > > > > > > > patc...@gcc.gnu.org>: > >

RE: [PATCH] Makefile.in: clean up match.pd-related dependencies

2023-05-05 Thread Alexander Monakov via Gcc-patches
On Fri, 5 May 2023, Tamar Christina wrote: > > > Am 05.05.2023 um 19:03 schrieb Alexander Monakov via Gcc-patches > patc...@gcc.gnu.org>: > > > > > > Clean up confusing changes from the recent refactoring for parallel > > > match.pd build. > >

[PATCH] Makefile.in: clean up match.pd-related dependencies

2023-05-05 Thread Alexander Monakov via Gcc-patches
Clean up confusing changes from the recent refactoring for parallel match.pd build. gimple-match-head.o is not built. Remove related flags adjustment. Autogenerated gimple-match-N.o files do not depend on gimple-match-exports.cc. {gimple,generic}-match-auto.h only depend on the prerequisites of

[PATCH] do not tailcall __sanitizer_cov_trace_pc [PR90746]

2023-05-02 Thread Alexander Monakov via Gcc-patches
When instrumentation is requested via -fsanitize-coverage=trace-pc, GCC emits calls to __sanitizer_cov_trace_pc callback into each basic block. This callback is supposed to be implemented by the user, and should be able to identify the containing basic block by inspecting its return address.
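
To illustrate why the return address matters here, the following is a minimal sketch of a user-side callback, not code from the patch. The buffer, its capacity MAX_PCS, and the helper coverage_count are hypothetical names introduced for the example; only __sanitizer_cov_trace_pc and __builtin_return_address are taken as given. The sketch assumes the callback lives in a translation unit that is itself built without -fsanitize-coverage=trace-pc, since an instrumented callback would call itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PCS 1024            /* hypothetical capacity, chosen for the sketch */

static uintptr_t seen_pcs[MAX_PCS];
static size_t n_seen;

/* Called by instrumented code from each basic block.  The return address
   points just past the call, i.e. into the block that made the call; this
   only holds when the call was not turned into a tail call.  */
__attribute__((noinline))
void __sanitizer_cov_trace_pc(void)
{
    uintptr_t pc = (uintptr_t)__builtin_return_address(0);
    assert(pc != 0);
    if (n_seen < MAX_PCS)
        seen_pcs[n_seen++] = pc;
}

/* Hypothetical accessor: number of callback invocations recorded so far.  */
size_t coverage_count(void)
{
    return n_seen;
}
```

If the compiler emitted the callback as a tail call, the recorded address would belong to the caller of the instrumented function rather than to the basic block itself, which is presumably the kind of misattribution the patch is meant to prevent.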

[PATCH] haifa-sched: fix autopref_rank_for_schedule comparator [PR109187]

2023-03-28 Thread Alexander Monakov via Gcc-patches
Do not attempt to use a plain subtraction for generating a three-way comparison result in autopref_rank_for_schedule qsort comparator, as offsets are not restricted and subtraction may overflow. Open-code a safe three-way comparison instead. gcc/ChangeLog: PR rtl-optimization/109187
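
The overflow hazard described above can be sketched in isolation. This is an illustrative example under the assumption that the compared values are plain int; the function names are hypothetical and this is not the code from autopref_rank_for_schedule.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Safe three-way comparison: yields -1, 0 or 1 and cannot overflow.
   Compare with "return a - b;", where INT_MIN - 1 is signed overflow
   and therefore undefined behavior.  */
int three_way_cmp(int a, int b)
{
    return (a > b) - (a < b);
}

/* qsort-compatible comparator built on the helper above.  */
int cmp_int(const void *pa, const void *pb)
{
    const int *a = pa;
    const int *b = pb;
    return three_way_cmp(*a, *b);
}

/* Sort an array that contains both extremes of the int range.  */
void sort_extremes(int *v, size_t n)
{
    assert(v != NULL);
    qsort(v, n, sizeof *v, cmp_int);
}
```

With a subtraction-based comparator, an input containing both INT_MIN and a positive value could produce a result with the wrong sign, handing qsort an inconsistent ordering.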

Re: Should -ffp-contract=off the default on GCC?

2023-03-22 Thread Alexander Monakov via Gcc-patches
On Mon, 20 Mar 2023, Jakub Jelinek via Gcc-patches wrote: > On Mon, Mar 20, 2023 at 10:05:57PM +, Qing Zhao via Gcc-patches wrote: > > My question: is the above section the place in C standard “explicitly > > allows contractions”? If not, where it is in C standard? > >

Re: Should -ffp-contract=off the default on GCC?

2023-03-22 Thread Alexander Monakov via Gcc-patches
On Wed, 22 Mar 2023, Richard Biener wrote: > I think it's even less realistic to expect users to know the details of > floating-point math. So I doubt any such sentence will be helpful > besides spreading some FUD? I think it's closer to "fundamental notions" rather than "details". For users

Re: Should -ffp-contract=off the default on GCC?

2023-03-21 Thread Alexander Monakov via Gcc-patches
On Tue, 21 Mar 2023, Jeff Law via Gcc-patches wrote: > On 3/21/23 11:00, Qing Zhao via Gcc-patches wrote: > > > >> On Mar 21, 2023, at 12:56 PM, Paul Koning wrote: > >> > >>> On Mar 21, 2023, at 11:01 AM, Qing Zhao via Gcc-patches > >>> wrote: > >>> > >>> ... > >>> Most of the compiler users

Re: [RFC/PATCH] sched: Consider debug insn in no_real_insns_p [PR108273]

2023-03-20 Thread Alexander Monakov via Gcc-patches
On Mon, 20 Mar 2023, Kewen.Lin wrote: > Hi, Hi. Thank you for the thorough analysis. Since I analyzed PR108519, I'd like to offer my comments. > As PR108273 shows, when there is one block which only has > NOTE_P and LABEL_P insns at non-debug mode while has some > extra DEBUG_INSN_P insns at

Re: [PATCH] [RFC] RAII auto_mpfr and autp_mpz

2023-03-07 Thread Alexander Monakov via Gcc-patches
On Tue, 7 Mar 2023, Jonathan Wakely wrote: > > Shouldn't this use the idiom suggested in ansidecl.h, i.e. > > > > private: > > DISABLE_COPY_AND_ASSIGN (auto_mpfr); > > > Why? A macro like that (or a base class like boost::noncopyable) has > some value in a code base that wants to work

Re: [PATCH] [RFC] RAII auto_mpfr and autp_mpz

2023-03-07 Thread Alexander Monakov via Gcc-patches
Hi, On Mon, 6 Mar 2023, Richard Biener via Gcc-patches wrote: > --- a/gcc/realmpfr.h > +++ b/gcc/realmpfr.h > @@ -24,6 +24,26 @@ > #include > #include > > +class auto_mpfr > +{ > +public: > + auto_mpfr () { mpfr_init (m_mpfr); } > + explicit auto_mpfr (mpfr_prec_t prec) { mpfr_init2

Re: RISC-V: Add divmod instruction support

2023-02-20 Thread Alexander Monakov via Gcc-patches
On Mon, 20 Feb 2023, Richard Biener via Gcc-patches wrote: > On Sun, Feb 19, 2023 at 2:15 AM Maciej W. Rozycki wrote: > > > > > The problem is you don't see it as a divmod in expand_divmod unless you > > > expose > > > a divmod optab. See tree-ssa-mathopts.cc's divmod handling. > > > >

Re: [PATCH][X86_64] Separate znver4 insn reservations from older znvers

2023-01-03 Thread Alexander Monakov via Gcc-patches
On Tue, 3 Jan 2023, Jan Hubicka wrote: > > * gcc/common/config/i386/i386-common.cc (processor_alias_table): > > Use CPU_ZNVER4 for znver4. > > * config/i386/i386.md: Add znver4.md. > > * config/i386/znver4.md: New. > OK, > thanks! Honza, I'm curious what are your further plans

Re: [PATCH V2] Disable sched1 in functions that call setjmp

2022-12-24 Thread Alexander Monakov via Gcc-patches
On Sat, 24 Dec 2022, Jose E. Marchesi wrote: > However, there is something I don't understand: wouldn't sched2 > introduce the same problem when -fsched2-use-superblocks is specified? Superblocks are irrelevant, a call instruction does not end a basic block and the problematic motion happens

Re: [PATCH V2] Disable sched1 in functions that call setjmp

2022-12-24 Thread Alexander Monakov via Gcc-patches
On Fri, 23 Dec 2022, Qing Zhao wrote: > BTW, Why sched1 is not enabled on x86 by default? Register allocation is tricky on x86 due to small number of general-purpose registers, and sched1 can make it even more difficult. I think before register pressure modeling was added, sched1 could not be

Re: [PATCH V2] Disable sched1 in functions that call setjmp

2022-12-23 Thread Alexander Monakov via Gcc-patches
On Fri, 23 Dec 2022, Qing Zhao wrote: > Then, sched2 still can move insn across calls? > So does sched2 have the same issue of incorrectly moving the insn across a > call which has unknown control flow? I think problems are unlikely because register allocator assigns pseudos that cross

Re: [PATCH V2] Disable sched1 in functions that call setjmp

2022-12-23 Thread Alexander Monakov via Gcc-patches
On Fri, 23 Dec 2022, Jose E. Marchesi wrote: > > (scheduling across calls in sched2 is somewhat dubious as well, but > > it doesn't risk register pressure issues, and on VLIW CPUs it at least > > can result in better VLIW packing) > > Does sched2 actually schedule across calls? All the

Re: [PATCH V2] Disable sched1 in functions that call setjmp

2022-12-23 Thread Alexander Monakov via Gcc-patches
On Fri, 23 Dec 2022, Qing Zhao wrote: > >> I am a little confused, you mean pre-RA scheduler does not look at the > >> data flow > >> information at all when scheduling insns across calls currently? > > > > I think it does not inspect liveness info, and may extend lifetime of a > > pseudo > >

Re: [PATCH V2] Disable sched1 in functions that call setjmp

2022-12-22 Thread Alexander Monakov via Gcc-patches
On Thu, 22 Dec 2022, Qing Zhao wrote: > > I think scheduling across calls in the pre-RA scheduler is simply an > > oversight, > > we do not look at dataflow information and with 50% chance risk extending > > lifetime of a pseudoregister across a call, causing higher register > > pressure at >

Re: [PATCH V2] Disable sched1 in functions that call setjmp

2022-12-22 Thread Alexander Monakov via Gcc-patches
On Thu, 22 Dec 2022, Jose E. Marchesi via Gcc-patches wrote: > The first instruction scheduler pass reorders instructions in the TRY > block in a way `b=true' gets executed before the call to the function > `f'. This optimization is wrong, because `main' calls setjmp and `f' > is known to call

Re: [PATCH] i386: correct division modeling in lujiazui.md

2022-12-19 Thread Alexander Monakov via Gcc-patches
Ping. If there are any questions or concerns about the patch, please let me know: I'm interested in continuing this cleanup at least for older AMD models. I noticed I had an extra line in my Changelog: > (lua_sseicvt_si): Ditto. It got there accidentally and I will drop it. Alexander On

[PATCH] i386: correct division modeling in lujiazui.md

2022-12-09 Thread Alexander Monakov via Gcc-patches
Model the divider in Lujiazui processors as a separate automaton to significantly reduce the overall model size. This should also result in improved accuracy, as pipe 0 should be able to accept new instructions while the divider is occupied. It is unclear why integer divisions are modeled as if

RE: [PATCH][X86_64] Separate znver4 insn reservations from older znvers

2022-12-01 Thread Alexander Monakov via Gcc-patches
On Thu, 1 Dec 2022, Joshi, Tejas Sanjay wrote: > I have addressed all your comments in this revised patch, PFA and inlined > below. Thank you. Honza, please let me know if any further input is needed from my side. For reference, here's how insn-automata.o table sizes look with this patch (top

Re: [PATCH] riscv: implement TARGET_MODE_REP_EXTENDED

2022-11-21 Thread Alexander Monakov via Gcc-patches
On Mon, 21 Nov 2022, Jeff Law wrote: > They're writing assembly code -- in my book that means they'd better have a > pretty good understanding of the architecture, its limitations and quirks. That GCC ties together optimization and inline asm interface via its internal TARGET_MODE_REP_EXTENDED

RE: [PATCH][X86_64] Separate znver4 insn reservations from older znvers

2022-11-21 Thread Alexander Monakov via Gcc-patches
On Mon, 21 Nov 2022, Joshi, Tejas Sanjay wrote: > I have addressed all your comments in the patch attached here. I have also > used znver4-direct for avx512 insns. Thanks. > * This patch increased the insn-automata.cc size from 201502 to 214902. Assuming it's the number of lines of code, I

Re: [PATCH] riscv: implement TARGET_MODE_REP_EXTENDED

2022-11-21 Thread Alexander Monakov via Gcc-patches
On Sun, 20 Nov 2022, Jeff Law wrote: > > The concern, as far as I understand would be the case where the > > assembly-sequence leaves an incompatible extension in the register. > > Right.  The question in my mind is whether or not the responsibility should be > on the compiler or on the

Re: [PATCH 2/2] i386: correct x87 multiplication modeling in znver.md

2022-11-16 Thread Alexander Monakov via Gcc-patches
On Wed, 16 Nov 2022, Jan Hubička wrote: > This looks really promising. I will experiment with the patch for separate > znver3 model, but I think we should be able to keep > them unified and hopefully get both less code duplication and table sizes. Do you mean separate znver4 (not '3') model

Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]

2022-11-15 Thread Alexander Monakov via Gcc-patches
On Wed, 16 Nov 2022, Hongyu Wang wrote: > > When emitting a compare-and-swap loop for @ref{__sync Builtins} > > and @ref{__atomic Builtins} lacking a native instruction, optimize > > for the highly contended case by issuing an atomic load before the > > @code{CMPXCHG} instruction, and using the

Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]

2022-11-15 Thread Alexander Monakov via Gcc-patches
On Tue, 15 Nov 2022, Jonathan Wakely wrote: > > How about the following: > > > > When emitting a compare-and-swap loop for @ref{__sync Builtins} > > and @ref{__atomic Builtins} lacking a native instruction, optimize > > for the highly contended case by issuing an atomic load before the > >

Re: [PATCH] doc: Reword the description of -mrelax-cmpxchg-loop [PR 107676]

2022-11-15 Thread Alexander Monakov via Gcc-patches
On Tue, 15 Nov 2022, Jonathan Wakely via Gcc-patches wrote: > > @item -mrelax-cmpxchg-loop > > @opindex mrelax-cmpxchg-loop > >-Relax cmpxchg loop by emitting an early load and compare before cmpxchg, > >-execute pause if load value is not expected. This reduces excessive > >-cachline bouncing

RE: [PATCH][X86_64] Separate znver4 insn reservations from older znvers

2022-11-15 Thread Alexander Monakov via Gcc-patches
On Tue, 15 Nov 2022, Joshi, Tejas Sanjay wrote: > > > +;; AVX instructions > > > +(define_insn_reservation "znver4_sse_log" 1 > > > + (and (eq_attr "cpu" "znver4") > > > + (and (eq_attr "type" "sselog,sselog1") > > > +

Re: [PATCH][X86_64] Separate znver4 insn reservations from older znvers

2022-11-14 Thread Alexander Monakov via Gcc-patches
On Mon, 14 Nov 2022, Joshi, Tejas Sanjay wrote: > [Public] > > Hi, Hi. I'm still waiting for feedback on fixes for existing models: https://inbox.sourceware.org/gcc-patches/5ae6fc21-edc6-133-aee2-a41e16eb...@ispras.ru/T/#t did you have a chance to look at those? > PFA the patch which adds

Re: [PATCH 0/2] i386: slim down insn-automata [PR 87832]

2022-11-14 Thread Alexander Monakov via Gcc-patches
On Mon, 7 Nov 2022, Alexander Monakov wrote: > > On Tue, 1 Nov 2022, Alexander Monakov wrote: > > > Hi, > > > > I'm sending followup fixes for combinatorial explosion of znver scheduling > > automaton tables as described in the earlier thread: > > > >

Re: [RFC] docs: remove documentation for unsupported releases

2022-11-10 Thread Alexander Monakov via Gcc-patches
On Thu, 10 Nov 2022, Martin Liška wrote: > On 11/10/22 08:29, Gerald Pfeifer wrote: > > On Wed, 9 Nov 2022, Alexander Monakov wrote: > >> For this I would suggest using the tag to neatly fold links > >> for old releases. Please see the attached patch. > > > > Loving it, Alexander! > > > >

Re: [RFC] docs: remove documentation for unsupported releases

2022-11-09 Thread Alexander Monakov via Gcc-patches
On Wed, 9 Nov 2022, Martin Liška wrote: > Hi. > > I think we should remove documentation for unsupported GCC releases > as it's indexed by Google engine. I'd agree with previous responses that outright removing the links is undesirable, and pointing Google to recent documentation should be

Re: [PATCH] riscv: implement TARGET_MODE_REP_EXTENDED

2022-11-09 Thread Alexander Monakov via Gcc-patches
On Wed, 9 Nov 2022, Philipp Tomsich wrote: > > To give a specific example that will be problematic if you go far enough > > down > > the road of matching MIPS64 behavior: > > > > long f(void) > > { > > int x; > > asm("" : "=r"(x)); > > return x; > > } > > > > here GCC (unlike LLVM)

Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-09-28 Thread Alexander Monakov via Gcc-patches
On Tue, 27 Sep 2022, Tobias Burnus wrote: > Ignoring (1), does the overall patch and this part otherwise look okay(ish)? > > > Caveat: The .sys scope works well with >= sm_60 but does not handle > older versions. For those, the __atomic_{load/store}_n are used. I do not > see a good

Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-09-26 Thread Alexander Monakov via Gcc-patches
Hi. My main concerns remain not addressed: 1) what I said in the opening paragraphs of my previous email; 2) device-issued atomics are not guaranteed to appear atomic to the host unless using atom.sys and translating for CUDA compute capability 6.0+. Item 2 is a correctness issue. Item 1 I

Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-09-21 Thread Alexander Monakov via Gcc-patches
Hi. On the high level, I'd be highly uncomfortable with this. I guess we are in vague agreement that it cannot be efficiently implemented. It also goes against the good practice of accelerator programming, which requires queueing work on the accelerator and letting it run asynchronously with

Re: [PATCH] [x86]Don't optimize cmp mem, 0 to load mem, reg + test reg, reg

2022-09-16 Thread Alexander Monakov via Gcc-patches
On Fri, 16 Sep 2022, Uros Bizjak via Gcc-patches wrote: > On Fri, Sep 16, 2022 at 3:32 AM Jeff Law via Gcc-patches > wrote: > > > > > > On 9/15/22 19:06, liuhongt via Gcc-patches wrote: > > > There's peephole2 submit in 1990s which split cmp mem, 0 to load mem, > > > reg + test reg, reg. I don't

Re: [PATCH] riscv: implement TARGET_MODE_REP_EXTENDED

2022-09-06 Thread Alexander Monakov via Gcc-patches
On Mon, 5 Sep 2022, Philipp Tomsich wrote: > +riscv_mode_rep_extended (scalar_int_mode mode, scalar_int_mode mode_rep) > +{ > + /* On 64-bit targets, SImode register values are sign-extended to DImode. > */ > + if (TARGET_64BIT && mode == SImode && mode_rep == DImode) > +return

Re: [PATCH v2] ipa-visibility: Optimize TLS access [PR99619]

2022-08-30 Thread Alexander Monakov via Gcc-patches
> I see, thank you for explaining the issue, and sorry if I was a bit stubborn. > > Does the attached patch (incremental change below) look better? It no longer > has the 'shortcut' where iterating over referrers is avoided for the common > case of plain 'gcc -O2' and no 'optimize' attributes,

Re: [PATCH v2] ipa-visibility: Optimize TLS access [PR99619]

2022-08-30 Thread Alexander Monakov via Gcc-patches
On Tue, 30 Aug 2022, Martin Jambor wrote: > There is still the optimize attribute so in fact no, even in non-LTO > mode if there is no current function, you cannot trust the "global" > "optimize" thing. > > Ideally we would assert that no "analysis" phase of an IPA pass reads > the global

Re: [Patch] libgomp/nvptx: Prepare for reverse-offload callback handling

2022-08-26 Thread Alexander Monakov via Gcc-patches
On Fri, 26 Aug 2022, Tobias Burnus wrote: > @Tom and Alexander: Better suggestions are welcome for the busy loop in > libgomp/plugin/plugin-nvptx.c regarding the variable placement and checking > its value. I think to do that without polling you can use PTX 'brkpt' instruction on the device

Re: [PATCH v2] ipa-visibility: Optimize TLS access [PR99619]

2022-08-26 Thread Alexander Monakov via Gcc-patches
On Fri, 26 Aug 2022, Martin Jambor wrote: > > +/* Check if promoting general-dynamic TLS access model to local-dynamic is > > + desirable for DECL. */ > > + > > +static bool > > +optimize_dyn_tls_for_decl_p (const_tree decl) > > +{ > > + if (optimize) > > +return true; > > ...this. This

Re: [PATCH] i386: avoid zero extension for crc32q

2022-08-24 Thread Alexander Monakov via Gcc-patches
On Tue, 23 Aug 2022, Alexander Monakov via Gcc-patches wrote: > --- /dev/null > +++ b/gcc/testsuite/gcc.target/i386/pr106453.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-options "-msse4.2 -O2 -fdump-rtl-final" } */ > +/* { dg-final { scan-rtl-dump-

[PATCH] i386: avoid zero extension for crc32q

2022-08-23 Thread Alexander Monakov via Gcc-patches
The crc32q instruction takes 64-bit operands, but ignores high 32 bits of the destination operand, and zero-extends the result from 32 bits. Let's model this in the RTL pattern to avoid zero-extension when the _mm_crc32_u64 intrinsic is used with a 32-bit type. PR target/106453

Re: [PATCH v2] ipa-visibility: Optimize TLS access [PR99619]

2022-08-23 Thread Alexander Monakov via Gcc-patches
Ping^3. On Fri, 5 Aug 2022, Alexander Monakov wrote: > Ping^2. > > On Wed, 20 Jul 2022, Alexander Monakov wrote: > > > > > Ping. > > > > On Thu, 7 Jul 2022, Alexander Monakov via Gcc-patches wrote: > > > > > From: Artem Klimov

Re: [PATCH v2] ipa-visibility: Optimize TLS access [PR99619]

2022-08-05 Thread Alexander Monakov via Gcc-patches
Ping^2. On Wed, 20 Jul 2022, Alexander Monakov wrote: > > Ping. > > On Thu, 7 Jul 2022, Alexander Monakov via Gcc-patches wrote: > > > From: Artem Klimov > > > > Fix PR99619, which asks to optimize TLS model based on visibility. > > The fix is implemen

Re: [PATCH 2/2] Avoid registering __builtin_setjmp_receiver label twice [PR101347]

2022-07-20 Thread Alexander Monakov via Gcc-patches
On Wed, 20 Jul 2022, Eric Botcazou wrote: > > Eric is probably most familiar with this, but can you make sure to bootstrap > > and test this on a SJLJ EH target? I'm not sure --enable-sjlj-exceptions > > is well tested anywhere but on targets not supporting DWARF EH and the > > configury is a

Re: [PATCH v2] ipa-visibility: Optimize TLS access [PR99619]

2022-07-20 Thread Alexander Monakov via Gcc-patches
Ping. On Thu, 7 Jul 2022, Alexander Monakov via Gcc-patches wrote: > From: Artem Klimov > > Fix PR99619, which asks to optimize TLS model based on visibility. > The fix is implemented as an IPA optimization: this allows to take > optimized visibility status into account (a

Re: [PATCH 2/3] tree-cfg: do not duplicate returns_twice calls

2022-07-19 Thread Alexander Monakov via Gcc-patches
On Tue, 19 Jul 2022, Richard Biener wrote: > > Like below? > > Yes. > > Thanks and sorry for the back and forth - this _is_ a mightly > complicated area ... No problem! This is the good, healthy kind of back-and-forth, and I am grateful. Pushed, including the tree-cfg validator enhancement

[PATCH 2/2] Avoid registering __builtin_setjmp_receiver label twice [PR101347]

2022-07-19 Thread Alexander Monakov via Gcc-patches
The testcase in the PR demonstrates how it is possible for one __builtin_setjmp_receiver label to appear in nonlocal_goto_handler_labels list twice (after the block with __builtin_setjmp_setup referring to it was duplicated). remove_node_from_insn_list did not account for this possibility and

[PATCH 1/2] Remove unused remove_node_from_expr_list

2022-07-19 Thread Alexander Monakov via Gcc-patches
This function remains unused since remove_node_from_insn_list was cloned from it. gcc/ChangeLog: * rtl.h (remove_node_from_expr_list): Remove declaration. * rtlanal.cc (remove_node_from_expr_list): Remove (no uses). --- gcc/rtl.h | 1 - gcc/rtlanal.cc | 29

[committed] .gitignore: do not ignore config.h

2022-07-19 Thread Alexander Monakov via Gcc-patches
GCC does not support in-tree builds at the moment, so .gitignore concealing artifacts of accidental in-tree ./configure run may cause confusion. Un-ignore config.h, which is known to break the build. ChangeLog: * .gitignore: Do not ignore config.h. --- .gitignore | 3 ++- 1 file

Re: [PATCH 2/3] tree-cfg: do not duplicate returns_twice calls

2022-07-14 Thread Alexander Monakov via Gcc-patches
On Thu, 14 Jul 2022, Richard Biener wrote: > Indeed. Guess that's what __builtin_setjmp[_receiver] for SJLJ_EH got > "right". > > When copying a block we do not copy labels so any "jumps" remain to the > original > block and thus we are indeed able to isolate normal control flow. Given

Re: [PATCH 2/3] tree-cfg: do not duplicate returns_twice calls

2022-07-13 Thread Alexander Monakov via Gcc-patches
On Wed, 13 Jul 2022, Richard Biener wrote: > > > The thing to check would be incoming abnormal edges in > > > can_duplicate_block_p, not (only) returns twice functions? > > > > Unfortunately not, abnormal edges are also used for computed gotos, which > > are > > less magic than returns_twice

Re: [PATCH 2/3] tree-cfg: do not duplicate returns_twice calls

2022-07-12 Thread Alexander Monakov via Gcc-patches
Apologies for the prolonged silence Richard, it is a bit of an obscure topic, and I was unsure I'd be able to handle any complications in a timely manner. I'm ready to revisit it now, please see below. On Mon, 17 Jan 2022, Richard Biener wrote: > On Fri, Jan 14, 2022 at 7:21 PM Alexander

Re: [PATCH 3/3] lto-plugin: implement LDPT_GET_API_VERSION

2022-07-11 Thread Alexander Monakov via Gcc-patches
On Mon, 11 Jul 2022, Martin Liška wrote: > I've clarified that linker should return a value that is in range > [minimal_api_supported, maximal_api_supported] and added an abort > if it's not the case. I noticed that we are placing a trap for C++ consumers such as mold by passing

Re: [PATCH 3/3] lto-plugin: implement LDPT_GET_API_VERSION

2022-07-11 Thread Alexander Monakov via Gcc-patches
On Mon, 11 Jul 2022, Rui Ueyama wrote: > I updated my patch to support the proposed API: > https://github.com/rui314/mold/commit/22bbfa9bba9beeaf40b76481d175939ee2c62ec8 This still seems to ignore the thread safety aspect. Alexander

Re: [PATCH 3/3] lto-plugin: implement LDPT_GET_API_VERSION

2022-07-11 Thread Alexander Monakov via Gcc-patches
On Mon, 11 Jul 2022, Rui Ueyama wrote: > > but ignoring min_api_supported is wrong, and assuming max_api_supported > 0 > > is also wrong. It really should check how given [min; max] range intersects > > with its own range of supported versions. > > Currently only one version is defined which is
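
The range check asked for in this reply can be sketched as follows. This is only an illustration of intersecting two [min; max] version ranges: the struct and the function name `pick_api_version` are hypothetical and are not part of the lto-plugin interface, and picking the highest common version is an assumption rather than something the thread specifies.

```c
#include <stdbool.h>

/* Hypothetical type for illustration: an inclusive range of API versions.  */
struct version_range
{
  int min;
  int max;
};

/* Intersect the range offered by the other side with our own supported
   range.  Returns false when the ranges do not overlap; otherwise stores
   the highest version both sides support in *CHOSEN (assumed policy).  */
static bool
pick_api_version (struct version_range offered, struct version_range own,
                  int *chosen)
{
  int lo = offered.min > own.min ? offered.min : own.min;
  int hi = offered.max < own.max ? offered.max : own.max;

  if (lo > hi)
    return false;   /* Empty intersection: no common version.  */

  *chosen = hi;
  return true;
}
```

With this shape, neither bound is ignored: a consumer that only supports version 0 rejects an offer of [1; 2] instead of assuming that any non-zero maximum is acceptable.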

Re: [PATCH 3/3] lto-plugin: implement LDPT_GET_API_VERSION

2022-07-08 Thread Alexander Monakov via Gcc-patches
On Fri, 8 Jul 2022, Martin Liška wrote: > Hi. > > All right, there's updated version of the patch that reflects the following > suggestions: > > 1) strings are used for version identification > 2) thread-safe API version (1) is not used if target does not support locking > via pthreads > >

[PATCH v2] ipa-visibility: Optimize TLS access [PR99619]

2022-07-07 Thread Alexander Monakov via Gcc-patches
From: Artem Klimov Fix PR99619, which asks to optimize TLS model based on visibility. The fix is implemented as an IPA optimization: this allows to take optimized visibility status into account (as well as avoid modifying all language frontends). 2022-04-17 Artem Klimov gcc/ChangeLog:

Re: [PATCH 3/3] lto-plugin: implement LDPT_GET_API_VERSION

2022-06-16 Thread Alexander Monakov via Gcc-patches
On Thu, 16 Jun 2022, Martin Liška wrote: > Hi. > > I'm sending updated version of the patch where I addressed the comments. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > Ready to be installed? I noticed a typo (no objection on the substance on the patch from

Re: [PATCH] Add a bit dislike for separate mem alternative when op is REG_P.

2022-05-30 Thread Alexander Monakov via Gcc-patches
On Mon, 30 May 2022, Hongtao Liu wrote: > On Mon, May 30, 2022 at 3:44 PM Alexander Monakov wrote: > > > > On Mon, 30 May 2022, Hongtao Liu wrote: > > > > > On Mon, May 30, 2022 at 2:22 PM Alexander Monakov via Gcc-patches > > > wrote: > > > >

Re: [PATCH] Add a bit dislike for separate mem alternative when op is REG_P.

2022-05-30 Thread Alexander Monakov via Gcc-patches
On Mon, 30 May 2022, Hongtao Liu wrote: > On Mon, May 30, 2022 at 2:22 PM Alexander Monakov via Gcc-patches > wrote: > > > > > > The spill is mainly decided by 3 insns related to r92 > > > > > > 283(insn 3 61 4 2 (set (reg/v:SF 92 [ x ]) > &

RE: [PATCH] Add a bit dislike for separate mem alternative when op is REG_P.

2022-05-30 Thread Alexander Monakov via Gcc-patches
> > In the PR, the spill happens in the initial basic block of the function, > > i.e. > > the one with the highest frequency. > > > > Also as noted in the PR, swapping the 'unlikely' branch to 'likely' avoids > > the spill, > > even though it does not affect the frequency of the initial basic

Re: [PATCH] Add a bit dislike for separate mem alternative when op is REG_P.

2022-05-27 Thread Alexander Monakov via Gcc-patches
On Wed, 25 May 2022, liuhongt via Gcc-patches wrote: > Right now, mem_cost for separate mem alternative is 1 * frequency which > is pretty small and caused the unnecessary SSE spill in the PR, I've tried > to rework backend cost model, but RA still not happy with that (regress > somewhere else). I

Re: [PATCH] ipa-visibility: Optimize TLS access [PR99619]

2022-05-23 Thread Alexander Monakov via Gcc-patches
On Mon, 16 May 2022, Alexander Monakov wrote: > On Mon, 9 May 2022, Jan Hubicka wrote: > > > > On second thought, it might be better to keep the assert, and place the > > > loop > > > under 'if (optimize)'? > > > > The problem is that at IPA level it does not make sense to check > > optimize

Re: [PATCH] Add divide by zero side effect.

2022-05-22 Thread Alexander Monakov via Gcc-patches
On Fri, 20 May 2022, Richard Biener wrote: > > > > I suggest 'deduce', 'deduction', 'deducing a range'. What the code is > > > > actually > > > > doing is deducing that 'b' in 'a / b' cannot be zero. Function in GCC > > > > might be > > > > called like 'deduce_ranges_from_stmt'. > > > > > > So

Re: [PATCH] Add divide by zero side effect.

2022-05-20 Thread Alexander Monakov via Gcc-patches
On Fri, 20 May 2022, Richard Biener wrote: > On Fri, May 20, 2022 at 8:38 AM Alexander Monakov wrote: > > > > On Fri, 20 May 2022, Richard Biener via Gcc-patches wrote: > > > > > > Still waiting for a suggestion, since "side effect" is the description > > > > that made sense to me :-) > > > > >

Re: [PATCH] Add divide by zero side effect.

2022-05-20 Thread Alexander Monakov via Gcc-patches
On Fri, 20 May 2022, Richard Biener via Gcc-patches wrote: > > Still waiting for a suggestion, since "side effect" is the description > > that made sense to me :-) > > I think side-effect captures it quite well even if it overlaps with a term > used in language standards. Doing c = a << b has

Re: [PATCH] ipa-visibility: Optimize TLS access [PR99619]

2022-05-16 Thread Alexander Monakov via Gcc-patches
On Mon, 9 May 2022, Jan Hubicka wrote: > > On second thought, it might be better to keep the assert, and place the loop > > under 'if (optimize)'? > > The problem is that at IPA level it does not make sense to check > optimize flag as it is function specific. (shlib is OK to check it > anywhere
