[libgcc2.c] Implementation of __bswapsi2()
libgcc2.c defines __bswapsi2() as follows:

    typedef int SItype __attribute__ ((mode (SI)));

    SItype
    __bswapsi2 (SItype u)
    {
      return ((((u) & 0xff000000) >> 24)
              | (((u) & 0x00ff0000) >>  8)
              | (((u) & 0x0000ff00) <<  8)
              | (((u) & 0x000000ff) << 24));
    }

JFTR: if (u & 0x80) == 0x80, (u & 0xff) << 24 exhibits undefined behaviour, but that's another story.

For i386 and AMD64 processors GCC optimises the above code properly and generates a BSWAP or MOVBE instruction.

What about processors without such an instruction? Does GCC generate (unoptimised) code there, similar to the following i386 assembly, using 4 loads, 4 shifts, 2 ands plus 3 ors?

    gcc -m32 -o- -O1 -S bswapsi2.c

    __bswapsi2:
            movl    4(%esp), %eax
            movl    %eax, %edx
            shrl    $24, %edx
            movl    %eax, %ecx
            sall    $24, %ecx
            orl     %ecx, %edx
            movl    %eax, %ecx
            sarl    $8, %ecx
            andl    $65280, %ecx
            orl     %ecx, %edx
            sall    $8, %eax
            andl    $16711680, %eax
            orl     %edx, %eax
            ret

Or is GCC able to optimise this to code similar to the following i386 assembly, using 2 loads, 2 rotates, 2 ands plus 1 or, i.e. halving the number of instructions, if the target processor has rotate instructions?

    __bswapsi2:
            movl    4(%esp), %eax
            movl    %eax, %edx
            andl    $-16711936, %edx
            roll    $8, %edx
            andl    $16711935, %eax
            rorl    $8, %eax
            orl     %edx, %eax
            ret

If not: shouldn't __bswapsi2() better be implemented as follows?

    unsigned __rotlsi3 (unsigned v, int w)
    {
      return (v << (31 & w)) | (v >> (31 & -w));
    }

    unsigned __rotrsi3 (unsigned v, int w)
    {
      return (v >> (31 & w)) | (v << (31 & -w));
    }

    int __bswapsi2 (int u)  // should better be unsigned __bswapsi2 (unsigned u)!
    {
      return __rotlsi3 (u & 0xff00ff00, 8)
           | __rotrsi3 (u & 0x00ff00ff, 8);
    }

Stefan KanthaK

PS: reimplementing __bswapdi2() is left as an exercise to the reader.
PPS: the following implementation (wrong, because the cast is commented out) exhibits 4 bugs in the optimiser and register allocator:
     #1) failure to generate a second bswap from the high dword of the argument loaded into %edx;
     #2) superfluous pushl/popl of otherwise unused %esi;
     #3) unmotivated use of %edi instead of %edx;
     #4) use of movl/sarl to produce the sign of %eax in %edx.

The first bug results in 6 (out of 18) instructions instead of 1, the last 3 bugs result in 6 (out of 18) superfluous instructions!

    typedef int DItype __attribute__ ((mode (DI)));

    DItype
    __bswapdi2 (DItype u)
    {
      return ((DItype) __bswapsi2 (u) << 32)
           | /* (unsigned) */ __bswapsi2 (u >> 32);
    }

    gcc -m32 -o- -O3 -S bswapdi2.c

    __bswapdi2:
            pushl   %edi            # Oops: not needed any more!
            pushl   %esi            # Ouch: superfluous!
            movl    16(%esp), %edx
            movl    12(%esp), %ecx
            popl    %esi            # Ouch: superfluous!
            movl    %edx, %eax
            andl    $16711935, %edx
            andl    $-16711936, %eax
            rorl    $8, %edx
            bswap   %ecx
            roll    $8, %eax
            orl     %edx, %eax
            movl    %eax, %edi      # Oops: %edx should be used here instead of %edi
            sarl    $31, %edi       # Oops: cltd should be used here instead of movl plus sarl
            orl     %edi, %ecx      # Oops: orl %ecx, %edx should be used here
            popl    %edi            # Oops: not needed any more!
            movl    %ecx, %edx      # Oops: not needed any more!
            ret
Re: Using IFUNC with template functions.
* Amrita H. S. via Gcc:

> I am interested to know if there any other better way to use ifuncs with
> template functions. If there is none, is it worth suggesting to the C++
> standards?

IFUNC is GNU-specific. It's not supported by all ELF platforms, and not even by all non-glibc Linux targets.

I think you can get the same effect just using standard C++ features, by making add a global variable of a suitable type that overrides operator(). There are issues related to constructor ordering, of course, but you have the same problem for IFUNC relocations.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
Re: [libgcc2.c] Implementation of __bswapsi2()
Hello,

On Thu, 12 Nov 2020, Stefan Kanthak wrote:

> Does GCC generate (unoptimised) code there, similar to the following i386
> assembly, using 4 loads, 4 shifts, 2 ands plus 3 ors?

Try for yourself. '-m32 -O2 -march=i386' is your friend.

Ciao,
Michael.

Spoiler: it's generating:

        movl    4(%esp), %eax
        rolw    $8, %ax
        roll    $16, %eax
        rolw    $8, %ax
        ret
Installing a generated header file
Hi! I'm working on a project where it's desirable to generate a target-specific header file while building GCC, and install it with the rest of the target-specific headers (i.e., in lib/gcc//11.0.0/include). Today it appears that only those headers listed in "extra_headers" in config.gcc will be placed there, and those are assumed to be found in gcc/config/. In my case, the header file will end up in my build directory instead.

Questions:

* Has anyone tried something like this before? I didn't find anything.
* If so, can you please point me to an example?
* Otherwise, I'd be interested in advice about providing new infrastructure to support this. I'm a relative noob with respect to the configury code, and I'm sure my initial instincts will be wrong. :)

Thanks for any help!

Bill
Re: Installing a generated header file
On Thu, 12 Nov 2020 at 15:39, Bill Schmidt via Gcc wrote:
>
> Hi! I'm working on a project where it's desirable to generate a
> target-specific header file while building GCC, and install it with
> the rest of the target-specific headers
> (i.e., in lib/gcc//11.0.0/include). Today it appears that only those
> headers listed in "extra_headers" in config.gcc will be placed there,
> and those are assumed to be found in gcc/config/. In my case, the
> header file will end up in my build directory instead.

I don't know how relevant it is to your requirement, but libstdc++ creates a target-specific $target/bits/c++config.h header for each multilib target, but it installs them alongside the rest of the C++ library headers, not in lib/gcc//.

It's done with a bunch of shell commands that takes the autoconf-generated config.h file, combines it with a template file that's in the source repo (libstdc++-v3/include/bits/c++config) and then modifies it with sed. See the ${host_builddir}/c++config.h target in libstdc++-v3/include/Makefile.am for the gory details.

The other make targets below it (for gthr-single.h and gthr-posix.h) are also target-specific. Those headers are listed in the ${allcreated} variable which is a prerequisite of the all-local target, and then in the install target they get copied into place.
Re: Installing a generated header file
Thanks for the pointer! I'll have a look at this.

Much obliged,
Bill

On 11/12/20 9:54 AM, Jonathan Wakely wrote:
> I don't know how relevant it is to your requirement, but libstdc++
> creates a target-specific $target/bits/c++config.h header for each
> multilib target, but it installs them alongside the rest of the C++
> library headers, not in lib/gcc//.
Re: Installing a generated header file
On Thu, 12 Nov 2020, Bill Schmidt via Gcc wrote:

> Hi! I'm working on a project where it's desirable to generate a
> target-specific header file while building GCC, and install it with
> the rest of the target-specific headers
> (i.e., in lib/gcc//11.0.0/include).

Does the i386 mm_malloc.h file match your scenario?

-- 
Marc Glisse
Re: Installing a generated header file
On 11/12/20 10:06 AM, Marc Glisse wrote:
> Does the i386 mm_malloc.h file match your scenario?

Ah, that looks promising indeed, and perhaps very simple! Marc, thanks for the pointer!

Bill
Re: Installing a generated header file
On 11/12/20 10:15 AM, Bill Schmidt via Gcc wrote:
> On 11/12/20 10:06 AM, Marc Glisse wrote:
>> Does the i386 mm_malloc.h file match your scenario?
>
> Ah, that looks promising indeed, and perhaps very simple! Marc, thanks
> for the pointer!

And indeed, with this example it was a two-line change to do what I needed. Thanks again. :)

Bill
gcc-8-20201112 is now available
Snapshot gcc-8-20201112 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/8-20201112/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 8 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-8 revision 6f53dfa9acec588c3c7fb19ab10a286c190045fe

You'll find:

 gcc-8-20201112.tar.xz                Complete GCC

  SHA256=56c1908be7eae6da42a37141217b57fd5a587437c29b74ae7b73f23a98d4a6f0
  SHA1=c888b6ef596cef6c4abd872fe10e85bc826cc12a

Diffs from 8-20201105 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.
CSE deletes valid REG_EQUAL?
Hi all,

In PR51505 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51505), Paolo Bonzini added the code to delete REG_EQUAL notes in df_remove_dead_eq_notes:

gcc/df-problems.c:

    df_remove_dead_eq_notes (rtx_insn *insn, bitmap live)
    {
    ...
        case REG_EQUAL:
        case REG_EQUIV:
          {
            /* Remove the notes that refer to dead registers.  As we have at most
               one REG_EQUAL/EQUIV note, all of EQ_USES will refer to this note
               so we need to purge the complete EQ_USES vector when removing
               the note using df_notes_rescan.  */
            df_ref use;
            bool deleted = false;

            FOR_EACH_INSN_EQ_USE (use, insn)
              if (DF_REF_REGNO (use) >= FIRST_PSEUDO_REGISTER
                  && DF_REF_LOC (use)
                  && (DF_REF_FLAGS (use) & DF_REF_IN_NOTE)
                  && !bitmap_bit_p (live, DF_REF_REGNO (use))
                  && loc_mentioned_in_p (DF_REF_LOC (use), XEXP (link, 0)))
                {
                  deleted = true;
                  break;
                }
            if (deleted)
              {
                rtx next;
                if (REG_DEAD_DEBUGGING)
                  df_print_note ("deleting: ", insn, link);
                next = XEXP (link, 1);
                free_EXPR_LIST_node (link);
                *pprev = link = next;
                df_notes_rescan (insn);
              }
    ...
    }

while I have a test case as below:

    typedef long myint_t;
    __attribute__ ((noinline)) myint_t
    hash_loop (myint_t nblocks, myint_t hash)
    {
      int i;
      for (i = 0; i < nblocks; i++)
        hash = ((hash + 13) | hash) + 0x66546b64;
      return hash;
    }

before cse1:

    22: L22:
    16: NOTE_INSN_BASIC_BLOCK 4
    17: r125:DI=r120:DI+0xd
    18: r118:DI=r125:DI|r120:DI
    19: r126:DI=r118:DI+0x66540000
    20: r120:DI=r126:DI+0x6b64
          REG_EQUAL r118:DI+0x66546b64
    21: r119:DI=r119:DI-0x1
    23: r127:CC=cmp(r119:DI,0)
    24: pc={(r127:CC!=0)?L22:pc}
          REG_BR_PROB 955630228

The dump in cse1:

    16: NOTE_INSN_BASIC_BLOCK 4
    17: r125:DI=r120:DI+0xd
    18: r118:DI=r125:DI|r120:DI
          REG_DEAD r125:DI
          REG_DEAD r120:DI
    19: r126:DI=r118:DI+0x66540000
          REG_DEAD r118:DI
    20: r120:DI=r126:DI+0x6b64
          REG_DEAD r126:DI
    21: r119:DI=r119:DI-0x1
    23: r127:CC=cmp(r119:DI,0)
    24: pc={(r127:CC!=0)?L22:pc}
          REG_DEAD r127:CC
          REG_BR_PROB 955630228
    ; pc falls through to BB 6

The output shows that "REG_EQUAL r118:DI+0x66546b64" is deleted by df_remove_dead_eq_notes, but r120:DI is not REG_DEAD here. So is it correct to check the uses in the insn, find that r118:DI is dead, and then delete the note?

Thanks,
Xionghu
Re: CSE deletes valid REG_EQUAL?
On 11/12/20 7:02 PM, Xionghu Luo via Gcc wrote:
> The output shows "REG_EQUAL r118:DI+0x66546b64" is deleted by
> df_remove_dead_eq_notes, but r120:DI is not REG_DEAD here, so is it
> correct here to check insn use and find that r118:DI is dead then do
> the delete?

It doesn't matter where the death occurs; any REG_DEAD note will cause the REG_EQUAL note to be removed. So given the death note for r118, any REG_EQUAL note that references r118 will be removed. This is overly pessimistic, as the note may still be valid/useful at some points. See

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92291

Jeff

ps. Note that a REG_EQUAL note is valid at a particular point in the IL -- it is not a function-wide equivalence. So you have to be careful using such values as they can be invalidated by other statements. Contrast to a REG_EQUIV note where the equivalence is global and you don't have to worry about invalidation.
Re: Do we need to do a loop invariant motion after loop interchange ?
On 11/23/19 11:26 PM, Bin.Cheng wrote:
> On Fri, Nov 22, 2019 at 3:23 PM Bin.Cheng wrote:
>> On Fri, Nov 22, 2019 at 3:19 PM Richard Biener wrote:
>>> On November 22, 2019 6:51:38 AM GMT+01:00, Li Jia He wrote:
>>>> On 2019/11/21 8:10 PM, Richard Biener wrote:
>>>>> On Thu, Nov 21, 2019 at 10:22 AM Li Jia He wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I found for the following code:
>>>>>>
>>>>>> #define N 256
>>>>>> int a[N][N][N], b[N][N][N];
>>>>>> int d[N][N], c[N][N];
>>>>>> void __attribute__((noinline))
>>>>>> double_reduc (int n)
>>>>>> {
>>>>>>   for (int k = 0; k < n; k++)
>>>>>>     {
>>>>>>       for (int l = 0; l < n; l++)
>>>>>>         {
>>>>>>           c[k][l] = 0;
>>>>>>           for (int m = 0; m < n; m++)
>>>>>>             c[k][l] += a[k][m][l] * d[k][m] + b[k][m][l] * d[k][m];
>>>>>>         }
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> I dumped the file after loop interchange and got the following
>>>>>> information:
>>>>>>
>>>>>> [local count: 118111600]:
>>>>>>   # m_46 = PHI <0(7), m_45(11)>
>>>>>>   # ivtmp_44 = PHI <_42(7), ivtmp_43(11)>
>>>>>>   _39 = _49 + 1;
>>>>>>
>>>>>> [local count: 955630224]:
>>>>>>   # l_48 = PHI <0(3), l_47(12)>
>>>>>>   # ivtmp_41 = PHI <_39(3), ivtmp_40(12)>
>>>>>>   c_I_I_lsm.5_18 = c[k_28][l_48];
>>>>>>   c_I_I_lsm.5_53 = m_46 != 0 ? c_I_I_lsm.5_18 : 0;
>>>>>>   _2 = a[k_28][m_46][l_48];
>>>>>>   _3 = d[k_28][m_46];
>>>>>>   _4 = _2 * _3;
>>>>>>   _5 = b[k_28][m_46][l_48];
>>>>>>   _6 = _3 * _5;
>>>>>>   _7 = _4 + _6;
>>>>>>   _8 = _7 + c_I_I_lsm.5_53;
>>>>>>   c[k_28][l_48] = _8;
>>>>>>   l_47 = l_48 + 1;
>>>>>>   ivtmp_40 = ivtmp_41 - 1;
>>>>>>   if (ivtmp_40 != 0)
>>>>>>     goto ; [89.00%]
>>>>>>   else
>>>>>>     goto ; [11.00%]
>>>>>>
>>>>>> we can see '_3 = d[k_28][m_46];' is a loop invariant.
>>>>>> Do we need to add a loop invariant motion pass after the loop
>>>>>> interchange?
>>>>> There is one at the end of the loop pipeline.
>>>>
>>>> Hi,
>>>>
>>>> The one at the end of the loop pipeline may miss some optimization
>>>> opportunities. If we vectorize the above code (a.c.158t.vect), we can
>>>> get information similar to the following:
>>>>
>>>> bb 3:
>>>>   # m_46 = PHI <0(7), m_45(11)>  // loop m, outer loop
>>>>   if (_59 <= 2)
>>>>     goto bb 20;
>>>>   else
>>>>     goto bb 15;
>>>>
>>>> bb 15:
>>>>   _89 = d[k_28][m_46];
>>>>   vect_cst__90 = {_89, _89, _89, _89};
>>>>
>>>> bb 4:
>>>>   # l_48 = PHI  // loop l, inner loop
>>>>   vect__6.23_100 = vect_cst__99 * vect__5.22_98;
>>>>   if (ivtmp_110 < bnd.8_1)
>>>>     goto bb 12;
>>>>   else
>>>>     goto bb 17;
>>>>
>>>> bb 20:
>>>> bb 18:
>>>>   _27 = d[k_28][m_46];
>>>>   if (ivtmp_12 != 0)
>>>>     goto bb 19;
>>>>   else
>>>>     goto bb 21;
>>>>
>>>> Vectorization will do some conversions in this case. We can see
>>>> ‘_89 = d[k_28][m_46];’ and ‘_27 = d[k_28][m_46];’ are loop invariant
>>>> relative to loop l. We can move ‘d[k_28][m_46]’ to the front of
>>>> ‘if (_59 <= 2)’ to get rid of loading data from memory in both
>>>> branches.
>>>>
>>>> The one at the end of the loop pipeline can't handle this situation.
>>>> If we move d[k_28][m_46] from loop l to loop m before doing
>>>> vectorization, we can get rid of this situation.
>>> But we can't run every pass after every other. With multiple passes
>>> having ordering issues is inevitable.
>>>
>>> Now - interchange could trigger a region based invariant motion just
>>> for the nest it interchanged. But that doesn't exist right now.
>> With data reference/dependence information in the pass, I think it
>> could be quite straightforward. Didn't realize that we need it
>> before.
> FYI, attachment is a simple fix in loop interchange for the reported
> issue. It's untested, neither for GCC10.
>
> Thanks,
> bin
>> Thanks,
>> bin
>>> Richard.
>>>
>>> linterchange-invariant-dataref-motion.patch

So it looks like Martin and Richi are working on this right now. I'm going to drop this from my queue.

jeff