Re: __fp16 is ambiguous error in C++
On Thu, Jun 24, 2021 at 7:26 PM ALO via Gcc wrote:
> foo.c: In function '__fp16 foo(__fp16, __fp16)':
> foo.c:6:23: error: call of overloaded 'exp(__fp16&)' is ambiguous
>     6 |   return a + std::exp(b);
>       |                       ^

No, there isn't a solution for this. You might want to try an ARM clang/gcc to see what they do, but it probably isn't much better than the RISC-V port. A quick check shows the same gcc result on ARM. And note that only the non-upstream V extension branch for RISC-V has __fp16 support, because the vector extension depends on it. It is hard to argue for changes when the official RISC-V GCC port has no __fp16 support.

Kito started a related thread in March, and there was tentative agreement to add _Float16 support to the GCC C++ front end.
https://gcc.gnu.org/pipermail/gcc/2021-March/234971.html
That may or may not help you.

I think it will be difficult to do anything useful here until the C and C++ standards figure out how they want half-float support to work. If we do something before then, it will probably end up incompatible with the official solution and we will end up stuck with a mess.
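For illustration, an untested sketch of the usual workaround: cast the operand so overload resolution has exactly one best match.

#include <cmath>

__fp16 foo (__fp16 a, __fp16 b)
{
  /* std::exp(float) is now the unique best match; the result
     converts back to __fp16 on return.  */
  return a + std::exp (static_cast<float> (b));
}

Jim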
Re: Default debug format for AVR
On Sat, Apr 3, 2021 at 6:24 PM Simon Marchi via Gcc wrote:
> The default debug format (when using only -g) for the AVR target is
> stabs. Is there a reason for it not being DWARF, and would it be
> possible to maybe consider possibly thinking about making it default to
> DWARF? I am asking because the support for stabs in GDB is pretty much
> untested and bit-rotting, so I think it would be more useful for
> everyone to use DWARF.

I tried to deprecate the stabs support a little over 4 years ago.
https://gcc.gnu.org/pipermail/gcc-patches/2017-December/489296.html
There was a suggestion to change the error to a warning, but my startup company job kept me so busy that I never had a chance to follow up on this.

I would like to see the stabs support deprecated and then later removed from gcc. No new features have been added in a long time, and it is only being maintained in the sense that when it fails, it is fixed to ignore source code constructs that it doesn't support. The longer it survives in this state, the less useful it becomes.

Jim
Re: Having trouble getting my school to sign the copyright disclaimer
On Wed, Mar 31, 2021 at 8:27 AM PKU via Gcc wrote:
> I’m trying to get my school to sign the copyright disclaimer.
> Unfortunately the officials are reluctant to do that. Can anyone suggest
> what to do next?

Maybe the PLCT Lab at the Chinese Academy of Sciences can help. They are doing GCC and LLVM work, and have GCC etc. assignments. Even if they can't help get an assignment for your current work, if you want to continue doing GCC work, maybe your next patch can be written for them. They hold regular meetings for people doing GCC/LLVM work in China to meet and discuss their work. The PLCT work is primarily RISC-V focused though.

https://github.com/lazyparser/weloveinterns
https://github.com/lazyparser/weloveinterns/blob/master/open-internships.md#bj37-gccbinutilsglibclinker-%E5%BC%80%E5%8F%91%E5%AE%9E%E4%B9%A0%E7%94%9F-10%E5%90%8D

You might be able to find more appropriate links. I don't actually read Mandarin. This is just a link I happen to know about which has good info about the PLCT Lab.

Jim
Re: HELP: MIPS PC Relative Addressing
On Wed, Feb 24, 2021 at 9:30 AM Maciej W. Rozycki wrote:
> On Wed, 24 Feb 2021, Jiaxun Yang wrote:
>
> > For RISC-V, %pcrel_lo shall point to the label of corresponding
> > %pcrel_hi, like
> >
> > .LA0:
> >     auipc a0, %pcrel_hi(sym)
> >     addi  a0, a0, %pcrel_lo(.LA0)
>
> I commented on it once, in the course of the FDPIC design project, and I
> find it broken by design. Sadly it has made it into the RISC-V psABI and
> it is hard to revert at this time, too many places have started relying on
> it.

It was already a production ABI before you asked for the change. And changing a production ABI is extremely difficult. You were not the first to complain about this, and you probably won't be the last.

Jim
Re: HELP: MIPS PC Relative Addressing
On Wed, Feb 24, 2021 at 6:18 AM Jiaxun Yang wrote:
> I found it's very difficult for GCC to generate this kind of pcrel_lo
> expression, RTX label_ref can't be lowered into such a LOW_SUM expression.

Yes, it is difficult. You need to generate a label, put the label number in an unspec in the auipc pattern, and then create a label_ref to put in the addi. The fact that we have an unspec and a label_ref means a number of optimizations get disabled, like basic block duplication and loop unrolling, because they can't make a copy of an instruction that uses a label as data, as they have no way to know how to duplicate the label itself. Or at least, RISC-V needs to create one label; you probably need to create two labels.

There is a far easier way to do this, which is to just emit an assembler macro and let the assembler generate the labels and relocs. This is what the RISC-V GCC port does by default. This prevents some optimizations, like scheduling the two instructions, but enables some other optimizations, like loop unrolling. So it is a tossup: sometimes we get better code with the assembler macro, and sometimes we get better code by emitting the auipc and addi separately.

The RISC-V gcc port can emit the auipc/addi pair with -mexplicit-relocs -mcmodel=medany, but this is known to sometimes fail. The problem is that if you have an 8-byte variable with 8-byte alignment and try to load it with two 4-byte loads, gcc knows that offset+4 must be safe from overflow because the data is 8-byte aligned. However, when you use a pc-relative offset, which is the data address minus the code address, the offset is only as aligned as the code is. RISC-V has 2-byte instruction alignment with the C extension. So if you have offset+4 and offset is only 2-byte aligned, it is possible that offset+4 may overflow the add-immediate field. The same thing can happen with 16-byte data that is 16-byte aligned, accessed with two 8-byte loads. There is no easy software solution; we just emit a linker error in that case, as we can't do anything else. I think this would work better if auipc cleared some low bits of the result, in which case the pc-relative offset would have enough alignment to prevent overflow when adding small offsets, but it is far too late to change how the RISC-V auipc works.

> If it looks infeasible for GCC side, another option would be adding
> RISC-V style %pcrel_{hi,lo} modifiers at assembler side. We can add
> another pair of modifiers like %pcrel_paired_{hi,lo} to implement the
> behavior. Would it be a good idea?

I wouldn't recommend following the RISC-V approach for the relocation.
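To make the overflow problem described above concrete, an untested sketch, assuming rv32 with -mexplicit-relocs -mcmodel=medany:

/* 8-byte-aligned data that rv32 accesses with two 4-byte loads.  */
long long x __attribute__ ((aligned (8)));

long long
get_x (void)
{
  /* gcc emits loads at offset and offset+4, assuming offset+4 cannot
     overflow because x is 8-byte aligned.  But the pc-relative offset
     (data address minus code address) is only as aligned as the code,
     so offset+4 can overflow the 12-bit immediate field, and the
     linker reports an error.  */
  return x;
}

Jim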
RISC-V -menable-experimental-extensions option
I'm not aware of any other target that has a similar feature, so I thought a bit of discussion first might be useful.

For most ISAs, there is one organization that owns the ISA and does development internally, in private. For RISC-V, the ISA is owned by RISC-V International, which has no developers. The development all happens externally, in public, spread across at least a dozen different organizations. So we have the problem of coordinating this work, especially for draft versions of extensions.

So we would like to add support for draft extensions to mainline, controlled by a -menable-experimental-extensions option. For features enabled by this option, there would be no guarantee that the next compiler release is compatible with the previous one, since the draft extension may change in incompatible ways. LLVM already has support for this option.
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138364.html
https://reviews.llvm.org/D73891

We are still discussing the details of how this will work. We may want to limit this to "stable" draft extensions, and put the unstable drafts on a vendor branch.

We have been doing work on branches in the github.com riscv tree, but there are issues with tracking who has copyright assignments, issues with identifying who exactly a github user actually is, and issues with getting the right set of people write access to the trees. These won't be problems if we are using the FSF trees instead.

We want this draft extension support on mainline for the same reasons that the LLVM developers do: to ensure that everyone is working on the same branch in the upstream tree. And it is easiest to do that if that branch is mainline.

This is just a binutils and gcc proposal at the moment, but we might need something on the gdb side later, like a "set riscv experimental-extensions 1" command or whatever to enable support for draft extensions.

Jim
Re: Wrong insn scheduled by Sched1 pass
On Mon, Nov 2, 2020 at 11:45 PM Jojo R wrote:
> From origin insn seqs, I think the insn 'r500=unspec[r100] 300' is in
> good place because of the bypass of my pipeline description, it is not
> needed to schedule.
> ...
> Is there any way to control my case?
> Or my description of pipeline is not good?

I would suggest looking at verbose scheduler debugging dumps to see exactly what decisions the scheduler is making. See the -fsched-verbose=X option, and give it a value of at least 9, as I think that is the highest supported value. This will put a lot of info in the scheduler rtl dumps that will help you understand what the scheduler is doing. You can then use that info to try to figure out how to tweak your port to get the result you want.

The problem may not be in the pipeline description file. You might need to define some macros. See the list of TARGET_SCHED_* macros you can use to control how the scheduler works.
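For example, something like this (the target triple here is just a placeholder):

riscv64-unknown-elf-gcc -O2 -fsched-verbose=9 -fdump-rtl-sched1 -fdump-rtl-sched2 -c test.c

With the -fdump-rtl-sched* options, the verbose scheduler output ends up in the .sched1/.sched2 rtl dump files next to the source file.

Jim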
Re: Is there a way to tell GCC not to reorder a specific instruction?
On Wed, Sep 30, 2020 at 11:35 PM Richard Biener wrote:
> On Wed, Sep 30, 2020 at 10:01 PM Jim Wilson wrote:
> > We have a lot of examples in gcc/testsuite/gcc.target/riscv/rvv that
> > we are using for testing the vector support.
>
> That doesn't seem to exist (but maybe it's just not on trunk yet).

The vector extension is still in draft form, and they are still making major compatibility breaks. There was yet another one about 3-4 weeks ago. I don't want to upstream anything until we have an officially accepted V extension, at which point they will stop allowing compatibility breaks. If we upstream now, we would need some protocol for how to handle unsupported experimental patches in mainline, and I don't think that we have one.

So for now, the vector support is on a branch in the RISC-V International github repo.
https://github.com/riscv/riscv-gnu-toolchain/tree/rvv-intrinsic
The gcc testcases specifically are here:
https://github.com/riscv/riscv-gcc/tree/riscv-gcc-10.1-rvv-dev/gcc/testsuite/gcc.target/riscv/rvv

A lot of the testcases use macros so we can test every variation of an instruction, and there is a large number of variations for most instructions, so most of these testcases aren't very readable. They are just there to verify that we can generate the instructions we expect. Only the algorithm ones are readable, like saxpy, memcpy, strcpy.

Jim
Re: Is there a way to tell GCC not to reorder a specific instruction?
On Tue, Sep 29, 2020 at 11:40 PM Richard Biener wrote:
> But this also doesn't work on GIMPLE. On GIMPLE riscv_vlen would
> be a barrier for code motion if you make it __attribute__((returns_twice))
> since then abnormal edges distort the CFG in a way preventing such motion.

At the gimple level, all vector operations have an implicit vsetvl, so it doesn't matter much how they are sorted, as long as they don't get sorted across an explicit vsetvl that they depend on. But the normal way to use an explicit vsetvl is to control a loop, and you can't move dependent operations out of the loop, so it tends to work. Setting vsetvl in the middle of a basic block is less useful and less common, and very unlikely to work unless you really know what you are doing. Basically, RISC-V wasn't designed to work this way, so you probably shouldn't be writing your code this way. There might be edge cases where we aren't handling this right; since we aren't writing code this way, we aren't testing this support. This is still a work in progress.

Good RVV code should look more like this:

#include <stddef.h>
#include <riscv_vector.h>

void saxpy(size_t n, const float a, const float *x, float *y)
{
  size_t l;
  vfloat32m8_t vx, vy;

  for (; (l = vsetvl_e32m8(n)) > 0; n -= l)
    {
      vx = vle32_v_f32m8(x);
      x += l;
      vy = vle32_v_f32m8(y);
      // vfmacc
      vy = a * vx + vy;
      vse32_v_f32m8(y, vy);
      y += l;
    }
}

We have a lot of examples in gcc/testsuite/gcc.target/riscv/rvv that we are using for testing the vector support.

Jim
Re: Is there a way to tell GCC not to reorder a specific instruction?
On Tue, Sep 29, 2020 at 7:22 PM 夏 晋 wrote:
> vint16m1_t foo3(vint16m1_t a, vint16m1_t b){
>   vint16m1_t add = a+b;
>   vint16m1_t mul = a*b;
>   vsetvl_e8m1(32);
>   return add + mul;
> }

Taking another look at your example, you have type confusion. Using vsetvl to specify an element width of 8 does not magically convert types into 8-bit vector types. They are still 16-bit vector types and will still result in 16-bit vector operations. So your explicit vsetvl_e8m1 is completely useless.

In the RISC-V V scheme, every vector operation emits an implicit vsetvl instruction, and then we optimize away the redundant ones. So the add and mul at the start are emitting two vsetvl instructions. Then you have an explicit vsetvl. Then another add, which will emit another implicit vsetvl. The compiler reordered the arithmetic in such a way that two of the implicit vsetvl instructions can be optimized away. That probably happened by accident. But we don't have support for optimizing away the useless explicit vsetvl, so it remains.

Jim
Re: Is there a way to tell GCC not to reorder a specific instruction?
On Tue, Sep 29, 2020 at 3:47 AM 夏 晋 via Gcc wrote:
> I tried to set the "vlen" after the add & multi, as shown in the following
> code:
>
> vf32 x3,x4;
> void foo1(float16_t* input, float16_t* output, int vlen){
>   vf32 add = x3 + x4;
>   vf32 mul = x3 * x4;
>   __builtin_riscv_vlen(vlen); //<
>   storevf(&output[0], add);
>   storevf(&output[4], mul);
> }

It is not clear what __builtin_riscv_vlen is doing, or what exactly your target is. But the gcc port I did for the RISC-V draft V extension creates new fake vector type and vector length registers, like the existing fake fp and arg pointer registers. The vsetvl{i} instruction sets the fake vector type and vector length registers, and all vector instructions read them. That creates the dependence between the instructions that prevents reordering. It is a little more complicated than that, as you can have more than one vsetvl{i} instruction setting different vector type and/or vector length values, so we have to match on the expected values to make sure that vector instructions are tied to the right vsetvl{i} instruction. This is a work in progress, but overall it is working pretty well.

This requires changes to the gcc port, as you have to add the new fake registers in gcc/config/riscv/riscv.h. This isn't something you can do with macros and extended asms. See for instance
https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/Krhw8--wmi4/m/-3IPvT7JCgAJ
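As an illustration of the scheme (everything here is hypothetical: the register macros, the mode iterator, and the constraint; this is not the actual branch code), each vector pattern uses the fake registers, which is what ties it to the vsetvl instruction that last set them:

;; In riscv.h, VL_REGNUM and VTYPE_REGNUM would be new fixed (fake)
;; hard registers, like the fake frame/arg pointer registers.
(define_insn "*vadd<mode>_sketch"
  [(set (match_operand:V 0 "register_operand" "=vr")
        (plus:V (match_operand:V 1 "register_operand" "vr")
                (match_operand:V 2 "register_operand" "vr")))
   (use (reg:SI VL_REGNUM))
   (use (reg:SI VTYPE_REGNUM))]
  ""
  "vadd.vv\t%0,%1,%2")

Jim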
Re: New pseudos in splitters
On Wed, Sep 23, 2020 at 7:51 AM Ilya Leoshkevich via Gcc wrote:
> Is this restriction still valid today? Is there a reason we can't
> introduce new pseudos in a splitter before LRA?

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91683 for an example of what can go wrong when a splitter creates a new pseudo. I think there was another one I fixed around the same time that failed for a different reason, but I don't have time to look for it.

Jim
Re: How to forbid register allocator to overlap bewteen DEST and SOURCE
On Wed, Jul 1, 2020 at 8:40 PM wrote:
> GCC seems to overlap registers between DEST and SOURCE in different
> machine modes. Is there any target hook to control this feature?
> I use '&' to forbid the register allocator from overlapping DEST and
> SOURCE, but there are some redundant instructions in the result code :(

& is the correct solution in general.

Presumably this is about your draft v0.7.1 vector port. This port uses an unspec in every pattern, which limits the compiler's ability to optimize code. You might get better results if you eliminated as many of the unspecs as you can.

You might want to check TARGET_MODES_TIEABLE_P, though this is mostly about casts and moves, not register allocation.
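For reference, an untested sketch of what '&' looks like in a pattern (the insn, modes, constraint, and unspec names are made up):

;; '=&vr' marks operand 0 as early-clobber: it is written before the
;; inputs are fully consumed, so the allocator must not assign it the
;; same register as operand 1, even though the modes differ.
(define_insn "*vwadd_sketch"
  [(set (match_operand:V2DI 0 "register_operand" "=&vr")
        (unspec:V2DI [(match_operand:V4SI 1 "register_operand" "vr")]
                     UNSPEC_WADD))]
  ""
  "vwadd.vv\t%0,%1,%1")

Jim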
Re: sign_and_send_pubkey: signing failed: agent refused operation
On Mon, Jun 1, 2020 at 3:33 PM Martin Sebor via Gcc wrote:
> So it sounds like you wouldn't expect the "agent refused operation"
> error either, and it's not just a poor error message that I should
> learn to live with. That makes me think I should try to figure out
> what's wrong. I think the ~/.ssh/ contents are pretty standard:

My experience with Ubuntu 18.04 is that 2K-bit keys aren't accepted by something (the gnome UI?) anymore. I had to upgrade to 4K-bit keys. Though oddly, ssh-keygen still generates 2K-bit keys by default even though they won't be accepted by the gnome UI (or whatever). The workaround is to run ssh-add manually to register your 2K-bit key, because ssh-add will still accept 2K-bit keys. Then ssh will work, and can be used to install a 4K-bit public key on the other side, after which things will work normally again.

A web search suggested that there was some security problem with 2K-bit keys and apparently they are trying to force people to upgrade, but the inconsistent approach here between different packages makes this confusing as to what is actually going on.
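The workaround, roughly (key file names and the host are placeholders):

ssh-add ~/.ssh/id_rsa                           # register the old 2K-bit key by hand
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa4k    # generate a 4K-bit replacement
ssh-copy-id -i ~/.ssh/id_rsa4k.pub user@host    # install it on the other side

Jim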
Re: `insn does not satisfy its constraints` when compiling a simple program.
On Sat, Apr 18, 2020 at 8:45 AM Joe via Gcc wrote:
> test.c: In function ‘main’:
> test.c:5:1: error: insn does not satisfy its constraints:

The constrain_operands function is failing to match the insn to its constraints. Try putting a breakpoint there, and stepping through the code to see what is going wrong. The function is likely called many times, so you might need to figure out which call is the failing one first and only step through that one.
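Something like this, assuming a cc1 built with debug info (_fatal_insn is the function that reports this error, so breaking there first is one way to find the failing call):

$ gdb --args ./cc1 test.c
(gdb) break _fatal_insn
(gdb) run
(gdb) backtrace

The backtrace will show the constrain_operands call that failed; you can then set a breakpoint there and rerun.

Jim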
Re: Modifying RTL cost model to know about long-latency loads
On Thu, Apr 16, 2020 at 7:28 PM Sasha Krassovsky wrote:
> @Jim I saw you were from SiFive - I noticed that modifying the costs for
> integer multiplies in the riscv_tune_info structs didn’t affect the
> generated code. Could this be why?

rtx_costs is used for instruction selection. For instance, choosing whether to use a shift-and-add sequence as opposed to a multiply depends on rtx_cost. rtx_cost is not used for instruction scheduling; that uses the latency info from the pipeline model, e.g. generic.md. It looks like I didn't read your first message closely enough, and I should have mentioned this earlier.

Changing the multiply rtx_cost does affect code generation. Just try a testcase multiplying by numbers with small prime factors, and you will see that which ones use shift/add and which ones use multiply depends on the multiply cost in the riscv_tune_info structs. This also factors into the optimization that turns a divide by a constant into a multiply. When that happens depends on the relative values of the multiply cost and the divide cost.
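An untested example of the kind of testcase meant here:

/* Whether each of these becomes a shift/add sequence or a single mul
   depends on the multiply cost reported by the tuning structs.  */
long mul5 (long x) { return x * 5; }   /* x + (x << 2) when mul is slow */
long mul6 (long x) { return x * 6; }   /* (x << 1) + (x << 2), or one mul */

Jim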
Re: Modifying RTL cost model to know about long-latency loads
On Sat, Apr 11, 2020 at 4:28 PM Sasha Krassovsky via Gcc wrote:
> I’m currently modifying the RISC-V backend for a manycore processor where
> each core is connected over a network. Each core has a local scratchpad
> memory, but can also read and write other cores’ scratchpads. I’d like to
> add an attribute to give a hint to the optimizer about which loads will be
> remote and therefore longer latency than others.

GCC has support for the proposed named address space extension to the ISO C standard. You may be able to use this instead of defining your own attributes. I don't know if it helps with the rtx cost calculation though; it is mostly about supporting more than one address space. See "Named Address Spaces" in the gcc internals docs, and the *_ADDR_SPACE_* stuff in the sources.

The problem may be similar to the one Alan Modra mentioned. I would suggest stepping through the cost calculation code in a debugger to see what is happening.
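At the source level, code using a named address space looks something like this (the __remote qualifier is hypothetical; each port defines its own keywords, the way AVR defines __flash):

/* Data living in another core's scratchpad.  */
extern __remote int other_buf[64];

int
sum_remote (void)
{
  int s = 0;
  for (int i = 0; i < 64; i++)
    s += other_buf[i];   /* the compiler knows these loads are remote */
  return s;
}

Jim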
Re: Not usable email content encoding
I'm one of the old timers who likes our current workflow, but even I think that we are risking our future by staying with antiquated tools. One of the first things I need to teach new people is how to use email "properly". It is a barrier to entry for new contributors, since our requirements aren't how the rest of the world uses email anymore.

LLVM has phabricator. Some git-based projects are using gerrit. Github and gitlab are useful services. We need to think about setting up easier ways for people to submit patches, rather than trying to fix all of the MUAs and MTAs in the world.

Jim
Re: Update on SVE/sizeless types for C and C++
On Tue, Nov 12, 2019 at 2:12 PM Richard Sandiford wrote:
> Are both RVV intrinsic proposals like SVE in that all sizeless types
> can be/are built into the compiler? If so, do you think the target hook
> added in:
> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00942.html
> would be enough for RVV too? Or do the RVV proposals require support
> for user-defined sizeless types?

We only have built-in types. I think we have 54 of them: 32 integer, 12 float, and 10 mask. I hadn't thought about user-defined sizeless types, and hope that I don't have to support that.

> If the hook is enough, I guess there are three ways we can go:
> (1) Add hooks for both targets, with similar functionality. This means
> a certain amount of cut-&-paste but also allows for more specific
> error messages.

I think this would be OK. I took a quick look at your patch. I'm a little surprised that you can't support alignof on a vector type; I would think that depends on the base type of the vector, but maybe this is a difference between SVE and RVV, or maybe I just haven't gotten far enough to find the problem yet. Otherwise it looks like this would also work for the RVV support.

Jim
Re: Update on SVE/sizeless types for C and C++
On Tue, Nov 12, 2019 at 8:06 AM Richard Sandiford wrote:
> If the use of sizeless types does expand beyond SVE built-in types
> in future, the places that call the hook are the places that would
> need to deal directly with sizeless types.

We are using the same sizeless type infrastructure for the RISC-V vector extension work. The RVV extension is still in draft form and still evolving. The software is only in prototype form at the moment. We don't have an ABI yet. We have at least two competing proposals for the intrinsics-based programming model. We don't have auto-vectorization support yet. Etc. But SiFive has been working on gcc patches for one of the intrinsics proposals, and EPI (European Processor Initiative) has been working on llvm patches for another intrinsics proposal, and both of these are using sizeless types. RVV has a similar design to ARM SVE, where the size of types depends on the hardware you are running on, and those sizes can change at run time, even from one loop iteration to the next.

Jim
Re: gcc vs clang for non-power-2 atomic structures
I was pointed at https://bugs.llvm.org/show_bug.cgi?id=26462 for the LLVM discussion of this problem.

Another issue here is that we should have ABI testing for atomics. For instance, gcc/testsuite/gcc.dg/compat has no atomic testcases. Likewise g++.dg/compat.

Jim
gcc vs clang for non-power-2 atomic structures
We got a change request for the RISC-V psABI to define the atomic structure size and alignment. And looking at this, it turned out that gcc and clang are implementing this differently. Consider this testcase:

rohan:2274$ cat tmp.c
#include <stdio.h>

struct s { int a; int b; int c; };

int
main (void)
{
  printf ("size=%ld align=%ld\n", sizeof (struct s),
          _Alignof (struct s));
  printf ("size=%ld align=%ld\n", sizeof (_Atomic (struct s)),
          _Alignof (_Atomic (struct s)));
  return 0;
}
rohan:2275$ gcc tmp.c
rohan:2276$ ./a.out
size=12 align=4
size=12 align=4
rohan:2277$ clang tmp.c
rohan:2278$ ./a.out
size=12 align=4
size=16 align=16
rohan:2279$

This is with an x86 compiler. I get the same result with a RISC-V compiler. This is an ABI incompatibility between gcc and clang. gcc has code in build_qualified_type in tree.c that sets the alignment of power-of-2-sized structs to the same-size integer alignment, but we don't change the alignment of non-power-of-2 structs. Clang is padding the size of non-power-of-2 structs to the next power of 2 and giving them that alignment.

Unfortunately, I don't know who to contact on the clang side, but we need to have a discussion here, and we probably need to fix one of the compilers to match the other one, as we should not have ABI incompatibilities like this between gcc and clang.

The original RISC-V bug report is at
https://github.com/riscv/riscv-elf-psabi-doc/pull/112
There is a pointer to a gist with a larger testcase with RISC-V results.

Jim
Re: gcc/config/arch/arch.opt: Option mask gen problem
On Mon, Jul 22, 2019 at 4:05 AM Maxim Blinov wrote:
> Is it possible, in the arch.opt file, to have GCC generate a bitmask
> relative to a user-defined variable without an associated name? To
> illustrate my problem, consider the following option file snippet:
> ...
> But, I don't want the user to be able to pass "-mbmi-zbb" or
> "-mno-bmi-zbb" on the command line:

If you don't want an option, why are you making changes to the riscv.opt file? This file is specifically for supporting command line options. Adding a variable here does mean that it will automatically be saved and restored, and I can see the advantage of doing that, even if it is only indirectly tied to options. You could add a variable here, and then manually define the bitmasks yourself in riscv-opts.h or riscv.h. Or you could just add the variable to the machine_function struct in riscv.c, which will also automatically save and restore the variable.

Jim
Re: [PATCH] Deprecate ia64*-*-*
On Thu, Jun 13, 2019 at 10:39 AM Joel Sherrill wrote:
> Ok with me if no one steps up and the downstream projects like Debian get
> notice. This is just a reflection of this architecture's status in the
> world.

I sent email to the debian-ia64 list half an hour ago and just got a response. They mentioned that there is also a gentoo group that I didn't know about, and they want to know why exactly we want to deprecate it. I can discuss it with them.

Jim
Re: [PATCH] Deprecate ia64*-*-*
On Thu, 2019-06-13 at 09:09 -0600, Jeff Law wrote:
> On 6/13/19 5:13 AM, Richard Biener wrote:
> > ia64 has no maintainer anymore so the following deprecates it
> > with the goal of eliminating the port for GCC 11 if no maintainer
> > steps up.

OK with me since I'm not the maintainer anymore.

> Works for me. James Clarke has been fixing small stuff recently, not
> sure if he wants to step into a larger role though.

There are 3 of them. See https://wiki.debian.org/Ports/ia64
It might be useful to send an email to the debian-ia64 list to notify them.

> ia64 has been failing the qsort checking in the scheduler since the day
> that checking was introduced -- it shows up building the kernel IIRC.
> Nobody's shown any interest in addressing those issues :-)

I tried looking at this once. It looked like more work than I was willing to do for IA-64. There are so many checks in the qsort compare functions that I don't think you can get a stable sort there. I think this is really a sel-sched problem, not an IA-64 problem, but the IA-64 port is tied to sel-sched and may not work well without it. Most other ports aren't using it by default.

Jim
Re: Dejagnu output size limit and flaky test ( c-c++-common/builtins.c -Wc++-compat )
On 5/8/19 3:34 AM, Matthew Malcomson wrote:
> The cause seems to be a restriction in dejagnu where it stops reading
> after a given read if its output buffer is greater than 512000 bytes.

This dejagnu restriction was removed in 2016. Try using a newer dejagnu release.

2016-03-27  Ben Elliston

	* lib/remote.exp (standard_wait): Append any trailing characters
	to $output that may be still in $expect_out(buffer) when eof is
	matched. Remove arbitrary limitation in the ".+" matching case,
	similar to the change to local_exec on 2016-02-17.

Jim
Re: Please help!!!
On Mon, May 6, 2019 at 6:02 AM Алексей Хилаев via gcc wrote:
> Gcc riscv won't emit my insns; binutils and spike (riscv sim) work
> correctly, but gcc doesn't. I want to add min/max for integer; gcc
> compiles correctly, and the sim executes correctly.
>
> (define_insn "*min_<mode>"
>   [(set (match_operand:GPR 0 "register_operand" "=r")
> 	(smin:GPR (match_operand:X 1 "register_operand" " r")
> 		  (match_operand:X 2 "register_operand" " r")))]
>   ""
>   "min\t%0,%1,%2"
>   [(set_attr "type" "move")
>    (set_attr "mode" "<MODE>")])

You must have patterns named sminXi3, where X can be s and/or d. Likewise for smaxXi3. Once the named patterns exist, gcc will automatically call them to generate RTL when appropriate. Then later passes like combine can create new RTL from the min/max pattern RTL. See for instance how the existing FP min/max patterns work. The pattern name is important. You might also consider adding uminXi3 and umaxXi3 patterns. You can find a list of supported named patterns in the gcc docs.

Also note that the RTL that you generate must look sensible. You have an smin:GPR operation that is accepting X-mode operands, which is not OK. The modes must match. You can use sign_extend/zero_extend to sign/zero extend a smaller mode to a larger mode, and subreg to reduce a larger mode to a smaller one. These will have to be separate patterns. But once you have the basic smin/smax patterns, combine can create the sign_extend/whatever versions for you. See for instance how the addsi3 and addsi3_extend* patterns work.
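Putting that together, an untested sketch of the named pattern, using the GPR iterator so it expands to sminsi3 and smindi3 with matching modes throughout:

(define_insn "smin<mode>3"
  [(set (match_operand:GPR 0 "register_operand" "=r")
	(smin:GPR (match_operand:GPR 1 "register_operand" "r")
		  (match_operand:GPR 2 "register_operand" "r")))]
  ""
  "min\t%0,%1,%2"
  [(set_attr "mode" "<MODE>")])

Jim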
Re: RISC-V sibcall optimization with save-restore
On 3/20/19 5:25 AM, Paulo Matos wrote:
> I am working on trying to get RISC-V 32 emitting sibcalls even in the
> presence of `-msave-restore`, for a client concerned with generated code
> size.

This won't work unless you define a new set of restore functions. The current ones restore the return address from the stack and return, which is wrong if you want to do a sibcall. This is why we tail call (jump to) the restore functions: the actual function return is in the restore functions. You will need a new set of restore functions that restore regs without restoring the ra. You then probably also need other cascading changes to make this work. The new set of restore functions will increase code size a bit, offsetting the gain you get from using them. You would have to have enough sibling calls that can use -msave-restore to make this worthwhile. It isn't clear if this would be a win or not.

> I thought I was on the right path until I noticed that the CFG is messed
> up because of assumptions related to emission of sibcall instead of a
> libcall until the epilogue is expanded. During the pro_and_epilogue pass
> I get an emergency dump and a segfault:
>
> gcc/gcc/testsuite/gcc.target/riscv/save-restore-1.c:11:1: error: in basic block 2:
> gcc/gcc/testsuite/gcc.target/riscv/save-restore-1.c:11:1: error: flow control insn inside a basic block
> (jump_insn 24 23 6 2 (parallel [
>             (return)
>             (use (reg:SI 1 ra))
>             (const_int 0 [0])
>         ]) "gcc/gcc/testsuite/gcc.target/riscv/save-restore-1.c":11:1 -1
>      (nil))

If you look at the epilogue code, you will see that it emits a regular instruction which hides the call to the restore routine, and then it emits a special fake return insn that doesn't do anything. You can just stop emitting the special fake return insn in this case. This of course assumes that you have a new set of restore functions that actually return to the caller, instead of the caller's parent.

One of the issues with -msave-restore is that the limited offset ranges of calls and branches means that if you don't have a tiny program, each save/restore call/jump is probably an auipc/lui plus the call/tail, which limits the code size reduction you get from using it. If you can control where the -msave-restore routines are placed in memory, then putting them near address 0, or near the global pointer address, will allow linker relaxation to optimize these calls/jumps to a single instruction. This will probably help more than trying to get it to work with sibling calls.

If you can modify the hardware, you might try adding load/store multiple instructions and using those instead of the -msave-restore option. I don't know if anyone has tried this yet, but it would be an interesting experiment that might result in smaller code size.

Jim
Re: riscv64 dep. computation
On Thu, Feb 14, 2019 at 11:33 PM Paulo Matos wrote:
> Are global variables not supposed to alias each other?
> If I indeed do that, gcc still won't group loads and stores:
> https://cx.rv8.io/g/rFjGLa

I meant something like

struct foo_t x, y;

and now they clearly don't alias. As global pointers, they may still alias.

Jim
Re: riscv64 dep. computation
On 2/14/19 3:13 AM, Paulo Matos wrote:
> If I compile this with -O2, sched1 groups all loads and all stores
> together. That's perfect. However, if I change TYPE to unsigned char and
> recompile, the stores and loads are interleaved. Further investigation
> shows that for unsigned char there are extra dependencies that block the
> scheduler from grouping stores and loads.

The ISO C standard says that anything can be cast to char *, and char * can be cast to anything. Hence, a char * pointer aliases everything. If you look at the alias set info in the MEMs, you can see that the char * references are in alias set 0, which means that they alias everything. The short * references are in alias set 2, which means they only alias other stuff in alias set 2. The difference here is that short * does not alias the structure pointers, but char * does. I haven't tried debugging your example, but this is presumably where the difference comes from.

Because x and y are pointer parameters, the compiler must assume that they might alias. And because char * aliases everything, the char references alias them too. If you change x and y to global variables, then they no longer alias each other, and the compiler will schedule all of the loads first, even for char.

Jim
Re: Replacing DejaGNU
On 1/14/19 5:44 AM, MCC CS wrote:
> I've been running the testsuite on my macOS, on which it is especially
> unbearable. I want to (at least try to) rewrite a DejaGNU replacement
> accepting the same syntax and having no dependency, should therefore be
> faster. I was wondering if there have been any attempts on this?

CodeSourcery wrote one called qmtest, but there apparently hasn't been any work done on it in a while. Joseph Myers indirectly referred to it. You can find a copy here:
https://github.com/MentorEmbedded/qmtest
It used to be possible to run the gcc testsuite using qmtest, but I don't know the current status. I do see that there is still a qmtest-g++ makefile rule for running the G++ testsuite via qmtest though. You could try that and see if it still works.

There is so much stuff that depends on dejagnu that replacing it will be difficult.

Jim
Re: how to build and test uClinux toolchains
On 10/16/18 7:19 AM, Christophe Lyon wrote:
> While reviewing one of my patches about FDPIC support for ARM, Richard
> raised the concern of testing the patch on other uClinux targets [1].
> I looked at uclinux.org and at the GCC maintainers file, but it's still
> not obvious to me which uClinux targets are currently supported?

You should try asking the uclinux developers. I tried looking at uclinux, and as far as I can tell, the best supported targets are arm and m68k/coldfire. crosstools-ng only supports one uclinux target for instance, which is m68k. qemu has m68k support, so you could try that. The other uclinux ports seem to be one-time efforts with no long-term maintenance. I see a lot of dead links on the uclinux.org site, and a lot of stuff that hasn't been updated since 2004.

I see that buildroot has obvious blackfin (bfin), m68k, and xtensa uclinux support. But blackfin.uclinux.org says the uclinux port was deprecated in 2012. m68k as mentioned above should be usable. It appears that xtensa uclinux is still alive and usable.
http://wiki.linux-xtensa.org/index.php/UClinux
There may be other uclinux targets that are usable but don't have obvious patches to enable them.

Jim
Re: Cannot compile using cc1.
On 10/06/2018 06:07 AM, Tejas Joshi wrote:
> I have gcc source code, stage1-build and test directories as siblings
> and I've been trying to compile test.c in test/ using:
> ../stage1-build/gcc/cc1 test.c

That isn't expected to work. You need to use the compiler driver, which is called xgcc in the build dir, and pass an option to let it know where the cc1 binary is. So this should instead be

../stage1-build/gcc/xgcc -B../stage1-build/gcc/ test.c

The trailing slash on the -B option path is important.

If that doesn't work, then you may have configured your gcc tree wrong. Some operating systems require specific configure options to be used to get a working compiler. You can see the configure options used by the default compiler by using "/usr/bin/gcc -v". Debian/Ubuntu require --enable-multiarch for instance, and the compiler build may not succeed if that configure option is missing.

If you want to run cc1 directly, you may need to pass in extra default options that the compiler driver normally passes to it. You can see these options by passing the -v option to the gcc driver while compiling a file. E.g. running "../stage1-build/gcc/xgcc -B../stage1-build/gcc/ -v test.c" and looking at the cc1 line will show you the options you need to pass to cc1 to make it work.

Jim
Re: section attribute of compound literals
On Fri, Sep 14, 2018 at 7:44 AM Jason A. Donenfeld wrote:
> Assuming this is an array of a huge amount of
> chacha20poly1305_testvec, I'm not sure if there's a syntax for me to
> define the symbol inline with the declarations. Any ideas?

Don't do it inline.

u8 key_value[] __stuffdata = ...
...
  .key = key_value

Jim
Re: section attribute of compound literals
On 09/10/2018 10:46 PM, Jason A. Donenfeld wrote:
> Hello,
> I'd like to have a compound literal exist inside a certain linker
> section. However, it doesn't appear to work as I'd like:
>
> #define __stuffdata __attribute__((__section__("stuff")))
>
> const u8 works[] __stuffdata = { 0x1, 0x2, 0x3, 0x4 };
> const u8 *breaks = (const u8[] __stuffdata){ 0x1, 0x2, 0x3, 0x4 };

Attribute section applies to symbols, not to types, so you can't use it in a cast. In order to access data in another section, we need an address for it, and symbols have an address, but types do not. The compound literal could have an address, but we don't have a way to attach attributes to compound literals.

Since your testcase already has a symbol in the right section, you could just use that to initialize breaks:

const u8 *breaks = works;

This means defining a bunch of extra symbols, but it is a potential solution to your problem.

Jim
Re: Error from dwarf2cfi.c in gcc vers 7.2.0
On 08/12/2018 02:38 PM, Dave Pitts wrote:
> I've been hacking with version 7.2.0 of gcc trying to adapt some old md
> files that I've got to this newer gcc. I've been getting errors from the
> dwarf2out_frame_debug_expr() function in dwarf2cfi.c line 1790 calling
> gcc_unreachable(). The expression being processed is a SET. The src
> operand is a MEM reference and the dest operand is a REG reference. If I
> run the compiler with the option -fno-asynchronous-unwind-tables I do
> NOT get the error and the generated code looks reasonable.

(set (REG) (MEM)) is not one of the patterns handled by dwarf2out_frame_debug_expr. If this is an epilogue instruction to restore a register, then it should have a REG_CFA_RESTORE regnote, which takes you to dwarf2out_frame_debug_cfa_restore instead. If you are missing REG_CFA_RESTORE regnotes, then you are probably missing other CFA-related regnotes too. If this is not an epilogue register restore instruction, then you need to figure out why it was marked as frame related, and figure out what should have been done instead.
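For reference, a generic sketch of emitting such a restore with the right regnote in a port's epilogue expander (reg and offset stand for the saved register and its stack slot; this is not from any particular port):

rtx mem = gen_frame_mem (word_mode,
                         plus_constant (Pmode, stack_pointer_rtx, offset));
rtx_insn *insn = emit_move_insn (reg, mem);
/* Mark the insn frame related and tell the CFI machinery that it
   restores REG, so dwarf2cfi takes the REG_CFA_RESTORE path instead
   of trying to interpret the SET itself.  */
RTX_FRAME_RELATED_P (insn) = 1;
add_reg_note (insn, REG_CFA_RESTORE, reg);

Jim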
Re: gcov questions
On 08/09/2018 02:38 AM, daro...@o2.pl wrote:
> Hello, I wanted to ask what model for branch coverage does gcov use?

There is a comment at the start of gcc/profile.c that gives some details on how it works. It is computing execution counts for edges in the control flow graph. As for which edges get instrumented: basically, you construct a control flow graph, create a minimal spanning tree to cover the graph, and then you only need to instrument the edges not on the spanning tree, plus the function entry point. You can compute the rest of the edge counts from that. Then there are some tricks to improve efficiency by putting frequently executed edges on the minimal spanning tree, so that infrequently executed edges get instrumented.

Gcov was originally written in 1990, based on an idea that came from Knuth's Art of Computer Programming. Ball & Larus wrote a nice paper in 1994 that does a good job of covering the methods used, though they may not have been aware of gcov at the time, as it hadn't been accepted into GCC yet. This is "Optimally Profiling and Tracing Programs", TOPLAS, July 1994. I don't know if there are free copies of that available. There may be better references available now, as these techniques are pretty widely known nowadays.
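A tiny worked example of the reconstruction step:

/* CFG of: if (c) a (); else b ();
   Edges: entry->then, entry->else, then->exit, else->exit.
   Put entry->then, then->exit, and else->exit on the spanning tree,
   and instrument only the function entry (count N) and the edge
   entry->else (count E).  Flow conservation gives the rest:
     entry->then = N - E,  then->exit = N - E,  else->exit = E.  */

Jim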
Re: decrement_and_branch_until_zero pattern
On Fri, Jun 8, 2018 at 1:12 PM, Paul Koning wrote:
> Thanks. I saw those sections and interpreted them as support for signal
> processor style fast hardware loops. If they can be adapted for dbra type
> looping, great. I'll give that a try.

The rs6000 port uses it for bdnz (branch decrement not zero) for instance, which is similar to the m68k dbra.

> Meanwhile, yes, it looks like there is a documentation bug. I can clean
> that up. It's more than a few lines, but does that qualify for an
> "obvious" change?

I think the obvious rule should only apply to trivial patches, and this will require some non-trivial changes to fix the looping pattern section. Just deleting the decrement_and_branch_until_zero named pattern section looks trivial. It looks like the REG_NONNEG section should mention the doloop_end pattern instead of decrement_and_branch_until_zero, since I think the same rule applies that they only get generated if the doloop_end pattern exists.

Jim
Re: decrement_and_branch_until_zero pattern
On 06/08/2018 06:21 AM, Paul Koning wrote:
> Interesting. The ChangeLog doesn't give any background. I suppose I
> should plan to approximate the effect of this pattern with a
> define-peephole2?

The old RTL loop optimizer was replaced with a new RTL loop optimizer. When the old one was written, m68k was a major target, and the dbra optimization was written for it. When the new one was written, m68k was not a major target, and this support was written differently. We now have doloop_begin and doloop_end patterns that do almost the same thing, and can be created by the loop-doloop.c code. There is a section in the internals docs that talks about this:
https://gcc.gnu.org/onlinedocs/gccint/Looping-Patterns.html

The fact that we still have decrement_and_branch_until_zero references in docs and target md files looks like a bug. The target md files should use doloop patterns instead, and the doc references should be dropped.

Jim
Re: RISC-V ELF multilibs
On Thu, May 31, 2018 at 7:23 AM, Matthew Fortune wrote:
> I do actually have a solution for this but it is not submitted upstream.
> MIPS has basically the same set of problems that RISC-V does in this area,
> and in an ideal world there would be no 'fallback' multilib, such that if
> you use compiler options that map to a library variant that does not
> exist, then the linker just fails to find any libraries at all rather than
> using the default multilib.
>
> I can share the raw patch for this and try to give you some idea about how
> it works. I am struggling to find time to do much open source support at
> the moment so may not be able to do all the due diligence to get it
> committed. Would you be willing to take a look and do some of the work to
> get it in tree?

I have a long list of things on my to-do list. RISC-V is a new target, and there is lots of stuff that needs to be bug fixed, finished, or added. I can't make any guarantees. But if you file a bug report and then attach a patch to it, someone might volunteer to help finish it. Or if it is too big to be reasonably attached to a bug report (like the nanoMIPS work), you could put it on a branch, and mention the branch name as unfinished work in a bug report.

Jim
Re: RISC-V problem with weak function references and -mcmodel=medany
On Tue, May 29, 2018 at 11:43 AM, Sebastian Huber wrote:
> would you mind trying this with -Ttext=0x9000?

This gives me for the weak call

    9014:	7097      	auipc	ra,0x7
    9018:	fec080e7	jalr	-20(ra) # 0 <__global_pointer$+0x6fffe7d4>

> Please have a look at:
> https://sourceware.org/bugzilla/show_bug.cgi?id=23244
> https://sourceware.org/ml/binutils/2018-05/msg00296.html

OK. I'm still catching up on mailing lists after the US holiday weekend.

Jim
Re: RISC-V problem with weak function references and -mcmodel=medany
On 05/29/2018 04:19 AM, Sebastian Huber wrote:
> Changing the code to something like this
>
> void f(void) __attribute__((__weak__));
>
> void _start(void)
> {
>   void (*g)(void) = f;
>   if (g != 0) {
>     (*g)();
>   }
> }

This testcase works for me also, using -mcmodel=medany -O tmp.c -Ttext=0x8000 -nostdlib -nostartfiles. I need enough info to reproduce your problem in order to look at it.

One thing you can try is adding -Wl,--noinhibit-exec, which will produce an executable even though there was a linker error, and then you can disassemble the binary to see what you have for the weak call. That might give a clue as to what is wrong.

> Why doesn't the RISC-V generate a trampoline code to call far functions?

RISC-V is a new target. The answer to questions like this is that we haven't needed it yet, and hence haven't implemented it yet. But I don't see any need for trampolines to support a call to 0. We can reach anywhere in the low 32-bit address space with auipc/jalr. We can also use zero-relative addressing via the x0 register if necessary. We already have some linker relaxation support for that, but it doesn't seem to be triggering for this testcase.

Jim
Re: RISC-V problem with weak function references and -mcmodel=medany
On 05/28/2018 06:32 AM, Sebastian Huber wrote:
> I guess that the resolution of the weak reference to the undefined
> symbol __deregister_frame_info somehow sets __deregister_frame_info to
> the absolute address 0, which is illegal in the following "call
> __deregister_frame_info"? Is this construct with weak references and
> -mcmodel=medany supported on RISC-V at all?

Yes. It works for me. Given a simple testcase

extern void *__deregister_frame_info (const void *) __attribute__ ((weak));

void *foo;

int
main (void)
{
  if (__deregister_frame_info)
    __deregister_frame_info (foo);
  return 0;
}

and compiling with -mcmodel=medany -O -Ttext=0x8000, I get

    8158:	8097      	auipc	ra,0x8
    815c:	ea8080e7	jalr	-344(ra) # 0 <_start-0x8000>

for the weak call. It isn't clear what you are doing differently.

Jim
Re: RISC-V ELF multilibs
On 05/26/2018 06:04 AM, Sebastian Huber wrote:
> Why is the default multilib and a variant identical?

This is supposed to be a single multilib with two names. We use MULTILIB_REUSE to map the two names to a single multilib.

rohan:1030$ ./xgcc -B./ -march=rv64imafdc -mabi=lp64d --print-libgcc
./rv64imafdc/lp64d/libgcc.a
rohan:1031$ ./xgcc -B./ -march=rv64gc -mabi=lp64d --print-libgcc
./rv64imafdc/lp64d/libgcc.a
rohan:1032$ ./xgcc -B./ --print-libgcc
./libgcc.a
rohan:1033$

So this is working right when the -march option is given, but not when no -march is given. I'd suggest a bug report so I can track this, if you haven't already filed one.

> Most variants include the C extension. Would it be possible to add
> -march=rv32g and -march=rv64g variants?

The expectation is that most implementations will include the C extension. It reduces code size, improves performance, and I think I read somewhere that it takes only 400 gates to implement. It isn't practical to try to support every possible combination of architecture and ABI here, as there are too many possible combinations. But if there is a major RISC-V target that is rv32g or rv64g, then we should consider it. You can of course define your own set of multilibs.

Jim
Re: GCC 8.1 Released
On 05/02/2018 10:21 AM, Damian Rouson wrote:
> Could someone please point me to instructions for how to submit a change
> to the gfortran changes list? I’d like to add the following bullet:

See also https://gcc.gnu.org/contribute.html#webchanges

Jim
Re: GCC changes for Fedora + riscv64
On 04/08/2018 08:22 AM, Jeff Law wrote:
> On 03/31/2018 12:27 PM, Richard W.M. Jones wrote:
> > I'd like to talk about what changes we (may) need to GCC in Fedora to
> > get it working on 64-bit RISC-V, and also (more importantly) to ask
> > your advice on things we don't fully understand yet. However, I don't
> > know even what venue you'd prefer to discuss this in.
>
> A discussion here is fine with me.

I know of a few issues.

I have a work-in-progress --with-multilib-list patch in PR 84797, but it isn't quite right yet, and needs to work more like the patch in PR 85142, which isn't OK to check in.

There is a problem with atomics. We only have builtins for the ones that can be implemented with a single instruction. Adding -latomic unconditionally might fix it, but won't work for gcc builds and the gcc testsuite unless we also add paths pointing into the libatomic build dir. I'm also concerned that this might cause build problems, if we end up trying to link with libatomic before we have built it. The simplest solution might be to just add expanders for all of the missing atomics, even if they require multiple instructions, just like how all of the mainstream linux targets currently work.

There is a problem with the linker not searching the right set of dirs by default. That is more a binutils problem than a gcc problem, but the linker might need some help from gcc to fix it, as the linker doesn't normally take -march and -mabi options.

There is a problem with libffi, which has RISC-V support upstream, but not in the FSF GCC copy. This is needed for go language support. There was also a dispute about the go arch naming, as to whether it should be riscv64 or riscv, with one person doing a port choosing the former and another person doing another port choosing the latter.

Those are all of the Linux specific ones I can remember at the moment. I might have missed some.

Jim
Re: Copyright assignment form
On Tue, Jan 16, 2018 at 12:01 PM, Siddhesh Poyarekar wrote:
> You need a separate assignment for every GNU project you intend to
> contribute to, so separate assignments for GCC, glibc, binutils, etc.

The form is the same for all GNU projects. You can file an assignment that covers a single patch, or a single project, or multiple patches, or multiple projects, or even all patches for all projects. Though of course the lawyers will have a say in this, as they may not be comfortable with a broad assignment, and may want to restrict it to specific projects, or even specific patches for specific projects. The more restricted the assignment, the more paperwork you have to do, but the easier it is to get it past lawyers uncomfortable with FSF requirements. For instance, you can get an assignment for a single patch, but then you have to go through the assignment process every time you contribute a patch.

Jim
Re: Copyright assignment form
On 01/15/2018 03:11 PM, Shahid Khan wrote:
> Our team at Qualcomm Datacenter Technologies, Inc. is interested in
> contributing patches to the upstream GCC compiler project. To get the
> process started, we'd like to request a copyright assignment form as per
> the contribution guidelines outlined at https://gcc.gnu.org/contribute.html.
> Please let me know if there are additional steps we need to take to
> become an effective contributor to the GCC community.

You should contact ass...@gnu.org directly. The standard forms contain language about patents that Qualcomm lawyers are unlikely to be comfortable with, and this may require negotiating a non-standard agreement. As best as I can tell, the FSF has never received a copyright assignment or disclaimer from Qualcomm. If this is the first time Qualcomm lawyers are talking to the FSF, this will take a while. I would not be surprised if it takes a year or two. You will also need a VP-level signature for the forms once you get approval from Qualcomm lawyers.

You may want to consider getting a disclaimer from your employer and then filing personal assignments. It is probably easier to get a disclaimer from Qualcomm than an assignment, but this requires more paperwork, since each individual contributing then needs their own personal assignment. The disclaimers also have language about patents that the Qualcomm lawyers may not like, so while this should be easier, it is still likely a difficult process.

Siddhesh can help you with this, as the rules for gcc are the same as for glibc.

Jim
Re: Fwd: gcc 7.2.0 error: no include path in which to search for stdc-predef.h
On 12/04/2017 01:11 PM, Marek wrote:
> looking at config.log i see these errors:

Configure does a number of feature tests to see what features are available for use. It is expected that some of these feature tests will fail. Some features are optional, and if that feature test fails there is no problem; we just use an alternative feature. Some features are required, and if that feature test fails then configure exits with an error. If you get one of these, it will be the very last feature test in the config.log file, and will have some kind of error message that indicates that configure cannot continue after this failure.

But this doesn't seem relevant to your problem, as Kai Ruottu already pointed out what is wrong...

On Fri, Dec 1, 2017 at 11:23 AM, Kai Ruottu wrote:
> Kai Ruottu kirjoitti 1.12.2017 klo 12:02:
> > Answering to my own question... Yes, it should include this:
> > https://git.musl-libc.org/cgit/musl/tree/include
> > Maybe there is another target name one should use like
> > 'x86_64-lfs-linux-musl' in your case?
>
> The docs for musl are telling just this, one should use the
> '-linux-musl' triplet!

Gcc assumes glibc for a linux target. If you want to use musl, you must include musl in the target triplet that you configure for. See the gcc/config.gcc file, and look at the places where it checks the target triplet for musl to enable musl support.

Jim
Re: gcc 7.2.0 error: no include path in which to search for stdc-predef.h
On 11/26/2017 11:09 PM, Marek wrote:
> Hi, while compiling 7.2.0 im getting the following:
>
> cc1: error: no include path in which to search for stdc-predef.h
> cc1: note: self-tests are not enabled in this build

This doesn't appear to be a build error. Configure runs the compiler to check for features, and if a check fails, then the feature is disabled. This is normal, and nothing to worry about. Though the message is unusual. If the compiler is the one you just built, there might be something wrong with it. Or there might be a minor configure script bug.

> configure: error: in `/run/media/void/minnow/build/gcc-7.2.0/x86_64-lfs-linux-gnu/libgcc':
> configure: error: cannot compute suffix of object files: cannot compile
> See `config.log' for more details.
> make[1]: *** [Makefile:12068: configure-target-libgcc] Error 1
> make: *** [Makefile:880: all] Error 2

This is the real build error. You need to look at the config.log file in the directory where configure failed to see what the problem is. This is usually a build environment problem of some sort.

> If gcc is able to recognize between sources in one dir and objects in
> another dir

Yes. The usual way to configure gcc is something like

mkdir build
cd build
../gcc/configure

Jim
Re: [net-next:master 488/665] verifier.c:undefined reference to `__multi3'
On 11/11/2017 05:33 PM, Fengguang Wu wrote:
> CC gcc list. According to Alexei:
> This is a known issue with gcc 7 on mips that is "optimizing" normal
> 64-bit multiply into 128-bit variant. Nothing to fix on the kernel side.

I filed a bug report. This is now
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82981

I found a helpful thread at
https://www.linux-mips.org/archives/linux-mips/2017-08/msg00041.html
that had enough info for me to reproduce and file the bug report.

Jim
Re: [PATCH] RISC-V: Add Jim Wilson as a maintainer
On Mon, Nov 6, 2017 at 6:39 PM, Palmer Dabbelt wrote:
> +riscv port	Jim Wilson

It is jimw, not jim, for the email address. Please fix.

Jim
Re: -ffunction-sections and -fdata-sections documentation
On 10/13/2017 12:06 AM, Sebastian Huber wrote:
> The end-of-life of Solaris 2.6 was 2006. Is it worth to mention this
> here?

The reference to Solaris 2.6 is no longer useful. Just mention ELF here.

> This "AIX may have these optimizations in the future." is there since at
> least 1996. What is the current AIX status?

David answered this.

> Is the "Only use these options when there are significant benefits from
> doing so. When you specify these options, the assembler and linker
> create larger object and executable files and are also slower. You
> cannot use gprof on all systems if you specify this option, and you may
> have problems with debugging if you specify both this option and -g."
> still correct on the systems of today?

You can get larger objects, because as Jeff mentioned, some compile-time/assembly-time optimizations get disabled. That should probably be clarified. The assembler/linker will be slower because they will have more work to do: more relocations, more sections, larger object files.

Some old systems could not support both -ffunction-sections and -pg together. This used to give a warning, which was removed in 2012. I believe this is obsolete. The likely explanation for this doc is
https://gcc.gnu.org/ml/gcc-help/2008-11/msg00139.html
which mentions that it was already long ago fixed at that time.

Using -g should not be a problem on an ELF/DWARF system, which is what most systems use nowadays. There could be issues with other object file/debug info formats, but this is unclear. I suspect this comment is obsolete and can be removed.

The doc should probably refer to the linker --gc-sections option, as this is what makes -ffunction-sections useful for most people, by reducing code size by eliminating unused functions.

> Do these options affect the code generation?

Jeff answered this.
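The usual combination is:

gcc -ffunction-sections -fdata-sections -c foo.c
gcc -Wl,--gc-sections -o foo foo.o    # linker discards unreferenced sections

Jim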
Re: Byte swapping support
On 09/12/2017 02:32 AM, Jürg Billeter wrote:

  To support applications that assume big-endian memory layout on little-endian systems, I'm considering adding support for reversing the storage order to GCC. In contrast to the existing scalar storage order support for structs, the goal is to reverse the storage order for all memory operations to achieve maximum compatibility with the behavior on big-endian systems, as far as observable by the application.

Intel has support for this in icc. It took about 5 years for a small team to make it work on a very large application. That includes both the compiler development and application development time. There are a lot of complicated issues that need to be solved to make this work on real code, both in the compiler and in the application code. There is a Dr Dobbs article about some of it; search for "Writing a Bi-Endian Compiler" if you are interested.

Even though they got it working, it was painful to use. Icc goes to a lot of trouble to optimize away unnecessary byte-swapping to improve performance, but that meant any variable could be big or little endian regardless of how it was declared, could be a different endianness at different places in the code, and could even be both endiannesses (stored in two locations) at the same time if the code needed both. Sometimes we'd find a bug, and it would take a week to figure out if it was a compiler bug or an application bug.

  To facilitate byte swapping at endian boundaries (kernel or libraries), I'm also considering developing a new GCC builtin that can byte-swap whole structs in memory. There are limitations to this, e.g., unions could not be supported in general. However, I still expect this to be very useful.

There is a lot more stuff that will cause problems. Byte-swapping FP doesn't make sense. You can only byte-swap a variable if you know its type, but you don't know the type of a va_list ap argument, so you can't call a big-endian vprintf from little-endian code and vice versa. If you have a template expanded in both big- and little-endian code, you will run into problems unless name mangling changes to include endian info, which means you lose ABI compatibility with the current name mangling scheme.

There will also be trouble with variables in shared libraries that get initialized by the dynamic linker. You will either have to add a new set of other-endian relocations, or else you will have to add code to byte-swap data after relocations are performed, probably via an init routine, which will have to run before the other init routines. There is also the same issue with static linking, but that one is a little easier to handle, as you can use a post-linking pass to edit the binary and byte-swap anything that needs it after relocations are performed.

To handle endian boundaries, you will need to force all declarations to have an endianness, you will need to convert when calling a big-endian function from a little-endian function and vice versa, and you will need to give an error if you see something you can't convert, like a va_list argument. Besides the issue of the C library not changing endianness, you will likely also have third-party libraries that you can't change the endianness of, and that need to be linked into your application.

Before you start, you should give some thought to how debugging will work. DWARF does have an endianity attribute; you will need to set it correctly, or debugging will be hopeless. Even if you set it correctly, if you have optimizations to remove unnecessary byte swapping, debugging optimized code will still be hard, and people using the compiler will have to be trained on how to deal with endianness issues.

And there are lots of other problems; I don't have time to document them all, or even remember them all. Personally, I think you are better off trying to fix the application to make it more portable. Fixing the compiler is not a magic solution; it is no easier than fixing the application.

Jim
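A minimal sketch of the per-member swapping that a whole-struct byte-swap builtin would have to expand to. The struct and its layout are assumptions for illustration; the helpers are the existing GCC __builtin_bswap* builtins:

  #include <stdint.h>

  struct pkt { uint32_t len; uint16_t flags; };  /* hypothetical wire format */

  /* Swap each member individually.  This only works because the member
     types are statically known, which is exactly why unions and va_list
     arguments cannot be handled in general.  */
  static void pkt_swap (struct pkt *p)
  {
    p->len   = __builtin_bswap32 (p->len);
    p->flags = __builtin_bswap16 (p->flags);
  }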
Re: layout of __attribute__((packed)) vs. #pragma pack
On 07/28/2017 04:51 AM, Geza Herman wrote:

  There's an option in GCC, "-mms-bitfields". The doc about it begins with: "If packed is used on a structure, or if bit-fields are used, it may be that the Microsoft ABI lays out the structure differently than the way GCC normally does. Particularly when moving packed data between functions compiled with GCC and the native Microsoft compiler (either via function call or as data in a file), it may be necessary to access either format." I'm particularly interested in packed structs; bit-fields are not a concern now. Does this doc mean that a packed struct layout may differ between GCC and MSVC? The doc doesn't give an example of this; it just talks about bit-fields. If the packed layout can differ, in which way does it? Previously I thought that both compilers put members into the struct without any padding, so the layout must match.

Different ABIs handle bit-fields differently, and as a result, different ABIs handle packed structures with bit-fields differently. There are testcases for this in the gcc testsuite. In the gcc sources, look at gcc/testsuite/gcc.dg/bf-ms-layout.c. Otherwise, for a packed structure without bit-fields, and without sub-structures, it is probably handled the same across most ABIs.

  Plus, the __attribute__((packed)) documentation has changed. https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gcc/Common-Variable-Attributes.html, where the text is: "The packed attribute specifies that a variable or structure field should have the smallest possible alignment—one byte for a variable, and one bit for a field, unless you specify a larger value with the aligned attribute." https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html, where the text is: "This attribute, attached to struct or union type definition, specifies that each member (other than zero-width bit-fields) of the structure or union is placed to minimize the memory required". What does "minimize" mean here? Does it give the same guarantees as the previous definition? Could it mean that there's still padding in the struct?

Structure layout is complicated, and it isn't possible to explain all details in a single sentence. There may be cases where a packed structure still contains padding. It depends on the ABI, how bit-fields are handled, etc. You can see this with the bf-ms-layout.c testcases mentioned above, where some packed structures are larger with gcc than with msvc, and some are larger with msvc than with gcc. If in doubt, the best solution is to write some testcases to check, or add some consistency-checking code, e.g. code that verifies at compile time that type sizes are what you expect.

Jim
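A sketch of the compile-time size check suggested above; the struct and the expected size of 5 bytes are assumptions, and _Static_assert requires C11 (older compilers can use the negative-size-array trick instead):

  struct s {
    char c;
    int i;
  } __attribute__ ((packed));

  /* Compilation fails if the packed layout is not the expected 1 + 4 bytes.  */
  _Static_assert (sizeof (struct s) == 5, "unexpected packed struct layout");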
Re: libatomic IFUNC question (arm & libat_have_strexbhd)
On 06/06/2017 09:01 AM, Steve Ellcey wrote: So the question remains, where is libat_have_strexbhd set? As near as I can tell it isn't set, which would make the libatomic IFUNC pointless on arm. libat_have_strexbhd isn't set anywhere. It looks like this was a prototype that was never fully fleshed out. See for instance the libatomic/config/x86/init.c file. Finishing this means someone has to figure out how to use the arm cpuid equivalent to set the two variables appropriately. Jim
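What finishing this might look like, modeled loosely on the x86 init.c mentioned above. This is only a sketch: the probe function and the constructor name are hypothetical, and a real implementation would need an actual arm feature-detection mechanism:

  #include <stdbool.h>

  bool libat_have_strexbhd;  /* in the current sources, never written on arm */

  /* Hypothetical runtime probe for byte/halfword/doubleword exclusives.  */
  static bool detect_strexbhd (void) { return false; }

  /* Runs before the IFUNC resolvers consult the flag.  */
  static void __attribute__ ((constructor))
  libat_arm_init (void)
  {
    libat_have_strexbhd = detect_strexbhd ();
  }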
Re: Getting spurious FAILS in testsuite?
On 06/01/2017 05:59 AM, Georg-Johann Lay wrote:

  Hi, when I am running the gcc testsuite in $builddir/gcc, then

    $ make check-gcc RUNTESTFLAGS='ubsan.exp'

  comes up with spurious fails.

This was discussed before, and the suspicion was that it was a linux kernel bug. There were multiple kernel fixes pointed at, but it wasn't clear which one was required to fix it. I have Ubuntu 16.04 LTS on my laptop, and I see the problem. I can't run the ubsan testsuites with a -j factor greater than one and get reproducible results. There may also be other ways to trigger the problem. See for instance the thread https://gcc.gnu.org/ml/gcc/2016-07/msg00117.html The first message in the thread, from Andrew Pinski, mentions that the log output is corrupted by an apparent buffer overflow.

Jim
Re: FW: Build failed in Jenkins: BuildThunderX_native_gcc_upstream #1267
On 03/17/2017 04:12 PM, Jim Wilson wrote: I have access to a fast box that isn't otherwise in use at the moment so I'm taking a look. r246225 builds OK. r246226 does not. So it is Bernd's combine patch. A little experimenting shows that the compare difference is triggered by the use of -gtoggle in stage2, which is not used in stage3. Otherwise stage2 and stage3 generate identical code. The bug is apparently due to a problem with handling debug insns in the combine patch. Changing a new prev_nonnote_insn call to a prev_nonnote_nondebug_insn call appears to solve the problem. I will have to do a bootstrap and make check from scratch to verify. I also noticed that there is a redundant i1 check in the patch which should be fixed also. Jim
Re: FW: Build failed in Jenkins: BuildThunderX_native_gcc_upstream #1267
On 03/17/2017 03:28 PM, Jeff Law wrote:

  On 03/17/2017 03:31 PM, Andrew Pinski wrote:

    On Fri, Mar 17, 2017 at 11:47 AM, Bernd Schmidt wrote:

      On 03/17/2017 07:38 PM, Pinski, Andrew wrote:

        One of the following revisions caused a bootstrap comparison failure on aarch64-linux-gnu: r246225 r246226 r246227

      Can you help narrow that down?

    I can, though I don't want to duplicate work since Jeff was going to provision an aarch64 system. My automated testing is approximately every hour or so; these commits were within an hour window even. I did not look into the revisions when I wrote the email, but I suspect r246227 did NOT cause it since aarch64 does not use reload anymore.

  The box I got isn't terribly fast, but regardless I'll be walking through each commit to see if I can trigger the failure. 246224 tested OK (as it should). 246225 is in progress.

I have access to a fast box that isn't otherwise in use at the moment so I'm taking a look. r246225 builds OK. r246226 does not. So it is Bernd's combine patch. A little experimenting shows that the compare difference is triggered by the use of -gtoggle in stage2, which is not used in stage3. Otherwise stage2 and stage3 generate identical code. The bug is apparently due to a problem with handling debug insns in the combine patch.

Jim
Re: GNU Toolchain Fund established at the Free Software Foundation
On 03/10/2017 03:08 AM, David Edelsohn wrote: On Thu, Mar 9, 2017 at 8:48 PM, Ian Lance Taylor wrote: On Thu, Mar 9, 2017 at 11:49 AM, David Edelsohn wrote: As discussed at the last Cauldron, the first interest of the community seems to be the shared infrastructure of Sourceware: hosting, system administration, backups, and updating the websites. There was also a suggestion of funding travel for speakers at the GNU Cauldron, for people who might not be able to afford the travel otherwise. Jim
Re: Do we really need a CPP manual?
On 12/16/2016 10:06 AM, Jeff Law wrote: That's likely the manual RMS kept asking folks (semi-privately) to review. My response was consistently that such review should happen publicly, which RMS opposed for reasons I don't recall. I reviewed it, on the grounds that a happy rms is good for the gcc project, and because I haven't been doing much else useful. It was a lot of work, about 10 hours a week for 2 months. The document I reviewed has significant differences from the one on the web site, but has a lot of structural similarities. I think there is a major rewrite still in progress. I pointed out all of the obvious stuff, features dropped long ago, references to out-of-date standards, missing ISO C 2011 features, etc. Jim
Re: LSDA unwind information is off by one (in __gcc_personality_v0)
On 10/20/2016 11:51 AM, Florian Weimer wrote:

  exception handling region. Subtracting 1 is an extremely hackish way to achieve that and likely is not portable at all.

Gdb has been doing this for over 25 years for every architecture. When you use the backtrace command, it gets a return address, subtracts one, and then does a file name/line number lookup. This is because the file name and source line number of the call instruction may not be the same as those of the instruction after the call. This does of course assume that you have a return address, and are doing some kind of range-based lookup on addresses, so you don't need an exact instruction address to get a hit. Exception regions work the same way.

I think that there is some sort of configure related problem here, as HAVE_GETIPINFO is set when I build on an Ubuntu x86_64-linux system. Looking at the configure test, which is in config/unwind_ipinfo.m4: if you don't use --with-system-libunwind, then HAVE_GETIPINFO defaults to on. If you do use --with-system-libunwind, then HAVE_GETIPINFO defaults to off, which will break handling for signal frames. I'm not sure if anyone is using --with-system-libunwind, so I'm not sure if this needs a gcc bug report.

But I also see that while HAVE_GETIPINFO appears to be set by configure, it is apparently not being used when building unwind-c.o. I see that HAVE_GETIPINFO is set in the libgcc/auto-target.h file, but this file is not included by unwind-c.c. I only see includes of this in libgcc/config/i386/cpuinfo.c and libgcc/config/sol2/gmon.c. I don't know offhand how auto-target.h is supposed to work, but it appears that it needs to be included in the unwind files built as part of libgcc. This may be a bug accidentally introduced when libgcc was moved out of the gcc dir and into its own top-level dir. I think this warrants a gcc bug report.

Jim
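The subtract-one convention described above, as a sketch; find_region and struct region are hypothetical stand-ins for any range-based lookup table (a line table, an EH region table, and so on):

  struct region;
  extern const struct region *find_region (const void *addr);

  static const struct region *
  region_for_return_address (const void *return_addr)
  {
    /* Subtract one so the lookup lands inside the call instruction,
       not in whatever happens to follow it.  */
    return find_region ((const char *) return_addr - 1);
  }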
Re: Replacement for the .stabs directive
On 08/19/2016 12:55 PM, Umesh Kalappa via llvm-dev wrote:

  We have legacy code that uses the .stabs directive quite often in the source code, like

    .stabs "symbol_name", 100, 0, 0, 0 + .label_one f
    .label_one stmt

  and the above code is wrapped with inline asm in the c source file.

Presumably the ".label_one f" is actually "1f" and the ".label_one" is "1:". That would make more sense, as this is a use of the GNU as local label feature.

Unfortunately, there is no easy way to do this in dwarf, as dwarf debug info is split across multiple sections and encoded. Maybe this could work if you handled it like a comdat symbol, but that would be inconvenient, and might not even work. This seems like an option not worth pursuing. The fact that this worked for stabs is more accident than design. The code never should have been written this way in the first place.

You can make the association between a symbol name and an address by using an equivalence. E.g. you could do

  asm ("symbol_name = 1f");

but this puts the symbol_name in the symbol table, which works only if symbol_name is unique, or maybe unique within its scope if function local. If the name was unique, you probably wouldn't have used the ugly stabs trick in the first place, so this might not work. If the symbol names aren't unique, maybe you can change the code to make them unique? Using an equivalence gives the same effective result as using

  symbol_name: stmt

Jim

PS: Cross-posting like this is discouraged. I would suggest just asking assembler questions on the binutils list.
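A self-contained sketch of the equivalence idea; sym_probe is a hypothetical name, and as noted above it must be unique in the final link:

  int
  marked_stmt (void)
  {
    /* "1:" is a GNU as local label; "1f" is the forward reference to it.
       The equivalence puts sym_probe in the symbol table at that address.  */
    __asm__ ("sym_probe = 1f\n"
             "1:");
    return 0;
  }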
Re: Supporting subreg style patterns
On 08/16/2016 03:10 AM, shmuel gutl wrote: My hardware directly supports instructions of the form subreg:SI(reg:VEC v1,3) = SI:a1 Subregs of hard registers should be avoided. They are primarily useful for pseudo regs. Subregs that aren't lowpart subregs should be avoided also. Except when you have a subreg of a pseudo that maps to multiple hard regs, and can eventually become a lowpart subreg after the pseudo gets allocated to a hard reg and gets simplified. It isn't clear where the subregs are coming from, but what you are doing sounds like a bit-field extract/insert, and these are not operations that the register allocator will add to the code. Depending on what exactly you are trying to do, I have two general suggestions. 1) Define the vector registers as 32-bit registers, and define vector operations as using aligned groups of these 32-bit registers. This exposes the 32-bit registers to the register allocator so that it can use them directly. 2) Use zero_extract and/or vec_select instead of subreg, which requires that you have patterns that emit the zero_extract/vec_select operations, patterns that recognize them, and possibly builtin functions that the user can call to get these zero_extract/vec_select operations emitted into the rtl. There is a named pattern vec_extract that the vectorizer can use to generate these rtl operations. For examples of this, in the aarch64 port, see for instance the aarch64_movdi_* patterns in the aarch64.md file, and the aarch64_get_lane* patterns in the aarch64-simd.md file. Jim
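At the source level, this is the kind of lane access that should end up as a vec_extract rather than a subreg; a minimal sketch using the GCC vector extension (vector subscripting typically expands through the vec_extract named pattern when the target provides one):

  typedef int v4si __attribute__ ((vector_size (16)));

  int
  lane3 (v4si v)
  {
    return v[3];  /* lane read: vec_select/vec_extract, not a subreg */
  }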
Re: Change the arrch64 abi ...(Custom /Specific change)
On Tue, Apr 5, 2016 at 2:45 AM, Umesh Kalappa wrote:
> I need to make the changes only to the function args (varargs); hence,
> will making the changes in TARGET_PROMOTE_FUNCTION_MODE do?

If TARGET_PROMOTE_FUNCTION_MODE disagrees with PROMOTE_MODE, it is possible that the middle end may generate incorrect RTL. This was seen with the arm target when it was using different sign extension for args and locals. It may or may not be a problem for SImode extension versus DImode extension. If you run into optimizer problems, you may need to change PROMOTE_MODE also to solve them.

> One more question: I have defined TARGET_PROMOTE_FUNCTION_MODE (in
> arm.c) and am cross compiling for aarch64, but gcc still calls
> default_promote_function_mode, i.e.

Add it to aarch64.c instead of arm.c. arm.c is for 32-bit arm code. aarch64.c is for 64-bit arm code.

Jim
Re: Change the arrch64 abi ...(Custom /Specific change)
On 04/04/2016 08:55 AM, Umesh Kalappa wrote:

  We are in the process of changing the gcc compiler for the aarch64 abi, w.r.t. varargs function argument handling. By default (LP64), 1-, 2-, and 4-byte args are promoted to word size, i.e. 4 bytes; we need to change this behaviour to 8 bytes (double word). We are looking at both hooks, PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE, to make the changes.

I think this would work. You just need to promote all modes less than 8 bytes to DImode, instead of the current code that promotes modes smaller than 4 bytes to SImode. You would do this for the default LP64 type system, but not for the ILP32 type system.

This would affect all function arguments and locals, which may cause code size and/or performance issues. You would have to check for that. Also, this may prevent linking with any 3rd party code compiled by unmodified gcc, or code compiled with other compilers (e.g. LLVM), because changing TARGET_PROMOTE_FUNCTION_MODE can cause ABI changes. You may need to check that also.

Jim
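A sketch of what the hook change might look like. The myport_ name is a placeholder, and this version ignores the ILP32 check and the unsignedness update that a real implementation would need; the parameter list follows the documented hook signature:

  static machine_mode
  myport_promote_function_mode (const_tree type ATTRIBUTE_UNUSED,
                                machine_mode mode,
                                int *punsignedp ATTRIBUTE_UNUSED,
                                const_tree funtype ATTRIBUTE_UNUSED,
                                int for_return ATTRIBUTE_UNUSED)
  {
    /* Promote everything narrower than 8 bytes to DImode.  */
    if (GET_MODE_CLASS (mode) == MODE_INT && GET_MODE_SIZE (mode) < 8)
      return DImode;
    return mode;
  }

  #undef TARGET_PROMOTE_FUNCTION_MODE
  #define TARGET_PROMOTE_FUNCTION_MODE myport_promote_function_mode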
Re: extendqihi2 and GCC RTL type system
On Mon, Feb 22, 2016 at 7:55 AM, David Edelsohn wrote:
> If I remove extendqihi2 (extend:HI pattern) from the PowerPC port,
> will that cause any problems for the GCC RTL type system or inhibit
> optimizations? I see that Alpha and SPARC define extendqihi2, but
> IA-64 and AArch64 do not, so there is precedent for both approaches.

aarch64 does have an extendqihi2 pattern. It uses so many iterator macros that you can't use grep to look for stuff; the pattern is spelled "<optab>qihi2", so the literal string "extendqihi2" never appears in the .md file.

If you have a target with registers larger than HImode, no HImode register operations, QI/HI loads that set the entire register, and a PROMOTE_MODE definition that converts all QImode and HImode operations to the same larger mode with the same signedness, then I don't think that there is any advantage to having an extendqihi2 pattern. You should get the same code with or without it, as a QImode to HImode conversion is a no-op. The only difference should be that with an extendqihi2 pattern you will see some HImode operations in the RTL; without extendqihi2 you will see equivalent operations in the promoted mode.

If you are concerned about this, then just try compiling some large code base using two compilers, one with extendqihi2 and one without, and check to see if there are any code generation differences.

Jim
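A minimal testcase for the comparison suggested above; compiling it with and without the pattern and diffing the assembly shows whether extendqihi2 changes anything:

  short
  widen (signed char c)
  {
    return c;  /* a QImode-to-HImode sign extension at the source level */
  }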
Re: [RFC PR43721] Optimize a/b and a%b to single divmod call
On Fri, Jan 29, 2016 at 12:09 AM, Richard Biener wrote: > I wonder if rather than introducing a target hook ports could use > a define_expand expanding to a libcall for this case? Of the two divmod libcall APIs, one requires a stack temporary, which would be awkward to allocate in a define_expand. Though we could have expand_twoval_binop implement the libgcc udivmoddi4 API which requires a stack temp, and then add an ARM divmod expander that implements the ARM API which has a double-wide result. That sounds like it could work. Jim
Re: [RFC PR43721] Optimize a/b and a%b to single divmod call
On Sun, Jan 31, 2016 at 8:43 PM, Jim Wilson wrote: >> Are we certain that the libcall is a win for any target? >> I would have expected a default of >> q = x / y >> r = x - (q * y) >> to be most efficient on modern machines. Even more so on targets like ARM >> that have multiply-and-subtract instructions. If there is a div insn, then yes, gcc will emit a div and a multiply. However, a div insn is a relatively recent addition to the 32-bit ARM architecture. Without the div insn, we get a div libcall and a mod libcall. That means two libcalls, both of which are likely implemented by calling the divmod libcall and returning the desired part of the result. One call to a divmod libcall is clearly more efficient than two calls to a divmod libcall. So that makes the transformation useful. Prathamesh's patch has a number of conditions required to trigger the optimization, such as a divmod insn, or a lack of a div insn and the presence of a divmod libcall. Jim
Re: [RFC PR43721] Optimize a/b and a%b to single divmod call
On Sun, Jan 31, 2016 at 2:15 PM, Richard Henderson wrote: > On 01/29/2016 12:37 AM, Richard Biener wrote: >>> >>> To workaround this, I defined a new hook expand_divmod_libfunc, which >>> targets must override for expanding call to target-specific dimovd. >>> The "default" hook default_expand_divmod_libfunc() expands call to >>> libgcc2.c:__udivmoddi4() since that's the only "generic" divmod >>> available. >>> Is this a reasonable approach ? >> >> >> Hum. How do they get to expand/generate it today? That said, I'm >> no expert in this area. >> >> A simpler solution may be to not do the transform if there is no >> instruction for divmod. > > > Are we certain that the libcall is a win for any target? > > I would have expected a default of > > q = x / y > r = x - (q * y) > > to be most efficient on modern machines. Even more so on targets like ARM > that have multiply-and-subtract instructions. > > I suppose there's the case of the really tiny embedded chips that don't have > multiply patterns either. So that this default expansion still results in > multiple libcalls. > > I do like the transformation, because for machines that don't have a divmod > instruction, being able to strength-reduce a mod operation to a multiply > operation is a nice win. > > > r~
Re: [RFC PR43721] Optimize a/b and a%b to single divmod call
On Thu, Jan 28, 2016 at 5:37 AM, Richard Biener wrote:
>> To workaround this, I defined a new hook expand_divmod_libfunc, which
>> targets must override for expanding call to target-specific divmod.
>> The "default" hook default_expand_divmod_libfunc() expands call to
>> libgcc2.c:__udivmoddi4() since that's the only "generic" divmod
>> available.
>> Is this a reasonable approach ?
>
> Hum. How do they get to expand/generate it today? That said, I'm
> no expert in this area.

Currently, the only place where a divmod libfunc can be called is in expand_divmod in expmed.c, which can return either the div or mod result, but not both. If this is called for the mod result, and there is no div insn, no mod insn, and no mod libfunc, then it will call the divmod libfunc to generate the mod result. This is exactly the case where the ARM port needs it, as this code was written for the arm.

There are 3 targets that define a divmod libfunc: arm, c6x, and spu. The arm port is OK, because expand_divmod does the right thing for arm, using the arm divmod calling convention. The c6x port is OK because it defines mod insns and libfuncs, and hence the divmod libfunc will never be called and is redundant. The spu port is also OK, because it defines mod libcalls, and hence the divmod libfunc will never be called, and is likewise redundant. Both the c6x and spu ports have their own divmod library functions in libgcc/config/$target. The divmod library functions are called by the div and mod library functions, so they are necessary; they are just never directly called. Both the c6x and spu ports use the current libgcc __udivmoddi4 calling convention with a pointer to the mod result, which is different from and incompatible with the ARM convention of returning a double-size result that contains both div and mod.

With Prathamesh's patch to add support to the tree optimizers to create divmod operations, the c6x and spu ports break. The divmod libfuncs are no longer redundant, and will be called, except with the wrong ABI, so we need to extend the divmod support to handle multiple ABIs. This is why Prathamesh added the target hook for the divmod libcall, so the target can specify the ABI used by its divmod libcalls. Prathamesh has correct support for ARM (current code), and apparently correct code for c6x and spu (libgcc udivmodsi4).

> A simpler solution may be to not do the transform if there is no
> instruction for divmod.

This prevents the optimization from happening on ARM, which has divmod libfuncs but no divmod insn. We want the optimization to happen there, as if we need both div and mod results, then calling the divmod libfunc is faster than calling both the div and mod libfuncs.

Jim
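The source pattern this whole thread is about, as a sketch; on 32-bit ARM without a div insn, each operation below is a separate libcall today, and the optimization merges them into one divmod call:

  unsigned
  quotrem (unsigned x, unsigned y, unsigned *rem)
  {
    *rem = x % y;   /* one libcall today ...      */
    return x / y;   /* ... and a second one here  */
  }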
Re: vectorization ICE for aarch64/armhf on SPEC2006 h264ref
On Tue, Jan 12, 2016 at 2:22 PM, Jim Wilson wrote:
> I see a number of places in tree-vect-generic.c that add a
> VIEW_CONVERT_EXPR if useless_type_conversion_p is false. That should
> work, except when I try this, I see that the VIEW_CONVERT_EXPR gets
> converted to a NOP_EXPR by gimplify_build1, and gets stripped again.

To elaborate on this a bit more, I see a number of places that do this

  if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (new_rhs)))
    new_rhs = gimplify_build1 (gsi, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), new_rhs);

In match.pd, there is a rule to convert VIEW_CONVERT_EXPR to NOP_EXPR

  (simplify
   (view_convert @0)
   (if ((INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type))
        && (INTEGRAL_TYPE_P (TREE_TYPE (@0)) || POINTER_TYPE_P (TREE_TYPE (@0)))
        && TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@0)))
    (convert @0)))

But according to useless_type_conversion_p, there are two more conditions that need to be met: the signedness must be the same, and if one type is boolean and one is not, then the precision must be one. In my case, we have a 32-bit int type and a 32-bit boolean type. So useless_type_conversion_p is demanding a type conversion, but match.pd is converting the VIEW_CONVERT_EXPR to a NOP_EXPR, and gimplify_build1 is stripping it. So there appears to be an inconsistency here.

Jim
vectorization ICE for aarch64/armhf on SPEC2006 h264ref
I'm looking at an ICE on SPEC 2006 464.h264ref slice.c that occurs with -O3 for both aarch64 and armhf.

  palantir:2080$ ./xgcc -B./ -O3 -S slice.i
  slice.c: In function ‘poc_ref_pic_reorder’:
  slice.c:838:6: error: incorrect type of vector CONSTRUCTOR elements
   {_48, _55, _189, _59}
  vect_no_reorder_16.92_252 = {_48, _55, _189, _59};
  slice.c:838:6: internal compiler error: verify_gimple failed
  ...

This fails because it is expecting int type elements in the constructor, and we have instead elements with boolean type. useless_type_conversion_p says that it isn't OK to substitute bool for int. I used bisection to trace the problem to the patch for bugzilla 68215 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68215

The problem occurs in expand_vector_condition. a_is_comparison is false. Before the patch, aa is an ssa_name with type boolean returned by tree_vec_extract. This is passed to gimplify_build3, which returns a cond_expr with type int. After the patch, aa is a ne_expr with type boolean. gimplify_build3 calls fold_build3_loc, which optimizes the cond_expr/ne_expr, and returns a nop_expr of type int wrapping the boolean ne_expr. gimplify_build3 then calls STRIP_NOPS, which removes the nop_expr, and the end result here is a ne_expr with a boolean type, which is the wrong type for the constructor.

I don't have a lot of experience with the gimple work, so I'm not sure where this is going wrong. I see a number of places in tree-vect-generic.c that add a VIEW_CONVERT_EXPR if useless_type_conversion_p is false. That should work, except when I try this, I see that the VIEW_CONVERT_EXPR gets converted to a NOP_EXPR by gimplify_build1, and gets stripped again. Maybe the gimplify_build* routines should be using STRIP_USELESS_TYPE_CONVERSION instead of STRIP_NOPS? That seems to work, but I don't know if that will have cascading effects. Or maybe verify_gimple should allow bools and ints to mix in a constructor? That doesn't seem like the right solution to me.

Jim
Re: reload question about unmet constraints
On Tue, Sep 15, 2015 at 8:53 AM, Ulrich Weigand wrote: > Jim Wilson wrote: > In that case, you might be able to fix the bug by splitting the > offending insns into two patterns, one only handling near mems > and one handling one far mems, where the near/far-ness of the mem > is verified by the *predicate* and not the constraints. That is how it works currently. He was trying to optimize a case that involved mixed near and far mems and hence couldn't use a predicate in that case. Jim
Re: reload question about unmet constraints
On Tue, Sep 15, 2015 at 7:42 AM, Ulrich Weigand wrote: > But the only difference between define_memory_constraint and a plain > define_constraint is just that define_memory_constraint guarantees > that any memory operand can be made valid by reloading the address > into a base register ... > > If the set of operands accepted by a constraint does *not* have that > property, it must not be defined via define_memory_constraint, and > you should simply use define_constraint instead. An invalid near mem can be converted to a valid near mem by reloading its address into a base reg. An invalid far mem can be converted to a valid far mem by reloading its address into a base reg. But one can't convert a near mem to a far mem by reloading the address, nor can one convert a far mem to a near mem by reloading its address. So we need another dimension to the validity testing here, besides the question of whether the address can be reloaded, there is the question of whether it is in the right address space. Though I don't think the rl78 is actually using address spaces, and it isn't clear if that would help. Jim
Re: reload question about unmet constraints
On Mon, Sep 14, 2015 at 11:05 PM, DJ Delorie wrote: > As a test, I added this API. It seems to work. I suppose there could > be a better API where we determine if a constrain matches various > memory spaces, then compare with the memory space of the operand, but > I can't prove that's sufficiently flexible for all targets that > support memory spaces. Heck, I'm not even sure what to call the > macro, and > "TARGET_IS_THIS_MEMORY_ADDRESS_RELOADABLE_TO_MATCH_THIS_CONTRAINT_P()" > is a little long ;-) > > What do we think of this direction? We already have define_constraint and define_memory_constraint. We could perhaps add a define_special_memory_constraint that returns CT_SPECIAL_MEMORY which mostly operates like CT_MEMORY, except that it doesn't assume any MEM can be reloaded to match. We already have constraint_satisfied_p, which is generated from define*_constraint. We could have a constraint_reloadable_to_match_p function parallel to that, which is for operands that don't match, but can be reloaded to match. Perhaps we don't even need a distinction between define_memory_constraint and define_special_memory_constraint. We could have constraint_reloadable_to_match_p default to the current code for memory constraints, that assumes any mem is reloadable to match, if a special reloadable condition isn't specified. Perhaps define_memory_constraint can be extended with an optional field at the end, that is used to generate the constraint_reloadable_to_match_p function. Otherwise, I think you are headed in the right direction. I would worry a bit about whether we are making reload even more complicated for folks. But given that we already have the concept of address spaces, there should be some way to expose this info to reload. Jim
Re: reload question about unmet constraints
On Tue, Sep 1, 2015 at 6:20 PM, DJ Delorie wrote:
>
>> It did match the first alternative (alternative 0), but it matched the
>> constraints Y/Y/m.
>
> It shouldn't match Y as those are for near addresses (unless it's only
> matching MEM==MEM), and the ones in the insn are far, but ...

Reload chooses the alternative that is the best match. When using the constraints Y/Y/m, two of the three operands match the constraints, so this ends up being the best match. It then tries to reload the far mem to match Y, which fails, as all it knows how to do is reload a mem address to make it match, which can't turn a far mem into a near mem. You would need some way to indicate that while Y does accept a mem, this particular mem can't be reloaded to match. We don't have a way to do that.

The Y constraint gets classified as constraint type CT_MEMORY. In find_reloads, in reload.c, there is a case CT_MEMORY, and it does

  if (CONST_POOL_OK_P (operand_mode[i], operand)
      || MEM_P (operand))
    badop = 0;
  constmemok = 1;
  offmemok = 1;

Since the operand is a MEM, badop is set to zero. That makes this look like a good alternative. You want badop to be left alone. Also, the fact that offmemok was set means that reload thinks that any mem can be fixed by reloading the address to make it offsetable. You don't want offmemok set. Without offmemok set, it should get reloaded into a register, as reload will use the v constraint instead.

Jim
Re: reload question about unmet constraints
On 09/01/2015 12:44 AM, DJ Delorie wrote: > I expected gcc to see that the operation doesn't meet the constraints, > and move operands into registers to make it work (alternative 1, > "v/v/v"). It did match the first alternative (alternative 0), but it matched the constraints Y/Y/m. Operands 1 and 2 are OK, so don't need reloads. It did create optional reloads, which it always does for mem, but these reloads are irrelevant. The interesting one is for operand 0. Since Y accepts mem, and operand 0 is a mem but doesn't match, reload assumes that we can fix it by reloading the address to make it an offsettable address. But a far mem is still not acceptable even with a reloaded address, and you get an ICE. Reload doesn't have any concept of two different kinds of memory operands which can't be converted via reloads. If the constraint accepts mem, and we have a mem operand, then it will always assume that the problem is with the address and reload it. I don't think that there is an easy solution to this, but my reload skills are a bit rusty too. Jim
Re: fake/abnormal/eh edge question
On 08/25/2015 02:54 PM, Steve Ellcey wrote:
> Actually, it looks like it is peephole2 that is eliminating the
> instructions (and .cfi pseudo-ops).
> I am not entirely sure I need the code, or if I just need the .cfi
> pseudo-ops and the code to generate the .cfi stuff.

Don't create any new edges. That doesn't make sense for unwind info. You don't need unwind info for unreachable code. You do need enough info to be able to unwind from any instruction in general, or just from call sites for C++ EH. This means you need to be able to calculate the value of sp from the caller, which you can only get from the r12/drap value. So you need to keep the instructions and the cfi directives generated from them.

Looking at the i386 port, in the i386.md file, I see a few peepholes check the value of RTX_FRAME_RELATED_P and fail if it is set. It appears that you need to do something similar in the mips.md file to prevent these instructions from being deleted by the peephole2 pass.

Jim
Re: Controlling instruction alternative selection
On 07/30/2015 09:54 PM, Paul Shortis wrote:
> Resulting in ...
> error: unable to find a register to spill in class ‘GP_REGS’
> Enabling lra and inspecting the rtl dump indicates that both
> alternatives (R and r) seem to be equally appealing to the allocator,
> so it chooses 'R' and fails.

The problem isn't in lra, it is in reload. You want lra to use the three-address instruction, but you then want reload to use the two-address alternative.

> Using constraint disparaging (?R) eradicates the errors, but of course
> that causes the 'R' three address alternative to never be used.

You want to disparage the three-address alternative in reload, but not in lra. There is a special constraint modifier for that: you can use ^ instead of ? to make that happen. That may or may not help though. There is also a hook TARGET_CLASS_LIKELY_SPILLED_P which might help. You should try defining this to return true for the 'R' class if it doesn't already.

Jim
Re: RETURN_ADDRESS_POINTER_REGNUM Macro
On 07/23/2015 11:09 PM, Ajit Kumar Agarwal wrote: > From the description of the definition of the macro > RETURN_ADDRESS_POINTER_REGNUM , > Does this impact the performance or correctness of the compiler? On what > cases it is applicable to define for the given architecture? This is used to help implement the __builtin_return_address builtin function. There is some default code for this, so it may work OK without defining RETURN_ADDRESS_POINTER_REGNUM. If the default code doesn't work, then you may need to define RETURN_ADDRESS_POINTER_REGNUM. Usually, it is trivial to make __builtin_return_address work for leaf functions, non-trivial to make it work for non-leaf functions, and difficult to impossible to make it work for level != 0. You will have better luck using the unwind info than __builtin_return_address. This is an optional builtin function, so there is no performance or correctness issue here. Jim
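For reference, the builtin this macro helps implement; as noted above, level 0 is the only argument that is reliably cheap to support:

  void *
  caller_address (void)
  {
    /* Return address of the call that reached this function (level 0).  */
    return __builtin_return_address (0);
  }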
Re: How to express this complicated constraint in md file
On 07/16/2015 01:32 PM, Dmitry Grinberg wrote:

  WUMUL x, y, which will multiply 32-bit register x by 32-bit register y, and produce a 64-bit result, storing the high bits into register x and low bits into register y

You can rewrite the RTL to make this easier. You can use a parallel to do for instance

  [(set (reg:SI x)
        (truncate:SI
          (lshiftrt:DI
            (mult:DI (sign_extend:DI (reg:SI x))
                     (sign_extend:DI (reg:SI y)))
            (const_int 32))))
   (set (reg:SI y)
        (truncate:SI
          (mult:DI (sign_extend:DI (reg:SI x))
                   (sign_extend:DI (reg:SI y)))))]

Now you have only 32-bit regs, and you can use matching constraints to make it work. The truncate of the lshiftrt is the traditional way to write a mulX_highpart pattern; some parts of the optimizer may recognize this construct and know how to handle it. For the second set, you might consider just using (mult:SI ...) if that gives the correct result, or you can use a subreg or whatever.

The optimizer is unlikely to generate this pattern on its own, but you can have an expander and/or splitter that generates it. Use zero_extend instead of sign_extend if this is an unsigned widening multiply. You probably want to generate two SImode temporaries in the expander, and copy the input regs into the temporaries, as expanders aren't supposed to clobber input regs. If you want a 64-bit result out of this, then you would need extra instructions to combine x and y into a 64-bit output.

Another way to do this is to arbitrarily force the result into a register pair; then you can use a subreg to match the high part or the low part of that register pair for the inputs.

  [(set (reg:DI x)
        (mult:DI (sign_extend:DI (subreg:SI (reg:DI x) 0))
                 (sign_extend:DI (subreg:SI (reg:DI x) 1))))]

The subreg numbers may be reversed if this is little word endian instead of big word endian. You might need extra setup instructions to create the register pair first: create a DI temp for the output, move the inputs into the high/low word of the DI temp, and then you can do the multiply on the DI temp.

Jim
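For reference, the C-level operation these patterns implement; the cast before the multiply is what makes it a 32x32->64 widening multiply rather than a plain 64-bit one:

  long long
  wide_mul (int x, int y)
  {
    return (long long) x * y;  /* signed widening multiply */
  }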
Re: configure.{in -> ac} rename (commit 35eafcc71b) broke in-tree binutils building of gcc
On Tue, Jul 14, 2015 at 10:08 AM, H.J. Lu wrote: > Combined tree is useful when the latest binutils is needed by GCC. If you build and install binutils using the same --prefix as used for gcc, then gcc will automatically find that binutils and use it. You don't need combined trees to make this work. If you do still like combined trees, then I'd suggest putting binutils and gcc into the same dir, instead of placing binutils into gcc, and then add a simple Makefile that will configure, build and install binutils and then likewise for gcc. Jim
Re: configure.{in -> ac} rename (commit 35eafcc71b) broke in-tree binutils building of gcc
On 07/14/2015 02:13 AM, Jan Beulich wrote:
> I was quite surprised for my gcc 4.9.3 build (using binutils 2.25 instead
> of 2.24 as I had in use with 4.9.2) to fail in rather obscure ways.

in-tree/combined-tree builds aren't recommended anymore, and hence aren't well maintained. That is an anachronism from the old Cygnus days. I still find it useful to drop newlib into gcc so it can be built like the other gcc libs, but otherwise I wouldn't recommend combining anything.

Jim
Re: Question about find modifiable mems
On 06/02/2015 11:39 PM, shmeel gutl wrote: > find_modifiable_mems was introduced to gcc 4.8 in september 2012. Is > there any documentation as to how it is supposed to help the haifa > scheduler? The patch was submitted here https://gcc.gnu.org/ml/gcc-patches/2012-08/msg00155.html and this message contains a brief explanation of what it is supposed to do. The explanation looks like a useful optimization, but perhaps it is triggering in cases when it shouldn't. Jim
Re: [RFC] Kernel livepatching support in GCC
On 05/28/2015 01:39 AM, Maxim Kuvyrkov wrote: > Hi, > > Akashi-san and I have been discussing required GCC changes to make kernel's > livepatching work for AArch64 and other architectures. At the moment > livepatching is supported for x86[_64] using the following options: "-pg > -mfentry -mrecord-mcount -mnop-mcount" which is geek-speak for "please add > several NOPs at the very beginning of each function, and make a section with > addresses of all those NOP pads". FYI, there is also the darwin/rs6000 -mfix-and-continue support, which adds 5 nops to the prologue. This was a part of a gdb feature, to allow one to load a fixed function into a binary inside the debugger, and then continue executing with the fixed code. It sounds like your kernel feature is doing something very similar. If you are making this a generic feature, then maybe the darwin/rs6000 -mfix-and-continue support can be merged with it somehow. Jim
Re: Is there a way to adjust alignment of DImode and DFmode?
On 05/20/2015 10:00 AM, H.J. Lu wrote: > By default, alignment of DImode and DFmode is set to 8 bytes. > Intel MCU psABI specifies alignment of DImode and DFmode > to be 4 bytes. I'd like to make get_mode_alignment to return > 32 bits for DImode and DFmode. Is there a way to adjust alignment > of DImode and DFmode via ADJUST_ALIGNMENT? I see that i386-modes.def already uses ADJUST_ALIGNMENT to change the alignment of XFmode to 4 for ilp32 code. ADJUST_ALIGNMENT should work the same for DImode and DFmode. Did you run into a problem when you tried it? Jim
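Following the XFmode precedent, a sketch of what the modes.def entries might look like; TARGET_IAMCU is an assumed spelling for the MCU-configuration condition, not a name taken from the sources:

  /* In i386-modes.def, alongside the existing ADJUST_ALIGNMENT (XF, ...):  */
  ADJUST_ALIGNMENT (DI, (TARGET_IAMCU ? 4 : 8));
  ADJUST_ALIGNMENT (DF, (TARGET_IAMCU ? 4 : 8));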
Re: ldm/stm bus error
On 05/18/2015 02:05 AM, Umesh Kalappa wrote:
> Getting a bus/hard error for the below case, makes sense since ldm/stm
> expects the address to be word aligned.
> --with-pkgversion='Cisco GCC c4.7.0-p1' --with-cisco-patch-level=1

The FSF doesn't support gcc-4.7.0 anymore. Generally, we only support the last two versions, which are 4.9 and 5.1. We also don't support vendor compilers. So you will have to reproduce in an FSF GCC 4.9 or later tree if you want to get anyone here interested. It appears that this is already fixed in gcc-4.9 though.

The Cisco compiler is a MontaVista compiler with patches. You could try reporting this to MontaVista. Or you can try to track down the problem yourself. You can use bisection to find the patch that fixed it. Check out a copy of the FSF gcc tree halfway between gcc-4.7 and gcc-4.9, and check to see if it has the bug. That narrows the search space by half. Then repeat on the remaining half until you have reduced it down to a few days. Then you can check ChangeLog entries for a likely patch to the ARM backend. Or you can continue the bisection until you have it down to one patch. You can then backport the patch to your gcc sources and/or point MontaVista at it.

This bisection process can be scripted. You can find an example in the gcc contrib/reghunt directory. I don't have experience using this script though, as I like to do it by hand. If you have a sufficient understanding of gcc, you may be able to find the patch simply by looking at what gcc-4.9 is doing differently than gcc-4.7, and mapping that back to a ChangeLog entry and a patch.

Jim
Re: Question about macro _GLIBCXX_RES_LIMITS in libstdc++ testsuite
On 05/17/2015 01:16 AM, Bin.Cheng wrote:
> On Sat, May 16, 2015 at 5:35 PM, Hans-Peter Nilsson wrote:
>> On Thu, 23 Apr 2015, Bin.Cheng wrote:
>>> Hi,
>>> In libstdc++ testsuite, I noticed that macro _GLIBCXX_RES_LIMITS is
>>> checked/set by GLIBCXX_CHECK_SETRLIMIT, which is further guarded by
>>> GLIBCXX_IS_NATIVE as below:

The setrlimit checks were made dependent on GLIBCXX_IS_NATIVE on Aug 9, 2001. https://gcc.gnu.org/ml/gcc-patches/2001-08/msg00536.html This is 3 days after the feature was added. This was 14 years ago, so people might not remember exactly why the change was made. There was probably no specific reason for this, other than a concern that it might not work cross, and/or it wasn't considered worth the effort to make it work cross at the time.

It does look like this can work cross, at least for a cross to a linux target. For a cross to a bare metal target, it should be OK to run the tests; they will just fail and disable the macros. Someone just needs to write the patches to make it work and test it. You could try submitting a bug report if you haven't already done so.

Jim
Re: [OR1K port] where do I change the function frame structure
On 05/05/2015 05:19 PM, Peter T. Breuer wrote: Please .. where (in what file, dir) of the gcc (4.9.1) source should I rummage in order to change the sequence of instructions eventually emitted to do a function call? Are you trying to change the caller or the callee? For the callee, or1k_compute_frame_size calculates the frame size, which depends on the frame layout. or1k_expand_prologue emits the RTL for the prologue. or1k_expand_epilogue emits the RTL for the epilogue. There are also a few other closely related helper functions. These are all in gcc/config/or1k/or1k.c. For the caller, I see that the or1k port already sets ACCUMULATE_OUTGOING_ARGS, so there should be no stack pointer inc/dec around a call. Only in the prologue/epilogue. Jim
Re: Build oddity (Mode = sf\|df messages in output)
On 04/30/2015 03:59 PM, Steve Ellcey wrote: I am curious, has anyone started seeing these messages in their GCC build output: Mode = sf\|df Suffix = si\|2\|3 That comes from libgcc/config/t-hardfp. This is used to generate a list of function names from operator, mode, and suffix, e.g. fixsfsi and adddf3. Jim
Re: Question about perl while bootstrapping gcc
On 04/16/2010 11:10 AM, Dominique Dhumieres wrote:

  I usually build gcc with a command line such as

    make -j2 >& somelogfile &

  I recently found that if I log out, the build fails with

    perl: no user 501

Try "nohup make ...". See the man page or info manual for nohup.

Jim
Re: Error while building GCC 4.5 (MinGW)
On Mon, 2010-04-12 at 08:34 -0700, Name lastlong wrote: > Please check the following relevant information present in the config.log > as follows: Now that you can see what is wrong, you should try to manually reproduce the error. Check the libraries to see if they are OK, and if the right versions of the libraries are being linked in. Look to see where the undefined references are coming from. Etc. > Please let me know if there are any particular steps to be followed to build > mingw toolchain with mpc libraries. I have no idea. I haven't built a mingw toolchain anytime recently. Jim
Re: GCC documentation: info format
On 04/09/2010 05:08 AM, christophe.ja...@ouvaton.org wrote: I am currently trying to include GCC documentation into gNewSense distribution, in info format. The binutils response to the same question reminds me that the same answer works here. There are pre-built info files in our official releases. You can grab them from there. I forgot about this because I'm working from the development sources most of the time, and we don't have pre-built copies there. Jim
Re: Error while building GCC 4.5 (MinGW)
On 04/08/2010 07:21 AM, Name lastlong wrote:

  checking for the correct version of the gmp/mpfr/mpc libraries... no
  configure: error: Building GCC requires GMP 4.2+, MPFR 2.3.1+ and MPC 0.8.0+.
  Check the config.log file for details.

A successful build should show something like this

  configure:5634: checking for the correct version of the gmp/mpfr/mpc libraries
  configure:5665: gcc -o conftest -g -O2 conftest.c -lmpc -lmpfr -lgmp >&5
  configure:5665: $? = 0
  configure:5666: result: yes

Your file should have an error here.

Jim
Re: GCC documentation: info format
On 04/09/2010 05:08 AM, christophe.ja...@ouvaton.org wrote: Where may I find gcc-vers.texi? It is created by the install.texi2html shell script, which also creates the HTML output files that go on the web site. You can probably modify this script to generate info files instead, but as Diego mentioned, the recommended way to produce info files is to configure and build a full source tree. Don't forget about the libstdc++-v3 docs which are not in the docs-sources.tar.gz file. Also, don't forget about the other docs which are not in the gcc/doc directory. "make info" in a build tree takes care of this stuff for you. Jim
Re: lower subreg optimization
On 04/07/2010 10:48 PM, roy rosen wrote:

  I saw in arm/neon.md that they have a similar problem: ... Their solution is also not complete. What is the proper way to handle such a case, and how do I let gcc know that this is a simple move instruction so that gcc would be able to optimize it out?

The only simple solution at the moment is the one that the ARM port is using. You avoid emitting the move when you get the lucky reg-alloc result, and you emit the move when you aren't lucky.

As the neon.md comment suggests, and as Ian Taylor mentioned in his response, a possible solution is to modify the lower-subreg.c pass somehow so that it no longer splits subregs of vector modes, possibly controlled by a hook. We might be able to modify the register allocator to look for this pattern, to increase the chances of getting the good reg-alloc result, but the lower-subreg.c change is probably better. Another solution might be to add a pass (or modify an existing one like regmove.c) to try to put things back together again, but this is probably also not as good as the lower-subreg.c change.

Jim
Re: Help with an Wierd Error
On 04/02/2010 11:02 AM, balaji.i...@gtri.gatech.edu wrote:

  /opt/or32/lib/gcc/or32-elf/4.2.2/../../../../or32-elf/lib/crt0.o: In function `loop':
  (.text+0x64): undefined reference to `___bss_start'

It looks like a case of one-too-many underscores prepended to symbol names. The default for ELF is to not prepend an underscore. The default for COFF is to prepend an underscore.

Check USER_LABEL_PREFIX, which should be empty for ELF (default definition in the defaults.h file). Check ASM_OUTPUT_LABELREF, which should use %U and not an explicit underscore (see the defaults.h file). Check for usage of the -fleading-underscore option (it should not be used). Check header file inclusion to see if something is out of place, such as a use of svr3.h when svr4.h should be used.

Also check to make sure that gcc, gas, and gld all agree on whether a symbol is prepended with an underscore or not. If gcc does it but gas doesn't, then C code calling assembly language code may fail because of the mismatch.

Jim
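For reference, the ELF-style settings in question; this sketch mirrors the defaults.h defaults rather than any particular port's headers:

  /* No leading underscore on user symbols (the ELF convention).  */
  #undef  USER_LABEL_PREFIX
  #define USER_LABEL_PREFIX ""

  /* Use %U, which expands to USER_LABEL_PREFIX, never a literal "_".  */
  #undef  ASM_OUTPUT_LABELREF
  #define ASM_OUTPUT_LABELREF(FILE, NAME) \
    asm_fprintf ((FILE), "%U%s", (NAME))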
Re: lower subreg optimization
On 04/06/2010 02:24 AM, roy rosen wrote:

  (insn 33 32 34 7 a.c:25 (set (subreg:V2HI (reg:V4HI 114) 0)
          (plus:V2HI (subreg:V2HI (reg:V4HI 112) 0)
              (subreg:V2HI (reg:V4HI 113) 0))) 118 {addv2hi3}
      (nil))

Only subregs are decomposed. So use vec_select instead of subreg. I see you already have a vec_concat to combine the two v2hi into one v4hi, so there is no need for the subreg in the dest. You should try eliminating that first and see if that helps. If that isn't enough, then replace the subregs in the source with vec_select operations.

Jim
Re: Fwd: constant hoisting out of loops
On Sun, 2010-03-21 at 03:40 +0800, fanqifei wrote:
> foor_expand_move is changed and it works now.
> However, I still don't understand why there was no such error if the below
> condition was used and foor_expand_move was not changed.
> Both the below condition and "(register_operand(operands[0], SImode) ||
> register_operand(operands[1],SImode)) ..." do not accept mem&&mem.

The define_expand is used for generating RTL. The RTL expander calls the define_expand, which checks for MEM&CONST, and then falls through to generating the mem copy insn.

The define_insn is used for matching RTL. After it has been generated, we look at the movsi define_insn, and see that MEM&MEM doesn't match, so you get an error for unrecognized RTL.

The define_expand must always match the define_insn(s). They are used in different phases, and they aren't checked against each other when gcc is built. If there is a mismatch, then you get a run-time error for unrecognized rtl.

Jim