Re: Why not contribute? (to GCC)
Alfred M. Szmidt writes: You are still open to liabilities for your own project; if you incorporate code that you do not have copyright over, the original copyright holder can still sue you.

That's irrelevant. By signing the FSF's document I'd be doing nothing to reduce anyone's ability to sue me; I could only be increasing it. And please don't try to argue that's not true, because I have no reason to believe you. Only a lawyer working for me would be in a position to convince me otherwise, and if I have to go that far, it's clearly not worth it.

The debate over legalities has already derailed this thread, so let me try to put it another way. Years ago, I was asked to sign one of these documents for some public domain code I wrote that I never intended to become part of an FSF project. Someone wanted to turn it into a regular GNU project with a GPL license, configure scripts, a cute acronym and all that stuff. I said no. It's public domain, take it or leave it. Why should I sign some legally binding document for code I had, in effect, already donated to the public? How would you feel if some charity you donated money to came back with a piece of paper for you to sign?

Submitting a hypothetical patch to GCC isn't much different to me. For some people, having their code in the GCC distribution is worth something. For me it's not. For them it's a fair trade. For me it's a donation.

We are all humans, patches fall through the cracks. Would you like to help keep an eye out for patches that have fallen through? Would anyone else like to do this?

As I said, I was just listing the reasons why I don't contribute. I'm not arguing that anything should be changed or can be changed. However, what I do know is that excuses won't make me or anyone else more likely to contribute to GCC.

Please refer to GCC as a free software project, it was written by the GNU project and the free software community.

Oh, yah, forgot about that one.
Political stuff like this is another reason not to get involved with GCC.

Ross Ridge
Re: Why not contribute? (to GCC)
Ross Ridge writes: Years ago, I was asked to sign one of these documents for some public domain code I wrote that I never intended to become part of an FSF project. Someone wanted to turn it into a regular GNU project with a GPL license, configure scripts, a cute acronym and all that stuff. I said no. It's public domain, take it or leave it. Why should I sign some legally binding document for code I had, in effect, already donated to the public?

Richard Kenner writes: Because that's the only way to PUT something in the public domain!

That's absurd and beside the point.

How would you feel if some charity you donated money to came back with a piece of paper for you to sign?

A closer analogy: a charity receives an unsolicited script for a play from you.

No, that's not a closer analogy. As I said, I never intended for my code to become part of an FSF project. I didn't send them anything unsolicited.

I'm contributing to this thread solely to answer the question asked. Either take the time to read what I've written and use it to try to understand why I don't, and others might not, contribute to GCC, or please just ignore it. Your unsubstantiated and irrelevant legal opinions aren't helping.

Ross Ridge
Re: Why not contribute? (to GCC)
Manuel López-Ibáñez writes: What reasons keep you from contributing to GCC?

The big reason is the copyright assignment. I never even bothered to read it, but as I don't get anything in return there's no point. Why should I put obligations on myself, and open myself up to even unlikely liabilities, just so my patches can be merged into the official source distribution? I work on software on my own time to solve my own problems. I'm happy enough not to hoard it and to give it away for free, but it doesn't make much difference to me if anyone else actually ends up using it. I can have my own patched version of GCC that does what I want without signing anything.

Another reason is the poor patch submission process. Why e-mail a patch if I know, as a new contributor, there's a good chance it won't even be looked at by anyone? Why would I want to go through a process where I'm expected to keep begging until my patch finally gets someone's attention?

I also just don't need the abuse. GCC, while not the most hostile of open-source projects out there, is up there. Manuel López-Ibáñez's unjustified hostility towards Michael Witten in this thread is just a small example.

Finally, it's also a lot of work. Just building GCC can be a pain, with having to find up-to-date versions of a growing list of math libraries that don't benefit me in the slightest way. Running the test suite takes a long time, so even trivial patches require a non-trivial amount of work. Anything more serious can take a huge amount of time. I've abandoned projects once I realized it would be a lot quicker to find some other solution, like using assembly, rather than trying to get GCC to do what I wanted it to do.

Now, these are just the main reasons why I don't contribute to GCC. I'm not arguing that any of these issues need to be or can be fixed. If I had what I thought were good solutions that would be better overall for GCC, then I'd have suggested them long ago.

I will add that I don't think code quality is a problem with GCC.
I hate the GNU coding style as much as anyone, but it's used consistently, and that's what matters. Compared to other open and closed projects I've seen, it's as easy to understand and maintain as anything. GNU binutils is a pile of poo, but I don't know of any codebase the size of GCC that's as nice to work with.

Ross Ridge
Re: [PATCH][GIT PULL][v2.6.32] tracing/x86: Add check to detect GCC messing with mcount prologue
Andrew Haley writes: Alright. So, it is possible in theory for gcc to generate code that only uses -maccumulate-outgoing-args when it needs to realign SP. And, therefore, we could have a nice option for the kernel: one with (mostly) good code density that never generates the bizarre code sequence in the prologue.

The best option would be for the Linux people to fix the underlying problem in their kernel sources. If the code no longer requested that certain automatic variables be aligned, then not only would this bizarre code sequence not be emitted, the unnecessary stack alignment would disappear as well. The kernel would then be free to choose whatever code generation options it felt were appropriate.

Ross Ridge
Re: dg-error vs. i18n?
Ross Ridge wrote: The correct fix is for GCC not to intentionally choose to rely on implementation-defined behaviour when using the C locale. GCC can't portably assume any other locale exists, but it can portably and easily choose to get consistent output when using the C locale.

Joseph S. Myers writes: GCC is behaving properly according to the user's locale (representing English-language diagnostics as best it can - remember that ASCII does not allow good representation of English in all cases).

This is an issue of style, but as far as I'm concerned, using these fancy quotes in English locales is unnecessary and unhelpful.

The problem here is not a bug in the compiler proper, it is an issue with how to test the compiler portably - that is, how the testsuite can portably set a locale with English language and ASCII character set in order to test the output the compiler gives in such a locale.

It's a design flaw in GCC. The C locale is the only locale that GCC can use to reliably and portably get consistent output across all ASCII systems, and so it should be the locale used to achieve consistent output. GCC can simply choose to restrict its output to ASCII. It's not in any way being forced by POSIX to output non-ASCII characters, or for that matter to treat the C locale as an English locale.

Ross Ridge
Re: dg-error vs. i18n?
Eric Blake writes: The correct workaround is indeed to specify a locale with specific charset encodings, rather than relying on plain C (hopefully cygwin will support C.ASCII, if it does not already).

The correct fix is for GCC not to intentionally choose to rely on implementation-defined behaviour when using the C locale. GCC can't portably assume any other locale exists, but it can portably and easily choose to get consistent output when using the C locale.

As far as I know, the hole is intentional. But if others would like me to, I am willing to pursue the action of raising a defect against the POSIX standard, requesting that the next version of POSIX consider including a standardized name for a locale with guaranteed single-byte encoding.

I don't see how a defect in POSIX is exposed here. Nothing in the standard forced GCC to output multi-byte characters when nl_langinfo(CODESET) returns something like "UTF-8". GCC could just as easily have chosen to output these quotes as single-byte characters when nl_langinfo(CODESET) returns something like "windows-1252", or as some other non-ASCII single-byte characters when it returns "iso-8859-1".

Ross Ridge
Re: Add support for the Win32 hook prologue (try 3)
Stefan Dösinger writes: On a partly related topic, I think the Win64 ABI requires that the first instruction of a function is two bytes long, and that there are at least 6 bytes of slack before the function. Does gcc implement that?

As far as I can tell, the Win64 ABI doesn't have either of these requirements. Microsoft's compiler certainly doesn't guarantee that functions begin with two-byte instructions, and the x64 Software Conventions document gives examples of prologues with larger initial instructions: http://msdn.microsoft.com/en-us/library/tawsa7cb(VS.80).aspx

Mind you, last I checked, GCC didn't actually follow the ABI requirements for prologues and epilogues given in the link above, but that only breaks ABI unwinding.

Ross Ridge
Re: MSVC hook function prologue
Paolo Bonzini writes: The naked attribute has been proposed and bashed to death multiple times on the GCC list too.

No, not really. It's been proposed a few times, but the discussion never gets anywhere because the i386 maintainers quickly put their foot down and end it. That hasn't stopped other ports from implementing a naked attribute, or for that matter developers like me from creating their own private implementations.

Ross Ridge
Re: MSVC hook function prologue
Paolo Bonzini writes: Are there non-Microsoft DLLs that expect to be hooked this way? If so, I think the patch is interesting for gcc independent of whether it is useful for Wine.

Stefan Dösinger writes: I haven't seen any so far. ... If this patch is essentially only for one application, maybe the idea of implementing a more generally useful naked attribute would be the way to go.

I implemented a naked attribute in my private sources to do something similar, although supporting hookable prologues was just a small part of its more general use in supporting an assembler-based API.

Ross Ridge
Re: CVS/SVN binutils and gcc on MacOS X?
Stefan Dösinger writes: Unfortunately I need support for the swap suffix in as, so using the system binaries is not an option. Is the only thing I can do to find the source of the as version, backport the swap suffix and hope for the best?

Another option might be a hack like this:

(define_insn "vswapmov"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (match_operand:SI 1 "register_operand" "r"))
   (unspec_volatile [(const_int 0)] UNSPECV_VSWAPMOV)]
  ""
{
#ifdef HAVE_AS_IX86_SWAP
  return "movl.s\t{%1, %0|%0, %1}";
#else
  if (true_regnum (operands[0]) == DI_REG
      && true_regnum (operands[1]) == DI_REG)
    return ASM_BYTE "0x8B, 0xFF";
  if (true_regnum (operands[0]) == BP_REG
      && true_regnum (operands[1]) == SP_REG)
    return ASM_BYTE "0x8B, 0xEC";
  gcc_unreachable ();
#endif
}
  [(set_attr "length" "2")
   (set_attr "length_immediate" "0")
   (set_attr "modrm" "0")])

It's not pretty, but you won't be dependent on binutils.

Ross Ridge
Re: Add crc32 function to libiberty
DJ Delorie writes: I didn't reference the web site for the polynomial, just for background. To be honest, I'm not sure what the polynomial is. As the comments explain, the algorithm I used is precisely taken from gdb, in remote.c, and is intended to produce the same result. Does anybody on the gdb side know the polynomial or any other information?

Your code uses the (one and only) CRC-32 polynomial, 0x04C11DB7, so just describing it as the CRC-32 function should be sufficient documentation. It's the same CRC function as used by PKZIP, Ethernet, and cksum. It's not compatible with the Intel CRC32 instruction, which uses the CRC-32C polynomial (0x1EDC6F41).

Ross Ridge
Re: Ideas for Google Summer of Code
Paolo Bonzini writes: Regarding the NVIDIA GPU backend, I think NVIDIA is not yet distributing details about the instruction set unlike ATI, is it? In this case, I think ATI would be a better match.

I think a GPU backend would be well beyond the scope of a Summer of Code project. GPUs don't have normal stacks, and addressing support is limited.

Another possibility is to analyze OpenCL C and try to integrate its features in GCC as much as possible. This would include 1) masking/swizzling support for GCC's generic vector language extension;

A project that started and ended here would give GCC, and in particular GCC's Cell SPU port, the only major functionality required by the OpenCL language, outside the runtime, that GCC is missing.

2) half-precision floats;

Do you mean conversion-only support, like Sandra Loosemore's proposed ARM patch, or full arithmetic support like any other scalar or vector type?

Ross Ridge
Re: Ideas for Google Summer of Code
Joe Buck writes: I'm having trouble finding that document, I don't see a link to it on that page. Maybe I'm missing something obvious?

Sticking "nvidia ptx" into Google turned up this document: http://www.nvidia.com/object/io_1195170102263.html

It's an intermediate language, so it isn't tied to any particular NVIDIA GPU. I believe there's something similar for AMD/ATI GPUs.

btw. The computational power of Intel's integrated GPUs is pretty dismal, so I don't think a GCC port targeting them would be very useful.

Ross Ridge
Re: GCC OpenCL ?
Mark Mitchell writes: That's correct. I was envisioning a proper compiler that would take OpenCL input and generate binary output, for a particular target, just as with all other GCC input languages. That target might be a GPU, or it might be a multi-core CPU, or it might be a single-core CPU.

I have a hard time seeing why this would be all that worthwhile. Since the instruction sets for AMD, NVIDIA and current Intel GPUs are trade secrets, GCC won't be able to generate binary output for them. OpenCL is designed for heterogeneous systems; compiling for multi-core or single-core CPUs would only be useful as a cheap fallback implementation. This limits a GCC-based OpenCL implementation to achieving its primary purpose with just Cell processors and maybe Intel's Larrabee. Is that what you envision? Without AMD/NVIDIA GPU support it doesn't sound all that useful to me.

Ross Ridge
Re: GCC OpenCL ?
Basile STARYNKEVITCH writes: It seems to me that some specifications seem to be available. I am not a GPU expert, but http://developer.amd.com/documentation/guides/Pages/default.aspx contains an R8xx Family Instruction Set Architecture document at http://developer.amd.com/gpu_assets/r600isa.pdf and at a very quick first glance (perhaps wrongly) I feel that it could be enough to write a code generator for it.

Oh, OK, that makes a world of difference. Even with just AMD GPU support, a GCC-based OpenCL implementation becomes a lot more practical.

Ross Ridge
Re: GCC OpenCL ?
Ross Ridge wrote: Oh, OK, that makes a world of difference. Even with just AMD GPU support, a GCC-based OpenCL implementation becomes a lot more practical.

Michael Meissner writes: And bear in mind that x86's with GPUs are not the only platform of interest

I never said anything about x86's, and I already mentioned the Cell. Regardless, I don't think a GCC-based OpenCL implementation that didn't target GPUs would be that useful.

Ross Ridge
Re: Problem with x64 SEH macro implementation for ReactOS
Timo Kreuzer writes: The problem of the missing Windows x64 unwind tables is already solved! I generate them from DWARF2 unwind info in a postprocessing step.

OK. The Windows unwind opcodes seemed so much more limited than DWARF2's that I wouldn't have thought this approach would work.

The problem that I have is simply how to mark the try / except / finally blocks in the code with reference to each other, so I can also generate the SCOPE_TABLE data in that post processing step

You can output address pairs to a special section to get the mapping you need. Something like:

asm (".section .seh.data\n\t"
     ".quad %0, %1\n\t"
     ".text"
     : : "i" (addr1), "i" (addr2));

Unfortunately, I don't think section stack directives work on PE-COFF targets, so you'd have to assume the function was using the .text section. btw. don't rely on GCC putting adjacent asm statements together like you did in your original message. Make them a single asm statement.

Note that the SCOPE_TABLE structure is part of Microsoft's internal, private SEH implementation. I don't think it's a good idea to use or copy Microsoft's implementation. Create your own handler function and give it whatever data you need.

Ross Ridge
Re: Problem with x64 SEH macro implementation for ReactOS
Kai Tietz writes: Hmm, yes and no. First the exception handler uses the .pdata and .xdata section for checking throws. But there is still the stack based exception mechanism as for 32-bit IIRC.

No. The mechanism is completely different. The whole point of the unwind tables is to remove the overhead of maintaining the linked list of records on the stack. It works just like DWARF2 exceptions in this respect.

No, this isn't as curious as you mean. In the link you sent me, it is explained. The exception handler tables (.pdata/.xdata) are optional and not necessarily required.

This is what Microsoft's documentation says: Every function that allocates stack space, calls other functions, saves nonvolatile registers, or uses exception handling must have a prolog whose address limits are described in the unwind data associated with the respective function table entry.

In this very limited case RtlUnwindEx() can indeed unwind a function without it having any unwind info associated with it. If RtlUnwindEx() can't find the unwind data for a function, then it assumes that the stack pointer points directly at the return address. To unwind through the function, it pops the top of the stack to get the next frame's RIP and RSP values. Otherwise, RtlUnwindEx() needs the unwind information. The restrictions on the format of the prologue and epilogue only exist to make handling the case where the current RIP points into the prologue or epilogue much easier. Without the unwind info, RtlUnwindEx() has no way of knowing where the prologue is.

There's a very detailed explanation of how Windows x64 exceptions work, including RtlUnwindEx(), on this blog: http://www.nynaeve.net/?p=113

But in general I agree, that the generation of .pdata/.xdata sections would be a good thing for better support of MS abis by gcc.

I'm not advocating that they should be added to GCC now. I'm just pointing out that without them, 64-bit SEH macros will be of limited use.

Ross Ridge
Re: Problem with x64 SEH macro implementation for ReactOS
Kai Tietz writes: I am very interested to learn, which parts in calling convention aren't implemented for w64?

Well, maybe I'm missing something, but I can't see any code in GCC for generating prologues, epilogues and unwind tables in the format required by the Windows x64 ABI. http://msdn.microsoft.com/en-us/library/tawsa7cb.aspx

I am a bit curious, as I found that the unwind mechanism of Windows itself working quite well on gcc compiled code, so I assumed, that the most important parts of its calling convention are implemented.

How exactly are you testing this? Without SEH support, Windows wouldn't ordinarily ever need to unwind through GCC-compiled code. I assumed that's why it was never implemented.

Ross Ridge
Re: Problem with x64 SEH macro implementation for ReactOS
Kai Tietz writes: Well, you mean the SEH tables on stack.

No, I mean the ABI-required unwind information.

Well, those aren't implemented (as they aren't for 32-bit).

64-bit SEH handling is completely different from 32-bit SEH handling. In the 64-bit Windows ABI, exceptions are handled using unwind tables, similar in concept to DWARF2 exceptions. There are no SEH tables on the stack. In the 32-bit ABI, exceptions are handled using a linked list of records on the stack, similar to SJLJ exceptions.

But the unwinding via RtlUnwind and RtlUnwindEx do their job even for gcc compiled code quite well

I don't see how it would be possible in the general case. Without the unwind tables, Windows doesn't have the required information to unwind through GCC-compiled functions.

Ross Ridge
Re: Problem with x64 SEH macro implementation for ReactOS
Timo Kreuzer wrote: I am working on x64 SEH for ReactOS. The idea is to use .cfi_escape codes to mark the code positions where the try block starts / ends and of the except landing pad. The emitted .eh_frame section is parsed after linking and converted into Windows compatible unwind info / scope tables. This works quite well so far.

Richard Henderson writes: I always imagined that if someone wanted to do SEH, they'd actually implement it within GCC, rather than post-processing it like this. Surely you're making things too hard for yourself with these escape hacks

I assume he's trying to create the equivalent of the existing macros for handling Windows structured exceptions in 32-bit code. The 32-bit macros don't require any post-processing and are fairly simple. Still, even with the post-processing, Timo Kreuzer's solution would be a heck of a lot easier to implement than adding SEH support to GCC. The big problem is that, the last time I checked, GCC wasn't generating the Windows x64 ABI-required prologues, epilogues or unwind info for functions. Windows won't be able to unwind through GCC-compiled functions whether the macros are used or not.

I think the solution to the specific problem he mentioned, connecting nested functions to their try blocks, would be to emit address pairs to a special section.

Ross Ridge
Re: How to teach gcc, that registers are clobbered by api calls?
Kai Tietz writes: I read that too, but how can I teach gcc that those registers are callee-saved? I tried it by use of the call_used part in regclass.c, but this didn't work as expected.

I think you need to modify CALL_USED_REGISTERS and/or CONDITIONAL_REGISTER_USAGE in i386.h. Making any changes to regclass.c is probably not the right thing to do.

Ross Ridge
Re: How to teach gcc, that registers are clobbered by api calls?
H.J. Lu writes: Are r10-r15 callee-saved in w64ABI?

Here's what Microsoft's documentation says:

Caller/Callee Saved Registers

The registers RAX, RCX, RDX, R8, R9, R10, R11 are considered volatile and must be considered destroyed on function calls (unless otherwise safety-provable by analysis such as whole program optimization). The registers RBX, RBP, RDI, RSI, R12, R13, R14, and R15 are considered nonvolatile and must be saved and restored by a function that uses them.

Other parts of the documentation state that XMM0-XMM5 are volatile (caller-saved), while XMM6-XMM15 are non-volatile (callee-saved).

Ross Ridge
Re: atomic accesses
Segher Boessenkool writes: ... People are relying on this undocumented GCC behaviour already, and when things break, chaos ensues.

GCC has introduced many changes over the years that have broken many programs that relied on undocumented or unspecified behaviour. You won't find much sympathy for people who assume that GCC must behave in some way when there is no requirement for it to do so.

If we change this to be documented behaviour, at least it is clear where the problem lies (namely, with the compiler), and things can be fixed easily.

I don't think you'll find any support for imposing a requirement on GCC that would always require it to use an atomic instruction when there is an alternative instruction, or sequence of instructions, that would be faster and/or shorter. I think your best bet along these lines would be adding __sync_fetch() and __sync_store() builtins, but doing so would be more difficult than a simple documentation change.

Ross Ridge
Re: [PATCH][4.3] Deprecate -ftrapv
Ross Ridge: With INTO I don't see any way to distinguish the SIGSEGV it generates on Linux from any of the myriad other ways a SIGSEGV can be generated.

Paolo Bonzini writes: sc.eip == 0xCE (if I remember x86 opcodes well :-) as I'm going by heart...)

The INTO instruction generates a trap exception, just like INT 4 would, so the return address on the stack points to the instruction after the INTO.

That's similar to how Java traps SIGFPEs and transforms them to zero-divide exceptions, IIRC.

Floating-point exceptions are fault exceptions, so the return address points to the faulting instruction.

At the risk of my curiosity getting me into more trouble, could anyone explain to me how to access these eip and trapno members from a signal handler on Linux? I can't find any relevant documentation with either man or Google.

Ross Ridge
Re: [PATCH][4.3] Deprecate -ftrapv
Mark Mitchell writes: However, I don't think doing all of that work is required to make this feature useful to people. You seem to be focusing on making -ftrapv capture 100% of overflows, so that people could depend on their programs crashing if they had an overflow. That might be useful in two circumstances: (a) getting bugs out (though for an example like the one above, I can well imagine many people not considering that a bug worth fixing), and (b) in safety-critical situations where it's better to die than do the wrong thing.

Richard Kenner writes: You forgot the third: if Ada is to use it rather than its own approach, it must indeed be 100% reliable.

Actually, that's a different issue than catching 100% of overflows, which apparently Ada doesn't require.

Robert is correct that if it's sufficiently more efficient than Ada's approach, it can be made the default, so that by default range-checking is on in Ada, but not in a 100% reliable fashion.

On the issue of performance, out of curiosity I tried playing around with the IA-32 INTO instruction. I noticed two things: the first was that the instruction isn't supported in 64-bit mode, and the second was that, on the Linux system I was using, it generated a SIGSEGV signal that was indistinguishable from any other SIGSEGV. If Ada needs to be able to catch and distinguish overflow exceptions, this and possible other cases of missing operating system support might make processor-specific overflow support detrimental.

Ross Ridge
Re: [PATCH][4.3] Deprecate -ftrapv
Robert Dewar writes: Usually there are ways of telling what is going on at a sufficiently low level, but in any case, code using the conditional jump instruction (jo/jno) is hugely better than what we do now (and it is often faster to use a jo than an into).

My point is that using INTO, or some other processor's overflow mechanism that requires operating system support, wouldn't necessarily be better for Ada, even if it performs better (or uses less space) than the alternatives. Having the program crash with a vague exception would meet the requirements of -ftrapv, but not Ada.

Ross Ridge
Re: [PATCH][4.3] Deprecate -ftrapv
Robert Dewar writes: Usually there are ways of telling what is going on at a sufficiently low level, but in any case, code using the conditional jump instruction (jo/jno) is hugely better than what we do now (and it is often faster to use a jo than an into).

Ross Ridge wrote: My point is that using INTO, or some other processor's overflow mechanism that requires operating system support, wouldn't necessarily be better for Ada, even if it performs better (or uses less space) than the alternatives. Having the program crash with a vague exception would meet the requirements of -ftrapv, but not Ada.

Robert Dewar writes: But, once again, using the processor specific JO instruction will be much better for Ada than double length arithmetic, using JO does not involve a program crash with a vague exception.

*sigh* The possibility of using GCC's -ftrapv support to implement overflow exceptions in Ada was mentioned in this thread. There's no requirement that -ftrapv do anything other than crash when overflow occurs. A -ftrapv that did everything you've said you wanted, performed faster and caught 100% of overflows 100% reliably, wouldn't necessarily be better for Ada. On the 32-bit IA-32 platform, either the JO instruction or an INTO instruction could legitimately be used to provide a more optimal implementation of -ftrapv. Even the JO instruction could do nothing more than jump to abort().

Ross Ridge
Re: [PATCH][4.3] Deprecate -ftrapv
Robert Dewar writes: Yes, and that is what we would want for Ada, so I am puzzled by your sigh. All Ada needs to do is to issue a constraint_error exception, it does not need to know where the exception came from or why except in very broad detail.

Unless printing "This application has requested the Runtime to terminate it in an unusual way." counts as issuing a constraint_error in Ada, it seems to me that -ftrapv and Ada have differing requirements. How can you portably and correctly generate a constraint_error if the code generated by -ftrapv calls the C runtime function abort()? On Unix-like systems you can catch SIGABRT, but even there, how do you tell that it didn't come from CTRL-\, a signal sent from a different process, or abort() called from some other context? With INTO I don't see any way to distinguish the SIGSEGV it generates on Linux from any of the myriad other ways a SIGSEGV can be generated.

Ross Ridge
Re: [PATCH][4.3] Deprecate -ftrapv
Ross Ridge writes: On Unix-like systems you can catch SIGABRT, but even there, how do you tell that it didn't come from CTRL-\...

Oops, I forgot that CTRL-\ has its own signal, SIGQUIT.

Ross Ridge
Re: [m32c] type precedence rules and pointer signs
DJ Delorie writes:

extern int U();
void *ra;
... foo((ra + U()) - 1) ...

1. What are the language rules controlling this expression, and do they have any say about signed vs unsigned wrt the int-pointer promotion?

There is no integer-to-pointer promotion. You're adding an integer to a pointer and then subtracting an integer from the resulting pointer value. If U() returns zero, then the pointer passed to foo() should point to the element before the one that ra points to. Well, it should if ra actually had a type that Standard C permitted using pointer arithmetic on.

Ross Ridge
RE: Memory leaks in compiler
Diego Novillo wrote: I agree. Freeing memory right before we exit is a waste of time.

Dave Korn writes: So, no gcc without an MMU and virtual memory platform ever again? Shame, it used to run on Amigas.

I don't know if GCC ever freed all of its memory before exiting. In any case, an operating system doesn't need an MMU or virtual memory in order to free all the memory used by a process when it exits. MS-DOS did this, and I assume AmigaOS did as well.

Ross Ridge
Re: __builtin_expect for indirect function calls
Mark Mitchell writes: What do people think? Do we have the leeway to change this?

If it were just cases where using __builtin_expect is pointless that would break, like function overloading and sizeof, then I don't think it would be a problem. However, it would change behaviour when C++ conversion operators are used, and I can see these being legitimately used with __builtin_expect. Something like:

struct C { operator long(); };

int foo()
{
    if (__builtin_expect(C(), 0))
        return 1;
    return 2;
}

If cases like these are rare enough, it's probably an acceptable change if they give an error because the argument types don't match.

Ross Ridge
Re: A proposal to align GCC stack
H.J. Lu writes: What value did you use for -mpreferred-stack-boundary? The x86 backend defaults to 16 bytes.

On Windows, the 16-byte default pretty much just wastes space, so I use -mpreferred-stack-boundary=2 where it might make a difference. In the case where I wanted to use SSE vector instructions, I explicitly used -mpreferred-stack-boundary=4 (16-byte alignment).

STACK_BOUNDARY is the minimum stack boundary. MAX(STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) == PREFERRED_STACK_BOUNDARY. So the question is if we should assume INCOMING == PREFERRED_STACK_BOUNDARY in all cases. Doing this would also remove the need for ABI_STACK_BOUNDARY in your proposal. Pros: 1. Keep the current behaviour of -mpreferred-stack-boundary. Cons: 1. The generated code is incompatible with the other object files.

Well, your proposal wouldn't completely solve that problem either. You can't guarantee compatibility with object files compiled with different values of -mpreferred-stack-boundary, including those compiled with the current implementation, unless you assume the incoming stack is aligned to the lowest value the flag can have and align the outgoing stack to the highest value the flag can have.

Ross Ridge
Re: A proposal to align GCC stack
Ross Ridge writes: As I mentioned later in my message, STACK_BOUNDARY shouldn't be defined in terms of hardware, but in terms of the ABI. While the i386 allows the stack pointer to be set to any value, by convention the stack pointer is kept 4-byte aligned at all times. GCC should never generate code that would violate this requirement, even in leaf functions or transitorily during the prologue/epilogue. H.J. Lu writes: From gcc internal manual I'm suggesting a different definition of STACK_BOUNDARY which wouldn't, if strictly followed, result in STACK_BOUNDARY being defined as 8 on the i386. The i386 hardware doesn't enforce a minimum alignment on the stack pointer. Since x86 always push/pop stack by decrementing/incrementing address size, it makes sense to define STACK_BOUNDARY as address size. The i386 PUSH and POP instructions adjust the stack pointer by the operand size of the instruction. The address size of the instruction has no effect. For example, GCC should never generate code like this: pushw $0 pushw %ax because the stack is temporarily misaligned. This could result in a signal, trap, interrupt or other asynchronous handler using a misaligned stack. In the context of your proposal, defining STACK_BOUNDARY this way, as a requirement imposed on GCC by an ABI (or at least by convention), not the hardware, is important. Without an ABI requirement, there's nothing that would prohibit an i386 leaf function from adjusting the stack in a way that leaves the stack 1- or 2-byte aligned. Ross Ridge
Re: A proposal to align GCC stack
Andrew Pinski writes: Can we stop talking about x86/x86_64 specific issues here? No. I have a use case on the PowerPC side of the Cell BE for variables greater than the normal stack boundary alignment of 16 bytes. They need to be 128-byte aligned for DMA transfers to the SPUs. I already proposed a patch [1] to fix this use case but I have not seen many replies yet. Complaining about someone talking about x86/x86_64 specific issues and then bringing up a PowerPC/Cell specific issue is probably not the best way to go about getting your patch approved. Ross Ridge
Re: A proposal to align GCC stack
Ross Ridge writes: This section doesn't make sense to me. The force_align_arg_pointer attribute and -mstackrealign assume that the ABI is being followed, while the -fpreferred-stack-boundary option effectively H.J. Lu writes: According to the Apple engineer who implemented -mstackrealign, on MacOS/ia32 the psABI is 16-byte, but -mstackrealign will assume 4-byte, which is STACK_BOUNDARY. Ok. The important thing is that for backwards compatibility it needs to continue to assume 4-byte alignment on entry and align the stack to a 16-byte alignment on x86 targets, so that makes more sense. changes the ABI. According to your definitions, I would think that INCOMING should be ABI_STACK_BOUNDARY in the first case, and MAX(ABI_STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) in the second. That isn't true since some .o files may not be compiled with -fpreferred-stack-boundary or with a different value of -fpreferred-stack-boundary. Like with any ABI-changing flag, that's not supported: ... Further, every function must be generated such that it keeps the stack aligned. Thus calling a function compiled with a higher preferred stack boundary from a function compiled with a lower preferred stack boundary will most likely misalign the stack. The -fpreferred-stack-boundary flag currently generates code that assumes the stack is aligned to the preferred alignment on function entry. If you assume a worse incoming alignment you'll be aligning the stack unnecessarily and generating code that this flag doesn't require. On x86-64, ABI_STACK_BOUNDARY is 16-byte, but the Linux kernel may want to use 8 bytes for PREFERRED_STACK_BOUNDARY. Ok, if people are using this flag to change the alignment to something smaller than used by the standard ABI, then INCOMING should be MAX(STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY). Ross Ridge
Re: A proposal to align GCC stack
Ye, Joey writes: i. STACK_BOUNDARY in bits, which is enforced by hardware, 32 for i386 and 64 for x86_64. It is the minimum stack boundary. It is fixed. Ross Ridge wrote: Strictly speaking, by the above definition it would be 8 for i386. The hardware doesn't force the stack to be 32-bit aligned, it just performs poorly if it isn't. Robert Dewar writes: First, although for some types, the accesses may work, the optimizer is allowed to assume that data is properly aligned, and could possibly generate incorrect code ... That's not enforced by hardware. Second, I am pretty sure there are SSE types that require alignment at the hardware level, even on the i386 This isn't a restriction on stack alignment. It's a restriction on what kinds of machine types can be accessed on the stack. As I mentioned later in my message, STACK_BOUNDARY shouldn't be defined in terms of hardware, but in terms of the ABI. While the i386 allows the stack pointer to be set to any value, by convention the stack pointer is kept 4-byte aligned at all times. GCC should never generate code that would violate this requirement, even in leaf functions or transitorily during the prologue/epilogue. This is different from the proposed ABI_STACK_BOUNDARY macro, which defines the possibly stricter alignment the ABI requires at function entry. Since most i386 ABIs don't require a stricter alignment, that has meant that SSE types couldn't be located on the stack. Currently you can get around this problem by changing the ABI using -fpreferred-stack-boundary or by forcing an SSE-compatible alignment using -mstackrealign or __attribute__ ((force_align_arg_pointer)). Joey Ye's proposal is another solution to this problem where GCC would automatically force an SSE-compatible alignment when SSE types are used on the stack. Ross Ridge
Re: A proposal to align GCC stack
Ross Ridge wrote: The -fpreferred-stack-boundary flag currently generates code that assumes the stack is aligned to the preferred alignment on function entry. If you assume a worse incoming alignment you'll be aligning the stack unnecessarily and generating code that this flag doesn't require. H.J. Lu writes: That is how we get into trouble in the first place. The only place I can think of where you can guarantee everything is compiled with the same -fpreferred-stack-boundary is the kernel. Our proposal will align the stack only when needed. PREFERRED_STACK_BOUNDARY ABI_STACK_BOUNDARY will generate a larger stack unnecessarily. I'm currently using -fpreferred-stack-boundary without any trouble. Your proposal would in fact generate code to align the stack when it's not necessary. This would change the behaviour of -fpreferred-stack-boundary, hurting performance, and that's unacceptable to me. Ok, if people are using this flag to change the alignment to something smaller than used by the standard ABI, then INCOMING should be MAX(STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY). On x86-64, ABI_STACK_BOUNDARY is 16-byte, but the Linux kernel may want to use 8 bytes for PREFERRED_STACK_BOUNDARY. INCOMING will be MIN(STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) == 8 bytes. Using MAX(STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) also equals 8 in that case and preserves the behaviour of -fpreferred-stack-boundary in every case. Ross Ridge
Re: A proposal to align GCC stack
Robert Dewar writes: Well if we have local variables of type float (and we have specified use of SSE), we are in trouble, no? Non-vector SSE instructions, like the ones that operate on scalar floats, don't require memory operands to be aligned. Ross Ridge
Re: A proposal to align GCC stack
Ross Ridge wrote: I'm currently using -fpreferred-stack-boundary without any trouble. Your proposal would in fact generate code to align the stack when it's not necessary. This would change the behaviour of -fpreferred-stack-boundary, hurting performance, and that's unacceptable to me. Ye, Joey writes: This proposal values correctness at first place. So when the compiler can't make sure a function is only called from functions with the same or bigger preferred-stack-boundary, it will conservatively align the stack. One optimization is to set INCOMING = PREFERRED for local functions. Do you think it more acceptable? Not really. It might reduce the amount of unnecessary stack adjustment, but the performance regression would remain. Changing the behaviour of -fpreferred-stack-boundary doesn't make it more correct. It's supposed to change the ABI, it works as documented and, yes, if it's misused it will cause problems. So will any number of GCC's ABI-changing options. Look at it another way. Let's say you were compiling x86_64 code with -fpreferred-stack-boundary=3, an 8-byte PREFERRED alignment. As you know, this is different from the standard x86_64 ABI, which requires a 16-byte alignment. Now with your proposal, GCC's behaviour won't change, because it's safe to assume that the incoming stack is at least 8-byte aligned. There should be no change in the code GCC generates, with or without your proposal. However, the outgoing stack won't be 16-byte aligned as the x86_64 ABI requires. In this case, what also doesn't change is the fact that mixing code compiled with different -fpreferred-stack-boundary values doesn't work. It's just as problematic and unsafe as it was before. So when you said this proposal values correctness at first place, that really isn't true. The proposal only addresses safety when the preferred alignment is raised from the standard ABI's alignment. You're conservatively aligning the incoming stack, but not the outgoing stack.
You don't seem to be concerned about the problems that can arise when the preferred alignment is raised above the ABI's. Why? My guess is that it's because correctness in this case would cause unacceptable regressions when compiling the x86_64 Linux kernel. If you can understand why it would be unacceptable to change how -fpreferred-stack-boundary behaves when compiling the Linux kernel, then maybe you can understand why I don't find it acceptable for it to change when compiling my code. Ross Ridge
Re: A proposal to align GCC stack
Ye, Joey writes: i. STACK_BOUNDARY in bits, which is enforced by hardware, 32 for i386 and 64 for x86_64. It is the minimum stack boundary. It is fixed. Strictly speaking, by the above definition it would be 8 for i386. The hardware doesn't force the stack to be 32-bit aligned, it just performs poorly if it isn't. v. INCOMING_STACK_BOUNDARY in bits, which is the stack boundary at function entry. If a function is marked with __attribute__ ((force_align_arg_pointer)) or the -mstackrealign option is provided, INCOMING = STACK_BOUNDARY. Otherwise, INCOMING == MIN(ABI_STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) because a function can be called via psABI externally or called locally with PREFERRED_STACK_BOUNDARY. This section doesn't make sense to me. The force_align_arg_pointer attribute and -mstackrealign assume that the ABI is being followed, while the -fpreferred-stack-boundary option effectively changes the ABI. According to your definitions, I would think that INCOMING should be ABI_STACK_BOUNDARY in the first case, and MAX(ABI_STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) in the second. (Or just PREFERRED_STACK_BOUNDARY, because a boundary less than the ABI's should be rejected during command line processing.) vi. REQUIRED_STACK_ALIGNMENT in bits, which is the stack alignment required by local variables and calling other functions. REQUIRED_STACK_ALIGNMENT == MAX(LOCAL_STACK_BOUNDARY, PREFERRED_STACK_BOUNDARY) in the case of a non-leaf function. For a leaf function, REQUIRED_STACK_ALIGNMENT == LOCAL_STACK_BOUNDARY. Hmm... I think you should define STACK_BOUNDARY as the minimum alignment that the ABI requires the stack pointer to keep at all times. ABI_STACK_BOUNDARY should be defined as the stack alignment the ABI requires at function entry. In that case a leaf function's REQUIRED_STACK_ALIGNMENT should be MAX(LOCAL_STACK_BOUNDARY, STACK_BOUNDARY).
Because i386 PIC requires BX as the GOT pointer and i386 may use AX, DX and CX as parameter passing registers, there are limited candidates for this proposal to choose. The current proposal suggests EDI, because it won't conflict with i386 PIC or regparm. Could you pick a call-clobbered register in cases where one is available? // Reserve two stack slots and save return address // and previous frame pointer into them. By // pointing new ebp to them, we build a pseudo // stack for unwinding Hmmm... I don't know much about the DWARF unwind information, but couldn't it handle this case without creating the pseudo frame? Or at least be extended so it could? Ross Ridge
Re: BITS_PER_UNIT less than 8
Boris Boesler writes: Ok, so what do I have to do to write a back-end where all addresses are given in bits? Memory is addressed in bits, not bytes. So I set: #define BITS_PER_UNIT 1 #define UNITS_PER_WORD 32 I don't know if it's useful to define the size of a byte to be less than 8 bits, even if that more accurately reflects the hardware. Standard C requires that the char type both be at least 8 bits (UCHAR_MAX >= 255) and the same size as a byte (sizeof(char) == 1). You can't define any types that are smaller than a char and have sizeof work correctly. So, what can I do to get this running for my architecture? If you think there's still some benefit from having GCC use a 1-bit byte, you'll probably have to fix a number of assumptions made in the code, such as the assumption that the size of a byte is at least 8 bits and is the same in the frontend and backend. Ross Ridge
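The constraints cited above can be written out as compile-time checks; this is plain standard C++, nothing target-specific:

```cpp
#include <climits>

// Standard C requires a char (and therefore a byte, since
// sizeof(char) == 1 by definition) to be at least 8 bits wide.
// Any BITS_PER_UNIT below 8 conflicts with these guarantees.
static_assert(CHAR_BIT >= 8, "a byte is at least 8 bits");
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
```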
Re: libiberty/pex-unix vfork abuse?
Dave Korn writes: Perhaps we could work around this case by setting environ in the parent before the vfork call and restoring it afterward, but we'd need some kind of serialisation there, and I don't know how to do a critical section using pthreads/posix. A simple solution would be to call fork() instead of vfork() when changing the environment. Ross Ridge
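A minimal sketch of that suggestion, assuming POSIX; the function name run_with_env and its interface are invented for illustration:

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Use fork() rather than vfork() when the child needs a modified
// environment: the child gets its own (copy-on-write) address space,
// so the parent's environ is never touched and no locking is needed.
// vfork() remains an option when the environment is inherited as-is.
int run_with_env(const char *path, char *const argv[], char *const envp[]) {
    pid_t pid = fork();
    if (pid == 0) {
        execve(path, argv, envp);  // child runs with the private envp
        _exit(127);                // exec failed
    }
    int status = -1;
    if (pid > 0)
        waitpid(pid, &status, 0);
    return status;
}
```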
Re: BITS_PER_UNIT larger than 8 -- word addressing
Miceal Eagar writes: I'm working with a target that has 32-bit word addressing, so there is a define of BITS_PER_UNIT = 32. According to the documentation, this changes the size of a byte to 32 bits, instead of the more usual 8 bits. This causes a problem: an error saying that there is no emulation for 'DI'. DImode has a precision of 128 bits, which is clearly incorrect. (All the other integer modes were incorrect as well.) DImode is defined to be 8 bytes long, so with a 32-bit byte I'd expect it to be 256 bits. Try using QImode and HImode for 32-bit and 64-bit operations respectively. Ross Ridge
Re: strict aliasing
Ian Lance Taylor wrote: Strict aliasing only refers to loads and stores using pointers. skaller writes: Ah, I see. So turning it off isn't really all that bad for optimisation. One example of where it hurts on just about any platform is something like this: void allocate(int **p, unsigned len); int *foo(unsigned len) { int *p; unsigned i; allocate(&p, len); for (i = 0; i < len; i++) p[i] = 1; return p; } Without strict aliasing being enabled, the compiler can't assume that the assignment p[i] = 1 won't change p. This results in the value of p being loaded on every loop iteration, instead of just once at the start of the loop. It also prevents GCC from vectorizing the loop. On Itanium CPUs speculative loads can be used instead of strict alias analysis to avoid this problem. Ross Ridge
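The example above, made compilable; the body of allocate() is invented (a malloc wrapper) purely so the code links:

```cpp
#include <cstdlib>

// Hypothetical definition so the example links; the original post
// only declares allocate().
void allocate(int **p, unsigned len) {
    *p = static_cast<int *>(std::malloc(len * sizeof(int)));
}

int *foo(unsigned len) {
    int *p;
    unsigned i;
    allocate(&p, len);
    // Without -fstrict-aliasing the compiler must assume this store
    // could modify p itself (an int store aliasing an int *), forcing
    // a reload of p on every iteration and blocking vectorization.
    for (i = 0; i < len; i++)
        p[i] = 1;
    return p;
}
```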
Re: gomp slowness
skaller writes: Unfortunately no, unless MSVC++ in VS2005 has openMP. I don't know if Visual C++ 2005 Express supports OpenMP, but the Professional edition should. Alternatively, the free, as in beer, Microsoft compiler included in the Windows SDK supports OpenMP. Ross Ridge
Re: Preparsing sprintf format strings
[EMAIL PROTECTED] (Ross Ridge) writes: The entire parsing of the format string is affected by the multi-byte character encoding. I don't know how GCC would be able to tell that a byte with the same value as '%' in the middle of a string would actually be interpreted as a '%' character rather than as part of an extended multibyte character. This can easily happen with the ISO 2022-JP encoding. Andreas Schwab writes: The compiler is supposed to know the encoding of the strings. The compiler can't in general know what encoding printf, fprintf, and sprintf will use to parse the string. It's locale dependent. Ross Ridge
Re: Preparsing sprintf format strings
Ross Ridge writes: The compiler can't in general know what encoding printf, fprintf, and sprintf will use to parse the string. It's locale dependent. Paolo Bonzini writes: It is undefined what happens if you run a program in a different charset than the one you specified for -fexec-charset. (locale != charset). I don't think that's true, but regardless many systems have runtime character sets that are dependent on locale. If GCC doesn't support this, then GCC is broken. A google code search for printf.*\\x1[bB][($].*%s hints that this is not a problem in practice. In practice, probably not. I doubt there are any ASCII-based systems that actually support stateful encodings like ISO 2022-JP in their C runtimes. There is at least one EBCDIC-based system that fully supports stateful encodings, but I don't know if in these encodings '%' byte values can appear outside of the initial shift state. Ross Ridge
Re: Preparsing sprintf format strings
Ross Ridge wrote: The compiler can't in general know what encoding printf, fprintf, and sprintf will use to parse the string. It's locale dependent. Bernd Schmidt writes: Does this mean it can vary from one run of the program to another? Yes, that's the whole point of having locales. So a single program can work with more than one language. In fact locales can change during the execution of a program. I'll admit I don't understand locales very well, but doesn't this sound like a recipe for security holes? A program has to explicitly call setlocale() to change the locale to anything other than the default C locale. Ross Ridge
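A small check of that last point; per the C standard, a program starts in the "C" locale until setlocale() is explicitly called:

```cpp
#include <clocale>
#include <cstring>

// Query (without changing) the current locale; at program startup
// this is the default "C" locale required by the standard.
bool in_default_c_locale() {
    const char *name = std::setlocale(LC_ALL, nullptr);  // nullptr = query only
    return name != nullptr && std::strcmp(name, "C") == 0;
}
```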
Re: Preparsing sprintf format strings
Ross Ridge writes: The entire parsing of the format string is affected by the multi-byte character encoding. I don't know how GCC would be able to tell that a byte with the same value as '%' in the middle of a string would actually be interpreted as a '%' character rather than as part of an extended multibyte character. This can easily happen with the ISO 2022-JP encoding. Michael Meissner writes: Yes, and the ISO standard for C says that the compiler must be told what locale to use when parsing string constants anyway, since the compiler must behave as if it did a mbtowc on the source file. The compiler needs to know the source character set both to parse the string literal and to translate it into the execution character set. It doesn't need to know, nor can it generally know, the locale-dependent character set that the standard library will use when parsing printf format strings. Ross Ridge
Re: Preparsing sprintf format strings
[EMAIL PROTECTED] (Ross Ridge) writes: I don't think that's true, but regardless many systems have runtime character sets that are dependent on locale. If GCC doesn't support this, then GCC is broken. Geoffrey Keating writes: I don't think it's unreasonable to insist that you tell the compiler a character set that matches the one you are using at execution time for string literals. It's completely unreasonable. I should be able to put whatever byte values I want into string literals, using octal and hexadecimal escapes if necessary, regardless of what the locale might be at runtime or what GCC thinks the execution character set is. It would be absurd for code like fprintf(f, "\xFF\xFF"); to be undefined only because GCC thinks the execution character set is UTF-8 or ASCII. Ross Ridge
Re: Preparsing sprintf format strings
Heikki Linnakangas writes: The only features in the printf family of functions that depend on the locale are conversions with thousands grouping (%'d), and the glibc extension of using the locale's alternative output digits (%Id). The entire parsing of the format string is affected by the multi-byte character encoding. I don't know how GCC would be able to tell that a byte with the same value as '%' in the middle of a string would actually be interpreted as a '%' character rather than as part of an extended multibyte character. This can easily happen with the ISO 2022-JP encoding. Ross Ridge
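To make the '%' collision concrete, here is a sketch using what I believe is a valid ISO 2022-JP sequence whose first JIS X 0208 byte happens to equal '%' (0x25):

```cpp
#include <cstring>

// "\x1b$B" designates JIS X 0208, the byte pair 0x25 0x41 encodes a
// katakana character, and "\x1b(B" switches back to ASCII.  A naive
// byte-level scan "finds" a '%' even though the text contains no
// percent sign.
bool naive_scan_sees_percent() {
    const char text[] = "\x1b$B%A\x1b(B";
    return std::strchr(text, '%') != nullptr;
}
```

A format-string checker that treated the string as single-byte would misdiagnose this as a conversion specification.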
Re: recent troubles with float vectors bitwise ops
Mark Mitchell writes: Let's assume that the recent change is what we want, i.e., that the answer to (1) is No, these operations should not be part of the vector extensions because they are not valid scalar extensions. I don't think we should assume that. If we were to, we'd also have to change vector casts to work like scalar casts and actually convert the values. (Or, like valarray, disallow them completely.) That would force a solution like Paolo Bonzini's to use unions instead of casts, making it even more cumbersome. If you look at what these bitwise operations are doing, they're taking a floating-point vector and applying an operation (e.g. negation) to certain members of the vector according to a (normally) constant mask. They're really unary floating-point vector operations. I don't think it's unreasonable to want to express these operations using floating-point vector types directly. Using vector casts that behave differently than scalar casts has a lot more potential to generate confusion than allowing bitwise operations on vector floats does. As I see it, there are two ways you can express these kinds of operations without using casts that are both cumbersome and misleading. The easy way would be to just revert the change and allow bitwise operations on vector floats. This is essentially an old-school programmer-knows-best solution where the compiler provides operators that represent the sort of operations generally supported by CPUs. Even on Altivec these bitwise operations on vector floats are meaningful and useful. The other way is to provide a complete set of operations that would make using the bitwise operators pretty much unnecessary, like it is with scalar floats. For example, you can express masked negation by multiplying with a constant vector of -1.0 and 1.0 elements. It shouldn't be too hard for GCC to optimize this into an appropriate bitwise instruction for the target. For other operations the solution isn't as nice.
You could implement a set of builtin functions easily enough, but it wouldn't be much better than using target-specific intrinsics. Chances are, though, that operations are going to be missed. For example, I doubt anyone unfamiliar with 3D programming would've seen the need for negating only part of a vector. (A more concise way to eliminate the need for the bitwise operations on vector floats would be to implement either the swizzles used in 3D shaders or array indexing on vectors. It would require a lot of work to implement properly, so I don't see it happening.) Ross Ridge
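A sketch of the multiply-by-constant alternative mentioned above, using GCC's generic vector extension; the mask here negates only the first element:

```cpp
// GCC/Clang generic vector type: four packed floats.
typedef float v4sf __attribute__((vector_size(16)));

// Masked negation expressed as a multiply; a compiler could lower
// this to a single bitwise instruction (e.g. XORPS with a sign-bit
// mask) on targets that have one.
v4sf negate_first(v4sf v) {
    const v4sf mask = { -1.0f, 1.0f, 1.0f, 1.0f };
    return v * mask;
}
```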
Re: [RFC] try to generate FP and/or/xor instructions for SSE
Richard Guenther writes: As I said - at least for AMD CPUs - it looks like you can freely interchange the ps|pd or integer variants of the bitwise and/or operations without a penalty. An example in AMD's Software Optimization Guide for AMD64 Processors suggests that you can't freely interchange them. In the example it gives for using XOR to negate a double-precision vector, it uses XORPD. If PXOR, XORPS and XORPD were all interchangeable, it should have used XORPS, since it's a byte shorter than XORPD. The guide also says: When it is necessary to zero out an XMM register, use an instruction whose format matches the format required by the consumers of the zeroed register. ... When an XMM register must be set to zero, using the appropriate instruction helps reduce the chance of any performance penalty later. This advice differs from Intel's, which on Pentium 4 processors recommends always using PXOR to clear XMM registers, as that instruction breaks dependency chains, while the XORPS and XORPD instructions don't. Only the newer Intel Core processors support breaking chains with all three instructions. Ross Ridge
Re: recent troubles with float vectors bitwise ops
tbp writes: Apparently enough for a small vendor like Intel to propose such things as orps, andps, andnps, and xorps. Paolo Bonzini writes: I think you're running too far with your sarcasm. SSE's instructions do not go so far as to specify integer vs. floating point. To me, ps means 32-bit SIMD, independent of integerness The IA-32 instruction set does distinguish between integer and floating-point bitwise operations. In addition to the single-precision floating-point bitwise instructions that tbp mentioned (ORPS, ANDPS, ANDNPS and XORPS), there are both distinct double-precision floating-point bitwise instructions (ORPD, ANDPD, ANDNPD and XORPD) and integer bitwise instructions (POR, PAND, PANDN and PXOR). While these operations all do the same thing, they can differ in performance depending on the context. Intel's IA-32 Software Developer's Manual gives this warning: In this example: XORPS or PXOR can be used in place of XORPD and yield the same correct result. However, because of the type mismatch between the operand data type and the instruction data type, a latency penalty will be incurred due to implementations of the instructions at the microarchitecture level. And now i guess the only sanctioned access to those ops is via builtins/intrinsics. No, you can do so with casts. tbp is correct. Using casts gets you the integer bitwise instructions, not the single-precision bitwise instructions that are more optimal for flipping bits in single-precision vectors. If you want GCC to generate better code using single-precision bitwise instructions you're now forced to use the intrinsics. Ross Ridge
Re: recent troubles with float vectors bitwise ops
Ross Ridge writes: tbp is correct. Using casts gets you the integer bitwise instructions, not the single-precision bitwise instructions that are more optimal for flipping bits in single-precision vectors. If you want GCC to generate better code using single-precision bitwise instructions you're now forced to use the intrinsics. GCC makes the problem even worse if only SSE and not SSE2 instructions are enabled. Since the integer bitwise instructions are only available with SSE2, using casts instead of intrinsics causes GCC to expand the operation into a long series of instructions. If I were tbp, I'd just code all his vector operations using intrinsics. The other responses in this thread have made it clear that GCC's vector arithmetic operations are really only designed to be used with the Cell Broadband Engine and other PowerPC processors. Ross Ridge
Re: recent troubles with float vectors bitwise ops
Ross Ridge [EMAIL PROTECTED] wrote: GCC makes the problem even worse if only SSE and not SSE2 instructions are enabled. Since the integer bitwise instructions are only available with SSE2, using casts instead of intrinsics causes GCC to expand the operation into a long series of instructions. Andrew Pinski writes: ... Why did Intel split up these instructions in the first place, is it because they wanted to have separate vector units in some cases? I don't know and I don't care that much. Well, if you would rather remain ignorant, I suppose there's little point in discussing this with you. However, please don't try to pretend that the vector extensions are supposed to be generic when you use justifications like it's how Altivec works, and it's compatible with a proprietary standard called C/C++ Language Extensions for Cell Broadband Engine Architecture. If you're going to continue to use justifications like this and ignore the performance implications of your changes on IA-32, then you should accept the fact that the vector extensions are not meant for platforms that you don't know and don't care that much about. Ross Ridge
Re: I'm sorry, but this is unacceptable (union members and ctors)
Lawrence Crowl writes: On the specific topic of unions, there is a proposal before the committee to extend unions in this direction. Let me caution you that this proposal has not been reviewed by a significant fraction of the committee, and hence has a small chance of being accepted and an even smaller chance of surviving unchanged. This only supports my position. If an active committee member can't get their proposal reviewed by a significant fraction of the committee, then why should an outsider even bother? You're better off posting a patch to gcc-patches; at least that will have a chance of being seriously considered. Ross Ridge
Re: I'm sorry, but this is unacceptable (union members and ctors)
Ross Ridge wrote: I completely disagree. Standards should primarily standardize existing practice, not invent new features. New features should be created by people who actually want and will use the features, not by some disinterested committee. Robert Dewar writes: First of all, I think you mean uninterested and not disinterested, indeed the ideal is that all committee members *should* be disinterested, though this is not always the case. Since it's essentially impossible to be impartial about a feature you created, both senses of the word apply here. The history for C here does not apply to C++ in my opinion. Adding new features to a language like C++ is at this stage highly non-trivial in terms of getting a correct formal definition. Most of GCC's long list of extensions to C are also implemented as extensions to C++, so you've already lost this battle in GNU C++. Trying to add a new feature without an existing implementation only makes it harder to get both a correct formal definition and something that people will actually want to use. Ross Ridge
Re: RFC: Make dllimport/dllexport imply default visibility
Daniel Jacobowitz writes: The minimum I'd want to accept this code would be a complete and useful example in the manual; since Mark and Danny say this happens a lot on Windows. I don't understand how this issue can come up at all on Windows. As far as I know, visibility is an ELF-only thing, while DLLs are a PE-COFF-only thing. Is there some platform that supports both sets of attributes? Ross Ridge
Re: I'm sorry, but this is unacceptable (union members and ctors)
Robert Dewar writes: The only time that it is reasonable to extend is when there are clear signals from the standards committee that it is likely that a feature will be added, in which case there may be an argument for adding the feature prematurely. I completely disagree. Standards should primarily standardize existing practice, not invent new features. New features should be created by people who actually want and will use the features, not by some disinterested committee. GCC has always been a place for experimenting with new features. Many of the new features in C99 had already been implemented in GCC. Even in the cases where C99 standardized features differently, I think both GCC and Standard C benefited from the work done in GCC. Ross Ridge
Re: MinGW, GCC & Vista
Mark Mitchell writes: In my opinion, this is a GCC bug: there's no such thing as X_OK on Windows (it's not in the Microsoft headers, or documented by Microsoft as part of the API), and so GCC shouldn't be using it. Strictly speaking, access() (or _access()) isn't a documented part of any Windows ABI. It's only documented as part of the C Runtime Library for Visual C++, a different product. This is an important distinction: while MinGW should support Windows APIs as documented by Microsoft, it's not meant to be compatible with Visual C++. MinGW does use the same runtime DLL as used by Visual C++ 6.0, but this is essentially just an implementation detail, not meant as a compatibility goal. There are a few ways MinGW's runtime is incompatible with Visual C++ 6.0. One of those ways is that the MinGW headers define R_OK, W_OK and X_OK. That was probably a mistake, but in order for the MinGW runtime to be compatible with both previous implementations and Windows Vista, I think this change makes sense. Ross Ridge
Re: Integer overflow in operator new
Florian Weimer writes: Yeah, but that division is fairly expensive if it can't be performed at compile time. OTOH, if __compute_size is inlined in all places, code size does increase somewhat. Well, I believe the assumption was that __compute_size would be inlined. If you want to minimize code size and avoid the division then a library function something like the following might work:

void *__allocate_array(size_t num, size_t size, size_t max_num)
{
    if (num > max_num)
        size = ~size_t(0);
    else
        size *= num;
    return operator new[](size);
}

GCC would calculate the constant ~size_t(0) / size and pass it as the third argument. You'd be trading a multiply for a couple of constant outgoing arguments, so the code growth should be small. Unfortunately, you'd be trading what in most cases is a fast shift and maybe an add or two for a slower multiply. So long as whatever switch is used to enable this check isn't on by default and its effect on code size and speed is documented, I don't think it matters that much what those effects are. Anything that works should make the people concerned about security happy. People more concerned with size or speed aren't going to enable this feature. Ross Ridge
Re: Integer overflow in operator new
Joe Buck writes:

inline size_t __compute_size(size_t num, size_t size) {
    size_t product = num * size;
    return product >= num ? product : ~size_t(0);
}

Florian Weimer writes: I don't think this check is correct. Consider num = 0x3334 and size = 6. It seems that the check is difficult to perform efficiently unless the architecture provides unsigned multiplication with overflow detection, or an instruction to implement __builtin_clz. This should work instead:

inline size_t __compute_size(size_t num, size_t size) {
    if (num > ~size_t(0) / size)
        return ~size_t(0);
    return num * size;
}

Ross Ridge
Re: Integer overflow in operator new
Joe Buck writes: If a check were to be implemented, the right thing to do would be to throw bad_alloc (for the default new) or return 0 (for the nothrow new). What do you do if the user has defined his own operator new that does something else? There are cases where the penalty for this check could have an impact, like for pool allocators that are otherwise very cheap. If so, there could be a flag to suppress the check. Excessive code size growth could also be a problem for some programs. Ross Ridge
Re: Integer overflow in operator new
Joe Buck writes: If a check were to be implemented, the right thing to do would be to throw bad_alloc (for the default new) or return 0 (for the nothrow new). Ross Ridge writes: What do you do if the user has defined his own operator new that does something else? Gabriel Dos Reis writes: More precisely? Well, for example, any of the things that a new_handler can do, like throwing an exception derived from bad_alloc or calling exit(). In addition, any number of side effects are possible, like printing error messages or setting flags. Those programs willing to do anything to avoid imagined or perceived excessive code size growth may use the suggested switch. The code size growth would be real, and there are enough applications out there that would consider any unnecessary growth in code excessive. The switch would be required both for that reason, and for Standard conformance. Ross Ridge
Re: Integer overflow in operator new
[EMAIL PROTECTED] (Ross Ridge) writes: Well, for example, any of the things that a new_handler can do, like throwing an exception derived from bad_alloc or calling exit(). In addition, any number of side effects are possible, like printing error messages or setting flags. Gabriel Dos Reis writes: I believe you're confused about the semantics. The issue here is whether the *size of object* requested can be represented. That is independent of whether the machine has enough memory or not. So, new_handler is a red herring. The issue is what GCC should do when the calculation of the size of memory to allocate with operator new() results in unsigned wrapping. Currently, GCC's behavior is standard conforming but probably isn't the expected result. If GCC does something other than what operator new() does when there isn't enough memory available then it will be doing something that is both non-conforming and probably not what was expected. Ross Ridge
Re: Integer overflow in operator new
Joe Buck writes: Consider an implementation that, when given

Foo* array_of_foo = new Foo[n_elements];

passes __compute_size(n_elements, sizeof Foo) instead of n_elements*sizeof Foo to operator new, where __compute_size is

inline size_t __compute_size(size_t num, size_t size) {
    size_t product = num * size;
    return product >= num ? product : ~size_t(0);
}

Yes, doing something like this instead would largely answer my concerns. This counts on the fact that any operator new implementation has to fail when asked to supply every single addressable byte, less one. I don't know if you can assume ~size_t(0) is equal to the number of addressable bytes, less one. A counterexample would be 16-bit 80x86 compilers where size_t is 16 bits and an allocation of 65535 bytes can succeed, but I don't know if GCC supports any targets where something similar can happen. I haven't memorized the standard, but I don't believe that this implementation would violate it. The behavior differs only when more memory is requested than can be delivered. It differs because the actual amount of memory requested is the result of the unsigned multiplication of n_elements * sizeof Foo, using your example above. Since the result of this calculation isn't undefined, even if it overflows, there's no room for the compiler to calculate a different value to pass to operator new(). Ross Ridge
Re: RFC: Enable __declspec for Linux/x86
Joe Buck writes: If the Windows version of GCC has to recognize __declspec to function as a hosted compiler on Windows, then the work already needs to be done to implement it. Well, I'm kinda surprised that the Windows version of GCC recognizes __declspec. The implementation is just a simple macro, and could've just as easily been implemented in a runtime header, as the MinGW runtime does. So what's the harm in allowing it on other platforms? Probably none, but since the macro can be defined on the command line with -D__declspec(x)=__attribute__((x)) defining it by default on other platforms is only a minor convenience. If it makes it easier for Windows programmers to move to free compilers and OSes, isn't that something that should be supported? I suppose that would argue for unconditionally defining the macro regardless of the platform. Ross Ridge
Re: i386: Problems with references to import symbols.
Richard Henderson writes: Dunno. One could also wait to expand *__imp_foo, for functions, until expanding the function call. And then this variable would receive the address of the import library thunk. What does VC++ do? It seems to always use *__imp_foo except when initializing a statically allocated variable in C. In that case it uses _foo, unless compiling with extensions disabled (/Za) in which case it generates a similar error as we do. In C++ it uses dynamic initialization like Dave Korn suggested. I'm mostly wondering about what pointer equality guarantees we can make. It looks like MSC requires that you link with the static CRT libraries if you want strict standard conformance. Ross Ridge
Re: Building mainline and 4.2 on Debian/amd64
Joe Buck writes: This brings up a point: the build procedure doesn't work by default on Debian-like amd64 distros, because they lack 32-bit support (which is present on Red Hat/Fedora/SuSE/etc distros). Ideally this would be detected when configuring. The Debian-like AMD64 system I'm using has 32-bit support, but the build procedure breaks anyways because it assumes 32-bit libraries are in lib and 64-bit libraries are in lib64. Instead, this Debian-like AMD64 system has 32-bit libraries in lib32 and 64-bit libraries in lib. Ross Ridge
Re: symbol names are not created with stdcall syntax: MINGW, (GCC) 4.3.0 20061021
Ross Ridge wrote: Any library that needs to be able to be called from VisualBasic 6 or some other stdcall-only environment should explicitly declare its exported functions with the stdcall calling convention. Tobias Burnus writes: Thus, if I understood you correctly, you recommend that we add, e.g., pragma support to gfortran with a pragma which adds the __attribute__((stdcall)) to the tree? I have no idea what would be the best way to do it in Fortran, but yes, something that would add the stdcall attribute. Ross Ridge
Re: symbol names are not created with stdcall syntax: MINGW, (GCC) 4.3.0 20061021
Danny Smith writes: Unless you are planning to use a gfortran dll in a VisualBasic app, I can see little reason to change from the default C calling convention FX Coudert writes: That precise reason is, as far as I understand, important for some people. Fortran code is used for internal routines, built into shared libraries that are later plugged into commercial apps. Well, perhaps things are different in Fortran, but the big problem with using -mrtd in C/C++ is that it changes the default calling convention for all functions, not just those that are meant to be exported. While most of MinGW's headers declare the calling convention of functions explicitly, not all of them do. How hard do you think it would be to implement a -mrtd-naming option (or another name) to go with -mrtd and add name decorations It wouldn't be too hard, but I don't think it would be a good idea to implement. It would mislead people into thinking the option might be useful, and -mrtd fools enough people as it is. Adding name decorations won't make it more useful. From the examples I've seen, VisualBasic 6 has no problem calling DLL functions exported without @n suffixes. Any library that needs to be able to be called from VisualBasic 6 or some other stdcall-only environment should explicitly declare its exported functions with the stdcall calling convention. Ross Ridge
Re: I need some advice for x86_64-pc-mingw32 va_list calling convention (in i386.c)
Kai Tietz writes: But I still have a problem about va-argument-passing. The MS compiler reserves stack space for the register arguments of all potentially va-callable methods. Passing arguments to functions with variable arguments isn't a special case here. According to Microsoft's documentation, you always need to allocate space for 4 arguments. The only thing different you need to do with functions taking variable arguments (and unprototyped functions) is to pass floating point values in both the integer and floating point registers for that argument. Ross Ridge
Re: bootstrap failure on HEAD
Dave Korn writes: Is it just me, or does anyone else get this? I objdump'd and diff'd the stage2 and stage3 versions of cfg.o and it seems to have developed a habit of inserting 'shrd'/'shld' opcodes: It looks to me like the stage3 version with the shrd/shld is correct and it's the stage2 version that's missing opcodes. In both versions the source and destination of the shift are a 64-bit pair of registers, but the stage2 version uses 32-bit shifts, while the stage3 version uses 64-bit shifts. The code in the first chunk looks like it's the result of the expansion of the RDIV macro with the dividend being a gcov_type value and the divisor being 65536. It looks like gcov_type is 64 bits, so it should be using 64-bit arithmetic. although disturbingly enough there's a missing 'lea' too: It's a NOP. Probably inserted by the assembler because of an alignment directive. Ross Ridge
Re: strict aliasing question
Howard Chu wrote:

extern void getit( void **arg );

main() {
    union {
        int *foo;
        void *bar;
    } u;
    getit( &u.bar );
    printf("foo: %x\n", *u.foo);
}

Rask Ingemann Lambertsen wrote: As far as I know, memcpy() is the answer: You don't need a union or memcpy() to convert the pointer types. You can solve the void ** aliasing problem with just a cast:

void *p;
getit(&p);
printf("%d\n", *(int *)p);

This assumes that getit() actually writes to an int object and returns a void * pointer to that object. If it doesn't then you have another aliasing problem to worry about. If it writes to the object using some other known type, then you need two casts to make it safe:

void *p;
getit(&p);
printf("%d\n", (int)*(long *)p);

If it writes to the object using an unknown type then you might be able to use memcpy() to get around the aliasing problem, but this assumes you know that the two types are compatible at the bit level:

void *p;
int n;
getit(&p);
memcpy(&n, p, sizeof n);
printf("%d\n", n);

The best solution would be to fix the interface so that it returns the pointer types it actually uses. This would make it typesafe and you wouldn't need to use any casts. If you can't fix the interface itself the next best thing would be to create your own wrappers which put all the nasty casts in one place:

int sasl_getprop_str(sasl_conn_t *conn, int prop, char const **pvalue) {
    assert(prop == SASL_AUTHUSER || prop == SASL_APPNAME || ...);
    void *tmp;
    int r = sasl_getprop(conn, prop, &tmp);
    if (r == SASL_OK)
        *pvalue = (char const *) tmp;
    return r;
}

Unfortunately, there are aliasing problems in the Cyrus SASL source that can still come around and bite you once LTO arrives no matter what you do in your own code. You might want to see if you can't get them to change undefined code like this:

*(unsigned **)pvalue = &conn->oparams.maxoutbuf;

into code like this:

*pvalue = (void *) &conn->oparams.maxoutbuf;

Ross Ridge
Re: Threading the compiler
Ross Ridge wrote: Umm... those 80 processors that Intel is talking about are more like the 8 coprocessors in the Cell CPU. Michael Eager wrote: No, the Cell is asymmetrical (vintage 2000) architecture. The Cell CPU as a whole is asymmetrical, but I'm only comparing the design to the 8 identical coprocessors (of which only 7 are enabled in the CPU used in the PlayStation 3). Intel and AMD have announced that they are developing large multi-core symmetric processors. The timelines I've seen say that the number of cores on each chip will double every year or two. This doesn't change the fact that SMP systems don't scale well beyond 16 processors or so. To go beyond that you need a different design. Clustering and NUMA have been ways of solving the problem outside the chip. Intel's plan for solving it inside the chip involves giving each of the 80 cores its own 32 MB of SRAM and only connecting each core to its immediate neighbours. This is similar to the Cell SPEs. Each has 256K of local memory and they're all connected together in a ring. Moore's law hasn't stopped. While Moore's Law may still be holding on, bus and memory speeds aren't doubling every two years. You can't design an 80-core CPU like a 4-core CPU with 20 times as many cores. Having 80 processors all competing over the same bus for the same memory won't work. Neither will make -j80. You need to do more than just divide up the work between different processes or threads. You need to divide up the program and data into chunks that will fit into each core's local memory and orchestrate everything so that the data propagates smoothly between cores. The number of gates per chip doubles every 18 months. Actually, it's closer to doubling every 24 months, and Gordon Moore never said it would double every 18 months. Originally, in 1965, he said that the number of components doubled every year; in 1975, after things slowed down, he revised it to doubling every two years. Ross Ridge
Re: Threading the compiler
Mike Stump writes: We're going to have to think seriously about threading the compiler. Intel predicts 80 cores in the near future (5 years). [...] To use this many cores for a single compile, we have to find ways to split the work. The best way, of course, is to have make -j80 do that for us; this usually results in excellent efficiencies and an ability to use as many cores as there are jobs to run. Umm... those 80 processors that Intel is talking about are more like the 8 coprocessors in the Cell CPU. It's not going to give you an 80-way SMP machine that you can just make -j80 on. If that's really your target architecture you're going to have to come up with some really innovative techniques to take advantage of it in GCC. I don't think working on parallelizing GCC for 4- and 8-way SMP systems is going to give you much of a head start. Which isn't to say it wouldn't be a worthy enough project in its own right. Ross Ridge
Re: Why doesn't libgcc define _chkstk on MinGW?
Ross Ridge wrote: There are other MSC library functions that MinGW doesn't provide, so libraries may not link even with a _chkstk alias. Mark Mitchell wrote: Got a list? Probably the most common missing symbols, using their assembler names, are:

__ftol2
@[EMAIL PROTECTED]
___security_cookie

These are newer symbols in the MS CRT library and also cause problems for Visual C++ 6.0 users. I've worked around the missing security cookie symbols by providing my own stub implementation, but apparently newer versions of the Platform SDK include a library that fully implements these. I'm not sure how _ftol2 is supposed to be different from _ftol, but since I use -ffast-math anyways, I've just used the following code as a work around:

long _ftol2(double f) {
    return (long) f;
}

Looking at an old copy of MSVCRT.LIB (c. 1998) other missing symbols that might be a problem include:

T __alldiv        [I]
T __allmul        [I]
T __alloca_probe  [I][*]
T __allrem        [I]
T __allshl        [I][*]
T __allshr        [I]
T __aulldiv       [I]
T __aullrem       [I]
T __aullshr       [I]
A __except_list   [I][*]
T __matherr       [D]
T __setargv       [D]
T ___setargv      [X]
A __tls_array     [I]
B __tls_index     [I]
R __tls_used      [I]
T __wsetargv      [D]

[D] Documented external interface
[I] Implicitly referenced by the MSC compiler
[X] Undocumented external interface
[*] Missing symbols I've encountered

There are other problems related to linking that can make an MSC compiled static library incompatible, including not processing MSC initialization and termination sections, no support for thread-local variables and broken COMDAT section handling. Ross Ridge
Re: Merging identical functions in GCC
Daniel Berlin writes: Please go away and stop trolling. I'm not the one who's being rude and abusive. If your concern is function pointers or global functions, you can never eliminate any global function, unless your application doesn't call dlopen, or otherwise load anything dynamically, including through shared libs. I hope that doesn't include global functions like list<int>::sort() and list<long>::sort(), otherwise your optimization is pretty much useless. If your optimization does merge such functions then you're still left with the problem that their member function pointers might be compared in another compilation unit. Ross Ridge
Re: Merging identical functions in GCC
Ross Ridge writes: No, and I can't see how you've come up with such an absurd misinterpretation of what I said. As I said clearly and explicitly, the example I gave was one where you'd want to use function merging. Daniel Berlin writes: Whatever. Why would you turn on function merging if you are trying to specifically get the compiler to produce different code for your functions than it did before? Because, as I already said, you want to merge the functions that happen to be the same. You don't want to merge the ones that aren't the same. Sometimes using different compiler options (eg. for CPU architecture) generates different code, sometimes it doesn't. If you could always predict exactly what code the compiler was going to generate you might as well write your code in assembly. As an FYI, you already have this situation with linkonce functions. No, linkonce functions get merged because they have the same name. I think this is best done by the linker, which can much more reliably compare the contents of functions to see if they are the same. No it can't. It has no idea what a function consists of other than a bunch of bytes, in pretty much all cases. ... Stupid byte comparisons of functions generally won't save you anything truly interesting. Microsoft's implementation has proven that stupid byte comparisons can generate significant savings. Ross Ridge
Re: Merging identical functions in GCC
Gabriel Dos Reis writes: Not very long ago I spoke with the VC++ manager about this, and he said that their implementation currently is not conforming -- but they are working on it. The issue has to do with f<int> and f<long> being required to have different addresses -- which is violated by their implementation. Yes, this issue has already been mentioned in this thread and is a problem regardless of how you compare functions to find out if they are the same. The compiler also needs to be able to detect when it's safe to merge functions that are identical. Ross Ridge
Re: Merging identical functions in GCC
Ross Ridge writes: Microsoft's implementation has proven that stupid byte comparisons can generate significant savings. Daniel Berlin writes: No they haven't. So Microsoft and everyone who says they've got significant savings using it is lying? But have fun implementing it in your linker, and still making it safe if that's what you really want. I'm not going to do that, and I don't believe it is a good idea. I'm not asking you to do anything. I'm just telling you that I don't think your idea is any good. Ross Ridge
Re: Merging identical functions in GCC
Daniel Berlin writes: Do you really want me to sit here and knock down every single one of your arguments? Why would you think I would've wanted your "No, it isn't" responses instead? Your functions you are trying to optimize for multiple cpu types and compiled with different flags may be output as linkonce functions. The linker is simply going to pick one, regardless of what CPU architecture or assembly it generated... No, in the example I gave, the functions have different names. The fact is that Microsoft's implementation rarely generates significant savings over that given by linkonce functions, and when it does, it will in no way compare to anything that does *more* than stupid byte comparisons will give you. No, linkonce function discarding is always done by the Microsoft toolchain and can't be disabled. The reported savings are the result of comparing the results of enabling and disabling identical COMDAT folding. I don't see how your intelligent hashing can do significantly better except by merging functions that aren't really identical. That's nice. It's the only way to do it sanely and correctly in all cases, without having to teach the linker how to look at code, or to control the linker (which we don't on some platforms), and output a side channel explaining what it is allowed to eliminate, at which point, you might as well do it in the compiler! How does hashing the RTL and using that as the COMDAT label solve this problem? You're telling the linker you know it's safe to merge when you don't know if the function's address is compared in another compilation unit or not. You can believe what you like about the idea. Until you are willing to implement something *you* believe will help, or at the least explain how you foresee it being done safely (which Microsoft doesn't!), it's very hard to take you seriously. As I've already said, it can be made safe by communicating to the linker which functions have had their address taken.
Yes, this requires special support from the linker, but then so has linkonce on some platforms. If that special support isn't available you're still left with an unsafe but very useful optimization for applications that don't compare function pointers. Ross Ridge
Re: Merging identical functions in GCC
Ian Lance Taylor wrote: I think Danny has a 75% implementation based on hashing the RTL for a section and using that to select the COMDAT section signature. I don't think this is a good idea. With different compiler options the same RTL can generate different assembly instructions. Consider the case of compiling the same function multiple times with different names and different CPU architectures selected. You'd actually want the linker to merge the functions that ended up having the same assembly, but not the ones with the same RTL but different assembly. Also, I don't think it's safe if you merge only functions in COMDAT sections. Consider:

#include <assert.h>

template <class T> T foo(T a) { return a; }
template <class T> T bar(T a) { return a; }

int main() {
    assert((int (*)(int)) foo<int> != (int (*)(int)) bar<int>);
}

Both foo<int> and bar<int> get put in their own COMDAT sections and their RTL and assembly are the same, but it's not safe to merge them. Simply merging identical COMDAT sections would have to be optional and disabled by default, as Michael Popov said at the start of this thread. The only way I can see to do it safely would be to emit some sort of instruction not to merge a function when the compiler sees that its address is taken. Ross Ridge
Re: Merging identical functions in GCC
Ross Ridge writes: I don't think this is a good idea. With different compiler options the same RTL can generate different assembly instructions. Consider the case of compiling the same function multiple times with different names and different CPU architectures selected. You'd actually want the linker to merge the functions that ended up having the same assembly, but not the ones with the same RTL but different assembly. Daniel Berlin writes: So basically you are saying if you don't know what you are doing, or know you don't want to use it, you shouldn't be using it. No, and I can't see how you've come up with such an absurd misinterpretation of what I said. As I said clearly and explicitly, the example I gave was one where you'd want to use function merging. (The current hash actually takes into account compiler options as a starting value for the hash, btw!) Well, then that brings up the other problem I have with this: figuring out exactly which options and which parts of the RTL should be hashed seems to be too error prone. I think this is best done by the linker, which can much more reliably compare the contents of functions to see if they are the same. Ross Ridge
Re: does gcc support multiple sizes, or not?
Mark Mitchell wrote: I think you really have to accept that the change you want to make goes to a relatively fundamental invariant of C++. I don't see how you can call this a relatively fundamental invariant of C++, given how various C++ implementations have supported multiple pointer sizes for much of the history of C++. Perhaps you could argue that Standard C++ made a fundamental change to the language, but I don't think so. The original STL made specific allowances for different memory models and pointer types, and this design, with its otherwise unnecessary pointer and size_type types, was incorporated into the standard. I think the intent of the (T *)(U *)(T *)x == (T *)x invariant was only to limit the standard pointer types, not to make non-standard pointer types of different sizes fundamentally not C++. (Unlike, say, the fundamental changes the standard made to how templates work...) Ross Ridge
Re: RFC: __cxa_atexit for mingw32
Mark Mitchell writes: As a MinGW user, I would prefer not to see __cxa_atexit added to MinGW. I really want MinGW to provide the ability to link to MSVCRT: nothing more, nothing less. Well, even Microsoft's compiler doesn't just link to MSVCRT.DLL (or its successors); a certain part of the C runtime is implemented as static objects in MSVCRT.LIB. MinGW has to provide equivalent functionality in its static runtime library, or at least what GCC doesn't already provide in its runtime library. ... I think it would be better to adapt G++ to use whatever method Microsoft uses to handle static destruction. I've looked into handling Microsoft's static constructors correctly when linking MSC compiled objects with MinGW and I don't think it's an either-or situation. MinGW can handle both its own style of construction and Microsoft's at the same time. I didn't look into how Microsoft handles destructors though, because the objects I was concerned about didn't seem to use them. Ultimately, I would like to see G++ support the Microsoft C++ ABI -- unless we can convince Microsoft to support the cross-platform C++ ABI. :-) Hmm... I'm not sure which would be easier. btw. regarding Microsoft's patents, Google turned up this link: http://www.codesourcery.com/archives/cxx-abi-dev/msg00097.html That message is from 1999, so I wouldn't be surprised if Microsoft has filed a bunch of new C++ ABI patents since then. Ross Ridge
Re: why are we not using const?
Andrew Pinski wrote: Stupid example where a const argument can change:

tree a;
int f(const tree b) {
    TREE_CODE(a) = TREE_CODE (b) + 1;
    return TREE_CODE (b);
}

You're not changing the constant argument b, just what b might point to. I don't think there are any optimization opportunities for arguments declared as const, as opposed to arguments declared as pointing to const. Ross Ridge
Re: Coroutines
Ross Ridge wrote: Hmm? I don't see how the Lua-style coroutines you're looking at are any more lightweight than what Maurizio Vitale is looking for. They're actually more heavyweight because you need to implement some method of returning values to the coroutine being yielded to. Dustin Laurence wrote: I guess that depends on whether the userspace thread package in question provides for a return value as pthreads does. Maurizio Vitale clearly wasn't looking for pthreads. In any case, coroutines don't need a scheduler, even a cooperative one. He also made it clear he wanted to schedule his threads himself, just like you want to do. In fact, what he seems to be trying to implement are true symmetric coroutines. Ross Ridge
Re: Coroutines
Maurizio Vitale wrote: I'm looking at the very same problem, hoping to get very lightweight user-level threads for use in discrete event simulation. Dustin Laurence wrote: Yeah, though even that is more heavyweight than coroutines, so your job is harder than mine. Hmm? I don't see how the Lua-style coroutines you're looking at are any more lightweight than what Maurizio Vitale is looking for. They're actually more heavyweight because you need to implement some method of returning values to the coroutine being yielded to. Ross Ridge
Re: TLS on windows
Ross Ridge wrote: Actually, the last one I haven't done yet. I've just been using a linker script to do that, but it should be in a library so the TLS directory entry isn't created if the executable doesn't use TLS. Richard Henderson wrote: You can also create this in the linker, without a library. Not too difficult, since you've got to do that to set the bit in the PE header anyway. Fortunately, the linker already supports setting the TLS directory entry in the PE header if a symbol named __tls_used exists. Section relative relocations are also already supported (for DWARF, I think), I just needed to add the syntax to gas. Ross Ridge
Re: [MinGW] Set NATIVE_SYSTEM_HEADER_DIR relative to configured prefix
Ranjit Mathew wrote: Danny, I'm using the same configure flags that you have used for GCC 3.4.5 MinGW release (*except* for --prefix=/mingw, which is something like --prefix=/j/mingw/mgw for me), but the GCC I get is not relocatable at all, while I can put the MinGW GCC 3.4.5 release anywhere on the filesystem and it still works. :-( The GCC I get from my native MinGW build of the trunk is relocatable: e:\util\mygcc.new\bin\gcc -v -E -o nul -x c x.c Using built-in specs. Target: mingw32 Configured with: ../gcc/configure --prefix=/src/gcc/runtime --target=mingw32 --host=mingw32 --enable-languages=c,c++ --enable-threads=win32 --with-win32-nlsapi=unicode --enable-bootstrap --disable-werror --with-ld=/src/gcc/runtime/bin/ld --with-as=/src/gcc/runtime/bin/as Thread model: win32 gcc version 4.2.0 20060513 (experimental) e:/util/mygcc.new/bin/../libexec/gcc/mingw32/4.2.0/cc1.exe -E -quiet -v -iprefix e:\util\mygcc.new\bin\../lib/gcc/mingw32/4.2.0/ x.c -o nul.exe -mtune=i386 ignoring nonexistent directory e:/util/mygcc.new/bin/../lib/gcc/mingw32/4.2.0/../../../../mingw32/include ignoring nonexistent directory /src/gcc/runtime/include ignoring nonexistent directory /src/gcc/runtime/include ignoring nonexistent directory /src/gcc/runtime/lib/gcc/mingw32/4.2.0/include ignoring nonexistent directory /src/gcc/runtime/mingw32/include ignoring nonexistent directory /mingw/include #include ... search starts here: #include ... search starts here: e:/util/mygcc.new/bin/../lib/gcc/mingw32/4.2.0/../../../../include e:/util/mygcc.new/bin/../lib/gcc/mingw32/4.2.0/include End of search list. It picks up the system include directory without a problem. What exactly is the error you're getting that indicates that your compiled version of GCC isn't relocatable? Ross Ridge
Re: [MinGW] Set NATIVE_SYSTEM_HEADER_DIR relative to configured prefix
Ross Ridge wrote: The GCC I get from my native MinGW build of the trunk is relocatable: Hmm... I should have sent that to gcc-patches, sorry. Ross Ridge
Re: TLS on windows
FX Coudert wrote: Now, for an idea of how much work it represents... perhaps someone here can tell us? It's not too hard, but it requires changing GCC and binutils, plus a bit of library support. In my implementation (more or less finished, but I haven't had time to test it yet), I did the following:
- Used the existing __thread support in the front end. Silently ignore the ELF TLS models, because Windows only has one model.
- Added target-specific (cygming) support for __attribute__((thread)), aka __declspec(thread), for MSC compatibility.
- Created a legitimize_win32_tls_address() to replace legitimize_tls_address() in i386.c. It outputs RTL like:
  (set (reg:SI tp) (mem:SI (unspec [(const_int 44)] WIN32_TIB)))
  (set (reg:SI index) (mem:SI (symbol_ref:SI __tls_index__)))
  (set (reg:SI base) (mem:SI (plus:SI (reg:SI tp)
                                      (mult:SI (reg:SI index) (const_int 4)))))
  (plus:SI (reg:SI base) (const:SI (unspec:SI [(symbol_ref:SI foo)] SECREL)))
- Handled the WIN32_TIB unspec by outputting %fs:44 and the SECREL unspec by outputting foo`SECREL. I couldn't use foo@SECREL because @ is valid in identifiers with PECOFF.
- Supported .tls sections in PECOFF by creating an i386_pe_select_section() based on the generic ELF version.
- Added an -mfiber-safe-tls target-specific option that makes the references to the Win32 TIB non-constant.
- Modified gas to handle foo`SECREL, based on the ELF support for @ relocations.
- Fixed some problems with TLS handling in the PECOFF linker script.
- Created an object file that defines the __tls_used structure (and thus the TLS directory entry) and __tls_index__.
Actually, the last one I haven't done yet. I've just been using a linker script to do that, but it should be in a library so the TLS directory entry isn't created if the executable doesn't use TLS. Ross Ridge
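The RTL sequence above can be read as ordinary pointer arithmetic. Here is an illustrative simulation (mine, not GCC or CRT code) of what the generated instructions compute at run time: the thread pointer fetched from %fs:44 locates the per-module TLS array, __tls_index__ selects this module's block, and the SECREL offset picks the variable within that block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* foo's address = tls_array[tls_index] + secrel_offset(foo).
   `tls_array` stands in for the array that fs:44 points at on a real
   Win32 thread; the other two arguments mirror the RTL operands. */
char *win32_tls_address(char **tls_array, uint32_t tls_index,
                        size_t secrel_offset)
{
    char *base = tls_array[tls_index]; /* (mem (plus tp (mult index 4))) */
    return base + secrel_offset;       /* (plus base (unspec foo SECREL)) */
}
```

The -mfiber-safe-tls option mentioned above exists because caching the fs:44 load as a constant is unsafe if the scheduler can move it across a fiber switch.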
Re: mingw32 subtle build failure
FX Coudert wrote: -B/mingw/i386-pc-mingw32/bin/ This looks wrong, it should be /mingw/mingw32/bin. Putting a copy of as and ld in /mingw/i386-pc-mingw32/bin might work around your problem. Ross Ridge
Re: Segment registers support for i386
Remy Saissy wrote: I've looked for a target specific callback to modify but I've found nothing, even in the gcc internals info pages. Do you mean I would have to modify some code outside of the i386 directory ? Or maybe to add such a callback if it doesn't exist ;) You'd have to modify code in the main GCC directory, probably a lot of code. Since it's target-dependent, you'd need to implement it using a hook or hooks. In which file does the tree to RTL conversion code is located ? There are several files that do this job. See the internals documentation. Does it mean that an RTL expression which use reg: force gcc to use a particular pseudo register ? Pseudo registers aren't real registers. They either get changed to real hard registers or to memory references to stack slots. See the internals documentation for more details. Ross Ridge
Re: Segment registers support for i386
Remy Saissy wrote: if I understand well, to make gcc generating rtx according to an __attribute__((far(fs))) on a pointer I only have to add or modify rtx in the i386.md file and add an UNSPEC among the constants ? No, the work you need to do on the backend, adding an UNSPEC constant to i386.md and writing code to handle the UNSPEC in i386.c, is just the easy part. What I understand is that there is two kind of managment for attribute : Attributes are handled in various different ways depending on what the attribute does. To handle your case correctly, you'd have to change how the tree-to-RTL conversion generates RTL address expressions whenever a pointer with the far attribute is dereferenced. This is probably going to be a lot of work. Therefore, I can consider the following relationship: (mem:SI (plus:SI (unspec:SI [(reg:HI fs)] SEGREF) (reg:SI var))) ==> int * __attribute__((far(fs))) p; No, that's not what the RTL expression represents. Declarations aren't represented in RTL. The example RTL expression I gave is just an expression, not a full RTL instruction. It's something that could be used as the memory operand of an instruction. The RTL expression I gave would correspond to a C expression (not a statement) like this: *(int * __attribute__((far(fs)))) var does (reg:HI fs) care about the type of the parameter fs ? See the GCC Internals documentation. In my example, since I don't know what the actual hard register number you assigned to the FS segment register, I just put fs in the place where the actual register number would appear. Similarly, the var in (reg:SI var) represents the number of the pseudo-register GCC would allocate for an automatic variable named var. how does gcc recognize such an expression ? Since this expression is a memory operand, it's recognized by the GO_IF_LEGITIMATE_ADDRESS() macro. In the i386 port, that's implemented by legitimate_address_p() in i386.c. Ross Ridge
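To make the last point concrete, here is a toy sketch (entirely mine; real GCC rtx objects and target macros are far richer) of how a GO_IF_LEGITIMATE_ADDRESS-style predicate could accept the segment-override address shape discussed above, (plus (unspec [(reg fs)] SEGREF) (reg var)):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for GCC's rtx: a code plus up to two operands. */
enum rtx_code { REG, PLUS, UNSPEC_SEGREF };

struct rtx {
    enum rtx_code code;
    const struct rtx *op0, *op1;  /* NULL where unused */
};

/* Accept only the shape (plus (unspec_segref (reg)) (reg)), the way a
   target's legitimate-address predicate pattern-matches operands. */
bool segref_address_p(const struct rtx *x)
{
    return x && x->code == PLUS
        && x->op0 && x->op0->code == UNSPEC_SEGREF
        && x->op0->op0 && x->op0->op0->code == REG
        && x->op1 && x->op1->code == REG;
}
```

A real port would additionally check register classes, modes, and displacement forms; the point is only that the address is recognized structurally, not tied back to any declaration.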
Re: Compiling files not encoded with system settings
Nicolas De Rico wrote: The file hi-utf16.c, created with Notepad and saved in unicode, contains a BOM which is, in essence, a small header at the beginning of the file that indicates the encoding. It's not a header that indicates the encoding. It's a header that indicates the byte order of the 16-bit values that follow when the encoding is already known to be UTF-16. When the encoding is known to be UTF-16LE or UTF-16BE, there shouldn't be any BOM present at the start of a C file, since a BOM in the correct byte order is actually the Unicode zero-width no-break space character, which isn't valid as the first character in a C file. Similarly, there shouldn't be a BOM at the start of a UTF-8 C file, especially since UTF-8-encoded files don't have a byte order. The presence of what looks to be a UTF-16 BOM can be used as part of a heuristic to guess the encoding of a file, but I don't think it's a good idea for GCC to be guessing the encoding of files. Of course, stdio.h is stored in UTF-8 on the system so trying to convert it from UTF-16 will fail right away. It would probably be more accurate to describe stdio.h as an ASCII file. Ross Ridge
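The heuristic Ross mentions, and argues against, fits in a few lines. A hedged sketch (not cpplib code; the function name is mine) of BOM sniffing on the first bytes of a file:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Guess an encoding from a possible BOM at the start of `buf`.
   A match is only ever a guess, and as noted above a conforming
   UTF-16LE/BE or UTF-8 C file shouldn't carry a BOM at all. */
const char *guess_encoding_from_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "UTF-8 (with BOM)";
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";
    return "unknown (assume ASCII/UTF-8)";
}
```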