cameron.mcinally added inline comments.
================ Comment at: llvm/docs/LangRef.rst:1837 + are present, this overrides ``"denormal-fp-math"``. Not all targets + support separately setting the denormal mode per type. + ---------------- arsenm wrote: > andrew.w.kaylor wrote: > > cameron.mcinally wrote: > > > andrew.w.kaylor wrote: > > > > arsenm wrote: > > > > > andrew.w.kaylor wrote: > > > > > > arsenm wrote: > > > > > > > andrew.w.kaylor wrote: > > > > > > > > Can you document which targets do support the option? What > > > > > > > > happens if I try to use the option on a target where it is not > > > > > > > > supported? > > > > > > > I'm not sure where to document this, or if/how/where to diagnose > > > > > > > it. I don't think the high level LangRef description is the right > > > > > > > place to discuss specific target handling. > > > > > > > > > > > > > > Currently it won't error or anything. Code checking the denorm > > > > > > > mode will see the f32 specific mode, even if the target in the > > > > > > > end isn't really going to respect this. > > > > > > > > > > > > > > One problem is this potentially does require coordination with > > > > > > > other toolchain components. For AMDGPU, the compiler can directly > > > > > > > tell the driver what FP mode to set on each entry point, but for > > > > > > > x86 it requires linking in crtfastmath to set the default mode > > > > > > > bits. If another target had a similar runtime environment > > > > > > > requirement, I don't think we can be sure the attribute is > > > > > > > correct or not. > > > > > > There is precedent for describing target-specific behavior in > > > > > > LangRef. It just doesn't seem useful to say that not all targets > > > > > > support the attribute without saying which ones do. We should also > > > > > > say what is expected if a target doesn't support the attribute. It > > > > > > seems reasonable for the function attribute to be silently ignored. > > > > > > > > > > > > > One problem is this potentially does require coordination with > > > > > > > other toolchain components. For AMDGPU, the compiler can directly > > > > > > > tell the driver what FP mode to set on each entry point, but for > > > > > > > x86 it requires linking in crtfastmath to set the default mode > > > > > > > bits. > > > > > > > > > > > > This is a point I'm interested in. I don't like the current > > > > > > crtfastmath.o handling. It feels almost accidental when FTZ works > > > > > > as expected. My understanding is we link crtfastmath.o if we find > > > > > > it but if not everything just goes about its business. The Intel > > > > > > compiler injects code into main() to explicitly set the FTZ/DAZ > > > > > > control modes. That obviously has problems too, but it's at least > > > > > > consistent and predictable. As I understand it, crtfastmath.o sets > > > > > > these modes from a static initializer, but I'm not sure anything is > > > > > > done to determine the order of that initializer relative to others. > > > > > > > > > > > > How does the compiler identify entry points for AMDGPU? And does it > > > > > > emit code to set FTZ based on the function attribute here? > > > > > The entry points are a specific calling convention. There's no real > > > > > concept of main. Each kernel has an associated blob of metadata the > > > > > driver uses to set up various config registers on dispatch. > > > > > > > > > > I don't think specially recognizing main in the compiler is > > > > > fundamentally different than having it done in a static constructor. > > > > > It's still a construct not associated with any particular function or > > > > > anything. > > > > The problem with having it done in a static constructor is that you > > > > have no certainty of when it will be done relative to other static > > > > constructors. If it's in main you can at least say that it's after all > > > > the static constructors (assuming main is your entry point). > > > Yes and no. The linker should honor static constructor priorities. But, > > > yeah, there's no guarantee that this constructor will run before other > > > priority 101 constructors. > > > > > > The performance penalty for setting denormal flushing in main could be > > > significant (think C++). Also, there's precedent for using static > > > constructors, like GCC's crtfastmath.o. > > Fair enough. I don't necessarily like how icc handles this. I don't have a > > problem with how gcc handles it. I just really don't like how LLVM does it. > > If we want to take the static constructor approach we should define our > > own, not depend on whether or not the GNU object file happens to be around. > > > > Static initialization doesn't help for AMDGPU, and I suppose that's likely > > to be the case for any offload execution model. Since this patch is moving > > us toward a more consistent implementation I'm wondering if we can define > > some general rules for how this is supposed to work. Like when the function > > attribute will result in injected instructions setting the control flags > > and when it won't. > I think the most we can expect of this attribute as informing codegen of the > expected FP denormal handling mode, and not something responsible for > ensuring the mode will really be set. AMDGPU conceptually could have a > separate set of attributes for setting the denormal FP mode, but since it > would look identical, this gets a bonus usage for setting it for kernels. > This doesn't protect you from calling functions in modules compiled with > different attributes, so similar problems outside the view of the compiler > still exist > If we want to take the static constructor approach we should define our own, > not depend on whether or not the GNU object file happens to be around. That's a good idea. There's subtle differences between targets in the GNU implementation. It would be good to standardize them. CHANGES SINCE LAST ACTION https://reviews.llvm.org/D69878/new/ https://reviews.llvm.org/D69878 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits