[clang] [clang] Better bitfield access units (PR #65742)
vitalybuka wrote: This failure is caused by the patch: https://lab.llvm.org/buildbot/#/builders/239/builds/6363/steps/10/logs/stdio cc @hctim https://github.com/llvm/llvm-project/pull/65742
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: Be aware of bug #87227 and PR #87238, which addresses it. https://github.com/llvm/llvm-project/pull/65742
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: Committed; expect a tail-clipping cleanup PR soon. https://github.com/llvm/llvm-project/pull/65742
[clang] [clang] Better bitfield access units (PR #65742)
https://github.com/urnathan closed https://github.com/llvm/llvm-project/pull/65742
[clang] [clang] Better bitfield access units (PR #65742)
AtariDreams wrote: Ping? https://github.com/llvm/llvm-project/pull/65742
[clang] [clang] Better bitfield access units (PR #65742)
https://github.com/rjmccall approved this pull request. Okay. Thanks for putting up with such a protracted review! I really like where we've ended up. LGTM. Please wait a few days before merging to give other reviewers a chance to give final feedback. https://github.com/llvm/llvm-project/pull/65742
[clang] [clang] Better bitfield access units (PR #65742)
github-actions[bot] wrote: :white_check_mark: With the latest revision this PR passed the Python code formatter. https://github.com/llvm/llvm-project/pull/65742
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: @rjmccall thanks, here is an update with those changes. I've collapsed it to the 3 commits mentioned earlier:
1) Tests marked up for the current behaviour
2) TargetInfo strict/flexible alignment load predicate
3) The new algorithm
https://github.com/llvm/llvm-project/pull/65742
[clang] [clang] Better bitfield access units (PR #65742)
@@ -439,82 +444,247 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field, Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset), MemberInfo::Field, nullptr, *Field)); } -return; +return Field; } - // Check if OffsetInRecord (the size in bits of the current run) is better - // as a single field run. When OffsetInRecord has legal integer width, and - // its bitfield offset is naturally aligned, it is better to make the - // bitfield a separate storage component so as it can be accessed directly - // with lower cost. - auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord, - uint64_t StartBitOffset) { -if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses) - return false; -if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) || -!DataLayout.fitsInLegalInteger(OffsetInRecord)) - return false; -// Make sure StartBitOffset is naturally aligned if it is treated as an -// IType integer. -if (StartBitOffset % -Context.toBits(getAlignment(getIntNType(OffsetInRecord))) != -0) - return false; -return true; - }; + // The SysV ABI can overlap bitfield storage units with both other bitfield + // storage units /and/ other non-bitfield data members. Accessing a sequence + // of bitfields mustn't interfere with adjacent non-bitfields -- they're + // permitted to be accessed in separate threads for instance. + + // We split runs of bit-fields into a sequence of "access units". When we emit + // a load or store of a bit-field, we'll load/store the entire containing + // access unit. As mentioned, the standard requires that these loads and + // stores must not interfere with accesses to other memory locations, and it + // defines the bit-field's memory location as the current run of + // non-zero-width bit-fields. So an access unit must never overlap with + // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're + // free to draw the lines as we see fit. + + // Drawing these lines well can be complicated. LLVM generally can't modify a + // program to access memory that it didn't before, so using very narrow access + // units can prevent the compiler from using optimal access patterns. For + // example, suppose a run of bit-fields occupies four bytes in a struct. If we + // split that into four 1-byte access units, then a sequence of assignments + // that doesn't touch all four bytes may have to be emitted with multiple + // 8-bit stores instead of a single 32-bit store. On the other hand, if we use + // very wide access units, we may find ourselves emitting accesses to + // bit-fields we didn't really need to touch, just because LLVM was unable to + // clean up after us. + + // It is desirable to have access units be aligned powers of 2 no larger than + // a register. (On non-strict alignment ISAs, the alignment requirement can be + // dropped.) A three-byte access unit will be accessed using 2-byte and 1-byte + // accesses and bit manipulation. If no bitfield straddles across the two + // separate accesses, it is better to have separate 2-byte and 1-byte access + // units, as then LLVM will not generate unnecessary memory accesses, or bit + // manipulation. Similarly, on a strict-alignment architecture, it is better + // to keep access-units naturally aligned, to avoid similar bit + // manipulation synthesizing larger unaligned accesses. + + // Bitfields that share parts of a single byte are, of necessity, placed in + // the same access unit. That unit will encompass a consecutive run where + // adjacent bitfields share parts of a byte. (The first bitfield of such an + // access unit will start at the beginning of a byte.) + + // We then try to accumulate adjacent access units when the combined unit is + // naturally sized, no larger than a register, and (on a strict alignment + // ISA), naturally aligned. Note that this requires lookahead to one or more + // subsequent access units. For instance, consider a 2-byte access-unit + // followed by 2 1-byte units. We can merge that into a 4-byte access-unit, + // but we would not want to merge a 2-byte followed by a single 1-byte (and no + // available tail padding). We keep track of the best access unit seen so far, + // and use that when we determine we cannot accumulate any more. Then we start + // again at the bitfield following that best one. + + // The accumulation is also prevented when: + // *) it would cross a character-aligned zero-width bitfield, or + // *) fine-grained bitfield access option is in effect. + + CharUnits RegSize = + bitsToCharUnits(Context.getTargetInfo().getRegisterWidth()); + unsigned CharBits = Context.getCharWidth(); + + // Data about the start of the span we're accumulating to create an access + // unit from. Begin is the first bitfie
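(As an illustration of the trade-off described in the new comment above -- my own example, not taken from the patch -- consider a four-byte run of bit-fields:)
```
// Hypothetical layout, not from the patch's test suite.
struct S {
  unsigned a : 7;  // bits 0..6
  unsigned b : 9;  // bits 7..15, shares byte 0 with 'a'
  unsigned c : 16; // bits 16..31, starts on a byte boundary
};
// Byte-sharing forces 'a' and 'b' into one 2-byte access unit; 'c' starts
// another.  The merging step described above may then combine them into a
// single naturally aligned 4-byte unit, so code assigning all three fields
// can use one 32-bit store instead of two 16-bit ones (assuming a register
// at least 32 bits wide).
```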
[clang] [clang] Better bitfield access units (PR #65742)
https://github.com/rjmccall edited https://github.com/llvm/llvm-project/pull/65742
[clang] [clang] Better bitfield access units (PR #65742)
https://github.com/rjmccall commented: This looks great, thanks! I have a few editorial nits, and there's an assertion that seems off, but otherwise this looks ready to go. https://github.com/llvm/llvm-project/pull/65742
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: Sorry to push another update, but I realized the LimitOffset computation could be sunk to the point of use, and therefore not computed in all cases. https://github.com/llvm/llvm-project/pull/65742
[clang] [clang] Better bitfield access units (PR #65742)
rjmccall wrote: Yeah, we don't need to care about the actual bit offset of the zero-width bit-field as long as we honor the non-interference properties across it. I'll take a look at the patch, thanks. https://github.com/llvm/llvm-project/pull/65742
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: @rjmccall here is a rebase and update, which I think addresses all your comments. I did make some material changes though:

1) I removed the Volatile handling. I was always a little uncomfortable with it because it didn't affect the access units of a non-volatile bitfield that ends up being volatile qualified via the structure's own volatile qualification. Users of volatile fields are probably confused, but if not, they have a std-blessed mechanism for isolating access units -- a zero-length bitfield. The heuristic broke the idea that this patch /just/ implements a better access algorithm, so I was being inconsistent. AFAICT the ARM ABI, which seems to be one that describes volatile bitfields carefully, does not specify that volatile bitfields should be as isolated as possible. This causes one change, in `CodeGen/aapcs-bitfield.c`, with:
```
struct st5 {
  int a : 12;
  volatile char c : 5;
} st5;
```
The two bitfields are placed into a single 32-bit access unit, rather than separate 16-bit ones. Either behaviour is OK with the AAPCS. (The previous algorithm would have placed them into a single 24-bit unit if they abutted at a byte boundary; no change in that behaviour now.)

2) Implementing the barrier behaviour you wanted. I tried to find a psABI that had the right attributes to place barriers at arbitrary bit positions, to see if it had any comment, but failed to find one. As you say, this simplifies things, but ...

3) The barrier bitfield behaviour we already had showed something interesting, which I suspect comes from a later scissoring pass. Namely, with:
```
struct A {
  unsigned a : 16;
  unsigned b : 8;
  char : 0;
  unsigned d;
} a;
```
we'd generate an llvm struct of `{ i24, i32 }`, but then adjust the access unit for the bitfields to be a 32-bit unit itself. That seems conformant, because nothing else uses that 4th byte, and the std merely says that the bitfield starts at an allocation unit boundary. This patch was originally treating that barrier as a limit to the current span, whereas we can use the next member with storage as that limit. This is actually the behaviour we had when we reached the end of a run of bitfields, so I have made that change, as you can see from the changed handling setting Barrier.

4) I adjusted the new testcases to reduce their complexity -- we don't care about endianness, which only affects the placement of the bitfields within their access unit, not the access units themselves.

It may be that the later pass that was adjusting bitfield accesses to natural sizes can be simplified or deleted. I've not investigated.

https://github.com/llvm/llvm-project/pull/65742
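(A minimal illustration of the zero-length bitfield isolation mechanism mentioned in point 1 above -- my own sketch, not a testcase from the patch:)
```
struct B {
  unsigned a : 8;
  unsigned b : 8;
  unsigned   : 0;  // zero-length bitfield: ends the run, isolating what follows
  unsigned c : 8;
};
// 'a' and 'b' may share an access unit, but 'c' starts a new run (and, with
// 'unsigned : 0', a new allocation unit), so loads and stores of 'c' can
// never touch the storage holding 'a' and 'b'.
```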
[clang] [clang] Better bitfield access units (PR #65742)
@@ -439,82 +444,194 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field, Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset), MemberInfo::Field, nullptr, *Field)); } -return; +return Field; } - // Check if OffsetInRecord (the size in bits of the current run) is better - // as a single field run. When OffsetInRecord has legal integer width, and - // its bitfield offset is naturally aligned, it is better to make the - // bitfield a separate storage component so as it can be accessed directly - // with lower cost. - auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord, - uint64_t StartBitOffset) { -if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses) - return false; -if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) || -!DataLayout.fitsInLegalInteger(OffsetInRecord)) - return false; -// Make sure StartBitOffset is naturally aligned if it is treated as an -// IType integer. -if (StartBitOffset % -Context.toBits(getAlignment(getIntNType(OffsetInRecord))) != -0) - return false; -return true; - }; + // The SysV ABI can overlap bitfield storage units with both other bitfield + // storage units /and/ other non-bitfield data members. Accessing a sequence + // of bitfields mustn't interfere with adjacent non-bitfields -- they're + // permitted to be accessed in separate threads for instance. + + // We split runs of bit-fields into a sequence of "access units". When we emit + // a load or store of a bit-field, we'll load/store the entire containing + // access unit. As mentioned, the standard requires that these loads and + // stores must not interfere with accesses to other memory locations, and it + // defines the bit-field's memory location as the current run of + // non-zero-width bit-fields. So an access unit must never overlap with + // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're + // free to draw the lines as we see fit. + + // Drawing these lines well can be complicated. LLVM generally can't modify a + // program to access memory that it didn't before, so using very narrow access + // units can prevent the compiler from using optimal access patterns. For + // example, suppose a run of bit-fields occupies four bytes in a struct. If we + // split that into four 1-byte access units, then a sequence of assignments + // that doesn't touch all four bytes may have to be emitted with multiple + // 8-bit stores instead of a single 32-bit store. On the other hand, if we use + // very wide access units, we may find ourselves emitting accesses to + // bit-fields we didn't really need to touch, just because LLVM was unable to + // clean up after us. + + // It is desirable to have access units be aligned powers of 2 no larger than + // a register. (On non-strict alignment ISAs, the alignment requirement can be + // dropped.) A three-byte access unit will be accessed using 2-byte and 1-byte + // accesses and bit manipulation. If no bitfield straddles across the two + // separate accesses, it is better to have separate 2-byte and 1-byte access + // units, as then LLVM will not generate unnecessary memory accesses, or bit + // manipulation. Similarly, on a strict-alignment architecture, it is better + // to keep access-units naturally aligned, to avoid similar bit + // manipulation synthesizing larger unaligned accesses. + + // We do this in two phases, processing a sequential run of bitfield + // declarations. + + // a) Bitfields that share parts of a single byte are, of necessity, placed in + // the same access unit.
That unit will encompass a consecutive + // run where adjacent bitfields share parts of a byte. (The first bitfield of + // such an access unit will start at the beginning of a byte.) + + // b) Accumulate adjacent access units when the combined unit is naturally + // sized, no larger than a register, and on a strict alignment ISA, + // aligned. Note that this requires lookahead to one or more subsequent access + // units. For instance, consider a 2-byte access-unit followed by 2 1-byte + // units. We can merge that into a 4-byte access-unit, but we would not want + // to merge a 2-byte followed by a single 1-byte (and no available tail + // padding). + + // This accumulation is prevented when: + // *) it would cross a zero-width bitfield (ABI-dependent), or + // *) one of the candidate access units contains a volatile bitfield, or + // *) fine-grained bitfield access option is in effect. + + CharUnits RegSize = + bitsToCharUnits(Context.getTargetInfo().getRegisterWidth()); + unsigned CharBits = Context.getCharWidth(); + + RecordDecl::field_iterator Begin = FieldEnd; + CharUnits StartOffset; + uint64_t BitSize; + CharUnits BestEndOffset; + RecordDecl::field_iterator BestEnd = B
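(To make the phase (b) lookahead concrete, here is a stand-alone sketch of the merging idea. It is not the CGRecordLowering code -- the types and helpers below are invented for illustration, and the real algorithm additionally handles barriers, volatiles, and tail padding:)
```
#include <cstdint>
#include <vector>

// One phase-(a) access unit: byte range [Begin, End) within the record.
struct AccessUnit {
  uint64_t Begin, End;
};

// Phase (b): greedily extend each span over following units, remembering the
// best (largest) extension whose size is a power of 2 no larger than RegSize
// and, on strict-alignment targets, naturally aligned.  Restart after the
// best span found -- mirroring the BestEnd/BestEndOffset bookkeeping above.
static std::vector<AccessUnit> mergeAccessUnits(const std::vector<AccessUnit> &In,
                                                uint64_t RegSize,
                                                bool StrictAlign) {
  auto IsGoodSpan = [&](uint64_t Begin, uint64_t End) {
    uint64_t Size = End - Begin;
    return Size && !(Size & (Size - 1)) && Size <= RegSize &&
           (!StrictAlign || Begin % Size == 0);
  };

  std::vector<AccessUnit> Out;
  for (size_t I = 0; I != In.size();) {
    uint64_t Begin = In[I].Begin;
    size_t BestEnd = I + 1;             // a lone unit is always acceptable
    uint64_t BestEndOffset = In[I].End;
    for (size_t J = I + 1; J != In.size(); ++J) {
      if (IsGoodSpan(Begin, In[J].End)) { // lookahead: this extension works
        BestEnd = J + 1;
        BestEndOffset = In[J].End;
      }                                   // otherwise keep looking ahead
    }
    Out.push_back({Begin, BestEndOffset});
    I = BestEnd;                          // resume after the best span
  }
  return Out;
}
```
Per the quoted comments, the real implementation walks the bitfield declarations directly rather than pre-building a unit list, and stops its lookahead at zero-width-bitfield barriers and non-bitfield storage.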
[clang] [clang] Better bitfield access units (PR #65742)
@@ -439,82 +444,194 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field, Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset), MemberInfo::Field, nullptr, *Field)); } -return; +return Field; } - // Check if OffsetInRecord (the size in bits of the current run) is better - // as a single field run. When OffsetInRecord has legal integer width, and - // its bitfield offset is naturally aligned, it is better to make the - // bitfield a separate storage component so as it can be accessed directly - // with lower cost. - auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord, - uint64_t StartBitOffset) { -if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses) - return false; -if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) || -!DataLayout.fitsInLegalInteger(OffsetInRecord)) - return false; -// Make sure StartBitOffset is naturally aligned if it is treated as an -// IType integer. -if (StartBitOffset % -Context.toBits(getAlignment(getIntNType(OffsetInRecord))) != -0) - return false; -return true; - }; + // The SysV ABI can overlap bitfield storage units with both other bitfield + // storage units /and/ other non-bitfield data members. Accessing a sequence + // of bitfields mustn't interfere with adjacent non-bitfields -- they're + // permitted to be accessed in separate threads for instance. + + // We split runs of bit-fields into a sequence of "access units". When we emit + // a load or store of a bit-field, we'll load/store the entire containing + // access unit. As mentioned, the standard requires that these loads and + // stores must not interfere with accesses to other memory locations, and it + // defines the bit-field's memory location as the current run of + // non-zero-width bit-fields. So an access unit must never overlap with + // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're + // free to draw the lines as we see fit. + + // Drawing these lines well can be complicated. LLVM generally can't modify a + // program to access memory that it didn't before, so using very narrow access + // units can prevent the compiler from using optimal access patterns. For + // example, suppose a run of bit-fields occupies four bytes in a struct. If we + // split that into four 1-byte access units, then a sequence of assignments + // that doesn't touch all four bytes may have to be emitted with multiple + // 8-bit stores instead of a single 32-bit store. On the other hand, if we use + // very wide access units, we may find ourselves emitting accesses to + // bit-fields we didn't really need to touch, just because LLVM was unable to + // clean up after us. + + // It is desirable to have access units be aligned powers of 2 no larger than + // a register. (On non-strict alignment ISAs, the alignment requirement can be + // dropped.) A three byte access unit will be accessed using 2-byte and 1-byte + // accesses and bit manipulation. If no bitfield straddles across the two + // separate accesses, it is better to have separate 2-byte and 1-byte access + // units, as then LLVM will not generate unnecessary memory accesses, or bit + // manipulation. Similarly, on a strict-alignment architecture, it is better + // to keep access-units naturally aligned, to avoid similar bit + // manipulation synthesizing larger unaligned accesses. + + // We do this in two phases, processing a sequential run of bitfield + // declarations. + + // a) Bitfields that share parts of a single byte are, of necessity, placed in + // the same access unit. 
That unit will encompass a consecutive + // run where adjacent bitfields share parts of a byte. (The first bitfield of + // such an access unit will start at the beginning of a byte.) + + // b) Accumulate adjacent access units when the combined unit is naturally + // sized, no larger than a register, and on a strict alignment ISA, + // aligned. Note that this requires lookahead to one or more subsequent access + // units. For instance, consider a 2-byte access-unit followed by 2 1-byte + // units. We can merge that into a 4-byte access-unit, but we would not want + // to merge a 2-byte followed by a single 1-byte (and no available tail + // padding). + + // This accumulation is prevented when: + // *) it would cross a zero-width bitfield (ABI-dependent), or + // *) one of the candidate access units contains a volatile bitfield, or + // *) fine-grained bitfield access option is in effect. + + CharUnits RegSize = + bitsToCharUnits(Context.getTargetInfo().getRegisterWidth()); + unsigned CharBits = Context.getCharWidth(); + + RecordDecl::field_iterator Begin = FieldEnd; + CharUnits StartOffset; + uint64_t BitSize; + CharUnits BestEndOffset; + RecordDecl::field_iterator BestEnd = B
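To make the two phases described in the quoted comment concrete, here is a small, hypothetical example; it is not taken from the patch or its tests. It assumes an 8-bit char, a 64-bit register width, a non-strict-alignment target, and a typical Itanium-style layout with 32-bit `unsigned` allocation units, so the numbers are illustrative rather than a statement about any particular ABI.

```cpp
// Hypothetical struct; bit offsets assume 32-bit 'unsigned' allocation units.
struct S {
  unsigned a : 3;  // bits 0-2   \ phase a): a and b share byte 0, so they
  unsigned b : 5;  // bits 3-7   / must be placed in the same access unit
  unsigned c : 16; // bits 8-23    phase a): occupies bytes 1-2 by itself
  unsigned d : 8;  // bits 24-31   phase a): occupies byte 3 by itself
};
// Phase b): {a,b} (1 byte) plus {c} (2 bytes) alone would form a 3-byte unit,
// which is not a power of two; looking ahead to {d} (1 byte) yields a
// naturally aligned 4-byte unit covering bytes 0-3, no larger than a
// register, so all four bitfields can share one 32-bit access unit.
```

A volatile bitfield or an intervening zero-width bitfield among these members would stop the phase b) merge, per the list of conditions in the quoted comment.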
[clang] [clang] Better bitfield access units (PR #65742)
https://github.com/rjmccall edited https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
wzssyqa wrote:
> For MIPSr6, it is just like AArch64: some microarchitectures don't support misaligned accesses well at the hardware level, so we need an option to disable them; the kernel may need it.
>
> For GCC, we have `-mno-unaligned-access`. We also need to add this option to clang.

https://github.com/llvm/llvm-project/pull/85174 adds -m(no-)unaligned-access for MIPSr6.

https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
https://github.com/rjmccall commented: Thank you, the structure looks great! Mostly style and clarity comments from here. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: @rjmccall here's a refactoring along the lines you suggested. Once one stops worrying about rescanning when the next access unit fails to accumulate into the current one, things get simpler. The compiler should be able to turn the boolean conditional flow within the loop body into direct control flow. Would you like the API change I introduced in the previous patch (having `accumulateBitFields` return the following non-bitfield) broken out into a separate PR? https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
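For readers skimming the digest, the API change asked about here (visible in the quoted hunks as `return Field;`) is purely a shape change to the field loop. Below is a minimal standalone sketch of that shape; the names mirror clang's, but the types and bodies are stand-ins, not the real implementation.

```cpp
#include <vector>

struct Field { bool IsBitField; };
using FieldIt = std::vector<Field>::const_iterator;

// Consume a whole run of bitfields and return the first field after the run,
// so the caller never rescans what the callee already visited.
FieldIt accumulateBitFields(FieldIt F, FieldIt FieldEnd) {
  while (F != FieldEnd && F->IsBitField)
    ++F; // ...lower the run into access units here...
  return F;
}

void accumulateFields(const std::vector<Field> &Fields) {
  for (FieldIt F = Fields.begin(), E = Fields.end(); F != E;) {
    if (F->IsBitField)
      F = accumulateBitFields(F, E); // resume directly after the run
    else
      ++F; // handle a non-bitfield member
  }
}
```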
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: This update
* moves the creation and installation of the BitFieldAccessUnit objects into CGRecordLowering.
* adds AArch64 and ARM Darwin target tests, as those have different layout rules (let me know if I've missed something here).
* simplifies the checking of barriers -- using an explicit flag rather than inferring one from a zero-sized unit. A zero-sized unit can also represent a non-barrier that happens to fall at a byte boundary (see the small illustration after this list), and the change simplifies the gathering operation.
* changes the API of accumulateBitFields to return the following non-bitfield. This means the caller does not itself have to scan across the bitfields to find their range; given the bitfield processor has to scan them anyway, this seems like a win.
* because of that, the determination of the limiting boundary, only needed for non-MS ABIs, is moved to where it is needed.

FWIW, I experimented with the representation in BitFieldAccessUnit:
* Although adjacent units are contiguous, so it would seem one of Begin and End is unneeded, it turned out easiest to keep both -- omitting Begin simply makes the calling code do that bookwork.
* Whether to represent access unit boundaries as bit or char offsets: whichever is chosen leads to conversions to the other representation at various points. Using chars was the least painful, and makes it clear these are char boundaries.

It might be possible to go further and perform the accumulation you outlined above, but there's enough churn in this update that I thought it best to push it.

https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
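The barrier bullet above is easiest to see with two tiny, hypothetical layouts (again not from the patch's tests): both runs end exactly on a byte boundary, so a zero-sized unit at that boundary cannot by itself say whether a real barrier is present, whereas an explicit flag can.

```cpp
struct NoBarrier {
  unsigned a : 8; // ends on a byte boundary...
  unsigned b : 8; // ...yet a and b may still merge into one 2-byte access unit
};

struct Barrier {
  unsigned a : 8;
  unsigned : 0;   // zero-width bitfield: on ABIs where this is a barrier,
                  // accesses to a must not be widened to touch b
  unsigned b : 8;
};
```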
[clang] [clang] Better bitfield access units (PR #65742)
@@ -394,33 +412,155 @@ void CGRecordLowering::accumulateFields() { : getStorageType(*Field), *Field)); ++Field; -} else { - ++Field; } } } -void -CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field, - RecordDecl::field_iterator FieldEnd) { - // Run stores the first element of the current run of bitfields. FieldEnd is - // used as a special value to note that we don't have a current run. A - // bitfield run is a contiguous collection of bitfields that can be stored in - // the same storage block. Zero-sized bitfields and bitfields that would - // cross an alignment boundary break a run and start a new one. - RecordDecl::field_iterator Run = FieldEnd; - // Tail is the offset of the first bit off the end of the current run. It's - // used to determine if the ASTRecordLayout is treating these two bitfields as - // contiguous. StartBitOffset is offset of the beginning of the Run. - uint64_t StartBitOffset, Tail = 0; +namespace { + +// A run of bitfields assigned to the same access unit -- the size of memory +// loads & stores. +class BitFieldAccessUnit { + RecordDecl::field_iterator Begin; // Field at start of this access unit. + RecordDecl::field_iterator End; // Field just after this access unit. + + CharUnits StartOffset; // Starting offset in the containing record. + CharUnits EndOffset; // Finish offset (exclusive) in the containing record. + + bool ContainsVolatile; // This access unit contains a volatile bitfield. + +public: + // End barrier constructor. + BitFieldAccessUnit(RecordDecl::field_iterator F, CharUnits Offset, + bool Volatile = false) + : Begin(F), End(F), StartOffset(Offset), EndOffset(Offset), +ContainsVolatile(Volatile) {} + + // Collect contiguous bitfields into an access unit. + BitFieldAccessUnit(RecordDecl::field_iterator FieldBegin, + RecordDecl::field_iterator FieldEnd, + const CGRecordLowering &CGRL); + + // Compute the access unit following this one -- which might be a barrier at + // Limit. + BitFieldAccessUnit accumulateNextUnit(RecordDecl::field_iterator FieldEnd, +CharUnits Limit, +const CGRecordLowering &CGRL) const { +return end() != FieldEnd ? BitFieldAccessUnit(end(), FieldEnd, CGRL) + : BitFieldAccessUnit(FieldEnd, Limit); + } + // Re-set the end of this unit if there is space before Probe starts. + void enlargeIfSpace(const BitFieldAccessUnit &Probe, CharUnits Offset) { +if (Probe.getStartOffset() >= Offset) { + End = Probe.begin(); + EndOffset = Offset; +} + } + +public: + RecordDecl::field_iterator begin() const { return Begin; } + RecordDecl::field_iterator end() const { return End; } + +public: + // Accessors + CharUnits getSize() const { return EndOffset - StartOffset; } + CharUnits getStartOffset() const { return StartOffset; } + CharUnits getEndOffset() const { return EndOffset; } + + // Predicates + bool isBarrier() const { return getSize().isZero(); } + bool hasVolatile() const { return ContainsVolatile; } + + // Create the containing access unit and install the bitfields. + void installUnit(CGRecordLowering &CGRL) const { +if (!isBarrier()) { + // Add the storage member for the access unit to the record. The + // bitfields get the offset of their storage but come afterward and remain + // there after a stable sort. 
+ llvm::Type *Type = CGRL.getIntNType(CGRL.Context.toBits(getSize())); + CGRL.Members.push_back(CGRL.StorageInfo(getStartOffset(), Type)); + for (auto F : *this) +if (!F->isZeroLengthBitField(CGRL.Context)) + CGRL.Members.push_back(CGRecordLowering::MemberInfo( + getStartOffset(), CGRecordLowering::MemberInfo::Field, nullptr, + F)); +} + } +}; + +// Create an access unit of contiguous bitfields. +BitFieldAccessUnit::BitFieldAccessUnit(RecordDecl::field_iterator FieldBegin, + RecordDecl::field_iterator FieldEnd, + const CGRecordLowering &CGRL) +: BitFieldAccessUnit(FieldBegin, CharUnits::Zero(), + FieldBegin->getType().isVolatileQualified()) { + assert(End != FieldEnd); + + uint64_t StartBit = CGRL.getFieldBitOffset(*FieldBegin); + uint64_t BitSize = End->getBitWidthValue(CGRL.Context); + unsigned CharBits = CGRL.Context.getCharWidth(); + + assert(!(StartBit % CharBits) && "Not at start of char"); + + ++End; + if (BitSize || + !(CGRL.Context.getTargetInfo().useZeroLengthBitfieldAlignment() || +CGRL.Context.getTargetInfo().useBitFieldTypeAlignment())) +// The first field is not a (zero-width) barrier. Collect contiguous fields. +for (; End != FieldEnd; ++End) { + uin
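The installUnit comment quoted above relies on a small ordering property: the access unit's storage entry and its bitfields are pushed with the same offset, storage first, and a later stable sort by offset keeps them in that relative order. Here is a standalone illustration of that property, using std::stable_sort and made-up member data in place of the lowering's own structures.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Member {
  int Offset;        // stand-in for the CharUnits offset in the record
  std::string What;  // stand-in for MemberInfo's payload
};

int main() {
  // Storage for the access unit at byte 0 is pushed before its bitfields.
  std::vector<Member> Members = {{0, "storage (i16) for access unit"},
                                 {0, "bitfield a"},
                                 {0, "bitfield b"},
                                 {2, "plain int member"}};
  std::stable_sort(Members.begin(), Members.end(),
                   [](const Member &L, const Member &R) {
                     return L.Offset < R.Offset;
                   });
  // Equal-offset entries keep their relative order: storage, then a, then b.
  assert(Members[0].What == "storage (i16) for access unit");
}
```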
[clang] [clang] Better bitfield access units (PR #65742)
@@ -394,33 +412,155 @@ void CGRecordLowering::accumulateFields() { : getStorageType(*Field), *Field)); ++Field; -} else { - ++Field; } } } -void -CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field, - RecordDecl::field_iterator FieldEnd) { - // Run stores the first element of the current run of bitfields. FieldEnd is - // used as a special value to note that we don't have a current run. A - // bitfield run is a contiguous collection of bitfields that can be stored in - // the same storage block. Zero-sized bitfields and bitfields that would - // cross an alignment boundary break a run and start a new one. - RecordDecl::field_iterator Run = FieldEnd; - // Tail is the offset of the first bit off the end of the current run. It's - // used to determine if the ASTRecordLayout is treating these two bitfields as - // contiguous. StartBitOffset is offset of the beginning of the Run. - uint64_t StartBitOffset, Tail = 0; +namespace { + +// A run of bitfields assigned to the same access unit -- the size of memory +// loads & stores. +class BitFieldAccessUnit { + RecordDecl::field_iterator Begin; // Field at start of this access unit. + RecordDecl::field_iterator End; // Field just after this access unit. + + CharUnits StartOffset; // Starting offset in the containing record. + CharUnits EndOffset; // Finish offset (exclusive) in the containing record. + + bool ContainsVolatile; // This access unit contains a volatile bitfield. + +public: + // End barrier constructor. + BitFieldAccessUnit(RecordDecl::field_iterator F, CharUnits Offset, + bool Volatile = false) + : Begin(F), End(F), StartOffset(Offset), EndOffset(Offset), +ContainsVolatile(Volatile) {} + + // Collect contiguous bitfields into an access unit. + BitFieldAccessUnit(RecordDecl::field_iterator FieldBegin, + RecordDecl::field_iterator FieldEnd, + const CGRecordLowering &CGRL); urnathan wrote: Updated to add gatherBitFieldAccessUnit and installBitFieldAccessUnit members of CGRecordLowering. You're right that it makes the structure clearer. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
@@ -858,6 +861,10 @@ class TargetInfo : public TransferrableTargetInfo, return PointerWidth; } + /// Return true, iff unaligned accesses are a single instruction (rather than

rjmccall wrote:

```suggestion
  /// Return true iff unaligned accesses are a single instruction (rather than
```

https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
@@ -394,33 +412,155 @@ void CGRecordLowering::accumulateFields() { : getStorageType(*Field), *Field)); ++Field; -} else { - ++Field; } } } -void -CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field, - RecordDecl::field_iterator FieldEnd) { - // Run stores the first element of the current run of bitfields. FieldEnd is - // used as a special value to note that we don't have a current run. A - // bitfield run is a contiguous collection of bitfields that can be stored in - // the same storage block. Zero-sized bitfields and bitfields that would - // cross an alignment boundary break a run and start a new one. - RecordDecl::field_iterator Run = FieldEnd; - // Tail is the offset of the first bit off the end of the current run. It's - // used to determine if the ASTRecordLayout is treating these two bitfields as - // contiguous. StartBitOffset is offset of the beginning of the Run. - uint64_t StartBitOffset, Tail = 0; +namespace { + +// A run of bitfields assigned to the same access unit -- the size of memory +// loads & stores. +class BitFieldAccessUnit { + RecordDecl::field_iterator Begin; // Field at start of this access unit. + RecordDecl::field_iterator End; // Field just after this access unit. + + CharUnits StartOffset; // Starting offset in the containing record. + CharUnits EndOffset; // Finish offset (exclusive) in the containing record. + + bool ContainsVolatile; // This access unit contains a volatile bitfield. + +public: + // End barrier constructor. + BitFieldAccessUnit(RecordDecl::field_iterator F, CharUnits Offset, + bool Volatile = false) + : Begin(F), End(F), StartOffset(Offset), EndOffset(Offset), +ContainsVolatile(Volatile) {} + + // Collect contiguous bitfields into an access unit. + BitFieldAccessUnit(RecordDecl::field_iterator FieldBegin, + RecordDecl::field_iterator FieldEnd, + const CGRecordLowering &CGRL); rjmccall wrote: The structure here is pretty weird to me. This class seems to be mostly a helper class for breaking down the bit-field run into access units, but it's also a short-term representation of those access units? If having a helper class for breaking the run is useful, let's separate that and have it generate these access units. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
https://github.com/rjmccall edited https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
https://github.com/rjmccall commented: I think using the existing information as a default is fine, but please continue to call the hook something like `hasCheapUnalignedAccess()` to preserve the intent, even if it always just calls the other hook. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
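As a concrete illustration of the request (a sketch only, not the code that landed; the clang-format output further down suggests the final patch exposes `hasUnalignedAccess()` directly), such an intent-preserving hook could be as small as:

```cpp
// Sketch: keep the bit-field-motivated name but forward to the general
// unaligned-access query, so there is a single source of truth.
bool hasCheapUnalignedAccess() const { return hasUnalignedAccess(); }
```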
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote:

> Thinking further, several (AArch64, ARM & LoongArch) targets have a 'HasUnaligned' bool in their TargetInfo objects. Perhaps the way forwards is to promote that to the base TargetInfo and just use that, rather than the bitfield-specific field I added (which just duplicates that functionality on those targets)?

I've added such a change. The AArch64 & ARM targets record this information to determine whether a `__ARM_FEATURE_UNALIGNED` macro should be defined. The LoongArch target just needs to check the `-ual` target feature. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
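For readers following along, the per-target wiring might look roughly like the sketch below. The member name `HasUnalignedAccess` and the `-ual` feature spelling come from this thread; the override signature is recalled from the TargetInfo interface and the body is illustrative, not the actual LoongArch implementation.

```cpp
// Hypothetical sketch only: clear the promoted base-TargetInfo flag when the
// target feature that guarantees cheap unaligned access is disabled
// (e.g. via -mno-unaligned-access).
bool LoongArchTargetInfo::handleTargetFeatures(
    std::vector<std::string> &Features, DiagnosticsEngine &Diags) {
  for (const auto &Feature : Features)
    if (Feature == "-ual")
      HasUnalignedAccess = false; // assumed to now live in base TargetInfo
  return true;
}
```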
[clang] [clang] Better bitfield access units (PR #65742)
github-actions[bot] wrote: :warning: C/C++ code formatter, clang-format found issues in your code. :warning: You can test this locally with the following command:

```bash
git-clang-format --diff 8cfb71613c452dd45a84a74affe8464bfd33de02 b4cb8945224633fef27b69f0eed34f505032f76e -- clang/test/CodeGen/aapcs-bitfield-access-unit.c clang/test/CodeGen/bitfield-access-pad.c clang/test/CodeGen/bitfield-access-unit.c clang/test/CodeGenCXX/bitfield-access-empty.cpp clang/test/CodeGenCXX/bitfield-access-tail.cpp clang/test/CodeGenCXX/bitfield-ir.cpp clang/include/clang/Basic/TargetInfo.h clang/lib/Basic/TargetInfo.cpp clang/lib/Basic/Targets/AArch64.cpp clang/lib/Basic/Targets/AArch64.h clang/lib/Basic/Targets/ARM.cpp clang/lib/Basic/Targets/ARM.h clang/lib/Basic/Targets/LoongArch.cpp clang/lib/Basic/Targets/LoongArch.h clang/lib/Basic/Targets/Mips.h clang/lib/Basic/Targets/PPC.h clang/lib/Basic/Targets/SystemZ.h clang/lib/Basic/Targets/VE.h clang/lib/Basic/Targets/WebAssembly.h clang/lib/Basic/Targets/X86.h clang/lib/CodeGen/CGRecordLayoutBuilder.cpp clang/test/CodeGen/aapcs-bitfield.c clang/test/CodeGen/arm-bitfield-alignment.c clang/test/CodeGen/arm64-be-bitfield.c clang/test/CodeGen/bitfield-2.c clang/test/CodeGen/debug-info-bitfield-0-struct.c clang/test/CodeGen/no-bitfield-type-align.c clang/test/CodeGen/struct-x86-darwin.c clang/test/CodeGenCXX/bitfield.cpp clang/test/OpenMP/atomic_capture_codegen.cpp clang/test/OpenMP/atomic_read_codegen.c clang/test/OpenMP/atomic_update_codegen.cpp clang/test/OpenMP/atomic_write_codegen.c
```

View the diff from clang-format here.

```diff
diff --git a/clang/include/clang/Basic/TargetInfo.h b/clang/include/clang/Basic/TargetInfo.h
index aab571e1d2..c046cab93d 100644
--- a/clang/include/clang/Basic/TargetInfo.h
+++ b/clang/include/clang/Basic/TargetInfo.h
@@ -863,9 +863,7 @@ public:
 
   /// Return true, iff unaligned accesses are a single instruction (rather than
   /// a synthesized sequence).
-  bool hasUnalignedAccess() const {
-    return HasUnalignedAccess;
-  }
+  bool hasUnalignedAccess() const { return HasUnalignedAccess; }
 
   /// \brief Returns the default value of the __USER_LABEL_PREFIX__ macro,
   /// which is the prefix given to user symbols by default.
```

https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote:

> Yeah, LLVM supports changing subtarget options on a per-function basis. We would presumably make the query based on the global setting. Anyway, getting this information from LLVM doesn't seem tractable in the short-to-medium term; it's just unfortunate.

Thinking further, several (AArch64, ARM & LoongArch) targets have a 'HasUnaligned' bool in their TargetInfo objects. Perhaps the way forwards is to promote that to the base TargetInfo and just use that, rather than the bitfield-specific field I added (which just duplicates that functionality on those targets)? https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
@@ -132,6 +132,7 @@ class LLVM_LIBRARY_VISIBILITY LoongArch64TargetInfo : LoongArchTargetInfo(Triple, Opts) { LongWidth = LongAlign = PointerWidth = PointerAlign = 64; IntMaxType = Int64Type = SignedLong; +HasCheapUnalignedBitfieldAccess = true; urnathan wrote: fixed https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
@@ -132,6 +132,7 @@ class LLVM_LIBRARY_VISIBILITY LoongArch64TargetInfo : LoongArchTargetInfo(Triple, Opts) { LongWidth = LongAlign = PointerWidth = PointerAlign = 64; IntMaxType = Int64Type = SignedLong; +HasCheapUnalignedBitfieldAccess = true;

SixWeining wrote:

> you say set to true when -mno-unaligned-access, but I think you mean set to false in that case?

Yes, I think it should be `false`. @wzssyqa https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
rjmccall wrote:

> > On the target hook, it's a shame we can't easily get this information from LLVM. I believe it's already there — `TargetLowering` has an `allowsMisalignedMemoryAccesses` method that includes some approximation of how fast a particular access would be. In practice, it seems to be quite complex and often specific to the type and subtarget. Maybe it'd be worth starting a conversation with LLVM folks to see if this could reasonably be lifted somewhere we could use it?
>
> Agreed, I did look at that, and AFAICT `allowsMisalignedMemoryAccesses` is a property of the llvm subtarget, which can, IIUC, change on a per-function basis? One at least needs a function codegen context around? And, clang needs to generate the llvm-structure in non-function contexts (eg namespace-scope variables). I'd really like to not tangle this particular change with a complex reorg.

Yeah, LLVM supports changing subtarget options on a per-function basis. We would presumably make the query based on the global setting.

Anyway, getting this information from LLVM doesn't seem tractable in the short-to-medium term; it's just unfortunate. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
@@ -132,6 +132,7 @@ class LLVM_LIBRARY_VISIBILITY LoongArch64TargetInfo : LoongArchTargetInfo(Triple, Opts) { LongWidth = LongAlign = PointerWidth = PointerAlign = 64; IntMaxType = Int64Type = SignedLong; +HasCheapUnalignedBitfieldAccess = true; urnathan wrote: you say set to true when -mno-unaligned-access, but I think you mean set to false in that case? https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote:

> On the target hook, it's a shame we can't easily get this information from LLVM. I believe it's already there — `TargetLowering` has an `allowsMisalignedMemoryAccesses` method that includes some approximation of how fast a particular access would be. In practice, it seems to be quite complex and often specific to the type and subtarget. Maybe it'd be worth starting a conversation with LLVM folks to see if this could reasonably be lifted somewhere we could use it?

Agreed, I did look at that, and AFAICT `allowsMisalignedMemoryAccesses` is a property of the llvm subtarget, which can, IIUC, change on a per-function basis? One at least needs a function codegen context around? And, clang needs to generate the llvm-structure in non-function contexts (eg namespace-scope variables). I'd really like to not tangle this particular change with a complex reorg.

I had a discourse thread at https://discourse.llvm.org/t/targetinfo-for-unaligned-load-capability/72832 about such a target hook. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
@@ -132,6 +132,7 @@ class LLVM_LIBRARY_VISIBILITY LoongArch64TargetInfo : LoongArchTargetInfo(Triple, Opts) { LongWidth = LongAlign = PointerWidth = PointerAlign = 64; IntMaxType = Int64Type = SignedLong; +HasCheapUnalignedBitfieldAccess = true;

wzssyqa wrote:

> > Only set it to true when -mno-unaligned-access.
>
> In `LoongArchTargetInfo::handleTargetFeatures()`, we can handle the feature `-ual`.

That may be true, but you may still need the option if you need to build a generic kernel image for both microarchitectures. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
https://github.com/wzssyqa edited https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
https://github.com/wzssyqa edited https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
@@ -132,6 +132,7 @@ class LLVM_LIBRARY_VISIBILITY LoongArch64TargetInfo : LoongArchTargetInfo(Triple, Opts) { LongWidth = LongAlign = PointerWidth = PointerAlign = 64; IntMaxType = Int64Type = SignedLong; +HasCheapUnalignedBitfieldAccess = true; SixWeining wrote: > Only set it to true when -mno-unaligned-access. In `LoongArchTargetInfo::handleTargetFeatures()`, we can handle the feature `-ual`. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
@@ -132,6 +132,7 @@ class LLVM_LIBRARY_VISIBILITY LoongArch64TargetInfo : LoongArchTargetInfo(Triple, Opts) { LongWidth = LongAlign = PointerWidth = PointerAlign = 64; IntMaxType = Int64Type = SignedLong; +HasCheapUnalignedBitfieldAccess = true;

wzssyqa wrote: Same for MIPS, though there is no -mno-unaligned-access for MIPS yet. I will add support for it. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
@@ -132,6 +132,7 @@ class LLVM_LIBRARY_VISIBILITY LoongArch64TargetInfo : LoongArchTargetInfo(Triple, Opts) { LongWidth = LongAlign = PointerWidth = PointerAlign = 64; IntMaxType = Int64Type = SignedLong; +HasCheapUnalignedBitfieldAccess = true; wzssyqa wrote: Only set it to true when -mno-unaligned-access. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
wzssyqa wrote:

> `-mno-unaligned-access` has been added since clang17: https://reviews.llvm.org/D149946

This option is for LoongArch instead of MIPSr6. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
SixWeining wrote:

> For GCC, we have `-mno-unaligned-access`. We also need to add this option to clang.

`-mno-unaligned-access` has been added since clang17: https://reviews.llvm.org/D149946 https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
wzssyqa wrote: For MIPSr6, it is just like AArch64: some microarchitectures don't support misaligned access well at the hardware level, so we need an option to disable it; the kernel may need it. For GCC, we have `-mno-unaligned-access`. We also need to add this option to clang. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
rjmccall wrote:

> @rjmccall thanks for your comments. Here's a reimplementation using a single loop -- the downside is that we may end up repeating the initial grouping scan, if we search more than 1 Access Unit ahead and fail to find a 'good' access unit. This update changes the algorithm slightly, as I realized that `struct A { short a: 16; char b:8; int c;};` and `struct B { char b: 8; short a:16; int c;};` should have the same (32-bit) access unit. Originally only `A` got that.

Thanks. Sorry about the delay, I'll try to take a look in the next day.

> 1. I noticed that several `CGRecordLowering` member fns could either be `const` or `static` -- would you like that as a separate NFC PR?

Yes, please.

> 2. I have corrected the error about merging across zero-sized members.
> 3. It may not be obvious from GH, but this PR breaks down to 3 (or 4) separate commits:
>    a) A set of test cases, marked up for the current scheme
>    b) A new Target hook indicating whether unaligned accesses are cheap
>    c) [optional] the CGRecordLowering change just mentioned
>    d) the algorithm change, which updates those tests and makes it easier to see how the behaviour changes.
>    Do you want that commit structure maintained?

That sounds like good commit structure.

On the target hook, it's a shame we can't easily get this information from LLVM. I believe it's already there — `TargetLowering` has an `allowsMisalignedMemoryAccesses` method that includes some approximation of how fast a particular access would be. In practice, it seems to be quite complex and often specific to the type and subtarget. Maybe it'd be worth starting a conversation with LLVM folks to see if this could reasonably be lifted somewhere we could use it? https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: @rjmccall thanks for your comments. Here's a reimplementation using a single loop -- the downside is that we may end up repeating the initial grouping scan, if we search more than 1 Access Unit ahead and fail to find a 'good' access unit. This update changes the algorithm slightly, as I realized that `struct A { short a: 16; char b:8; int c;};` and `struct B { char b: 8; short a:16; int c;};` should have the same (32-bit) access unit. Originally only `A` got that. 1) I noticed that several `CGRecordLowering` member fns could either be `const` or `static` -- would you like that as a separate NFC PR? 2) I have corrected the error about merging across zero-sized members. 3) It may not be obvious from GH, but this PR breaks down to 3 (or 4) separate commits: a) A set of test cases, marked up for the current scheme b) A new Target hook indicating whether unaligned accesses are cheap c) [optional] the CGRecordLowering change just mentioned d) the algorithm change, which updates those tests and makes it easier to see how the behaviour changes. Do you want that commit structure maintained? https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
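For readers skimming the thread, the two records being compared are reproduced below with their bit placements (illustrative for a typical 64-bit SysV target), showing why a single naturally aligned 32-bit access unit can cover the run in both cases:

```cpp
// In both records the bit-fields live within bytes 0-3 and `c` starts at
// offset 4, so one aligned 32-bit access unit can serve the whole run.
struct A {
  short a : 16; // bits 0-15
  char b : 8;   // bits 16-23; byte 3 is padding before c
  int c;        // offset 4
};

struct B {
  char b : 8;   // bits 0-7
  short a : 16; // typically placed at bits 16-31 so it does not straddle a
                // short boundary; byte 1 is padding
  int c;        // offset 4
};
```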
[clang] [clang] Better bitfield access units (PR #65742)
@@ -376,33 +377,41 @@ void CGRecordLowering::lowerUnion(bool isNoUniqueAddress) { } void CGRecordLowering::accumulateFields() { - for (RecordDecl::field_iterator Field = D->field_begin(), - FieldEnd = D->field_end(); -Field != FieldEnd;) { + RecordDecl::field_iterator FieldEnd = D->field_end(); + RecordDecl::field_iterator BitField = FieldEnd; + for (RecordDecl::field_iterator Field = D->field_begin(); Field != FieldEnd; + ++Field) { if (Field->isBitField()) { - RecordDecl::field_iterator Start = Field; - // Iterate to gather the list of bitfields. - for (++Field; Field != FieldEnd && Field->isBitField(); ++Field); - accumulateBitFields(Start, Field); + if (BitField == FieldEnd) +// Start gathering bitfields +BitField = Field; } else if (!Field->isZeroSize(Context)) { // Use base subobject layout for the potentially-overlapping field, // as it is done in RecordLayoutBuilder + CharUnits Offset = bitsToCharUnits(getFieldBitOffset(*Field)); urnathan wrote: ah, thanks -- I should have checked harder. Simple enough to undo that bit. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
@@ -442,79 +455,235 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field, return; } - // Check if OffsetInRecord (the size in bits of the current run) is better - // as a single field run. When OffsetInRecord has legal integer width, and - // its bitfield offset is naturally aligned, it is better to make the - // bitfield a separate storage component so as it can be accessed directly - // with lower cost. - auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord, - uint64_t StartBitOffset) { -if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses) - return false; -if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) || -!DataLayout.fitsInLegalInteger(OffsetInRecord)) - return false; -// Make sure StartBitOffset is naturally aligned if it is treated as an -// IType integer. -if (StartBitOffset % -Context.toBits(getAlignment(getIntNType(OffsetInRecord))) != -0) - return false; -return true; + // The SysV ABI can overlap bitfield storage units with both other bitfield + // storage units /and/ other non-bitfield data members. Such overlap, in the + // absence of packing, is always complete -- one storage unit is entirely + // within another. However, llvm cannot represent that -- it's structures are + // entirely flat. We place bitfields in 'access units', which are similar to + // the SysV storage units, but a clang-specific concept. + + // It can be advantageous to concatenate two adjacent access units, if the + // concenation can be read or written in a single instruction. + + // We do two passes. + + // a) allocate bitfields into the smallest access units they can + // fit. This results in a set of integral-typed access units. + + // b) concatentate mergeable access units. This applies the + // above-mentioned optimization, and in general, requires lookahead + // to know the next access unit -- not merely the next bitfield. + + class AccessUnit { +// Which bitfields are in the access +RecordDecl::field_iterator First; +RecordDecl::field_iterator Last; + +CharUnits Start; // Starting offset within the record +CharUnits End; // Finish offset (exclusive) within the record + +bool ContainsVolatile = false; + + public: +AccessUnit(RecordDecl::field_iterator F, RecordDecl::field_iterator L, + CharUnits S, CharUnits E, bool HasVolatile = false) +: First(F), Last(L), Start(S), End(E), ContainsVolatile(HasVolatile) {} +AccessUnit(RecordDecl::field_iterator F, CharUnits Place) +: AccessUnit(F, F, Place, Place) {} + + public: +auto begin() const { return First; } +auto end() const { return Last; } + + public: +void MergeFrom(AccessUnit const &Earlier) { + First = Earlier.First; + Start = Earlier.Start; +} + + public: +// Accessors +CharUnits getSize() const { return getSize(*this); } +CharUnits getStart() const { return Start; } +CharUnits getSize(const AccessUnit &NotEarlier) const { + return NotEarlier.End - Start; +} + +// Setter +void setEnd(CharUnits E) { End = E; } + +// Predicates +bool isBarrier() const { return getSize().isZero(); } +bool hasVolatile() const { return ContainsVolatile; } + +bool StartsBefore(CharUnits Offset, bool NonStrict = false) const { + if (NonStrict) +// Not strictly <, permit == +++Offset; + return Start < Offset; +} +bool ExtendsBeyond(CharUnits Offset) const { return End > Offset; } +bool EndsAt(CharUnits Offset) const { return End == Offset; } }; rjmccall wrote: This is definitely big enough that I would say to pull it out of the function body. 
https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
@@ -415,12 +424,16 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field, uint64_t StartBitOffset, Tail = 0; if (isDiscreteBitFieldABI()) { for (; Field != FieldEnd; ++Field) { - uint64_t BitOffset = getFieldBitOffset(*Field); + if (!Field->isBitField()) { +assert(Field->isZeroSize(Context) && "non-zero sized non-bitfield"); +continue; + } rjmccall wrote: Per my comment above, this is unnecessary; this should never be possible. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
@@ -442,79 +455,235 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field, return; } - // Check if OffsetInRecord (the size in bits of the current run) is better - // as a single field run. When OffsetInRecord has legal integer width, and - // its bitfield offset is naturally aligned, it is better to make the - // bitfield a separate storage component so as it can be accessed directly - // with lower cost. - auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord, - uint64_t StartBitOffset) { -if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses) - return false; -if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) || -!DataLayout.fitsInLegalInteger(OffsetInRecord)) - return false; -// Make sure StartBitOffset is naturally aligned if it is treated as an -// IType integer. -if (StartBitOffset % -Context.toBits(getAlignment(getIntNType(OffsetInRecord))) != -0) - return false; -return true; + // The SysV ABI can overlap bitfield storage units with both other bitfield + // storage units /and/ other non-bitfield data members. Such overlap, in the + // absence of packing, is always complete -- one storage unit is entirely + // within another. However, llvm cannot represent that -- it's structures are + // entirely flat. We place bitfields in 'access units', which are similar to + // the SysV storage units, but a clang-specific concept. + + // It can be advantageous to concatenate two adjacent access units, if the + // concenation can be read or written in a single instruction. + + // We do two passes. + + // a) allocate bitfields into the smallest access units they can + // fit. This results in a set of integral-typed access units. + + // b) concatentate mergeable access units. This applies the + // above-mentioned optimization, and in general, requires lookahead + // to know the next access unit -- not merely the next bitfield. rjmccall wrote: As far as I can tell, this doesn't really require doing two passes; it's still an online algorithm, just over first-phase access units instead of individual bit-fields. You could take your existing code to build up those first-phase units and then just call into your merge-or-emit logic whenever you finish one instead of adding it to a vector. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
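To make the suggested shape concrete, here is a minimal, self-contained sketch of the merge-or-flush control flow over first-phase units as they are produced (all names are invented; the real lowering works over field iterators and uses the criteria discussed elsewhere in this review):

```cpp
#include <functional>

// Toy stand-in for a first-phase access unit: a [Begin, End) byte range.
struct Unit {
  unsigned Begin, End;
};

// Placeholder merge test; the real criteria are the alignment, register-size
// and volatile rules discussed in this review.
static bool canMerge(const Unit &A, const Unit &B) { return A.End == B.Begin; }

// Online merge-or-flush: `next` produces first-phase units one at a time
// (returning false when the run ends); each unit either grows the current
// access unit or causes it to be installed.  No intermediate collection of
// units is kept.
static void concatenateUnits(const std::function<bool(Unit &)> &next,
                             const std::function<void(const Unit &)> &install) {
  Unit Current;
  if (!next(Current))
    return;
  Unit Incoming;
  while (next(Incoming)) {
    if (canMerge(Current, Incoming)) {
      Current.End = Incoming.End; // grow the current access unit
    } else {
      install(Current); // flush and start a new one
      Current = Incoming;
    }
  }
  install(Current);
}
```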
[clang] [clang] Better bitfield access units (PR #65742)
@@ -442,79 +455,235 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field, return; } - // Check if OffsetInRecord (the size in bits of the current run) is better - // as a single field run. When OffsetInRecord has legal integer width, and - // its bitfield offset is naturally aligned, it is better to make the - // bitfield a separate storage component so as it can be accessed directly - // with lower cost. - auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord, - uint64_t StartBitOffset) { -if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses) - return false; -if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) || -!DataLayout.fitsInLegalInteger(OffsetInRecord)) - return false; -// Make sure StartBitOffset is naturally aligned if it is treated as an -// IType integer. -if (StartBitOffset % -Context.toBits(getAlignment(getIntNType(OffsetInRecord))) != -0) - return false; -return true; + // The SysV ABI can overlap bitfield storage units with both other bitfield + // storage units /and/ other non-bitfield data members. Such overlap, in the + // absence of packing, is always complete -- one storage unit is entirely + // within another. However, llvm cannot represent that -- it's structures are + // entirely flat. We place bitfields in 'access units', which are similar to + // the SysV storage units, but a clang-specific concept. + + // It can be advantageous to concatenate two adjacent access units, if the + // concenation can be read or written in a single instruction. + + // We do two passes. + + // a) allocate bitfields into the smallest access units they can + // fit. This results in a set of integral-typed access units. rjmccall wrote: It's understood that the access units will be integer-typed. You should explain what you mean by "can fit", though. It looks like: bit-fields are merged whenever they start at a non-exact byte boundary. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
@@ -442,79 +455,235 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field, return; } - // Check if OffsetInRecord (the size in bits of the current run) is better - // as a single field run. When OffsetInRecord has legal integer width, and - // its bitfield offset is naturally aligned, it is better to make the - // bitfield a separate storage component so as it can be accessed directly - // with lower cost. - auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord, - uint64_t StartBitOffset) { -if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses) - return false; -if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) || -!DataLayout.fitsInLegalInteger(OffsetInRecord)) - return false; -// Make sure StartBitOffset is naturally aligned if it is treated as an -// IType integer. -if (StartBitOffset % -Context.toBits(getAlignment(getIntNType(OffsetInRecord))) != -0) - return false; -return true; + // The SysV ABI can overlap bitfield storage units with both other bitfield + // storage units /and/ other non-bitfield data members. Such overlap, in the + // absence of packing, is always complete -- one storage unit is entirely + // within another. However, llvm cannot represent that -- it's structures are + // entirely flat. We place bitfields in 'access units', which are similar to + // the SysV storage units, but a clang-specific concept. rjmccall wrote: Hmm. I'm not really sure this is right — it seems to be contradicted by `struct { char array[5]; int bitfield: 24; };` — but I don't think it matters anyway. If I had to explain the philosophy here, I would say something like: > We split runs of bit-fields into a sequence of "access units". When we emit > a load or store of a bit-field, we'll load/store the entire containing access > unit. The standard requires that these loads and stores must not interfere > with accesses to other memory locations, and it defines the bit-field's > memory location as the current run of non-zero-width bit-fields. So an > access unit must never overlap with non-bit-field storage or cross a > zero-width bit-field. Otherwise, we're free to draw the lines as we see fit. > > Drawing these lines well can be complicated. LLVM generally can't modify a > program to access memory that it didn't before, so using very narrow access > units can prevent the compiler from using optimal access patterns. For > example, suppose a run of bit-fields occupies four bytes in a struct. If we > split that into four 1-byte access units, then a sequence of assignments that > doesn't touch all four bytes may have to be emitted with multiple 8-bit > stores instead of a single 32-bit store. On the other hand, if we use very > wide access units, we may find ourselves emitting accesses to bit-fields we > didn't really need to touch, just because LLVM was unable to clean up after > us. And then you can get into explaining your algorithm. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
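The four-byte example in the comment above can be written out; this is purely illustrative of the trade-off being described, not code from the patch:

```cpp
// A run of bit-fields occupying four bytes.  With four 1-byte access units,
// the two assignments below may need two separate byte stores; with a single
// 4-byte access unit they can become one 32-bit load/modify/store.
struct S {
  unsigned a : 8;
  unsigned b : 8;
  unsigned c : 8;
  unsigned d : 8;
};

void setLowPair(S &s) {
  s.a = 1;
  s.b = 2;
}
```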
[clang] [clang] Better bitfield access units (PR #65742)
@@ -376,33 +377,41 @@ void CGRecordLowering::lowerUnion(bool isNoUniqueAddress) { } void CGRecordLowering::accumulateFields() { - for (RecordDecl::field_iterator Field = D->field_begin(), - FieldEnd = D->field_end(); -Field != FieldEnd;) { + RecordDecl::field_iterator FieldEnd = D->field_end(); + RecordDecl::field_iterator BitField = FieldEnd; + for (RecordDecl::field_iterator Field = D->field_begin(); Field != FieldEnd; + ++Field) { if (Field->isBitField()) { - RecordDecl::field_iterator Start = Field; - // Iterate to gather the list of bitfields. - for (++Field; Field != FieldEnd && Field->isBitField(); ++Field); - accumulateBitFields(Start, Field); + if (BitField == FieldEnd) +// Start gathering bitfields +BitField = Field; } else if (!Field->isZeroSize(Context)) { // Use base subobject layout for the potentially-overlapping field, // as it is done in RecordLayoutBuilder + CharUnits Offset = bitsToCharUnits(getFieldBitOffset(*Field)); rjmccall wrote: This is permitting accesses to cross a regular field if it has zero size. That is not allowed; bit-fields are only part of the same memory location if they are part of a consecutive sequence of non-zero-width bit-fields, and bit-fields in different memory locations must never interfere. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
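An illustration of the rule being cited (this example is mine, not from the patch): a zero-size member still separates bit-field runs, so an access unit must not span it.

```cpp
// `e` occupies no storage, but it still ends the bit-field run: `a` and `b`
// are distinct memory locations, and a store to one must not touch the other
// (another thread may be writing it concurrently).
struct Empty {};
struct Split {
  unsigned a : 8;
  [[no_unique_address]] Empty e; // zero-size non-bit-field member
  unsigned b : 8;
};
```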
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: Can someone please review this? https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: ping? https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: Rebased https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
llvmbot wrote: @llvm/pr-subscribers-backend-webassembly @llvm/pr-subscribers-clang

Author: Nathan Sidwell (urnathan)

Changes

Reimplement the SysV (non-Microsoft) bitfield access unit computation. For avoidance of doubt, this is not an ABI change, but it does change the memory accesses Clang emits to access bitfields. This is a stack of two commits, to make it easier to see the change in behaviour. The first commit is a set of new tests that verify the placement of access units under the current algorithm. The second commit is the change, and alters those tests as required. The guts of the change is in `clang/lib/CodeGen/CGRecordLayoutBuilder.cpp`.

The SysV ABI describes placing bitfields in 'storage units', which may end up overlapping each other or other data members (for instance 'int bf :4;' may be in a 32-bit storage unit that overlaps a plain 'char c;' field). LLVM doesn't represent structures at that level, and Clang must create a non-overlapping representation of the structure. That's what's happening here. To avoid confusion, I'm using the phrase 'access unit' to refer to the integer type Clang uses.

There are two aspects to choosing access units:

1) bitfields that share bits of a single byte must be in the same access unit -- you can't touch only one of those bitfields. This computation is unchanged.

2) adjacent bitfields that occupy distinct bytes might want to share an access unit. One wants to do this if that makes a better access size. This computation is changed.

The current algorithm computes both of these in a single pass, deciding whether the next bitfield extends the current access unit by requiring it to exactly abut the previous one, plus a few other constraints. But there are some issues with this:

a) If there's even a single bit of padding between the two, no concatenation occurs. This seems suboptimal.

b) Because of the above-mentioned overlapping, bitfield access units might not start on an aligned boundary, and some architectures do not support unaligned accesses.

c) Likewise, the access unit might not be able to be a natural 2^n size, with the adjacent bytes occupied (either by other data members or, with C++, by a derived class occupying tail-padding bytes). We must emit exactly a (say) 24-bit access (to avoid aliasing issues?) (example below).

d) Concatenating access units can lead to a larger unaligned access, and again, that's problematic for strict-alignment ISAs. (Consider `struct C { char a : 8; char b : 8;};` -- `C` has alignment 1, but a 2-byte access unit would naturally want 2-byte alignment.)

My understanding is that larger access units were desired to reduce the total number of accesses, and there was an assertion that the access would be narrowed if only some of the bytes changed. That's not what we've observed, and it's demonstrable on x86_64:

```
struct B { int a : 16; int b : 8; char c; };
```

Here, B::a and B::b could be placed into a 3-byte access unit, but we can't use 4-byte accesses due to the 4th byte containing `c`. The code we end up generating for an increment of B::b is:

```
	movzbl	2(%rdi), %eax
	shll	$16, %eax
	movzwl	(%rdi), %ecx
	addl	%ecx, %eax
	addl	$65536, %eax                    # imm = 0x10000
	movw	%ax, (%rdi)
	shrl	$16, %eax
	movb	%al, 2(%rdi)
```

We're loading both `B::a` and `B::b`, sticking them together, incrementing `B::b` and then only storing the latter. I.e. we've reduced the memory accesses, but we're doing more work than necessary. Without the `B::c` field, there's 1 byte of padding, and a regular 4-byte access can be used (and is). X86_64 is not unique in this behaviour.

I'm of the opinion that one cannot do these in a single pass -- concatenating access units fundamentally requires knowing how large both units are, and you cannot tell that just by looking at the first bitfield of the second access unit. That's what this patch does: it separates the computation into two phases. We first determine which bitfields /must/ be in a single access unit, and then determine which of those access units could be concatenated. We pay attention to the storage size of the integral type -- an i24 naturally requires 4 bytes of storage. When generating the access units we note barriers such as `:0` bitfields (on ABIs that care), and the starting location of the next non-bitfield or the structure's tail padding.

The concatenation only occurs if:

a) The larger access is naturally aligned, or unaligned access is cheap.

b) The larger access's storage size doesn't expand into a forbidden zone.

c) The larger access is not bigger than a register.

d) Neither of the two access units contains a volatile bitfield.

(Note that if an access unit's storage size naturally overlaps a subsequent access unit, we'll merge the two, provided the conditions above hold. A partial overlap can occur with packed structures and is handled too.)

ETA: in essence, it only merges ac
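The four conditions can be paraphrased as a small predicate (a self-contained sketch with invented parameter names; the real logic in CGRecordLayoutBuilder.cpp is structured differently):

```cpp
// Returns whether two adjacent access units should be concatenated, per the
// a)-d) criteria listed above.
bool shouldConcatenate(bool NaturallyAligned, bool UnalignedIsCheap,
                       bool GrowsIntoForbiddenZone, unsigned StorageBits,
                       unsigned RegisterBits, bool EitherHasVolatile) {
  if (!NaturallyAligned && !UnalignedIsCheap)
    return false; // a) need alignment or cheap unaligned access
  if (GrowsIntoForbiddenZone)
    return false; // b) storage may not expand over other members
  if (StorageBits > RegisterBits)
    return false; // c) keep the merged access within a register
  if (EitherHasVolatile)
    return false; // d) volatile bitfields keep their own unit
  return true;
}
```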
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: > I'm planning to take a closer look at this when I have more time. Sorry I > haven't been more responsive here. When might that be? https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: You knew this ping was coming, right? https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: thanks @yonghong-song! Taking the opportunity for a ping :) https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
yonghong-song wrote: With this pull request, I tried to build the Linux kernel and the kernel BPF selftests and run the selftests, and didn't find any issues. So presumably the implementation is correct. I tried a particular kernel BPF selftest: https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/progs/test_tunnel_kern.c#L39-L42 which has a few bitfield usages. The generated code is the same with or without this change. The bitfield patterns in the above test_tunnel_kern.c are pretty typical of the kernel, which favours aligned bitfields. This patch tries to handle some cases better for not-commonly-used bitfield patterns; those patterns (like the one in the pull request summary) are very rare in BPF programs or the kernel, so I guess it should have minimal impact on the BPF backend. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
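For context, the dominant kernel-style shape being referred to is something like the following (illustrative only, not the actual selftest source): bit-fields that pack into one naturally aligned 32-bit unit, for which the old and new algorithms choose the same access unit.

```cpp
// Three bit-fields filling a single aligned 32-bit unit; both the previous
// and the new accumulation pick one i32 access here.
struct tunnel_flags {
  unsigned int present : 1;
  unsigned int version : 3;
  unsigned int id : 28;
};
```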
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: > One very brief note: in the comments in the code, you might want to > distinguish between the semantic width of a bitfield (i.e. the C standard > notion of a "memory location", which has ABI significance), vs. the accesses > we choose to generate (we don't need to generate no-op reads/writes). Indeed, the naming here is unfortunately overloaded. SysV psABIs use 'Storage Unit' (at least x86 does, and IIRC others follow the same nomenclature). But Clang also uses 'StorageInfo' and 'Storage$FOO' in its bitfield structures, which is unfortunate. I've used 'Access Unit' to describe the latter in this patch. If you meant something else, please clarify (or point out any places I've missed). https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
efriedma-quic wrote: I'm planning to take a closer look at this when I have more time. Sorry I haven't been more responsive here. One very brief note: in the comments in the code, you might want to distinguish between the semantic width of a bitfield (i.e. the C standard notion of a "memory location", which has ABI significance), vs. the accesses we choose to generate (we don't need to generate no-op reads/writes). https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
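To illustrate that distinction (this example is not taken from the patch): the standard's "memory location" for bitfields is the whole run of adjacent non-zero-width bitfields, independent of which loads and stores the compiler actually emits.

```
struct S {
  int a : 7;
  int b : 9; // a and b lie in one run of non-zero-width bitfields:
             // together they form a single "memory location"
  int : 0;   // a zero-width bitfield ends that run
  int c : 5; // so c begins a separate memory location
  char d;    // a non-bitfield member is always its own memory location
};
```

The access units Clang chooses may cover less than such a location, but they must never touch bytes belonging to a different one.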
[clang] [clang] Better bitfield access units (PR #65742)
nigelp-xmos wrote: XCore: yes, the current setting of the new flag (the default false value) is right for XCore. Misaligned loads can generate a function call. We do not have benchmarks as such, but comparing the code generated for the test files touched here, there is an improvement. In particular bitfield-ir.cpp is a big improvement. In bitfield.cpp, two misaligned loads remain. One disappears if I remove the struct packed attribute. The other remains: N0::read31(). This is not a problem, and is not a degradation; it's a function call intended for the purpose. Just mentioning for information. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: time for another ping, I think ... https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: Ping? Anything more I can do to progress this? https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
https://github.com/urnathan edited https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
https://github.com/urnathan ready_for_review https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: thanks for your comments, > When I refer to CSE/DSE, I'm mostly talking about keeping values in registers > for longer. They don't know anything about individual fields in bitfields. If > we split bitfields too narrowly, we end up with extra memory accesses for > accessing multiple adjacent fields. And if you have accesses which overlap > (access some, but not all, of the same memory), CSE/DSE get much worse. To be clear, by 'accesses which overlap' do you mean accesses to bitfields that share parts of a byte? I agree -- that's an unchanged part of the algorithm: bitfields that share parts of a single byte must be in the same access unit. Or do you mean an access to a non-bitfield adjacent to a bitfield? How does that square with the memory model you pointed to earlier? Or is it that if you see both such accesses, it must be safe to merge them, or there's some UB already happening? But anyway, this algorithm doesn't affect that -- in both the before and after cases something outside of bitfieldAccumulation must be merging them. > Having a bitfield unit that can't be loaded in a single memory access > probably doesn't have much benefit, if there's a natural boundary we can use > to split. I think we're agreeing here. The current algorithm generates access units that can require multiple accesses, even on a relaxed-alignment machine (I refer back to the x86 example). This replacement tries much harder not to do that -- it only merges access units if that demonstrably lowers the number of accesses relative to accessing the original units separately. The motivating case we had was a sequence of four byte-sized loads to fetch an unaligned 32-bit access unit, one byte of that being fiddled with, and then that one byte being written back. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
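One way a layout like that motivating case can arise (an illustrative guess, not the original reproducer) is a packed struct whose bitfield run starts at an odd offset, so the 32-bit access unit is unaligned and, on a strict-alignment target, has to be synthesized from narrower loads even when only one byte's worth of bitfields changes.

```
struct __attribute__((packed)) M {
  char tag;        // pushes the bitfield run to byte offset 1
  unsigned f0 : 8;
  unsigned f1 : 8;
  unsigned f2 : 8;
  unsigned f3 : 8; // f0..f3 would form a 32-bit access unit at offset 1
};

void bump(M *m) { m->f1++; } // touches only one byte of the unit
```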
[clang] [clang] Better bitfield access units (PR #65742)
efriedma-quic wrote: > Hm, I've been thinking of it the opposite way round -- merging bitfields is > throwing away information (about where cuts might be). In essence, late optimizations can make accesses narrower, but they can't make them wider. The information about cuts is generally recoverable by analyzing the masking instructions. -- When I refer to CSE/DSE, I'm mostly talking about keeping values in registers for longer. They don't know anything about individual fields in bitfields. If we split bitfields too narrowly, we end up with extra memory accesses for accessing multiple adjacent fields. And if you have accesses which overlap (access some, but not all, of the same memory), CSE/DSE get much worse. Having a bitfield unit that can't be loaded in a single memory access probably doesn't have much benefit, if there's a natural boundary we can use to split. > Certainly, but could you clarify why changing the access patterns could do > that? My understanding of Poison is that should they cause a user-visible > state change, that's indicative of the original progam's UB, and the > languages in question don't guarantee anything at that point -- nasal demons > and all. That the observed behaviour of a UB-exhibiting program changes is > not a defect. What am I missing? Depending on how we fix the interaction between poison and bitfields, there could be an invariant that accessing any bitfield has to freeze all adjacent bitfield values. If we do that, then the set of "adjacent" values actually matters. There's probably some way we can make this work without actually introducing memory accesses, though; probably not worth worrying about too much here. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: > The advantage of exposing the wide accesses to the optimizer is that it > allows memory optimizations, like CSE or DSE, to reason about the entire > bitfield as a single unit, and easily eliminate accesses. Reducing the size > of bitfield accesses in the frontend is basically throwing away information. Hm, I've been thinking of it the opposite way round -- merging bitfields is throwing away information (about where cuts might be). And it's unclear to me how CSE or DSE could make use of a merged access unit to eliminate accesses -- it would seem to me that a merged access unit accessed at a single point would make it look like the whole unit was live? (Can you point me at an example of the analysis you describe happening?) That the simple x86 example I showed doesn't show (complete) undoing of the merging suggests it is hard for CSE and DSE to do the analysis you indicate. DSE did work there, to undo the merge, but there's no dead load elimination happening. But that DSE is merely undoing the gluing that the front end did -- if we didn't glue, then it would always happen. > The disadvantage is, as you note here, that sometimes codegen doesn't manage > to clean up the resulting code well. My contention is that the current algorithm both (a) fails to merge some mergeable access units and (b) inappropriately merges some access units, *especially* on strict-alignment machines. > I guess my primary question here is, instead of making clang try to guess > which accesses will be optimal, can we improve the way the LLVM code > generator handles the patterns currently generated by clang? I'm not exactly > against changing the IR generated by clang, but changing the IR like this > seems likely to improve some cases at the expense of others. In the testcases that were changed I did not encounter one that generated worse code (an ARM testcase showed better code due to, IIRC, not merging more than a register width; as with the x86 case it previously didn't eliminate unnecessary loads, whereas with this change those are gone). I would be very interested in seeing cases that degrade though. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: > > > Note that changing the memory accesses performed by clang (load or store) > > > _is_ an ABI change at IR level because of UB. > > > > > > Could you clarify why UB is a consideration here? > > There's some weirdness with poison values; see > https://reviews.llvm.org/D129541 etc. This is mostly an IR modeling thing; > the machine instructions work fine, we just need to make sure we don't > accidentally poison IR values that aren't supposed to be poison. Certainly, but could you clarify why changing the access patterns could do that? My understanding of poison is that should it cause a user-visible state change, that's indicative of the original program's UB, and the languages in question don't guarantee anything at that point -- nasal demons and all. That the observed behaviour of a UB-exhibiting program changes is not a defect. What am I missing? https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: > Also, please have a look at the existing `-ffine-grained-bitfield-accesses` > flag and the discussions around it and IPO. That flag's behaviour is unchanged here -- it continues to not merge access units. My contention here is that (a) merging units can be advantageous, but (b) the existing algorithm has some deficiencies. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
efriedma-quic wrote: > > Note that changing the memory accesses performed by clang (load or store) > > _is_ an ABI change at IR level because of UB. > > Could you clarify why UB is a consideration here? There's some weirdness with poison values; see https://reviews.llvm.org/D129541 etc. This is mostly an IR modeling thing; the machine instructions work fine, we just need to make sure we don't accidentally poison IR values that aren't supposed to be poison. https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [clang] Better bitfield access units (PR #65742)
urnathan wrote: > Note that changing the memory accesses performed by clang (load or store) > _is_ an ABI change at IR level because of UB. Could you clarify why UB is a consideration here? https://github.com/llvm/llvm-project/pull/65742 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits