[clang] [clang] Better bitfield access units (PR #65742)

2024-04-01 Thread Vitaly Buka via cfe-commits

vitalybuka wrote:

This buildbot failure is caused by this patch:
https://lab.llvm.org/buildbot/#/builders/239/builds/6363/steps/10/logs/stdio
cc @hctim 

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-04-01 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

Be aware of bug #87227 and PR #87238 to address that

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-29 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

Committed; expect a tail-clipping cleanup PR soon.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-29 Thread Nathan Sidwell via cfe-commits

https://github.com/urnathan closed 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-29 Thread via cfe-commits

AtariDreams wrote:

Ping?

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-22 Thread John McCall via cfe-commits

https://github.com/rjmccall approved this pull request.

Okay.  Thanks for putting up with such a protracted review!  I really like 
where we've ended up.

LGTM.  Please wait a few days before merging to give other reviewers a chance 
to give final feedback.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-22 Thread via cfe-commits

github-actions[bot] wrote:



:white_check_mark: With the latest revision this PR passed the Python code 
formatter.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-22 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

@rjmccall thanks, here is an update with those changes. I've collapsed it to
the three commits mentioned earlier:

1) Tests marked up for the current behaviour
2) TargetInfo strict/flexible alignment load predicate
3) The new algorithm

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-22 Thread Nathan Sidwell via cfe-commits


@@ -439,82 +444,247 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
   Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset),
MemberInfo::Field, nullptr, *Field));
 }
-return;
+return Field;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
-  };
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Accessing a sequence
+  // of bitfields mustn't interfere with adjacent non-bitfields -- they're
+  // permitted to be accessed in separate threads for instance.
+
+  // We split runs of bit-fields into a sequence of "access units". When we
+  // emit a load or store of a bit-field, we'll load/store the entire containing
+  // access unit. As mentioned, the standard requires that these loads and
+  // stores must not interfere with accesses to other memory locations, and it
+  // defines the bit-field's memory location as the current run of
+  // non-zero-width bit-fields. So an access unit must never overlap with
+  // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're
+  // free to draw the lines as we see fit.
+
+  // Drawing these lines well can be complicated. LLVM generally can't modify a
+  // program to access memory that it didn't before, so using very narrow access
+  // units can prevent the compiler from using optimal access patterns. For
+  // example, suppose a run of bit-fields occupies four bytes in a struct. If we
+  // split that into four 1-byte access units, then a sequence of assignments
+  // that doesn't touch all four bytes may have to be emitted with multiple
+  // 8-bit stores instead of a single 32-bit store. On the other hand, if we use
+  // very wide access units, we may find ourselves emitting accesses to
+  // bit-fields we didn't really need to touch, just because LLVM was unable to
+  // clean up after us.
+
+  // It is desirable to have access units be aligned powers of 2 no larger than
+  // a register. (On non-strict alignment ISAs, the alignment requirement can be
+  // dropped.) A three byte access unit will be accessed using 2-byte and 1-byte
+  // accesses and bit manipulation. If no bitfield straddles across the two
+  // separate accesses, it is better to have separate 2-byte and 1-byte access
+  // units, as then LLVM will not generate unnecessary memory accesses, or bit
+  // manipulation. Similarly, on a strict-alignment architecture, it is better
+  // to keep access-units naturally aligned, to avoid similar bit
+  // manipulation synthesizing larger unaligned accesses.
+
+  // Bitfields that share parts of a single byte are, of necessity, placed in
+  // the same access unit. That unit will encompass a consecutive run where
+  // adjacent bitfields share parts of a byte. (The first bitfield of such an
+  // access unit will start at the beginning of a byte.)
+
+  // We then try and accumulate adjacent access units when the combined unit is
+  // naturally sized, no larger than a register, and (on a strict alignment
+  // ISA), naturally aligned. Note that this requires lookahead to one or more
+  // subsequent access units. For instance, consider a 2-byte access-unit
+  // followed by 2 1-byte units. We can merge that into a 4-byte access-unit,
+  // but we would not want to merge a 2-byte followed by a single 1-byte (and no
+  // available tail padding). We keep track of the best access unit seen so far,
+  // and use that when we determine we cannot accumulate any more. Then we start
+  // again at the bitfield following that best one.
+
+  // The accumulation is also prevented when:
+  // *) it would cross a character-aligned zero-width bitfield, or
+  // *) fine-grained bitfield access option is in effect.
+
+  CharUnits RegSize =
+  bitsToCharUnits(Context.getTargetInfo().getRegisterWidth());
+  unsigned CharBits = Context.getCharWidth();
+
+  // Data about the start of the span we're accumulating to create an access
+  // unit from. Begin is the first bitfie
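
(For illustration only -- this example is not part of the quoted patch or its
tests. It sketches the trade-off the comment describes: the three bit-fields
below span exactly four bytes, so a single 4-byte access unit lets the
assignments lower to one 32-bit read-modify-write, whereas four 1-byte access
units would force several narrow loads and stores.)

```
struct Example {     /* hypothetical type */
  unsigned a : 10;   /* bits  0-9  */
  unsigned b : 10;   /* bits 10-19 */
  unsigned c : 12;   /* bits 20-31 -- the run occupies four bytes */
};

void set_all(struct Example *e) {
  /* With one 4-byte access unit these three stores can combine into a single
     32-bit access; with 1-byte units they need multiple 8-bit accesses. */
  e->a = 1;
  e->b = 2;
  e->c = 3;
}
```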

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-19 Thread John McCall via cfe-commits

https://github.com/rjmccall edited 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-19 Thread John McCall via cfe-commits

https://github.com/rjmccall commented:

This looks great, thanks!  I have a few editorial nits, and there's an 
assertion that seems off, but otherwise this looks ready to go.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-18 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

Sorry to push another update, but I realized the LimitOffset computation could 
be sunk to the point of use, and therefore not computed in all cases.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-18 Thread John McCall via cfe-commits

rjmccall wrote:

Yeah, we don't need to care about the actual bit offset of the zero-width 
bit-field as long as we honor the non-interference properties across it.  I'll 
take a look at the patch, thanks.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
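
(A minimal C sketch of the zero-width bit-field barrier discussed above --
illustrative only, not taken from the patch. The unnamed `: 0` member ends the
run of bit-fields, so `a`/`b` and `c` are separate memory locations and must
land in separate access units even though they could otherwise share storage.)

```
struct Barrier {
  unsigned a : 3;
  unsigned b : 5;  /* a and b share a byte: same access unit */
  unsigned   : 0;  /* zero-width bit-field: ends the memory location */
  unsigned c : 3;  /* must begin a new access unit */
};
```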


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-18 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

@rjmccall here is a rebase and update, which I think addresses all your
comments. I did make some material changes, though:

1) I removed the Volatile handling. I was always a little uncomfortable with it
because it didn't affect the access units of a non-volatile bitfield that ends
up being volatile-qualified via the structure's own volatile qualification.
Users of volatile fields are probably confused, but if not, they have a
std-blessed mechanism for isolating access units -- a zero-length bitfield. The
heuristic broke the idea that this patch /just/ implemented a better access
algorithm, so I was being inconsistent. AFAICT the ARM ABI, which seems to be
one that describes volatile bitfields carefully, does not specify that volatile
bitfields should be as isolated as possible. This causes one change, in
`CodeGen/aapcs-bitfield.c`, with:
```
struct st5 {
  int a : 12;
  volatile char c : 5;
} st5;
```
The two bitfields are placed into a single 32-bit access unit, rather than
separate 16-bit ones. Either behaviour is OK with the AAPCS. (The previous
algorithm would have placed them into a single 24-bit unit if they abutted at a
byte boundary; there is no change in that behaviour now.)

2) Implementing the barrier behaviour you wanted. I tried to find a psABI that
has the right attributes to place barriers at arbitrary bit positions, to see
whether it had any comment, but failed to find one. As you say, this simplifies
things, but ...

3) The barrier bitfield handling that we already had showed an interesting
behaviour, which I suspect comes from a later scissoring pass. Namely, with:
```
struct  A {
  unsigned a : 16;
  unsigned b : 8;
  char : 0;
  unsigned d;
} a;
```
we'd generate an LLVM struct of `{ i24, i32 }`, but then adjust the access unit
for the bitfields to be a 32-bit unit itself. That seems conformant, because
nothing else uses that 4th byte, and the standard merely says that the bitfield
starts at an allocation unit boundary. This patch was originally treating that
barrier as a limit to the current span, whereas we can use the next member with
storage as that limit. This is actually the behaviour we had when we reached
the end of a run of bitfields, so I have made that change, as you can see from
the changed handling setting Barrier.

4) I adjusted the new testcases to reduce their complexity -- we don't care 
about endianness, which only affects the placement of the bitfields within 
their access unit, not the access units themselves.

It may be that the later pass that was adjusting bitfield accesses to natural 
sizes can be simplified or deleted. I've not investigated.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-15 Thread John McCall via cfe-commits

https://github.com/rjmccall edited 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-15 Thread John McCall via cfe-commits

https://github.com/rjmccall edited 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-15 Thread John McCall via cfe-commits


@@ -439,82 +444,194 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
   Members.push_back(MemberInfo(bitsToCharUnits(StartBitOffset),
MemberInfo::Field, nullptr, *Field));
 }
-return;
+return Field;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
-  };
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Accessing a sequence
+  // of bitfields mustn't interfere with adjacent non-bitfields -- they're
+  // permitted to be accessed in separate threads for instance.
+
+  // We split runs of bit-fields into a sequence of "access units". When we
+  // emit a load or store of a bit-field, we'll load/store the entire containing
+  // access unit. As mentioned, the standard requires that these loads and
+  // stores must not interfere with accesses to other memory locations, and it
+  // defines the bit-field's memory location as the current run of
+  // non-zero-width bit-fields. So an access unit must never overlap with
+  // non-bit-field storage or cross a zero-width bit-field. Otherwise, we're
+  // free to draw the lines as we see fit.
+
+  // Drawing these lines well can be complicated. LLVM generally can't modify a
+  // program to access memory that it didn't before, so using very narrow access
+  // units can prevent the compiler from using optimal access patterns. For
+  // example, suppose a run of bit-fields occupies four bytes in a struct. If we
+  // split that into four 1-byte access units, then a sequence of assignments
+  // that doesn't touch all four bytes may have to be emitted with multiple
+  // 8-bit stores instead of a single 32-bit store. On the other hand, if we use
+  // very wide access units, we may find ourselves emitting accesses to
+  // bit-fields we didn't really need to touch, just because LLVM was unable to
+  // clean up after us.
+
+  // It is desirable to have access units be aligned powers of 2 no larger than
+  // a register. (On non-strict alignment ISAs, the alignment requirement can be
+  // dropped.) A three byte access unit will be accessed using 2-byte and 1-byte
+  // accesses and bit manipulation. If no bitfield straddles across the two
+  // separate accesses, it is better to have separate 2-byte and 1-byte access
+  // units, as then LLVM will not generate unnecessary memory accesses, or bit
+  // manipulation. Similarly, on a strict-alignment architecture, it is better
+  // to keep access-units naturally aligned, to avoid similar bit
+  // manipulation synthesizing larger unaligned accesses.
+
+  // We do this in two phases, processing a sequential run of bitfield
+  // declarations.
+
+  // a) Bitfields that share parts of a single byte are, of necessity, placed in
+  // the same access unit. That unit will encompass a consecutive
+  // run where adjacent bitfields share parts of a byte. (The first bitfield of
+  // such an access unit will start at the beginning of a byte.)
+
+  // b) Accumulate adjacent access units when the combined unit is naturally
+  // sized, no larger than a register, and on a strict alignment ISA,
+  // aligned. Note that this requires lookahead to one or more subsequent access
+  // units. For instance, consider a 2-byte access-unit followed by 2 1-byte
+  // units. We can merge that into a 4-byte access-unit, but we would not want
+  // to merge a 2-byte followed by a single 1-byte (and no available tail
+  // padding).
+
+  // This accumulation is prevented when:
+  // *) it would cross a zero-width bitfield (ABI-dependent), or
+  // *) one of the candidate access units contains a volatile bitfield, or
+  // *) fine-grained bitfield access option is in effect.
+
+  CharUnits RegSize =
+  bitsToCharUnits(Context.getTargetInfo().getRegisterWidth());
+  unsigned CharBits = Context.getCharWidth();
+
+  RecordDecl::field_iterator Begin = FieldEnd;
+  CharUnits StartOffset;
+  uint64_t BitSize;
+  CharUnits BestEndOffset;
+  RecordDecl::field_iterator BestEnd = B
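
To make the grouping rules above concrete, here is a small illustration of my
own (not part of the patch), assuming a 32-bit strict-alignment target with
4-byte registers and Itanium ABI bit-field layout:

struct Merged {
  unsigned a : 16; // phase (a): bytes 0-1 form a 2-byte access unit
  unsigned b : 8;  // byte 2: a 1-byte unit
  unsigned c : 8;  // byte 3: a 1-byte unit
  // phase (b): 2 + 1 + 1 bytes combine into one naturally aligned 4-byte unit
};

struct Unmerged {
  unsigned a : 16; // bytes 0-1: a 2-byte unit
  unsigned b : 8;  // byte 2: a 1-byte unit
  char tail;       // non-bit-field in byte 3: the units above may not grow into
                   // it, so they remain separate 2-byte and 1-byte units
};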

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-15 Thread Nathan Sidwell via cfe-commits


@@ -439,82 +444,194 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-15 Thread John McCall via cfe-commits

https://github.com/rjmccall edited 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-15 Thread John McCall via cfe-commits

https://github.com/rjmccall edited 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-15 Thread John McCall via cfe-commits


@@ -439,82 +444,194 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-15 Thread Nathan Sidwell via cfe-commits


@@ -439,82 +444,194 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-14 Thread YunQiang Su via cfe-commits

wzssyqa wrote:

> For MIPSr6, it is just like AArch64: some microarchitectures don't support
> misaligned accesses well at the hardware level, so we need an option to
> disable them -- the kernel may need it.
>
> For GCC, we have `-mno-unaligned-access`. We also need to add this option to
> clang.

https://github.com/llvm/llvm-project/pull/85174
Add -m(no-)unaligned-access for MIPSr6.
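
As a rough illustration only (the helper name and parameters below are mine,
not the patch's): strict alignment, as selected by such an option, feeds the
merge test described in the patch's comments -- a combined access unit must be
naturally sized, no larger than a register, and, on a strict-alignment target,
naturally aligned.

#include <cstdint>

static bool mayMergeAccessUnit(uint64_t CombinedBytes, uint64_t StartOffset,
                               uint64_t RegisterBytes, bool CheapUnaligned) {
  // Naturally sized: a power of two (CombinedBytes assumed non-zero).
  bool NaturalSize = (CombinedBytes & (CombinedBytes - 1)) == 0;
  // On a strict-alignment target the unit must also be naturally aligned.
  bool Aligned = CheapUnaligned || (StartOffset % CombinedBytes) == 0;
  return NaturalSize && CombinedBytes <= RegisterBytes && Aligned;
}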

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-13 Thread John McCall via cfe-commits


@@ -439,82 +444,194 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-13 Thread John McCall via cfe-commits


@@ -439,82 +444,194 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-13 Thread John McCall via cfe-commits

https://github.com/rjmccall edited 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-13 Thread John McCall via cfe-commits


@@ -439,82 +444,194 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-13 Thread John McCall via cfe-commits


@@ -439,82 +444,194 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-13 Thread John McCall via cfe-commits


@@ -439,82 +444,194 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-13 Thread John McCall via cfe-commits


@@ -439,82 +444,194 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-13 Thread John McCall via cfe-commits


@@ -439,82 +444,194 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-13 Thread John McCall via cfe-commits


@@ -439,82 +444,194 @@ CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-13 Thread John McCall via cfe-commits

https://github.com/rjmccall commented:

Thank you, the structure looks great!  Mostly style and clarity comments from 
here.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-12 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

@rjmccall here's a refactoring along the lines you suggested. Once one stops 
worrying about rescanning when the next access unit fails to accumulate into 
the current one, things get simpler. The compiler should be able to turn the 
boolean conditional flow within the loop body into direct control flow. 

Would you like the API change I introduced in the previous patch (having 
`accumulateBitFields` return the following non-bitfield) broken out into a 
separate PR?

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-08 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

This update 
* moves the creation and installation of the BitFieldAccessUnit objects into 
CGRecordLowering.
* adds aarch64 and arm darwin target tests, as those have different layout 
rules (let me know if I've missed something here).
* Simplifies the checking of barriers -- an explicit flag, rather than 
inferring from a zero-sized unit.  The latter can represent a non-barrier that 
happens to fall at a byte boundary, and results in a simplification of the 
gathering operation.
* changes the API of accumulateBitFields to return the following non-bitfield.
This means that the caller does not itself have to scan across the bitfields to
find their range. Given the bitfield processor has to scan the bitfields anyway,
this seems like a win (see the sketch after this list).
* because of that, the determination of the limiting boundary, only needed for 
non-ms abis, is moved to where it is needed.
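
A simplified sketch of the caller shape this implies (signatures assumed, not the exact patch code):

```c++
// accumulateBitFields consumes the whole bitfield run and returns the first
// field past it, so the outer field loop just resumes from that iterator
// rather than rescanning the run itself.
void CGRecordLowering::accumulateFields() {
  RecordDecl::field_iterator Field = D->field_begin();
  RecordDecl::field_iterator FieldEnd = D->field_end();
  while (Field != FieldEnd) {
    if (Field->isBitField())
      Field = accumulateBitFields(Field, FieldEnd); // no second scan needed
    else {
      // ... handle a non-bitfield member ...
      ++Field;
    }
  }
}
```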

FWIW, I experimented with the representation in BitFieldAccessUnit:
* Although adjacent units are contiguous and it would seem one of Begin and End 
is unneeded, it turned out easiest to have both -- omitting Begin simply makes 
the calling code do the bookkeeping.
* Whether to represent access unit boundaries as bit or char offsets.  
Whichever is chosen leads to conversions to the other representation at various 
points. Using chars was the least painful, and makes it clear these are char 
boundaries.

It might be possible to go further and perform the accumulation you outlined 
above, but there's enough churn in this update that I thought it best to push 
it.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-08 Thread Nathan Sidwell via cfe-commits


@@ -394,33 +412,155 @@ void CGRecordLowering::accumulateFields() {
   : getStorageType(*Field),
   *Field));
   ++Field;
-} else {
-  ++Field;
 }
   }
 }
 
-void
-CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
-  RecordDecl::field_iterator FieldEnd) {
-  // Run stores the first element of the current run of bitfields.  FieldEnd is
-  // used as a special value to note that we don't have a current run.  A
-  // bitfield run is a contiguous collection of bitfields that can be stored in
-  // the same storage block.  Zero-sized bitfields and bitfields that would
-  // cross an alignment boundary break a run and start a new one.
-  RecordDecl::field_iterator Run = FieldEnd;
-  // Tail is the offset of the first bit off the end of the current run.  It's
-  // used to determine if the ASTRecordLayout is treating these two bitfields 
as
-  // contiguous.  StartBitOffset is offset of the beginning of the Run.
-  uint64_t StartBitOffset, Tail = 0;
+namespace {
+
+// A run of bitfields assigned to the same access unit -- the size of memory
+// loads & stores.
+class BitFieldAccessUnit {
+  RecordDecl::field_iterator Begin; // Field at start of this access unit.
+  RecordDecl::field_iterator End;   // Field just after this access unit.
+
+  CharUnits StartOffset; // Starting offset in the containing record.
+  CharUnits EndOffset;   // Finish offset (exclusive) in the containing record.
+
+  bool ContainsVolatile; // This access unit contains a volatile bitfield.
+
+public:
+  // End barrier constructor.
+  BitFieldAccessUnit(RecordDecl::field_iterator F, CharUnits Offset,
+ bool Volatile = false)
+  : Begin(F), End(F), StartOffset(Offset), EndOffset(Offset),
+ContainsVolatile(Volatile) {}
+
+  // Collect contiguous bitfields into an access unit.
+  BitFieldAccessUnit(RecordDecl::field_iterator FieldBegin,
+ RecordDecl::field_iterator FieldEnd,
+ const CGRecordLowering &CGRL);
+
+  // Compute the access unit following this one -- which might be a barrier at
+  // Limit.
+  BitFieldAccessUnit accumulateNextUnit(RecordDecl::field_iterator FieldEnd,
+CharUnits Limit,
+const CGRecordLowering &CGRL) const {
+return end() != FieldEnd ? BitFieldAccessUnit(end(), FieldEnd, CGRL)
+ : BitFieldAccessUnit(FieldEnd, Limit);
+  }
+  // Re-set the end of this unit if there is space before Probe starts.
+  void enlargeIfSpace(const BitFieldAccessUnit &Probe, CharUnits Offset) {
+if (Probe.getStartOffset() >= Offset) {
+  End = Probe.begin();
+  EndOffset = Offset;
+}
+  }
+
+public:
+  RecordDecl::field_iterator begin() const { return Begin; }
+  RecordDecl::field_iterator end() const { return End; }
+
+public:
+  // Accessors
+  CharUnits getSize() const { return EndOffset - StartOffset; }
+  CharUnits getStartOffset() const { return StartOffset; }
+  CharUnits getEndOffset() const { return EndOffset; }
+
+  // Predicates
+  bool isBarrier() const { return getSize().isZero(); }
+  bool hasVolatile() const { return ContainsVolatile; }
+
+  // Create the containing access unit and install the bitfields.
+  void installUnit(CGRecordLowering &CGRL) const {
+if (!isBarrier()) {
+  // Add the storage member for the access unit to the record. The
+  // bitfields get the offset of their storage but come afterward and 
remain
+  // there after a stable sort.
+  llvm::Type *Type = CGRL.getIntNType(CGRL.Context.toBits(getSize()));
+  CGRL.Members.push_back(CGRL.StorageInfo(getStartOffset(), Type));
+  for (auto F : *this)
+if (!F->isZeroLengthBitField(CGRL.Context))
+  CGRL.Members.push_back(CGRecordLowering::MemberInfo(
+  getStartOffset(), CGRecordLowering::MemberInfo::Field, nullptr,
+  F));
+}
+  }
+};
+
+// Create an access unit of contiguous bitfields.
+BitFieldAccessUnit::BitFieldAccessUnit(RecordDecl::field_iterator FieldBegin,
+   RecordDecl::field_iterator FieldEnd,
+   const CGRecordLowering &CGRL)
+: BitFieldAccessUnit(FieldBegin, CharUnits::Zero(),
+ FieldBegin->getType().isVolatileQualified()) {
+  assert(End != FieldEnd);
+
+  uint64_t StartBit = CGRL.getFieldBitOffset(*FieldBegin);
+  uint64_t BitSize = End->getBitWidthValue(CGRL.Context);
+  unsigned CharBits = CGRL.Context.getCharWidth();
+
+  assert(!(StartBit % CharBits) && "Not at start of char");
+
+  ++End;
+  if (BitSize ||
+  !(CGRL.Context.getTargetInfo().useZeroLengthBitfieldAlignment() ||
+CGRL.Context.getTargetInfo().useBitFieldTypeAlignment()))
+// The first field is not a (zero-width) barrier. Collect contiguous 
fields.
+for (; End != FieldEnd; ++End) {
+  uin

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-08 Thread Nathan Sidwell via cfe-commits


@@ -394,33 +412,155 @@ void CGRecordLowering::accumulateFields() {
   : getStorageType(*Field),
   *Field));
   ++Field;
-} else {
-  ++Field;
 }
   }
 }
 
-void
-CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
-  RecordDecl::field_iterator FieldEnd) {
-  // Run stores the first element of the current run of bitfields.  FieldEnd is
-  // used as a special value to note that we don't have a current run.  A
-  // bitfield run is a contiguous collection of bitfields that can be stored in
-  // the same storage block.  Zero-sized bitfields and bitfields that would
-  // cross an alignment boundary break a run and start a new one.
-  RecordDecl::field_iterator Run = FieldEnd;
-  // Tail is the offset of the first bit off the end of the current run.  It's
-  // used to determine if the ASTRecordLayout is treating these two bitfields 
as
-  // contiguous.  StartBitOffset is offset of the beginning of the Run.
-  uint64_t StartBitOffset, Tail = 0;
+namespace {
+
+// A run of bitfields assigned to the same access unit -- the size of memory
+// loads & stores.
+class BitFieldAccessUnit {
+  RecordDecl::field_iterator Begin; // Field at start of this access unit.
+  RecordDecl::field_iterator End;   // Field just after this access unit.
+
+  CharUnits StartOffset; // Starting offset in the containing record.
+  CharUnits EndOffset;   // Finish offset (exclusive) in the containing record.
+
+  bool ContainsVolatile; // This access unit contains a volatile bitfield.
+
+public:
+  // End barrier constructor.
+  BitFieldAccessUnit(RecordDecl::field_iterator F, CharUnits Offset,
+ bool Volatile = false)
+  : Begin(F), End(F), StartOffset(Offset), EndOffset(Offset),
+ContainsVolatile(Volatile) {}
+
+  // Collect contiguous bitfields into an access unit.
+  BitFieldAccessUnit(RecordDecl::field_iterator FieldBegin,
+ RecordDecl::field_iterator FieldEnd,
+ const CGRecordLowering &CGRL);

urnathan wrote:

Updated to add gatherBitFieldAccessUnit and installBitFieldAccessUnit members 
of CGRecordLowering.  You're right that it makes the structure clearer.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-01 Thread John McCall via cfe-commits


@@ -394,33 +412,155 @@ void CGRecordLowering::accumulateFields() {
   : getStorageType(*Field),
   *Field));
   ++Field;
-} else {
-  ++Field;
 }
   }
 }
 
-void
-CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
-  RecordDecl::field_iterator FieldEnd) {
-  // Run stores the first element of the current run of bitfields.  FieldEnd is
-  // used as a special value to note that we don't have a current run.  A
-  // bitfield run is a contiguous collection of bitfields that can be stored in
-  // the same storage block.  Zero-sized bitfields and bitfields that would
-  // cross an alignment boundary break a run and start a new one.
-  RecordDecl::field_iterator Run = FieldEnd;
-  // Tail is the offset of the first bit off the end of the current run.  It's
-  // used to determine if the ASTRecordLayout is treating these two bitfields 
as
-  // contiguous.  StartBitOffset is offset of the beginning of the Run.
-  uint64_t StartBitOffset, Tail = 0;
+namespace {
+
+// A run of bitfields assigned to the same access unit -- the size of memory
+// loads & stores.
+class BitFieldAccessUnit {
+  RecordDecl::field_iterator Begin; // Field at start of this access unit.
+  RecordDecl::field_iterator End;   // Field just after this access unit.
+
+  CharUnits StartOffset; // Starting offset in the containing record.
+  CharUnits EndOffset;   // Finish offset (exclusive) in the containing record.
+
+  bool ContainsVolatile; // This access unit contains a volatile bitfield.
+
+public:
+  // End barrier constructor.
+  BitFieldAccessUnit(RecordDecl::field_iterator F, CharUnits Offset,
+ bool Volatile = false)
+  : Begin(F), End(F), StartOffset(Offset), EndOffset(Offset),
+ContainsVolatile(Volatile) {}
+
+  // Collect contiguous bitfields into an access unit.
+  BitFieldAccessUnit(RecordDecl::field_iterator FieldBegin,
+ RecordDecl::field_iterator FieldEnd,
+ const CGRecordLowering &CGRL);
+
+  // Compute the access unit following this one -- which might be a barrier at
+  // Limit.
+  BitFieldAccessUnit accumulateNextUnit(RecordDecl::field_iterator FieldEnd,
+CharUnits Limit,
+const CGRecordLowering &CGRL) const {
+return end() != FieldEnd ? BitFieldAccessUnit(end(), FieldEnd, CGRL)
+ : BitFieldAccessUnit(FieldEnd, Limit);
+  }
+  // Re-set the end of this unit if there is space before Probe starts.
+  void enlargeIfSpace(const BitFieldAccessUnit &Probe, CharUnits Offset) {
+if (Probe.getStartOffset() >= Offset) {
+  End = Probe.begin();
+  EndOffset = Offset;
+}
+  }
+
+public:
+  RecordDecl::field_iterator begin() const { return Begin; }
+  RecordDecl::field_iterator end() const { return End; }
+
+public:
+  // Accessors
+  CharUnits getSize() const { return EndOffset - StartOffset; }
+  CharUnits getStartOffset() const { return StartOffset; }
+  CharUnits getEndOffset() const { return EndOffset; }
+
+  // Predicates
+  bool isBarrier() const { return getSize().isZero(); }
+  bool hasVolatile() const { return ContainsVolatile; }
+
+  // Create the containing access unit and install the bitfields.
+  void installUnit(CGRecordLowering &CGRL) const {
+if (!isBarrier()) {
+  // Add the storage member for the access unit to the record. The
+  // bitfields get the offset of their storage but come afterward and 
remain
+  // there after a stable sort.
+  llvm::Type *Type = CGRL.getIntNType(CGRL.Context.toBits(getSize()));
+  CGRL.Members.push_back(CGRL.StorageInfo(getStartOffset(), Type));
+  for (auto F : *this)
+if (!F->isZeroLengthBitField(CGRL.Context))
+  CGRL.Members.push_back(CGRecordLowering::MemberInfo(
+  getStartOffset(), CGRecordLowering::MemberInfo::Field, nullptr,
+  F));
+}
+  }
+};
+
+// Create an access unit of contiguous bitfields.
+BitFieldAccessUnit::BitFieldAccessUnit(RecordDecl::field_iterator FieldBegin,
+   RecordDecl::field_iterator FieldEnd,
+   const CGRecordLowering &CGRL)
+: BitFieldAccessUnit(FieldBegin, CharUnits::Zero(),
+ FieldBegin->getType().isVolatileQualified()) {
+  assert(End != FieldEnd);
+
+  uint64_t StartBit = CGRL.getFieldBitOffset(*FieldBegin);
+  uint64_t BitSize = End->getBitWidthValue(CGRL.Context);
+  unsigned CharBits = CGRL.Context.getCharWidth();
+
+  assert(!(StartBit % CharBits) && "Not at start of char");
+
+  ++End;
+  if (BitSize ||
+  !(CGRL.Context.getTargetInfo().useZeroLengthBitfieldAlignment() ||
+CGRL.Context.getTargetInfo().useBitFieldTypeAlignment()))
+// The first field is not a (zero-width) barrier. Collect contiguous 
fields.
+for (; End != FieldEnd; ++End) {
+  uin

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-01 Thread Eli Friedman via cfe-commits


@@ -394,33 +412,155 @@ void CGRecordLowering::accumulateFields() {
   : getStorageType(*Field),
   *Field));
   ++Field;
-} else {
-  ++Field;
 }
   }
 }
 
-void
-CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
-  RecordDecl::field_iterator FieldEnd) {
-  // Run stores the first element of the current run of bitfields.  FieldEnd is
-  // used as a special value to note that we don't have a current run.  A
-  // bitfield run is a contiguous collection of bitfields that can be stored in
-  // the same storage block.  Zero-sized bitfields and bitfields that would
-  // cross an alignment boundary break a run and start a new one.
-  RecordDecl::field_iterator Run = FieldEnd;
-  // Tail is the offset of the first bit off the end of the current run.  It's
-  // used to determine if the ASTRecordLayout is treating these two bitfields 
as
-  // contiguous.  StartBitOffset is offset of the beginning of the Run.
-  uint64_t StartBitOffset, Tail = 0;
+namespace {
+
+// A run of bitfields assigned to the same access unit -- the size of memory
+// loads & stores.
+class BitFieldAccessUnit {
+  RecordDecl::field_iterator Begin; // Field at start of this access unit.
+  RecordDecl::field_iterator End;   // Field just after this access unit.
+
+  CharUnits StartOffset; // Starting offset in the containing record.
+  CharUnits EndOffset;   // Finish offset (exclusive) in the containing record.
+
+  bool ContainsVolatile; // This access unit contains a volatile bitfield.
+
+public:
+  // End barrier constructor.
+  BitFieldAccessUnit(RecordDecl::field_iterator F, CharUnits Offset,
+ bool Volatile = false)
+  : Begin(F), End(F), StartOffset(Offset), EndOffset(Offset),
+ContainsVolatile(Volatile) {}
+
+  // Collect contiguous bitfields into an access unit.
+  BitFieldAccessUnit(RecordDecl::field_iterator FieldBegin,
+ RecordDecl::field_iterator FieldEnd,
+ const CGRecordLowering &CGRL);
+
+  // Compute the access unit following this one -- which might be a barrier at
+  // Limit.
+  BitFieldAccessUnit accumulateNextUnit(RecordDecl::field_iterator FieldEnd,
+CharUnits Limit,
+const CGRecordLowering &CGRL) const {
+return end() != FieldEnd ? BitFieldAccessUnit(end(), FieldEnd, CGRL)
+ : BitFieldAccessUnit(FieldEnd, Limit);
+  }
+  // Re-set the end of this unit if there is space before Probe starts.
+  void enlargeIfSpace(const BitFieldAccessUnit &Probe, CharUnits Offset) {
+if (Probe.getStartOffset() >= Offset) {
+  End = Probe.begin();
+  EndOffset = Offset;
+}
+  }
+
+public:
+  RecordDecl::field_iterator begin() const { return Begin; }
+  RecordDecl::field_iterator end() const { return End; }
+
+public:
+  // Accessors
+  CharUnits getSize() const { return EndOffset - StartOffset; }
+  CharUnits getStartOffset() const { return StartOffset; }
+  CharUnits getEndOffset() const { return EndOffset; }
+
+  // Predicates
+  bool isBarrier() const { return getSize().isZero(); }
+  bool hasVolatile() const { return ContainsVolatile; }
+
+  // Create the containing access unit and install the bitfields.
+  void installUnit(CGRecordLowering &CGRL) const {
+if (!isBarrier()) {
+  // Add the storage member for the access unit to the record. The
+  // bitfields get the offset of their storage but come afterward and 
remain
+  // there after a stable sort.
+  llvm::Type *Type = CGRL.getIntNType(CGRL.Context.toBits(getSize()));
+  CGRL.Members.push_back(CGRL.StorageInfo(getStartOffset(), Type));
+  for (auto F : *this)
+if (!F->isZeroLengthBitField(CGRL.Context))
+  CGRL.Members.push_back(CGRecordLowering::MemberInfo(
+  getStartOffset(), CGRecordLowering::MemberInfo::Field, nullptr,
+  F));
+}
+  }
+};
+
+// Create an access unit of contiguous bitfields.
+BitFieldAccessUnit::BitFieldAccessUnit(RecordDecl::field_iterator FieldBegin,
+   RecordDecl::field_iterator FieldEnd,
+   const CGRecordLowering &CGRL)
+: BitFieldAccessUnit(FieldBegin, CharUnits::Zero(),
+ FieldBegin->getType().isVolatileQualified()) {
+  assert(End != FieldEnd);
+
+  uint64_t StartBit = CGRL.getFieldBitOffset(*FieldBegin);
+  uint64_t BitSize = End->getBitWidthValue(CGRL.Context);
+  unsigned CharBits = CGRL.Context.getCharWidth();
+
+  assert(!(StartBit % CharBits) && "Not at start of char");
+
+  ++End;
+  if (BitSize ||
+  !(CGRL.Context.getTargetInfo().useZeroLengthBitfieldAlignment() ||
+CGRL.Context.getTargetInfo().useBitFieldTypeAlignment()))
+// The first field is not a (zero-width) barrier. Collect contiguous 
fields.
+for (; End != FieldEnd; ++End) {
+  uin

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-01 Thread John McCall via cfe-commits


@@ -858,6 +861,10 @@ class TargetInfo : public TransferrableTargetInfo,
 return PointerWidth;
   }
 
+  /// Return true, iff unaligned accesses are a single instruction (rather than

rjmccall wrote:

```suggestion
  /// Return true iff unaligned accesses are a single instruction (rather than
```

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-01 Thread John McCall via cfe-commits


@@ -394,33 +412,155 @@ void CGRecordLowering::accumulateFields() {
   : getStorageType(*Field),
   *Field));
   ++Field;
-} else {
-  ++Field;
 }
   }
 }
 
-void
-CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
-  RecordDecl::field_iterator FieldEnd) {
-  // Run stores the first element of the current run of bitfields.  FieldEnd is
-  // used as a special value to note that we don't have a current run.  A
-  // bitfield run is a contiguous collection of bitfields that can be stored in
-  // the same storage block.  Zero-sized bitfields and bitfields that would
-  // cross an alignment boundary break a run and start a new one.
-  RecordDecl::field_iterator Run = FieldEnd;
-  // Tail is the offset of the first bit off the end of the current run.  It's
-  // used to determine if the ASTRecordLayout is treating these two bitfields 
as
-  // contiguous.  StartBitOffset is offset of the beginning of the Run.
-  uint64_t StartBitOffset, Tail = 0;
+namespace {
+
+// A run of bitfields assigned to the same access unit -- the size of memory
+// loads & stores.
+class BitFieldAccessUnit {
+  RecordDecl::field_iterator Begin; // Field at start of this access unit.
+  RecordDecl::field_iterator End;   // Field just after this access unit.
+
+  CharUnits StartOffset; // Starting offset in the containing record.
+  CharUnits EndOffset;   // Finish offset (exclusive) in the containing record.
+
+  bool ContainsVolatile; // This access unit contains a volatile bitfield.
+
+public:
+  // End barrier constructor.
+  BitFieldAccessUnit(RecordDecl::field_iterator F, CharUnits Offset,
+ bool Volatile = false)
+  : Begin(F), End(F), StartOffset(Offset), EndOffset(Offset),
+ContainsVolatile(Volatile) {}
+
+  // Collect contiguous bitfields into an access unit.
+  BitFieldAccessUnit(RecordDecl::field_iterator FieldBegin,
+ RecordDecl::field_iterator FieldEnd,
+ const CGRecordLowering &CGRL);

rjmccall wrote:

The structure here is pretty weird to me.  This class seems to be mostly a 
helper class for breaking down the bit-field run into access units, but it's 
also a short-term representation of those access units?  If having a helper 
class for breaking the run is useful, let's separate that and have it generate 
these access units.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-01 Thread John McCall via cfe-commits


@@ -394,33 +412,155 @@ void CGRecordLowering::accumulateFields() {
   : getStorageType(*Field),
   *Field));
   ++Field;
-} else {
-  ++Field;
 }
   }
 }
 
-void
-CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
-  RecordDecl::field_iterator FieldEnd) {
-  // Run stores the first element of the current run of bitfields.  FieldEnd is
-  // used as a special value to note that we don't have a current run.  A
-  // bitfield run is a contiguous collection of bitfields that can be stored in
-  // the same storage block.  Zero-sized bitfields and bitfields that would
-  // cross an alignment boundary break a run and start a new one.
-  RecordDecl::field_iterator Run = FieldEnd;
-  // Tail is the offset of the first bit off the end of the current run.  It's
-  // used to determine if the ASTRecordLayout is treating these two bitfields 
as
-  // contiguous.  StartBitOffset is offset of the beginning of the Run.
-  uint64_t StartBitOffset, Tail = 0;
+namespace {
+
+// A run of bitfields assigned to the same access unit -- the size of memory
+// loads & stores.
+class BitFieldAccessUnit {
+  RecordDecl::field_iterator Begin; // Field at start of this access unit.
+  RecordDecl::field_iterator End;   // Field just after this access unit.
+
+  CharUnits StartOffset; // Starting offset in the containing record.
+  CharUnits EndOffset;   // Finish offset (exclusive) in the containing record.
+
+  bool ContainsVolatile; // This access unit contains a volatile bitfield.
+
+public:
+  // End barrier constructor.
+  BitFieldAccessUnit(RecordDecl::field_iterator F, CharUnits Offset,
+ bool Volatile = false)
+  : Begin(F), End(F), StartOffset(Offset), EndOffset(Offset),
+ContainsVolatile(Volatile) {}
+
+  // Collect contiguous bitfields into an access unit.
+  BitFieldAccessUnit(RecordDecl::field_iterator FieldBegin,
+ RecordDecl::field_iterator FieldEnd,
+ const CGRecordLowering &CGRL);
+
+  // Compute the access unit following this one -- which might be a barrier at
+  // Limit.
+  BitFieldAccessUnit accumulateNextUnit(RecordDecl::field_iterator FieldEnd,
+CharUnits Limit,
+const CGRecordLowering &CGRL) const {
+return end() != FieldEnd ? BitFieldAccessUnit(end(), FieldEnd, CGRL)
+ : BitFieldAccessUnit(FieldEnd, Limit);
+  }
+  // Re-set the end of this unit if there is space before Probe starts.
+  void enlargeIfSpace(const BitFieldAccessUnit &Probe, CharUnits Offset) {
+if (Probe.getStartOffset() >= Offset) {
+  End = Probe.begin();
+  EndOffset = Offset;
+}
+  }
+
+public:
+  RecordDecl::field_iterator begin() const { return Begin; }
+  RecordDecl::field_iterator end() const { return End; }
+
+public:
+  // Accessors
+  CharUnits getSize() const { return EndOffset - StartOffset; }
+  CharUnits getStartOffset() const { return StartOffset; }
+  CharUnits getEndOffset() const { return EndOffset; }
+
+  // Predicates
+  bool isBarrier() const { return getSize().isZero(); }
+  bool hasVolatile() const { return ContainsVolatile; }
+
+  // Create the containing access unit and install the bitfields.
+  void installUnit(CGRecordLowering &CGRL) const {
+if (!isBarrier()) {
+  // Add the storage member for the access unit to the record. The
+  // bitfields get the offset of their storage but come afterward and 
remain
+  // there after a stable sort.
+  llvm::Type *Type = CGRL.getIntNType(CGRL.Context.toBits(getSize()));
+  CGRL.Members.push_back(CGRL.StorageInfo(getStartOffset(), Type));
+  for (auto F : *this)
+if (!F->isZeroLengthBitField(CGRL.Context))
+  CGRL.Members.push_back(CGRecordLowering::MemberInfo(
+  getStartOffset(), CGRecordLowering::MemberInfo::Field, nullptr,
+  F));
+}
+  }
+};
+
+// Create an access unit of contiguous bitfields.
+BitFieldAccessUnit::BitFieldAccessUnit(RecordDecl::field_iterator FieldBegin,
+   RecordDecl::field_iterator FieldEnd,
+   const CGRecordLowering &CGRL)
+: BitFieldAccessUnit(FieldBegin, CharUnits::Zero(),
+ FieldBegin->getType().isVolatileQualified()) {
+  assert(End != FieldEnd);
+
+  uint64_t StartBit = CGRL.getFieldBitOffset(*FieldBegin);
+  uint64_t BitSize = End->getBitWidthValue(CGRL.Context);
+  unsigned CharBits = CGRL.Context.getCharWidth();
+
+  assert(!(StartBit % CharBits) && "Not at start of char");
+
+  ++End;
+  if (BitSize ||
+  !(CGRL.Context.getTargetInfo().useZeroLengthBitfieldAlignment() ||
+CGRL.Context.getTargetInfo().useBitFieldTypeAlignment()))
+// The first field is not a (zero-width) barrier. Collect contiguous 
fields.
+for (; End != FieldEnd; ++End) {
+  uin

[clang] [clang] Better bitfield access units (PR #65742)

2024-03-01 Thread John McCall via cfe-commits

https://github.com/rjmccall edited 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-03-01 Thread John McCall via cfe-commits

https://github.com/rjmccall commented:

I think using the existing information as a default is fine, but please 
continue to call the hook something like `hasCheapUnalignedAccess()` to 
preserve the intent, even if it always just calls the other hook.
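
A minimal sketch of that shape, assuming the HasUnalignedAccess flag and hasUnalignedAccess() accessor discussed elsewhere in this thread; the wrapper name follows the suggestion above:

```c++
// Generic capability query, backed by the flag promoted to TargetInfo.
bool hasUnalignedAccess() const { return HasUnalignedAccess; }

// Intent-preserving hook for bitfield access-unit layout; for now it simply
// forwards to the generic query.
bool hasCheapUnalignedAccess() const { return hasUnalignedAccess(); }
```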

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-28 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

> > 
> 
> Thinking further, several (AArch64, ARM & Loongson) targets have a 
> 'HasUnaligned' bool in their TargetInfo objects. Perhaps the way forwards is 
> to promote that to the base TargetInfo and just use that, rather than the 
> bitfield-specific field I added (which just duplicates that functionality on 
> those targets)?

I've added such a change.  The AArch64 & ARM targets record this information to 
determine whether a `__ARM_FEATURE_UNALIGNED` macro should be defined.  The 
Loongson target just needs to check the `-ual` target feature.
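
A rough sketch of the LoongArch piece, assuming the promoted HasUnalignedAccess member and the -ual feature spelling mentioned here; the exact handling in the patch may differ:

```c++
// In LoongArchTargetInfo::handleTargetFeatures(): clear the flag when the
// -ual target feature (unaligned access unsupported) is present.
for (const auto &Feature : Features)
  if (Feature == "-ual")
    HasUnalignedAccess = false;
```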


https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-28 Thread via cfe-commits

github-actions[bot] wrote:




:warning: C/C++ code formatter, clang-format found issues in your code. 
:warning:



You can test this locally with the following command:


```bash
git-clang-format --diff 8cfb71613c452dd45a84a74affe8464bfd33de02 
b4cb8945224633fef27b69f0eed34f505032f76e -- 
clang/test/CodeGen/aapcs-bitfield-access-unit.c 
clang/test/CodeGen/bitfield-access-pad.c 
clang/test/CodeGen/bitfield-access-unit.c 
clang/test/CodeGenCXX/bitfield-access-empty.cpp 
clang/test/CodeGenCXX/bitfield-access-tail.cpp 
clang/test/CodeGenCXX/bitfield-ir.cpp clang/include/clang/Basic/TargetInfo.h 
clang/lib/Basic/TargetInfo.cpp clang/lib/Basic/Targets/AArch64.cpp 
clang/lib/Basic/Targets/AArch64.h clang/lib/Basic/Targets/ARM.cpp 
clang/lib/Basic/Targets/ARM.h clang/lib/Basic/Targets/LoongArch.cpp 
clang/lib/Basic/Targets/LoongArch.h clang/lib/Basic/Targets/Mips.h 
clang/lib/Basic/Targets/PPC.h clang/lib/Basic/Targets/SystemZ.h 
clang/lib/Basic/Targets/VE.h clang/lib/Basic/Targets/WebAssembly.h 
clang/lib/Basic/Targets/X86.h clang/lib/CodeGen/CGRecordLayoutBuilder.cpp 
clang/test/CodeGen/aapcs-bitfield.c clang/test/CodeGen/arm-bitfield-alignment.c 
clang/test/CodeGen/arm64-be-bitfield.c clang/test/CodeGen/bitfield-2.c 
clang/test/CodeGen/debug-info-bitfield-0-struct.c 
clang/test/CodeGen/no-bitfield-type-align.c 
clang/test/CodeGen/struct-x86-darwin.c clang/test/CodeGenCXX/bitfield.cpp 
clang/test/OpenMP/atomic_capture_codegen.cpp 
clang/test/OpenMP/atomic_read_codegen.c 
clang/test/OpenMP/atomic_update_codegen.cpp 
clang/test/OpenMP/atomic_write_codegen.c
```





View the diff from clang-format here.


```diff
diff --git a/clang/include/clang/Basic/TargetInfo.h 
b/clang/include/clang/Basic/TargetInfo.h
index aab571e1d2..c046cab93d 100644
--- a/clang/include/clang/Basic/TargetInfo.h
+++ b/clang/include/clang/Basic/TargetInfo.h
@@ -863,9 +863,7 @@ public:
 
   /// Return true, iff unaligned accesses are a single instruction (rather than
   /// a synthesized sequence).
-  bool hasUnalignedAccess() const {
-return HasUnalignedAccess;
-  }
+  bool hasUnalignedAccess() const { return HasUnalignedAccess; }
 
   /// \brief Returns the default value of the __USER_LABEL_PREFIX__ macro,
   /// which is the prefix given to user symbols by default.

```




https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-26 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

> Yeah, LLVM supports changing subtarget options on a per-function basis. We 
> would presumably make the query based on the global setting.
> 
> Anyway, getting this information from LLVM doesn't seem tractable in the 
> short-to-medium term; it's just unfortunate.

Thinking further, several (AArch64, ARM & Loongson) targets have a 
'HasUnaligned' bool in their TargetInfo objects.  Perhaps the way forwards is 
to promote that to the base TargetInfo and just use that, rather than the 
bitfield-specific field I added (which just duplicates that functionality on 
those targets)?


https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-26 Thread Nathan Sidwell via cfe-commits


@@ -132,6 +132,7 @@ class LLVM_LIBRARY_VISIBILITY LoongArch64TargetInfo
   : LoongArchTargetInfo(Triple, Opts) {
 LongWidth = LongAlign = PointerWidth = PointerAlign = 64;
 IntMaxType = Int64Type = SignedLong;
+HasCheapUnalignedBitfieldAccess = true;

urnathan wrote:

fixed

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-25 Thread Lu Weining via cfe-commits


@@ -132,6 +132,7 @@ class LLVM_LIBRARY_VISIBILITY LoongArch64TargetInfo
   : LoongArchTargetInfo(Triple, Opts) {
 LongWidth = LongAlign = PointerWidth = PointerAlign = 64;
 IntMaxType = Int64Type = SignedLong;
+HasCheapUnalignedBitfieldAccess = true;

SixWeining wrote:

> you say set to true when -mno-unaligned-access, but I think you mean set to 
> false in that case?

Yes, I think it should be `false`. @wzssyqa 

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-23 Thread John McCall via cfe-commits

rjmccall wrote:

> > On the target hook, it's a shame we can't easily get this information from 
> > LLVM. I believe it's already there — `TargetLowering` has an 
> > `allowsMisalignedMemoryAccesses` method that includes some approximation of 
> > how fast a particular access would be. In practice, it seems to be quite 
> > complex and often specific to the type and subtarget. Maybe it'd be worth 
> > starting a conversation with LLVM folks to see if this could reasonably be 
> > lifted somewhere we could use it?
> 
> Agreed, I did look at that, and AFAICT the allowsMisalignedMemoryAccess is a 
> property of the llvm subtarget, which can, IIUC, change on a per-function 
> basis? One at least needs a function codegen context around? And, clang needs 
> to generate the llvm-structure in non-function contexts (eg namespace-scope 
> variables). I'd really like to not tangle this particular change with a 
> complex reorg.

Yeah, LLVM supports changing subtarget options on a per-function basis.  We 
would presumably make the query based on the global setting.

Anyway, getting this information from LLVM doesn't seem tractable in the 
short-to-medium term; it's just unfortunate.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-23 Thread Nathan Sidwell via cfe-commits


@@ -132,6 +132,7 @@ class LLVM_LIBRARY_VISIBILITY LoongArch64TargetInfo
   : LoongArchTargetInfo(Triple, Opts) {
 LongWidth = LongAlign = PointerWidth = PointerAlign = 64;
 IntMaxType = Int64Type = SignedLong;
+HasCheapUnalignedBitfieldAccess = true;

urnathan wrote:

you say set to true when -mno-unaligned-access, but I think you mean set to 
false in that case?

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-23 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

> On the target hook, it's a shame we can't easily get this information from 
> LLVM. I believe it's already there — `TargetLowering` has an 
> `allowsMisalignedMemoryAccesses` method that includes some approximation of 
> how fast a particular access would be. In practice, it seems to be quite 
> complex and often specific to the type and subtarget. Maybe it'd be worth 
> starting a conversation with LLVM folks to see if this could reasonably be 
> lifted somewhere we could use it?

Agreed, I did look at that, and AFAICT allowsMisalignedMemoryAccesses is a 
property of the llvm subtarget, which can, IIUC, change on a per-function 
basis? One at least needs a function codegen context around? And, clang needs 
to generate the llvm-structure in non-function contexts (eg namespace-scope 
variables). I'd really like to not tangle this particular change with a complex 
reorg.  

I had a discourse thread at, 
https://discourse.llvm.org/t/targetinfo-for-unaligned-load-capability/72832 
about such a target hook.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-21 Thread YunQiang Su via cfe-commits


@@ -132,6 +132,7 @@ class LLVM_LIBRARY_VISIBILITY LoongArch64TargetInfo
   : LoongArchTargetInfo(Triple, Opts) {
 LongWidth = LongAlign = PointerWidth = PointerAlign = 64;
 IntMaxType = Int64Type = SignedLong;
+HasCheapUnalignedBitfieldAccess = true;

wzssyqa wrote:

> > Only set it to true when -mno-unaligned-access.
> 
> In `LoongArchTargetInfo::handleTargetFeatures()`, we can handle the feature 
> `-ual`.

That may be true, but you may still need the option when building a generic 
kernel image that runs on both microarchitectures.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-21 Thread YunQiang Su via cfe-commits

https://github.com/wzssyqa edited 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-21 Thread YunQiang Su via cfe-commits

https://github.com/wzssyqa edited 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-21 Thread Lu Weining via cfe-commits


@@ -132,6 +132,7 @@ class LLVM_LIBRARY_VISIBILITY LoongArch64TargetInfo
   : LoongArchTargetInfo(Triple, Opts) {
 LongWidth = LongAlign = PointerWidth = PointerAlign = 64;
 IntMaxType = Int64Type = SignedLong;
+HasCheapUnalignedBitfieldAccess = true;

SixWeining wrote:

> Only set it to true when -mno-unaligned-access.

In `LoongArchTargetInfo::handleTargetFeatures()`, we can handle the feature 
`-ual`.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-21 Thread YunQiang Su via cfe-commits


@@ -132,6 +132,7 @@ class LLVM_LIBRARY_VISIBILITY LoongArch64TargetInfo
   : LoongArchTargetInfo(Triple, Opts) {
 LongWidth = LongAlign = PointerWidth = PointerAlign = 64;
 IntMaxType = Int64Type = SignedLong;
+HasCheapUnalignedBitfieldAccess = true;

wzssyqa wrote:

Same for MIPS, though there is no -mno-unaligned-access for MIPS yet. I will 
add support for it.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-21 Thread YunQiang Su via cfe-commits


@@ -132,6 +132,7 @@ class LLVM_LIBRARY_VISIBILITY LoongArch64TargetInfo
   : LoongArchTargetInfo(Triple, Opts) {
 LongWidth = LongAlign = PointerWidth = PointerAlign = 64;
 IntMaxType = Int64Type = SignedLong;
+HasCheapUnalignedBitfieldAccess = true;

wzssyqa wrote:

Only set it to true when -mno-unaligned-access.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-21 Thread YunQiang Su via cfe-commits

wzssyqa wrote:

> `-mno-unaligned-access` has been added since clang17: 
> https://reviews.llvm.org/D149946

This option is for LoongArch instead of MIPSr6.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-21 Thread Lu Weining via cfe-commits

SixWeining wrote:

> For GCC, we have `-mno-unaligned-access`. We also need to add this option to 
> clang.

`-mno-unaligned-access` has been added since clang17: 
https://reviews.llvm.org/D149946

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-21 Thread YunQiang Su via cfe-commits

wzssyqa wrote:

For MIPSr6, it is just like AArch64: some microarchitectures don't support 
misaligned accesses well at the hardware level, so we need an option to disable 
them; the kernel may need it.

For GCC, we have `-mno-unaligned-access`. We also need to add this option to clang.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-12 Thread John McCall via cfe-commits

rjmccall wrote:

> @rjmccall thanks for your comments. Here's a reimplementation using a single 
> loop -- the downside is that we may end up repeating the initial grouping 
> scan, if we search more than 1 Access Unit ahead and fail to find a 'good' 
> access unit. This update changes the algorithm slightly, as I realized that 
> `struct A { short a: 16; char b:8; int c;};` and `struct B { char b: 8; short 
> a:16; int c;};` should have the same (32-bit) access unit. Originally only 
> `A` got that.

Thanks.  Sorry about the delay, I'll try to take a look in the next day.

> 1. I noticed that several `CGRecordLowering` member fns could either be 
> `const` or `static` -- would you like that as a separate NFC PR?

Yes, please.

> 2. I have corrected the error about merging across zero-sized members.
> 3. It may not be obvious from GH, but this PR breaks down to 3 (or 4) 
> separate commits:
>a) A set of test cases, marked up for the current scheme
>b) A new Target hook indicating whether unaligned accesses are cheap
>c) [optional] the CGRecordLowering change just mentioned
>d) the algorithm change, which updates those tests and makes it easier to 
> see how the behaviour changes.
>Do you want that commit structure maintained?

That sounds like good commit structure.

On the target hook, it's a shame we can't easily get this information from 
LLVM.  I believe it's already there — `TargetLowering` has an 
`allowsMisalignedMemoryAccesses` method that includes some approximation of how 
fast a particular access would be.  In practice, it seems to be quite complex 
and often specific to the type and subtarget.  Maybe it'd be worth starting a 
conversation with LLVM folks to see if this could reasonably be lifted 
somewhere we could use it?

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-02-02 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

@rjmccall thanks for your comments.  Here's a reimplementation using a single 
loop -- the downside is that we may end up repeating the initial grouping scan, 
if we search more than 1 Access Unit ahead and fail to find a 'good' access 
unit.  This update changes the algorithm slightly, as I realized that `struct A 
{ short a: 16; char b:8; int c;};` and `struct B { char b: 8; short a:16; int 
c;};` should have the same (32-bit) access unit.  Originally only `A` got that.
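
For reference, the two layouts being compared (only the conclusion stated above is asserted; exact offsets are left out):

```c++
// Under the updated algorithm both structs are expected to use a single
// 32-bit access unit for their bit-fields, independent of declaration order.
struct A { short a : 16; char b : 8; int c; };
struct B { char b : 8; short a : 16; int c; };
```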

1) I noticed that several `CGRecordLowering` member fns could either be `const` 
or `static` -- would you like that as a separate NFC PR?
2) I have corrected the error about merging across zero-sized members.
3) It may not be obvious from GH, but this PR breaks down to 3 (or 4) separate 
commits:
   a) A set of test cases, marked up for the current scheme
   b) A new Target hook indicating whether unaligned accesses are cheap
   c) [optional] the CGRecordLowering change just mentioned
   d) the algorithm change, which updates those tests and makes it easier to 
see how the behaviour changes.
   Do you want that commit structure maintained?

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-01-19 Thread Nathan Sidwell via cfe-commits


@@ -376,33 +377,41 @@ void CGRecordLowering::lowerUnion(bool isNoUniqueAddress) 
{
 }
 
 void CGRecordLowering::accumulateFields() {
-  for (RecordDecl::field_iterator Field = D->field_begin(),
-  FieldEnd = D->field_end();
-Field != FieldEnd;) {
+  RecordDecl::field_iterator FieldEnd = D->field_end();
+  RecordDecl::field_iterator BitField = FieldEnd;
+  for (RecordDecl::field_iterator Field = D->field_begin(); Field != FieldEnd;
+   ++Field) {
 if (Field->isBitField()) {
-  RecordDecl::field_iterator Start = Field;
-  // Iterate to gather the list of bitfields.
-  for (++Field; Field != FieldEnd && Field->isBitField(); ++Field);
-  accumulateBitFields(Start, Field);
+  if (BitField == FieldEnd)
+// Start gathering bitfields
+BitField = Field;
 } else if (!Field->isZeroSize(Context)) {
   // Use base subobject layout for the potentially-overlapping field,
   // as it is done in RecordLayoutBuilder
+  CharUnits Offset = bitsToCharUnits(getFieldBitOffset(*Field));

urnathan wrote:

ah, thanks -- I should have checked harder.  Simple enough to undo that bit.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-01-08 Thread John McCall via cfe-commits


@@ -442,79 +455,235 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
 return;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Such overlap, in the
+  // absence of packing, is always complete -- one storage unit is entirely
+  // within another. However, llvm cannot represent that -- its structures are
+  // entirely flat. We place bitfields in 'access units', which are similar to
+  // the SysV storage units, but a clang-specific concept.
+
+  // It can be advantageous to concatenate two adjacent access units, if the
+  // concatenation can be read or written in a single instruction.
+
+  // We do two passes.
+
+  // a) allocate bitfields into the smallest access units they can
+  // fit. This results in a set of integral-typed access units.
+
+  // b) concatenate mergeable access units. This applies the
+  // above-mentioned optimization, and in general, requires lookahead
+  // to know the next access unit -- not merely the next bitfield.
+
+  class AccessUnit {
+// Which bitfields are in the access
+RecordDecl::field_iterator First;
+RecordDecl::field_iterator Last;
+
+CharUnits Start; // Starting offset within the record
+CharUnits End;   // Finish offset (exclusive) within the record
+
+bool ContainsVolatile = false;
+
+  public:
+AccessUnit(RecordDecl::field_iterator F, RecordDecl::field_iterator L,
+   CharUnits S, CharUnits E, bool HasVolatile = false)
+: First(F), Last(L), Start(S), End(E), ContainsVolatile(HasVolatile) {}
+AccessUnit(RecordDecl::field_iterator F, CharUnits Place)
+: AccessUnit(F, F, Place, Place) {}
+
+  public:
+auto begin() const { return First; }
+auto end() const { return Last; }
+
+  public:
+void MergeFrom(AccessUnit const &Earlier) {
+  First = Earlier.First;
+  Start = Earlier.Start;
+}
+
+  public:
+// Accessors
+CharUnits getSize() const { return getSize(*this); }
+CharUnits getStart() const { return Start; }
+CharUnits getSize(const AccessUnit &NotEarlier) const {
+  return NotEarlier.End - Start;
+}
+
+// Setter
+void setEnd(CharUnits E) { End = E; }
+
+// Predicates
+bool isBarrier() const { return getSize().isZero(); }
+bool hasVolatile() const { return ContainsVolatile; }
+
+bool StartsBefore(CharUnits Offset, bool NonStrict = false) const {
+  if (NonStrict)
+// Not strictly <, permit ==
+++Offset;
+  return Start < Offset;
+}
+bool ExtendsBeyond(CharUnits Offset) const { return End > Offset; }
+bool EndsAt(CharUnits Offset) const { return End == Offset; }
   };

rjmccall wrote:

This is definitely big enough that I would say to pull it out of the function 
body.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-01-08 Thread John McCall via cfe-commits


@@ -415,12 +424,16 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
   uint64_t StartBitOffset, Tail = 0;
   if (isDiscreteBitFieldABI()) {
 for (; Field != FieldEnd; ++Field) {
-  uint64_t BitOffset = getFieldBitOffset(*Field);
+  if (!Field->isBitField()) {
+assert(Field->isZeroSize(Context) && "non-zero sized non-bitfield");
+continue;
+  }

rjmccall wrote:

Per my comment above, this is unnecessary; this should never be possible.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-01-08 Thread John McCall via cfe-commits


@@ -442,79 +455,235 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
 return;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Such overlap, in the
+  // absence of packing, is always complete -- one storage unit is entirely
+  // within another. However, llvm cannot represent that -- its structures are
+  // entirely flat. We place bitfields in 'access units', which are similar to
+  // the SysV storage units, but a clang-specific concept.
+
+  // It can be advantageous to concatenate two adjacent access units, if the
+  // concatenation can be read or written in a single instruction.
+
+  // We do two passes.
+
+  // a) allocate bitfields into the smallest access units they can
+  // fit. This results in a set of integral-typed access units.
+
+  // b) concatenate mergeable access units. This applies the
+  // above-mentioned optimization, and in general, requires lookahead
+  // to know the next access unit -- not merely the next bitfield.

rjmccall wrote:

As far as I can tell, this doesn't really require doing two passes; it's still 
an online algorithm, just over first-phase access units instead of individual 
bit-fields.  You could take your existing code to build up those first-phase 
units and then just call into your merge-or-emit logic whenever you finish one 
instead of adding it to a vector.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-01-08 Thread John McCall via cfe-commits


@@ -442,79 +455,235 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
 return;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Such overlap, in the
+  // absence of packing, is always complete -- one storage unit is entirely
+  // within another. However, llvm cannot represent that -- its structures are
+  // entirely flat. We place bitfields in 'access units', which are similar to
+  // the SysV storage units, but a clang-specific concept.
+
+  // It can be advantageous to concatenate two adjacent access units, if the
+  // concatenation can be read or written in a single instruction.
+
+  // We do two passes.
+
+  // a) allocate bitfields into the smallest access units they can
+  // fit. This results in a set of integral-typed access units.

rjmccall wrote:

It's understood that the access units will be integer-typed.  You should 
explain what you mean by "can fit", though.  It looks like: bit-fields are 
merged whenever they start at a non-exact byte boundary.
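
For example (an assumed struct, purely to illustrate that phrasing):

```c++
// 'b' starts at bit 3, i.e. not at a byte boundary, so it shares a byte with
// 'a' and they must land in the same first-phase access unit; 'c' starts at
// the next byte boundary and begins a new unit.
struct S {
  unsigned a : 3;
  unsigned b : 5; // bits 3-7, shares byte 0 with 'a'
  unsigned c : 8; // bits 8-15, starts a new access unit
};
```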

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-01-08 Thread John McCall via cfe-commits


@@ -442,79 +455,235 @@ 
CGRecordLowering::accumulateBitFields(RecordDecl::field_iterator Field,
 return;
   }
 
-  // Check if OffsetInRecord (the size in bits of the current run) is better
-  // as a single field run. When OffsetInRecord has legal integer width, and
-  // its bitfield offset is naturally aligned, it is better to make the
-  // bitfield a separate storage component so as it can be accessed directly
-  // with lower cost.
-  auto IsBetterAsSingleFieldRun = [&](uint64_t OffsetInRecord,
-  uint64_t StartBitOffset) {
-if (!Types.getCodeGenOpts().FineGrainedBitfieldAccesses)
-  return false;
-if (OffsetInRecord < 8 || !llvm::isPowerOf2_64(OffsetInRecord) ||
-!DataLayout.fitsInLegalInteger(OffsetInRecord))
-  return false;
-// Make sure StartBitOffset is naturally aligned if it is treated as an
-// IType integer.
-if (StartBitOffset %
-Context.toBits(getAlignment(getIntNType(OffsetInRecord))) !=
-0)
-  return false;
-return true;
+  // The SysV ABI can overlap bitfield storage units with both other bitfield
+  // storage units /and/ other non-bitfield data members. Such overlap, in the
+  // absence of packing, is always complete -- one storage unit is entirely
+  // within another. However, llvm cannot represent that -- its structures are
+  // entirely flat. We place bitfields in 'access units', which are similar to
+  // the SysV storage units, but a clang-specific concept.

rjmccall wrote:

Hmm.  I'm not really sure this is right — it seems to be contradicted by 
`struct { char array[5]; int bitfield: 24; };` — but I don't think it matters 
anyway.

If I had to explain the philosophy here, I would say something like:

> We split runs of bit-fields into a sequence of "access units".  When we emit 
> a load or store of a bit-field, we'll load/store the entire containing access 
> unit.  The standard requires that these loads and stores must not interfere 
> with accesses to other memory locations, and it defines the bit-field's 
> memory location as the current run of non-zero-width bit-fields.  So an 
> access unit must never overlap with non-bit-field storage or cross a 
> zero-width bit-field.  Otherwise, we're free to draw the lines as we see fit.
>
> Drawing these lines well can be complicated. LLVM generally can't modify a 
> program to access memory that it didn't before, so using very narrow access 
> units can prevent the compiler from using optimal access patterns.  For 
> example, suppose a run of bit-fields occupies four bytes in a struct.  If we 
> split that into four 1-byte access units, then a sequence of assignments that 
> doesn't touch all four bytes may have to be emitted with multiple 8-bit 
> stores instead of a single 32-bit store. On the other hand, if we use very 
> wide access units, we may find ourselves emitting accesses to bit-fields we 
> didn't really need to touch, just because LLVM was unable to clean up after 
> us.

And then you can get into explaining your algorithm.
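For concreteness, the narrow-vs-wide tradeoff sketched in the quoted text can be 
illustrated with a small struct (the type, field names and function here are 
invented for illustration, not taken from the patch):

```
// Four bit-fields occupying four bytes in total.
struct Flags {
  unsigned a : 8;
  unsigned b : 8;
  unsigned c : 8;
  unsigned d : 8;
};

void update(Flags &f) {
  // With four separate 1-byte access units these three assignments need three
  // distinct byte stores. With a single 4-byte access unit they can become one
  // 32-bit load, some masking, and one 32-bit store -- but then touching only
  // 'a' also reads and rewrites the bytes holding b, c and d.
  f.a = 1;
  f.b = 2;
  f.c = 3;
}
```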

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-01-08 Thread John McCall via cfe-commits


@@ -376,33 +377,41 @@ void CGRecordLowering::lowerUnion(bool isNoUniqueAddress) 
{
 }
 
 void CGRecordLowering::accumulateFields() {
-  for (RecordDecl::field_iterator Field = D->field_begin(),
-  FieldEnd = D->field_end();
-Field != FieldEnd;) {
+  RecordDecl::field_iterator FieldEnd = D->field_end();
+  RecordDecl::field_iterator BitField = FieldEnd;
+  for (RecordDecl::field_iterator Field = D->field_begin(); Field != FieldEnd;
+   ++Field) {
 if (Field->isBitField()) {
-  RecordDecl::field_iterator Start = Field;
-  // Iterate to gather the list of bitfields.
-  for (++Field; Field != FieldEnd && Field->isBitField(); ++Field);
-  accumulateBitFields(Start, Field);
+  if (BitField == FieldEnd)
+// Start gathering bitfields
+BitField = Field;
 } else if (!Field->isZeroSize(Context)) {
   // Use base subobject layout for the potentially-overlapping field,
   // as it is done in RecordLayoutBuilder
+  CharUnits Offset = bitsToCharUnits(getFieldBitOffset(*Field));

rjmccall wrote:

This is permitting accesses to cross a regular field if it has zero size.  That 
is not allowed; bit-fields are only part of the same memory location if they 
are part of a consecutive sequence of non-zero-width bit-fields, and bit-fields 
in different memory locations must never interfere.
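A hedged illustration of the rule being cited -- the struct is invented, and the 
layout reasoning is just the standard "memory location" definition, nothing 
specific to this patch:

```
struct Empty {};

struct S {
  int a : 8;
  [[no_unique_address]] Empty e; // zero-size non-bit-field member
  int b : 8;
  int c : 8;
  int   : 0;                     // zero-width bit-field also ends a sequence
  int d : 8;
};
// 'a' is one memory location; 'b' and 'c' form another (a consecutive run of
// non-zero-width bit-fields); 'd' is a third. An access unit must not span
// 'e' or the ':0' barrier, even if the underlying bits happen to be adjacent.
```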

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2024-01-05 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

Can someone please review this?

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-12-22 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

ping?

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-12-10 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

Rebased

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-12-10 Thread via cfe-commits

llvmbot wrote:



@llvm/pr-subscribers-backend-webassembly

@llvm/pr-subscribers-clang

Author: Nathan Sidwell (urnathan)


Changes

Reimplement the SysV (non-Microsoft) bitfield access unit computation. For 
avoidance of doubt, this is not an ABI change; it only changes the memory 
accesses Clang emits to access bitfields.

This is a stack of two commits, to make it easier to see the change in 
behaviour.  The first commit is a set of new tests that verify the placement of 
access units under the current algorithm.  The second commit is the change, and 
alters those tests as required. The guts of the change is in 
`clang/lib/CodeGen/CGRecordLayoutBuilder.cpp`.

The SysV ABI describes placing bitfields in 'storage units', which may end up 
overlapping each other or other data members (for instance 'int bf :4;' may be 
in a 32-bit storage unit that overlaps a plain 'char c;' field). LLVM doesn't 
represent structures at that level, and Clang
must create a non-overlapping representation of the structure. That's what's 
happening here. To avoid confusion, I'm using the phrase 'access unit' to refer 
to the integer type Clang uses.

There are two aspects to choosing access units:

1) bitfields that share bits of a single byte must be in the same access unit 
-- you can't touch only one of those bitfields. This computation is unchanged.

2) adjacent bitfields that occupy distinct bytes might want to share an access 
unit. One wants to do this if that makes for a better access size. This 
computation is changed.

The current algorithm computes both of these in a single pass, determining 
whether the next bitfield extends the current access unit by requiring it to 
exactly abut the previous one, along with a few other constraints. But there 
are some issues with this:

a) If there's even a single bit of padding between the two, no concatenation 
occurs. This seems suboptimal.

b) Because of the above-mentioned overlapping, bitfield access units might not 
start on an aligned boundary, and some architectures do not support unaligned 
accesses.

c) Likewise, the access unit might not be able to be a natural 2^n size, and 
the adjacent bytes are occupied (either with other data members or, with C++, a 
derived class occupying tail-padding bytes). We must emit exactly a (say) 
24-bit access (to avoid aliasing issues?) (example below).

d) Concatenating access units can lead to an unaligned larger access, and 
again, that's problematic for a strict alignment ISA. (Consider `struct C { char 
a : 8; char b : 8;};` -- `C` has alignment 1, but a 2-byte access unit would 
naturally want 2-byte alignment.)

My understanding is that larger access units were desired, to reduce the total 
number of accesses, and there was an assertion that the access would be 
narrowed if only some of the bytes changed.

That's not what we've observed, and it's demonstrable on x86_64:
```
struct B {
  int a : 16;
  int b : 8;
  char c;
};
```
Here, B::a and B::b could be placed into a 3 byte access unit, but we can't use 
4-byte accesses due to the 4th byte containing `c`. The code we end up 
generating for an increment of B::b is:
```
movzbl  2(%rdi), %eax
shll    $16, %eax
movzwl  (%rdi), %ecx
addl    %ecx, %eax
addl    $65536, %eax            # imm = 0x10000
movw    %ax, (%rdi)
shrl    $16, %eax
movb    %al, 2(%rdi)
```

we're loading both `B::a` and `B::b`, sticking them together, incrementing 
`B::b` and then only storing the latter. I.e. we've reduced the memory 
accesses, but we're doing more work than necessary. Without the `B::c` field, 
there's 1 byte of padding, and a regular 4-byte access can be used (and is). 
X86_64 is not unique in this behaviour.
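
For reference, the source-level operation whose codegen is shown above is just 
an increment of the bitfield (the wrapper function name is invented):

```
// Assumes the 'struct B' defined above.
void inc_b(B *p) { p->b++; }
```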

I'm of the opinion that one cannot do these in a single pass -- concatenating 
access units fundamentally requires knowing how large both units are, and you 
cannot tell that just by looking at the first bitfield of the second access 
unit.

That's what this patch does: it splits the computation into two separate phases. 
We first determine which bitfields /must/ be in a single access unit, and then 
determine which of those access units could be concatenated. We pay attention 
to the storage-size of the integral type -- an i24 naturally requires 4 bytes 
of storage. When generating the access units we note barriers such as `:0` 
bitfields (on ABIs that care), and the starting location of the next 
non-bitfield or the structure tail-padding.

The concatenation only occurs if:

a) The larger access is naturally aligned, or unaligned access is cheap.

b) The larger access storage size doesn't expand into a forbidden zone

c) The larger access is not bigger than a register.

d) Neither of the two access units contain a volatile bitfield.

(Note that if an access-unit's storage-size naturally overlaps a subsequent 
access unit, we'll merge the two, provided the above conditions hold. A partial 
overlap can occur with packed structures and is handled too.)

ETA: in essence, it only merges ac
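
To restate conditions (a)-(d) more concretely, here is a rough sketch of the 
concatenation test. This is my paraphrase of the description above, not the 
code from the patch; the types and names are invented, and the handling of 
storage size is simplified:

```
#include <cstdint>

struct AccessUnit {
  uint64_t BeginByte;    // first byte the unit occupies
  uint64_t EndByte;      // one past the last byte it occupies
  bool ContainsVolatile; // any volatile bitfield inside?
};

struct TargetDesc {
  uint64_t RegisterBytes;    // widest single access we want to emit
  bool CheapUnalignedAccess; // are misaligned accesses acceptable?
};

// Can access units A and B (A before B) be concatenated?  BarrierByte is the
// offset of the next non-bitfield data (or tail padding) that the merged
// unit's storage must not grow into.
bool mayConcatenate(const AccessUnit &A, const AccessUnit &B,
                    uint64_t BarrierByte, const TargetDesc &T) {
  if (A.ContainsVolatile || B.ContainsVolatile)
    return false;                              // (d) no volatile bitfields
  uint64_t Size = B.EndByte - A.BeginByte;     // bytes the merged unit spans
  if (Size > T.RegisterBytes)
    return false;                              // (c) no bigger than a register
  if (A.BeginByte + Size > BarrierByte)
    return false;                              // (b) don't grow into forbidden bytes
  bool NaturallyAligned =
      (Size & (Size - 1)) == 0 && A.BeginByte % Size == 0;
  return NaturallyAligned || T.CheapUnalignedAccess; // (a) aligned, or unaligned is cheap
}
```

The real code also has to round the merged size up to the storage size of the 
integral type it would use, which is what the i24-needs-4-bytes point above is 
about.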

[clang] [clang] Better bitfield access units (PR #65742)

2023-12-08 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

> I'm planning to take a closer look at this when I have more time. Sorry I 
> haven't been more responsive here.

When might that be?


https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-12-01 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

You knew this ping was coming, right?

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-11-14 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

thanks @yonghong-song!

Taking the opportunity for a ping :)

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-10-30 Thread via cfe-commits

yonghong-song wrote:

With this pull request, I tried to build the linux kernel and the kernel bpf 
selftest and run the selftest, and didn't find any issues. So presumably the 
implementation is correct.

I tried a particular kernel bpf selftest:
  
https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/progs/test_tunnel_kern.c#L39-L42
which has a few bitfield usages.
The generated code is the same with or without this change.
The bitfield patterns in the above test_tunnel_kern.c are pretty typical in 
the kernel, which favors aligned bitfields. This patch tries to handle some 
cases better for not-commonly-used bitfield patterns; those patterns (like the 
one in the pull request summary) are very rare in bpf programs or the kernel, 
so I guess it should have minimal impact on the BPF backend.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-10-30 Thread Nathan Sidwell via cfe-commits

urnathan wrote:


> One very brief note: in the comments in the code, you might want to 
> distinguish between the semantic width of a bitfield (i.e. the C standard 
> notion of a "memory location", which has ABI significance), vs. the accesses 
> we choose to generate (we don't need to generate no-op reads/writes).

Indeed, the naming here is unfortunately overloaded. SysV psABIs use 'Storage 
Unit' (at least x86 does, and IIRC others follow the same nomenclature).  But 
Clang also uses 'StorageInfo' and 'Storage$FOO' in its bitfield structures, 
which is unfortunate.  I've used 'Access Unit' to describe the latter in this 
patch.  If you meant something else, please clarify (or if I've missed some 
places?).


https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-10-27 Thread Eli Friedman via cfe-commits

efriedma-quic wrote:

I'm planning to take a closer look at this when I have more time.  Sorry I 
haven't been more responsive here.

One very brief note: in the comments in the code, you might want to distinguish 
between the semantic width of a bitfield (i.e. the C standard notion of a 
"memory location", which has ABI significance), vs. the accesses we choose to 
generate (we don't need to generate no-op reads/writes).

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-10-13 Thread Nigel Perks via cfe-commits

nigelp-xmos wrote:

XCore: yes the current setting of the new flag, the default false value, is 
right for XCore. Misaligned loads can generate a function call. We do not have 
benchmarks as such, but comparing the code generated for the test files touched 
here, there is an improvement. In particular bitfield-ir.cpp is a big 
improvement.

In bitfield.cpp, two misaligned loads remain. One disappears if I remove the 
struct packed attribute. The other remains: N0::read31(). This is not a 
problem, and is not a degradation; it's a function call intended for the 
purpose; just mentioning for information.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-10-13 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

time for another ping, I think ...

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-09-22 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

Ping?  Anything more I can do to progress this?

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-09-18 Thread Nathan Sidwell via cfe-commits

https://github.com/urnathan edited 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-09-11 Thread Nathan Sidwell via cfe-commits

https://github.com/urnathan ready_for_review 
https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-09-11 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

thanks for your comments,

> When I refer to CSE/DSE, I'm mostly talking about keeping values in registers 
> for longer. They don't know anything about individual fields in bitfields. If 
> we split bitfields too narrowly, we end up with extra memory accesses for 
> accessing multiple adjacent fields. And if you have accesses which overlap 
> (access some, but not all, of the same memory), CSE/DSE get much worse.

To be clear, by 'accesses which overlap' do you mean accesses to bitfields that 
share parts of a byte? I agree -- that's an unchanged part of the algorithm: 
bitfields that share parts of a single byte must be in the same access unit.

Or do you mean an access to a non-bitfield adjacent to a bitfield? How does 
that square with the memory model you pointed to earlier?  Or is it that if you 
see both such accesses, it must be safe to merge them, or there's some UB 
already happening? But anyway, this algorithm doesn't affect that -- in both 
before and after cases something outside of bitfieldAccumulation must be 
merging them.

> Having a bitfield unit that can't be loaded in a single memory access 
> probably doesn't have much benefit, if there's a natural boundary we can use 
> to split.

I think we're agreeing here.  The current algorithm generates access units that 
can require multiple accesses, even on a relaxed alignment machine (I refer 
back to the x86 example).  This replacement tries much harder to not do that -- 
it only merges access units if that demonstrably lowers the number of accesses 
needed for the merged access unit from accessing the original set of units.  
The motivating case we had was a sequence of 4 byte loads, to fetch an 
unaligned 32-bit access unit, 1 byte of that being fiddled with, and then that 
one byte being written back.
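
A hedged sketch of the shape of that motivating case -- the struct and function 
are invented; the point is an access unit that is both unaligned and wider than 
the field actually being modified:

```
struct M {
  char tag;        // pushes the bitfield run to offset 1
  unsigned a : 8;
  unsigned b : 8;
  unsigned c : 16; // the run spans 4 bytes starting at offset 1
} __attribute__((packed));

// On a strict-alignment target, fetching the whole 4-byte unit at offset 1
// needs a sequence of byte loads, even though only 'b' is modified and only
// its byte is written back.
void poke(M *m) { m->b ^= 1; }
```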

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-09-11 Thread Eli Friedman via cfe-commits

efriedma-quic wrote:

> Hm, I've been thinking of it the opposite way round -- merging bitfields is 
> throwing away information (about where cuts might be).

In essence, late optimizations can make accesses narrower, but they can't make 
them wider.  The information about cuts is generally recoverable by analyzing 
the masking instructions.

--

When I refer to CSE/DSE, I'm mostly talking about keeping values in registers 
for longer.  They don't know anything about individual fields in bitfields.  If 
we split bitfields too narrowly, we end up with extra memory accesses for 
accessing multiple adjacent fields.  And if you have accesses which overlap 
(access some, but not all, of the same memory), CSE/DSE get much worse.

Having a bitfield unit that can't be loaded in a single memory access probably 
doesn't have much benefit, if there's a natural boundary we can use to split.

> Certainly, but could you clarify why changing the access patterns could do 
> that? My understanding of Poison is that should they cause a user-visible 
> state change, that's indicative of the original progam's UB, and the 
> languages in question don't guarantee anything at that point -- nasal demons 
> and all. That the observed behaviour of a UB-exhibiting program changes is 
> not a defect. What am I missing?

Depending on how we fix the interaction between poison and bitfields, there 
could be an invariant that accessing any bitfield has to freeze all adjacent 
bitfield values.  If we do that, then the set of "adjacent" values actually 
matters.  There's probably some way we can make this work without actually 
introducing memory accesses, though; probably not worth worrying about too much 
here.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-09-11 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

> The advantage of exposing the wide accesses to the optimizer is that it 
> allows memory optimizations, like CSE or DSE, to reason about the entire 
> bitfield as a single unit, and easily eliminate accesses. Reducing the size 
> of bitfield accesses in the frontend is basically throwing away information.

Hm, I've been thinking of it the opposite way round -- merging bitfields is 
throwing away information (about where cuts might be).  And it's unclear to me 
how CSE or DSE could make use of a merged access unit to eliminate accesses -- 
it would seem to me that a merged access unit accessed at a single point would 
make it look like the whole unit was live? (Can you point me at an example of 
the analysis you describe happening?)

That the simple x86 example I showed doesn't show (complete) undoing of the 
merging suggests it is hard for CSE and DSE to do the analysis you indicate. 
DSE did work there, to undo the merge, but there's no dead load elimination 
happening. But, that DSE is merely undoing the gluing that the front end did -- 
if we didn't glue, then it would always happen.

> The disadvantage is, as you note here, that sometimes codegen doesn't manage 
> to clean up the resulting code well.

My contention is that the current algorithm both (a) fails to merge some 
mergeable access units and (b) inappropriately merges some access units 
*especially* on strict alignment machines.

> I guess my primary question here is, instead of making clang try to guess 
> which accesses will be optimal, can we improve the way the LLVM code 
> generator handles the patterns currently generated by clang? I'm not exactly 
> against changing the IR generated by clang, but changing the IR like this 
> seems likely to improve some cases at the expense of others.

In the testcases that were changed I did not encounter one that generated worse 
code (an ARM testcase showed better code due to, IIRC, not merging more than a 
register width, as with the x86 case it didn't eliminate unnecessary loads 
whereas with this those are gone). I would be very interested in seeing cases 
that degrade though.


https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-09-11 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

> > > Note that changing the memory accesses performed by clang (load or store) 
> > > _is_ an ABI change at IR level because of UB.
> > 
> > 
> > Could you clarify why UB is a consideration here?
> 
> There's some weirdness with poison values; see 
> https://reviews.llvm.org/D129541 etc. This is mostly an IR modeling thing; 
> the machine instructions work fine, we just need to make sure we don't 
> accidentally poison IR values that aren't supposed to be poison.

Certainly, but could you clarify why changing the access patterns could do that? 
 My understanding of Poison is that should they cause a user-visible state 
change, that's indicative of the original progam's UB, and the languages in 
question don't guarantee anything at that point -- nasal demons and all. That 
the observed behaviour of a UB-exhibiting program changes is not a defect.  
What am I missing?


https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-09-11 Thread Nathan Sidwell via cfe-commits

urnathan wrote:


> Also, please have a look at the existing `-ffine-grained-bitfield-accesses` 
> flag and the discussions around it and IPO.

That flag's behaviour is unchanged here -- it continues to not merge access 
units. My contention here is that (a) merging units can be advantageous, but 
(b) the existing algorithm has some deficiencies.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-09-08 Thread Eli Friedman via cfe-commits

efriedma-quic wrote:

> > Note that changing the memory accesses performed by clang (load or store) 
> > _is_ an ABI change at IR level because of UB.
> 
> Could you clarify why UB is a consideration here?

There's some weirdness with poison values; see https://reviews.llvm.org/D129541 
etc.  This is mostly an IR modeling thing; the machine instructions work fine, 
we just need to make sure we don't accidentally poison IR values that aren't 
supposed to be poison.

https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang] Better bitfield access units (PR #65742)

2023-09-08 Thread Nathan Sidwell via cfe-commits

urnathan wrote:

> Note that changing the memory accesses performed by clang (load or store) 
> _is_ an ABI change at IR level because of UB.

Could you clarify why UB is a consideration here?


https://github.com/llvm/llvm-project/pull/65742
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

