On 14/11/2025 10:43, Alex Bennée wrote:
Yodel Eldar <[email protected]> writes:

(add Florian to CC)

On 11/11/2025 04:54, Alex Bennée wrote:
The datasheet doesn't explicitly say that TXFR_LEN has to be word
aligned but the fact there is a DMA_D_WIDTH flag to select between 32
bit and 128 bit strongly implies that is how it works. The downstream

At the bottom of page 38, the datasheet [1] states "the DMA can deal
with byte aligned transfers and will minimise bus traffic by buffering
and packing misaligned accesses."

IIUC, the *_WIDTH info fields are implied as maxima.

[1] https://datasheets.raspberrypi.com/bcm2835/bcm2835-peripherals.pdf

That reads ambiguously - you could start a misaligned n*WIDTH transfer
and the hardware will write bytes until aligned?

If it does indeed work with byte accesses maybe we can just do:

   if (xlen & 0x3) {
     .. do one byte ..
     xlen -= 1;
   } else {
     .. existing 32 bit code ..
   }

but I guess we need to handle unaligned accesses as well.

Florian,

Can you help clarify what the datasheet means here?

Thanks,

<snip>

rpi kernel also goes to efforts to not write sub-4 byte lengths so
let's:


Sorry for the delayed response: I was reviewing the datasheet and,
failing to find clarity there, familiarizing myself with the AXI
protocol spec and relevant driver code. Unfortunately, I'm still
uncertain about how the DMA controller handles unaligned values in
TXFR_LEN.XLENGTH, but below I've listed some of my abridged findings in
the hope that it may be of some help. Given my ambivalence regarding
the answer to the question, I defer to Alex and the community.

In line with Peter's comment, the AXI spec explicitly describes
support for transfers with unaligned start addresses [1] but doesn't
appear to require support for transfers with unaligned ends, although
it doesn't preclude them either.

On the other hand, we know the BCM2835 supports write strobes at least
for triggering cache prefetching [2], and that this "allows memory
structures to be implemented that can be written using byte and half
word accesses" [3]. The specification also states "[a]ll [manager]
interfaces and interconnect must provide correct write strobes" and
that subordinate components can choose to: fully use them, ignore them
(i.e., "treat all writes as being the full data bus width"), or detect
unsupported write strobe combinations and provide an error response.
Moreover, "[a]ny [subordinate] component that is providing memory-like
behavior must fully support write strobes" [3]. To me, this suggests
that the BCM2835 has what it needs to be able to invalidate the bytes
past the TXFR_LEN bytes in the final beat of a DMA write transaction...
but IIUC it's possible to forgo that and remain AXI-compliant.
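
To make the strobe argument concrete, here is a minimal sketch of which
byte lanes a manager would enable in the final beat of a write. It
assumes a power-of-two data bus of at most 32 bytes and a non-zero
length; last_beat_wstrb is a hypothetical helper for illustration and
does not come from the AXI spec, the datasheet, or QEMU.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns the write-strobe mask (bit i set means
 * byte lane i is written) for the last beat of a write of `len` bytes
 * starting at `addr` on a bus that is `bus_bytes` wide.
 */
static uint32_t last_beat_wstrb(uint64_t addr, uint32_t len,
                                unsigned bus_bytes)
{
    /* Assumptions of this sketch: non-empty transfer, power-of-two bus. */
    assert(len > 0);
    assert(bus_bytes != 0 && bus_bytes <= 32);
    assert((bus_bytes & (bus_bytes - 1)) == 0);

    uint64_t end = addr + len;  /* one past the last byte written */
    uint64_t beat_base = (end - 1) & ~(uint64_t)(bus_bytes - 1);

    /* First enabled lane: non-zero only if the write starts in this beat. */
    unsigned lo = addr > beat_base ? (unsigned)(addr - beat_base) : 0;
    /* One past the last enabled lane, in the range 1..bus_bytes. */
    unsigned hi = (unsigned)(end - beat_base);

    uint32_t upto_hi = hi >= 32 ? 0xffffffffu : ((1u << hi) - 1);
    uint32_t below_lo = (1u << lo) - 1;

    return upto_hi & ~below_lo;
}
```

With a 4-byte bus, a 6-byte write starting at address 0 would end with
strobes 0b0011, so the upper two lanes of the last beat are left
untouched.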

Section 4.2 of {A}, Burst Length, also mentions the use of write strobes
for partial write transactions and extends it to cover read transactions
via discarding, too:

    No component can terminate a burst early to reduce the number of
    data transfers. During a write burst, the [manager] can disable
    further writing by deasserting all the write strobes, but it
    must complete the remaining transfers in the burst. During a
    read burst, the [manager] can discard further read data, but it
    must complete the remaining transfers in the burst. [4]

As a counterpoint, however, perhaps only whole transfers of a read
transaction can be discarded, or the BCM2835 forewent it altogether.

Narrow Transfers, section 9.3, also mentions the generation of transfers
that are narrower than the data bus of the manager [5].

All of the above are taken from the AXI spec, but as mentioned earlier
some of these features are opt-in, and the BCM2835 could have skipped
them. So, lastly, let's consider TXFR_LEN from the datasheet [6]: It's
described as "specif[ying] the amount of data to be transferred in
bytes." Moreover, all of the bits of its bitfield are accounted for,
including TXFR_LEN[31:30]: "Reserved - Write as 0, read as don't care."
It could be an unfortunate oversight that the authors didn't describe
TXFR_LEN[1:0] with the same description, or it could be intentional. If
XLENGTH's two LSBs are always 0 (or just ignored), then couldn't
TXFR_LEN simply indicate the number of transfers instead of the total
number of bytes transferred in a transaction, like YLENGTH in 2D mode?
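
For reference, the register can be decoded with a small sketch like the
one below. The field layout (XLENGTH in bits 15:0 and YLENGTH in bits
29:16 in 2D mode, a 30-bit XLENGTH otherwise) is one reading of the
TXFR_LEN table in {B}; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the decoded TXFR_LEN fields. */
struct txfr_len_fields {
    uint32_t xlength;           /* transfer length in bytes */
    uint32_t ylength;           /* 2D mode only, otherwise 0 */
    bool xlength_word_aligned;  /* true if XLENGTH[1:0] == 0 */
};

/* Hypothetical helper: split a raw TXFR_LEN value into its fields. */
static void decode_txfr_len(uint32_t reg, bool tdmode,
                            struct txfr_len_fields *out)
{
    assert(out != NULL);

    if (tdmode) {
        /* 2D mode: 16-bit XLENGTH, 14-bit YLENGTH. */
        out->xlength = reg & 0xffffu;
        out->ylength = (reg >> 16) & 0x3fffu;
    } else {
        /* Linear mode: YLENGTH bits extend XLENGTH to 30 bits. */
        out->xlength = reg & 0x3fffffffu;
        out->ylength = 0;
    }
    /* Bits 31:30 are reserved and ignored in both modes. */
    out->xlength_word_aligned = (out->xlength & 0x3u) == 0;
}
```

A value such as 0x00030006 in 2D mode decodes to an XLENGTH of 6, which
is not a multiple of 4; nothing in the register layout itself rules
that out.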

I'm inclined to think that the line about the DMA dealing with byte
aligned transfers means that both unaligned start addresses and
partial terminal data transfers are possible via packing, buffering,
discarding, and the application of write strobes on full-width
transfers, but without more precise language or a test on hardware
(not planned anytime soon), I can't be sure. Also, please let me know
if I misunderstood any of the source material or missed any important
details.
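
If that interpretation holds, an emulated copy might look roughly like
the sketch below: byte accesses until the source address is word
aligned, 32-bit accesses for the bulk, then byte accesses for the tail.
This only models the reading above and is not confirmed hardware
behaviour; dma_copy_model, struct dma_access_count, and the flat mem
buffer are hypothetical stand-ins, not QEMU's bcm2835_dma code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counters, only used to show which access sizes occur. */
struct dma_access_count {
    unsigned bytes;
    unsigned words;
};

/* Copy xlen bytes from src to dst inside a flat buffer `mem`. */
static bool dma_copy_model(uint8_t *mem, size_t mem_size,
                           uint32_t dst, uint32_t src, uint32_t xlen,
                           struct dma_access_count *count)
{
    assert(mem != NULL && count != NULL);
    count->bytes = 0;
    count->words = 0;

    /* Reject transfers that would run off the end of the buffer. */
    if ((uint64_t)src + xlen > mem_size ||
        (uint64_t)dst + xlen > mem_size) {
        return false;
    }

    /* Head: single bytes until the source address is 4-byte aligned. */
    while (xlen > 0 && (src & 0x3u) != 0) {
        mem[dst++] = mem[src++];
        xlen--;
        count->bytes++;
    }

    /* Bulk: 32-bit accesses (memmove tolerates overlapping ranges). */
    while (xlen >= 4) {
        memmove(&mem[dst], &mem[src], 4);
        dst += 4;
        src += 4;
        xlen -= 4;
        count->words++;
    }

    /* Tail: the 1 to 3 bytes left when the length is not a multiple of 4. */
    while (xlen > 0) {
        mem[dst++] = mem[src++];
        xlen--;
        count->bytes++;
    }

    return true;
}
```

Each loop strictly decreases xlen, so the copy terminates for any
length, aligned or not.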

In the absence of certainty and for the sake of addressing the DoS issue
that inspired the patch, perhaps it's better to leave the patch as is
with a comment soliciting further investigation on actual behavior?

{A} AMBA AXI Protocol Version: 2.0 Specification
    https://documentation-service.arm.com/static/64256e84314e245d086bc88f

{B} BCM2835 ARM Peripherals
    https://datasheets.raspberrypi.com/bcm2835/bcm2835-peripherals.pdf

[1] {A} (p. 10-2)
[2] {B} (p. 51)
[3] {A} (p. 14-5)
[4] {A} (p. 4-3)
[5] {A} (p. 9-4)
[6] {B} (p. 53)

Thanks,
Yodel
