From: AI Reviewer <[email protected]>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
net: ena: Add error bound to PHC communication layer
This commit extends the ENA PHC communication layer to retrieve an
error_bound value from the device alongside the timestamp, and adds a
phc_err_eb statistic to count error_bound retrieval failures.
> diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c
> index 257910460bd8..832c4c1d0e5e 100644
> --- a/drivers/net/ethernet/amazon/ena/ena_com.c
> +++ b/drivers/net/ethernet/amazon/ena/ena_com.c
[ ... ]
> @@ -1882,14 +1888,15 @@ int ena_com_phc_get_timestamp(struct ena_com_dev *ena_dev, u64 *timestamp)
> }
>
> /* req_id was updated by the device which indicates that
> - * PHC timestamp and error_flags are updated too,
> - * checking errors before retrieving timestamp
> + * PHC timestamp, error_bound and error_flags are updated too,
> + * checking error flags before retrieving timestamp and
> + * error_bound values
> */
> if (unlikely(resp->error_flags & ENA_PHC_ERROR_FLAGS)) {
[High]
Once req_id is observed equal to phc->req_id, the function reads
resp->error_flags, resp->timestamp and (now) resp->error_bound from the
device-shared DMA buffer. The updated comment explicitly states that once
req_id is observed updated, "PHC timestamp, error_bound and error_flags
are updated too". That is the producer/consumer pattern, which normally
requires a read barrier between the index check and the subsequent data
loads.
The buffer is declared volatile and the req_id check uses READ_ONCE(),
but on weakly-ordered architectures such as arm64 (which includes AWS
Graviton, a primary ENA deployment target) neither volatile nor
READ_ONCE() orders the later data loads after the req_id load with
respect to DMA writes from the device. Without a dma_rmb() the CPU may
observe the new req_id while still returning stale values (possibly a
mix of old and new fields) for the timestamp, the error_bound, or the
error_flags.
Could a dma_rmb() be added between the req_id check and the subsequent
reads of resp->error_flags / resp->timestamp / resp->error_bound, similar
to the pattern already used elsewhere in the same driver?
For example, ena_com_handle_admin_completion() in the same file does:
    /* Do not read the rest of the completion entry before the
     * phase bit was validated
     */
    dma_rmb();
and ena_com_aenq_intr_handler() has:
    /* Make sure the phase bit (ownership) is as expected before
     * reading the rest of the descriptor.
     */
    dma_rmb();
The same shape applies here: device writes data fields, then writes
req_id last; driver checks req_id, then reads data fields.
This ordering issue is not introduced by this patch: the timestamp and
error_flags reads after the req_id check existed before. However, the
patch extends the same unsynchronized read pattern to a third field
(error_bound), and the new comment now documents an assumed ordering
that is not actually enforced. Would it make sense to address the missing
dma_rmb() while touching this code?
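To make the intended ordering concrete, here is a minimal userspace sketch
of the same shape in C11. It is not the driver code: struct phc_resp_model,
model_device_publish() and model_driver_read() are hypothetical names, the
field widths are assumptions, and atomic_thread_fence(memory_order_acquire)
only stands in for the spot where dma_rmb() would sit in the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the shared response buffer; the layout and
 * field widths are assumptions, not the real ENA structure.
 */
struct phc_resp_model {
	_Atomic uint16_t req_id;	/* published last by the producer */
	uint64_t timestamp;
	uint32_t error_bound;
	uint32_t error_flags;
};

/* Producer side (models the device): write the data fields first,
 * then publish req_id with release semantics.
 */
static void model_device_publish(struct phc_resp_model *resp, uint16_t req_id,
				 uint64_t ts, uint32_t eb, uint32_t flags)
{
	resp->timestamp = ts;
	resp->error_bound = eb;
	resp->error_flags = flags;
	atomic_store_explicit(&resp->req_id, req_id, memory_order_release);
}

/* Consumer side (models the driver): check req_id, then fence, then
 * read the data fields. Returns false if the response is not ready
 * or if any error flag is set.
 */
static bool model_driver_read(struct phc_resp_model *resp, uint16_t expected,
			      uint64_t *ts, uint32_t *eb)
{
	if (atomic_load_explicit(&resp->req_id, memory_order_relaxed) != expected)
		return false;

	/* Stand-in for dma_rmb(): keep the data loads below from being
	 * satisfied before the req_id load above.
	 */
	atomic_thread_fence(memory_order_acquire);

	if (resp->error_flags)
		return false;

	*ts = resp->timestamp;
	if (eb)
		*eb = resp->error_bound;
	return true;
}
```

In the driver itself the equivalent change would presumably be a single
dma_rmb() between the req_id comparison and the first read of
resp->error_flags; the sketch above only illustrates where the fence goes
relative to the loads.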
> /* Retrieved invalid PHC timestamp, PHC enters into
[ ... ]
> @@ -1897,12 +1904,15 @@ int ena_com_phc_get_timestamp(struct ena_com_dev *ena_dev, u64 *timestamp)
>
> /* PHC timestamp value is returned to the caller */
> *timestamp = resp->timestamp;
> + if (error_bound)
> + *error_bound = resp->error_bound;
>
> /* Update statistic on valid PHC timestamp retrieval */
> phc->stats.phc_cnt++;
[ ... ]
--
This is an AI-generated review.