Hi,

Chwee-Lin Choong <[email protected]> writes:

> The current HW bug workaround checks the TXTT_0 ready bit first,
> then reads TXSTMPL_0 twice (before and after reading TXSTMPH_0)
> to detect whether a new timestamp was captured by timestamp
> register 0 during the workaround.
>
> This sequence has a race: if a new timestamp is captured after
> checking the TXTT_0 bit but before the first TXSTMPL_0 read, the
> detection fails because both the “old” and “new” values come from
> the same timestamp.
>
> Fix by reading TXSTMPL_0 first to establish a baseline, then
> checking the TXTT_0 bit. This ensures any timestamp captured
> during the race window will be detected.
>
> Old sequence:
>   1. Check TXTT_0 ready bit
>   2. Read TXSTMPL_0 (baseline)
>   3. Read TXSTMPH_0 (interrupt workaround)
>   4. Read TXSTMPL_0 (detect changes vs baseline)
>
> New sequence:
>   1. Read TXSTMPL_0 (baseline)
>   2. Check TXTT_0 ready bit
>   3. Read TXSTMPH_0 (interrupt workaround)
>   4. Read TXSTMPL_0 (detect changes vs baseline)
>
> Fixes: c789ad7cbebc ("igc: Work around HW bug causing missing timestamps")
> Suggested-by: Avi Shalev <[email protected]>
> Reviewed-by: Aleksandr Loktionov <[email protected]>
> Co-developed-by: Song Yoong Siang <[email protected]>
> Signed-off-by: Song Yoong Siang <[email protected]>
> Signed-off-by: Chwee-Lin Choong <[email protected]>

Patch looks good, my only concern is this report:

https://lore.kernel.org/intel-wired-lan/as1pr10mb5675dbfe7ce5f2a9336abfa4eb...@as1pr10mb5675.eurprd10.prod.outlook.com/

It's not clear to me how/why the different buffer utilization is
affecting this, but at least seems worth some investigation and
reporting back in that thread.

> ---
> v1: 
> https://patchwork.ozlabs.org/project/intel-wired-lan/patch/[email protected]/
> v2: 
> https://patchwork.ozlabs.org/project/intel-wired-lan/patch/[email protected]/
> v3: 
> https://patchwork.ozlabs.org/project/intel-wired-lan/patch/[email protected]/
>
> changelog:
> v1 -> v2 
> - Added detailed comments explaining the hardware bug workaround and race
>     detection mechanism
> v2 -> v3
> - Removed extra export file added by mistake  
> v3 -> v4
> - Added co-developer
> ---
>  drivers/net/ethernet/intel/igc/igc_ptp.c | 43 ++++++++++++++----------
>  1 file changed, 25 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/igc/igc_ptp.c 
> b/drivers/net/ethernet/intel/igc/igc_ptp.c
> index b7b46d863bee..7aae83c108fd 100644
> --- a/drivers/net/ethernet/intel/igc/igc_ptp.c
> +++ b/drivers/net/ethernet/intel/igc/igc_ptp.c
> @@ -774,36 +774,43 @@ static void igc_ptp_tx_reg_to_stamp(struct igc_adapter 
> *adapter,
>  static void igc_ptp_tx_hwtstamp(struct igc_adapter *adapter)
>  {
>       struct igc_hw *hw = &adapter->hw;
> +     u32 txstmpl_old;
>       u64 regval;
>       u32 mask;
>       int i;
>  
> +     /* Establish baseline of TXSTMPL_0 before checking TXTT_0.
> +      * This baseline is used to detect if a new timestamp arrives in
> +      * register 0 during the hardware bug workaround below.
> +      */
> +     txstmpl_old = rd32(IGC_TXSTMPL);
> +
>       mask = rd32(IGC_TSYNCTXCTL) & IGC_TSYNCTXCTL_TXTT_ANY;
>       if (mask & IGC_TSYNCTXCTL_TXTT_0) {
>               regval = rd32(IGC_TXSTMPL);
>               regval |= (u64)rd32(IGC_TXSTMPH) << 32;
>       } else {
> -             /* There's a bug in the hardware that could cause
> -              * missing interrupts for TX timestamping. The issue
> -              * is that for new interrupts to be triggered, the
> -              * IGC_TXSTMPH_0 register must be read.
> +             /* TXTT_0 not set - register 0 has no new timestamp initially.
> +              *
> +              * Hardware bug: Future timestamp interrupts won't fire unless
> +              * TXSTMPH_0 is read, even if the timestamp was captured in
> +              * registers 1-3.
>                *
> -              * To avoid discarding a valid timestamp that just
> -              * happened at the "wrong" time, we need to confirm
> -              * that there was no timestamp captured, we do that by
> -              * assuming that no two timestamps in sequence have
> -              * the same nanosecond value.
> +              * Workaround: Read TXSTMPH_0 here to enable future interrupts.
> +              * However, this read clears TXTT_0. If a timestamp arrives in
> +              * register 0 after checking TXTT_0 but before this read, it
> +              * would be lost.
>                *
> -              * So, we read the "low" register, read the "high"
> -              * register (to latch a new timestamp) and read the
> -              * "low" register again, if "old" and "new" versions
> -              * of the "low" register are different, a valid
> -              * timestamp was captured, we can read the "high"
> -              * register again.
> +              * To detect this race: We saved a baseline read of TXSTMPL_0
> +              * before TXTT_0 check. After performing the workaround read of
> +              * TXSTMPH_0, we read TXSTMPL_0 again. Since consecutive
> +              * timestamps never share the same nanosecond value, a change
> +              * between the baseline and new TXSTMPL_0 indicates a timestamp
> +              * arrived during the race window. If so, read the complete
> +              * timestamp.
>                */
> -             u32 txstmpl_old, txstmpl_new;
> +             u32 txstmpl_new;
>  
> -             txstmpl_old = rd32(IGC_TXSTMPL);
>               rd32(IGC_TXSTMPH);
>               txstmpl_new = rd32(IGC_TXSTMPL);
>  
> @@ -818,7 +825,7 @@ static void igc_ptp_tx_hwtstamp(struct igc_adapter 
> *adapter)
>  
>  done:
>       /* Now that the problematic first register was handled, we can
> -      * use retrieve the timestamps from the other registers
> +      * retrieve the timestamps from the other registers
>        * (starting from '1') with less complications.
>        */
>       for (i = 1; i < IGC_MAX_TX_TSTAMP_REGS; i++) {
> -- 
> 2.43.0
>

-- 
Vinicius

Reply via email to