Hi Heyi,

On 13/05/2019 12:42, Heyi Guo wrote:
> When we run several VMs with PCI passthrough and GICv4 enabled, not
> pinning vCPUs, we occasionally see the following warnings in dmesg:
> 
> ITS queue timeout (65440 65504 480)
> ITS cmd its_build_vmovp_cmd failed
> 
> The reason for the above issue is the following sequence in
> BUILD_SINGLE_CMD_FUNC:
> 1. Post the write command.
> 2. Release the lock.
> 3. Start to read GITS_CREADR to get the reader pointer.
> 4. Compare the reader pointer to the target pointer.
> 5. If the reader pointer has not reached the target, sleep for 1us
> and try again.
> 
> If several processors run the above steps concurrently, other CPUs
> will post write commands while the first CPU is waiting for
> completion, so we may end up in the following situation:
> 
> phase 1:
> ---rd_idx-----from_idx-----to_idx--0---------
> 
> wait 1us:
> 
> phase 2:
> --------------from_idx-----to_idx--0-rd_idx--
> 
> That is, rd_idx may run ahead of to_idx, and if to_idx happens to be
> near the wrap point, rd_idx will wrap around. So the condition below
> will not be met even after 1s:
> 
> if (from_idx < to_idx && rd_idx >= to_idx)
> 
> There is another theoretical issue: for a slow and busy ITS, the
> initial rd_idx may lag far behind from_idx, as below:
> 
> ---rd_idx---0--from_idx-----to_idx-----------
> 
> This will cause the wait function to exit too early.
> 
> Actually it does not make much sense to use from_idx to judge whether
> to_idx has wrapped; what we really need is an initial rd_idx sampled
> while the lock is still held, which can then be used to judge whether
> both to_idx and the current rd_idx have wrapped.

That's an interesting observation. Indeed, from_idx is pretty irrelevant
here, and all we want to observe is the read pointer reaching the end of
the command set.
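
To put numbers on it: plugging the values from the reported "ITS queue
timeout (65440 65504 480)" message into the current checks (from_idx,
to_idx, rd_idx, with the 64kB command queue) shows that neither exit
condition can fire once rd_idx has wrapped:

	u64 from_idx = 65440, to_idx = 65504, rd_idx = 480;

	/* Direct case: from_idx < to_idx holds, but rd_idx has already
	 * wrapped past the end of the queue, so rd_idx >= to_idx is
	 * false and doesn't become true again within the 1s budget. */
	if (from_idx < to_idx && rd_idx >= to_idx)
		;	/* never taken */

	/* Wrapped case: from_idx >= to_idx is false here, so this one
	 * cannot fire either, and we spin until the timeout. */
	if (from_idx >= to_idx && rd_idx >= to_idx && rd_idx < from_idx)
		;	/* never taken */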

> 
> Switch to a method that calculates the delta between two adjacent
> reads and accumulates it into a running sum, so that we can recover
> the real rd_idx from the wrapped value even when the queue is almost
> full.
> 
> Cc: Thomas Gleixner <[email protected]>
> Cc: Jason Cooper <[email protected]>
> Cc: Marc Zyngier <[email protected]>
> 
> Signed-off-by: Heyi Guo <[email protected]>
> ---
>  drivers/irqchip/irq-gic-v3-its.c | 30 ++++++++++++++++++++----------
>  1 file changed, 20 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index 7577755..f05acd4 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -745,32 +745,40 @@ static void its_flush_cmd(struct its_node *its, struct its_cmd_block *cmd)
>  }
>  
>  static int its_wait_for_range_completion(struct its_node *its,
> -                                      struct its_cmd_block *from,
> +                                      u64    origin_rd_idx,
>                                        struct its_cmd_block *to)
>  {
> -     u64 rd_idx, from_idx, to_idx;
> +     u64 rd_idx, prev_idx, to_idx, sum;
> +     s64 delta;
>       u32 count = 1000000;    /* 1s! */
>  
> -     from_idx = its_cmd_ptr_to_offset(its, from);
>       to_idx = its_cmd_ptr_to_offset(its, to);
> +     if (to_idx < origin_rd_idx)
> +             to_idx += ITS_CMD_QUEUE_SZ;
> +
> +     prev_idx = origin_rd_idx;

I guess you could just rename origin_rd_idx to prev_idx and drop the
extra declaration (the pr_err doesn't matter much).
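
That is, something like this for the function prologue (untested, and
keeping your "sum" name for the moment):

	static int its_wait_for_range_completion(struct its_node *its,
						 u64 prev_idx,
						 struct its_cmd_block *to)
	{
		u64 rd_idx, to_idx, sum;
		s64 delta;
		u32 count = 1000000;	/* 1s! */

		/* Linearize to_idx against the initial read value */
		to_idx = its_cmd_ptr_to_offset(its, to);
		if (to_idx < prev_idx)
			to_idx += ITS_CMD_QUEUE_SZ;

		sum = prev_idx;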

> +     sum = origin_rd_idx;
>  
>       while (1) {
>               rd_idx = readl_relaxed(its->base + GITS_CREADR);
>  
> -             /* Direct case */
> -             if (from_idx < to_idx && rd_idx >= to_idx)
> -                     break;
> +             /* Wrap around for CREADR */
> +             if (rd_idx >= prev_idx)
> +                     delta = rd_idx - prev_idx;
> +             else
> +                     delta = rd_idx + ITS_CMD_QUEUE_SZ - prev_idx;
>  
> -             /* Wrapped case */
> -             if (from_idx >= to_idx && rd_idx >= to_idx && rd_idx < from_idx)
> +             sum += delta;

So "sum" isn't quite saying what it represent. My understanding is that
it is the linearized version of the read pointer, right? Just like
you've linearized to_idx at the beginning of the function.
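
If so, I'd be tempted to rename it to something like linear_idx and
keep delta local to the loop, roughly this (again untested):

	while (1) {
		s64 delta;

		rd_idx = readl_relaxed(its->base + GITS_CREADR);

		/*
		 * Compute the read pointer progress, taking the
		 * potential wrap-around into account.
		 */
		delta = rd_idx - prev_idx;
		if (rd_idx < prev_idx)
			delta += ITS_CMD_QUEUE_SZ;

		linear_idx += delta;
		if (linear_idx >= to_idx)
			break;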

> +             if (sum >= to_idx)
>                       break;
>  
>               count--;
>               if (!count) {
>                       pr_err_ratelimited("ITS queue timeout (%llu %llu 
> %llu)\n",
> -                                        from_idx, to_idx, rd_idx);
> +                                        origin_rd_idx, to_idx, sum);
>                       return -1;
>               }
> +             prev_idx = rd_idx;
>               cpu_relax();
>               udelay(1);
>       }
> @@ -787,6 +795,7 @@ void name(struct its_node *its,                          \
>       struct its_cmd_block *cmd, *sync_cmd, *next_cmd;                \
>       synctype *sync_obj;                                             \
>       unsigned long flags;                                            \
> +     u64 rd_idx;                                                     \
>                                                                       \
>       raw_spin_lock_irqsave(&its->lock, flags);                       \
>                                                                       \
> @@ -808,10 +817,11 @@ void name(struct its_node *its,                        \
>       }                                                               \
>                                                                       \
>  post:                                                                       \
> +     rd_idx = readl_relaxed(its->base + GITS_CREADR);                \
>       next_cmd = its_post_commands(its);                              \
>       raw_spin_unlock_irqrestore(&its->lock, flags);                  \
>                                                                       \
> -     if (its_wait_for_range_completion(its, cmd, next_cmd))          \
> +     if (its_wait_for_range_completion(its, rd_idx, next_cmd))       \
>               pr_err_ratelimited("ITS cmd %ps failed\n", builder);    \
>  }
>  
> 

If you agree with my comments above, I'm happy to take this patch and
tidy it up myself.

Now, I think there is still an annoying bug that can creep up if the
queue wraps *twice* while you're sleeping (which could well happen
in a guest running on a busy host). Which means we need to account for
wrapping generations...
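
Concretely, with made-up numbers and the 64kB queue:

	u64 prev_idx = 0x100;
	/* the ITS consumes ITS_CMD_QUEUE_SZ + 0x40 bytes of commands
	 * before we poll again, but GITS_CREADR only ever shows the
	 * wrapped offset: */
	u64 rd_idx = 0x140;

	/* delta comes out as 0x40 instead of ITS_CMD_QUEUE_SZ + 0x40,
	 * so the linearized index under-counts by a whole queue and,
	 * if the ITS then goes idle, never reaches to_idx: a spurious
	 * timeout. */
	s64 delta = rd_idx - prev_idx;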

Thanks,

        M.
-- 
Jazz is not dead. It just smells funny...
