On Wed, Mar 13, 2024 at 10:24:08AM -0700, Abhinav Kumar wrote: > On 3/13/2024 1:18 AM, Johan Hovold wrote:
> > Right, but your proposed fix would not actually fix anything and judging > > from the sparse commit message and diff itself it is clearly only meant > > to mitigate the case where user space is involved, which is *not* the > > case here. > There can be a race condition between the time the DP driver gets the > hpd disconnect event and when the hpd thread processes that event > allowing the commit to sneak in. This is something which has always been > there even without pm_runtime series and remains even today. > > In this race condition, the setting of "link_ready" to false can be a > bit delayed if we go through the HPD event processing increasing the > race condition window. > > If link_ready is false, atomic_check() fails, thereby failing any > commits and hence not allowing the atomic_disable() / atomic_enable() > cycle and hence avoiding this reset. > > The patch is moving the setting of link_ready to false earlier by not > putting it through the HPD event thread and hence trying to reduce the > window of the issue. Perhaps I'm missing something in the race that you are trying to describe (and which I've asked you to describe in more detail so that I don't have to spend more time trying to come up with a reproducer myself). I do understand how your patch works, but my point is that it does not fix the race that we are hitting on sc8280xp and, unless I'm missing something, it is not even sufficient to fix the race you are talking about as user space can still trigger that ioctl() before you clear the link_ready flag. That's why I said that it is only papering over the issue by making the race window smaller (and this should also be highlighted in the commit message). For some reason it also made things worse on sc8280xp, but I did not spend time on tracking down exactly why. Johan