Juliet Kim <juli...@linux.vnet.ibm.com> writes:
> On 6/25/19 1:51 PM, Nathan Lynch wrote:
>> Juliet Kim <juli...@linux.vnet.ibm.com> writes:
>>
>>> There's some concern this could retry forever, resulting in live lock.
>> First of all the system will make progress in other areas even if there
>> are repeated retries; we're not indefinitely holding locks or anything
>> like that.
>
> For instance, system admin runs a script that picks and offlines CPUs in a
> loop to keep a certain rate of onlined CPUs for energy saving. If LPM keeps
> putting CPUs back online, that would never finish, and would keep generating
> new offline requests
>
>> Second, Linux checks the H_VASI_STATE result on every retry. If the
>> platform wants to terminate the migration (say, if it imposes a
>> timeout), Linux will abandon it when H_VASI_STATE fails to return
>> H_VASI_SUSPENDING. And it seems incorrect to bail out before that
>> happens, absent hard errors on the Linux side such as allocation
>> failures.
> I confirmed with the PHYP and HMC folks that they wouldn't time out the LPM
> request including H_VASI_STATE, so if the LPM retries were unlucky enough to
> encounter repeated CPU offline attempts (maybe some customer code retrying
> that), then the retries could continue indefinitely, or until some manual
> intervention. And in the mean time, the LPM delay here would cause PHYP to
> block other operations.

That sounds like a PHYP bug to me.

cheers