Vaibhav Jain <vaib...@linux.ibm.com> writes:

> In some cases initial bind of scm memory for an lpar can fail if
> previously it wasn't released using a scm-unbind hcall. This situation
> can arise due to panic of the previous kernel or forced lpar
> fadump. In such cases the H_SCM_BIND_MEM return a H_OVERLAP error.
>
> To mitigate such cases the patch updates papr_scm_probe() to force a
> call to drc_pmem_unbind() in case the initial bind of scm memory fails
> with EBUSY error. In case scm-bind operation again fails after the
> forced scm-unbind then we follow the existing error path. We also
> update drc_pmem_bind() to handle the H_OVERLAP error returned by phyp
> and indicate it as a EBUSY error back to the caller.
>
> Suggested-by: "Oliver O'Halloran" <ooh...@gmail.com>
> Signed-off-by: Vaibhav Jain <vaib...@linux.ibm.com>
> Reviewed-by: Oliver O'Halloran <ooh...@gmail.com>
> ---
> Change-log:
> v3:
> * Minor update to a code comment. [Oliver]
>
> v2:
> * Moved the retry code from drc_pmem_bind() to papr_scm_probe()
>   [Oliver]
> * Changed the type of variable 'rc' in drc_pmem_bind() to
>   int64_t. [Oliver]
> ---
>  arch/powerpc/platforms/pseries/papr_scm.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/papr_scm.c
> b/arch/powerpc/platforms/pseries/papr_scm.c
> index c01a03fd3ee7..7c5e10c063a0 100644
> --- a/arch/powerpc/platforms/pseries/papr_scm.c
> +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> @@ -43,8 +43,9 @@ struct papr_scm_priv {
>  static int drc_pmem_bind(struct papr_scm_priv *p)
>  {
>  	unsigned long ret[PLPAR_HCALL_BUFSIZE];
> -	uint64_t rc, token;
>  	uint64_t saved = 0;
> +	uint64_t token;
> +	int64_t rc;
>
>  	/*
>  	 * When the hypervisor cannot map all the requested memory in a single
> @@ -64,6 +65,10 @@ static int drc_pmem_bind(struct papr_scm_priv *p)
>  	} while (rc == H_BUSY);
>
>  	if (rc) {
> +		/* H_OVERLAP needs a separate error path */
> +		if (rc == H_OVERLAP)
> +			return -EBUSY;
> +
>  		dev_err(&p->pdev->dev, "bind err: %lld\n", rc);
>  		return -ENXIO;
>  	}
> @@ -331,6 +336,14 @@ static int papr_scm_probe(struct platform_device *pdev)
>
>  	/* request the hypervisor to bind this region to somewhere in memory */
>  	rc = drc_pmem_bind(p);
> +
> +	/* If phyp says drc memory still bound then force unbound and retry */
> +	if (rc == -EBUSY) {
> +		dev_warn(&pdev->dev, "Retrying bind after unbinding\n");
> +		drc_pmem_unbind(p);

This should only be caused by kexec, right? And considering that neither
the kernel nor the hypervisor will change the device binding details,
can you check whether this could be switched to
H_SCM_QUERY_BLOCK_MEM_BINDING? Would that result in a faster boot?

> +		rc = drc_pmem_bind(p);
> +	}
> +
>  	if (rc)
>  		goto err;
>

I am also not sure about the module reference count here. Should we
increment the module reference count after a bind, so that we can track
failures in unbind and fail the module unload?

-aneesh
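
For reference, the control flow under discussion (bind, and on -EBUSY force an unbind and retry the bind once) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the driver code: struct sim_drc, the sim_* functions and the SIM_H_* values are all hypothetical stand-ins for the hypervisor state, the hcalls and their return codes.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for hcall return codes (illustrative values only). */
#define SIM_H_SUCCESS 0
#define SIM_H_OVERLAP (-68)

/* Hypothetical model of the hypervisor-side binding state of one DRC. */
struct sim_drc {
	int bound;	/* non-zero if a previous kernel left the region bound */
};

/* Fake bind hcall: refuses with an overlap error if a stale binding exists. */
static int64_t sim_hcall_bind(struct sim_drc *d)
{
	if (d->bound)
		return SIM_H_OVERLAP;
	d->bound = 1;
	return SIM_H_SUCCESS;
}

/* Fake unbind hcall: drops whatever binding is present. */
static void sim_hcall_unbind(struct sim_drc *d)
{
	d->bound = 0;
}

/* Mirrors the error mapping in the patch: overlap -> -EBUSY, other -> -ENXIO. */
static int sim_bind(struct sim_drc *d)
{
	int64_t rc = sim_hcall_bind(d);

	if (rc == SIM_H_OVERLAP)
		return -EBUSY;
	if (rc)
		return -ENXIO;
	return 0;
}

/*
 * Mirrors the probe-time flow in the patch: on -EBUSY, force an unbind and
 * retry the bind exactly once; any remaining error goes to the caller.
 * *retried is set so a caller can tell whether the forced unbind happened.
 */
static int sim_probe(struct sim_drc *d, int *retried)
{
	int rc = sim_bind(d);

	*retried = 0;
	if (rc == -EBUSY) {
		*retried = 1;
		sim_hcall_unbind(d);
		rc = sim_bind(d);
	}
	return rc;
}
```

Calling sim_probe() on a struct sim_drc whose bound field starts at 1 models the post-kexec case: the first bind is refused, the forced unbind clears the stale binding, and the retry succeeds. The sketch says nothing about what a query-based approach or a module reference count would cost or require.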