On Mon, Sep 21, 2015 at 09:49:45PM +1000, Gavin Shan wrote: >On Mon, Sep 21, 2015 at 05:29:48PM +0800, Wei Yang wrote: >>Current EEH infrastructure would avoid to handle EEH when a PE is passed to >>guest, while if this PE is a Child PE of the one hit EEH, host would handle >>this. By doing so, this would leads to guest hang. The correct way is >>avoid to handle it on host and let guest to recover. >> >>This patch avoids to handle EEH on a passed Child PE. >> > >Ok. It's fixing the problem the guest, which owns a VF, when its PF hitting >EEH error, right? If so, I'm not sure if you really tested this code. Does >it work for you?
Yes, I inject error on Parent Bus PE. > >When the parent PE (PF) is stopped for EEH recovery, it sounds impossible >that the child PE can't be affected and just escape from the error. The >question is how the guest can continue to work after the EEH recovery on >parent PE? What I see is the PF is covering and VF in guest is recovering. > >>Signed-off-by: Wei Yang <weiy...@linux.vnet.ibm.com> >>--- >> arch/powerpc/kernel/eeh_pe.c | 5 +++++ >> 1 file changed, 5 insertions(+) >> >>diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c >>index 5cde950..c6d0e9f 100644 >>--- a/arch/powerpc/kernel/eeh_pe.c >>+++ b/arch/powerpc/kernel/eeh_pe.c >>@@ -172,6 +172,7 @@ static struct eeh_pe *eeh_pe_next(struct eeh_pe *pe, >> * callback returns something other than NULL, or no more PEs >> * to be traversed. >> */ >>+static void *__eeh_pe_get(void *data, void *flag); >> void *eeh_pe_traverse(struct eeh_pe *root, >> eeh_traverse_func fn, void *flag) >> { >>@@ -179,6 +180,8 @@ void *eeh_pe_traverse(struct eeh_pe *root, >> void *ret; >> >> for (pe = root; pe; pe = eeh_pe_next(pe, root)) { >>+ if (eeh_pe_passed(pe) && (fn != __eeh_pe_get)) >>+ continue; > >The code change here seems ugly. > >The "flag" can be extended to carry the information to skip pass-through >PEs or not. So the function calling eeh_pe_traverse() decides to skip >pass-through PEs or not. I don't get the point, which "flag" you mean? Add a flag in eeh_pe? > >> ret = fn(pe, flag); >> if (ret) return ret; >> } >>@@ -210,6 +213,8 @@ void *eeh_pe_dev_traverse(struct eeh_pe *root, > >> >> /* Traverse root PE */ >> for (pe = root; pe; pe = eeh_pe_next(pe, root)) { >>+ if (eeh_pe_passed(pe)) >>+ continue; >> eeh_pe_for_each_dev(pe, edev, tmp) { >> ret = fn(edev, flag); >> if (ret) >>-- >>2.5.0 >> -- Richard Yang Help you, Help me _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev