On Mon, Dec 16, 2013 at 03:21:40PM +0000, Zoltan Kiss wrote:
[...]
> >>>> >
> >>>> >Should this be BUG_ON? AIUI this kthread should be the only one doing
> >>>> >unmap, right?
> >>>The NAPI instance can do it as well if it is a small packet fits
> >>>into PKT_PROT_LEN. But still this scenario shouldn't really happen,
> >>>I was just not sure we have to crash immediately. Maybe handle it as
> >>>a fatal error and destroy the vif?
> >>>
> >It depends. If this is within the trust boundary, i.e. everything at the
> >stage should have been sanitized then we should BUG_ON because there's
> >clearly a bug somewhere in the sanitization process, or in the
> >interaction of various backend routines.
>
> My understanding is that crashing should be avoided if we can bail
> out somehow. At this point there is clearly a bug in netback
> somewhere, something unmapped that page before it should have
> happened, or at least that array get corrupted somehow. However
> there is a chance that xenvif_fatal_tx_err() can contain the issue,
> and the rest of the system can go unaffected.
>

IMHO that would make debugging much harder if a crash is caused by a
previously corrupted array and we pretend we can carry on serving.

Netback now has three routines (NAPI and two kthreads) serving a single
vif, and the interaction among them makes bugs hard to reproduce.

Wei.
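
For illustration only, the two policies discussed in this thread can be
modelled in plain userspace C. This is a rough sketch, not netback code:
every toy_* name, the slot count and the "invalid handle" marker are
hypothetical, abort() merely stands in for BUG_ON, and setting a
"disabled" flag merely stands in for what xenvif_fatal_tx_err() is
expected to do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define TOY_MAX_PENDING 8          /* hypothetical number of pending slots */
#define TOY_INVALID_HANDLE (-1)    /* hypothetical "not mapped" marker */

/* Hypothetical stand-in for per-vif state; not the real struct xenvif. */
struct toy_vif {
    int  handles[TOY_MAX_PENDING]; /* one handle per pending slot */
    bool disabled;                 /* set once the vif has been shut down */
};

/* Mark every slot as mapped and the vif as live. */
void toy_vif_init(struct toy_vif *vif)
{
    for (size_t i = 0; i < TOY_MAX_PENDING; i++)
        vif->handles[i] = (int)i + 1;  /* any value other than the marker */
    vif->disabled = false;
}

/*
 * Strict policy, in the spirit of BUG_ON: stop at the first
 * inconsistency so the failure is seen close to its cause.
 */
void toy_unmap_strict(struct toy_vif *vif, size_t idx)
{
    if (idx >= TOY_MAX_PENDING || vif->handles[idx] == TOY_INVALID_HANDLE) {
        fprintf(stderr, "toy_vif: bad unmap of slot %zu, aborting\n", idx);
        abort();
    }
    vif->handles[idx] = TOY_INVALID_HANDLE;
}

/*
 * Containment policy: report the inconsistency, disable this vif only,
 * and let the caller carry on. Returns true if the slot was unmapped.
 */
bool toy_unmap_contain(struct toy_vif *vif, size_t idx)
{
    if (vif->disabled)
        return false;              /* a disabled vif does no more work */
    if (idx >= TOY_MAX_PENDING || vif->handles[idx] == TOY_INVALID_HANDLE) {
        fprintf(stderr, "toy_vif: bad unmap of slot %zu, disabling vif\n", idx);
        vif->disabled = true;
        return false;
    }
    vif->handles[idx] = TOY_INVALID_HANDLE;
    return true;
}
```

In the containment variant a second unmap of the same slot disables the
toy vif and the process keeps running, whereas the strict variant stops
at that point. The sketch does not settle which is right for netback; it
only makes the trade-off concrete: containment protects the rest of the
system, while stopping keeps the evidence close to the original
corruption.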