On Tue, 20 Feb 2018 16:51:03 -0800, Florian Fainelli wrote:
> On 02/20/2018 04:43 PM, Jakub Kicinski wrote:
> > On Mon, 19 Feb 2018 18:04:17 +0530, Rahul Lakkireddy wrote:  
> >> Our requirement is to analyze the state of firmware/hardware at the
> >> time of kernel panic.   
> > 
> > I was wondering about this since you posted the patch and I can't come
> > up with any specific scenario where a kernel crash would correlate
> > clearly with device state in a non-trivial way.
> > 
> > Perhaps there is something about cxgb4 HW/FW that makes this useful.
> > Could you explain?  Could you give a real life example of a bug?  
> > Is it related to the TOE-looking TLS offload Atul is posting?
> > 
> > Is the panic you're targeting here real or manually triggered from user
> > space to get a full dump of kernel and FW?
> > 
> > That's me trying to guess what you're doing.. :)
> 
> One case where this might be helpful is if you are chasing down DMA
> corruption and you would like to get a nearly instant capture of both
> the kernel's memory and the adapter which may be responsible for that.
> This is probably not 100% foolproof because there is a timing window during
> which the dumps of both contexts are going to happen, and that alone
> might be influencing the captured memory view. Just guessing of course.

Perhaps this is what you mean by the timing window, but with random
corruptions, by the time the kernel hits the corrupted memory a 40/100Gb
adapter has likely forgotten all about those DMAs.  And IOMMUs are
pretty good at catching corruptions on big iron CPUs (i.e. it's easy to
catch them in testing, even if the production environment runs iommu=pt).
At least that's my gut feeling/experience ;)
