On Thu, 2015-07-30 at 01:41 +0000, Gregory Hoggarth wrote:
> Hi,
> 
> My company has also started having what appears to be the same problem,
> since we upgraded our embedded system to linux kernel 3.16.
> 
> I tried applying the suggested fix of READ_ONCE (and also had to add in
> the necessary code to compiler.h as 3.16 didn't have it) and
> unfortunately it did not fix the issue at all.
> 
> Unfortunately we do not have an easy reproduction method, and do not
> know precisely what is going on in the system when the issue occurs. We
> know it is a multicast UDP packet but that is about it. For us, the
> crash happens during a critical stage in our system initialisation,
> making additional debugging and instrumentation difficult. Our
> reproduction rate is approximately 1 out of 100 test runs; testing
> overnight we will usually see 3-5 instances of the crash happening. All
> our attempts to increase the reproduction rate, or reproduce the issue
> in a simpler/more controlled way have failed.
> 
> Because we have customised the linux kernel, in some places radically,
> we assumed this was just a problem only we were seeing, so we were
> trying to fix it ourselves. Now that this appears to be a generic
> problem upstream, we've simply disabled UDP early demux in our system
> (since it's a new optimisation that we have lived without up till now)
> and will wait for this issue to be fixed upstream instead.
> 
> 
> So I'm sharing the debug patch I've written to help gather data on what
> is going on in the system, and some of the output we've gotten from the
> debug, in case this is useful for anyone else who is seeing this problem
> or would like to try and fix it.
> 
> Feel free to ask questions, I'm not sure how much help I can be but will
> do my best. We'll be happy to assist in testing any proposed fixes. I
> also have some more examples of kernel oops and debug output if that
> could be useful, although the debug is from earlier iterations of the
> patch so that historical output is not as detailed as the output
> generated by the latest version of the patch attached here.
> 
> Thanks,
> Greg Hoggarth

CC'ing the UDP early demux author, Shawn Bohrer.

I believe this is a race condition, with a dst escaping its RCU-protected
region.

I will send a patch.
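
For readers unfamiliar with this class of bug, the following is a minimal
userspace sketch of why a pointer that is only valid inside a protected
read-side region must take a reference before it escapes. It is not the
kernel code and not the patch mentioned above. The names `struct entry`,
`entry_hold_safe` and `entry_put` are hypothetical, and C11 atomics stand in
for the kernel's reference counting. The idea shown is that a new holder may
only be added while the count is still non-zero.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for a cached route entry. */
struct entry {
    atomic_int refcnt;   /* 0 means the object is already being freed */
};

/*
 * Try to take a reference for use outside the protected region.
 * Succeeds only if the count is still non-zero, so a reader can never
 * resurrect an object whose last reference was already dropped.
 */
static bool entry_hold_safe(struct entry *e)
{
    int old = atomic_load(&e->refcnt);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&e->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true when the caller released the last one. */
static bool entry_put(struct entry *e)
{
    return atomic_fetch_sub(&e->refcnt, 1) == 1;
}
```

In this sketch, a reader that finds the entry while protected would call
entry_hold_safe() and, on failure, treat the entry as gone and fall back to
a fresh lookup rather than keep using the stale pointer.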


