Re: Latest net-next from GIT panic
On Thu, 2017-09-21 at 15:18 +0200, Paweł Staszewski wrote:
> OK, after adding the patch everything has been working for about 1 hour of
> normal traffic, with all BGP sessions connected and about 600k prefixes in
> the kernel.

Great, I am going to submit an official patch, uniting skb_dst_force() and skb_dst_force_safe() into a single helper.

Thanks.
Re: Latest net-next from GIT panic
On 2017-09-21 at 13:31, Paweł Staszewski wrote:
> On 2017-09-21 at 13:03, Eric Dumazet wrote:
>> OK we have two problems here
>>
>> 1) We need to unify skb_dst_force() (for net tree)
>>
>> 2) Vlan devices should try to correctly handle IFF_XMIT_DST_RELEASE from lower device. This will considerably help your performance.
>>
>> For 1), this is what I had in mind, can you try it?
>>
>> Thanks a lot!
>>
>> diff --git a/include/net/dst.h b/include/net/dst.h
>> index 93568bd0a3520bb7402f04d90cf04ac99c81cfbe..f23851eeaad917e8dafc06b58d23a2575405c894 100644
>> --- a/include/net/dst.h
>> +++ b/include/net/dst.h
>> @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
>>  static inline struct dst_entry *dst_clone(struct dst_entry *dst)
>>  {
>>  	if (dst)
>> -		atomic_inc(&dst->__refcnt);
>> +		dst_hold(dst);
>>  	return dst;
>>  }
>>  
>> @@ -311,21 +311,6 @@ static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb
>>  	__skb_dst_copy(nskb, oskb->_skb_refdst);
>>  }
>>  
>> -/**
>> - * skb_dst_force - makes sure skb dst is refcounted
>> - * @skb: buffer
>> - *
>> - * If dst is not yet refcounted, let's do it
>> - */
>> -static inline void skb_dst_force(struct sk_buff *skb)
>> -{
>> -	if (skb_dst_is_noref(skb)) {
>> -		WARN_ON(!rcu_read_lock_held());
>> -		skb->_skb_refdst &= ~SKB_DST_NOREF;
>> -		dst_clone(skb_dst(skb));
>> -	}
>> -}
>> -
>>  /**
>>   * dst_hold_safe - Take a reference on a dst if possible
>>   * @dst: pointer to dst entry
>> @@ -356,6 +341,23 @@ static inline void skb_dst_force_safe(struct sk_buff *skb)
>>  	}
>>  }
>>  
>> +/**
>> + * skb_dst_force - makes sure skb dst is refcounted
>> + * @skb: buffer
>> + *
>> + * If dst is not yet refcounted, let's do it
>> + */
>> +static inline void skb_dst_force(struct sk_buff *skb)
>> +{
>> +	if (skb_dst_is_noref(skb)) {
>> +		struct dst_entry *dst = skb_dst(skb);
>> +
>> +		WARN_ON(!rcu_read_lock_held());
>> +		if (!dst_hold_safe(dst))
>> +			dst = NULL;
>> +		skb->_skb_refdst = (unsigned long)dst;
>> +	}
>> +}
>>  
>>  /**
>>   * __skb_tunnel_rx - prepare skb for rx reinsert
>
> Patch applied - so far no problems - and no warnings in dmesg

OK, after adding the patch everything has been working for about 1 hour of normal traffic, with all BGP sessions connected and about 600k prefixes in the kernel.
Re: Latest net-next from GIT panic
On 2017-09-21 at 13:03, Eric Dumazet wrote:
> OK we have two problems here
>
> 1) We need to unify skb_dst_force() (for net tree)
>
> 2) Vlan devices should try to correctly handle IFF_XMIT_DST_RELEASE from lower device. This will considerably help your performance.
>
> For 1), this is what I had in mind, can you try it?
>
> Thanks a lot!
>
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 93568bd0a3520bb7402f04d90cf04ac99c81cfbe..f23851eeaad917e8dafc06b58d23a2575405c894 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
>  static inline struct dst_entry *dst_clone(struct dst_entry *dst)
>  {
>  	if (dst)
> -		atomic_inc(&dst->__refcnt);
> +		dst_hold(dst);
>  	return dst;
>  }
>  
> @@ -311,21 +311,6 @@ static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb
>  	__skb_dst_copy(nskb, oskb->_skb_refdst);
>  }
>  
> -/**
> - * skb_dst_force - makes sure skb dst is refcounted
> - * @skb: buffer
> - *
> - * If dst is not yet refcounted, let's do it
> - */
> -static inline void skb_dst_force(struct sk_buff *skb)
> -{
> -	if (skb_dst_is_noref(skb)) {
> -		WARN_ON(!rcu_read_lock_held());
> -		skb->_skb_refdst &= ~SKB_DST_NOREF;
> -		dst_clone(skb_dst(skb));
> -	}
> -}
> -
>  /**
>   * dst_hold_safe - Take a reference on a dst if possible
>   * @dst: pointer to dst entry
> @@ -356,6 +341,23 @@ static inline void skb_dst_force_safe(struct sk_buff *skb)
>  	}
>  }
>  
> +/**
> + * skb_dst_force - makes sure skb dst is refcounted
> + * @skb: buffer
> + *
> + * If dst is not yet refcounted, let's do it
> + */
> +static inline void skb_dst_force(struct sk_buff *skb)
> +{
> +	if (skb_dst_is_noref(skb)) {
> +		struct dst_entry *dst = skb_dst(skb);
> +
> +		WARN_ON(!rcu_read_lock_held());
> +		if (!dst_hold_safe(dst))
> +			dst = NULL;
> +		skb->_skb_refdst = (unsigned long)dst;
> +	}
> +}
>  
>  /**
>   * __skb_tunnel_rx - prepare skb for rx reinsert

Patch applied - so far no problems - and no warnings in dmesg
Re: Latest net-next from GIT panic
On 2017-09-21 at 13:12, Paweł Staszewski wrote:
> On 2017-09-21 at 13:03, Eric Dumazet wrote:
>> On Thu, 2017-09-21 at 11:06 +0200, Paweł Staszewski wrote:
>>> On 2017-09-21 at 03:17, Eric Dumazet wrote:
>>>> On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote:
>>>>>> Thanks very much Pawel for the feedback.
>>>>>>
>>>>>> I was looking into the code (specifically IPv4 part) and found that in free_fib_info_rcu(), we call free_nh_exceptions() without holding the fnhe_lock. I am wondering if that could cause some race condition on fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the same dst could be happening.
>>>>>>
>>>>>> But as we call free_fib_info_rcu() only after the grace period, and the lookup code which could potentially modify fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems fine...
>>>>>
>>>>> Hi Pawel,
>>>>>
>>>>> Could you try the following debug patch on top of net-next branch and reproduce the issue check if there are warning msg showing?
>>>>>
>>>>> diff --git a/include/net/dst.h b/include/net/dst.h
>>>>> index 93568bd0a352..82aff41c6f63 100644
>>>>> --- a/include/net/dst.h
>>>>> +++ b/include/net/dst.h
>>>>> @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
>>>>>  static inline struct dst_entry *dst_clone(struct dst_entry *dst)
>>>>>  {
>>>>>  	if (dst)
>>>>> -		atomic_inc(&dst->__refcnt);
>>>>> +		dst_hold(dst);
>>>>>  	return dst;
>>>>>  }
>>>>>
>>>>> Thanks.
>>>>> Wei
>>>>
>>>> Yes, we believe skb_dst_force() and skb_dst_force_safe() should be unified (to the 'safe' version)
>>>>
>>>> We no longer have gc to protect from 0 -> 1 transition of dst refcount.
>>>
>>> After adding patch from Wei
>>> https://bugzilla.kernel.org/show_bug.cgi?id=197005#c14
>>
>> OK we have two problems here
>>
>> 1) We need to unify skb_dst_force() (for net tree)
>>
>> 2) Vlan devices should try to correctly handle IFF_XMIT_DST_RELEASE from lower device. This will considerably help your performance.
>>
>> For 1), this is what I had in mind, can you try it?
>>
>> Thanks a lot!
>>
>> diff --git a/include/net/dst.h b/include/net/dst.h
>> index 93568bd0a3520bb7402f04d90cf04ac99c81cfbe..f23851eeaad917e8dafc06b58d23a2575405c894 100644
>> --- a/include/net/dst.h
>> +++ b/include/net/dst.h
>> @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
>>  static inline struct dst_entry *dst_clone(struct dst_entry *dst)
>>  {
>>  	if (dst)
>> -		atomic_inc(&dst->__refcnt);
>> +		dst_hold(dst);
>>  	return dst;
>>  }
>>  
>> @@ -311,21 +311,6 @@ static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb
>>  	__skb_dst_copy(nskb, oskb->_skb_refdst);
>>  }
>>  
>> -/**
>> - * skb_dst_force - makes sure skb dst is refcounted
>> - * @skb: buffer
>> - *
>> - * If dst is not yet refcounted, let's do it
>> - */
>> -static inline void skb_dst_force(struct sk_buff *skb)
>> -{
>> -	if (skb_dst_is_noref(skb)) {
>> -		WARN_ON(!rcu_read_lock_held());
>> -		skb->_skb_refdst &= ~SKB_DST_NOREF;
>> -		dst_clone(skb_dst(skb));
>> -	}
>> -}
>> -
>>  /**
>>   * dst_hold_safe - Take a reference on a dst if possible
>>   * @dst: pointer to dst entry
>> @@ -356,6 +341,23 @@ static inline void skb_dst_force_safe(struct sk_buff *skb)
>>  	}
>>  }
>>  
>> +/**
>> + * skb_dst_force - makes sure skb dst is refcounted
>> + * @skb: buffer
>> + *
>> + * If dst is not yet refcounted, let's do it
>> + */
>> +static inline void skb_dst_force(struct sk_buff *skb)
>> +{
>> +	if (skb_dst_is_noref(skb)) {
>> +		struct dst_entry *dst = skb_dst(skb);
>> +
>> +		WARN_ON(!rcu_read_lock_held());
>> +		if (!dst_hold_safe(dst))
>> +			dst = NULL;
>> +		skb->_skb_refdst = (unsigned long)dst;
>> +	}
>> +}
>>  
>>  /**
>>   * __skb_tunnel_rx - prepare skb for rx reinsert
>
> Thanks
>
> What is weird: I have this part in my net-next from git:
>
> /**
>  * skb_dst_force_safe - makes sure skb dst is refcounted
>  * @skb: buffer
>  *
>  * If dst is not yet refcounted and not destroyed, grab a ref on it.
>  */
> static inline void skb_dst_force_safe(struct sk_buff *skb)
> {
> 	if (skb_dst_is_noref(skb)) {
> 		struct dst_entry *dst = skb_dst(skb);
>
> 		if (!dst_hold_safe(dst))
> 			dst = NULL;
> 		skb->_skb_refdst = (unsigned long)dst;
> 	}
> }

OK, the difference is that this is skb_dst_force_safe(), not skb_dst_force().
Re: Latest net-next from GIT panic
On 2017-09-21 at 13:03, Eric Dumazet wrote:
> On Thu, 2017-09-21 at 11:06 +0200, Paweł Staszewski wrote:
>> On 2017-09-21 at 03:17, Eric Dumazet wrote:
>>> On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote:
>>>>> Thanks very much Pawel for the feedback.
>>>>>
>>>>> I was looking into the code (specifically IPv4 part) and found that in free_fib_info_rcu(), we call free_nh_exceptions() without holding the fnhe_lock. I am wondering if that could cause some race condition on fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the same dst could be happening.
>>>>>
>>>>> But as we call free_fib_info_rcu() only after the grace period, and the lookup code which could potentially modify fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems fine...
>>>>
>>>> Hi Pawel,
>>>>
>>>> Could you try the following debug patch on top of net-next branch and reproduce the issue check if there are warning msg showing?
>>>>
>>>> diff --git a/include/net/dst.h b/include/net/dst.h
>>>> index 93568bd0a352..82aff41c6f63 100644
>>>> --- a/include/net/dst.h
>>>> +++ b/include/net/dst.h
>>>> @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
>>>>  static inline struct dst_entry *dst_clone(struct dst_entry *dst)
>>>>  {
>>>>  	if (dst)
>>>> -		atomic_inc(&dst->__refcnt);
>>>> +		dst_hold(dst);
>>>>  	return dst;
>>>>  }
>>>>
>>>> Thanks.
>>>> Wei
>>>
>>> Yes, we believe skb_dst_force() and skb_dst_force_safe() should be unified (to the 'safe' version)
>>>
>>> We no longer have gc to protect from 0 -> 1 transition of dst refcount.
>>
>> After adding patch from Wei
>> https://bugzilla.kernel.org/show_bug.cgi?id=197005#c14
>
> OK we have two problems here
>
> 1) We need to unify skb_dst_force() (for net tree)
>
> 2) Vlan devices should try to correctly handle IFF_XMIT_DST_RELEASE from lower device. This will considerably help your performance.
>
> For 1), this is what I had in mind, can you try it?
>
> Thanks a lot!
>
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 93568bd0a3520bb7402f04d90cf04ac99c81cfbe..f23851eeaad917e8dafc06b58d23a2575405c894 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
>  static inline struct dst_entry *dst_clone(struct dst_entry *dst)
>  {
>  	if (dst)
> -		atomic_inc(&dst->__refcnt);
> +		dst_hold(dst);
>  	return dst;
>  }
>  
> @@ -311,21 +311,6 @@ static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb
>  	__skb_dst_copy(nskb, oskb->_skb_refdst);
>  }
>  
> -/**
> - * skb_dst_force - makes sure skb dst is refcounted
> - * @skb: buffer
> - *
> - * If dst is not yet refcounted, let's do it
> - */
> -static inline void skb_dst_force(struct sk_buff *skb)
> -{
> -	if (skb_dst_is_noref(skb)) {
> -		WARN_ON(!rcu_read_lock_held());
> -		skb->_skb_refdst &= ~SKB_DST_NOREF;
> -		dst_clone(skb_dst(skb));
> -	}
> -}
> -
>  /**
>   * dst_hold_safe - Take a reference on a dst if possible
>   * @dst: pointer to dst entry
> @@ -356,6 +341,23 @@ static inline void skb_dst_force_safe(struct sk_buff *skb)
>  	}
>  }
>  
> +/**
> + * skb_dst_force - makes sure skb dst is refcounted
> + * @skb: buffer
> + *
> + * If dst is not yet refcounted, let's do it
> + */
> +static inline void skb_dst_force(struct sk_buff *skb)
> +{
> +	if (skb_dst_is_noref(skb)) {
> +		struct dst_entry *dst = skb_dst(skb);
> +
> +		WARN_ON(!rcu_read_lock_held());
> +		if (!dst_hold_safe(dst))
> +			dst = NULL;
> +		skb->_skb_refdst = (unsigned long)dst;
> +	}
> +}
>  
>  /**
>   * __skb_tunnel_rx - prepare skb for rx reinsert

Thanks

What is weird: I have this part in my net-next from git:

/**
 * skb_dst_force_safe - makes sure skb dst is refcounted
 * @skb: buffer
 *
 * If dst is not yet refcounted and not destroyed, grab a ref on it.
 */
static inline void skb_dst_force_safe(struct sk_buff *skb)
{
	if (skb_dst_is_noref(skb)) {
		struct dst_entry *dst = skb_dst(skb);

		if (!dst_hold_safe(dst))
			dst = NULL;
		skb->_skb_refdst = (unsigned long)dst;
	}
}
Re: Latest net-next from GIT panic
On Thu, 2017-09-21 at 11:06 +0200, Paweł Staszewski wrote:
> On 2017-09-21 at 03:17, Eric Dumazet wrote:
>> On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote:
>>>> Thanks very much Pawel for the feedback.
>>>>
>>>> I was looking into the code (specifically IPv4 part) and found that in free_fib_info_rcu(), we call free_nh_exceptions() without holding the fnhe_lock. I am wondering if that could cause some race condition on fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the same dst could be happening.
>>>>
>>>> But as we call free_fib_info_rcu() only after the grace period, and the lookup code which could potentially modify fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems fine...
>>>
>>> Hi Pawel,
>>>
>>> Could you try the following debug patch on top of net-next branch and reproduce the issue check if there are warning msg showing?
>>>
>>> diff --git a/include/net/dst.h b/include/net/dst.h
>>> index 93568bd0a352..82aff41c6f63 100644
>>> --- a/include/net/dst.h
>>> +++ b/include/net/dst.h
>>> @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
>>>  static inline struct dst_entry *dst_clone(struct dst_entry *dst)
>>>  {
>>>  	if (dst)
>>> -		atomic_inc(&dst->__refcnt);
>>> +		dst_hold(dst);
>>>  	return dst;
>>>  }
>>>
>>> Thanks.
>>> Wei
>>
>> Yes, we believe skb_dst_force() and skb_dst_force_safe() should be unified (to the 'safe' version)
>>
>> We no longer have gc to protect from 0 -> 1 transition of dst refcount.
>
> After adding patch from Wei
> https://bugzilla.kernel.org/show_bug.cgi?id=197005#c14

OK we have two problems here

1) We need to unify skb_dst_force() (for net tree)

2) Vlan devices should try to correctly handle IFF_XMIT_DST_RELEASE from lower device. This will considerably help your performance.

For 1), this is what I had in mind, can you try it?

Thanks a lot!

diff --git a/include/net/dst.h b/include/net/dst.h
index 93568bd0a3520bb7402f04d90cf04ac99c81cfbe..f23851eeaad917e8dafc06b58d23a2575405c894 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
 static inline struct dst_entry *dst_clone(struct dst_entry *dst)
 {
 	if (dst)
-		atomic_inc(&dst->__refcnt);
+		dst_hold(dst);
 	return dst;
 }
 
@@ -311,21 +311,6 @@ static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb
 	__skb_dst_copy(nskb, oskb->_skb_refdst);
 }
 
-/**
- * skb_dst_force - makes sure skb dst is refcounted
- * @skb: buffer
- *
- * If dst is not yet refcounted, let's do it
- */
-static inline void skb_dst_force(struct sk_buff *skb)
-{
-	if (skb_dst_is_noref(skb)) {
-		WARN_ON(!rcu_read_lock_held());
-		skb->_skb_refdst &= ~SKB_DST_NOREF;
-		dst_clone(skb_dst(skb));
-	}
-}
-
 /**
  * dst_hold_safe - Take a reference on a dst if possible
  * @dst: pointer to dst entry
@@ -356,6 +341,23 @@ static inline void skb_dst_force_safe(struct sk_buff *skb)
 	}
 }
 
+/**
+ * skb_dst_force - makes sure skb dst is refcounted
+ * @skb: buffer
+ *
+ * If dst is not yet refcounted, let's do it
+ */
+static inline void skb_dst_force(struct sk_buff *skb)
+{
+	if (skb_dst_is_noref(skb)) {
+		struct dst_entry *dst = skb_dst(skb);
+
+		WARN_ON(!rcu_read_lock_held());
+		if (!dst_hold_safe(dst))
+			dst = NULL;
+		skb->_skb_refdst = (unsigned long)dst;
+	}
+}
 
 /**
  * __skb_tunnel_rx - prepare skb for rx reinsert
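For readers following the patch above, here is a minimal userspace sketch of what the unified skb_dst_force() does with the tagged pointer. It is a model under stated assumptions, not the kernel code: the toy_* names, the TOY_DST_NOREF constant, and the C11 atomics are all hypothetical stand-ins for struct sk_buff, struct dst_entry, SKB_DST_NOREF, and the kernel's atomic_t, and the RCU check is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for SKB_DST_NOREF: low pointer bit means
 * "this skb borrowed the dst without taking a reference". */
#define TOY_DST_NOREF ((uintptr_t)1)

struct toy_dst { atomic_int refcnt; };   /* models struct dst_entry */
struct toy_skb { uintptr_t refdst; };    /* models skb->_skb_refdst */

/* Strip the flag bit to recover the pointer, like skb_dst(). */
static struct toy_dst *toy_skb_dst(const struct toy_skb *skb)
{
	return (struct toy_dst *)(skb->refdst & ~TOY_DST_NOREF);
}

/* Take a reference only if the count has not already dropped to zero. */
static bool toy_hold_safe(struct toy_dst *dst)
{
	int old = atomic_load(&dst->refcnt);

	while (old != 0) {
		/* On failure 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&dst->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Model of the unified helper: either the skb ends up owning a real
 * reference, or it ends up with no dst at all. It never keeps a
 * pointer to an object whose refcount already reached zero. */
static void toy_skb_dst_force(struct toy_skb *skb)
{
	if ((skb->refdst & TOY_DST_NOREF) && toy_skb_dst(skb)) {
		struct toy_dst *dst = toy_skb_dst(skb);

		if (!toy_hold_safe(dst))
			dst = NULL;
		skb->refdst = (uintptr_t)dst;   /* flag bit is cleared here */
	}
}
```

Calling toy_skb_dst_force() on an skb whose dst still has a non-zero count bumps the count and clears the flag; calling it on one whose count is already zero leaves the skb with a NULL dst, which is the behavioural difference from the removed version that always incremented.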
Re: Latest net-next from GIT panic
On 2017-09-21 at 03:17, Eric Dumazet wrote:
> On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote:
>>> Thanks very much Pawel for the feedback.
>>>
>>> I was looking into the code (specifically IPv4 part) and found that in free_fib_info_rcu(), we call free_nh_exceptions() without holding the fnhe_lock. I am wondering if that could cause some race condition on fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the same dst could be happening.
>>>
>>> But as we call free_fib_info_rcu() only after the grace period, and the lookup code which could potentially modify fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems fine...
>>
>> Hi Pawel,
>>
>> Could you try the following debug patch on top of net-next branch and reproduce the issue check if there are warning msg showing?
>>
>> diff --git a/include/net/dst.h b/include/net/dst.h
>> index 93568bd0a352..82aff41c6f63 100644
>> --- a/include/net/dst.h
>> +++ b/include/net/dst.h
>> @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
>>  static inline struct dst_entry *dst_clone(struct dst_entry *dst)
>>  {
>>  	if (dst)
>> -		atomic_inc(&dst->__refcnt);
>> +		dst_hold(dst);
>>  	return dst;
>>  }
>>
>> Thanks.
>> Wei
>
> Yes, we believe skb_dst_force() and skb_dst_force_safe() should be unified (to the 'safe' version)
>
> We no longer have gc to protect from 0 -> 1 transition of dst refcount.

After adding patch from Wei
https://bugzilla.kernel.org/show_bug.cgi?id=197005#c14
Re: Latest net-next from GIT panic
On Wed, 2017-09-20 at 18:09 -0700, Wei Wang wrote:
>> Thanks very much Pawel for the feedback.
>>
>> I was looking into the code (specifically IPv4 part) and found that in free_fib_info_rcu(), we call free_nh_exceptions() without holding the fnhe_lock. I am wondering if that could cause some race condition on fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the same dst could be happening.
>>
>> But as we call free_fib_info_rcu() only after the grace period, and the lookup code which could potentially modify fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems fine...
>
> Hi Pawel,
>
> Could you try the following debug patch on top of net-next branch and reproduce the issue check if there are warning msg showing?
>
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 93568bd0a352..82aff41c6f63 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
>  static inline struct dst_entry *dst_clone(struct dst_entry *dst)
>  {
>  	if (dst)
> -		atomic_inc(&dst->__refcnt);
> +		dst_hold(dst);
>  	return dst;
>  }
>
> Thanks.
> Wei

Yes, we believe skb_dst_force() and skb_dst_force_safe() should be unified (to the 'safe' version)

We no longer have gc to protect from 0 -> 1 transition of dst refcount.
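To illustrate the 0 -> 1 transition mentioned above, the following is a small userspace sketch contrasting an unconditional increment with an increment-if-not-zero. It assumes that the 'safe' variant boils down to the latter; the toy_* names and the use of C11 atomics are hypothetical and only model the idea, they are not the kernel's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct dst_entry: only the refcount matters. */
struct toy_dst {
	atomic_int refcnt;
};

/* Unconditional increment, like a plain atomic_inc() on the refcount.
 * With no gc holding the object back, a count of 0 means the object is
 * already on its way to being freed, yet this still moves it 0 -> 1. */
static void toy_hold(struct toy_dst *dst)
{
	atomic_fetch_add(&dst->refcnt, 1);
}

/* Increment only while the count is still non-zero. Returns false when
 * the object is already dead, so the caller must not use it. */
static bool toy_hold_safe(struct toy_dst *dst)
{
	int old = atomic_load(&dst->refcnt);

	while (old != 0) {
		/* On failure 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&dst->refcnt, &old, old + 1))
			return true;
	}
	return false;
}
```

In this model, toy_hold() on a count of zero silently produces a count of one for an object that may already be freed, whereas toy_hold_safe() refuses and leaves the count at zero, so the caller can fall back to having no dst.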
Re: Latest net-next from GIT panic
> Thanks very much Pawel for the feedback. > > I was looking into the code (specifically IPv4 part) and found that in > free_fib_info_rcu(), we call free_nh_exceptions() without holding the > fnhe_lock. I am wondering if that could cause some race condition on > fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the > same dst could be happening. > > But as we call free_fib_info_rcu() only after the grace period, and > the lookup code which could potentially modify > fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems > fine... > Hi Pawel, Could you try the following debug patch on top of net-next branch and reproduce the issue check if there are warning msg showing? diff --git a/include/net/dst.h b/include/net/dst.h index 93568bd0a352..82aff41c6f63 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -271,7 +271,7 @@ static inline void dst_use_noref(struct dst_entry *dst, unsigned long time) static inline struct dst_entry *dst_clone(struct dst_entry *dst) { if (dst) - atomic_inc(&dst->__refcnt); + dst_hold(dst); return dst; } Thanks. Wei On Wed, Sep 20, 2017 at 3:09 PM, Wei Wang wrote: bisected again and same result: b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit commit b838d5e1c5b6e57b10ec8af2268824041e3ea911 Author: Wei Wang Date: Sat Jun 17 10:42:32 2017 -0700 ipv4: mark DST_NOGC and remove the operation of dst_free() With the previous preparation patches, we are ready to get rid of the dst gc operation in ipv4 code and release dst based on refcnt only. So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls to dst_free(). At this point, all dst created in ipv4 code do not use the dst gc anymore and will be destroyed at the point when refcnt drops to 0. Signed-off-by: Wei Wang Acked-by: Martin KaFai Lau Signed-off-by: David S. 
Miller :04 04 9b7e7fb641de6531fc7887473ca47ef7cb6a11da 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M net Will add now version 2 of patch from Eric and we will see >>> after adding patch >>> perf top catch >>>PerfTop: 77159 irqs/sec kernel:99.7% exact: 0.0% [4000Hz cycles], >>> (all, 40 CPUs) >>> >>> --- >>> >>> 60.95% [kernel][k] dev_put.part.6 >>> 4.00% [kernel][k] ixgbe_poll >>> 3.63% [kernel][k] irq_entries_start >>> 1.22% [kernel][k] fib_table_lookup >>> 1.15% [kernel][k] do_raw_spin_lock >>> 1.05% [kernel][k] ixgbe_xmit_frame_ring >>> 1.04% [kernel][k] lookup >>> 0.87% [kernel][k] eth_type_trans >>> >>> >>> no panic on console - rebooting to check logs >>> >>> >> Nothing logged >> > > Thanks very much Pawel for the feedback. > > I was looking into the code (specifically IPv4 part) and found that in > free_fib_info_rcu(), we call free_nh_exceptions() without holding the > fnhe_lock. I am wondering if that could cause some race condition on > fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the > same dst could be happening. > > But as we call free_fib_info_rcu() only after the grace period, and > the lookup code which could potentially modify > fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems > fine... > > > On Wed, Sep 20, 2017 at 2:25 PM, Paweł Staszewski > wrote: >> >> >> W dniu 2017-09-20 o 23:24, Paweł Staszewski pisze: >> >>> >>> >>> W dniu 2017-09-20 o 23:10, Paweł Staszewski pisze: W dniu 2017-09-20 o 21:23, Paweł Staszewski pisze: > > > > W dniu 2017-09-20 o 21:13, Paweł Staszewski pisze: >> >> >> >> W dniu 2017-09-20 o 20:36, Cong Wang pisze: >>> >>> On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet >>> wrote: On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote: > > but dmesg at this time shows nothing about interfaces or flaps. > > This is very odd. > > We only free netdevice in free_netdev() and it is only called when > we unregister a netdevice. Otherwise pcpu_refcnt is impossible > to be NULL. 
If there is a missing dev_hold() or one dev_put() in excess, this would allow the netdev to be freed too soon. -> Use after free. memory holding netdev could be reallocated-cleared by some other kernel user. >>> Sure, but only unregister could trigger a free. If there is no >>> unregister, >>> like what Pawel claims, then there is no free, the refcnt just goes t
Re: Latest net-next from GIT panic
>>> bisected again and same result:
>>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
>>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
>>> Author: Wei Wang
>>> Date: Sat Jun 17 10:42:32 2017 -0700
>>>
>>> ipv4: mark DST_NOGC and remove the operation of dst_free()
>>>
>>> With the previous preparation patches, we are ready to get rid of the dst gc operation in ipv4 code and release dst based on refcnt only.
>>> So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls to dst_free().
>>> At this point, all dst created in ipv4 code do not use the dst gc anymore and will be destroyed at the point when refcnt drops to 0.
>>>
>>> Signed-off-by: Wei Wang
>>> Acked-by: Martin KaFai Lau
>>> Signed-off-by: David S. Miller
>>>
>>> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M net
>>>
>>> Will add now version 2 of patch from Eric and we will see
>>
>> after adding patch
>> perf top catch
>>
>> PerfTop: 77159 irqs/sec kernel:99.7% exact: 0.0% [4000Hz cycles], (all, 40 CPUs)
>> ---
>> 60.95% [kernel] [k] dev_put.part.6
>>  4.00% [kernel] [k] ixgbe_poll
>>  3.63% [kernel] [k] irq_entries_start
>>  1.22% [kernel] [k] fib_table_lookup
>>  1.15% [kernel] [k] do_raw_spin_lock
>>  1.05% [kernel] [k] ixgbe_xmit_frame_ring
>>  1.04% [kernel] [k] lookup
>>  0.87% [kernel] [k] eth_type_trans
>>
>> no panic on console - rebooting to check logs
>
> Nothing logged

Thanks very much Pawel for the feedback.

I was looking into the code (specifically IPv4 part) and found that in free_fib_info_rcu(), we call free_nh_exceptions() without holding the fnhe_lock. I am wondering if that could cause some race condition on fnhe->fnhe_rth_input/output so a double call on dst_dev_put() on the same dst could be happening.

But as we call free_fib_info_rcu() only after the grace period, and the lookup code which could potentially modify fnhe->fnhe_rth_input/output all holds rcu_read_lock(), it seems fine...

On Wed, Sep 20, 2017 at 2:25 PM, Paweł Staszewski wrote: > > > W dniu 2017-09-20 o 23:24, Paweł Staszewski pisze: > >> >> >> W dniu 2017-09-20 o 23:10, Paweł Staszewski pisze: >>> >>> >>> >>> W dniu 2017-09-20 o 21:23, Paweł Staszewski pisze: W dniu 2017-09-20 o 21:13, Paweł Staszewski pisze: > > > > W dniu 2017-09-20 o 20:36, Cong Wang pisze: >> >> On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet >> wrote: >>> >>> On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote: but dmesg at this time shows nothing about interfaces or flaps. This is very odd. We only free netdevice in free_netdev() and it is only called when we unregister a netdevice. Otherwise pcpu_refcnt is impossible to be NULL. >>> >>> If there is a missing dev_hold() or one dev_put() in excess, >>> this would allow the netdev to be freed too soon. >>> >>> -> Use after free. >>> memory holding netdev could be reallocated-cleared by some other >>> kernel >>> user. >>> >> Sure, but only unregister could trigger a free. If there is no >> unregister, >> like what Pawel claims, then there is no free, the refcnt just goes to >> 0 but the memory is still there. >> > About possible mistake from my side with bisect - i can judge too early > that some bisect was good > the road was: > git bisect start > # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag > 'pinctrl-v4.13-1' of > git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl > git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075 > # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' > of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security > git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31 > # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using > stack larger than 1024. 
> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f > # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch > 'udp-reduce-cache-pressure' > git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20 > # bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch > 's390-net-updates-part-2' > git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230 > # good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch > 'bpf-ctx-narrow' > git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70 > #
Re: Latest net-next from GIT panic
On 2017-09-20 at 23:25, Paweł Staszewski wrote:
On 2017-09-20 at 23:24, Paweł Staszewski wrote:
On 2017-09-20 at 23:10, Paweł Staszewski wrote:
On 2017-09-20 at 21:23, Paweł Staszewski wrote:
On 2017-09-20 at 21:13, Paweł Staszewski wrote:
On 2017-09-20 at 20:36, Cong Wang wrote:
On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet wrote:
On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:

but dmesg at this time shows nothing about interfaces or flaps.

This is very odd.

We only free netdevice in free_netdev() and it is only called when we unregister a netdevice. Otherwise pcpu_refcnt is impossible to be NULL.

If there is a missing dev_hold() or one dev_put() in excess, this would allow the netdev to be freed too soon.

-> Use after free. Memory holding netdev could be reallocated-cleared by some other kernel user.

Sure, but only unregister could trigger a free. If there is no unregister, like what Pawel claims, then there is no free, the refcnt just goes to 0 but the memory is still there.

About a possible mistake on my side with the bisect: I may have judged too early that some bisect step was good. The road was:

git bisect start
# bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag 'pinctrl-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using stack larger than 1024.
git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
# good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch 'udp-reduce-cache-pressure'
git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
# bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch 's390-net-updates-part-2'
git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230
# good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch 'bpf-ctx-narrow'
git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70
# good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp: remove cp_outgoing
git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2
# bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add TCP_MD5SIG_EXT socket option to set a key address prefix
git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d
# good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a new function dst_dev_put()

And currently have this running for about 4 hours without problems.

git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
# bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove DST_NOCACHE flag

Here for sure - panic

git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2
# bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe() properly
git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112
# good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call dst_hold_safe() properly
git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f
# bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for insertion into fib6 tree

I'm not 100% sure about the last two. Will test them again starting from [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put() properly

git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855
# bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()
git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
# first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()

What more I can say: I can reproduce this on any server with a similar configuration. The differences can be teamd instead of bonding, or ixgbe, i40e and mlx5: same problems. VLANs, and more or fewer prefixes learned from bgp -> zebra -> netlink -> kernel.

But in the lab, when using only plain routing (no bgpd, about 128 VLANs with 128 routes), I can't reproduce this. It appears only with BGP; the minimum where I could reproduce it was about 130k prefixes with about 286 nexthops.

bisected again and same result:
b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
Author: Wei Wang
Date: Sat Jun 17 10:42:32 2017 -0700

ipv4: mark DST_NOGC and remove the operation of dst_free()

With the previous preparation patches, we are ready to get rid of the dst gc operation in ipv4 code and release dst based on refcnt only.
So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls to dst_free().
At this point, all dst created in ipv4 code do not use the dst gc anymore and will be destroyed at the point when refcnt drops to 0.

Signed-off-by: Wei Wang
Acked-by: Martin KaFai Lau
Signed-off-
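As background for the bisect log above, here is a minimal sketch of the search that a bisection performs, simplified to a linear history. Real git history is a graph and git bisect handles that; the function names, the index-based commit numbering, and the demo predicate here are all hypothetical. The sketch also shows why a step judged "good" too early matters: the search trusts every answer and discards half of the remaining range each time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test callback: non-zero when commit 'idx' shows the bug. */
typedef int (*is_bad_fn)(size_t idx);

/*
 * Commits are numbered in history order. 'good' is known good, 'bad' is
 * known bad, and good < bad. Each step tests the midpoint and keeps the
 * half that must contain the first bad commit. A single wrong verdict
 * (for example a flaky panic that did not trigger in time) steers the
 * search into the wrong half for good.
 */
static size_t first_bad(size_t good, size_t bad, is_bad_fn is_bad)
{
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Demo predicate: pretend every commit from index 11 onwards is bad. */
static int demo_is_bad(size_t idx)
{
	return idx >= 11;
}
```

With the demo predicate, first_bad(0, 16, demo_is_bad) tests commits 8, 12, 10 and 11 in that order and returns 11.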
Re: Latest net-next from GIT panic
On 2017-09-20 at 23:24, Paweł Staszewski wrote:
> On 2017-09-20 at 23:10, Paweł Staszewski wrote:
>> On 2017-09-20 at 21:23, Paweł Staszewski wrote:
>>> [...]
>>
>> Bisected again, same result:
>>
>> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
>> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
>> Author: Wei Wang
>> Date:   Sat Jun 17 10:42:32 2017 -0700
>>
>>     ipv4: mark DST_NOGC and remove the operation of dst_free()
>>
>>     With the previous preparation patches, we are ready to get rid of the
>>     dst gc operation in ipv4 code and release dst based on refcnt only.
>>     So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls
>>     to dst_free().
>>     At this point, all dst created in ipv4 code do not use the dst gc
>>     anymore and will be destroyed at the point when refcnt drops to 0.
>>
>>     Signed-off-by: Wei Wang
>>     Acked-by: Martin KaFai Lau
>>     Signed-off-by: David S. Miller
>>
>> :040000 040000 9b7e7fb641de6531f
Re: Latest net-next from GIT panic
On 2017-09-20 at 23:10, Paweł Staszewski wrote:
> On 2017-09-20 at 21:23, Paweł Staszewski wrote:
>> [...]
>
> Bisected again, same result:
>
> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
> Author: Wei Wang
> Date:   Sat Jun 17 10:42:32 2017 -0700
>
>     ipv4: mark DST_NOGC and remove the operation of dst_free()
>
>     With the previous preparation patches, we are ready to get rid of the
>     dst gc operation in ipv4 code and release dst based on refcnt only.
>     So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls
>     to dst_free().
>     At this point, all dst created in ipv4 code do not use the dst gc
>     anymore and will be destroyed at the point when refcnt drops to 0.
>
>     Signed-off-by: Wei Wang
>     Acked-by: Martin KaFai Lau
>     Signed-off-by: David S. Miller
>
> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da 831a73b71d3df1755f3e24c0d3c86d7a
Re: Latest net-next from GIT panic
On 2017-09-20 at 21:23, Paweł Staszewski wrote:
> On 2017-09-20 at 21:13, Paweł Staszewski wrote:
>> [...]
>
> What more I can say: I can reproduce this on any server with a similar
> configuration. The difference can be teamd instead of bonding, or
> ixgbe, i40e and mlx5. Same problems with vlans, and with more or fewer
> prefixes learned via bgp -> zebra -> netlink -> kernel.
>
> But in the lab, when using only plain routing (no bgpd) with about
> 128 vlans and 128 routes, I can't reproduce this. It appears only with
> bgp; the minimum where I could reproduce it was about 130k prefixes
> with about 286 nexthops.

Bisected again, same result:

b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
Author: Wei Wang
Date:   Sat Jun 17 10:42:32 2017 -0700

    ipv4: mark DST_NOGC and remove the operation of dst_free()

    With the previous preparation patches, we are ready to get rid of the
    dst gc operation in ipv4 code and release dst based on refcnt only.
    So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls
    to dst_free().
    At this point, all dst created in ipv4 code do not use the dst gc
    anymore and will be destroyed at the point when refcnt drops to 0.

    Signed-off-by: Wei Wang
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

:040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M net

Will now add version 2 of the patch fr
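The commit message quoted above says a dst is now destroyed as soon as its refcnt drops to 0, and the bisect log mentions dst_hold_safe(). As a rough illustration of why a "take a reference only if the count is still non-zero" helper matters under such a scheme, here is a minimal user-space sketch using C11 atomics. It is not kernel code: struct fake_dst and the fake_dst_* helpers are hypothetical names invented for this example, and the kernel's own implementation may differ.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted route entry (not struct dst_entry). */
struct fake_dst {
	atomic_int refcnt;
};

/*
 * Take a reference only if the object is still alive (count != 0).
 * Returns false when the count already reached zero, in which case the
 * caller must not use the object.
 */
static bool fake_dst_hold_safe(struct fake_dst *dst)
{
	int old = atomic_load(&dst->refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&dst->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Drop a reference. Returns true when this call released the last one,
 * which is the point where a refcount-only scheme destroys the object.
 */
static bool fake_dst_release(struct fake_dst *dst)
{
	return atomic_fetch_sub(&dst->refcnt, 1) == 1;
}
```

An unconditional increment on an object whose count is already zero would "resurrect" something that is being destroyed; the conditional version refuses instead.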
Re: Latest net-next from GIT panic
On 2017-09-20 at 21:13, Paweł Staszewski wrote:
> On 2017-09-20 at 20:36, Cong Wang wrote:
>> [...]
>
> About a possible mistake on my side with the bisect: I may have judged
> too early that some bisect step was good.
>
> The road was:
>
> git bisect start
> [...]
> # first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()

What more I can say: I can reproduce this on any server with a similar configuration. The difference can be teamd instead of bonding, or ixgbe, i40e and mlx5. Same problems with vlans, and with more or fewer prefixes learned via bgp -> zebra -> netlink -> kernel.

But in the lab, when using only plain routing (no bgpd) with about 128 vlans and 128 routes, I can't reproduce this. It appears only with bgp; the minimum where I could reproduce it was about 130k prefixes with about 286 nexthops.
Re: Latest net-next from GIT panic
On 2017-09-20 at 20:36, Cong Wang wrote:
> On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet wrote:
>> On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
>>> but dmesg at this time shows nothing about interfaces or flaps.
>>>
>>> This is very odd.
>>>
>>> We only free netdevice in free_netdev() and it is only called when
>>> we unregister a netdevice. Otherwise pcpu_refcnt is impossible
>>> to be NULL.
>>
>> If there is a missing dev_hold() or one dev_put() in excess,
>> this would allow the netdev to be freed too soon.
>>
>> -> Use after free.
>> memory holding netdev could be reallocated-cleared by some other kernel
>> user.
>
> Sure, but only unregister could trigger a free. If there is no
> unregister, like what Pawel claims, then there is no free,
> the refcnt just goes to 0 but the memory is still there.

About a possible mistake on my side with the bisect: I may have judged too early that some bisect step was good.

The road was:

git bisect start
# bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag 'pinctrl-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using stack larger than 1024.
git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
# good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch 'udp-reduce-cache-pressure'
git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
# bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch 's390-net-updates-part-2'
git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230
# good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch 'bpf-ctx-narrow'
git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70
# good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp: remove cp_outgoing
git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2
# bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add TCP_MD5SIG_EXT socket option to set a key address prefix
git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d
# good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a new function dst_dev_put()

This one has now been running for about 4 hours without problems.

git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
# bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove DST_NOCACHE flag

Here for sure: panic.

git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2
# bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe() properly
git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112
# good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call dst_hold_safe() properly
git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f
# bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for insertion into fib6 tree

I'm not 100% sure about the last two. Will test them again, starting from
[95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put() properly

git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855
# bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()
git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
# first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()
Re: Latest net-next from GIT panic
On Wed, Sep 20, 2017 at 11:30 AM, Eric Dumazet wrote:
> On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
>> but dmesg at this time shows nothing about interfaces or flaps.
>>
>> This is very odd.
>>
>> We only free netdevice in free_netdev() and it is only called when
>> we unregister a netdevice. Otherwise pcpu_refcnt is impossible
>> to be NULL.
>
> If there is a missing dev_hold() or one dev_put() in excess,
> this would allow the netdev to be freed too soon.
>
> -> Use after free.
> memory holding netdev could be reallocated-cleared by some other kernel
> user.
>

Sure, but only unregister could trigger a free. If there is no
unregister, like what Pawel claims, then there is no free,
the refcnt just goes to 0 but the memory is still there.
Re: Latest net-next from GIT panic
On Wed, 2017-09-20 at 11:22 -0700, Cong Wang wrote:
> but dmesg at this time shows nothing about interfaces or flaps.
>
> This is very odd.
>
> We only free netdevice in free_netdev() and it is only called when
> we unregister a netdevice. Otherwise pcpu_refcnt is impossible
> to be NULL.

If there is a missing dev_hold() or one dev_put() in excess,
this would allow the netdev to be freed too soon.

-> Use after free.
memory holding netdev could be reallocated-cleared by some other kernel
user.
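To make the hold/put imbalance described above concrete, here is a minimal user-space sketch, assuming a single plain atomic counter instead of the kernel's per-CPU counters. The struct fake_dev type and the fake_dev_* helpers are hypothetical names for illustration only; they model a generic refcounted object, not the real net_device lifetime rules debated in this thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object; not the kernel's struct net_device. */
struct fake_dev {
	atomic_int refcnt;
};

static void fake_dev_init(struct fake_dev *dev)
{
	atomic_init(&dev->refcnt, 1);	/* the first user owns one reference */
}

static void fake_dev_hold(struct fake_dev *dev)
{
	atomic_fetch_add(&dev->refcnt, 1);
}

/*
 * Returns true when this put dropped the last reference, i.e. the point
 * where a refcount-driven scheme would consider the object releasable.
 */
static bool fake_dev_put(struct fake_dev *dev)
{
	return atomic_fetch_sub(&dev->refcnt, 1) == 1;
}

/*
 * User A holds the only reference. User B forgot its fake_dev_hold() but
 * still calls fake_dev_put(): the count hits zero while A keeps using
 * the pointer. Returns true if that early "release" happened.
 */
static bool released_while_in_use(void)
{
	struct fake_dev dev;

	fake_dev_init(&dev);		/* A's reference, count = 1 */
	/* missing: fake_dev_hold(&dev) on behalf of user B */
	return fake_dev_put(&dev);	/* B's put: count 1 -> 0 */
}
```

With balanced calls (one hold per put) only the final put reports the last reference; the unbalanced path reports it one put too early, which is the window for a use after free.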
Re: Latest net-next from GIT panic
On Wed, Sep 20, 2017 at 10:55 AM, Paweł Staszewski wrote:
>
>
> On 2017-09-20 at 19:50, Cong Wang wrote:
>>
>> On Wed, Sep 20, 2017 at 6:11 AM, Eric Dumazet wrote:
>>>
>>> Sorry for top-posting, but this is to give context to Wei, since Pawel
>>> used a top posting way to report his bisection.
>>>
>>> Wei, can you take a look at Pawel report ?
>>>
>>> Crash happens in dst_destroy() at following :
>>>
>>> if (dst->dev)
>>>      dev_put(dst->dev); <>
>>>
>>>
>>> dst->dev is not NULL, but netdev->pcpu_refcnt is NULL
>>>
>>> 65 ff 08    decl %gs:(%rax) // CRASH since rax = NULL
>>>
>>>
>>>
>>> Pawel, please share your netdevices and routing setup ?
>>
>> Looks like a double dev_put() on some dev...
>>
>> Pawel, do you have any idea how this is triggered? Does your
>> test try to remove some network device? If so which one?
>> I noticed you have at least multiple vlan, bond and ixgbe
>> devices.
>
> Just after I start the bgp sessions.
> When the host is starting, I have all bgp sessions to upstreams shut down.
>
> To trigger the panic I just enable all 6x bgp sessions to upstreams at once,
> and zebra starts to pull prefixes and push them to the kernel.
>
> Then some traffic is generated from test hosts through this backup router and
> the panic is generated, every time 10 to 15 seconds after the bgp sessions
> are connected.
>
> I'm not removing any interface at this time or doing anything with interfaces,
> just waiting.
>
> And yes, there are vlans attached to the bond devices,
> but dmesg at this time shows nothing about interfaces or flaps.

This is very odd.

We only free netdevice in free_netdev() and it is only called when
we unregister a netdevice. Otherwise pcpu_refcnt is impossible
to be NULL.
Re: Latest net-next from GIT panic
On 2017-09-20 at 19:46, Wei Wang wrote:
>>> This is why I suggested to replace the BUG() in another mail
>>>
>>> So :
>>>
>>> [...]
>
> Thanks a lot Eric for the debug patch.
>
> Pawel,
>
> I want to confirm with you about the last good commit when you did bisection.
> You mentioned:
>
>> And the last one
>>
>> git bisect good
>> Bisecting: 1 revision left to test after this (roughly 1 step)
>> [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for
>> insertion into fib6 tree
>>
>> With this have kernel panic same as always
>>
>> git bisect bad
>> Bisecting: 0 revisions left to test after this (roughly 0 steps)
>> [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and
>> remove the operation of dst_free()
>
> So it breaks right at:
> [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and
> remove the operation of dst_free()
> Right?
>
> If you sync the image to one commit before the above one:
> [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call dst_hold_safe() properly
> Does it crash?

Later today I will repeat the last three steps, in about 3 hours, after the rush hours of Internet traffic. Right now I can't touch the backup router :)

> And could you confirm that your config does not have any IPv6
> addresses or routes configured?

IPv6 is enabled, and yes, there are some IPv6 addresses.
One interface has IPv6 enabled with one static route, but there are no IPv6 bgp sessions, so not many IPv6 prefixes, and the IPv6 fib is almost empty:

ip -6 r ls | wc -l
57

> Thanks.
> Wei
>
> [...]
Re: Latest net-next from GIT panic
On Wed, 2017-09-20 at 10:50 -0700, Cong Wang wrote:
> On Wed, Sep 20, 2017 at 6:11 AM, Eric Dumazet wrote:
> > Sorry for top-posting, but this is to give context to Wei, since Pawel
> > used a top posting way to report his bisection.
> >
> > Wei, can you take a look at Pawel report ?
> >
> > Crash happens in dst_destroy() at following :
> >
> > if (dst->dev)
> >      dev_put(dst->dev); <>
> >
> >
> > dst->dev is not NULL, but netdev->pcpu_refcnt is NULL
> >
> > 65 ff 08    decl %gs:(%rax) // CRASH since rax = NULL
> >
> >
> >
> > Pawel, please share your netdevices and routing setup ?
>
> Looks like a double dev_put() on some dev...
>
> Pawel, do you have any idea how this is triggered? Does your
> test try to remove some network device? If so which one?
> I noticed you have at least multiple vlan, bond and ixgbe
> devices.

Or a missing dev_hold() somewhere.
Re: Latest net-next from GIT panic
On Wed, Sep 20, 2017 at 6:11 AM, Eric Dumazet wrote:
> Sorry for top-posting, but this is to give context to Wei, since Pawel
> used a top posting way to report his bisection.
>
> Wei, can you take a look at Pawel report ?
>
> Crash happens in dst_destroy() at following :
>
> if (dst->dev)
>      dev_put(dst->dev); <>
>
>
> dst->dev is not NULL, but netdev->pcpu_refcnt is NULL
>
> 65 ff 08    decl %gs:(%rax) // CRASH since rax = NULL
>
>
>
> Pawel, please share your netdevices and routing setup ?

Looks like a double dev_put() on some dev...

Pawel, do you have any idea how this is triggered? Does your
test try to remove some network device? If so which one?
I noticed you have at least multiple vlan, bond and ixgbe
devices.
Re: Latest net-next from GIT panic
>> On Wed, 2017-09-20 at 16:03 +0200, Paweł Staszewski wrote:
>>>
>>> Not much more after adding this patch
>>>
>>> https://bugzilla.kernel.org/attachment.cgi?id=258529
>>>
>> This is why I suggested to replace the BUG() in another mail
>>
>> So :
>>
>> [...]
>>
>
> Full panic
>
> https://bugzilla.kernel.org/attachment.cgi?id=258531
>
>
> I will change patch and apply but later today cause now cant use backup
> router as testlab - Internet rush hours if something happens this will be
> bad when second router will have bugged kernel :)
>

Thanks a lot Eric for the debug patch.

Pawel,

I want to confirm with you about the last good commit when you did bisection.
You mentioned:

> And the last one
>
> git bisect good
> Bisecting: 1 revision left to test after this (roughly 1 step)
> [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for
> insertion into fib6 tree
>
> With this have kernel panic same as always
>
> git bisect bad
> Bisecting: 0 revisions left to test after this (roughly 0 steps)
> [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and
> remove the operation of dst_free()

So it breaks right at:
[b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and
remove the operation of dst_free()
Right?

If you sync the image to one commit before the above one:
[9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call dst_hold_safe() properly
Does it crash?

And could you confirm that your config does not have any IPv6
addresses or routes configured?

Thanks.
Wei
Re: Latest net-next from GIT panic
On 2017-09-20 at 16:40, Eric Dumazet wrote:
> On Wed, 2017-09-20 at 16:03 +0200, Paweł Staszewski wrote:
>> Not much more after adding this patch
>>
>> https://bugzilla.kernel.org/attachment.cgi?id=258529
>
> This is why I suggested to replace the BUG() in another mail
>
> So :
>
> [...]

Full panic:

https://bugzilla.kernel.org/attachment.cgi?id=258531

I will change the patch and apply it, but later today, because right now I can't use the backup router as a test lab. These are Internet rush hours; if something happens, it will be bad when the second router has a buggy kernel :)
Re: Latest net-next from GIT panic
On Wed, 2017-09-20 at 16:03 +0200, Paweł Staszewski wrote:
> Not much more after adding this patch
>
> https://bugzilla.kernel.org/attachment.cgi?id=258529
>

This is why I suggested to replace the BUG() in another mail

So :

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..220cd12456754876edf2d3ef13195e82d70d5c74 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3331,7 +3331,15 @@ void netdev_run_todo(void);
  */
 static inline void dev_put(struct net_device *dev)
 {
-	this_cpu_dec(*dev->pcpu_refcnt);
+	int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
+
+	if (!pref) {
+		pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
+		       dev, dev->name, dev->reg_state, dev->dismantle);
+		for (;;)
+			cpu_relax();
+	}
+	this_cpu_dec(*pref);
 }
 
 /**
Re: Latest net-next from GIT panic
Not much more after adding this patch

https://bugzilla.kernel.org/attachment.cgi?id=258529

On 2017-09-20 at 15:44, Eric Dumazet wrote:
> On Wed, 2017-09-20 at 15:39 +0200, Paweł Staszewski wrote:
>> On 2017-09-20 at 15:34, Eric Dumazet wrote:
>>> Could you try this debug patch ?
>>>
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -3331,7 +3331,14 @@ void netdev_run_todo(void);
>>>   */
>>>  static inline void dev_put(struct net_device *dev)
>>>  {
>>> -	this_cpu_dec(*dev->pcpu_refcnt);
>>> +	int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
>>> +
>>> +	if (!pref) {
>>> +		pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
>>> +		       dev, dev->name, dev->reg_state, dev->dismantle);
>>> +		BUG();
>>> +	}
>>> +	this_cpu_dec(*pref);
>>>  }
>>>
>>>  /**
>>
>> Which kernel version do you want me to add this patch to ?
>> Currently I am after git bisect reset - so mainline stable
>
> Simply use the latest net-next as mentioned in the thread title, thanks.
Re: Latest net-next from GIT panic
On Wed, 2017-09-20 at 15:39 +0200, Paweł Staszewski wrote:
>
> On 2017-09-20 at 15:34, Eric Dumazet wrote:
> > Could you try this debug patch ?
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -3331,7 +3331,14 @@ void netdev_run_todo(void);
> >   */
> >  static inline void dev_put(struct net_device *dev)
> >  {
> > -	this_cpu_dec(*dev->pcpu_refcnt);
> > +	int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
> > +
> > +	if (!pref) {
> > +		pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
> > +		       dev, dev->name, dev->reg_state, dev->dismantle);
> > +		BUG();
> > +	}
> > +	this_cpu_dec(*pref);
> >  }
> >
> >  /**
> >
>
> Which kernel version do you want me to add this patch to ?
> Currently I am after git bisect reset - so mainline stable
>

Simply use the latest net-next as mentioned in the thread title, thanks.
Re: Latest net-next from GIT panic
On 2017-09-20 at 15:34, Eric Dumazet wrote:
> Could you try this debug patch ?
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3331,7 +3331,14 @@ void netdev_run_todo(void);
>   */
>  static inline void dev_put(struct net_device *dev)
>  {
> -	this_cpu_dec(*dev->pcpu_refcnt);
> +	int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
> +
> +	if (!pref) {
> +		pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
> +		       dev, dev->name, dev->reg_state, dev->dismantle);
> +		BUG();
> +	}
> +	this_cpu_dec(*pref);
>  }
>
>  /**

Which kernel version do you want me to add this patch to ?
Currently I am after git bisect reset - so mainline stable
Re: Latest net-next from GIT panic
On Wed, 2017-09-20 at 06:34 -0700, Eric Dumazet wrote:
> Could you try this debug patch ?
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3331,7 +3331,14 @@ void netdev_run_todo(void);
>   */
>  static inline void dev_put(struct net_device *dev)
>  {
> -	this_cpu_dec(*dev->pcpu_refcnt);
> +	int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
> +
> +	if (!pref) {
> +		pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
> +		       dev, dev->name, dev->reg_state, dev->dismantle);
> +		BUG();
> +	}
> +	this_cpu_dec(*pref);
>  }
>
>  /**
>

And since the console will be filled by the stack trace, maybe use an infinite loop instead of BUG() ?

	for (;;)
		cpu_relax();
Re: Latest net-next from GIT panic
Could you try this debug patch ?

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f535779d9dc1dfe36934c2abba4e43d053ac5d6f..1eaa3553a724dc8c048f67b556337072d5addc82 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3331,7 +3331,14 @@ void netdev_run_todo(void);
  */
 static inline void dev_put(struct net_device *dev)
 {
-	this_cpu_dec(*dev->pcpu_refcnt);
+	int __percpu *pref = READ_ONCE(dev->pcpu_refcnt);
+
+	if (!pref) {
+		pr_err("no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
+		       dev, dev->name, dev->reg_state, dev->dismantle);
+		BUG();
+	}
+	this_cpu_dec(*pref);
 }
 
 /**
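For readers who want to see the shape of this check outside the kernel, here is a minimal userspace sketch. It is only a model under stated assumptions: `struct fake_net_device` and `fake_dev_put()` are hypothetical names, a plain `int *` stands in for the per-CPU counter, and the function returns -1 where the debug patch would call BUG().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct net_device: only the fields the
 * debug patch reads or prints. Not a kernel type. */
struct fake_net_device {
	const char *name;
	int *pcpu_refcnt;	/* NULL models counters that were already freed */
	int reg_state;
	int dismantle;
};

/* Models the patched dev_put(): read the pointer once, report and bail
 * out if it is NULL, otherwise decrement the counter.
 * Returns 0 on success, -1 where the kernel patch would hit BUG(). */
static int fake_dev_put(struct fake_net_device *dev)
{
	int *pref;

	assert(dev != NULL);
	pref = dev->pcpu_refcnt;	/* single read, like READ_ONCE() in the patch */

	if (!pref) {
		fprintf(stderr, "no pcpu_refcnt on dev %p(%s) state %d dismantle %d\n",
			(void *)dev, dev->name, dev->reg_state, dev->dismantle);
		return -1;
	}
	(*pref)--;
	return 0;
}
```

The point of reading the pointer into a local first is that the NULL test and the decrement then use the same value, which is what lets the message identify the offending device before anything is dereferenced.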
Re: Latest net-next from GIT panic
Yes, sorry for top-posting also.

Configuration:

Ethernet devices:

lspci | grep Etherne
02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
04:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
04:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
07:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
07:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
81:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
81:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
83:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
83:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

ip l
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp2s0f0: mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 8192
    link/ether 00:25:90:e4:97:9a brd ff:ff:ff:ff:ff:ff
3: enp2s0f1: mtu 1500 qdisc mq state DOWN mode DEFAULT qlen 8192
    link/ether 00:25:90:e4:97:9b brd ff:ff:ff:ff:ff:ff
4: enp4s0f0: mtu 1500 qdisc mq master bond1 state UP mode DEFAULT qlen 8192
    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
5: enp4s0f1: mtu 1500 qdisc mq master bond0 state UP mode DEFAULT qlen 8192
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
6: enp7s0f0: mtu 1500 qdisc mq master bond1 state UP mode DEFAULT qlen 8192
    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
7: enp7s0f1: mtu 1500 qdisc mq master bond0 state UP mode DEFAULT qlen 8192
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
8: enp129s0f0: mtu 1500 qdisc mq master bond1 state UP mode DEFAULT qlen 8192
    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
9: enp129s0f1: mtu 1500 qdisc mq master bond0 state UP mode DEFAULT qlen 8192
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
10: enp131s0f0: mtu 1500 qdisc mq master bond1 state UP mode DEFAULT qlen 8192
    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
11: enp131s0f1: mtu 1500 qdisc mq master bond0 state UP mode DEFAULT qlen 8192
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
12: sit0@NONE: mtu 1480 qdisc noop state DOWN mode DEFAULT qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
13: bond0: mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
14: bond1: mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
15: vlan4091@bond0: mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
16: vlan4032@bond0: mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
17: vlan514@bond0: mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
18: vlan87@bond0: mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
19: vlan518@bond1: mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
20: vlan646@bond1: mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:68 brd ff:ff:ff:ff:ff:ff
21: vlan370@bond0: mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
22: vlan3212@bond0: mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff
23: vlan746@bond0: mtu 1500 qdisc noqueue state UP mode DEFAULT qlen 1000
    link/ether 0c:c4:7a:bc:b8:69 brd ff:ff:ff:ff:ff:ff

There are bonds:

cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: enp4s0f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 0c:c4:7a:bc:b8:69
Slave queue ID: 0

Slave Interface: enp7s0f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:90:e3:dd:9d
Slave queue ID: 0

Slave Interface: enp129s0f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:25:90:e3:da:e1
Slave queue ID: 0

Slave Interface: enp131s0f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 0c:c4:7a:bc:b1:fd
Slave queue ID: 0

cat /proc/net/bonding/bond1
Ethernet Channel Bonding Driv
Re: Latest net-next from GIT panic
So far the bisect path was:

git bisect start
# bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag 'pinctrl-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using stack larger than 1024.
git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
# good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch 'udp-reduce-cache-pressure'
git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
# bad: [8abd5599a520e9f188a750f1bde9dde5fb856230] Merge branch 's390-net-updates-part-2'
git bisect bad 8abd5599a520e9f188a750f1bde9dde5fb856230
# good: [2fae5d0e647c6470d206e72b5fc24972bb900f70] Merge branch 'bpf-ctx-narrow'
git bisect good 2fae5d0e647c6470d206e72b5fc24972bb900f70
# good: [41500c3e2a19ffcf40a7158fce1774de08e26ba2] rds: tcp: remove cp_outgoing
git bisect good 41500c3e2a19ffcf40a7158fce1774de08e26ba2
# bad: [8917a777be3ba566377be05117f71b93a5fd909d] tcp: md5: add TCP_MD5SIG_EXT socket option to set a key address prefix
git bisect bad 8917a777be3ba566377be05117f71b93a5fd909d
# good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a new function dst_dev_put()
git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
# bad: [a4c2fd7f78915a0d7c5275e7612e7793157a01f2] net: remove DST_NOCACHE flag
git bisect bad a4c2fd7f78915a0d7c5275e7612e7793157a01f2
# bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe() properly
git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112

No PANIC
# good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call dst_hold_safe() properly
git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f

PANIC
# bad: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for insertion into fib6 tree
git bisect bad 1cfb71eeb12047bcdbd3e6730ffed66e810a0855

PANIC
# bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()
git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
# first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()

On 2017-09-20 at 15:05, Paweł Staszewski wrote:
> hmm
>
> But after
>
> b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
> commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
> Author: Wei Wang
> Date:   Sat Jun 17 10:42:32 2017 -0700
>
>     ipv4: mark DST_NOGC and remove the operation of dst_free()
>
>     With the previous preparation patches, we are ready to get rid of the
>     dst gc operation in ipv4 code and release dst based on refcnt only.
>     So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls
>     to dst_free().
>     At this point, all dst created in ipv4 code do not use the dst gc
>     anymore and will be destroyed at the point when refcnt drops to 0.
>
>     Signed-off-by: Wei Wang
>     Acked-by: Martin KaFai Lau
>     Signed-off-by: David S. Miller
>
> :040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M	net
>
> Still panics, so I will go back the last 3 steps and try to bisect
> again to a point without the panic.
Re: Latest net-next from GIT panic
Sorry for top-posting, but this is to give context to Wei, since Pawel used a top-posting way to report his bisection.

Wei, can you take a look at Pawel's report ?

Crash happens in dst_destroy() at the following :

	if (dst->dev)
		dev_put(dst->dev); <>

dst->dev is not NULL, but netdev->pcpu_refcnt is NULL

65 ff 08	decl %gs:(%rax)	// CRASH since rax = NULL

Pawel, please share your netdevices and routing setup ?

Thanks !

On Wed, 2017-09-20 at 14:49 +0200, Paweł Staszewski wrote:
> And the last one
>
> git bisect good
> Bisecting: 1 revision left to test after this (roughly 1 step)
> [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for
> insertion into fib6 tree
>
> With this one I get the same kernel panic as always
>
> git bisect bad
> Bisecting: 0 revisions left to test after this (roughly 0 steps)
> [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and
> remove the operation of dst_free()
>
> On 2017-09-20 at 14:23, Paweł Staszewski wrote:
> > Almost there
> >
> > Bisecting: 6 revisions left to test after this (roughly 3 steps)
> > [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe()
> > properly
> >
> > On 2017-09-20 at 13:02, Paweł Staszewski wrote:
> >> OK, resumed, and so far:
> >>
> >> Panic:
> >>
> >> # bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid
> >> using stack larger than 1024.
> >> git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f
> >>
> >> No panic:
> >>
> >> # good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch
> >> 'udp-reduce-cache-pressure'
> >> git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20
> >>
> >> On 2017-09-20 at 12:22, Paweł Staszewski wrote:
> >>> So far bisected and marked:
> >>>
> >>> git bisect start
> >>> # bad: [07dd6cc1fff160143e82cf5df78c1db0b6e03355] Linux 4.13.2
> >>> git bisect bad 07dd6cc1fff160143e82cf5df78c1db0b6e03355
> >>> # good: [5d7d2e03e0f01a992e3521b180c3d3e67905f269] Linux 4.12.13
> >>> git bisect good 5d7d2e03e0f01a992e3521b180c3d3e67905f269
> >>> # good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12
> >>> git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c
> >>> # bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag
> >>> 'pinctrl-v4.13-1' of
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
> >>> git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
> >>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch
> >>> 'next' of
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
> >>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
> >>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch
> >>> 'next' of
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
> >>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
> >>> # good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch
> >>> 'next' of
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
> >>> git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
> >>>
> >>> On 2017-09-20 at 12:21, Paweł Staszewski wrote:
> >>>> OK, the kernel crashed with a different panic that I didn't catch
> >>>> while I was bisecting, and now my bisection is broken :)
> >>>>
> >>>> git bisect good
> >>>> Bisecting: 1787 revisions left to test after this (roughly 11 steps)
> >>>> error: Your local changes to the following files would be
> >>>> overwritten by checkout:
> >>>> Documentation/00-INDEX
> >>>> Documentation/ABI/stable/sysfs-class-udc
> >>>> Documentation/ABI/testing/configfs-usb-gadget-uac1
> >>>> Documentation/ABI/testing/ima_policy
> >>>> Documentation/ABI/testing/sysfs-bus-iio
> >>>> Documentation/ABI/testing/sysfs-bus-iio-meas-spec
> >>>> Documentation/ABI/testing/sysfs-bus-iio-timer-stm32
> >>>> Documentation/ABI/testing/sysfs-class-net
> >>>> Documentation/ABI/testing/sysfs-class-power-twl4030
> >>>> Documentation/ABI/testing/sysfs-class-typec
> >>>> Documentation/DMA-API.txt
> >>>> Documentation/IRQ-domain.txt
> >>>> Documentation/Makefile
> >>>> Documentation/PCI/MSI-HOWTO.txt
> >>>> Documentation/RCU/00-INDEX
> >>>> Documentation/RCU/Design/Requirements/Requirements.html
> >>>> Documentation/RCU/checklist.txt
> >>>> Documentation/admin-guide/README.rst
> >>>> Documentation/admin-guide/devices.txt
> >>>> Documentation/admin-guide/index.rst
> >>>> Documentation/admin-guide/kernel-parameters.txt
> >>>> Documentation/admin-guide/pm/cpufreq.rst
> >>>> Documentation/admin-guide/pm/intel_pstate.rst
> >>>> Documentation/admin-guide/ras.rst
> >>>> Documentation/arm/Atmel/README
> >>>> Documentation/block/biodoc.txt
> >>>> Documentation/conf.py
> >>>> Documentation/core-api/assoc_array.rst
> >>>>
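The failure path described in this message can be sketched in userspace as follows. This is an illustration under assumptions, not kernel code: all `fake_` names are hypothetical, plain integers replace atomics and per-CPU counters, and the model only shows the ordering problem (a route entry released by refcount alone still points at a device whose counters are already gone).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace models; none of these are kernel APIs. */
struct fake_dev {
	int *pcpu_refcnt;	/* NULL models counters already freed */
};

struct fake_dst {
	int refcnt;		/* references held on this route entry */
	struct fake_dev *dev;	/* device the entry still points at */
};

enum fake_result { FAKE_HELD, FAKE_DESTROYED, FAKE_WOULD_CRASH };

/* Models the "if (dst->dev) dev_put(dst->dev)" step in dst_destroy(). */
static enum fake_result fake_dst_destroy(struct fake_dst *dst)
{
	if (dst->dev) {
		if (!dst->dev->pcpu_refcnt)
			return FAKE_WOULD_CRASH;  /* kernel: decrement through a NULL pointer */
		(*dst->dev->pcpu_refcnt)--;
	}
	return FAKE_DESTROYED;
}

/* Drop one reference; the entry is destroyed only when the count hits 0. */
static enum fake_result fake_dst_release(struct fake_dst *dst)
{
	assert(dst != NULL && dst->refcnt > 0);
	if (--dst->refcnt > 0)
		return FAKE_HELD;
	return fake_dst_destroy(dst);
}
```

In this model the last reference holder decides when destruction runs, so nothing prevents it from running after the device side has been torn down, which matches the "dst->dev is not NULL, but pcpu_refcnt is NULL" observation.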
Re: Latest net-next from GIT panic
hmm

But after

b838d5e1c5b6e57b10ec8af2268824041e3ea911 is the first bad commit
commit b838d5e1c5b6e57b10ec8af2268824041e3ea911
Author: Wei Wang
Date:   Sat Jun 17 10:42:32 2017 -0700

    ipv4: mark DST_NOGC and remove the operation of dst_free()

    With the previous preparation patches, we are ready to get rid of the
    dst gc operation in ipv4 code and release dst based on refcnt only.
    So this patch adds DST_NOGC flag for all IPv4 dst and remove the calls
    to dst_free().
    At this point, all dst created in ipv4 code do not use the dst gc
    anymore and will be destroyed at the point when refcnt drops to 0.

    Signed-off-by: Wei Wang
    Acked-by: Martin KaFai Lau
    Signed-off-by: David S. Miller

:040000 040000 9b7e7fb641de6531fc7887473ca47ef7cb6a11da 831a73b71d3df1755f3e24c0d3c86d7a93fd55e2 M	net

Still panics, so I will go back the last 3 steps and try to bisect again to a point without the panic.
Re: Latest net-next from GIT panic
And the last one

git bisect good
Bisecting: 1 revision left to test after this (roughly 1 step)
[1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take dst->__refcnt for insertion into fib6 tree

With this one I get the same kernel panic as always

git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()
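The "roughly N steps" figures quoted in this thread (1787 revisions and 11 steps, 6 and 3, 1 and 1, 0 and 0) are what one would expect from repeated halving. The sketch below is only a back-of-the-envelope approximation: `approx_bisect_steps()` is a hypothetical helper, and it is not claimed to be the formula git itself uses for its estimate.

```c
#include <assert.h>

/* Approximate number of halving steps needed to single out one commit
 * when `revisions_left` candidates remain after the current test, that
 * is, the smallest s with 2^s >= revisions_left + 1.
 * Hypothetical helper for illustration; not git's own estimate. */
static int approx_bisect_steps(long revisions_left)
{
	int steps = 0;

	assert(revisions_left >= 0);
	while ((1ULL << steps) < (unsigned long long)revisions_left + 1)
		steps++;
	return steps;
}
```

For example, approx_bisect_steps(1787) gives 11 and approx_bisect_steps(6) gives 3, in line with the messages above.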
Re: Latest net-next from GIT panic
Almost there

Bisecting: 6 revisions left to test after this (roughly 3 steps)
[ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call dst_hold_safe() properly
Re: Latest net-next from GIT panic
OK, resumed, and so far:

Panic:
# bad: [9cc9a5cb176ccb4f2cda5ac34da5a659926f125f] datapath: Avoid using stack larger than 1024.
git bisect bad 9cc9a5cb176ccb4f2cda5ac34da5a659926f125f

No panic:
# good: [073cf9e20c333ab29744717a23f9e43ec7512a20] Merge branch 'udp-reduce-cache-pressure'
git bisect good 073cf9e20c333ab29744717a23f9e43ec7512a20

On 2017-09-20 at 12:22, Paweł Staszewski wrote:

So far bisected and marked:

git bisect start
# bad: [07dd6cc1fff160143e82cf5df78c1db0b6e03355] Linux 4.13.2
git bisect bad 07dd6cc1fff160143e82cf5df78c1db0b6e03355
# good: [5d7d2e03e0f01a992e3521b180c3d3e67905f269] Linux 4.12.13
git bisect good 5d7d2e03e0f01a992e3521b180c3d3e67905f269
# good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12
git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c
# bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag 'pinctrl-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31

On 2017-09-20 at 12:21, Paweł Staszewski wrote:

OK, the kernel crashed with a different panic that I didn't catch while I was bisecting, and now my bisection is broken :)

git bisect good
Bisecting: 1787 revisions left to test after this (roughly 11 steps)
error: Your local changes to the following files would be overwritten by checkout:
Documentation/00-INDEX
Documentation/ABI/stable/sysfs-class-udc Documentation/ABI/testing/configfs-usb-gadget-uac1 Documentation/ABI/testing/ima_policy Documentation/ABI/testing/sysfs-bus-iio Documentation/ABI/testing/sysfs-bus-iio-meas-spec Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 Documentation/ABI/testing/sysfs-class-net Documentation/ABI/testing/sysfs-class-power-twl4030 Documentation/ABI/testing/sysfs-class-typec Documentation/DMA-API.txt Documentation/IRQ-domain.txt Documentation/Makefile Documentation/PCI/MSI-HOWTO.txt Documentation/RCU/00-INDEX Documentation/RCU/Design/Requirements/Requirements.html Documentation/RCU/checklist.txt Documentation/admin-guide/README.rst Documentation/admin-guide/devices.txt Documentation/admin-guide/index.rst Documentation/admin-guide/kernel-parameters.txt Documentation/admin-guide/pm/cpufreq.rst Documentation/admin-guide/pm/intel_pstate.rst Documentation/admin-guide/ras.rst Documentation/arm/Atmel/README Documentation/block/biodoc.txt Documentation/conf.py Documentation/core-api/assoc_array.rst Documentation/core-api/atomic_ops.rst Documentation/core-api/index.rst Documentation/crypto/asymmetric-keys.txt Documentation/dev-tools/index.rst Documentation/dev-tools/sparse.rst Documentation/devicetree/bindings/arm/amlogic.txt Documentation/devicetree/bindings/arm/atmel-at91.txt Documentation/devicetree/bindings/arm/ccn.txt Documentation/devicetree/bindings/arm/cpus.txt Documentation/devicetree/bindings/arm/gemini.txt Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt Documentation/devicetree/bindings/arm/keystone/keystone.txt Documentation/devicetree/bindings/arm/mediatek.txt Documentation/devicetree/bindings/arm/rockchip.txt Documentation/devicetree/bindings/arm/shmobile.txt Documentation/devicetree/bindings/arm/tegra.txt Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt 
Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt Documentation/devicetree/bindings/gpio/gpio_atmel.txt Documentation/devicetree/bindings/iio/adc/amlogic,meson-saradc.txt Documentation/devicetree/bindings/iio/adc/renesas,gyroadc.txt Documentation/devicetree/bindings/iio/adc/st,stm32-adc.txt Documentation/devicetree/bindings/iio/imu/st_lsm6dsx.txt Documentation/devicetree/bindings/interrupt-controller/allwinner,sunxi-nmi.txt Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2400-vic.txt Documentation/devicetree/bindings/interrupt-controller/mediatek,sysirq.txt Documentation/devicetree/bindings/leds/common.txt Documentat
Re: Latest net-next from GIT panic
So far bisected and marked:

git bisect start
# bad: [07dd6cc1fff160143e82cf5df78c1db0b6e03355] Linux 4.13.2
git bisect bad 07dd6cc1fff160143e82cf5df78c1db0b6e03355
# good: [5d7d2e03e0f01a992e3521b180c3d3e67905f269] Linux 4.12.13
git bisect good 5d7d2e03e0f01a992e3521b180c3d3e67905f269
# good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12
git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c
# bad: [ac7b75966c9c86426b55fe1c50ae148aa4571075] Merge tag 'pinctrl-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect bad ac7b75966c9c86426b55fe1c50ae148aa4571075
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31
# good: [e24dd9ee5399747b71c1d982a484fc7601795f31] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
git bisect good e24dd9ee5399747b71c1d982a484fc7601795f31

On 2017-09-20 at 12:21, Paweł Staszewski wrote:

OK, the kernel crashed with a different panic that I didn't catch while I was bisecting, and now my bisection is broken :)

git bisect good
Bisecting: 1787 revisions left to test after this (roughly 11 steps)
error: Your local changes to the following files would be overwritten by checkout:
Documentation/00-INDEX Documentation/ABI/stable/sysfs-class-udc Documentation/ABI/testing/configfs-usb-gadget-uac1 Documentation/ABI/testing/ima_policy Documentation/ABI/testing/sysfs-bus-iio Documentation/ABI/testing/sysfs-bus-iio-meas-spec Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 Documentation/ABI/testing/sysfs-class-net Documentation/ABI/testing/sysfs-class-power-twl4030 Documentation/ABI/testing/sysfs-class-typec
Documentation/DMA-API.txt Documentation/IRQ-domain.txt Documentation/Makefile Documentation/PCI/MSI-HOWTO.txt Documentation/RCU/00-INDEX Documentation/RCU/Design/Requirements/Requirements.html Documentation/RCU/checklist.txt Documentation/admin-guide/README.rst Documentation/admin-guide/devices.txt Documentation/admin-guide/index.rst Documentation/admin-guide/kernel-parameters.txt Documentation/admin-guide/pm/cpufreq.rst Documentation/admin-guide/pm/intel_pstate.rst Documentation/admin-guide/ras.rst Documentation/arm/Atmel/README Documentation/block/biodoc.txt Documentation/conf.py Documentation/core-api/assoc_array.rst Documentation/core-api/atomic_ops.rst Documentation/core-api/index.rst Documentation/crypto/asymmetric-keys.txt Documentation/dev-tools/index.rst Documentation/dev-tools/sparse.rst Documentation/devicetree/bindings/arm/amlogic.txt Documentation/devicetree/bindings/arm/atmel-at91.txt Documentation/devicetree/bindings/arm/ccn.txt Documentation/devicetree/bindings/arm/cpus.txt Documentation/devicetree/bindings/arm/gemini.txt Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt Documentation/devicetree/bindings/arm/keystone/keystone.txt Documentation/devicetree/bindings/arm/mediatek.txt Documentation/devicetree/bindings/arm/rockchip.txt Documentation/devicetree/bindings/arm/shmobile.txt Documentation/devicetree/bindings/arm/tegra.txt Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt Documentation/devicetree/bindings/gpio/gpio_atmel.txt Documentation/devicetree/bindings/iio/adc/amlogic,meson-saradc.txt Documentation/devicetree/bindings/iio/adc/renesas,gyroadc.txt Documentation/devicetree/bindings/iio/adc/st,stm32-adc.txt Documentation/devicetree/bindings/iio/imu/st_lsm6dsx.txt Documentation/devicetree/bindings/interrupt-controller/allwinner,sunxi-nmi.txt 
Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2400-vic.txt Documentation/devicetree/bindings/interrupt-controller/mediatek,sysirq.txt Documentation/devicetree/bindings/leds/common.txt Documentation/devicetree/bindings/mfd/hi6421.txt Documentation/devicetree/bindings/mfd/tps65910.txt Documentation/devicetree/bindings/mmc/fsl-esdhc.txt Documentation/devicetree/bindings/mmc/k3-dw-mshc.txt Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt Documentation/devicetree/bindings/mtd/
Re: Latest net-next from GIT panic
OK, the kernel crashed with a different panic that I didn't catch while I was bisecting, and now my bisection is broken :)

git bisect good
Bisecting: 1787 revisions left to test after this (roughly 11 steps)
error: Your local changes to the following files would be overwritten by checkout:
Documentation/00-INDEX Documentation/ABI/stable/sysfs-class-udc Documentation/ABI/testing/configfs-usb-gadget-uac1 Documentation/ABI/testing/ima_policy Documentation/ABI/testing/sysfs-bus-iio Documentation/ABI/testing/sysfs-bus-iio-meas-spec Documentation/ABI/testing/sysfs-bus-iio-timer-stm32 Documentation/ABI/testing/sysfs-class-net Documentation/ABI/testing/sysfs-class-power-twl4030 Documentation/ABI/testing/sysfs-class-typec Documentation/DMA-API.txt Documentation/IRQ-domain.txt Documentation/Makefile Documentation/PCI/MSI-HOWTO.txt Documentation/RCU/00-INDEX Documentation/RCU/Design/Requirements/Requirements.html Documentation/RCU/checklist.txt Documentation/admin-guide/README.rst Documentation/admin-guide/devices.txt Documentation/admin-guide/index.rst Documentation/admin-guide/kernel-parameters.txt Documentation/admin-guide/pm/cpufreq.rst Documentation/admin-guide/pm/intel_pstate.rst Documentation/admin-guide/ras.rst Documentation/arm/Atmel/README Documentation/block/biodoc.txt Documentation/conf.py Documentation/core-api/assoc_array.rst Documentation/core-api/atomic_ops.rst Documentation/core-api/index.rst Documentation/crypto/asymmetric-keys.txt Documentation/dev-tools/index.rst Documentation/dev-tools/sparse.rst Documentation/devicetree/bindings/arm/amlogic.txt Documentation/devicetree/bindings/arm/atmel-at91.txt Documentation/devicetree/bindings/arm/ccn.txt Documentation/devicetree/bindings/arm/cpus.txt Documentation/devicetree/bindings/arm/gemini.txt Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt Documentation/devicetree/bindings/arm/keystone/keystone.txt Documentation/devicetree/bindings/arm/mediatek.txt Documentation/devicetree/bindings/arm/rockchip.txt
Documentation/devicetree/bindings/arm/shmobile.txt Documentation/devicetree/bindings/arm/tegra.txt Documentation/devicetree/bindings/ata/ahci-fsl-qoriq.txt Documentation/devicetree/bindings/bus/brcm,gisb-arb.txt Documentation/devicetree/bindings/clock/brcm,iproc-clocks.txt Documentation/devicetree/bindings/cpufreq/ti-cpufreq.txt Documentation/devicetree/bindings/gpio/gpio_atmel.txt Documentation/devicetree/bindings/iio/adc/amlogic,meson-saradc.txt Documentation/devicetree/bindings/iio/adc/renesas,gyroadc.txt Documentation/devicetree/bindings/iio/adc/st,stm32-adc.txt Documentation/devicetree/bindings/iio/imu/st_lsm6dsx.txt Documentation/devicetree/bindings/interrupt-controller/allwinner,sunxi-nmi.txt Documentation/devicetree/bindings/interrupt-controller/aspeed,ast2400-vic.txt Documentation/devicetree/bindings/interrupt-controller/mediatek,sysirq.txt Documentation/devicetree/bindings/leds/common.txt Documentation/devicetree/bindings/mfd/hi6421.txt Documentation/devicetree/bindings/mfd/tps65910.txt Documentation/devicetree/bindings/mmc/fsl-esdhc.txt Documentation/devicetree/bindings/mmc/k3-dw-mshc.txt Documentation/devicetree/bindings/mmc/rockchip-dw-mshc.txt Documentation/devicetree/bindings/mmc/ti-omap-hsmmc.txt Documentation/devicetree/bindings/mtd/atmel-nand.txt Documentation/devicetree/bindings/net/dsa/b53.txt Documentation/devicetree/bindings/net/ethernet.txt Documentation/devicetree/bindings/net/macb.txt Documentation/devicetree/bindings/net/marvell-orion-mdio.txt Documentation/devicetree/bindings/net/ti,wilink-st.txt Documentation/devicetree/bindings/net/wireless/ti,wlcore.txt Documentation/devicetree/bindings/nvmem/rockchip-efuse.txt Documentation/devicetree/bindings/opp/opp.txt Documentation/devicetree/bindings/phy/bcm-ns-usb3-phy.txt Documentation/devicetree/bindings/phy/brcm-sata-phy.txt Documentation/devicetree/bindings/phy/meson8b-usb2-phy.txt Documentation/devicetree/bindings/phy/phy-rockchip-inno-usb2.txt 
Documentation/devicetree/bindings/power/rockchip-io-domain.txt Documentation/devicetree/bindings/power/supply/bq27xxx.txt Documentation/devicetree/bindings/property-units.txt Documentation/devicetree/bindings/regulator/regulator.txt Documentation/devicetree/bindings/serial/8
error: The following untracked working tree files would be overwritten by checkout:
Documentation/ABI/testing/sysfs-class-net-phydev Documentation/DocBook/.gitignore Documentation/DocBook/Makef
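The checkout errors above mean git refused to move to the next bisect step because the working tree had modified and untracked files. One possible way out, shown here as a minimal sketch and not as what was actually done in this thread, is to save the marks with `git bisect log`, set the offending files aside, and rebuild the session with `git bisect replay`. The function name `recover_bisect` and the default log path are hypothetical.

```shell
#!/bin/sh
# Sketch only: recover a bisect session that a dirty working tree has blocked.
# Run from the top of the tree being bisected. The log path is a hypothetical
# default and must live outside the repository so the stash does not hide it.
recover_bisect() {
    log_file=${1:-/tmp/bisect.log}

    # 1. Save every good/bad mark made so far.
    git bisect log > "$log_file" || return 1

    # 2. Set aside modified and untracked files that block the checkout.
    git stash push --include-untracked || return 1

    # 3. Optionally edit "$log_file" at this point to delete a mark made by
    #    mistake, then rebuild the session from the log. "git bisect replay"
    #    resets the current bisect state before replaying.
    git bisect replay "$log_file"
}
```

Stashing keeps the local files recoverable with `git stash pop`; if they are known to be junk, `git reset --hard` followed by `git clean -fd` would be the more drastic alternative.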
Re: Latest net-next from GIT panic
OK, looks like the bisection is coming to an end.

The latest bisected kernel with no kernel panic is 4.12.0+ (from next); it shows only this warning:

[ 309.030019] NETDEV WATCHDOG: enp4s0f0 (ixgbe): transmit queue 0 timed out
[ 309.030034] ------------[ cut here ]------------
[ 309.030040] WARNING: CPU: 35 PID: 0 at dev_watchdog+0xcf/0x139
[ 309.030041] Modules linked in: bonding ipmi_si x86_pkg_temp_thermal
[ 309.030045] CPU: 35 PID: 0 Comm: swapper/35 Not tainted 4.12.0+ #5
[ 309.030046] task: 88086d98a000 task.stack: c90003378000
[ 309.030048] RIP: 0010:dev_watchdog+0xcf/0x139
[ 309.030049] RSP: 0018:88087fbc3ea8 EFLAGS: 00010246
[ 309.030050] RAX: 003d RBX: 88046b68 RCX:
[ 309.030050] RDX: 88087fbd2f01 RSI: RDI: 88087fbcda08
[ 309.030051] RBP: 88087fbc3eb8 R08: R09: 88087ff80a04
[ 309.030051] R10: R11: 88086d98a001 R12:
[ 309.030052] R13: 88087fbc3ef8 R14: 88086d98a000 R15: 81c06008
[ 309.030053] FS: () GS:88087fbc() knlGS:
[ 309.030054] CS: 0010 DS: ES: CR0: 80050033
[ 309.030054] CR2: 7fba600f6098 CR3: 00086b955000 CR4: 001406e0
[ 309.030055] Call Trace:
[ 309.030057]
[ 309.030059] ? netif_tx_lock+0x79/0x79
[ 309.030062] call_timer_fn.isra.24+0x17/0x77
[ 309.030063] run_timer_softirq+0x118/0x161
[ 309.030065] ? netif_tx_lock+0x79/0x79
[ 309.030066] ? ktime_get+0x2b/0x42
[ 309.030070] ? lapic_next_deadline+0x21/0x27
[ 309.030073] ? clockevents_program_event+0xa8/0xc5
[ 309.030076] __do_softirq+0xa8/0x19d
[ 309.030078] irq_exit+0x5d/0x6b
[ 309.030079] smp_apic_timer_interrupt+0x2a/0x36
[ 309.030082] apic_timer_interrupt+0x89/0x90
[ 309.030085] RIP: 0010:mwait_idle+0x4e/0x6a
[ 309.030086] RSP: 0018:c9000337be98 EFLAGS: 0246 ORIG_RAX: ff10
[ 309.030087] RAX: RBX: RCX:
[ 309.030087] RDX: RSI: RDI: 88086d98a000
[ 309.030088] RBP: c9000337be98 R08: 88046f8279a0 R09: 88046f827040
[ 309.030089] R10: 88086d98a000 R11: 88086d98a000 R12:
[ 309.030089] R13: 88086d98a000 R14: 88086d98a000 R15: 88086d98a000
[ 309.030090]
[ 309.030094] arch_cpu_idle+0xa/0xc
[ 309.030095] default_idle_call+0x19/0x1b
[ 309.030102] do_idle+0xbc/0x196
[ 309.030104] cpu_startup_entry+0x1d/0x20
[ 309.030105] start_secondary+0xd8/0xdc
[ 309.030108] secondary_startup_64+0x9f/0x9f
[ 309.030109] Code: cc 75 bd eb 35 48 89 df c6 05 c3 dc 74 00 01 e8 3a 62 fe ff 44 89 e1 48 89 de 48 89 c2 48 c7 c7 0f 65 a4 81 31 c0 e8 3d 4c b5 ff <0f> ff 48 8b 83 e0 01 00 00 48 89 df ff 50 78 48 8b 05 a0 bc 6a
[ 309.030128] ---[ end trace 9102cb25703ae2d9 ]---

I just marked it as good, because the problem above is a different one, and I am moving on:

git bisect good
Bisecting: 1787 revisions left to test after this (roughly 11 steps)

On 2017-09-20 at 10:44, Paweł Staszewski wrote:

Tried making a video from IPMI :) with these results:
https://bugzilla.kernel.org/attachment.cgi?id=258521

Caught two more lines where it starts (panic from 4.13.2).

Now I will try to do some bisection.

On 2017-09-20 at 09:58, Paweł Staszewski wrote:

Hi,

I will try bisecting tonight.

On 2017-09-20 at 05:24, Eric Dumazet wrote:

On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote:
> Just checked kernel 4.13.2 and hit the same problem.
>
> Just after starting all 6 BGP sessions, as the kernel starts to learn routes, it panics.
>
> https://bugzilla.kernel.org/attachment.cgi?id=258509

Unfortunately, we do not have enough information from these traces.

Can you get a full stack trace?

Alternatively, can you bisect?

Thanks.
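The watchdog warning in this message is a different failure from the panic being hunted. Marking the revision good, as was done here, is one valid choice; `git bisect skip` is the other option when a revision cannot be judged at all. The following is a minimal sketch, assuming the console output of each test boot has been saved to a file. The function name `classify_boot` and the saved-log workflow are illustrative assumptions, not something from this thread. It maps a log to the exit codes that `git bisect run` understands: 0 for good, 125 for skip, and any other code from 1 to 127 for bad.

```shell
#!/bin/sh
# Sketch only: classify one saved console log for a bisect step.
# $1 is the path to a captured console log (hypothetical workflow).
classify_boot() {
    log=$1
    if grep -q 'Kernel panic' "$log"; then
        return 1      # the bug under test: mark this revision bad
    elif grep -q 'NETDEV WATCHDOG' "$log"; then
        return 125    # unrelated failure: ask git bisect to skip it
    fi
    return 0          # booted and learned routes without a panic: good
}
```

Treating the watchdog case as a skip is a deliberately conservative policy; returning 0 there instead would reproduce the "mark it good" decision taken above.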
Re: Latest net-next from GIT panic
Tried making a video from IPMI :) with these results:
https://bugzilla.kernel.org/attachment.cgi?id=258521

Caught two more lines where it starts (panic from 4.13.2).

Now I will try to do some bisection.

On 2017-09-20 at 09:58, Paweł Staszewski wrote:

Hi,

I will try bisecting tonight.

On 2017-09-20 at 05:24, Eric Dumazet wrote:

On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote:
> Just checked kernel 4.13.2 and hit the same problem.
>
> Just after starting all 6 BGP sessions, as the kernel starts to learn routes, it panics.
>
> https://bugzilla.kernel.org/attachment.cgi?id=258509

Unfortunately, we do not have enough information from these traces.

Can you get a full stack trace?

Alternatively, can you bisect?

Thanks.
Re: Latest net-next from GIT panic
Hi,

I will try bisecting tonight.

On 2017-09-20 at 05:24, Eric Dumazet wrote:

On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote:
> Just checked kernel 4.13.2 and hit the same problem.
>
> Just after starting all 6 BGP sessions, as the kernel starts to learn routes, it panics.
>
> https://bugzilla.kernel.org/attachment.cgi?id=258509

Unfortunately, we do not have enough information from these traces.

Can you get a full stack trace?

Alternatively, can you bisect?

Thanks.
Re: Latest net-next from GIT panic
On Wed, 2017-09-20 at 02:06 +0200, Paweł Staszewski wrote:
> Just checked kernel 4.13.2 and hit the same problem.
>
> Just after starting all 6 BGP sessions, as the kernel starts to learn routes, it panics.
>
> https://bugzilla.kernel.org/attachment.cgi?id=258509

Unfortunately, we do not have enough information from these traces.

Can you get a full stack trace?

Alternatively, can you bisect?

Thanks.
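One common way to get a full stack trace off a machine whose screen only shows the tail of a panic is the kernel's netconsole module, which sends console messages over UDP to another host. The sketch below only builds the module parameter string; the helper name `netconsole_param`, the interface, and all addresses are hypothetical, and the ports are netconsole's documented defaults (6665 source, 6666 target). Whether netconsole survives long enough to emit this particular panic is not something the thread establishes.

```shell
#!/bin/sh
# Sketch only: build a netconsole= parameter in the documented form
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]
# $1 = local interface, $2 = local IP, $3 = receiver IP, $4 = receiver MAC
netconsole_param() {
    printf 'netconsole=6665@%s/%s,6666@%s/%s\n' "$2" "$1" "$3" "$4"
}

# Hypothetical usage on the router (requires root, so left commented out):
#   modprobe netconsole "$(netconsole_param enp4s0f0 192.0.2.10 192.0.2.20 00:11:22:33:44:55)"
# and on the receiving host, any UDP listener on port 6666 that writes to a file.
```

A serial console or IPMI serial-over-LAN would serve the same purpose when the network path itself is suspect.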
Re: Latest net-next from GIT panic
The latest working kernel with the same configuration and kernel config is 4.12.13.

There is no panic after the routes from all 6 BGP sessions are learned.

ip r | wc -l
653112

On 2017-09-20 at 02:06, Paweł Staszewski wrote:

Just checked kernel 4.13.2 and hit the same problem.

Just after starting all 6 BGP sessions, as the kernel starts to learn routes, it panics.

https://bugzilla.kernel.org/attachment.cgi?id=258509

On 2017-09-20 at 02:01, Paweł Staszewski wrote:

Some information about the environment:

The server is acting as an IP router with BGP.
There are 6 BGP sessions, each with a full BGP table (~600k prefixes).

It looks like the panic appears after the BGP sessions are connected, not because of traffic, since at the time the panic occurred there was almost no traffic.

Also, when I run this server without turning on BGP and push traffic through it with pktgen, there is no panic.

It panics just after it learns the routes.

On 2017-09-20 at 01:45, Paweł Staszewski wrote:

Added a few more screenshots from kernels 4.14-rc1 (net-next) and 4.14-rc1 (linux-next):

https://bugzilla.kernel.org/show_bug.cgi?id=197005

On 2017-09-20 at 00:35, Paweł Staszewski wrote:

Just tried the latest net-next git and hit a kernel panic.

Link to Bugzilla below.

https://bugzilla.kernel.org/attachment.cgi?id=258499
Re: Latest net-next from GIT panic
Just checked kernel 4.13.2 and hit the same problem.

Just after starting all 6 BGP sessions, as the kernel starts to learn routes, it panics.

https://bugzilla.kernel.org/attachment.cgi?id=258509

On 2017-09-20 at 02:01, Paweł Staszewski wrote:

Some information about the environment:

The server is acting as an IP router with BGP.
There are 6 BGP sessions, each with a full BGP table (~600k prefixes).

It looks like the panic appears after the BGP sessions are connected, not because of traffic, since at the time the panic occurred there was almost no traffic.

Also, when I run this server without turning on BGP and push traffic through it with pktgen, there is no panic.

It panics just after it learns the routes.

On 2017-09-20 at 01:45, Paweł Staszewski wrote:

Added a few more screenshots from kernels 4.14-rc1 (net-next) and 4.14-rc1 (linux-next):

https://bugzilla.kernel.org/show_bug.cgi?id=197005

On 2017-09-20 at 00:35, Paweł Staszewski wrote:

Just tried the latest net-next git and hit a kernel panic.

Link to Bugzilla below.

https://bugzilla.kernel.org/attachment.cgi?id=258499
Re: Latest net-next from GIT panic
Some information about the environment:

The server is acting as an IP router with BGP.
There are 6 BGP sessions, each with a full BGP table (~600k prefixes).

It looks like the panic appears after the BGP sessions are connected, not because of traffic, since at the time the panic occurred there was almost no traffic.

Also, when I run this server without turning on BGP and push traffic through it with pktgen, there is no panic.

It panics just after it learns the routes.

On 2017-09-20 at 01:45, Paweł Staszewski wrote:

Added a few more screenshots from kernels 4.14-rc1 (net-next) and 4.14-rc1 (linux-next):

https://bugzilla.kernel.org/show_bug.cgi?id=197005

On 2017-09-20 at 00:35, Paweł Staszewski wrote:

Just tried the latest net-next git and hit a kernel panic.

Link to Bugzilla below.

https://bugzilla.kernel.org/attachment.cgi?id=258499
Re: Latest net-next from GIT panic
Added a few more screenshots from kernels 4.14-rc1 (net-next) and 4.14-rc1 (linux-next):

https://bugzilla.kernel.org/show_bug.cgi?id=197005

On 2017-09-20 at 00:35, Paweł Staszewski wrote:

Just tried the latest net-next git and hit a kernel panic.

Link to Bugzilla below.

https://bugzilla.kernel.org/attachment.cgi?id=258499
Latest net-next from GIT panic
Just tried the latest net-next git and hit a kernel panic.

Link to Bugzilla below.

https://bugzilla.kernel.org/attachment.cgi?id=258499