Re: [PATCH 1/3] netfilter: ipset: use setup_timer() and mod_timer().

2016-05-20 Thread Jozsef Kadlecsik
On Sat, 14 May 2016, Muhammad Falak R Wani wrote:

> Use setup_timer() and instead of init_timer(), being the preferred way
> of setting up a timer.
> 
> Also, quoting the mod_timer() function comment:
> -> mod_timer() is a more efficient way to update the expire field of an
>active timer (if the timer is inactive it will be activated).
> 
> Use setup_timer() and mod_timer() to setup and arm a timer, making the
> code compact and easier to read.
> 
> Signed-off-by: Muhammad Falak R Wani 
> ---
>  net/netfilter/ipset/ip_set_bitmap_gen.h | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)

The patch and the other too in the series as well are applied in the ipset 
git repository and will be submitted for kernel inclusion. Thanks.

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH v2] netfilter: fix race condition in ipset save, swap and delete

2016-03-18 Thread Jozsef Kadlecsik
Hi,

On Mon, 14 Mar 2016, Vishwanath Pai wrote:

> I have updated the patch according to comments by Jozsef. Renamed
> ref_kernel to ref_netlink, renamed _put/_get functions and updated the
> description in commit log.

Patch is applied to the ipset git tree - you use some older kernel tree 
and I had to apply it manually. I'll send the patch for kernel inclusion.

Best regards,
Jozsef

> --
> 
> netfilter: fix race condition in ipset save, swap and delete
> 
> This fix adds a new reference counter (ref_netlink) for the struct ip_set.
> The other reference counter (ref) can be swapped out by ip_set_swap and we
> need a separate counter to keep track of references for netlink events
> like dump. Using the same ref counter for dump causes a race condition
> which can be demonstrated by the following script:
> 
> #!/bin/sh
> ipset create hash_ip1 hash:ip family inet hashsize 1024 maxelem 50 \
> counters
> ipset create hash_ip2 hash:ip family inet hashsize 30 maxelem 50 \
> counters
> ipset create hash_ip3 hash:ip family inet hashsize 1024 maxelem 50 \
> counters
> 
> ipset save &
> 
> ipset swap hash_ip3 hash_ip2
> ipset destroy hash_ip3 /* will crash the machine */
> 
> Swap will exchange the values of ref so destroy will see ref = 0 instead of
> ref = 1. With this fix in place swap will not succeed because ipset save
> still has ref_netlink on the set (ip_set_swap doesn't swap ref_netlink).
> 
> Both delete and swap will error out if ref_netlink != 0 on the set.
> 
> Note: The changes to *_head functions is because previously we would
> increment ref whenever we called these functions, we don't do that
> anymore.
> 
> Reviewed-by: Joshua Hunt 
> Signed-off-by: Vishwanath Pai 
> 
> --
> 
> diff --git a/include/linux/netfilter/ipset/ip_set.h 
> b/include/linux/netfilter/ipset/ip_set.h
> index 0e1f433..f48b8a6 100644
> --- a/include/linux/netfilter/ipset/ip_set.h
> +++ b/include/linux/netfilter/ipset/ip_set.h
> @@ -234,6 +234,10 @@ struct ip_set {
>   spinlock_t lock;
>   /* References to the set */
>   u32 ref;
> + /* References to the set for netlink events like dump,
> +  * ref can be swapped out by ip_set_swap
> +  */
> + u32 ref_netlink;
>   /* The core set type */
>   struct ip_set_type *type;
>   /* The type variant doing the real job */
> diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h 
> b/net/netfilter/ipset/ip_set_bitmap_gen.h
> index b0bc475..2e8e7e5 100644
> --- a/net/netfilter/ipset/ip_set_bitmap_gen.h
> +++ b/net/netfilter/ipset/ip_set_bitmap_gen.h
> @@ -95,7 +95,7 @@ mtype_head(struct ip_set *set, struct sk_buff *skb)
>   if (!nested)
>   goto nla_put_failure;
>   if (mtype_do_head(skb, map) ||
> - nla_put_net32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref - 1)) ||
> + nla_put_net32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref)) ||
>   nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize)))
>   goto nla_put_failure;
>   if (unlikely(ip_set_put_flags(skb, set)))
> diff --git a/net/netfilter/ipset/ip_set_core.c 
> b/net/netfilter/ipset/ip_set_core.c
> index 95db43f..a558075 100644
> --- a/net/netfilter/ipset/ip_set_core.c
> +++ b/net/netfilter/ipset/ip_set_core.c
> @@ -497,6 +497,26 @@ __ip_set_put(struct ip_set *set)
>   write_unlock_bh(&ip_set_ref_lock);
>  }
>  
> +/* set->ref can be swapped out by ip_set_swap, netlink events (like dump) 
> need
> + * a separate reference counter
> + */
> +static inline void
> +__ip_set_get_netlink(struct ip_set *set)
> +{
> + write_lock_bh(&ip_set_ref_lock);
> + set->ref_netlink++;
> + write_unlock_bh(&ip_set_ref_lock);
> +}
> +
> +static inline void
> +__ip_set_put_netlink(struct ip_set *set)
> +{
> + write_lock_bh(&ip_set_ref_lock);
> + BUG_ON(set->ref_netlink == 0);
> + set->ref_netlink--;
> + write_unlock_bh(&ip_set_ref_lock);
> +}
> +
>  /* Add, del and test set entries from kernel.
>   *
>   * The set behind the index must exist and must be referenced
> @@ -999,7 +1019,7 @@ static int ip_set_destroy(struct net *net, struct sock 
> *ctnl,
>   if (!attr[IPSET_ATTR_SETNAME]) {
>   for (i = 0; i < inst->ip_set_max; i++) {
>   s = ip_set(inst, i);
> - if (s && s->ref) {
> + if (s && (s->ref || s->ref_netlink)) {
>   ret = -IPSET_ERR_BUSY;
>   goto out;
>   }
> @@ -1021,7 +1041,7 @@ static int ip_set_destroy(struct net *net, struct sock 
> *ctnl,
>   if (!s) {
>   ret = -ENOENT;
>   goto out;
> - } else if (s->ref) {
> + } else if (s->ref || s->ref_netlink) {
>   ret = -IPSET_ERR_BUSY;
>   goto out;
>   }
> @@ -1168,6 +1188,9 @@ static int ip_set_swap(struct net *net, struct sock 
> *ctnl, struct sk_buff *skb,
> from->family =

Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet

2016-03-27 Thread Jozsef Kadlecsik
On Sun, 27 Mar 2016, Baozeng Ding wrote:

> The following program triggers stack-out-of-bounds in tcp_packet. The
> kernel version is 4.5 (on Mar 16 commit
> 09fd671ccb2475436bd5f597f751ca4a7d177aea).
> Uncovered with syzkaller. Thanks.
> 
> ==
> BUG: KASAN: stack-out-of-bounds in tcp_packet+0x4b77/0x51c0 at addr
> 8800a45df3c8
> Read of size 1 by task 0327/11132
> page:ea00029177c0 count:0 mapcount:0 mapping:  (null) index:0x0
> flags: 0x1fffc00()
> page dumped because: kasan: bad access detected
> CPU: 1 PID: 11132 Comm: 0327 Tainted: GB   4.5.0+ #12
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
>  0001 8800a45df148 82945051 8800a45df1d8
>  8800a45df3c8 0027 0001 8800a45df1c8
>  81709f88 8800b4f7e3d0 0028 0286
> Call Trace:
> [< inline >] __dump_stack /kernel/lib/dump_stack.c:15
> [] dump_stack+0xb3/0x112 /kernel/lib/dump_stack.c:51
> [< inline >] print_address_description /kernel/mm/kasan/report.c:150
> [] kasan_report_error+0x4f8/0x530
> /kernel/mm/kasan/report.c:236
> [] ? skb_copy_bits+0x49d/0x6d0
> /kernel/net/core/skbuff.c:1675
> [< inline >] ? spin_lock_bh /kernel/include/linux/spinlock.h:307
> [] ? tcp_packet+0x1c9/0x51c0
> /kernel/net/netfilter/nf_conntrack_proto_tcp.c:833
> [< inline >] kasan_report /kernel/mm/kasan/report.c:259
> [] __asan_report_load1_noabort+0x3e/0x40
> /kernel/mm/kasan/report.c:277
> [< inline >] ? tcp_sack
> /kernel/net/netfilter/nf_conntrack_proto_tcp.c:473
> [< inline >] ? tcp_in_window
> /kernel/net/netfilter/nf_conntrack_proto_tcp.c:527
> [] ? tcp_packet+0x4b77/0x51c0
> /kernel/net/netfilter/nf_conntrack_proto_tcp.c:1036
> [< inline >] tcp_sack
> /kernel/net/netfilter/nf_conntrack_proto_tcp.c:473
> [< inline >] tcp_in_window
> /kernel/net/netfilter/nf_conntrack_proto_tcp.c:527
> [] tcp_packet+0x4b77/0x51c0
> /kernel/net/netfilter/nf_conntrack_proto_tcp.c:1036
> [] ? memset+0x28/0x30 /kernel/mm/kasan/kasan.c:302
> [] ? tcp_new+0x1a4/0xc20
> /kernel/net/netfilter/nf_conntrack_proto_tcp.c:1122
> [< inline >] ? build_report /kernel/include/net/netlink.h:499
> [] ? xfrm_send_report+0x426/0x450
> /kernel/net/xfrm/xfrm_user.c:3039
> [] ? tcp_new+0xc20/0xc20
> /kernel/net/netfilter/nf_conntrack_proto_tcp.c:1169
> [] ? init_conntrack+0xca/0x9e0
> /kernel/net/netfilter/nf_conntrack_core.c:972
> [] ? nf_conntrack_alloc+0x40/0x40
> /kernel/net/netfilter/nf_conntrack_core.c:903
> [] ? tcp_init_net+0x6e0/0x6e0
> /kernel/include/net/netfilter/nf_conntrack_l4proto.h:137
> [] ? ipv4_get_l4proto+0x262/0x390
> /kernel/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c:89
> [] ? nf_ct_get_tuple+0xaf/0x190
> /kernel/net/netfilter/nf_conntrack_core.c:197
> [] nf_conntrack_in+0x8ee/0x1170
> /kernel/net/netfilter/nf_conntrack_core.c:1177
> [] ? init_conntrack+0x9e0/0x9e0
> /kernel/net/netfilter/nf_conntrack_core.c:287
> [] ? ipt_do_table+0xa16/0x1260
> /kernel/net/ipv4/netfilter/ip_tables.c:423
> [] ? trace_hardirqs_on+0xd/0x10
> /kernel/kernel/locking/lockdep.c:2635
> [] ? __local_bh_enable_ip+0x6b/0xc0
> /kernel/kernel/softirq.c:175
> [] ? check_entry.isra.4+0x190/0x190
> /kernel/net/ipv6/netfilter/ip6_tables.c:594
> [] ? ip_reply_glue_bits+0xc0/0xc0
> /kernel/net/ipv4/ip_output.c:1530
> [] ipv4_conntrack_local+0x14e/0x1a0
> /kernel/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c:161
> [] ? iptable_raw_hook+0x9d/0x1e0
> /kernel/net/ipv4/netfilter/iptable_raw.c:32
> [] nf_iterate+0x15d/0x230 /kernel/net/netfilter/core.c:274
> [] ? nf_iterate+0x230/0x230 /kernel/net/netfilter/core.c:268
> [] nf_hook_slow+0x1ad/0x310 /kernel/net/netfilter/core.c:306
> [] ? nf_iterate+0x230/0x230 /kernel/net/netfilter/core.c:268
> [] ? nf_iterate+0x230/0x230 /kernel/net/netfilter/core.c:268
> [] ? prandom_u32+0x24/0x30 /kernel/lib/random32.c:83
> [] ? ip_idents_reserve+0x9f/0xf0
> /kernel/net/ipv4/route.c:484
> [< inline >] nf_hook_thresh /kernel/include/linux/netfilter.h:187
> [< inline >] nf_hook /kernel/include/linux/netfilter.h:197
> [] __ip_local_out+0x263/0x3c0
> /kernel/net/ipv4/ip_output.c:104
> [] ? ip_finish_output+0xd00/0xd00
> /kernel/include/net/ip.h:322
> [] ? __ip_flush_pending_frames.isra.45+0x2e0/0x2e0
> /kernel/net/ipv4/ip_output.c:1337
> [] ? __ip_make_skb+0xfe6/0x1610
> /kernel/net/ipv4/ip_output.c:1436
> [] ip_local_out+0x2d/0x1c0 /kernel/net/ipv4/ip_output.c:113
> [] ip_send_skb+0x3c/0xc0 /kernel/net/ipv4/ip_output.c:1443
> [] ip_push_pending_frames+0x64/0x80
> /kernel/net/ipv4/ip_output.c:1463
> [< inline >] rcu_read_unlock /kernel/include/linux/rcupdate.h:922
> [] raw_sendmsg+0x17bb/0x25c0
> /kernel/net/ieee802154/socket.c:53
> [] ? dst_output+0x190/0x190 /kernel/include/net/dst.h:492
> [< inline >] ? trace_mm_page_alloc
> /kerne

Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet

2016-03-27 Thread Jozsef Kadlecsik
On Mon, 28 Mar 2016, Jozsef Kadlecsik wrote:

> On Sun, 27 Mar 2016, Baozeng Ding wrote:
> 
> > The following program triggers stack-out-of-bounds in tcp_packet. The
> > kernel version is 4.5 (on Mar 16 commit
> > 09fd671ccb2475436bd5f597f751ca4a7d177aea).
> > Uncovered with syzkaller. Thanks.
> > 
> > ==
> > BUG: KASAN: stack-out-of-bounds in tcp_packet+0x4b77/0x51c0 at addr
> > 8800a45df3c8
> > Read of size 1 by task 0327/11132
> > page:ea00029177c0 count:0 mapcount:0 mapping:  (null) index:0x0
> > flags: 0x1fffc00()
> > page dumped because: kasan: bad access detected
> > CPU: 1 PID: 11132 Comm: 0327 Tainted: GB   4.5.0+ #12
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
> >  0001 8800a45df148 82945051 8800a45df1d8
> >  8800a45df3c8 0027 0001 8800a45df1c8
> >  81709f88 8800b4f7e3d0 0028 0286
> > Call Trace:
> > [< inline >] __dump_stack /kernel/lib/dump_stack.c:15
> > [] dump_stack+0xb3/0x112 /kernel/lib/dump_stack.c:51
> > [< inline >] print_address_description /kernel/mm/kasan/report.c:150
> > [] kasan_report_error+0x4f8/0x530
> > /kernel/mm/kasan/report.c:236
> > [] ? skb_copy_bits+0x49d/0x6d0
> > /kernel/net/core/skbuff.c:1675
> > [< inline >] ? spin_lock_bh /kernel/include/linux/spinlock.h:307
> > [] ? tcp_packet+0x1c9/0x51c0
> > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:833
> > [< inline >] kasan_report /kernel/mm/kasan/report.c:259
> > [] __asan_report_load1_noabort+0x3e/0x40
> > /kernel/mm/kasan/report.c:277
> > [< inline >] ? tcp_sack
> > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:473
> > [< inline >] ? tcp_in_window
> > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:527
> > [] ? tcp_packet+0x4b77/0x51c0
> > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:1036
> > [< inline >] tcp_sack
> > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:473
> > [< inline >] tcp_in_window
> > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:527
> > [] tcp_packet+0x4b77/0x51c0
> > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:1036
> > [] ? memset+0x28/0x30 /kernel/mm/kasan/kasan.c:302
> > [] ? tcp_new+0x1a4/0xc20
> > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:1122
> > [< inline >] ? build_report /kernel/include/net/netlink.h:499
> > [] ? xfrm_send_report+0x426/0x450
> > /kernel/net/xfrm/xfrm_user.c:3039
> > [] ? tcp_new+0xc20/0xc20
> > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:1169
> > [] ? init_conntrack+0xca/0x9e0
> > /kernel/net/netfilter/nf_conntrack_core.c:972
> > [] ? nf_conntrack_alloc+0x40/0x40
> > /kernel/net/netfilter/nf_conntrack_core.c:903
> > [] ? tcp_init_net+0x6e0/0x6e0
> > /kernel/include/net/netfilter/nf_conntrack_l4proto.h:137
> > [] ? ipv4_get_l4proto+0x262/0x390
> > /kernel/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c:89
> > [] ? nf_ct_get_tuple+0xaf/0x190
> > /kernel/net/netfilter/nf_conntrack_core.c:197
> > [] nf_conntrack_in+0x8ee/0x1170
> > /kernel/net/netfilter/nf_conntrack_core.c:1177
> > [] ? init_conntrack+0x9e0/0x9e0
> > /kernel/net/netfilter/nf_conntrack_core.c:287
> > [] ? ipt_do_table+0xa16/0x1260
> > /kernel/net/ipv4/netfilter/ip_tables.c:423
> > [] ? trace_hardirqs_on+0xd/0x10
> > /kernel/kernel/locking/lockdep.c:2635
> > [] ? __local_bh_enable_ip+0x6b/0xc0
> > /kernel/kernel/softirq.c:175
> > [] ? check_entry.isra.4+0x190/0x190
> > /kernel/net/ipv6/netfilter/ip6_tables.c:594
> > [] ? ip_reply_glue_bits+0xc0/0xc0
> > /kernel/net/ipv4/ip_output.c:1530
> > [] ipv4_conntrack_local+0x14e/0x1a0
> > /kernel/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c:161
> > [] ? iptable_raw_hook+0x9d/0x1e0
> > /kernel/net/ipv4/netfilter/iptable_raw.c:32
> > [] nf_iterate+0x15d/0x230 /kernel/net/netfilter/core.c:274
> > [] ? nf_iterate+0x230/0x230 
> > /kernel/net/netfilter/core.c:268
> > [] nf_hook_slow+0x1ad/0x310 
> > /kernel/net/netfilter/core.c:306
> > [] ? nf_iterate+0x230/0x230 
> > /kernel/net/netfilter/core.c:268
> > [] ? nf_iterate+0x230/0x230 
> > /kernel/net/netfilter/core.c:268
> > [] ? prandom_u32+0x24/0x30 /kernel/lib/random32.c:83
> > [] ? ip_idents_reserve+0x9f/0xf0
> > /kernel/net/ipv4/route.c:484
> > [< inline &g

Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet

2016-03-28 Thread Jozsef Kadlecsik
Hi David, Pablo,

David, do you agree with the patch for net/ipv4/tcp_input.c? If yes, how 
should I proceed? Should I send the whole patch to you or is it OK to send 
to Pablo?

Best regards,
Jozsef

On Mon, 28 Mar 2016, Baozeng Ding wrote:

> 
> 
> On 2016/3/28 10:35, Baozeng Ding wrote:
> > 
> > 
> > On 2016/3/28 6:25, Jozsef Kadlecsik wrote:
> > > On Mon, 28 Mar 2016, Jozsef Kadlecsik wrote:
> > > 
> > > > On Sun, 27 Mar 2016, Baozeng Ding wrote:
> > > > 
> > > > > The following program triggers stack-out-of-bounds in tcp_packet. The
> > > > > kernel version is 4.5 (on Mar 16 commit
> > > > > 09fd671ccb2475436bd5f597f751ca4a7d177aea).
> > > > > Uncovered with syzkaller. Thanks.
> > > > > 
> > > > > ==
> > > > > BUG: KASAN: stack-out-of-bounds in tcp_packet+0x4b77/0x51c0 at addr
> > > > > 8800a45df3c8
> > > > > Read of size 1 by task 0327/11132
> > > > > page:ea00029177c0 count:0 mapcount:0 mapping: (null) index:0x0
> > > > > flags: 0x1fffc00()
> > > > > page dumped because: kasan: bad access detected
> > > > > CPU: 1 PID: 11132 Comm: 0327 Tainted: GB 4.5.0+ #12
> > > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > > > > rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
> > > > >   0001 8800a45df148 82945051 8800a45df1d8
> > > > >   8800a45df3c8 0027 0001 8800a45df1c8
> > > > >   81709f88 8800b4f7e3d0 0028 0286
> > > > > Call Trace:
> > > > > [< inline >] __dump_stack /kernel/lib/dump_stack.c:15
> > > > > [] dump_stack+0xb3/0x112 /kernel/lib/dump_stack.c:51
> > > > > [< inline >] print_address_description
> > > > > /kernel/mm/kasan/report.c:150
> > > > > [] kasan_report_error+0x4f8/0x530
> > > > > /kernel/mm/kasan/report.c:236
> > > > > [] ? skb_copy_bits+0x49d/0x6d0
> > > > > /kernel/net/core/skbuff.c:1675
> > > > > [< inline >] ? spin_lock_bh
> > > > > /kernel/include/linux/spinlock.h:307
> > > > > [] ? tcp_packet+0x1c9/0x51c0
> > > > > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:833
> > > > > [< inline >] kasan_report /kernel/mm/kasan/report.c:259
> > > > > [] __asan_report_load1_noabort+0x3e/0x40
> > > > > /kernel/mm/kasan/report.c:277
> > > > > [< inline >] ? tcp_sack
> > > > > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:473
> > > > > [< inline >] ? tcp_in_window
> > > > > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:527
> > > > > [] ? tcp_packet+0x4b77/0x51c0
> > > > > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:1036
> > > > > [< inline >] tcp_sack
> > > > > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:473
> > > > > [< inline >] tcp_in_window
> > > > > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:527
> > > > > [] tcp_packet+0x4b77/0x51c0
> > > > > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:1036
> > > > > [] ? memset+0x28/0x30 /kernel/mm/kasan/kasan.c:302
> > > > > [] ? tcp_new+0x1a4/0xc20
> > > > > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:1122
> > > > > [< inline >] ? build_report /kernel/include/net/netlink.h:499
> > > > > [] ? xfrm_send_report+0x426/0x450
> > > > > /kernel/net/xfrm/xfrm_user.c:3039
> > > > > [] ? tcp_new+0xc20/0xc20
> > > > > /kernel/net/netfilter/nf_conntrack_proto_tcp.c:1169
> > > > > [] ? init_conntrack+0xca/0x9e0
> > > > > /kernel/net/netfilter/nf_conntrack_core.c:972
> > > > > [] ? nf_conntrack_alloc+0x40/0x40
> > > > > /kernel/net/netfilter/nf_conntrack_core.c:903
> > > > > [] ? tcp_init_net+0x6e0/0x6e0
> > > > > /kernel/include/net/netfilter/nf_conntrack_l4proto.h:137
> > > > > [] ? ipv4_get_l4proto+0x262/0x390
> > > > > /kernel/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c:89
> > > > > [] ? nf_ct_get_tuple+0xaf/0x190
> > > > > /kernel/net/netfilter/nf_conntrack_core.c:197
> > > > >

Re: BUG: net/netfilter: KASAN: stack-out-of-bounds in tcp_packet

2016-03-28 Thread Jozsef Kadlecsik
On Mon, 28 Mar 2016, Eric Dumazet wrote:

> On Mon, 2016-03-28 at 22:20 +0200, Jan Engelhardt wrote:
> > On Monday 2016-03-28 21:29, David Miller wrote:
> > >>> > > @@ -3716,6 +3716,8 @@ void tcp_parse_options(const struct sk_buff 
> > >>> > > *skb,
> > >>> > >   length--;
> > >>> > >   continue;
> > >>> > >   default:
> > >>> > > +if (length < 2)
> > >>> > > +return;
> > >>> > >   opsize = *ptr++;
> > >>> > >   if (opsize < 2) /* "silly options" */
> > >>> > >   return;
> > >
> > >I'm trying to figure out how this can even matter.
> > >If we are in the loop, length is at least one.
> > >That means it is legal to read the opsize byte.
> > 
> > Is that because the skbuff is always padded to a multiple of (at
> > least) two? Maybe such padding is explicitly foregone when ASAN is in
> > place. After all, glibc, in userspace, is likely to do padding as
> > well for malloc, and yet, ASAN catches these cases.

There might be a TCP option combination, which is "properly" padded but 
broken, like (wscale, wscale-value, mss) where the mss-value is missing.

> We have at least 384 bytes of padding in skb->head (this is struct
> skb_shared_info).
> 
> Whatever garbage we might read, current code is fine.
> 
> We have to deal with a false positive here.

In net/netfilter/nf_conntrack_proto_tcp.c we copy the options into a 
buffer with skb_header_pointer(), so it's not a false positive there and 
the KASAN report referred to that part.

I thought it's valid for tcp_parse_options() too, but then I'm wrong so 
at least the part from the patch for tcp_input.c can be dropped.

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary


Re: unused code in net/netfilter/ipset/ip_set_bitmap_ipmac.c

2016-02-29 Thread Jozsef Kadlecsik
Hi,

On Mon, 29 Feb 2016, Julia Lawall wrote:

> The file net/netfilter/ipset/ip_set_bitmap_ipmac.c seems to contain a lot
> of static functions that are not used in the file:
> 
> bitmap_ipmac_add_timeout
> bitmap_ipmac_do_add
> bitmap_ipmac_do_del
> bitmap_ipmac_do_head
> bitmap_ipmac_do_list
> bitmap_ipmac_do_test
> bitmap_ipmac_gc_test
> bitmap_ipmac_is_filled
> bitmap_ipmac_kadt
> bitmap_ipmac_same_set
> bitmap_ipmac_uadt
> 
> Have I overooked something?

Yes: the file includes ip_set_bitmap_gen.h in which all those functions 
are used.
 
> I was looking at this code, with Daniel Borkmann, because there seems to
> be a bug in the function bitmap_ipmac_uadt:
> 
>   if (tb[IPSET_ATTR_ETHER]) {
> memcpy(e.ether, nla_data(tb[IPSET_ATTR_ETHER]), ETH_ALEN);
> e.add_mac = 1;
> }
> 
> Later in the same file, there is:
> 
> static struct ip_set_type bitmap_ipmac_type = {
>   ...
> .adt_policy = {
>   ...
>   [IPSET_ATTR_ETHER]  = { .type = NLA_BINARY,
> .len  = ETH_ALEN },
>   ...},
>   ...
> };
> 
> The type NLA_BINARY indicates that the length is a maximum possible
> length, and thus a check of the actual length is needed before the memcpy.

You are right here (and the similar spotting in ip_set_hash_mac.c) - I'll 
prepare a patch and submit it.

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH] netfilter: fix race condition in ipset save and delete

2016-03-13 Thread Jozsef Kadlecsik
Hi,

On Sat, 12 Mar 2016, Vishwanath Pai wrote:

> netfilter: fix race condition in ipset save and delete
> 
> This fix adds a new reference counter (ref_kernel) for the struct ip_set.
> The other reference counter (ref) is used to track references from the
> userspace and we need a separate counter to keep track of in-kernel
> references. Using the same ref counter for both userspace and kernel
> references causes a race condition which can be demonstrated by the
> following script:
> 
> #!/bin/sh
> ipset create hash_ip1 hash:ip family inet hashsize 1024 maxelem 50 \
> counters
> ipset create hash_ip2 hash:ip family inet hashsize 30 maxelem 50 \
> counters
> ipset create hash_ip3 hash:ip family inet hashsize 1024 maxelem 50 \
> counters
> 
> ipset save &
> 
> ipset swap hash_ip3 hash_ip2
> ipset destroy hash_ip3 /* will crash the machine */
> 
> Swap will exchange the values of ref so destroy will see ref = 0 instead of
> ref = 1. With this fix in place swap will not suceed because ipset save
> still has ref_kernel on the set (ip_set_swap doesn't swap ref_kernel).
> 
> Both delete and swap will error out if ref_kernel != 0 on the set.
> 
> Note: The changes to *_head functions is because previously we would
> increment ref whenever we called these functions, we don't do that
> anymore.

Overall, I agree with your patch, however I disagree with the description 
and some details. 

It's a race between dump & swap and then destroy - dump and destroy are 
safe. The "ref" reference counter *is* kernel related: it keeps track of 
references from other kernel subsystems (netfilter matches/targets) and 
from ipset itself when a set is a member of another set. It would be 
misleading to call "ref" as userspace reference counter.

The reference counter you introduce is for netlink events (technically 
just for dump), so it would better be named "ref_netlink" instead of 
"ref_kernel" (similarly, ip_set_get|put_ref_netlink).

Please update the patch, the description and resubmit. Thanks!

Best regards,
Jozsef

> Reviewed-by: Joshua Hunt 
> Signed-off-by: Vishwanath Pai 
> 
> --
> 
> diff --git a/include/linux/netfilter/ipset/ip_set.h 
> b/include/linux/netfilter/ipset/ip_set.h
> index 0e1f433..86d86db 100644
> --- a/include/linux/netfilter/ipset/ip_set.h
> +++ b/include/linux/netfilter/ipset/ip_set.h
> @@ -234,6 +234,9 @@ struct ip_set {
>   spinlock_t lock;
>   /* References to the set */
>   u32 ref;
> + /* the above ref can be swapped out by ip_set_swap and
> +cannot be used to keep track of references within ipset code */
> + u32 ref_kernel;
>   /* The core set type */
>   struct ip_set_type *type;
>   /* The type variant doing the real job */
> diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h 
> b/net/netfilter/ipset/ip_set_bitmap_gen.h
> index b0bc475..2e8e7e5 100644
> --- a/net/netfilter/ipset/ip_set_bitmap_gen.h
> +++ b/net/netfilter/ipset/ip_set_bitmap_gen.h
> @@ -95,7 +95,7 @@ mtype_head(struct ip_set *set, struct sk_buff *skb)
>   if (!nested)
>   goto nla_put_failure;
>   if (mtype_do_head(skb, map) ||
> - nla_put_net32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref - 1)) ||
> + nla_put_net32(skb, IPSET_ATTR_REFERENCES, htonl(set->ref)) ||
>   nla_put_net32(skb, IPSET_ATTR_MEMSIZE, htonl(memsize)))
>   goto nla_put_failure;
>   if (unlikely(ip_set_put_flags(skb, set)))
> diff --git a/net/netfilter/ipset/ip_set_core.c 
> b/net/netfilter/ipset/ip_set_core.c
> index 95db43f..a055f29 100644
> --- a/net/netfilter/ipset/ip_set_core.c
> +++ b/net/netfilter/ipset/ip_set_core.c
> @@ -497,6 +497,26 @@ __ip_set_put(struct ip_set *set)
>   write_unlock_bh(&ip_set_ref_lock);
>  }
>  
> +/* The above two functions keep track of references from the userspace, the
> + * _internal functions keep track of references in-kernel
> + */
> +static inline void
> +__ip_set_get_internal(struct ip_set *set)
> +{
> + write_lock_bh(&ip_set_ref_lock);
> + set->ref_kernel++;
> + write_unlock_bh(&ip_set_ref_lock);
> +}
> +
> +static inline void
> +__ip_set_put_internal(struct ip_set *set)
> +{
> + write_lock_bh(&ip_set_ref_lock);
> + BUG_ON(set->ref_kernel == 0);
> + set->ref_kernel--;
> + write_unlock_bh(&ip_set_ref_lock);
> +}
> +
>  /* Add, del and test set entries from kernel.
>   *
>   * The set behind the index must exist and must be referenced
> @@ -999,7 +1019,7 @@ static int ip_set_destroy(struct net *net, struct sock 
> *ctnl,
>   if (!attr[IPSET_ATTR_SETNAME]) {
>   for (i = 0; i < inst->ip_set_max; i++) {
>   s = ip_set(inst, i);
> - if (s && s->ref) {
> + if (s && (s->ref || s->ref_kernel)) {
>   ret = -IPSET_ERR_BUSY;
>   goto out;
>   }
> @@ -1021,7 +1041,7 @@ static int ip_set_destroy(struct net *net, struct sock 
> *ctnl,
>  

Re: [PATCH] Fix sleeping memory allocation in atomic context

2015-10-15 Thread Jozsef Kadlecsik
On Thu, 15 Oct 2015, Nikolay Borisov wrote:

> Ipset 6.26 produces the following splat:
> 
> BUG: sleeping function called from invalid context at mm/page_alloc.c:2759
> in_atomic(): 1, irqs_disabled(): 0, pid: 9664, name: ipset
> CPU: 18 PID: 9664 Comm: ipset Tainted: G   O 3.12.47-clouder3 #1
> Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
>  0002 881fd14273c8 8163d891 881fcb4264b0
>  881fcb4260c0 881fd14273e8 810ba5bf 881fd1427558
>   881fd1427568 81142b33 881f
> Call Trace:
>  [] dump_stack+0x58/0x7f
>  [] __might_sleep+0xdf/0x110
>  [] __alloc_pages_nodemask+0x243/0xc20
>  [] alloc_pages_current+0xbe/0x170
>  [] new_slab+0x295/0x340
>  [] __slab_alloc+0x2c0/0x5a0
>  [] ? __schedule+0x2dc/0x760
>  [] __kmalloc+0x11b/0x230
>  [] ? ip_set_get_byname+0xec/0x100 [ip_set]
>  [] list_set_uadd+0x16b/0x314 [ip_set_list_set]
>  [] ? _raw_write_unlock_bh+0x28/0x30
>  [] list_set_uadt+0x21c/0x320 [ip_set_list_set]
>  [] ? list_set_create+0x1a0/0x1a0 [ip_set_list_set]
>  [] call_ad+0x82/0x200 [ip_set]
>  [] ? find_set_type+0x51/0xa0 [ip_set]
>  [] ? nla_parse+0xf5/0x130
>  [] ip_set_uadd+0x20e/0x2d0 [ip_set]
>  [] ? ip_set_create+0x2a3/0x450 [ip_set]
>  [] ? ip_set_udel+0x2e0/0x2e0 [ip_set]
>  [] nfnetlink_rcv_msg+0x31e/0x330
>  [] ? nfnetlink_rcv_msg+0x41/0x330
>  [] ? nfnl_lock+0x30/0x30
>  [] netlink_rcv_skb+0xa9/0xd0
>  [] nfnetlink_rcv+0x15/0x20
>  [] netlink_unicast+0x10f/0x190
>  [] netlink_sendmsg+0x2c0/0x660
>  [] sock_sendmsg+0x90/0xc0
>  [] ? move_addr_to_user+0xa3/0xc0
>  [] ? ___sys_recvmsg+0x182/0x300
>  [] SYSC_sendto+0x134/0x180
>  [] ? mntput+0x21/0x30
>  [] ? __kfree_skb+0x3f/0xa0
>  [] SyS_sendto+0xe/0x10
>  [] system_call_fastpath+0x16/0x1b
> 
> The call chain leading to this as follow:
> call_add -> list_set_uadt -> list_set_uadd -> kzalloc(, GFP_KERNEL).
> And since GFP_KERNEL allows initiating direct reclaim thus
> potentially sleeping in the allocation path, this leads to the
> aforementioned splat.
> 
> To fix it change that particular allocation type to GFP_ATOMIC, to
> correctly reflect that it is happening in an atomic context.

Good catch, Pablo please apply the patch.

Acked-by: Jozsef Kadlecsik 

Best regards,
Jozsef

> Signed-off-by: Nikolay Borisov 
> ---
> 
> Even though this patch has been generated against the stand-alone
> ipset sources I just checked the 4.3-rc4 sources and the problem
> exists there as well.
> 
>  kernel/net/netfilter/ipset/ip_set_list_set.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/net/netfilter/ipset/ip_set_list_set.c 
> b/kernel/net/netfilter/ipset/ip_set_list_set.c
> index b11ba96..0f9195f 100644
> --- a/kernel/net/netfilter/ipset/ip_set_list_set.c
> +++ b/kernel/net/netfilter/ipset/ip_set_list_set.c
> @@ -298,7 +298,7 @@ list_set_uadd(struct ip_set *set, void *value, const 
> struct ip_set_ext *ext,
> ip_set_timeout_expired(ext_timeout(n, set
>   n =  NULL;
>  
> - e = kzalloc(set->dsize, GFP_KERNEL);
> + e = kzalloc(set->dsize, GFP_ATOMIC);
>   if (!e)
>   return -ENOMEM;
>   e->id = d->id;
> -- 
> 2.5.0
> 
> 

-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Fix sleeping memory allocation in atomic context

2015-10-15 Thread Jozsef Kadlecsik
On Thu, 15 Oct 2015, Nikolay Aleksandrov wrote:

> On 10/15/2015 10:57 AM, Nikolay Borisov wrote:
> > Ipset 6.26 produces the following splat:
> > 
> [snip]
> > 
> > The call chain leading to this as follow:
> > call_add -> list_set_uadt -> list_set_uadd -> kzalloc(, GFP_KERNEL).
> > And since GFP_KERNEL allows initiating direct reclaim thus
> > potentially sleeping in the allocation path, this leads to the
> > aforementioned splat.
> > 
> > To fix it change that particular allocation type to GFP_ATOMIC, to
> > correctly reflect that it is happening in an atomic context.
> > 
> > Signed-off-by: Nikolay Borisov 
> > ---
> > 
> > Even though this patch has been generated against the stand-alone
> > ipset sources I just checked the 4.3-rc4 sources and the problem
> > exists there as well.
> > 
> >  kernel/net/netfilter/ipset/ip_set_list_set.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/net/netfilter/ipset/ip_set_list_set.c 
> > b/kernel/net/netfilter/ipset/ip_set_list_set.c
> > index b11ba96..0f9195f 100644
> 
> Hi,
> You should fix your subject line to include the subsystem [1], something like
> netfilter: ipset: fix...
> The path to the file being patched is incorrect, patches should be generated 
> in
> the root kernel source dir [2].
> Also it'd be nice to add a Fixes tag [3] to show which commit introduced the 
> bug,
> in this case it looks like it's:
> Fixes: 00590fdd5be0 ("netfilter: ipset: Introduce RCU locking in list type")

The patch was created against the ipset package and not the kernel tree 
itself, so yes, a respin is needed.

Best regards,
Jozsef
 
> [1] Documentation/SubmittingPatches chapter 14
> [2] Documentation/SubmittingPatches chapter 1
> [3] Documentation/SubmittingPatches chapters 2 and 13
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] netfilter: ipset: Fix sleeping memory allocation in atomic context

2015-10-15 Thread Jozsef Kadlecsik
Hi,

On Thu, 15 Oct 2015, Eric Dumazet wrote:

> On Thu, 2015-10-15 at 16:41 +0300, Nikolay Borisov wrote:
> > 
> > On 10/15/2015 04:32 PM, Eric Dumazet wrote:
> > > On Thu, 2015-10-15 at 13:56 +0300, Nikolay Borisov wrote:
> > >> Commit 00590fdd5be0 introduced RCU locking in list type and in
> > >> doing so introduced a memory allocation in list_set_add, which
> > >> results in the following splat:
> > >>
> > >> BUG: sleeping function called from invalid context at 
> > >> mm/page_alloc.c:2759
> > >> in_atomic(): 1, irqs_disabled(): 0, pid: 9664, name: ipset
> > >> CPU: 18 PID: 9664 Comm: ipset Tainted: G   O 3.12.47-clouder3 #1
> > >> Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
> > >>  0002 881fd14273c8 8163d891 881fcb4264b0
> > >>  881fcb4260c0 881fd14273e8 810ba5bf 881fd1427558
> > >>   881fd1427568 81142b33 881f
> > >> Call Trace:
> > >>  [] dump_stack+0x58/0x7f
> > >>  [] __might_sleep+0xdf/0x110
> > >>  [] __alloc_pages_nodemask+0x243/0xc20
> > >>  [] alloc_pages_current+0xbe/0x170
> > >>  [] new_slab+0x295/0x340
> > >>  [] __slab_alloc+0x2c0/0x5a0
> > >>  [] ? __schedule+0x2dc/0x760
> > >>  [] __kmalloc+0x11b/0x230
> > >>  [] ? ip_set_get_byname+0xec/0x100 [ip_set]
> > >>  [] list_set_uadd+0x16b/0x314 [ip_set_list_set]
> > >>  [] ? _raw_write_unlock_bh+0x28/0x30
> > >>  [] list_set_uadt+0x21c/0x320 [ip_set_list_set]
> > >>  [] ? list_set_create+0x1a0/0x1a0 [ip_set_list_set]
> > >>  [] call_ad+0x82/0x200 [ip_set]
> > >>  [] ? find_set_type+0x51/0xa0 [ip_set]
> > >>  [] ? nla_parse+0xf5/0x130
> > >>  [] ip_set_uadd+0x20e/0x2d0 [ip_set]
> > >>  [] ? ip_set_create+0x2a3/0x450 [ip_set]
> > >>  [] ? ip_set_udel+0x2e0/0x2e0 [ip_set]
> > >>  [] nfnetlink_rcv_msg+0x31e/0x330
> > >>  [] ? nfnetlink_rcv_msg+0x41/0x330
> > >>  [] ? nfnl_lock+0x30/0x30
> > >>  [] netlink_rcv_skb+0xa9/0xd0
> > >>  [] nfnetlink_rcv+0x15/0x20
> > >>  [] netlink_unicast+0x10f/0x190
> > >>  [] netlink_sendmsg+0x2c0/0x660
> > >>  [] sock_sendmsg+0x90/0xc0
> > >>  [] ? move_addr_to_user+0xa3/0xc0
> > >>  [] ? ___sys_recvmsg+0x182/0x300
> > >>  [] SYSC_sendto+0x134/0x180
> > >>  [] ? mntput+0x21/0x30
> > >>  [] ? __kfree_skb+0x3f/0xa0
> > >>  [] SyS_sendto+0xe/0x10
> > >>  [] system_call_fastpath+0x16/0x1b
> > >>
> > >> The call chain leading to this is as follow:
> > >> call_ad -> list_set_uadt -> list_set_uadd -> kzalloc(, GFP_KERNEL).
> > >> And since GFP_KERNEL allows initiating direct reclaim thus
> > >> potentially sleeping in the allocation path, this leads to the
> > >> aforementioned splat.
> > >>
> > >> To fix it change the allocation type to GFP_ATOMIC, to
> > >> correctly reflect that it is occuring in an atomic context.
> > >>
> > >> Fixes: 00590fdd5be0 ("netfilter: ipset: Introduce RCU locking in list 
> > >> type")
> > >>
> > >> Acked-by: Jozsef Kadlecsik 
> > >> Signed-off-by: Nikolay Borisov 
> > >> ---
> > >>
> > >> Changes since V1: 
> > >>  * Added acked-by 
> > >>  * Fixed patch header 
> > >>
> > >>  net/netfilter/ipset/ip_set_list_set.c | 2 +-
> > >>  1 file changed, 1 insertion(+), 1 deletion(-)
> > >>
> > >> diff --git a/net/netfilter/ipset/ip_set_list_set.c 
> > >> b/net/netfilter/ipset/ip_set_list_set.c
> > >> index a1fe537..5a30ce6 100644
> > >> --- a/net/netfilter/ipset/ip_set_list_set.c
> > >> +++ b/net/netfilter/ipset/ip_set_list_set.c
> > >> @@ -297,7 +297,7 @@ list_set_uadd(struct ip_set *set, void *value, const 
> > >> struct ip_set_ext *ext,
> > >>ip_set_timeout_expired(ext_timeout(n, set
> > >>  n =  NULL;
> > >>  
> > >> -e = kzalloc(set->dsize, GFP_KERNEL);
> > >> +e = kzalloc(set->dsize, GFP_ATOMIC);
> > >>  if (!e)
> > >>  return -ENOMEM;
> > >>  e->id = d->id;
> > > 
> > > This patch looks very bogus to me.
> > 

Re: [PATCH v2] netfilter: ipset: Fix sleeping memory allocation in atomic context

2015-10-16 Thread Jozsef Kadlecsik
On Thu, 15 Oct 2015, Eric Dumazet wrote:

> On Thu, 2015-10-15 at 23:20 +0300, Nikolay Borisov wrote:
> 
> > While GFP_ATOMIC does indeed look the correct solution for this particular
> > case I was wondering whether something like (GFP_KERNEL & ~__GFP_WAIT)
> > wouldn't also make the cut without causing sleeping? I guess this is exactly
> > the sort of situation that Mel Gorman's patch can address
> > (marc.info/?l=linux-kernel&m=144283282101953) ?
> 
> This is not applicable here, because the caller would have to find a way
> to keep trying.
> 
> I believe one way to handle this problem (in a followup patch) would be
> to use a work queue for the gc, not a timer.
> 
> Using a timer for gc is almost always subject to big problems anyway.

Thanks, really! I'll work on to replace the timer based gc with a work 
queue based one in the next version.

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] netfilter: ipset: Fix sleeping memory allocation in atomic context

2015-10-16 Thread Jozsef Kadlecsik
On Fri, 16 Oct 2015, Nikolay Borisov wrote:

> Commit 00590fdd5be0 introduced RCU locking in list type and in
> doing so introduced a memory allocation in list_set_add, which 
> is done in an atomic context, due to the fact that ipset rcu 
> list modifications are serialised with a spin lock. The reason 
> why we can't use a mutex is that in addition to modifying the 
> list with ipset commands, it's also being modified when a
> particular ipset rule timeout expires aka garbage collection. 
> This gc is triggered from set_cleanup_entries, which in turn 
> is invoked from a timer thus requiring the lock to be bh-safe. 
> 
> Concretely the following call chain can lead to "sleeping function
> called in atomic context" splat: 
> call_ad -> list_set_uadt -> list_set_uadd -> kzalloc(, GFP_KERNEL).
> And since GFP_KERNEL allows initiating direct reclaim thus
> potentially sleeping in the allocation path.
> 
> To fix the issue change the allocation type to GFP_ATOMIC, to
> correctly reflect that it is occuring in an atomic context.
> 
> Fixes: 00590fdd5be0 ("netfilter: ipset: Introduce RCU locking in list type")
> 
> Acked-by: Jozsef Kadlecsik 
> Signed-off-by: Nikolay Borisov 
> ---
> 
> Changes since v2:
>  * Massaged the changelog to reflect discussion 
>  on the mailing list
> 
> Changes since v1: 
>  * Added acked-by 
>  * Fixed patch header
> 
>  
> 
>  net/netfilter/ipset/ip_set_list_set.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

The patch is applied in the ipset package tree. Thanks!
 
Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Linux 4.2.4

2015-10-25 Thread Jozsef Kadlecsik
Hi,

On Sun, 25 Oct 2015, Gerhard Wiesinger wrote:

> On 25.10.2015 10:46, Willy Tarreau wrote:
> > ipset *triggered* the problem. The whole stack dump would tell more. 
> 
> OK, find the stack traces in the bug report:
> https://bugzilla.redhat.com/show_bug.cgi?id=1272645
> 
> Kernel 4.1.10 triggered also a kernel dump when playing with ipset commands
> and IPv6, details in the bug report  

It seems to me it is an architecture-specific alignment issue. I don't 
have a Cortex-A7 ARM hardware and qemu doesn't seem to support it either, 
so I'm unable to reproduce it (ipset passes all my tests on my hardware, 
including more complex ones than what breaks here). My first wild guess is 
that the dynamic array of the element structure is not aligned properly. 
Could you give a try to the next patch?

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
b/net/netfilter/ipset/ip_set_hash_gen.h
index afe905c..1cf357d 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -1211,6 +1211,9 @@ static const struct ip_set_type_variant mtype_variant = {
.same_set = mtype_same_set,
 };
 
+#define IP_SET_BASE_ALIGN(dtype)   \
+   ALIGN(sizeof(struct dtype), __alignof__(struct dtype))
+
 #ifdef IP_SET_EMIT_CREATE
 static int
 IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set,
@@ -1319,12 +1322,12 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct 
ip_set *set,
 #endif
set->variant = &IPSET_TOKEN(HTYPE, 4_variant);
set->dsize = ip_set_elem_len(set, tb,
-   sizeof(struct IPSET_TOKEN(HTYPE, 4_elem)));
+   IP_SET_BASE_ALIGN(IPSET_TOKEN(HTYPE, 4_elem)));
 #ifndef IP_SET_PROTO_UNDEF
} else {
set->variant = &IPSET_TOKEN(HTYPE, 6_variant);
set->dsize = ip_set_elem_len(set, tb,
-   sizeof(struct IPSET_TOKEN(HTYPE, 6_elem)));
+   IP_SET_BASE_ALIGN(IPSET_TOKEN(HTYPE, 6_elem)));
}
 #endif
if (tb[IPSET_ATTR_TIMEOUT]) {

If that does not solve it, then could you help to narrow down the issue? 
Does the bug still appear if your remove the counter extension of the set?

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Linux 4.2.4

2015-10-25 Thread Jozsef Kadlecsik
On Sun, 25 Oct 2015, Gerhard Wiesinger wrote:

> On 25.10.2015 21:08, Gerhard Wiesinger wrote:
> > On 25.10.2015 20:46, Jozsef Kadlecsik wrote:
> > > Hi,
> > > 
> > > On Sun, 25 Oct 2015, Gerhard Wiesinger wrote:
> > > 
> > > > On 25.10.2015 10:46, Willy Tarreau wrote:
> > > > > ipset *triggered* the problem. The whole stack dump would tell more.
> > > > OK, find the stack traces in the bug report:
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1272645
> > > > 
> > > > Kernel 4.1.10 triggered also a kernel dump when playing with ipset
> > > > commands
> > > > and IPv6, details in the bug report  
> > > It seems to me it is an architecture-specific alignment issue. I don't
> > > have a Cortex-A7 ARM hardware and qemu doesn't seem to support it either,
> > > so I'm unable to reproduce it (ipset passes all my tests on my hardware,
> > > including more complex ones than what breaks here). My first wild guess is
> > > that the dynamic array of the element structure is not aligned properly.
> > > Could you give a try to the next patch?
> > > 
> > > diff --git a/net/netfilter/ipset/ip_set_hash_gen.h
> > > b/net/netfilter/ipset/ip_set_hash_gen.h
> > > index afe905c..1cf357d 100644
> > > --- a/net/netfilter/ipset/ip_set_hash_gen.h
> > > +++ b/net/netfilter/ipset/ip_set_hash_gen.h
> > > @@ -1211,6 +1211,9 @@ static const struct ip_set_type_variant
> > > mtype_variant = {
> > >   .same_set = mtype_same_set,
> > >   };
> > >   +#define IP_SET_BASE_ALIGN(dtype)\
> > > +ALIGN(sizeof(struct dtype), __alignof__(struct dtype))
> > > +
> > >   #ifdef IP_SET_EMIT_CREATE
> > >   static int
> > >   IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set,
> > > @@ -1319,12 +1322,12 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net,
> > > struct ip_set *set,
> > >   #endif
> > >   set->variant = &IPSET_TOKEN(HTYPE, 4_variant);
> > >   set->dsize = ip_set_elem_len(set, tb,
> > > -sizeof(struct IPSET_TOKEN(HTYPE, 4_elem)));
> > > +IP_SET_BASE_ALIGN(IPSET_TOKEN(HTYPE, 4_elem)));
> > >   #ifndef IP_SET_PROTO_UNDEF
> > >   } else {
> > >   set->variant = &IPSET_TOKEN(HTYPE, 6_variant);
> > >   set->dsize = ip_set_elem_len(set, tb,
> > > -sizeof(struct IPSET_TOKEN(HTYPE, 6_elem)));
> > > +IP_SET_BASE_ALIGN(IPSET_TOKEN(HTYPE, 6_elem)));
> > >   }
> > >   #endif
> > >   if (tb[IPSET_ATTR_TIMEOUT]) {
> > > 
> > > If that does not solve it, then could you help to narrow down the issue?
> > > Does the bug still appear if your remove the counter extension of the set?
> > > 
>
> Thank you for the patch it but still  crashes, see:
> https://bugzilla.redhat.com/show_bug.cgi?id=1272645
> 
> Any further ideas?

Does it crash without counters? That could narrow down where to look for.

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Linux 4.2.4

2015-10-26 Thread Jozsef Kadlecsik
On Sun, 25 Oct 2015, Gerhard Wiesinger wrote:

> On 25.10.2015 20:46, Jozsef Kadlecsik wrote:
> > Hi,
> > 
> > On Sun, 25 Oct 2015, Gerhard Wiesinger wrote:
> > 
> > > On 25.10.2015 10:46, Willy Tarreau wrote:
> > > > ipset *triggered* the problem. The whole stack dump would tell more.
> > > OK, find the stack traces in the bug report:
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1272645
> > > 
> > > Kernel 4.1.10 triggered also a kernel dump when playing with ipset
> > > commands
> > > and IPv6, details in the bug report  
> > It seems to me it is an architecture-specific alignment issue. I don't
> > have a Cortex-A7 ARM hardware and qemu doesn't seem to support it either,
> > so I'm unable to reproduce it (ipset passes all my tests on my hardware,
> > including more complex ones than what breaks here). My first wild guess is
> > that the dynamic array of the element structure is not aligned properly.
> > Could you give a try to the next patch?
> > 
> > diff --git a/net/netfilter/ipset/ip_set_hash_gen.h
> > b/net/netfilter/ipset/ip_set_hash_gen.h
> > index afe905c..1cf357d 100644
> > --- a/net/netfilter/ipset/ip_set_hash_gen.h
> > +++ b/net/netfilter/ipset/ip_set_hash_gen.h
> > @@ -1211,6 +1211,9 @@ static const struct ip_set_type_variant mtype_variant
> > = {
> > .same_set = mtype_same_set,
> >   };
> >   +#define IP_SET_BASE_ALIGN(dtype) \
> > +   ALIGN(sizeof(struct dtype), __alignof__(struct dtype))
> > +
> >   #ifdef IP_SET_EMIT_CREATE
> >   static int
> >   IPSET_TOKEN(HTYPE, _create)(struct net *net, struct ip_set *set,
> > @@ -1319,12 +1322,12 @@ IPSET_TOKEN(HTYPE, _create)(struct net *net, struct
> > ip_set *set,
> >   #endif
> > set->variant = &IPSET_TOKEN(HTYPE, 4_variant);
> > set->dsize = ip_set_elem_len(set, tb,
> > -   sizeof(struct IPSET_TOKEN(HTYPE, 4_elem)));
> > +   IP_SET_BASE_ALIGN(IPSET_TOKEN(HTYPE,
> > 4_elem)));
> >   #ifndef IP_SET_PROTO_UNDEF
> > } else {
> > set->variant = &IPSET_TOKEN(HTYPE, 6_variant);
> > set->dsize = ip_set_elem_len(set, tb,
> > -   sizeof(struct IPSET_TOKEN(HTYPE, 6_elem)));
> > +   IP_SET_BASE_ALIGN(IPSET_TOKEN(HTYPE,
> > 6_elem)));
> > }
> >   #endif
> > if (tb[IPSET_ATTR_TIMEOUT]) {
> > 
> > If that does not solve it, then could you help to narrow down the issue?
> > Does the bug still appear if your remove the counter extension of the set?
> > 
> 
> Patch applied well, compiling ...
> 
> Interesting, that it didn't happen before. Device is in production for 
> more than 2 month without any issue.

You mean the device was stable with the earlier kernels, but starting with 
4.2.3 (and back to 4.1.10) you have got problems, don't you?
 
> Also any idea regarding the second isssue? Or do you think it has the 
> same root cause?

Looking at your RedHat bugzilla report, the "nf_conntrack: table full, 
dropping packet" and "Alignment trap: not handling instruction" are two 
unrelated issues and the second one is triggered by the unaligned counter 
extension acccess in ipset, I'm investigating. I can't think of any reason 
how those issues could be related to each other.

> Greetings from Vienna, Austria :-)

Quite near to my place :-) 

> BTW: You can get the Banana Pi R1 for example at:
> http://www.aliexpress.com/item/BPI-R1-Set-1-R1-Board-Clear-Case-5dB-Antenna-Power-Adapter-Banana-PI-R1-Smart/32362127917.html
> I can really recommend it as a router. Power consumption is as less as 3W.
> Price is also IMHO very good.

Cool mini gear, indeed!

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/10] Netfilter fixes for net

2015-11-13 Thread Jozsef Kadlecsik
On Fri, 13 Nov 2015, Josh Boyer wrote:

> On Wed, Nov 11, 2015 at 12:33 PM, Pablo Neira Ayuso  
> wrote:
> > Jozsef Kadlecsik (3):
> >   netfilter: ipset: Fix extension alignment
> >   netfilter: ipset: Fix hash:* type expiration
> >   netfilter: ipset: Fix hash type expire: release empty hash bucket 
> > block
> 
> Should these three go to stable?  We've had reports in Fedora about
> ipset crashing on e.g. ARM architectures with 4.2.y kernels.  If not
> all three, then perhaps just the alignment fix?

Yes, those should definitely go to stable, at least the first two ones.

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 net-next] netfilter: ipset: Fixing unnamed union init

2015-08-22 Thread Jozsef Kadlecsik
On Sat, 22 Aug 2015, Elad Raz wrote:

> In continue to proposed Vinson Lee's post [1], this patch fixes compilation
> issues founded at gcc 4.4.7. The initialization of .cidr field of unnamed
> unions causes compilation error in gcc 4.4.x.

There's already a (couple of weeks old) patch in the -mm tree to fix the 
gcc compatilibity issue, see the last comment in the thread you refer to:

> Visible links
> [1] https://lkml.org/lkml/2015/7/5/74

So I'm unsure whether a new patch should be submitted for this.

Best regards,
Jozsef
 
> Signed-off-by: Elad Raz 
> ---
>  net/netfilter/ipset/ip_set_hash_netnet.c | 20 ++--
>  net/netfilter/ipset/ip_set_hash_netportnet.c | 20 ++--
>  2 files changed, 36 insertions(+), 4 deletions(-)
> 
> diff --git a/net/netfilter/ipset/ip_set_hash_netnet.c 
> b/net/netfilter/ipset/ip_set_hash_netnet.c
> index 3c862c0..a93dfeb 100644
> --- a/net/netfilter/ipset/ip_set_hash_netnet.c
> +++ b/net/netfilter/ipset/ip_set_hash_netnet.c
> @@ -131,6 +131,13 @@ hash_netnet4_data_next(struct hash_netnet4_elem *next,
>  #define HOST_MASK32
>  #include "ip_set_hash_gen.h"
>  
> +static void
> +hash_netnet4_init(struct hash_netnet4_elem *e)
> +{
> + e->cidr[0] = HOST_MASK;
> + e->cidr[1] = HOST_MASK;
> +}
> +
>  static int
>  hash_netnet4_kadt(struct ip_set *set, const struct sk_buff *skb,
> const struct xt_action_param *par,
> @@ -160,7 +167,7 @@ hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[],
>  {
>   const struct hash_netnet *h = set->data;
>   ipset_adtfn adtfn = set->variant->adt[adt];
> - struct hash_netnet4_elem e = { .cidr = { HOST_MASK, HOST_MASK, }, };
> + struct hash_netnet4_elem e = { };
>   struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
>   u32 ip = 0, ip_to = 0, last;
>   u32 ip2 = 0, ip2_from = 0, ip2_to = 0, last2;
> @@ -169,6 +176,7 @@ hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[],
>   if (tb[IPSET_ATTR_LINENO])
>   *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]);
>  
> + hash_netnet4_init(&e);
>   if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IP2] ||
>!ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS)))
>   return -IPSET_ERR_PROTOCOL;
> @@ -357,6 +365,13 @@ hash_netnet6_data_next(struct hash_netnet4_elem *next,
>  #define IP_SET_EMIT_CREATE
>  #include "ip_set_hash_gen.h"
>  
> +static void
> +hash_netnet6_init(struct hash_netnet6_elem *e)
> +{
> + e->cidr[0] = HOST_MASK;
> + e->cidr[1] = HOST_MASK;
> +}
> +
>  static int
>  hash_netnet6_kadt(struct ip_set *set, const struct sk_buff *skb,
> const struct xt_action_param *par,
> @@ -385,13 +400,14 @@ hash_netnet6_uadt(struct ip_set *set, struct nlattr 
> *tb[],
> enum ipset_adt adt, u32 *lineno, u32 flags, bool retried)
>  {
>   ipset_adtfn adtfn = set->variant->adt[adt];
> - struct hash_netnet6_elem e = { .cidr = { HOST_MASK, HOST_MASK, }, };
> + struct hash_netnet6_elem e = { };
>   struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
>   int ret;
>  
>   if (tb[IPSET_ATTR_LINENO])
>   *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]);
>  
> + hash_netnet6_init(&e);
>   if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IP2] ||
>!ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS)))
>   return -IPSET_ERR_PROTOCOL;
> diff --git a/net/netfilter/ipset/ip_set_hash_netportnet.c 
> b/net/netfilter/ipset/ip_set_hash_netportnet.c
> index 0c68734..9a14c23 100644
> --- a/net/netfilter/ipset/ip_set_hash_netportnet.c
> +++ b/net/netfilter/ipset/ip_set_hash_netportnet.c
> @@ -142,6 +142,13 @@ hash_netportnet4_data_next(struct hash_netportnet4_elem 
> *next,
>  #define HOST_MASK32
>  #include "ip_set_hash_gen.h"
>  
> +static void
> +hash_netportnet4_init(struct hash_netportnet4_elem *e)
> +{
> + e->cidr[0] = HOST_MASK;
> + e->cidr[1] = HOST_MASK;
> +}
> +
>  static int
>  hash_netportnet4_kadt(struct ip_set *set, const struct sk_buff *skb,
> const struct xt_action_param *par,
> @@ -175,7 +182,7 @@ hash_netportnet4_uadt(struct ip_set *set, struct nlattr 
> *tb[],
>  {
>   const struct hash_netportnet *h = set->data;
>   ipset_adtfn adtfn = set->variant->adt[adt];
> - struct hash_netportnet4_elem e = { .cidr = { HOST_MASK, HOST_MASK, }, };
> + struct hash_netportnet4_elem e = { };
>   struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
>   u32 ip = 0, ip_to = 0, ip_last, p = 0, port, port_to;
>   u32 ip2_from = 0, ip2_to = 0, ip2_last, ip2;
> @@ -185,6 +192,7 @@ hash_netportnet4_uadt(struct ip_set *set, struct nlattr 
> *tb[],
>   if (tb[IPSET_ATTR_LINENO])
>   *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]);
>  
> + hash_netportnet4_init(&e);
>   if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IP2] ||
>!ip_set_attr_netorder(tb, IPSET_ATTR_PORT) ||
>!ip_se

Re: tcp hang when socket fills up ?

2018-04-18 Thread Jozsef Kadlecsik
Hi,

On Tue, 17 Apr 2018, Florian Westphal wrote:

> Dominique Martinet  wrote:
> 
> [ CC Jozsef ]
> 
> > Could it have something to do with the way I setup the connection?
> > I don't think the "both remotes call connect() with carefully selected
> > source/dest port" is a very common case..
> > 
> > If you look at the tcpdump outputs I attached the sequence usually is
> > something like
> >  server > client SYN
> >  client > server SYN
> >  server > client SYNACK
> >  client > server ACK
> > 
> > ultimately it IS a connection, but with an extra SYN packet in front of
> > it (that first SYN opens up the conntrack of the nat so that the
> > client's syn can come in, the client's conntrack will be that of a
> > normal connection since its first SYN goes in directly after the
> > server's (it didn't see the server's SYN))
> > 
> > Looking at my logs again, I'm seeing the same as you:
> > 
> > This looks like the actual SYN/SYN/SYNACK/ACK:
> >  - 14.364090 seq=505004283 likely SYN coming out of server
> >  - 14.661731 seq=1913287797 on next line it says receiver
> > end=505004284 so likely the matching SYN from client
> > Which this time gets a proper SYNACK from server:
> > 14.662020 seq=505004283 ack=1913287798
> > And following final dataless ACK:
> > 14.687570 seq=1913287798 ack=505004284
> > 
> > Then as you point out some data ACK, where the scale poofs:
> > 14.688762 seq=1913287798 ack=505004284+(0) sack=505004284+(0) win=229 
> > end=1913287819
> > 14.688793 tcp_in_window: sender end=1913287798 maxend=1913316998 
> > maxwin=29312 scale=7 receiver end=505004284 maxend=505033596 maxwin=29200 
> > scale=7
> > 14.688824 tcp_in_window: 
> > 14.688852 seq=1913287798 ack=505004284+(0) sack=505004284+(0) win=229 
> > end=1913287819
> > 14.62 tcp_in_window: sender end=1913287819 maxend=1913287819 maxwin=229 
> > scale=0 receiver end=505004284 maxend=505033596 maxwin=29200 scale=7
> >
> > As you say, only tcp_options() will clear only on side of the scales.
> > We don't have sender->td_maxwin == 0 (printed) so I see no other way
> > than we are in the last else if:
> >  - we have after(end, sender->td_end) (end=1913287819 > sender
> > end=1913287798)
> >  - I assume the tcp state machine must be confused because of the
> > SYN/SYN/SYNACK/ACK pattern and we probably enter the next check, 
> > but since this is a data packet it doesn't have the tcp option for scale
> > thus scale resets.
> 
> Yes, this looks correct. Jozsef, can you please have a look?
> 
> Problem seems to be that conntrack believes that ACK packet
> re-initializes the connection:
> 
>  595 /*
>  596  * RFC 793: "if a TCP is reinitialized ... then it need
>  597  * not wait at all; it must only be sure to use sequence
>  598  * numbers larger than those recently used."
>  599  */
>  600 sender->td_end =
>  601 sender->td_maxend = end;
>  602 sender->td_maxwin = (win == 0 ? 1 : win);
>  603 
>  604 tcp_options(skb, dataoff, tcph, sender);
> 
> and last line clears the scale value (no wscale option in data packet).
> 
> 
> Transitions are:
>  server > client SYN  sNO -> sSS
>  client > server SYN  sSS -> sS2
>  server > client SYNACK   sS2 -> sSR /* here */
>  client > server ACK  sSR -> sES
> 
> SYN/ACK was observed in original direction so we hit
> state->state == TCP_CONNTRACK_SYN_RECV && dir == IP_CT_DIR_REPLY test
> when we see the ack packet and end up in the 'TCP is reinitialized' branch.
> 
> AFAICS, without this, connection would move to sES just fine,
> as the data ack is in window.

Yes, the state transition is wrong for simultaneous open, because the 
tcp_conntracks table is not (cannot be) smart enough. Could you verify the 
next untested patch?

diff --git a/include/uapi/linux/netfilter/nf_conntrack_tcp.h 
b/include/uapi/linux/netfilter/nf_conntrack_tcp.h
index 74b9115..bcba72d 100644
--- a/include/uapi/linux/netfilter/nf_conntrack_tcp.h
+++ b/include/uapi/linux/netfilter/nf_conntrack_tcp.h
@@ -46,6 +46,9 @@ enum tcp_conntrack {
 /* Marks possibility for expected RFC5961 challenge ACK */
 #define IP_CT_EXP_CHALLENGE_ACK0x40
 
+/* Simultaneous open initialized */
+#define IP_CT_TCP_SIMULTANEOUS_OPEN0x80
+
 struct nf_ct_tcp_flags {
__u8 flags;
__u8 mask;
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c 
b/net/netfilter/nf_conntrack_proto_tcp.c
index e97cdc1..8e67910 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -981,6 +981,17 @@ static int tcp_packet(struct nf_conn *ct,
return NF_ACCEPT; /* Don't change state */
}
break;
+   case TCP_CONNTRACK_SYN_SENT2:
+   /* tcp_conntracks table is not smart enough to handle
+* simultaneous open.
+*/
+   ct->proto

Re: tcp hang when socket fills up ?

2018-04-18 Thread Jozsef Kadlecsik
On Wed, 18 Apr 2018, Dominique Martinet wrote:

> Dominique Martinet wrote on Wed, Apr 18, 2018:
> > Jozsef Kadlecsik wrote on Wed, Apr 18, 2018:
> > > Yes, the state transition is wrong for simultaneous open, because the 
> > > tcp_conntracks table is not (cannot be) smart enough. Could you verify 
> > > the 
> > > next untested patch?
> > 
> > Thanks for the patch; I'll give it a try (probably won't make it today
> > so will report tomorrow)
> 
> Actually had time; I can confirm (added printks) we did get in that if 
> that was pointed at, and we no longer get there now. The connection no 
> longer gets in invalid state, so that looks like it nailed it.
>
> I'm now confused what this has to do with tcp_timestamp though, since 
> setting that off also seemed to work around the issue, but if we get 
> something like that in I'll be happy anyway.

Thanks for the testing! One more line is required, however: we have to get 
the assured bit set for the connection, see the new patch below.

The tcp_conntracks state table could be fixed with introducing a new 
state, but that part is exposed to userspace (ctnetlink) and ugly 
compatibility code would be required for backward compatibility.
 
diff --git a/include/uapi/linux/netfilter/nf_conntrack_tcp.h 
b/include/uapi/linux/netfilter/nf_conntrack_tcp.h
index 74b9115..bcba72d 100644
--- a/include/uapi/linux/netfilter/nf_conntrack_tcp.h
+++ b/include/uapi/linux/netfilter/nf_conntrack_tcp.h
@@ -46,6 +46,9 @@ enum tcp_conntrack {
 /* Marks possibility for expected RFC5961 challenge ACK */
 #define IP_CT_EXP_CHALLENGE_ACK0x40
 
+/* Simultaneous open initialized */
+#define IP_CT_TCP_SIMULTANEOUS_OPEN0x80
+
 struct nf_ct_tcp_flags {
__u8 flags;
__u8 mask;
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c 
b/net/netfilter/nf_conntrack_proto_tcp.c
index e97cdc1..2c1fc7e 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -981,6 +981,20 @@ static int tcp_packet(struct nf_conn *ct,
return NF_ACCEPT; /* Don't change state */
}
break;
+   case TCP_CONNTRACK_SYN_SENT2:
+   /* tcp_conntracks table is not smart enough to handle
+* simultaneous open.
+*/
+   ct->proto.tcp.last_flags |= IP_CT_TCP_SIMULTANEOUS_OPEN;
+   break;
+   case TCP_CONNTRACK_SYN_RECV:
+   if (dir == IP_CT_DIR_REPLY && index == TCP_ACK_SET &&
+   ct->proto.tcp.last_flags & IP_CT_TCP_SIMULTANEOUS_OPEN) {
+   /* We want to set the assured bit */
+   old_state = TCP_CONNTRACK_SYN_RECV;
+   new_state = TCP_CONNTRACK_ESTABLISHED;
+   }
+   break;
case TCP_CONNTRACK_CLOSE:
if (index == TCP_RST_SET
&& (ct->proto.tcp.seen[!dir].flags & 
IP_CT_TCP_FLAG_MAXACK_SET)

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary


Re: tcp hang when socket fills up ?

2018-04-18 Thread Jozsef Kadlecsik
On Wed, 18 Apr 2018, Dominique Martinet wrote:

> Jozsef Kadlecsik wrote on Wed, Apr 18, 2018:
> > Thanks for the testing! One more line is required, however: we have to get 
> > the assured bit set for the connection, see the new patch below.
> 
> I think it actually was better before. If I understand things correctly
> at this point (when we get in the case TCP_CONNTRACK_SYN_RECV) we will
> have seen SYN(out) SYN(in) SYNACK(out), but not the final ACK(in) yet.
> 
> Leaving old state as it was will not set the assured bit, but that will 
> be set on the next packet because old_state == new_state == established 
> at that point and the connection will really be setup then.

Yes, you are right: the first patch is better than the second one. 
Overthinking :-)

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH RFC 0/4] net: add bpfilter

2018-02-19 Thread Jozsef Kadlecsik
Hi David,

On Mon, 19 Feb 2018, Florian Westphal wrote:

> David Miller  wrote:
> > 
> > Florian, first of all, the whole "change the iptables binary" idea is
> > a non-starter.  For the many reasons I have described in the various
> > postings I have made today.
> > 
> > It is entirely impractical.

You stressed several times that container images, virtualization 
installations don't change - and that's exaggregation. Those are updated 
as well, and not only because security updates must be rolled out, but 
because new versions of softwares are requested.

You mentioned that the hosting part can upgrade the kernel - it means that 
enabling NFTABLES is also a non-issue when the new eBPF functionality is 
switched on, if that was missing.

> You suggest:
> 
> iptables -> setsockopt -> umh (xtables -> ebpf) -> kernel
> 
> How is this different from
> 
> iptables -> setsockopt -> umh (Xtables -> nftables -> kernel
> 
> ?
> EBPF can be placed within nftables either userspace or kernel,
> there is nothing that prevents this.

So why the second scenario suggested by Florian is not possible or must be 
avoided? It not only could keep the unmodified iptables in the container 
(if that's a must from some reason), but it would make possible to replace 
it later anytime with iptables-compat/nftables.

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH] netfilter: ipset: do not call ipset_nest_end after nla_nest_cancel

2018-12-01 Thread Jozsef Kadlecsik
Hi,

On Mon, 26 Nov 2018, Pan Bian wrote:

> In the error handling block, nla_nest_cancel(skb, atd) is called to
> cancel the nest operation. But then, ipset_nest_end(skb, atd) is
> unexpected called to end the nest operation. This patch calls the
> ipset_nest_end only on the branch that nla_nest_cancel is
> not called.
> 
> Fixes: 45040978c89("netfilter: ipset: Fix set:list type crash when
> flush/dump set in parallel")

Good catch, thank you. Patch is applied in ipset git tree.

Best regards,
Jozsef

> Signed-off-by: Pan Bian 
> ---
>  net/netfilter/ipset/ip_set_list_set.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/netfilter/ipset/ip_set_list_set.c 
> b/net/netfilter/ipset/ip_set_list_set.c
> index 4eef55d..8da228d 100644
> --- a/net/netfilter/ipset/ip_set_list_set.c
> +++ b/net/netfilter/ipset/ip_set_list_set.c
> @@ -531,8 +531,8 @@ list_set_list(const struct ip_set *set,
>   ret = -EMSGSIZE;
>   } else {
>   cb->args[IPSET_CB_ARG0] = i;
> + ipset_nest_end(skb, atd);
>   }
> - ipset_nest_end(skb, atd);
>  out:
>   rcu_read_unlock();
>   return ret;
> -- 
> 2.7.4
> 
> 
> 

-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary


RE: [PATCH] netfilter: fix stringop-overflow warning with UBSAN

2017-10-04 Thread Jozsef Kadlecsik
Hi,

[Sorry, at holiday I just cursory watched the mailing lists.]

On Tue, 1 Aug 2017, David Laight wrote:

> From: Arnd Bergmann
> > Sent: 31 July 2017 11:09
> > Using gcc-7 with UBSAN enabled, we get this false-positive warning:
> > 
> > net/netfilter/ipset/ip_set_core.c: In function 'ip_set_sockfn_get':
> > net/netfilter/ipset/ip_set_core.c:1998:3: error: 'strncpy' writing 32 bytes 
> > into a region of size 2
> > overflows the destination [-Werror=stringop-overflow=]
> >strncpy(req_get->set.name, set ? set->name : "",
> >^~~~
> > sizeof(req_get->set.name));
> > ~~
> > 
> > This seems completely bogus, and I could not find a nice workaround.
> > To work around it in a less elegant way, I change the ?: operator
> > into an if()/else() construct.
> > 
> > Signed-off-by: Arnd Bergmann 
> > ---
> >  net/netfilter/ipset/ip_set_core.c | 8 ++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/netfilter/ipset/ip_set_core.c 
> > b/net/netfilter/ipset/ip_set_core.c
> > index e495b5e484b1..d7ebb021003b 100644
> > --- a/net/netfilter/ipset/ip_set_core.c
> > +++ b/net/netfilter/ipset/ip_set_core.c
> > @@ -1995,8 +1995,12 @@ ip_set_sockfn_get(struct sock *sk, int optval, void 
> > __user *user, int *len)
> > }
> > nfnl_lock(NFNL_SUBSYS_IPSET);
> > set = ip_set(inst, req_get->set.index);
> > -   strncpy(req_get->set.name, ,
> > -   IPSET_MAXNAMELEN);
> > +   if (set)
> > +   strncpy(req_get->set.name, set->name,
> > +   sizeof(req_get->set.name));
> > +   else
> > +   memset(req_get->set.name, '\0',
> > +  sizeof(req_get->set.name));
> 
> If you use strncpy() here, the compiler might optimise the code
> back to 'how it was before'.
> 
> Or, maybe an explicit temporary: 'const char *name = set ? set->name : "";

I think the best to go with the explicit temporary variable. The if-else 
construct is too much for such a case.

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH] netfilter: ipset: Convert timers to use timer_setup()

2017-10-05 Thread Jozsef Kadlecsik
Hi,

On Wed, 4 Oct 2017, Kees Cook wrote:

> In preparation for unconditionally passing the struct timer_list pointer 
> to all timer callbacks, switch to using the new timer_setup() and 
> from_timer() to pass the timer pointer explicitly. This introduces a 
> pointer back to the struct ip_set, which is used instead of the struct 
> timer_list .data field.

Please add the same changes to net/netfilter/ipset/ip_set_list.c too, in 
order to handle all ipset modules in a single patch. I don't see a way 
either to avoid the introduction of the new pointer.

Acked-by: Jozsef Kadlecsik 

Best regards,
Jozsef
 
> Cc: Pablo Neira Ayuso 
> Cc: Jozsef Kadlecsik 
> Cc: Florian Westphal 
> Cc: "David S. Miller" 
> Cc: Stephen Hemminger 
> Cc: simran singhal 
> Cc: Muhammad Falak R Wani 
> Cc: netfilter-de...@vger.kernel.org
> Cc: coret...@netfilter.org
> Cc: netdev@vger.kernel.org
> Cc: Thomas Gleixner 
> Signed-off-by: Kees Cook 
> ---
> This requires commit 686fef928bba ("timer: Prepare to change timer
> callback argument type") in v4.14-rc3, but should be otherwise
> stand-alone.
> ---
>  net/netfilter/ipset/ip_set_bitmap_gen.h   | 10 +-
>  net/netfilter/ipset/ip_set_bitmap_ip.c|  2 ++
>  net/netfilter/ipset/ip_set_bitmap_ipmac.c |  2 ++
>  net/netfilter/ipset/ip_set_bitmap_port.c  |  2 ++
>  net/netfilter/ipset/ip_set_hash_gen.h | 12 +++-
>  5 files changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/net/netfilter/ipset/ip_set_bitmap_gen.h 
> b/net/netfilter/ipset/ip_set_bitmap_gen.h
> index 8ad2b52a0b32..5ca18f07683b 100644
> --- a/net/netfilter/ipset/ip_set_bitmap_gen.h
> +++ b/net/netfilter/ipset/ip_set_bitmap_gen.h
> @@ -37,11 +37,11 @@
>  #define get_ext(set, map, id)((map)->extensions + ((set)->dsize * 
> (id)))
>  
>  static void
> -mtype_gc_init(struct ip_set *set, void (*gc)(unsigned long ul_set))
> +mtype_gc_init(struct ip_set *set, void (*gc)(struct timer_list *t))
>  {
>   struct mtype *map = set->data;
>  
> - setup_timer(&map->gc, gc, (unsigned long)set);
> + timer_setup(&map->gc, gc, 0);
>   mod_timer(&map->gc, jiffies + IPSET_GC_PERIOD(set->timeout) * HZ);
>  }
>  
> @@ -272,10 +272,10 @@ mtype_list(const struct ip_set *set,
>  }
>  
>  static void
> -mtype_gc(unsigned long ul_set)
> +mtype_gc(struct timer_list *t)
>  {
> - struct ip_set *set = (struct ip_set *)ul_set;
> - struct mtype *map = set->data;
> + struct mtype *map = from_timer(map, t, gc);
> + struct ip_set *set = map->set;
>   void *x;
>   u32 id;
>  
> diff --git a/net/netfilter/ipset/ip_set_bitmap_ip.c 
> b/net/netfilter/ipset/ip_set_bitmap_ip.c
> index 4783efff0bde..d8975a0b4282 100644
> --- a/net/netfilter/ipset/ip_set_bitmap_ip.c
> +++ b/net/netfilter/ipset/ip_set_bitmap_ip.c
> @@ -48,6 +48,7 @@ struct bitmap_ip {
>   size_t memsize; /* members size */
>   u8 netmask; /* subnet netmask */
>   struct timer_list gc;   /* garbage collection */
> + struct ip_set *set; /* attached to this ip_set */
>   unsigned char extensions[0] /* data extensions */
>   __aligned(__alignof__(u64));
>  };
> @@ -232,6 +233,7 @@ init_map_ip(struct ip_set *set, struct bitmap_ip *map,
>   map->netmask = netmask;
>   set->timeout = IPSET_NO_TIMEOUT;
>  
> + map->set = set;
>   set->data = map;
>   set->family = NFPROTO_IPV4;
>  
> diff --git a/net/netfilter/ipset/ip_set_bitmap_ipmac.c 
> b/net/netfilter/ipset/ip_set_bitmap_ipmac.c
> index 9a065f672d3a..4c279fbd2d5d 100644
> --- a/net/netfilter/ipset/ip_set_bitmap_ipmac.c
> +++ b/net/netfilter/ipset/ip_set_bitmap_ipmac.c
> @@ -52,6 +52,7 @@ struct bitmap_ipmac {
>   u32 elements;   /* number of max elements in the set */
>   size_t memsize; /* members size */
>   struct timer_list gc;   /* garbage collector */
> + struct ip_set *set; /* attached to this ip_set */
>   unsigned char extensions[0] /* MAC + data extensions */
>   __aligned(__alignof__(u64));
>  };
> @@ -307,6 +308,7 @@ init_map_ipmac(struct ip_set *set, struct bitmap_ipmac 
> *map,
>   map->elements = elements;
>   set->timeout = IPSET_NO_TIMEOUT;
>  
> + map->set = set;
>   set->data = map;
>   set->family = NFPROTO_IPV4;
>  
> diff --git a/net/netfilter/ipset/ip_set_bitmap_port.c 
> b/net/netfilter/ipset/ip_set_bitmap_port.c
> index 7f0c733358a4..7f9bbd7c98b5 100644
> --- a/net/netfilter/ipset/ip_set_bitmap_port.c
> +++ b/net/netfilter/ipset/ip_set_bitmap_port.c
> @@ -40,6 +40,7 @@ 

Re: [PATCH] netfilter: ipset: ipset list may return wrong member count for set with timeout

2017-09-11 Thread Jozsef Kadlecsik
Hi,

Your patch is applied in the ipset git tree and I'm going to push it for 
kernel inclusion.

I modified the comment part: the elements counter can still be incorrect 
in the case of a huge set, because elements might time out during the 
listing.

Thanks for your patience!

Best regards,
Jozsef

On Thu, 17 Aug 2017, Vishwanath Pai wrote:

> Simple testcase:
> 
> $ ipset create test hash:ip timeout 5
> $ ipset add test 1.2.3.4
> $ ipset add test 1.2.2.2
> $ sleep 5
> 
> $ ipset l
> Name: test
> Type: hash:ip
> Revision: 5
> Header: family inet hashsize 1024 maxelem 65536 timeout 5
> Size in memory: 296
> References: 0
> Number of entries: 2
> Members:
> 
> We return "Number of entries: 2" but no members are listed. That is
> because mtype_list runs "ip_set_timeout_expired" and does not list the
> expired entries, but set->elements is never upated (until mtype_gc
> cleans it up later).
> 
> Reviewed-by: Joshua Hunt 
> Signed-off-by: Vishwanath Pai 
> ---
>  net/netfilter/ipset/ip_set_hash_gen.h | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
> b/net/netfilter/ipset/ip_set_hash_gen.h
> index f236c0b..ff3d31c 100644
> --- a/net/netfilter/ipset/ip_set_hash_gen.h
> +++ b/net/netfilter/ipset/ip_set_hash_gen.h
> @@ -1041,12 +1041,22 @@ struct htype {
>  static int
>  mtype_head(struct ip_set *set, struct sk_buff *skb)
>  {
> - const struct htype *h = set->data;
> + struct htype *h = set->data;
>   const struct htable *t;
>   struct nlattr *nested;
>   size_t memsize;
>   u8 htable_bits;
>  
> + /* If any members have expired, set->elements will be wrong
> +  * mytype_expire function will update it with the right count.
> +  * we do not hold set->lock here, so grab it first.
> +  */
> + if (SET_WITH_TIMEOUT(set)) {
> + spin_lock_bh(&set->lock);
> + mtype_expire(set, h);
> + spin_unlock_bh(&set->lock);
> + }
> +
>   rcu_read_lock_bh();
>   t = rcu_dereference_bh_nfnl(h->table);
>   memsize = mtype_ahash_memsize(h, t) + set->ext_size;
> -- 
> 1.9.1
> 
> 

-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH] netfilter: ipset: ip_set_bitmap_ipmac: use swap macro in bitmap_ipmac_create

2017-10-30 Thread Jozsef Kadlecsik
Hi,

On Sat, 28 Oct 2017, Gustavo A. R. Silva wrote:

> Make use of the swap macro and remove unnecessary variable tmp.
> This makes the code easier to read and maintain.
> 
> This code was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva 

Please resubmit the tree patches as a single one, they do the same thing. 
Thanks!

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH] netfilter: ipset: use swap macro instead of _manually_ swapping values

2017-11-06 Thread Jozsef Kadlecsik
Hi,

On Mon, 30 Oct 2017, Gustavo A. R. Silva wrote:

> Make use of the swap macro and remove unnecessary variables tmp.
> This makes the code easier to read and maintain.
> 
> This code was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  net/netfilter/ipset/ip_set_bitmap_ip.c| 8 ++--
>  net/netfilter/ipset/ip_set_bitmap_ipmac.c | 8 ++--
>  net/netfilter/ipset/ip_set_bitmap_port.c  | 8 ++--
>  3 files changed, 6 insertions(+), 18 deletions(-)

Patch is applied in the ipset git tree and will be included in the next 
batch. Thanks!

Best regards,
Jozsef

> diff --git a/net/netfilter/ipset/ip_set_bitmap_ip.c 
> b/net/netfilter/ipset/ip_set_bitmap_ip.c
> index d8975a0..488d6d0 100644
> --- a/net/netfilter/ipset/ip_set_bitmap_ip.c
> +++ b/net/netfilter/ipset/ip_set_bitmap_ip.c
> @@ -263,12 +263,8 @@ bitmap_ip_create(struct net *net, struct ip_set *set, 
> struct nlattr *tb[],
>   ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP_TO], &last_ip);
>   if (ret)
>   return ret;
> - if (first_ip > last_ip) {
> - u32 tmp = first_ip;
> -
> - first_ip = last_ip;
> - last_ip = tmp;
> - }
> + if (first_ip > last_ip)
> + swap(first_ip, last_ip);
>   } else if (tb[IPSET_ATTR_CIDR]) {
>   u8 cidr = nla_get_u8(tb[IPSET_ATTR_CIDR]);
>  
> diff --git a/net/netfilter/ipset/ip_set_bitmap_ipmac.c 
> b/net/netfilter/ipset/ip_set_bitmap_ipmac.c
> index 4c279fb..c00b6a2 100644
> --- a/net/netfilter/ipset/ip_set_bitmap_ipmac.c
> +++ b/net/netfilter/ipset/ip_set_bitmap_ipmac.c
> @@ -337,12 +337,8 @@ bitmap_ipmac_create(struct net *net, struct ip_set *set, 
> struct nlattr *tb[],
>   ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP_TO], &last_ip);
>   if (ret)
>   return ret;
> - if (first_ip > last_ip) {
> - u32 tmp = first_ip;
> -
> - first_ip = last_ip;
> - last_ip = tmp;
> - }
> + if (first_ip > last_ip)
> + swap(first_ip, last_ip);
>   } else if (tb[IPSET_ATTR_CIDR]) {
>   u8 cidr = nla_get_u8(tb[IPSET_ATTR_CIDR]);
>  
> diff --git a/net/netfilter/ipset/ip_set_bitmap_port.c 
> b/net/netfilter/ipset/ip_set_bitmap_port.c
> index 7f9bbd7..b561ca8 100644
> --- a/net/netfilter/ipset/ip_set_bitmap_port.c
> +++ b/net/netfilter/ipset/ip_set_bitmap_port.c
> @@ -238,12 +238,8 @@ bitmap_port_create(struct net *net, struct ip_set *set, 
> struct nlattr *tb[],
>  
>   first_port = ip_set_get_h16(tb[IPSET_ATTR_PORT]);
>   last_port = ip_set_get_h16(tb[IPSET_ATTR_PORT_TO]);
> - if (first_port > last_port) {
> - u16 tmp = first_port;
> -
> - first_port = last_port;
> - last_port = tmp;
> - }
> + if (first_port > last_port)
> + swap(first_port, last_port);
>  
>   elements = last_port - first_port + 1;
>   set->dsize = ip_set_elem_len(set, tb, 0, 0);
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH v3 11/21] clusterip: exit_net cleanup check added

2017-11-06 Thread Jozsef Kadlecsik
Hello Vasily,

On Mon, 6 Nov 2017, Vasily Averin wrote:

> Be sure that configs list initialized in net_init hook was return
> to initial state.

What is the goal of the patch series you sent in the third version in a 
row?

- If the deinitializations are missing from the files, the patches 
  do not fix them, just emit warnings.
- If the deinitializations are not missing, the patches are totally 
  unnecessary.

It looks like debugging... but not expressed that way, neither in the 
subject lines nor in the patch descriptions.

Best regards,
Jozsef

> Signed-off-by: Vasily Averin 
> ---
>  net/ipv4/netfilter/ipt_CLUSTERIP.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c 
> b/net/ipv4/netfilter/ipt_CLUSTERIP.c
> index 17b4ca5..4364a88 100644
> --- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
> +++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
> @@ -819,6 +819,9 @@ static void clusterip_net_exit(struct net *net)
>   cn->procdir = NULL;
>  #endif
>   nf_unregister_net_hook(net, &cip_arp_ops);
> + WARN_ONCE(!list_empty(&cn->configs),
> +   "net %x %s: configs list is not empty\n",
> +   net->ns.inum, __func__);
>  }
>  
>  static struct pernet_operations clusterip_net_ops = {
> -- 
> 2.7.4
> 
> 

-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH nf-next] ipset: remove unused function __ip_set_get_netlink

2017-04-14 Thread Jozsef Kadlecsik
Hi Pablo,

On Fri, 14 Apr 2017, Pablo Neira Ayuso wrote:

> On Mon, Apr 10, 2017 at 03:52:37PM -0400, Aaron Conole wrote:
> > There are no in-tree callers.
> 
> @Jozsef, let me know if I should just take this to save you a pull
> request.

Just take it, thank you.

Acked-by: Jozsef Kadlecsik 

Best regards,
Jozsef

> > Signed-off-by: Aaron Conole 
> > ---
> >  net/netfilter/ipset/ip_set_core.c | 8 
> >  1 file changed, 8 deletions(-)
> > 
> > diff --git a/net/netfilter/ipset/ip_set_core.c 
> > b/net/netfilter/ipset/ip_set_core.c
> > index c296f9b..68ba531 100644
> > --- a/net/netfilter/ipset/ip_set_core.c
> > +++ b/net/netfilter/ipset/ip_set_core.c
> > @@ -501,14 +501,6 @@ __ip_set_put(struct ip_set *set)
> >   * a separate reference counter
> >   */
> >  static inline void
> > -__ip_set_get_netlink(struct ip_set *set)
> > -{
> > -   write_lock_bh(&ip_set_ref_lock);
> > -   set->ref_netlink++;
> > -   write_unlock_bh(&ip_set_ref_lock);
> > -}
> > -
> > -static inline void
> >  __ip_set_put_netlink(struct ip_set *set)
> >  {
> > write_lock_bh(&ip_set_ref_lock);
> > -- 
> > 2.9.3
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe netfilter-devel" 
> > in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH] netfilter: ipset: Null pointer exception in ipset list:set

2017-02-16 Thread Jozsef Kadlecsik
Hi,

On Wed, 15 Feb 2017, Vishwanath Pai wrote:

> If we use before/after to add an element to an empty list it will cause
> a kernel panic.
> 
> $> cat crash.restore
> create a hash:ip
> create b hash:ip
> create test list:set timeout 5 size 4
> add test b before a
> 
> $> ipset -R < crash.restore
> 
> Executing the above will crash the kernel.
> 
> Signed-off-by: Vishwanath Pai 
> Reviewed-by: Josh Hunt 
> ---
>  net/netfilter/ipset/ip_set_list_set.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/net/netfilter/ipset/ip_set_list_set.c 
> b/net/netfilter/ipset/ip_set_list_set.c
> index 51077c5..178d4eb 100644
> --- a/net/netfilter/ipset/ip_set_list_set.c
> +++ b/net/netfilter/ipset/ip_set_list_set.c
> @@ -260,11 +260,14 @@ struct list_set {
>   else
>   prev = e;
>   }
> +
> + /* If before/after is used on an empty set */
> + if ((d->before > 0 && !next) ||
> + (d->before < 0 && !prev))
> + return -IPSET_ERR_REF_EXIST;
> +
>   /* Re-add already existing element */
>   if (n) {
> - if ((d->before > 0 && !next) ||
> - (d->before < 0 && !prev))
> - return -IPSET_ERR_REF_EXIST;
>   if (!flag_exist)
>   return -IPSET_ERR_EXIST;
>   /* Update extensions */
> -- 
> 1.9.1

Patch is applied, thank you!

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.mta.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics, Hungarian Academy of Sciences
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH net 1/4] netfilter: xt_recent: Fix attempt to update deleted entry

2021-02-14 Thread Jozsef Kadlecsik
On Wed, 10 Feb 2021, Reindl Harald wrote:

> > > why is that still not part of 5.10.14 given how old that issue is 
> > > 
> > > https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.10.14
> > 
> > Probably we missed the window when patches were accepted for the new
> > release. That's all
> 
> probably something is broken in the whole process given that 5.10.15 still
> don't contain the fix while i am tired of a new "stable release" every few
> days and 5.10.x like every LTS release in the past few years has a peak of it
> 
> https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.10.15

The process is a multi-step one: netfilter patches are sent for 
reviewing/accepting/rejecting to the net maintaners, and after that they 
send the patches to the kernel source maintainers. A single patch is 
rarely picked out to handle differently.

The patch has entered the queue for the stable trees.

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register

2020-10-08 Thread Jozsef Kadlecsik
Hi Francesco,

On Thu, 8 Oct 2020, Francesco Ruggeri wrote:

> On Wed, Oct 7, 2020 at 12:32 PM Francesco Ruggeri  wrote:
> >
> > If the first packet conntrack sees after a re-register is an outgoing 
> > keepalive packet with no data (SEG.SEQ = SND.NXT-1), td_end is set to 
> > SND.NXT-1. When the peer correctly acknowledges SND.NXT, tcp_in_window 
> > fails check III (Upper bound for valid (s)ack: sack <= 
> > receiver.td_end) and returns false, which cascades into 
> > nf_conntrack_in setting skb->_nfct = 0 and in later conntrack iptables 
> > rules not matching. In cases where iptables are dropping packets that 
> > do not match conntrack rules this can result in idle tcp connections 
> > to time out.
> >
> > v2: adjust td_end when getting the reply rather than when sending out
> > the keepalive packet.
> >
> 
> Any comments?
> Here is a simple reproducer. The idea is to show that keepalive packets 
> in an idle tcp connection will be dropped (and the connection will time 
> out) if conntrack hooks are de-registered and then re-registered. The 
> reproducer has two files. client_server.py creates both ends of a tcp 
> connection, bounces a few packets back and forth, and then blocks on a 
> recv on the client side. The client's keepalive is configured to time 
> out in 20 seconds. This connection should not time out. test is a bash 
> script that creates a net namespace where it sets iptables rules for the 
> connection, starts client_server.py, and then clears and restores the 
> iptables rules (which causes conntrack hooks to be de-registered and 
> re-registered).

In my opinion an iptables restore should not cause conntrack hooks to be 
de-registered and re-registered, because important TCP initialization 
parameters cannot be "restored" later from the packets. Therefore the 
proper fix would be to prevent it to happen. Otherwise your patch looks OK 
to handle the case when conntrack is intentionally restarted.

Best regards,
Jozsef
 
>  file client_server.py
> #!/usr/bin/python
> 
> import socket
> 
> PORT=4446
> 
> # create server socket
> sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> sock.bind(('localhost', PORT))
> sock.listen(1)
> 
> # create client socket
> cl_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> cl_sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
> cl_sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 2)
> cl_sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 2)
> cl_sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 10)
> cl_sock.connect(('localhost', PORT))
> 
> srv_sock, _ = sock.accept()
> 
> # Bounce a packet back and forth a few times
> buf = ''
> for i in range(5):
>cl_sock.send(buf)
>buf = srv_sock.recv(100)
>srv_sock.send(buf)
>buf = cl_sock.recv(100)
>print buf
> 
> # idle the connection
> try:
>buf = cl_sock.recv(100)
> except socket.error, e:
>print "Error: %s" % e
> 
> sock.close()
> cl_sock.close()
> srv_sock.close()
> 
> == file test
> #!/bin/bash
> 
> ip netns add dummy
> ip netns exec dummy ip link set lo up
> echo "Created namespace"
> 
> ip netns exec dummy iptables-restore < *filter
> :INPUT DROP [0:0]
> :FORWARD ACCEPT [0:0]
> :OUTPUT ACCEPT [0:0]
> -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
> -A INPUT -p tcp -m tcp --dport 4446 -j ACCEPT
> COMMIT
> END
> echo "Installed iptables rules"
> 
> ip netns exec dummy ./client_server.py &
> echo "Created tcp connection"
> sleep 2
> 
> ip netns exec dummy iptables-restore << END
> *filter
> :INPUT ACCEPT [0:0]
> :FORWARD ACCEPT [0:0]
> :OUTPUT ACCEPT [0:0]
> COMMIT
> END
> echo "Cleared iptables rules"
> sleep 4
> 
> ip netns exec dummy iptables-restore << END
> *filter
> :INPUT DROP [0:0]
> :FORWARD ACCEPT [0:0]
> :OUTPUT ACCEPT [0:0]
> -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
> -A INPUT -p tcp -m tcp --dport 4446 -j ACCEPT
> COMMIT
> END
> echo "Restored original iptables rules"
> 
> wait
> ip netns del dummy
> exit 0
> 

-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register

2020-10-09 Thread Jozsef Kadlecsik
Hi Florian,

On Fri, 9 Oct 2020, Florian Westphal wrote:

> Jozsef Kadlecsik  wrote:
> > > The reproducer has two files. client_server.py creates both ends of 
> > > a tcp connection, bounces a few packets back and forth, and then 
> > > blocks on a recv on the client side. The client's keepalive is 
> > > configured to time out in 20 seconds. This connection should not 
> > > time out. test is a bash script that creates a net namespace where 
> > > it sets iptables rules for the connection, starts client_server.py, 
> > > and then clears and restores the iptables rules (which causes 
> > > conntrack hooks to be de-registered and re-registered).
> > 
> > In my opinion an iptables restore should not cause conntrack hooks to be 
> > de-registered and re-registered, because important TCP initialization 
> > parameters cannot be "restored" later from the packets. Therefore the 
> > proper fix would be to prevent it to happen. Otherwise your patch looks OK 
> > to handle the case when conntrack is intentionally restarted.
> 
> The repro clears all rules, waits 4 seconds, then restores the ruleset. 
> using iptables-restore < FOO; sleep 4; iptables-restore < FOO will not 
> result in any unregister ops.
>
> We could make kernel defer unregister via some work queue but i don't
> see what this would help/accomplish (and its questionable of how long it
> should wait).

Sorry, I can't put together the two paragraphs above: in the first you 
wrote that no (hook) unregister-register happens and in the second one 
that those could be derefed.

> We could disallow unregister, but that seems silly (forces reboot...).
> 
> I think the patch is fine.

The patch is fine, but why the packets are handled by conntrack (after the 
first restore and during the 4s sleep? And then again after the second 
restore?) as if all conntrack entries were removed?
 
Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH nf v2] netfilter: conntrack: connection timeout after re-register

2020-10-09 Thread Jozsef Kadlecsik
On Fri, 9 Oct 2020, Florian Westphal wrote:

> Matches/targets that need conntrack increment a refcount. So, when all 
> rules are flushed, refcount goes down to 0 and conntrack is disabled 
> because the hooks get removed..
> 
> Just doing iptables-restore doesn't unregister as long as both the old
> and new rulesets need conntrack.
> 
> The "delay unregister" remark was wrt. the "all rules were deleted"
> case, i.e. add a "grace period" rather than acting right away when
> conntrack use count did hit 0.

Now I understand it, thanks really. The hooks are removed, so conntrack 
cannot "see" the packets and the entries become stale. 

What is the rationale behind "remove the conntrack hooks when there are no 
rule left referring to conntrack"? Performance optimization? But then the 
content of the whole conntrack table could be deleted too... ;-)
 
> Conntrack entries are not removed, only the base hooks get unregistered. 
> This is a problem for tcp window tracking.
> 
> When re-register occurs, kernel is supposed to switch the existing 
> entries to "loose" mode so window tracking won't flag packets as 
> invalid, but apparently this isn't enough to handle keepalive case.

"loose" (nf_ct_tcp_loose) mode doesn't disable window tracking, it 
enables/disables picking up already established connections. 

nf_ct_tcp_be_liberal would disable TCP window checking (but not tracking) 
for non RST packets.

But both seems to be modified only via the proc entries.

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH] netfiler: ipset: fix unaligned atomic access

2020-06-21 Thread Jozsef Kadlecsik
Hi,

On Sun, 21 Jun 2020, Pablo Neira Ayuso wrote:

> I'll place in this in nf.git unless you have any objection.

No objection at all and thanks!

Acked-by: Jozsef Kadlecsik 

Best regards,
Jozsef

> On Sun, Jun 21, 2020 at 08:45:14PM +0100, Russell King - ARM Linux admin 
> wrote:
> > Gentle ping...
> > 
> > This patch fixes a remotely triggerable kernel oops, and as such can
> > be viewed as a remote denial of service attack depending on the
> > netfilter rule setup.
> > 
> > On Wed, Jun 10, 2020 at 09:51:11PM +0100, Russell King wrote:
> > > When using ip_set with counters and comment, traffic causes the kernel
> > > to panic on 32-bit ARM:
> > > 
> > > Alignment trap: not handling instruction e1b82f9f at []
> > > Unhandled fault: alignment exception (0x221) at 0xea08133c
> > > PC is at ip_set_match_extensions+0xe0/0x224 [ip_set]
> > > 
> > > The problem occurs when we try to update the 64-bit counters - the
> > > faulting address above is not 64-bit aligned.  The problem occurs
> > > due to the way elements are allocated, for example:
> > > 
> > >   set->dsize = ip_set_elem_len(set, tb, 0, 0);
> > >   map = ip_set_alloc(sizeof(*map) + elements * set->dsize);
> > > 
> > > If the element has a requirement for a member to be 64-bit aligned,
> > > and set->dsize is not a multiple of 8, but is a multiple of four,
> > > then every odd numbered elements will be misaligned - and hitting
> > > an atomic64_add() on that element will cause the kernel to panic.
> > > 
> > > ip_set_elem_len() must return a size that is rounded to the maximum
> > > alignment of any extension field stored in the element.  This change
> > > ensures that is the case.
> > > 
> > > Signed-off-by: Russell King 
> > > ---
> > > Patch against v5.6, where I tripped over this bug.  This bug has
> > > caused a kernel panic on my new router twice today.
> > > 
> > >  net/netfilter/ipset/ip_set_core.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/net/netfilter/ipset/ip_set_core.c 
> > > b/net/netfilter/ipset/ip_set_core.c
> > > index 8dd17589217d..be9cd6a500fb 100644
> > > --- a/net/netfilter/ipset/ip_set_core.c
> > > +++ b/net/netfilter/ipset/ip_set_core.c
> > > @@ -459,6 +459,8 @@ ip_set_elem_len(struct ip_set *set, struct nlattr 
> > > *tb[], size_t len,
> > >   for (id = 0; id < IPSET_EXT_ID_MAX; id++) {
> > >   if (!add_extension(id, cadt_flags, tb))
> > >   continue;
> > > + if (align < ip_set_extensions[id].align)
> > > + align = ip_set_extensions[id].align;
> > >   len = ALIGN(len, ip_set_extensions[id].align);
> > >   set->offset[id] = len;
> > >   set->extensions |= ip_set_extensions[id].type;
> > > -- 
> > > 2.20.1
> > > 
> > > 
> > 
> > -- 
> > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
> 

-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH net-next v1 1/8] netfilter: inlined four headers files into another one.

2019-08-08 Thread Jozsef Kadlecsik
Hi Jeremy,

On Wed, 7 Aug 2019, Jeremy Sowden wrote:

> linux/netfilter/ipset/ip_set.h included four other header files:
> 
>   include/linux/netfilter/ipset/ip_set_comment.h
>   include/linux/netfilter/ipset/ip_set_counter.h
>   include/linux/netfilter/ipset/ip_set_skbinfo.h
>   include/linux/netfilter/ipset/ip_set_timeout.h
> 
> Of these the first three were not included anywhere else.  The last,
> ip_set_timeout.h, was included in a couple of other places, but defined
> inline functions which call other inline functions defined in ip_set.h,
> so ip_set.h had to be included before it.
> 
> Inlined all four into ip_set.h, and updated the other files that
> included ip_set_timeout.h.
> 
> Signed-off-by: Jeremy Sowden 

Acked-by: Jozsef Kadlecsik 

Also, for the ipset part of patch 2/8, thank!

Best regards,
Jozsef

> ---
>  include/linux/netfilter/ipset/ip_set.h| 238 +-
>  .../linux/netfilter/ipset/ip_set_comment.h|  73 --
>  .../linux/netfilter/ipset/ip_set_counter.h|  84 ---
>  .../linux/netfilter/ipset/ip_set_skbinfo.h|  42 
>  .../linux/netfilter/ipset/ip_set_timeout.h|  77 --
>  net/netfilter/ipset/ip_set_hash_gen.h |   2 +-
>  net/netfilter/xt_set.c|   1 -
>  7 files changed, 235 insertions(+), 282 deletions(-)
>  delete mode 100644 include/linux/netfilter/ipset/ip_set_comment.h
>  delete mode 100644 include/linux/netfilter/ipset/ip_set_counter.h
>  delete mode 100644 include/linux/netfilter/ipset/ip_set_skbinfo.h
>  delete mode 100644 include/linux/netfilter/ipset/ip_set_timeout.h
> 
> diff --git a/include/linux/netfilter/ipset/ip_set.h 
> b/include/linux/netfilter/ipset/ip_set.h
> index 12ad9b1853b4..9bc255a8461b 100644
> --- a/include/linux/netfilter/ipset/ip_set.h
> +++ b/include/linux/netfilter/ipset/ip_set.h
> @@ -452,10 +452,240 @@ bitmap_bytes(u32 a, u32 b)
>   return 4 * b - a + 8) / 8) + 3) / 4);
>  }
>  
> -#include 
> -#include 
> -#include 
> -#include 
> +/* How often should the gc be run by default */
> +#define IPSET_GC_TIME(3 * 60)
> +
> +/* Timeout period depending on the timeout value of the given set */
> +#define IPSET_GC_PERIOD(timeout) \
> + ((timeout/3) ? min_t(u32, (timeout)/3, IPSET_GC_TIME) : 1)
> +
> +/* Entry is set with no timeout value */
> +#define IPSET_ELEM_PERMANENT 0
> +
> +/* Set is defined with timeout support: timeout value may be 0 */
> +#define IPSET_NO_TIMEOUT UINT_MAX
> +
> +/* Max timeout value, see msecs_to_jiffies() in jiffies.h */
> +#define IPSET_MAX_TIMEOUT(UINT_MAX >> 1)/MSEC_PER_SEC
> +
> +#define ip_set_adt_opt_timeout(opt, set) \
> +((opt)->ext.timeout != IPSET_NO_TIMEOUT ? (opt)->ext.timeout : 
> (set)->timeout)
> +
> +static inline unsigned int
> +ip_set_timeout_uget(struct nlattr *tb)
> +{
> + unsigned int timeout = ip_set_get_h32(tb);
> +
> + /* Normalize to fit into jiffies */
> + if (timeout > IPSET_MAX_TIMEOUT)
> + timeout = IPSET_MAX_TIMEOUT;
> +
> + return timeout;
> +}
> +
> +static inline bool
> +ip_set_timeout_expired(const unsigned long *t)
> +{
> + return *t != IPSET_ELEM_PERMANENT && time_is_before_jiffies(*t);
> +}
> +
> +static inline void
> +ip_set_timeout_set(unsigned long *timeout, u32 value)
> +{
> + unsigned long t;
> +
> + if (!value) {
> + *timeout = IPSET_ELEM_PERMANENT;
> + return;
> + }
> +
> + t = msecs_to_jiffies(value * MSEC_PER_SEC) + jiffies;
> + if (t == IPSET_ELEM_PERMANENT)
> + /* Bingo! :-) */
> + t--;
> + *timeout = t;
> +}
> +
> +static inline u32
> +ip_set_timeout_get(const unsigned long *timeout)
> +{
> + u32 t;
> +
> + if (*timeout == IPSET_ELEM_PERMANENT)
> + return 0;
> +
> + t = jiffies_to_msecs(*timeout - jiffies)/MSEC_PER_SEC;
> + /* Zero value in userspace means no timeout */
> + return t == 0 ? 1 : t;
> +}
> +
> +static inline char*
> +ip_set_comment_uget(struct nlattr *tb)
> +{
> + return nla_data(tb);
> +}
> +
> +/* Called from uadd only, protected by the set spinlock.
> + * The kadt functions don't use the comment extensions in any way.
> + */
> +static inline void
> +ip_set_init_comment(struct ip_set *set, struct ip_set_comment *comment,
> + const struct ip_set_ext *ext)
> +{
> + struct ip_set_comment_rcu *c = rcu_dereference_protected(comment->c, 1);
> + size_t len = ext->comment ? strlen(ext->comment) : 0;
> +
> + if (unlikely(c)) {
> + set->ext_size -= sizeof

Re: Fw: [Fwd: [Bug 5644] New: NFS v3 TCP 3-way handshake incorrect, iptables blocks access]

2005-11-25 Thread Jozsef Kadlecsik
On Thu, 24 Nov 2005, Olaf Kirch wrote:

> On Thu, Nov 24, 2005 at 03:08:27PM +0100, Harald Welte wrote:
> > Jozsef Kadlecsik doesn't recall those patches/changes (even though he's
> > our "Mr. TCP state tracking" and is indicated as the author of one of
> > the two patches.
> >
> > I also don't recall having seen any of those patches before.  But that
> > doesn't mean all too much, my brain is like a sieve some times.
>
> Those patches came out of a discussion on netfilter-devel.  Sorry,
> I don't know exactly when but looking at our CVS log it was Dec 2004.

Yes, it was about a year ago - finally I could dig out the patches from my
mail archives. I'll prepare an updated version on the weekend and send it
out to netfilter-devel for reviewing.

Best regards,
Jozsef
-
E-mail  : [EMAIL PROTECTED], [EMAIL PROTECTED]
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
  H-1525 Budapest 114, POB. 49, Hungary
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Fw: [Fwd: [Bug 5644] New: NFS v3 TCP 3-way handshake incorrect, iptables blocks access]

2005-11-29 Thread Jozsef Kadlecsik
Hi,

On Fri, 25 Nov 2005, Jozsef Kadlecsik wrote:

> On Thu, 24 Nov 2005, Olaf Kirch wrote:
>
> > On Thu, Nov 24, 2005 at 03:08:27PM +0100, Harald Welte wrote:
> > > Jozsef Kadlecsik doesn't recall those patches/changes (even though he's
> > > our "Mr. TCP state tracking" and is indicated as the author of one of
> > > the two patches.
> > >
> > > I also don't recall having seen any of those patches before.  But that
> > > doesn't mean all too much, my brain is like a sieve some times.
> >
> > Those patches came out of a discussion on netfilter-devel.  Sorry,
> > I don't know exactly when but looking at our CVS log it was Dec 2004.
>
> Yes, it was about a year ago - finally I could dig out the patches from my
> mail archives. I'll prepare an updated version on the weekend and send it
> out to netfilter-devel for reviewing.

Attached is the updated patch. Unfortunately nfsim currently is broken so
I could not test it against the testsuite, but actually it's identical
with the original, not applied patches. The patch takes care both
ip_conntrack and nf_conntrack.

Best regards,
Jozsef
-
E-mail  : [EMAIL PROTECTED], [EMAIL PROTECTED]
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
  H-1525 Budapest 114, POB. 49, HungaryMounting NFS file systems after a (warm) reboot could take a long time if
firewalling and connection tracking was enabled.

The reason is that the NFS clients tends to use the same ports (800 and
counting down). Now on reboot, the server would still have a TCB for an
existing TCP connection client:800 -> server:2049. The client sends a
SYN from port 800 to server:2049, which elicits an ACK from the server.
The firewall on the client drops the ACK because (from its point of
view) the connection is still in half-open state, and it expects to see
a SYNACK.

The client will eventually time out after several minutes.

The following patch corrects this, by accepting ACKs on half open connections
as well.

Signed-off-by: Jozsef Kadlecsik <[EMAIL PROTECTED]>

diff -urN --exclude-from=/usr/src/diff.exclude 
linux-2.6.15-rc2-orig/net/ipv4/netfilter/ip_conntrack_proto_tcp.c 
linux-2.6.15-rc2-tcp-win/net/ipv4/netfilter/ip_conntrack_proto_tcp.c
--- linux-2.6.15-rc2-orig/net/ipv4/netfilter/ip_conntrack_proto_tcp.c   
2005-11-20 04:25:03.0 +0100
+++ linux-2.6.15-rc2-tcp-win/net/ipv4/netfilter/ip_conntrack_proto_tcp.c
2005-11-29 11:01:25.0 +0100
@@ -272,9 +272,9 @@
  * sCL -> sCL
  */
 /*  sNO, sSS, sSR, sES, sFW, sCW, sLA, sTW, sCL, sLI   */
-/*ack*/   { sIV, sIV, sSR, sES, sCW, sCW, sTW, sTW, sCL, sIV },
+/*ack*/   { sIV, sIG, sSR, sES, sCW, sCW, sTW, sTW, sCL, sIV },
 /*
- * sSS -> sIV  Might be a half-open connection.
+ * sSS -> sIG  Might be a half-open connection.
  * sSR -> sSR  Might answer late resent SYN.
  * sES -> sES  :-)
  * sFW -> sCW  Normal close request answered by ACK.
@@ -917,8 +917,12 @@
 
switch (new_state) {
case TCP_CONNTRACK_IGNORE:
-   /* Either SYN in ORIGINAL
-* or SYN/ACK in REPLY. */
+   /* Ignored packets:
+* 
+* a) SYN in ORIGINAL
+* b) SYN/ACK in REPLY
+* c) ACK in reply direction after initial SYN in original.
+*/
if (index == TCP_SYNACK_SET
&& conntrack->proto.tcp.last_index == TCP_SYN_SET
&& conntrack->proto.tcp.last_dir != dir
@@ -985,13 +989,20 @@
}
case TCP_CONNTRACK_CLOSE:
if (index == TCP_RST_SET
-   && test_bit(IPS_SEEN_REPLY_BIT, &conntrack->status)
-   && conntrack->proto.tcp.last_index == TCP_SYN_SET
+   && ((test_bit(IPS_SEEN_REPLY_BIT, &conntrack->status)
+&& conntrack->proto.tcp.last_index == TCP_SYN_SET)
+   || (!test_bit(IPS_ASSURED_BIT, &conntrack->status)
+   && conntrack->proto.tcp.last_index == TCP_ACK_SET))
&& ntohl(th->ack_seq) == conntrack->proto.tcp.last_end) {
-   /* RST sent to invalid SYN we had let trough
-* SYN was in window then, tear down connection.
+   /* RST sent to invalid SYN or ACK we had let trough
+* at a) and c) above:
+*
+* a) SYN was in window then
+* c) we hold a half-open connection.
+*
+* Delete our connection entry.
 

Re: [PATCH net 1/4] netfilter: xt_recent: Fix attempt to update deleted entry

2021-02-05 Thread Jozsef Kadlecsik
Hi Harald,

On Fri, 5 Feb 2021, Reindl Harald wrote:

> "Reap only entries which won't be updated" sounds for me like the could 
> be some optimization: i mean when you first update and then check what 
> can be reaped the recently updated entry would not match to begin with

When the entry is new and the given recent table is full we cannot update 
(add) it, unless old entries are deleted (reaped) first. So it'd require 
more additional checkings to be introduced to reverse the order of the two 
operations.

Best regards,
Jozsef
 
> Am 05.02.21 um 01:17 schrieb Pablo Neira Ayuso:
> > From: Jozsef Kadlecsik 
> > 
> > When both --reap and --update flag are specified, there's a code
> > path at which the entry to be updated is reaped beforehand,
> > which then leads to kernel crash. Reap only entries which won't be
> > updated.
> > 
> > Fixes kernel bugzilla #207773.
> > 
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=207773
> > Reported-by: Reindl Harald 
> > Fixes: 0079c5aee348 ("netfilter: xt_recent: add an entry reaper")
> > Signed-off-by: Jozsef Kadlecsik 
> > Signed-off-by: Pablo Neira Ayuso 
> > ---
> >   net/netfilter/xt_recent.c | 12 ++--
> >   1 file changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/netfilter/xt_recent.c b/net/netfilter/xt_recent.c
> > index 606411869698..0446307516cd 100644
> > --- a/net/netfilter/xt_recent.c
> > +++ b/net/netfilter/xt_recent.c
> > @@ -152,7 +152,8 @@ static void recent_entry_remove(struct recent_table *t,
> > struct recent_entry *e)
> >   /*
> >* Drop entries with timestamps older then 'time'.
> >*/
> > -static void recent_entry_reap(struct recent_table *t, unsigned long time)
> > +static void recent_entry_reap(struct recent_table *t, unsigned long time,
> > + struct recent_entry *working, bool update)
> >   {
> > struct recent_entry *e;
> >   @@ -161,6 +162,12 @@ static void recent_entry_reap(struct recent_table *t,
> > unsigned long time)
> >  */
> > e = list_entry(t->lru_list.next, struct recent_entry, lru_list);
> >   + /*
> > +* Do not reap the entry which are going to be updated.
> > +*/
> > +   if (e == working && update)
> > +   return;
> > +
> > /*
> >  * The last time stamp is the most recent.
> >  */
> > @@ -303,7 +310,8 @@ recent_mt(const struct sk_buff *skb, struct
> > xt_action_param *par)
> > /* info->seconds must be non-zero */
> > if (info->check_set & XT_RECENT_REAP)
> > -   recent_entry_reap(t, time);
> > +   recent_entry_reap(t, time, e,
> > +   info->check_set & XT_RECENT_UPDATE && ret);
> > }
> > if (info->check_set & XT_RECENT_SET ||
> 

-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH net 1/4] netfilter: xt_recent: Fix attempt to update deleted entry

2021-02-07 Thread Jozsef Kadlecsik
On Fri, 5 Feb 2021, Reindl Harald wrote:

> what makes me thinking about the ones without --reap - how is it 
> handeled in that case, i mean there must be some LRU logic present 
> anyways given that --reap is not enabled by default (otherwise that bug 
> would not have hitted me so long randomly)

Yes, checking the code I was wrong: when the recent table is full, the 
oldest entry is automatically removed to make space for the new one.

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH net 1/4] netfilter: xt_recent: Fix attempt to update deleted entry

2021-02-07 Thread Jozsef Kadlecsik
On Sun, 7 Feb 2021, Reindl Harald wrote:

> > well, the most important thing is that the firewall-vm stops to 
> > kernel-panic
> 
> why is that still not part of 5.10.14 given how old that issue is :-(
> 
> https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.10.14

Probably we missed the window when patches were accepted for the new 
release. That's all.

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
  H-1525 Budapest 114, POB. 49, Hungary


Re: [PATCH net] netfilter: ipset: prevent uninit-value in hash_ip6_add

2020-11-19 Thread Jozsef Kadlecsik
39/0xbd0 net/netfilter/ipset/ip_set_hash_ip.c:255
>  call_ad+0x329/0xd00 net/netfilter/ipset/ip_set_core.c:1720
>  ip_set_ad+0x111f/0x1440 net/netfilter/ipset/ip_set_core.c:1808
>  ip_set_uadd+0xf6/0x110 net/netfilter/ipset/ip_set_core.c:1833
>  nfnetlink_rcv_msg+0xc7d/0xdf0 net/netfilter/nfnetlink.c:252
>  netlink_rcv_skb+0x70a/0x820 net/netlink/af_netlink.c:2494
>  nfnetlink_rcv+0x4f0/0x4380 net/netfilter/nfnetlink.c:600
>  netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
>  netlink_unicast+0x11da/0x14b0 net/netlink/af_netlink.c:1330
>  netlink_sendmsg+0x173c/0x1840 net/netlink/af_netlink.c:1919
>  sock_sendmsg_nosec net/socket.c:651 [inline]
>  sock_sendmsg net/socket.c:671 [inline]
>  sys_sendmsg+0xc7a/0x1240 net/socket.c:2353
>  ___sys_sendmsg net/socket.c:2407 [inline]
>  __sys_sendmsg+0x6d5/0x830 net/socket.c:2440
>  __do_sys_sendmsg net/socket.c:2449 [inline]
>  __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
>  __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
>  do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Uninit was created at:
>  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:121 [inline]
>  kmsan_internal_poison_shadow+0x5c/0xf0 mm/kmsan/kmsan.c:104
>  kmsan_slab_alloc+0x8d/0xe0 mm/kmsan/kmsan_hooks.c:76
>  slab_alloc_node mm/slub.c:2906 [inline]
>  __kmalloc_node_track_caller+0xc61/0x15f0 mm/slub.c:4512
>  __kmalloc_reserve net/core/skbuff.c:142 [inline]
>  __alloc_skb+0x309/0xae0 net/core/skbuff.c:210
>  alloc_skb include/linux/skbuff.h:1094 [inline]
>  netlink_alloc_large_skb net/netlink/af_netlink.c:1176 [inline]
>  netlink_sendmsg+0xdb8/0x1840 net/netlink/af_netlink.c:1894
>  sock_sendmsg_nosec net/socket.c:651 [inline]
>  sock_sendmsg net/socket.c:671 [inline]
>  sys_sendmsg+0xc7a/0x1240 net/socket.c:2353
>  ___sys_sendmsg net/socket.c:2407 [inline]
>  __sys_sendmsg+0x6d5/0x830 net/socket.c:2440
>  __do_sys_sendmsg net/socket.c:2449 [inline]
>  __se_sys_sendmsg+0x97/0xb0 net/socket.c:2447
>  __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2447
>  do_syscall_64+0x9f/0x140 arch/x86/entry/common.c:48
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Fixes: a7b4f989a629 ("netfilter: ipset: IP set core support")
> Signed-off-by: Eric Dumazet 
> Reported-by: syzbot 
> ---
>  net/netfilter/ipset/ip_set_core.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/ipset/ip_set_core.c 
> b/net/netfilter/ipset/ip_set_core.c
> index 
> c7eaa3776238d6d1da28dee0da21306d418ee9fd..89009c82a6b2408f637a2716e73b54eea3bafd0f
>  100644
> --- a/net/netfilter/ipset/ip_set_core.c
> +++ b/net/netfilter/ipset/ip_set_core.c
> @@ -271,8 +271,7 @@ flag_nested(const struct nlattr *nla)
>  
>  static const struct nla_policy ipaddr_policy[IPSET_ATTR_IPADDR_MAX + 1] = {
>   [IPSET_ATTR_IPADDR_IPV4]= { .type = NLA_U32 },
> - [IPSET_ATTR_IPADDR_IPV6]= { .type = NLA_BINARY,
> - .len = sizeof(struct in6_addr) },
> + [IPSET_ATTR_IPADDR_IPV6]= NLA_POLICY_EXACT_LEN(sizeof(struct 
> in6_addr)),
>  };
>  
>  int
> -- 

Thanks! In the backward compatibility layer in the ipset package I'm going 
to change the type to NLA_UNSPEC, so the minimal length is ensured.

Acked-by: Jozsef Kadlecsik 

Best regards,
Jozsef
-
E-mail  : kad...@blackhole.kfki.hu, kadlecsik.joz...@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
  H-1525 Budapest 114, POB. 49, Hungary