Re: It's back! (Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() ))
On Thu, 30 Jun 2016 16:07:26 -0400 Steven Rostedt wrote: > I can reproduce this by having the client unmount and remount the > directory. It gets even more interesting. When I unmount the directory, the hidden port does not go away. It is still there. But if I mount it again, it goes away (until it times out again). Even more info: When I first mount it, it creates 3 sockets, where one immediately is closed: tcp0 0 192.168.23.9:892192.168.23.22:44672 TIME_WAIT - tcp0 0 192.168.23.9:2049 192.168.23.22:815 ESTABLISHED - tcp0 0 192.168.23.9:754192.168.23.22:44672 ESTABLISHED - (192.168.23.22 is the machine remotely mounting a directory from the server 192.168.23.9) The trace of port 892 is this: kworker/u32:1-13473 [000] 4093.915114: xs_setup_tcp: RPC: set up xprt to 192.168.23.22 (port 44672) via tcp kworker/u32:1-13473 [000] 4093.915122: xprt_create_transport: RPC: created transport 8803b1c38000 with 65536 slots kworker/0:1H-129 [000] 4093.915152: xprt_alloc_slot: RPC:47 reserved req 88040b27ca00 xid c50ccaff kworker/0:1H-129 [000] 4093.915157: xprt_connect: RPC:47 xprt_connect xprt 8803b1c38000 is not connected kworker/0:1H-129 [000] 4093.915159: xs_connect: RPC: xs_connect scheduled xprt 8803b1c38000 kworker/0:1H-129 [000] ..s. 4093.915170: inet_csk_get_port: snum 892 kworker/0:1H-129 [000] ..s. 4093.915177: => sched_clock => inet_addr_type_table => security_capable => inet_bind => xs_bind => release_sock => sock_setsockopt => __sock_create => xs_create_sock.isra.19 => xs_tcp_setup_socket => process_one_work => worker_thread => worker_thread => kthread => ret_from_fork => kthread kworker/0:1H-129 [000] ..s. 4093.915178: inet_bind_hash: add 892 8803bb9b5cc0 kworker/0:1H-129 [000] ..s. 4093.915184: => inet_csk_get_port => sched_clock => inet_addr_type_table => security_capable => inet_bind => xs_bind => release_sock => sock_setsockopt => __sock_create => xs_create_sock.isra.19 => xs_tcp_setup_socket => process_one_work => worker_thread => worker_thread => kthread => ret_from_fork => kthread kworker/0:1H-129 [000] 4093.915185: xs_bind: RPC: xs_bind 4.136.255.255:892: ok (0) kworker/0:1H-129 [000] 4093.915186: xs_tcp_setup_socket: RPC: worker connecting xprt 8803b1c38000 via tcp to 192.168.23.22 (port 44672) kworker/0:1H-129 [000] 4093.915221: xs_tcp_setup_socket: RPC: 8803b1c38000 connect status 115 connected 0 sock state 2 -0 [003] ..s. 4093.915434: xs_tcp_state_change: RPC: xs_tcp_state_change client 8803b1c38000... -0 [003] ..s. 4093.915435: xs_tcp_state_change: RPC: state 1 conn 0 dead 0 zapped 1 sk_shutdown 0 kworker/3:1H-145 [003] 4093.915558: xprt_connect_status: RPC:47 xprt_connect_status: retrying kworker/3:1H-145 [003] 4093.915560: xprt_prepare_transmit: RPC: 47 xprt_prepare_transmit kworker/3:1H-145 [003] 4093.915562: xprt_transmit: RPC:47 xprt_transmit(72) kworker/3:1H-145 [003] 4093.915588: xs_tcp_send_request: RPC: xs_tcp_send_request(72) = 0 kworker/3:1H-145 [003] 4093.915589: xprt_transmit: RPC:47 xmit complete -0 [003] ..s. 4093.915969: xs_tcp_data_ready: RPC: xs_tcp_data_ready... kworker/3:1H-145 [003] 4093.916081: xs_tcp_data_recv: RPC: xs_tcp_data_recv started kworker/3:1H-145 [003] 4093.916083: xs_tcp_data_recv: RPC: reading TCP record fragment of length 24 kworker/3:1H-145 [003] 4093.916084: xs_tcp_data_recv: RPC: reading XID (4 bytes) kworker/3:1H-145 [003] 4093.916085: xs_tcp_data_recv: RPC: reading request with XID c50ccaff kworker/3:1H-145 [003] 4093.916086: xs_tcp_data_recv: RPC: reading CALL/REPLY flag (4 bytes) kworker/3:1H-145 [003] 4093.916087: xs_tcp_data_recv: RPC: read reply XID c50ccaff kworker/3:1H-145 [003] ..s. 4093.916088: xs_tcp_data_recv: RPC: XID c50ccaff read 16 bytes kworker/3:1H-145 [003] ..s. 4093.916089: xs_tcp_data_recv: RPC: xprt = 8803b1c38000, tcp_copied = 24, tcp_offset = 24, tcp_reclen = 24 kworker/3:1H-145 [003] ..s. 4093.916090: xprt_complete_rqst: RPC:47 xid c50ccaff complete (24 bytes received) kworker/3:1H-145 [003] 4093.916091: xs_tcp_data_recv: RPC: xs_tcp_data_recv done kworker/3:1H-145 [003] 4093.916098: xprt_release: RPC:47 release request 88040b27ca00 kworker/u32:1-13473 [002] 4093.976056: xprt_destroy: RPC: destroying transport 8803b1c38000 kworker/u32:1-13473 [002] 4093.976068: xs_destroy: RPC:
Re: It's back! (Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() ))
On Thu, 30 Jun 2016 18:30:42 + Trond Myklebust wrote: > Wait. So the NFS mount is still active, it’s just that the socket > disconnected due to no traffic? That should be OK. Granted that the > port can’t be reused by another process, but you really don’t want > that: what if there are no other ports available and you start > writing to a file on the NFS partition? What would cause the port to be connected to a socket again? I copied a large file to the nfs mount, and the hidden port is still there? Remember, this wasn't always the case, the hidden port is a recent issue. I ran wireshark on this and it appears to create two ports for NFS. One of them is canceled by the client (sends a FIN/ACK) and this port is what lays around never to be used again, and uses the other port for all connections after that. When I unmount the NFS directory, the port is finally freed (but has no socket attached to it). What is the purpose of keeping this port around? I can reproduce this by having the client unmount and remount the directory. -- Steve
Re: It's back! (Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() ))
On Thu, 30 Jun 2016 11:23:41 -0400 Steven Rostedt wrote: > I can add more trace_printk()s if it would help. I added a trace_printk() in inet_bind_bucket_destroy() to print out some information on the socket used by xs_bind(), and it shows that the bind destroy is called, but the list is not empty. /* * Caller must hold hashbucket lock for this tb with local BH disabled */ void inet_bind_bucket_destroy(struct kmem_cache *cachep, struct inet_bind_bucket *tb) { if (!current->mm && xs_port == tb->port) { trace_printk("destroy %d empty=%d %p\n", tb->port, hlist_empty(&tb->owners), tb); trace_dump_stack(1); } if (hlist_empty(&tb->owners)) { __hlist_del(&tb->node); kmem_cache_free(cachep, tb); } } I created "xs_port" to hold the port of the variable used by xs_bind, and when it is called, the hlist_empty(&tb->owners) returns false. I'll add more trace_printks to find out where those owners are being added. -- Steve
Re: It's back! (Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() ))
On Thu, 30 Jun 2016 13:17:47 + Trond Myklebust wrote: > > On Jun 30, 2016, at 08:59, Steven Rostedt wrote: > > > > [ resending as a new email, as I'm assuming people do not sort their > > INBOX via last email on thread, thus my last email is sitting in the > > bottom of everyone's INBOX ] > > > > I've hit this again. Not sure when it started, but I applied my old > > debug trace_printk() patch (attached) and rebooted (4.5.7). I just > > tested the latest kernel from Linus's tree (from last nights pull), and > > it still gives me the problem. > > > > Here's the trace I have: > > > >kworker/3:1H-134 [003] ..s.61.036129: inet_csk_get_port: snum 805 Here's were the port is taken > >kworker/3:1H-134 [003] ..s.61.036135: > > => sched_clock > > => inet_addr_type_table > > => security_capable > > => inet_bind > > => xs_bind > > => release_sock > > => sock_setsockopt > > => __sock_create > > => xs_create_sock.isra.19 > > => xs_tcp_setup_socket > > => process_one_work > > => worker_thread > > => worker_thread > > => kthread > > => ret_from_fork > > => kthread > >kworker/3:1H-134 [003] ..s.61.036136: inet_bind_hash: add 805 > >kworker/3:1H-134 [003] ..s.61.036138: > > => inet_csk_get_port > > => sched_clock > > => inet_addr_type_table > > => security_capable > > => inet_bind > > => xs_bind > > => release_sock > > => sock_setsockopt > > => __sock_create > > => xs_create_sock.isra.19 > > => xs_tcp_setup_socket > > => process_one_work > > => worker_thread > > => worker_thread > > => kthread > > => ret_from_fork > > => kthread > >kworker/3:1H-134 [003] 61.036139: xs_bind: RPC: xs_bind > > 4.136.255.255:805: ok (0) Here's where it is bounded. > >kworker/3:1H-134 [003] 61.036140: xs_tcp_setup_socket: RPC: > >worker connecting xprt 880407eca800 via tcp to 192.168.23.22 (port > > 43651) > >kworker/3:1H-134 [003] 61.036162: xs_tcp_setup_socket: RPC: > >880407eca800 connect status 115 connected 0 sock state 2 > > -0 [001] ..s.61.036450: xs_tcp_state_change: RPC: > >xs_tcp_state_change client 880407eca800... > > -0 [001] ..s.61.036452: xs_tcp_state_change: RPC: > >state 1 conn 0 dead 0 zapped 1 sk_shutdown 0 > >kworker/1:1H-136 [001] 61.036476: xprt_connect_status: RPC: > > 43 xprt_connect_status: retrying > >kworker/1:1H-136 [001] 61.036478: xprt_prepare_transmit: RPC: > > 43 xprt_prepare_transmit > >kworker/1:1H-136 [001] 61.036479: xprt_transmit: RPC:43 > > xprt_transmit(72) > >kworker/1:1H-136 [001] 61.036486: xs_tcp_send_request: RPC: > >xs_tcp_send_request(72) = 0 > >kworker/1:1H-136 [001] 61.036487: xprt_transmit: RPC:43 > > xmit complete > > -0 [001] ..s.61.036789: xs_tcp_data_ready: RPC: > > xs_tcp_data_ready... > >kworker/1:1H-136 [001] 61.036798: xs_tcp_data_recv: RPC: > > xs_tcp_data_recv started > >kworker/1:1H-136 [001] 61.036799: xs_tcp_data_recv: RPC: > > reading TCP record fragment of length 24 > >kworker/1:1H-136 [001] 61.036799: xs_tcp_data_recv: RPC: > > reading XID (4 bytes) > >kworker/1:1H-136 [001] 61.036800: xs_tcp_data_recv: RPC: > > reading request with XID 2f4c3f88 > >kworker/1:1H-136 [001] 61.036800: xs_tcp_data_recv: RPC: > > reading CALL/REPLY flag (4 bytes) > >kworker/1:1H-136 [001] 61.036801: xs_tcp_data_recv: RPC: > > read reply XID 2f4c3f88 > >kworker/1:1H-136 [001] ..s.61.036801: xs_tcp_data_recv: RPC: > > XID 2f4c3f88 read 16 bytes > >kworker/1:1H-136 [001] ..s.61.036802: xs_tcp_data_recv: RPC: > > xprt = 880407eca800, tcp_copied = 24, tcp_offset = 24, tcp_reclen = 24 > >kworker/1:1H-136 [001] ..s.61.036802: xprt_complete_rqst: RPC: > > 43 xid 2f4c3f88 complete (24 bytes received) > >kworker/1:1H-136 [001] 61.036803: xs_tcp_data_recv: RPC: > > xs_tcp_data_recv done > >kworker/1:1H-136 [001] 61.036812: xprt_release: RPC:43 > > release request 88040b270800 > > > > > > # unhide-tcp > > Unhide-tcp 20130526 > > Copyright © 2013 Yago Jesus & Patrick Gouin > > License GPLv3+ : GNU GPL version 3 or later > > http://www.unhide-forensics.info > > Used options: > > [*]Starting TCP checking > > > > Found Hidden port that not appears in ss: 805 > > > > What is a “Hidden port that not appears in ss: 805”, and what does this > report mean? Are we failing to close a socket? I believe hidden ports are ports that are bound to no socket. Basically, a "port leak". Where they are in limbo and can never be reused. I looked at my past report, and everthing is exactly like the issue before. When I first boot my box, the port is there, I have the above trace. I run netstat -
Re: It's back! (Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() ))
> On Jun 30, 2016, at 08:59, Steven Rostedt wrote: > > [ resending as a new email, as I'm assuming people do not sort their > INBOX via last email on thread, thus my last email is sitting in the > bottom of everyone's INBOX ] > > I've hit this again. Not sure when it started, but I applied my old > debug trace_printk() patch (attached) and rebooted (4.5.7). I just > tested the latest kernel from Linus's tree (from last nights pull), and > it still gives me the problem. > > Here's the trace I have: > >kworker/3:1H-134 [003] ..s.61.036129: inet_csk_get_port: snum 805 >kworker/3:1H-134 [003] ..s.61.036135: > => sched_clock > => inet_addr_type_table > => security_capable > => inet_bind > => xs_bind > => release_sock > => sock_setsockopt > => __sock_create > => xs_create_sock.isra.19 > => xs_tcp_setup_socket > => process_one_work > => worker_thread > => worker_thread > => kthread > => ret_from_fork > => kthread >kworker/3:1H-134 [003] ..s.61.036136: inet_bind_hash: add 805 >kworker/3:1H-134 [003] ..s.61.036138: > => inet_csk_get_port > => sched_clock > => inet_addr_type_table > => security_capable > => inet_bind > => xs_bind > => release_sock > => sock_setsockopt > => __sock_create > => xs_create_sock.isra.19 > => xs_tcp_setup_socket > => process_one_work > => worker_thread > => worker_thread > => kthread > => ret_from_fork > => kthread >kworker/3:1H-134 [003] 61.036139: xs_bind: RPC: xs_bind > 4.136.255.255:805: ok (0) >kworker/3:1H-134 [003] 61.036140: xs_tcp_setup_socket: RPC: > worker connecting xprt 880407eca800 via tcp to 192.168.23.22 (port 43651) >kworker/3:1H-134 [003] 61.036162: xs_tcp_setup_socket: RPC: > 880407eca800 connect status 115 connected 0 sock state 2 > -0 [001] ..s.61.036450: xs_tcp_state_change: RPC: > xs_tcp_state_change client 880407eca800... > -0 [001] ..s.61.036452: xs_tcp_state_change: RPC: > state 1 conn 0 dead 0 zapped 1 sk_shutdown 0 >kworker/1:1H-136 [001] 61.036476: xprt_connect_status: RPC: > 43 xprt_connect_status: retrying >kworker/1:1H-136 [001] 61.036478: xprt_prepare_transmit: RPC: > 43 xprt_prepare_transmit >kworker/1:1H-136 [001] 61.036479: xprt_transmit: RPC:43 > xprt_transmit(72) >kworker/1:1H-136 [001] 61.036486: xs_tcp_send_request: RPC: > xs_tcp_send_request(72) = 0 >kworker/1:1H-136 [001] 61.036487: xprt_transmit: RPC:43 xmit > complete > -0 [001] ..s.61.036789: xs_tcp_data_ready: RPC: > xs_tcp_data_ready... >kworker/1:1H-136 [001] 61.036798: xs_tcp_data_recv: RPC: > xs_tcp_data_recv started >kworker/1:1H-136 [001] 61.036799: xs_tcp_data_recv: RPC: > reading TCP record fragment of length 24 >kworker/1:1H-136 [001] 61.036799: xs_tcp_data_recv: RPC: > reading XID (4 bytes) >kworker/1:1H-136 [001] 61.036800: xs_tcp_data_recv: RPC: > reading request with XID 2f4c3f88 >kworker/1:1H-136 [001] 61.036800: xs_tcp_data_recv: RPC: > reading CALL/REPLY flag (4 bytes) >kworker/1:1H-136 [001] 61.036801: xs_tcp_data_recv: RPC: > read reply XID 2f4c3f88 >kworker/1:1H-136 [001] ..s.61.036801: xs_tcp_data_recv: RPC: > XID 2f4c3f88 read 16 bytes >kworker/1:1H-136 [001] ..s.61.036802: xs_tcp_data_recv: RPC: > xprt = 880407eca800, tcp_copied = 24, tcp_offset = 24, tcp_reclen = 24 >kworker/1:1H-136 [001] ..s.61.036802: xprt_complete_rqst: RPC:43 > xid 2f4c3f88 complete (24 bytes received) >kworker/1:1H-136 [001] 61.036803: xs_tcp_data_recv: RPC: > xs_tcp_data_recv done >kworker/1:1H-136 [001] 61.036812: xprt_release: RPC:43 > release request 88040b270800 > > > # unhide-tcp > Unhide-tcp 20130526 > Copyright © 2013 Yago Jesus & Patrick Gouin > License GPLv3+ : GNU GPL version 3 or later > http://www.unhide-forensics.info > Used options: > [*]Starting TCP checking > > Found Hidden port that not appears in ss: 805 > What is a “Hidden port that not appears in ss: 805”, and what does this report mean? Are we failing to close a socket? Cheers Trond
It's back! (Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() ))
[ resending as a new email, as I'm assuming people do not sort their INBOX via last email on thread, thus my last email is sitting in the bottom of everyone's INBOX ] I've hit this again. Not sure when it started, but I applied my old debug trace_printk() patch (attached) and rebooted (4.5.7). I just tested the latest kernel from Linus's tree (from last nights pull), and it still gives me the problem. Here's the trace I have: kworker/3:1H-134 [003] ..s.61.036129: inet_csk_get_port: snum 805 kworker/3:1H-134 [003] ..s.61.036135: => sched_clock => inet_addr_type_table => security_capable => inet_bind => xs_bind => release_sock => sock_setsockopt => __sock_create => xs_create_sock.isra.19 => xs_tcp_setup_socket => process_one_work => worker_thread => worker_thread => kthread => ret_from_fork => kthread kworker/3:1H-134 [003] ..s.61.036136: inet_bind_hash: add 805 kworker/3:1H-134 [003] ..s.61.036138: => inet_csk_get_port => sched_clock => inet_addr_type_table => security_capable => inet_bind => xs_bind => release_sock => sock_setsockopt => __sock_create => xs_create_sock.isra.19 => xs_tcp_setup_socket => process_one_work => worker_thread => worker_thread => kthread => ret_from_fork => kthread kworker/3:1H-134 [003] 61.036139: xs_bind: RPC: xs_bind 4.136.255.255:805: ok (0) kworker/3:1H-134 [003] 61.036140: xs_tcp_setup_socket: RPC: worker connecting xprt 880407eca800 via tcp to 192.168.23.22 (port 43651) kworker/3:1H-134 [003] 61.036162: xs_tcp_setup_socket: RPC: 880407eca800 connect status 115 connected 0 sock state 2 -0 [001] ..s.61.036450: xs_tcp_state_change: RPC: xs_tcp_state_change client 880407eca800... -0 [001] ..s.61.036452: xs_tcp_state_change: RPC: state 1 conn 0 dead 0 zapped 1 sk_shutdown 0 kworker/1:1H-136 [001] 61.036476: xprt_connect_status: RPC:43 xprt_connect_status: retrying kworker/1:1H-136 [001] 61.036478: xprt_prepare_transmit: RPC: 43 xprt_prepare_transmit kworker/1:1H-136 [001] 61.036479: xprt_transmit: RPC:43 xprt_transmit(72) kworker/1:1H-136 [001] 61.036486: xs_tcp_send_request: RPC: xs_tcp_send_request(72) = 0 kworker/1:1H-136 [001] 61.036487: xprt_transmit: RPC:43 xmit complete -0 [001] ..s.61.036789: xs_tcp_data_ready: RPC: xs_tcp_data_ready... kworker/1:1H-136 [001] 61.036798: xs_tcp_data_recv: RPC: xs_tcp_data_recv started kworker/1:1H-136 [001] 61.036799: xs_tcp_data_recv: RPC: reading TCP record fragment of length 24 kworker/1:1H-136 [001] 61.036799: xs_tcp_data_recv: RPC: reading XID (4 bytes) kworker/1:1H-136 [001] 61.036800: xs_tcp_data_recv: RPC: reading request with XID 2f4c3f88 kworker/1:1H-136 [001] 61.036800: xs_tcp_data_recv: RPC: reading CALL/REPLY flag (4 bytes) kworker/1:1H-136 [001] 61.036801: xs_tcp_data_recv: RPC: read reply XID 2f4c3f88 kworker/1:1H-136 [001] ..s.61.036801: xs_tcp_data_recv: RPC: XID 2f4c3f88 read 16 bytes kworker/1:1H-136 [001] ..s.61.036802: xs_tcp_data_recv: RPC: xprt = 880407eca800, tcp_copied = 24, tcp_offset = 24, tcp_reclen = 24 kworker/1:1H-136 [001] ..s.61.036802: xprt_complete_rqst: RPC:43 xid 2f4c3f88 complete (24 bytes received) kworker/1:1H-136 [001] 61.036803: xs_tcp_data_recv: RPC: xs_tcp_data_recv done kworker/1:1H-136 [001] 61.036812: xprt_release: RPC:43 release request 88040b270800 # unhide-tcp Unhide-tcp 20130526 Copyright © 2013 Yago Jesus & Patrick Gouin License GPLv3+ : GNU GPL version 3 or later http://www.unhide-forensics.info Used options: [*]Starting TCP checking Found Hidden port that not appears in ss: 805 -- Steve --- net/ipv4/inet_connection_sock.c |4 net/ipv4/inet_hashtables.c |5 + net/sunrpc/xprt.c |5 + net/sunrpc/xprtsock.c |5 + 5 files changed, 22 insertions(+) Index: linux-build.git/net/ipv4/inet_connection_sock.c === --- linux-build.git.orig/net/ipv4/inet_connection_sock.c 2016-06-22 11:55:05.952267493 -0400 +++ linux-build.git/net/ipv4/inet_connection_sock.c 2016-06-22 11:56:20.002662092 -0400 @@ -232,6 +232,10 @@ tb_found: } } success: + if (!current->mm) { + trace_printk("snum %d\n", snum); + trace_dump_stack(1); + } if (!inet_csk(sk)->icsk_bind_hash) inet_bind_hash(sk, tb, port); WARN_ON(inet_csk(sk)->icsk_bind_hash != tb); Index: linux-build.git/net/ipv4/inet_hashtables.c === --- linux-build.git.orig/net/ipv4/inet_hashtables.c 2016-06-22 11:55:0
It's back! (Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() ))
I've hit this again. Not sure when it started, but I applied my old debug trace_printk() patch (attached) and rebooted (4.5.7). I just tested the latest kernel from Linus's tree (from last nights pull), and it still gives me the problem. Here's the trace I have: kworker/3:1H-134 [003] ..s.61.036129: inet_csk_get_port: snum 805 kworker/3:1H-134 [003] ..s.61.036135: => sched_clock => inet_addr_type_table => security_capable => inet_bind => xs_bind => release_sock => sock_setsockopt => __sock_create => xs_create_sock.isra.19 => xs_tcp_setup_socket => process_one_work => worker_thread => worker_thread => kthread => ret_from_fork => kthread kworker/3:1H-134 [003] ..s.61.036136: inet_bind_hash: add 805 kworker/3:1H-134 [003] ..s.61.036138: => inet_csk_get_port => sched_clock => inet_addr_type_table => security_capable => inet_bind => xs_bind => release_sock => sock_setsockopt => __sock_create => xs_create_sock.isra.19 => xs_tcp_setup_socket => process_one_work => worker_thread => worker_thread => kthread => ret_from_fork => kthread kworker/3:1H-134 [003] 61.036139: xs_bind: RPC: xs_bind 4.136.255.255:805: ok (0) kworker/3:1H-134 [003] 61.036140: xs_tcp_setup_socket: RPC: worker connecting xprt 880407eca800 via tcp to 192.168.23.22 (port 43651) kworker/3:1H-134 [003] 61.036162: xs_tcp_setup_socket: RPC: 880407eca800 connect status 115 connected 0 sock state 2 -0 [001] ..s.61.036450: xs_tcp_state_change: RPC: xs_tcp_state_change client 880407eca800... -0 [001] ..s.61.036452: xs_tcp_state_change: RPC: state 1 conn 0 dead 0 zapped 1 sk_shutdown 0 kworker/1:1H-136 [001] 61.036476: xprt_connect_status: RPC:43 xprt_connect_status: retrying kworker/1:1H-136 [001] 61.036478: xprt_prepare_transmit: RPC: 43 xprt_prepare_transmit kworker/1:1H-136 [001] 61.036479: xprt_transmit: RPC:43 xprt_transmit(72) kworker/1:1H-136 [001] 61.036486: xs_tcp_send_request: RPC: xs_tcp_send_request(72) = 0 kworker/1:1H-136 [001] 61.036487: xprt_transmit: RPC:43 xmit complete -0 [001] ..s.61.036789: xs_tcp_data_ready: RPC: xs_tcp_data_ready... kworker/1:1H-136 [001] 61.036798: xs_tcp_data_recv: RPC: xs_tcp_data_recv started kworker/1:1H-136 [001] 61.036799: xs_tcp_data_recv: RPC: reading TCP record fragment of length 24 kworker/1:1H-136 [001] 61.036799: xs_tcp_data_recv: RPC: reading XID (4 bytes) kworker/1:1H-136 [001] 61.036800: xs_tcp_data_recv: RPC: reading request with XID 2f4c3f88 kworker/1:1H-136 [001] 61.036800: xs_tcp_data_recv: RPC: reading CALL/REPLY flag (4 bytes) kworker/1:1H-136 [001] 61.036801: xs_tcp_data_recv: RPC: read reply XID 2f4c3f88 kworker/1:1H-136 [001] ..s.61.036801: xs_tcp_data_recv: RPC: XID 2f4c3f88 read 16 bytes kworker/1:1H-136 [001] ..s.61.036802: xs_tcp_data_recv: RPC: xprt = 880407eca800, tcp_copied = 24, tcp_offset = 24, tcp_reclen = 24 kworker/1:1H-136 [001] ..s.61.036802: xprt_complete_rqst: RPC:43 xid 2f4c3f88 complete (24 bytes received) kworker/1:1H-136 [001] 61.036803: xs_tcp_data_recv: RPC: xs_tcp_data_recv done kworker/1:1H-136 [001] 61.036812: xprt_release: RPC:43 release request 88040b270800 # unhide-tcp Unhide-tcp 20130526 Copyright © 2013 Yago Jesus & Patrick Gouin License GPLv3+ : GNU GPL version 3 or later http://www.unhide-forensics.info Used options: [*]Starting TCP checking Found Hidden port that not appears in ss: 805 -- Steve --- net/ipv4/inet_connection_sock.c |4 net/ipv4/inet_hashtables.c |5 + net/sunrpc/xprt.c |5 + net/sunrpc/xprtsock.c |5 + 5 files changed, 22 insertions(+) Index: linux-build.git/net/ipv4/inet_connection_sock.c === --- linux-build.git.orig/net/ipv4/inet_connection_sock.c 2016-06-22 11:55:05.952267493 -0400 +++ linux-build.git/net/ipv4/inet_connection_sock.c 2016-06-22 11:56:20.002662092 -0400 @@ -232,6 +232,10 @@ tb_found: } } success: + if (!current->mm) { + trace_printk("snum %d\n", snum); + trace_dump_stack(1); + } if (!inet_csk(sk)->icsk_bind_hash) inet_bind_hash(sk, tb, port); WARN_ON(inet_csk(sk)->icsk_bind_hash != tb); Index: linux-build.git/net/ipv4/inet_hashtables.c === --- linux-build.git.orig/net/ipv4/inet_hashtables.c 2016-06-22 11:55:05.952267493 -0400 +++ linux-build.git/net/ipv4/inet_hashtables.c 2016-06-22 11:55:05.948267360 -0400 @@ -93,6 +93,11 @@ void inet_bind_bucket_destroy(struct kme void inet_bind
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, Jun 19, 2015 at 9:27 PM, Steven Rostedt wrote: > On Fri, 19 Jun 2015 19:25:59 -0400 > Trond Myklebust wrote: > > >> 8<-- >> >From 4876cc779ff525b9c2376d8076edf47815e71f2c Mon Sep 17 00:00:00 2001 >> From: Trond Myklebust >> Date: Fri, 19 Jun 2015 16:17:57 -0400 >> Subject: [PATCH v2] SUNRPC: Ensure we release the TCP socket once it has been >> closed >> >> This fixes a regression introduced by commit caf4ccd4e88cf2 ("SUNRPC: >> Make xs_tcp_close() do a socket shutdown rather than a sock_release"). >> Prior to that commit, the autoclose feature would ensure that an >> idle connection would result in the socket being both disconnected and >> released, whereas now only gets disconnected. >> >> While the current behaviour is harmless, it does leave the port bound >> until either RPC traffic resumes or the RPC client is shut down. > > Is there a way to test RPC traffic resuming? I'd like to try that before > declaring this bug harmless. You should be seeing the same issue if you mount an NFSv3 partition. After about 5 minutes of inactivity, the client will close down the connection to the server, and rkhunter should again see the phantom socket. If you then try to access the partition, the RPC layer should immediately release the socket and establish a new connection on the same port. Cheers Trond -- To unsubscribe from this list: send the line "unsubscribe netdev" in
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 19 Jun 2015 19:25:59 -0400 Trond Myklebust wrote: > 8<-- > >From 4876cc779ff525b9c2376d8076edf47815e71f2c Mon Sep 17 00:00:00 2001 > From: Trond Myklebust > Date: Fri, 19 Jun 2015 16:17:57 -0400 > Subject: [PATCH v2] SUNRPC: Ensure we release the TCP socket once it has been > closed > > This fixes a regression introduced by commit caf4ccd4e88cf2 ("SUNRPC: > Make xs_tcp_close() do a socket shutdown rather than a sock_release"). > Prior to that commit, the autoclose feature would ensure that an > idle connection would result in the socket being both disconnected and > released, whereas now only gets disconnected. > > While the current behaviour is harmless, it does leave the port bound > until either RPC traffic resumes or the RPC client is shut down. Is there a way to test RPC traffic resuming? I'd like to try that before declaring this bug harmless. > > Reported-by: Steven Rostedt The problem appears to go away with this patch. Tested-by: Steven Rostedt Thanks a lot! -- Steve > Signed-off-by: Trond Myklebust > --- > net/sunrpc/xprt.c | 2 +- > net/sunrpc/xprtsock.c | 40 ++-- > 2 files changed, 23 insertions(+), 19 deletions(-) > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c > index 3ca31f20b97c..ab5dd621ae0c 100644 > --- a/net/sunrpc/xprt.c > +++ b/net/sunrpc/xprt.c > @@ -611,8 +611,8 @@ static void xprt_autoclose(struct work_struct *work) > struct rpc_xprt *xprt = > container_of(work, struct rpc_xprt, task_cleanup); > > - xprt->ops->close(xprt); > clear_bit(XPRT_CLOSE_WAIT, &xprt->state); > + xprt->ops->close(xprt); > xprt_release_write(xprt, NULL); > } > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index fda8ec8c74c0..ee0715dfc3c7 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -623,24 +623,6 @@ process_status: > } > > /** > - * xs_tcp_shutdown - gracefully shut down a TCP socket > - * @xprt: transport > - * > - * Initiates a graceful shutdown of the TCP socket by calling the > - * equivalent of shutdown(SHUT_RDWR); > - */ > -static void xs_tcp_shutdown(struct rpc_xprt *xprt) > -{ > - struct sock_xprt *transport = container_of(xprt, struct sock_xprt, > xprt); > - struct socket *sock = transport->sock; > - > - if (sock != NULL) { > - kernel_sock_shutdown(sock, SHUT_RDWR); > - trace_rpc_socket_shutdown(xprt, sock); > - } > -} > - > -/** > * xs_tcp_send_request - write an RPC request to a TCP socket > * @task: address of RPC task that manages the state of an RPC request > * > @@ -786,6 +768,7 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt) > xs_sock_reset_connection_flags(xprt); > /* Mark transport as closed and wake up all pending tasks */ > xprt_disconnect_done(xprt); > + xprt_force_disconnect(xprt); > } > > /** > @@ -2103,6 +2086,27 @@ out: > xprt_wake_pending_tasks(xprt, status); > } > > +/** > + * xs_tcp_shutdown - gracefully shut down a TCP socket > + * @xprt: transport > + * > + * Initiates a graceful shutdown of the TCP socket by calling the > + * equivalent of shutdown(SHUT_RDWR); > + */ > +static void xs_tcp_shutdown(struct rpc_xprt *xprt) > +{ > + struct sock_xprt *transport = container_of(xprt, struct sock_xprt, > xprt); > + struct socket *sock = transport->sock; > + > + if (sock == NULL) > + return; > + if (xprt_connected(xprt)) { > + kernel_sock_shutdown(sock, SHUT_RDWR); > + trace_rpc_socket_shutdown(xprt, sock); > + } else > + xs_reset_transport(transport); > +} > + > static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket > *sock) > { > struct sock_xprt *transport = container_of(xprt, struct sock_xprt, > xprt); -- To unsubscribe from this list: send the line "unsubscribe netdev" in
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 19 Jun 2015 20:37:45 -0400 Steven Rostedt wrote: > > Is it causing any other damage than the rkhunter warning you reported? > > Well, not that I know of. Are you sure that this port will be > reconnected, and is not just a leak. Not sure if you could waste more > ports this way with connections to other machines. I only have my > wife's box connect to this server. This server is actually a client to > my other boxes. > > Although the rkhunter warning is the only thing that triggers, I still > would think this is a stable fix, especially if the port is leaked and > not taken again. I did some experiments. If I unmount the directories from my wife's machine and remount them, the port that was hidden is fully closed. Maybe it's not that big of a deal after all. -- Steve -- To unsubscribe from this list: send the line "unsubscribe netdev" in
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 19 Jun 2015 19:25:59 -0400 Trond Myklebust wrote: > On Fri, 2015-06-19 at 18:14 -0400, Steven Rostedt wrote: > > On Fri, 19 Jun 2015 16:30:18 -0400 > > Trond Myklebust wrote: > > > > > Steven, how about something like the following patch? > > > > > > > OK, the box I'm running this on is using v4.0.5, can you make a patch > > based on that, as whatever you make needs to go to stable as well. > > Is it causing any other damage than the rkhunter warning you reported? Well, not that I know of. Are you sure that this port will be reconnected, and is not just a leak. Not sure if you could waste more ports this way with connections to other machines. I only have my wife's box connect to this server. This server is actually a client to my other boxes. Although the rkhunter warning is the only thing that triggers, I still would think this is a stable fix, especially if the port is leaked and not taken again. > > Sorry. I sent that one off too quickly. Try the following. This built, will be testing it shortly. -- Steve -- To unsubscribe from this list: send the line "unsubscribe netdev" in
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 2015-06-19 at 18:14 -0400, Steven Rostedt wrote: > On Fri, 19 Jun 2015 16:30:18 -0400 > Trond Myklebust wrote: > > > Steven, how about something like the following patch? > > > > OK, the box I'm running this on is using v4.0.5, can you make a patch > based on that, as whatever you make needs to go to stable as well. Is it causing any other damage than the rkhunter warning you reported? > distcc[31554] ERROR: compile /home/rostedt/work/git/nobackup/linux > -build.git/net/sunrpc/xprtsock.c on fedora/8 failed > distcc[31554] (dcc_build_somewhere) Warning: remote compilation of > '/home/rostedt/work/git/nobackup/linux > -build.git/net/sunrpc/xprtsock.c' failed, retrying locally > distcc[31554] Warning: failed to distribute > /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c > to fedora/8, running locally instead > /home/rostedt/work/git/nobackup/linux > -build.git/net/sunrpc/xprtsock.c: In function 'xs_tcp_shutdown': > /home/rostedt/work/git/nobackup/linux > -build.git/net/sunrpc/xprtsock.c:643:3: error: implicit declaration > of function 'xs_reset_transport' [-Werror=implicit-function > -declaration] > /home/rostedt/work/git/nobackup/linux > -build.git/net/sunrpc/xprtsock.c: At top level: > /home/rostedt/work/git/nobackup/linux > -build.git/net/sunrpc/xprtsock.c:825:13: warning: conflicting types > for 'xs_reset_transport' [enabled by default] > /home/rostedt/work/git/nobackup/linux > -build.git/net/sunrpc/xprtsock.c:825:13: error: static declaration of > 'xs_reset_transport' follows non-static declaration > /home/rostedt/work/git/nobackup/linux > -build.git/net/sunrpc/xprtsock.c:643:3: note: previous implicit > declaration of 'xs_reset_transport' was here > cc1: some warnings being treated as errors > distcc[31554] ERROR: compile /home/rostedt/work/git/nobackup/linux > -build.git/net/sunrpc/xprtsock.c on localhost failed > /home/rostedt/work/git/nobackup/linux > -build.git/scripts/Makefile.build:258: recipe for target > 'net/sunrpc/xprtsock.o' failed > make[3]: *** [net/sunrpc/xprtsock.o] Error 1 Sorry. I sent that one off too quickly. Try the following. 8<-- >From 4876cc779ff525b9c2376d8076edf47815e71f2c Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Fri, 19 Jun 2015 16:17:57 -0400 Subject: [PATCH v2] SUNRPC: Ensure we release the TCP socket once it has been closed This fixes a regression introduced by commit caf4ccd4e88cf2 ("SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release"). Prior to that commit, the autoclose feature would ensure that an idle connection would result in the socket being both disconnected and released, whereas now only gets disconnected. While the current behaviour is harmless, it does leave the port bound until either RPC traffic resumes or the RPC client is shut down. Reported-by: Steven Rostedt Signed-off-by: Trond Myklebust --- net/sunrpc/xprt.c | 2 +- net/sunrpc/xprtsock.c | 40 ++-- 2 files changed, 23 insertions(+), 19 deletions(-) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 3ca31f20b97c..ab5dd621ae0c 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -611,8 +611,8 @@ static void xprt_autoclose(struct work_struct *work) struct rpc_xprt *xprt = container_of(work, struct rpc_xprt, task_cleanup); - xprt->ops->close(xprt); clear_bit(XPRT_CLOSE_WAIT, &xprt->state); + xprt->ops->close(xprt); xprt_release_write(xprt, NULL); } diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index fda8ec8c74c0..ee0715dfc3c7 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -623,24 +623,6 @@ process_status: } /** - * xs_tcp_shutdown - gracefully shut down a TCP socket - * @xprt: transport - * - * Initiates a graceful shutdown of the TCP socket by calling the - * equivalent of shutdown(SHUT_RDWR); - */ -static void xs_tcp_shutdown(struct rpc_xprt *xprt) -{ - struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); - struct socket *sock = transport->sock; - - if (sock != NULL) { - kernel_sock_shutdown(sock, SHUT_RDWR); - trace_rpc_socket_shutdown(xprt, sock); - } -} - -/** * xs_tcp_send_request - write an RPC request to a TCP socket * @task: address of RPC task that manages the state of an RPC request * @@ -786,6 +768,7 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt) xs_sock_reset_connection_flags(xprt); /* Mark transport as closed and wake up all pending tasks */ xprt_disconnect_done(xprt); + xprt_force_disconnect(xprt); } /** @@ -2103,6 +2086,27 @@ out: xprt_wake_pending_tasks(xprt, status); } +/** + * xs_tcp_shutdown - gracefully shut down a TCP socket + * @xprt: transport + * + * Initiates a graceful shutdown of the TCP socket by calling the + * equivalent of shutdown(SHUT_RDWR); + */ +st
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 19 Jun 2015 16:30:18 -0400 Trond Myklebust wrote: > Steven, how about something like the following patch? > OK, the box I'm running this on is using v4.0.5, can you make a patch based on that, as whatever you make needs to go to stable as well. distcc[31554] ERROR: compile /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c on fedora/8 failed distcc[31554] (dcc_build_somewhere) Warning: remote compilation of '/home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c' failed, retrying locally distcc[31554] Warning: failed to distribute /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c to fedora/8, running locally instead /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c: In function 'xs_tcp_shutdown': /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:643:3: error: implicit declaration of function 'xs_reset_transport' [-Werror=implicit-function-declaration] /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c: At top level: /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:825:13: warning: conflicting types for 'xs_reset_transport' [enabled by default] /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:825:13: error: static declaration of 'xs_reset_transport' follows non-static declaration /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c:643:3: note: previous implicit declaration of 'xs_reset_transport' was here cc1: some warnings being treated as errors distcc[31554] ERROR: compile /home/rostedt/work/git/nobackup/linux-build.git/net/sunrpc/xprtsock.c on localhost failed /home/rostedt/work/git/nobackup/linux-build.git/scripts/Makefile.build:258: recipe for target 'net/sunrpc/xprtsock.o' failed make[3]: *** [net/sunrpc/xprtsock.o] Error 1 -- Steve > 8<- > >From 9a0bcfdbdbc793eae1ed6d901a6396b6c66f9513 Mon Sep 17 00:00:00 2001 > From: Trond Myklebust > Date: Fri, 19 Jun 2015 16:17:57 -0400 > Subject: [PATCH] SUNRPC: Ensure we release the TCP socket once it has been > closed > > This fixes a regression introduced by commit caf4ccd4e88cf2 ("SUNRPC: > Make xs_tcp_close() do a socket shutdown rather than a sock_release"). > Prior to that commit, the autoclose feature would ensure that an > idle connection would result in the socket being both disconnected and > released, whereas now only gets disconnected. > > While the current behaviour is harmless, it does leave the port bound > until either RPC traffic resumes or the RPC client is shut down. > > Reported-by: Steven Rostedt > Signed-off-by: Trond Myklebust > --- > net/sunrpc/xprt.c | 2 +- > net/sunrpc/xprtsock.c | 8 ++-- > 2 files changed, 7 insertions(+), 3 deletions(-) > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c > index 3ca31f20b97c..ab5dd621ae0c 100644 > --- a/net/sunrpc/xprt.c > +++ b/net/sunrpc/xprt.c > @@ -611,8 +611,8 @@ static void xprt_autoclose(struct work_struct *work) > struct rpc_xprt *xprt = > container_of(work, struct rpc_xprt, task_cleanup); > > - xprt->ops->close(xprt); > clear_bit(XPRT_CLOSE_WAIT, &xprt->state); > + xprt->ops->close(xprt); > xprt_release_write(xprt, NULL); > } > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index fda8ec8c74c0..75dcdadf0269 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -634,10 +634,13 @@ static void xs_tcp_shutdown(struct rpc_xprt *xprt) > struct sock_xprt *transport = container_of(xprt, struct sock_xprt, > xprt); > struct socket *sock = transport->sock; > > - if (sock != NULL) { > + if (sock == NULL) > + return; > + if (xprt_connected(xprt)) { > kernel_sock_shutdown(sock, SHUT_RDWR); > trace_rpc_socket_shutdown(xprt, sock); > - } > + } else > + xs_reset_transport(transport); > } > > /** > @@ -786,6 +789,7 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt) > xs_sock_reset_connection_flags(xprt); > /* Mark transport as closed and wake up all pending tasks */ > xprt_disconnect_done(xprt); > + xprt_force_disconnect(xprt); > } > > /** -- To unsubscribe from this list: send the line "unsubscribe netdev" in
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 19 Jun 2015 16:30:18 -0400 Trond Myklebust wrote: > Steven, how about something like the following patch? Building it now. Will let you know in a bit. > > 8<- > >From 9a0bcfdbdbc793eae1ed6d901a6396b6c66f9513 Mon Sep 17 00:00:00 2001 > From: Trond Myklebust > Date: Fri, 19 Jun 2015 16:17:57 -0400 > Subject: [PATCH] SUNRPC: Ensure we release the TCP socket once it has been > closed > > This fixes a regression introduced by commit caf4ccd4e88cf2 ("SUNRPC: > Make xs_tcp_close() do a socket shutdown rather than a sock_release"). > Prior to that commit, the autoclose feature would ensure that an > idle connection would result in the socket being both disconnected and > released, whereas now only gets disconnected. > > While the current behaviour is harmless, it does leave the port bound > until either RPC traffic resumes or the RPC client is shut down. Hmm, is this true? The port is bound, but the socket has been freed. That is sk->sk_socket points to garbage. As my portlist.c module verified. It doesn't seem that anything can attach to that port again that I know of. Is there a way to verify that something can attach to it again? -- Steve > > Reported-by: Steven Rostedt > Signed-off-by: Trond Myklebust > --- > net/sunrpc/xprt.c | 2 +- > net/sunrpc/xprtsock.c | 8 ++-- > 2 files changed, 7 insertions(+), 3 deletions(-) > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c > index 3ca31f20b97c..ab5dd621ae0c 100644 > --- a/net/sunrpc/xprt.c > +++ b/net/sunrpc/xprt.c > @@ -611,8 +611,8 @@ static void xprt_autoclose(struct work_struct *work) > struct rpc_xprt *xprt = > container_of(work, struct rpc_xprt, task_cleanup); > > - xprt->ops->close(xprt); > clear_bit(XPRT_CLOSE_WAIT, &xprt->state); > + xprt->ops->close(xprt); > xprt_release_write(xprt, NULL); > } > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index fda8ec8c74c0..75dcdadf0269 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -634,10 +634,13 @@ static void xs_tcp_shutdown(struct rpc_xprt *xprt) > struct sock_xprt *transport = container_of(xprt, struct sock_xprt, > xprt); > struct socket *sock = transport->sock; > > - if (sock != NULL) { > + if (sock == NULL) > + return; > + if (xprt_connected(xprt)) { > kernel_sock_shutdown(sock, SHUT_RDWR); > trace_rpc_socket_shutdown(xprt, sock); > - } > + } else > + xs_reset_transport(transport); > } > > /** > @@ -786,6 +789,7 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt) > xs_sock_reset_connection_flags(xprt); > /* Mark transport as closed and wake up all pending tasks */ > xprt_disconnect_done(xprt); > + xprt_force_disconnect(xprt); > } > > /** -- To unsubscribe from this list: send the line "unsubscribe netdev" in
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 19 Jun 2015 13:39:08 -0400 Trond Myklebust wrote: > Hang on. This is on the client box while there is an active NFSv4 > mount? Then that's probably the NFSv4 callback channel listening for > delegation callbacks. > > Can you please try: > > echo "options nfs callback_tcpport=4048" > /etc/modprobe.d/nfs-local.conf > > and then either reboot the client or unload and then reload the nfs > modules before reattempting the mount. If this is indeed the callback > channel, then that will move your phantom listener to port 4048... I unmounted the directories, removed the nfs modules, and then add this file, and loaded the modules back and remounted the directories. # netstat -ntap |grep 4048 tcp0 0 0.0.0.0:40480.0.0.0:* LISTEN - tcp0 0 192.168.23.22:4048 192.168.23.9:1010 ESTABLISHED - tcp6 0 0 :::4048 :::*LISTEN - -- Steve -- To unsubscribe from this list: send the line "unsubscribe netdev" in
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 2015-06-19 at 15:52 -0400, Jeff Layton wrote: > On Fri, 19 Jun 2015 13:39:08 -0400 > Trond Myklebust wrote: > > > On Fri, Jun 19, 2015 at 1:17 PM, Steven Rostedt < > > rost...@goodmis.org> wrote: > > > On Fri, 19 Jun 2015 12:25:53 -0400 > > > Steven Rostedt wrote: > > > > > > > > > > I don't see that 55201 anywhere. But then again, I didn't look > > > > for it > > > > before the port disappeared. I could reboot and look for it > > > > again. I > > > > should have saved the full netstat -tapn as well :-/ > > > > > > Of course I didn't find it anywhere, that's the port on my wife's > > > box > > > that port 947 was connected to. > > > > > > Now I even went over to my wife's box and ran > > > > > > # rpcinfo -p localhost > > >program vers proto port service > > > 104 tcp111 portmapper > > > 103 tcp111 portmapper > > > 102 tcp111 portmapper > > > 104 udp111 portmapper > > > 103 udp111 portmapper > > > 102 udp111 portmapper > > > 1000241 udp 34243 status > > > 1000241 tcp 34498 status > > > > > > which doesn't show anything. > > > > > > but something is listening to that port... > > > > > > # netstat -ntap |grep 55201 > > > tcp0 0 0.0.0.0:55201 0.0.0.0:* > > >LISTEN > > > > > > Hang on. This is on the client box while there is an active NFSv4 > > mount? Then that's probably the NFSv4 callback channel listening > > for > > delegation callbacks. > > > > Can you please try: > > > > echo "options nfs callback_tcpport=4048" > /etc/modprobe.d/nfs > > -local.conf > > > > and then either reboot the client or unload and then reload the nfs > > modules before reattempting the mount. If this is indeed the > > callback > > channel, then that will move your phantom listener to port 4048... > > > > Right, it was a little unclear to me before, but it now seems clear > that the callback socket that the server is opening to the client is > the one squatting on the port. > > ...and that sort of makes sense, doesn't it? That rpc_clnt will stick > around for the life of the client's lease, and the rpc_clnt binds to > a > particular port so that it can reconnect using the same one. > > Given that Stephen has done the legwork and figured out that > reverting > those commits fixes the issue, then I suspect that the real culprit > is > caf4ccd4e88cf2. > > The client is likely closing down the other end of the callback > socket when it goes idle. Before that commit, we probably did an > xs_close on it, but now we're doing a xs_tcp_shutdown and that leaves > the port bound. > Agreed. I've been looking into whether or not there is a simple fix. Reverting those patches is not an option, because the whole point was to ensure that the socket is in the TCP_CLOSED state before we release the socket. Steven, how about something like the following patch? 8<- >From 9a0bcfdbdbc793eae1ed6d901a6396b6c66f9513 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Fri, 19 Jun 2015 16:17:57 -0400 Subject: [PATCH] SUNRPC: Ensure we release the TCP socket once it has been closed This fixes a regression introduced by commit caf4ccd4e88cf2 ("SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release"). Prior to that commit, the autoclose feature would ensure that an idle connection would result in the socket being both disconnected and released, whereas now only gets disconnected. While the current behaviour is harmless, it does leave the port bound until either RPC traffic resumes or the RPC client is shut down. Reported-by: Steven Rostedt Signed-off-by: Trond Myklebust --- net/sunrpc/xprt.c | 2 +- net/sunrpc/xprtsock.c | 8 ++-- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 3ca31f20b97c..ab5dd621ae0c 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -611,8 +611,8 @@ static void xprt_autoclose(struct work_struct *work) struct rpc_xprt *xprt = container_of(work, struct rpc_xprt, task_cleanup); - xprt->ops->close(xprt); clear_bit(XPRT_CLOSE_WAIT, &xprt->state); + xprt->ops->close(xprt); xprt_release_write(xprt, NULL); } diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index fda8ec8c74c0..75dcdadf0269 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -634,10 +634,13 @@ static void xs_tcp_shutdown(struct rpc_xprt *xprt) struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); struct socket *sock = transport->sock; - if (sock != NULL) { + if (sock == NULL) + return; + if (xprt_connected(xprt)) { kernel_sock_shutdown(sock, SHUT_RDWR); trace_rpc_socket_shutdown(xprt, sock); - } + } els
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 19 Jun 2015 13:39:08 -0400 Trond Myklebust wrote: > On Fri, Jun 19, 2015 at 1:17 PM, Steven Rostedt wrote: > > On Fri, 19 Jun 2015 12:25:53 -0400 > > Steven Rostedt wrote: > > > > > >> I don't see that 55201 anywhere. But then again, I didn't look for it > >> before the port disappeared. I could reboot and look for it again. I > >> should have saved the full netstat -tapn as well :-/ > > > > Of course I didn't find it anywhere, that's the port on my wife's box > > that port 947 was connected to. > > > > Now I even went over to my wife's box and ran > > > > # rpcinfo -p localhost > >program vers proto port service > > 104 tcp111 portmapper > > 103 tcp111 portmapper > > 102 tcp111 portmapper > > 104 udp111 portmapper > > 103 udp111 portmapper > > 102 udp111 portmapper > > 1000241 udp 34243 status > > 1000241 tcp 34498 status > > > > which doesn't show anything. > > > > but something is listening to that port... > > > > # netstat -ntap |grep 55201 > > tcp0 0 0.0.0.0:55201 0.0.0.0:* LISTEN > > > Hang on. This is on the client box while there is an active NFSv4 > mount? Then that's probably the NFSv4 callback channel listening for > delegation callbacks. > > Can you please try: > > echo "options nfs callback_tcpport=4048" > /etc/modprobe.d/nfs-local.conf > > and then either reboot the client or unload and then reload the nfs > modules before reattempting the mount. If this is indeed the callback > channel, then that will move your phantom listener to port 4048... > Right, it was a little unclear to me before, but it now seems clear that the callback socket that the server is opening to the client is the one squatting on the port. ...and that sort of makes sense, doesn't it? That rpc_clnt will stick around for the life of the client's lease, and the rpc_clnt binds to a particular port so that it can reconnect using the same one. Given that Stephen has done the legwork and figured out that reverting those commits fixes the issue, then I suspect that the real culprit is caf4ccd4e88cf2. The client is likely closing down the other end of the callback socket when it goes idle. Before that commit, we probably did an xs_close on it, but now we're doing a xs_tcp_shutdown and that leaves the port bound. I'm travelling this weekend and am not set up to reproduce it to confirm, but that does seem to be a plausible scenario. -- Jeff Layton -- To unsubscribe from this list: send the line "unsubscribe netdev" in
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, Jun 19, 2015 at 1:17 PM, Steven Rostedt wrote: > On Fri, 19 Jun 2015 12:25:53 -0400 > Steven Rostedt wrote: > > >> I don't see that 55201 anywhere. But then again, I didn't look for it >> before the port disappeared. I could reboot and look for it again. I >> should have saved the full netstat -tapn as well :-/ > > Of course I didn't find it anywhere, that's the port on my wife's box > that port 947 was connected to. > > Now I even went over to my wife's box and ran > > # rpcinfo -p localhost >program vers proto port service > 104 tcp111 portmapper > 103 tcp111 portmapper > 102 tcp111 portmapper > 104 udp111 portmapper > 103 udp111 portmapper > 102 udp111 portmapper > 1000241 udp 34243 status > 1000241 tcp 34498 status > > which doesn't show anything. > > but something is listening to that port... > > # netstat -ntap |grep 55201 > tcp0 0 0.0.0.0:55201 0.0.0.0:* LISTEN Hang on. This is on the client box while there is an active NFSv4 mount? Then that's probably the NFSv4 callback channel listening for delegation callbacks. Can you please try: echo "options nfs callback_tcpport=4048" > /etc/modprobe.d/nfs-local.conf and then either reboot the client or unload and then reload the nfs modules before reattempting the mount. If this is indeed the callback channel, then that will move your phantom listener to port 4048... Cheers Trond -- To unsubscribe from this list: send the line "unsubscribe netdev" in
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 19 Jun 2015 12:25:53 -0400 Steven Rostedt wrote: > I don't see that 55201 anywhere. But then again, I didn't look for it > before the port disappeared. I could reboot and look for it again. I > should have saved the full netstat -tapn as well :-/ Of course I didn't find it anywhere, that's the port on my wife's box that port 947 was connected to. Now I even went over to my wife's box and ran # rpcinfo -p localhost program vers proto port service 104 tcp111 portmapper 103 tcp111 portmapper 102 tcp111 portmapper 104 udp111 portmapper 103 udp111 portmapper 102 udp111 portmapper 1000241 udp 34243 status 1000241 tcp 34498 status which doesn't show anything. but something is listening to that port... # netstat -ntap |grep 55201 tcp0 0 0.0.0.0:55201 0.0.0.0:* LISTEN I rebooted again, but this time I ran this on my wife's box: # trace-cmd record -e nfs -e nfs4 -e net -e skb -e sock -e udp -e workqueue -e sunrpc I started it when my server started booting the kernel, and kept it running till the port vanished. The full trace can be downloaded from http://rostedt.homelinux.com/private/wife-trace.txt Here's some interesting output from that trace: ksoftirq-13 1..s. 12272627.681760: netif_receive_skb:dev=lo skbaddr=0x88020944c600 len=88 ksoftirq-13 1..s. 12272627.681776: net_dev_queue:dev=eth0 skbaddr=0x880234e5b100 len=42 ksoftirq-13 1..s. 12272627.681777: net_dev_start_xmit: dev=eth0 queue_mapping=0 skbaddr=0x880234e5b100 vlan_tagged=0 vlan_proto=0x vlan_tci=0x protocol=0x0806 ip_ summed=0 len=42 data_len=0 network_offset=14 transport_offset_valid=0 transport_offset=65533 tx_flags=0 gso_size=0 gso_segs=0 gso_type=0 ksoftirq-13 1..s. 12272627.681779: net_dev_xmit: dev=eth0 skbaddr=0x880234e5b100 len=42 rc=0 ksoftirq-13 1..s. 12272627.681780: kfree_skb: skbaddr=0x88023444cf00 protocol=2048 location=0x81422a72 ksoftirq-13 1..s. 12272627.681783: rpc_socket_error: error=-113 socket:[11886206] dstaddr=192.168.23.9/2049 state=2 () sk_state=2 () ksoftirq-13 1..s. 12272627.681785: rpc_task_wakeup: task:18128@0 flags=5281 state=0006 status=-113 timeout=45000 queue=xprt_pending ksoftirq-13 1d.s. 12272627.681786: workqueue_queue_work: work struct=0x8800b5a94588 function=rpc_async_schedule workqueue=0x880234666800 req_cpu=512 cpu=1 ksoftirq-13 1d.s. 12272627.681787: workqueue_activate_work: work struct 0x8800b5a94588 ksoftirq-13 1..s. 12272627.681791: rpc_socket_state_change: socket:[11886206] dstaddr=192.168.23.9/2049 state=2 () sk_state=7 () ksoftirq-13 1..s. 12272627.681792: kfree_skb: skbaddr=0x88020944c600 protocol=2048 location=0x81482c05 kworker/-20111 1 12272627.681796: workqueue_execute_start: work struct 0x8800b5a94588: function rpc_async_schedule kworker/-20111 1 12272627.681797: rpc_task_run_action: task:18128@0 flags=5281 state=0005 status=-113 action=call_connect_status kworker/-20111 1 12272627.681798: rpc_task_run_action: task:18128@0 flags=5281 state=0005 status=-113 action=call_connect_status kworker/-20111 1 12272627.681798: rpc_connect_status: task:18128@0, status -113 kworker/-20111 1..s. 12272627.681799: rpc_task_sleep: task:18128@0 flags=5281 state=0005 status=0 timeout=750 queue=delayq kworker/-20111 1 12272627.681800: workqueue_execute_end: work struct 0x8800b5a94588 -0 1..s. 12272630.688741: rpc_task_wakeup: task:18128@0 flags=5281 state=0006 status=-110 timeout=750 queue=delayq -0 1dNs. 12272630.688749: workqueue_queue_work: work struct=0x8800b5a94588 function=rpc_async_schedule workqueue=0x880234666800 req_cpu=512 cpu=1 -0 1dNs. 12272630.688749: workqueue_activate_work: work struct 0x8800b5a94588 kworker/-20111 1 12272630.688758: workqueue_execute_start: work struct 0x8800b5a94588: function rpc_async_schedule kworker/-20111 1 12272630.688759: rpc_task_run_action: task:18128@0 flags=5281 state=0005 status=-110 action=call_timeout kworker/-20111 1 12272630.688760: rpc_task_run_action: task:18128@0 flags=5281 state=0005 status=0 action=call_timeout kworker/-20111 1 12272630.688760: rpc_task_run_action: task:18128@0 flags=5281 state=0005 status=0 action=call_bind kworker/-20111 1 12272630.688761: rpc_task_run_action: task:18128@0 flags=5281 state=0005 status=0 action=call_connect kworker/-20111 1..s. 12272630.688762: rpc_task_sleep: task:18128@0 flags=5281 state=0005 status=0 timeout=45000 queue=xprt_pending kworker/-20111 1 12272630.688765: workqueue_execute_end: work struct 0x8800b5a94588 -0 3d.s. 12272630.696742: workqueue_queue_work: work struct
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Thu, 18 Jun 2015 21:37:02 -0400 Jeff Layton wrote: > > Note, the box has been rebooted since I posted my last trace. > > > > Ahh pity. The port has probably changed...if you trace it again maybe > try to figure out what it's talking to before rebooting the server? I could probably re-enable the trace again. Would it be best if I put back the commits and run it with the buggy kernel. I could then run these commands after the bug happens and/or before the port goes away. > Oh! I was thinking that you were seeing this extra port on the > _client_, but now rereading your original mail I see that it's > appearing up on the NFS server. Is that correct? Correct, the bug is on the NFS server, not the client. The client is already up and running, and had the filesystem mounted when the server rebooted. I take it that this happened when the client tried to reconnect. Just let me know what you would like to do. As this is my main production server of my local network, I would only be able to do this a few times. Let me know all the commands and tracing you would like to have. I'll try it tomorrow (going to bed now). -- Steve > > So, assuming that this is NFSv4.0, then this port is probably bound > when the server is establishing the callback channel to the client. So > we may need to look at how those xprts are being created and whether > there are differences from a standard client xprt. > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Thu, 18 Jun 2015 21:08:43 -0400 Steven Rostedt wrote: > On Thu, 18 Jun 2015 18:50:51 -0400 > Jeff Layton wrote: > > > The interesting bit here is that the sockets all seem to connect to port > > 55201 on the remote host, if I'm reading these traces correctly. What's > > listening on that port on the server? > > > > This might give some helpful info: > > > > $ rpcinfo -p > > # rpcinfo -p wife >program vers proto port service > 104 tcp111 portmapper > 103 tcp111 portmapper > 102 tcp111 portmapper > 104 udp111 portmapper > 103 udp111 portmapper > 102 udp111 portmapper > 1000241 udp 34243 status > 1000241 tcp 34498 status > > # rpcinfo -p localhost >program vers proto port service > 104 tcp111 portmapper > 103 tcp111 portmapper > 102 tcp111 portmapper > 104 udp111 portmapper > 103 udp111 portmapper > 102 udp111 portmapper > 1000241 udp 38332 status > 1000241 tcp 52684 status > 132 tcp 2049 nfs > 133 tcp 2049 nfs > 134 tcp 2049 nfs > 1002272 tcp 2049 > 1002273 tcp 2049 > 132 udp 2049 nfs > 133 udp 2049 nfs > 134 udp 2049 nfs > 1002272 udp 2049 > 1002273 udp 2049 > 1000211 udp 53218 nlockmgr > 1000213 udp 53218 nlockmgr > 1000214 udp 53218 nlockmgr > 1000211 tcp 49825 nlockmgr > 1000213 tcp 49825 nlockmgr > 1000214 tcp 49825 nlockmgr > 151 udp 49166 mountd > 151 tcp 48797 mountd > 152 udp 47856 mountd > 152 tcp 53839 mountd > 153 udp 36090 mountd > 153 tcp 46390 mountd > > Note, the box has been rebooted since I posted my last trace. > Ahh pity. The port has probably changed...if you trace it again maybe try to figure out what it's talking to before rebooting the server? > > > > Also, what NFS version are you using to mount here? Your fstab entries > > suggest that you're using the default version (for whatever distro this > > is), but have you (e.g.) set up nfsmount.conf to default to v3 on this > > box? > > > > My box is Debian testing (recently updated). > > # dpkg -l nfs-* > > ii nfs-common 1:1.2.8-9amd64NFS support files common to clien > ii nfs-kernel-ser 1:1.2.8-9amd64support for NFS kernel server > > > same for both boxes. > > nfsmount.conf doesn't exist on either box. > > I'm assuming it is using nfs4. > (cc'ing Bruce) Oh! I was thinking that you were seeing this extra port on the _client_, but now rereading your original mail I see that it's appearing up on the NFS server. Is that correct? So, assuming that this is NFSv4.0, then this port is probably bound when the server is establishing the callback channel to the client. So we may need to look at how those xprts are being created and whether there are differences from a standard client xprt. -- Jeff Layton -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Thu, 18 Jun 2015 18:50:51 -0400 Jeff Layton wrote: > The interesting bit here is that the sockets all seem to connect to port > 55201 on the remote host, if I'm reading these traces correctly. What's > listening on that port on the server? > > This might give some helpful info: > > $ rpcinfo -p # rpcinfo -p wife program vers proto port service 104 tcp111 portmapper 103 tcp111 portmapper 102 tcp111 portmapper 104 udp111 portmapper 103 udp111 portmapper 102 udp111 portmapper 1000241 udp 34243 status 1000241 tcp 34498 status # rpcinfo -p localhost program vers proto port service 104 tcp111 portmapper 103 tcp111 portmapper 102 tcp111 portmapper 104 udp111 portmapper 103 udp111 portmapper 102 udp111 portmapper 1000241 udp 38332 status 1000241 tcp 52684 status 132 tcp 2049 nfs 133 tcp 2049 nfs 134 tcp 2049 nfs 1002272 tcp 2049 1002273 tcp 2049 132 udp 2049 nfs 133 udp 2049 nfs 134 udp 2049 nfs 1002272 udp 2049 1002273 udp 2049 1000211 udp 53218 nlockmgr 1000213 udp 53218 nlockmgr 1000214 udp 53218 nlockmgr 1000211 tcp 49825 nlockmgr 1000213 tcp 49825 nlockmgr 1000214 tcp 49825 nlockmgr 151 udp 49166 mountd 151 tcp 48797 mountd 152 udp 47856 mountd 152 tcp 53839 mountd 153 udp 36090 mountd 153 tcp 46390 mountd Note, the box has been rebooted since I posted my last trace. > > Also, what NFS version are you using to mount here? Your fstab entries > suggest that you're using the default version (for whatever distro this > is), but have you (e.g.) set up nfsmount.conf to default to v3 on this > box? > My box is Debian testing (recently updated). # dpkg -l nfs-* ii nfs-common 1:1.2.8-9amd64NFS support files common to clien ii nfs-kernel-ser 1:1.2.8-9amd64support for NFS kernel server same for both boxes. nfsmount.conf doesn't exist on either box. I'm assuming it is using nfs4. Anything else I can provide? -- Steve -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Thu, 18 Jun 2015 15:49:14 -0400 Steven Rostedt wrote: > On Thu, 18 Jun 2015 15:24:52 -0400 > Trond Myklebust wrote: > > > On Wed, Jun 17, 2015 at 11:08 PM, Steven Rostedt > > wrote: > > > On Fri, 12 Jun 2015 11:50:38 -0400 > > > Steven Rostedt wrote: > > > > > >> I reverted the following commits: > > >> > > >> c627d31ba0696cbd829437af2be2f2dee3546b1e > > >> 9e2b9f37760e129cee053cc7b6e7288acc2a7134 > > >> caf4ccd4e88cf2795c927834bc488c8321437586 > > >> > > >> And the issue goes away. That is, I watched the port go from > > >> ESTABLISHED to TIME_WAIT, and then gone, and theirs no hidden port. > > >> > > >> In fact, I watched the port with my portlist.c module, and it > > >> disappeared there too when it entered the TIME_WAIT state. > > >> > > > > I've scanned those commits again and again, and I'm not seeing how we > > could be introducing a socket leak there. The only suspect I can see > > would be the NFS swap bugs that Jeff fixed a few weeks ago. Are you > > using NFS swap? > > Not that I'm aware of. > > > > > > I've been running v4.0.5 with the above commits reverted for 5 days > > > now, and there's still no hidden port appearing. > > > > > > What's the status on this? Should those commits be reverted or is there > > > another solution to this bug? > > > > > > > I'm trying to reproduce, but I've had no luck yet. > > It seems to happen with the connection to my wife's machine, and that > is where my wife's box connects two directories via nfs: > > This is what's in my wife's /etc/fstab directory > > goliath:/home/upload /upload nfs auto,rw,intr,soft 0 0 > goliath:/home/gallery/gallerynfs auto,ro,intr,soft 0 0 > > And here's what's in my /etc/exports directory > > /home/upload wife(no_root_squash,no_all_squash,rw,sync,no_subtree_check) > /home/gallery 192.168.23.0/24(ro,sync,no_subtree_check) > > Attached is my config. > The interesting bit here is that the sockets all seem to connect to port 55201 on the remote host, if I'm reading these traces correctly. What's listening on that port on the server? This might give some helpful info: $ rpcinfo -p Also, what NFS version are you using to mount here? Your fstab entries suggest that you're using the default version (for whatever distro this is), but have you (e.g.) set up nfsmount.conf to default to v3 on this box? -- Jeff Layton -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Thu, 18 Jun 2015 15:24:52 -0400 Trond Myklebust wrote: > On Wed, Jun 17, 2015 at 11:08 PM, Steven Rostedt wrote: > > On Fri, 12 Jun 2015 11:50:38 -0400 > > Steven Rostedt wrote: > > > >> I reverted the following commits: > >> > >> c627d31ba0696cbd829437af2be2f2dee3546b1e > >> 9e2b9f37760e129cee053cc7b6e7288acc2a7134 > >> caf4ccd4e88cf2795c927834bc488c8321437586 > >> > >> And the issue goes away. That is, I watched the port go from > >> ESTABLISHED to TIME_WAIT, and then gone, and theirs no hidden port. > >> > >> In fact, I watched the port with my portlist.c module, and it > >> disappeared there too when it entered the TIME_WAIT state. > >> > > I've scanned those commits again and again, and I'm not seeing how we > could be introducing a socket leak there. The only suspect I can see > would be the NFS swap bugs that Jeff fixed a few weeks ago. Are you > using NFS swap? Not that I'm aware of. > > > I've been running v4.0.5 with the above commits reverted for 5 days > > now, and there's still no hidden port appearing. > > > > What's the status on this? Should those commits be reverted or is there > > another solution to this bug? > > > > I'm trying to reproduce, but I've had no luck yet. It seems to happen with the connection to my wife's machine, and that is where my wife's box connects two directories via nfs: This is what's in my wife's /etc/fstab directory goliath:/home/upload /upload nfs auto,rw,intr,soft 0 0 goliath:/home/gallery/gallerynfs auto,ro,intr,soft 0 0 And here's what's in my /etc/exports directory /home/upload wife(no_root_squash,no_all_squash,rw,sync,no_subtree_check) /home/gallery 192.168.23.0/24(ro,sync,no_subtree_check) Attached is my config. -- Steve config.gz Description: application/gzip
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Wed, Jun 17, 2015 at 11:08 PM, Steven Rostedt wrote: > On Fri, 12 Jun 2015 11:50:38 -0400 > Steven Rostedt wrote: > >> I reverted the following commits: >> >> c627d31ba0696cbd829437af2be2f2dee3546b1e >> 9e2b9f37760e129cee053cc7b6e7288acc2a7134 >> caf4ccd4e88cf2795c927834bc488c8321437586 >> >> And the issue goes away. That is, I watched the port go from >> ESTABLISHED to TIME_WAIT, and then gone, and theirs no hidden port. >> >> In fact, I watched the port with my portlist.c module, and it >> disappeared there too when it entered the TIME_WAIT state. >> I've scanned those commits again and again, and I'm not seeing how we could be introducing a socket leak there. The only suspect I can see would be the NFS swap bugs that Jeff fixed a few weeks ago. Are you using NFS swap? > I've been running v4.0.5 with the above commits reverted for 5 days > now, and there's still no hidden port appearing. > > What's the status on this? Should those commits be reverted or is there > another solution to this bug? > I'm trying to reproduce, but I've had no luck yet. Cheers Trond -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 12 Jun 2015 11:50:38 -0400 Steven Rostedt wrote: > I reverted the following commits: > > c627d31ba0696cbd829437af2be2f2dee3546b1e > 9e2b9f37760e129cee053cc7b6e7288acc2a7134 > caf4ccd4e88cf2795c927834bc488c8321437586 > > And the issue goes away. That is, I watched the port go from > ESTABLISHED to TIME_WAIT, and then gone, and theirs no hidden port. > > In fact, I watched the port with my portlist.c module, and it > disappeared there too when it entered the TIME_WAIT state. > I've been running v4.0.5 with the above commits reverted for 5 days now, and there's still no hidden port appearing. What's the status on this? Should those commits be reverted or is there another solution to this bug? -- Steve -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 12 Jun 2015 11:50:38 -0400 Steven Rostedt wrote: > On Fri, 12 Jun 2015 11:34:20 -0400 > Steven Rostedt wrote: > > > And the issue goes away. That is, I watched the port go from > ESTABLISHED to TIME_WAIT, and then gone, and theirs no hidden port. > s/theirs/there's/ Time to go back to grammar school. :-p -- Steve -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 12 Jun 2015 11:34:20 -0400 Steven Rostedt wrote: > On Fri, 12 Jun 2015 07:40:35 -0700 > Eric Dumazet wrote: > > > Strange, because the usual way to not have time-wait is to use SO_LINGER > > with linger=0 > > > > And apparently xs_tcp_finish_connecting() has this : > > > > sock_reset_flag(sk, SOCK_LINGER); > > tcp_sk(sk)->linger2 = 0; > > > > Are you sure SO_REUSEADDR was not the thing you wanted ? > > > > Steven, have you tried kmemleak ? > > Nope, and again, I'm hesitant on adding too much debug. This is my main > server (build server, ssh server, web server, mail server, proxy > server, irc server, etc). > > Although, I made dprintk() into trace_printk() in xprtsock.c and > xprt.c, and reran it. Here's the output: > I reverted the following commits: c627d31ba0696cbd829437af2be2f2dee3546b1e 9e2b9f37760e129cee053cc7b6e7288acc2a7134 caf4ccd4e88cf2795c927834bc488c8321437586 And the issue goes away. That is, I watched the port go from ESTABLISHED to TIME_WAIT, and then gone, and theirs no hidden port. In fact, I watched the port with my portlist.c module, and it disappeared there too when it entered the TIME_WAIT state. Here's the trace of that run: # tracer: nop # # entries-in-buffer/entries-written: 397/397 #P:4 # # _-=> irqs-off # / _=> need-resched #| / _---=> hardirq/softirq #|| / _--=> preempt-depth #||| / delay # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | rpc.nfsd-3932 [002] 44.098689: xs_local_setup_socket: RPC: worker connecting xprt 88040b6f5800 via AF_LOCAL to /var/run/rpcbind.sock rpc.nfsd-3932 [002] 44.098699: xs_local_setup_socket: RPC: xprt 88040b6f5800 connected to /var/run/rpcbind.sock rpc.nfsd-3932 [002] 44.098700: xs_setup_local: RPC: set up xprt to /var/run/rpcbind.sock via AF_LOCAL rpc.nfsd-3932 [002] 44.098704: xprt_create_transport: RPC: created transport 88040b6f5800 with 65536 slots rpc.nfsd-3932 [002] 44.098717: xprt_alloc_slot: RPC: 1 reserved req 8800d8cc6800 xid 0850084b rpc.nfsd-3932 [002] 44.098720: xprt_prepare_transmit: RPC: 1 xprt_prepare_transmit rpc.nfsd-3932 [002] 44.098721: xprt_transmit: RPC: 1 xprt_transmit(44) rpc.nfsd-3932 [002] 44.098724: xs_local_send_request: RPC: xs_local_send_request(44) = 0 rpc.nfsd-3932 [002] 44.098724: xprt_transmit: RPC: 1 xmit complete rpcbind-1829 [001] ..s.44.098812: xs_local_data_ready: RPC: xs_local_data_ready... rpcbind-1829 [001] ..s.44.098815: xprt_complete_rqst: RPC: 1 xid 0850084b complete (24 bytes received) rpc.nfsd-3932 [002] 44.098854: xprt_release: RPC: 1 release request 8800d8cc6800 rpc.nfsd-3932 [002] 44.098864: xprt_alloc_slot: RPC: 2 reserved req 8800d8cc6800 xid 0950084b rpc.nfsd-3932 [002] 44.098865: xprt_prepare_transmit: RPC: 2 xprt_prepare_transmit rpc.nfsd-3932 [002] 44.098865: xprt_transmit: RPC: 2 xprt_transmit(44) rpc.nfsd-3932 [002] 44.098870: xs_local_send_request: RPC: xs_local_send_request(44) = 0 rpc.nfsd-3932 [002] 44.098870: xprt_transmit: RPC: 2 xmit complete rpcbind-1829 [001] ..s.44.098915: xs_local_data_ready: RPC: xs_local_data_ready... rpcbind-1829 [001] ..s.44.098917: xprt_complete_rqst: RPC: 2 xid 0950084b complete (24 bytes received) rpc.nfsd-3932 [002] 44.098968: xprt_release: RPC: 2 release request 8800d8cc6800 rpc.nfsd-3932 [002] 44.098971: xprt_alloc_slot: RPC: 3 reserved req 8800d8cc6800 xid 0a50084b rpc.nfsd-3932 [002] 44.098972: xprt_prepare_transmit: RPC: 3 xprt_prepare_transmit rpc.nfsd-3932 [002] 44.098973: xprt_transmit: RPC: 3 xprt_transmit(68) rpc.nfsd-3932 [002] 44.098978: xs_local_send_request: RPC: xs_local_send_request(68) = 0 rpc.nfsd-3932 [002] 44.098978: xprt_transmit: RPC: 3 xmit complete rpcbind-1829 [001] ..s.44.099029: xs_local_data_ready: RPC: xs_local_data_ready... rpcbind-1829 [001] ..s.44.099031: xprt_complete_rqst: RPC: 3 xid 0a50084b complete (28 bytes received) rpc.nfsd-3932 [002] 44.099083: xprt_release: RPC: 3 release request 8800d8cc6800 rpc.nfsd-3932 [002] 44.099086: xprt_alloc_slot: RPC: 4 reserved req 8800d8cc6800 xid 0b50084b rpc.nfsd-3932 [002] 44.099086: xprt_prepare_transmit: RPC: 4 xprt_prepare_tra
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 2015-06-12 at 10:57 -0400, Trond Myklebust wrote: > On Fri, Jun 12, 2015 at 10:40 AM, Eric Dumazet wrote: > > Strange, because the usual way to not have time-wait is to use SO_LINGER > > with linger=0 > > > > And apparently xs_tcp_finish_connecting() has this : > > > > sock_reset_flag(sk, SOCK_LINGER); > > tcp_sk(sk)->linger2 = 0; > > Are you sure? I thought that SO_LINGER is more about controlling how > the socket behaves w.r.t. waiting for the TCP_CLOSE state to be > achieved (i.e. about aborting the FIN state negotiation early). I've > never observed an effect on the TCP time-wait states. Definitely this is standard way to avoid time-wait states. Maybe not very well documented. We probably should... http://stackoverflow.com/questions/3757289/tcp-option-so-linger-zero-when-its-required > Yes. SO_REUSEADDR has the problem that it requires you bind to > something other than 0.0.0.0, so it is less appropriate for outgoing > connections; the RPC code really should not have to worry about > routing and routability of a particular source address. OK understood. Are you trying to reuse same 4-tuple ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 12 Jun 2015 07:40:35 -0700 Eric Dumazet wrote: > Strange, because the usual way to not have time-wait is to use SO_LINGER > with linger=0 > > And apparently xs_tcp_finish_connecting() has this : > > sock_reset_flag(sk, SOCK_LINGER); > tcp_sk(sk)->linger2 = 0; > > Are you sure SO_REUSEADDR was not the thing you wanted ? > > Steven, have you tried kmemleak ? Nope, and again, I'm hesitant on adding too much debug. This is my main server (build server, ssh server, web server, mail server, proxy server, irc server, etc). Although, I made dprintk() into trace_printk() in xprtsock.c and xprt.c, and reran it. Here's the output: (port 684 was the bad one this time) # tracer: nop # # entries-in-buffer/entries-written: 396/396 #P:4 # # _-=> irqs-off # / _=> need-resched #| / _---=> hardirq/softirq #|| / _--=> preempt-depth #||| / delay # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | rpc.nfsd-4710 [002] 48.615382: xs_local_setup_socket: RPC: worker connecting xprt 8800d9018000 via AF_LOCAL to /var/run/rpcbind.sock rpc.nfsd-4710 [002] 48.615393: xs_local_setup_socket: RPC: xprt 8800d9018000 connected to /var/run/rpcbind.sock rpc.nfsd-4710 [002] 48.615394: xs_setup_local: RPC: set up xprt to /var/run/rpcbind.sock via AF_LOCAL rpc.nfsd-4710 [002] 48.615399: xprt_create_transport: RPC: created transport 8800d9018000 with 65536 slots rpc.nfsd-4710 [002] 48.615416: xprt_alloc_slot: RPC: 1 reserved req 8800db829600 xid cb06d5e8 rpc.nfsd-4710 [002] 48.615419: xprt_prepare_transmit: RPC: 1 xprt_prepare_transmit rpc.nfsd-4710 [002] 48.615420: xprt_transmit: RPC: 1 xprt_transmit(44) rpc.nfsd-4710 [002] 48.615424: xs_local_send_request: RPC: xs_local_send_request(44) = 0 rpc.nfsd-4710 [002] 48.615425: xprt_transmit: RPC: 1 xmit complete rpcbind-1829 [003] ..s.48.615503: xs_local_data_ready: RPC: xs_local_data_ready... rpcbind-1829 [003] ..s.48.615506: xprt_complete_rqst: RPC: 1 xid cb06d5e8 complete (24 bytes received) rpc.nfsd-4710 [002] 48.615556: xprt_release: RPC: 1 release request 8800db829600 rpc.nfsd-4710 [002] 48.615568: xprt_alloc_slot: RPC: 2 reserved req 8800db829600 xid cc06d5e8 rpc.nfsd-4710 [002] 48.615569: xprt_prepare_transmit: RPC: 2 xprt_prepare_transmit rpc.nfsd-4710 [002] 48.615569: xprt_transmit: RPC: 2 xprt_transmit(44) rpc.nfsd-4710 [002] 48.615578: xs_local_send_request: RPC: xs_local_send_request(44) = 0 rpc.nfsd-4710 [002] 48.615578: xprt_transmit: RPC: 2 xmit complete rpcbind-1829 [003] ..s.48.615643: xs_local_data_ready: RPC: xs_local_data_ready... rpcbind-1829 [003] ..s.48.615645: xprt_complete_rqst: RPC: 2 xid cc06d5e8 complete (24 bytes received) rpc.nfsd-4710 [002] 48.615695: xprt_release: RPC: 2 release request 8800db829600 rpc.nfsd-4710 [002] 48.615698: xprt_alloc_slot: RPC: 3 reserved req 8800db829600 xid cd06d5e8 rpc.nfsd-4710 [002] 48.615699: xprt_prepare_transmit: RPC: 3 xprt_prepare_transmit rpc.nfsd-4710 [002] 48.615700: xprt_transmit: RPC: 3 xprt_transmit(68) rpc.nfsd-4710 [002] 48.615706: xs_local_send_request: RPC: xs_local_send_request(68) = 0 rpc.nfsd-4710 [002] 48.615707: xprt_transmit: RPC: 3 xmit complete rpcbind-1829 [003] ..s.48.615784: xs_local_data_ready: RPC: xs_local_data_ready... rpcbind-1829 [003] ..s.48.615785: xprt_complete_rqst: RPC: 3 xid cd06d5e8 complete (28 bytes received) rpc.nfsd-4710 [002] 48.615830: xprt_release: RPC: 3 release request 8800db829600 rpc.nfsd-4710 [002] 48.615833: xprt_alloc_slot: RPC: 4 reserved req 8800db829600 xid ce06d5e8 rpc.nfsd-4710 [002] 48.615834: xprt_prepare_transmit: RPC: 4 xprt_prepare_transmit rpc.nfsd-4710 [002] 48.615835: xprt_transmit: RPC: 4 xprt_transmit(68) rpc.nfsd-4710 [002] 48.615841: xs_local_send_request: RPC: xs_local_send_request(68) = 0 rpc.nfsd-4710 [002] 48.615841: xprt_transmit: RPC: 4 xmit complete rpcbind-1829 [003] ..s.48.615892: xs_local_data_ready: RPC: xs_local_data_ready... rpcbind-1829 [003] ..s.48.615894: xprt_complete_rqst: RPC: 4 xid ce06d5e8 complete (
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, Jun 12, 2015 at 10:40 AM, Eric Dumazet wrote: > On Fri, 2015-06-12 at 10:10 -0400, Trond Myklebust wrote: >> On Thu, Jun 11, 2015 at 11:49 PM, Steven Rostedt wrote: >> > >> > I recently upgraded my main server to 4.0.4 from 3.19.5 and rkhunter >> > started reporting a hidden port on my box. >> > >> > Running unhide-tcp I see this: >> > >> > # unhide-tcp >> > Unhide-tcp 20121229 >> > Copyright © 2012 Yago Jesus & Patrick Gouin >> > License GPLv3+ : GNU GPL version 3 or later >> > http://www.unhide-forensics.info >> > Used options: >> > [*]Starting TCP checking >> > >> > Found Hidden port that not appears in ss: 946 >> > [*]Starting UDP checking >> > >> > This scared the hell out of me as I'm thinking that I have got some kind >> > of NSA backdoor hooked into my server and it is monitoring my plans to >> > smuggle Kinder Überraschung into the USA from Germany. I panicked! >> > >> > Well, I wasted the day writing modules to first look at all the sockets >> > opened by all processes (via their file descriptors) and posted their >> > port numbers. >> > >> > http://rostedt.homelinux.com/private/tasklist.c >> > >> > But this port wasn't there either. >> > >> > Then I decided to look at the ports in tcp_hashinfo. >> > >> > http://rostedt.homelinux.com/private/portlist.c >> > >> > This found the port but no file was connected to it, and worse yet, >> > when I first ran it without using probe_kernel_read(), it crashed my >> > kernel, because sk->sk_socket pointed to a freed socket! >> > >> > Note, each boot, the hidden port is different. >> > >> > Finally, I decided to bring in the big guns, and inserted a >> > trace_printk() into the bind logic, to see if I could find the culprit. >> > After fiddling with it a few times, I found a suspect: >> > >> >kworker/3:1H-123 [003] ..s.96.696213: inet_bind_hash: add 946 >> > >> > Bah, it's a kernel thread doing it, via a work queue. I then added a >> > trace_dump_stack() to find what was calling this, and here it is: >> > >> > kworker/3:1H-123 [003] ..s.96.696222: >> > => inet_csk_get_port >> > => inet_addr_type >> > => inet_bind >> > => xs_bind >> > => sock_setsockopt >> > => __sock_create >> > => xs_create_sock.isra.18 >> > => xs_tcp_setup_socket >> > => process_one_work >> > => worker_thread >> > => worker_thread >> > => kthread >> > => kthread >> > => ret_from_fork >> > => kthread >> > >> > I rebooted, and examined what happens. I see the kworker binding that >> > port, and all seems well: >> > >> > # netstat -tapn |grep 946 >> > tcp0 0 192.168.23.9:946192.168.23.22:55201 >> > ESTABLISHED - >> > >> > But waiting for a bit, the connection goes into a TIME_WAIT, and then >> > it just disappears. But the bind to the port does not get released, and >> > that port is from then on, taken. >> > >> > This never happened with my 3.19 kernels. I would bisect it but this is >> > happening on my main server box which I usually only reboot every other >> > month doing upgrades. It causes too much disturbance for myself (and my >> > family) as when this box is offline, basically the rest of my machines >> > are too. >> > >> > I figured this may be enough information to see if you can fix it. >> > Otherwise I can try to do the bisect, but that's not going to happen >> > any time soon. I may just go back to 3.19 for now, such that rkhunter >> > stops complaining about the hidden port. >> > >> >> The only new thing that we're doing with 4.0 is to set SO_REUSEPORT on >> the socket before binding the port (commit 4dda9c8a5e34: "SUNRPC: Set >> SO_REUSEPORT socket option for TCP connections"). Perhaps there is an >> issue with that? > > Strange, because the usual way to not have time-wait is to use SO_LINGER > with linger=0 > > And apparently xs_tcp_finish_connecting() has this : > > sock_reset_flag(sk, SOCK_LINGER); > tcp_sk(sk)->linger2 = 0; Are you sure? I thought that SO_LINGER is more about controlling how the socket behaves w.r.t. waiting for the TCP_CLOSE state to be achieved (i.e. about aborting the FIN state negotiation early). I've never observed an effect on the TCP time-wait states. > Are you sure SO_REUSEADDR was not the thing you wanted ? Yes. SO_REUSEADDR has the problem that it requires you bind to something other than 0.0.0.0, so it is less appropriate for outgoing connections; the RPC code really should not have to worry about routing and routability of a particular source address. Cheers Trond -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 2015-06-12 at 10:10 -0400, Trond Myklebust wrote: > On Thu, Jun 11, 2015 at 11:49 PM, Steven Rostedt wrote: > > > > I recently upgraded my main server to 4.0.4 from 3.19.5 and rkhunter > > started reporting a hidden port on my box. > > > > Running unhide-tcp I see this: > > > > # unhide-tcp > > Unhide-tcp 20121229 > > Copyright © 2012 Yago Jesus & Patrick Gouin > > License GPLv3+ : GNU GPL version 3 or later > > http://www.unhide-forensics.info > > Used options: > > [*]Starting TCP checking > > > > Found Hidden port that not appears in ss: 946 > > [*]Starting UDP checking > > > > This scared the hell out of me as I'm thinking that I have got some kind > > of NSA backdoor hooked into my server and it is monitoring my plans to > > smuggle Kinder Überraschung into the USA from Germany. I panicked! > > > > Well, I wasted the day writing modules to first look at all the sockets > > opened by all processes (via their file descriptors) and posted their > > port numbers. > > > > http://rostedt.homelinux.com/private/tasklist.c > > > > But this port wasn't there either. > > > > Then I decided to look at the ports in tcp_hashinfo. > > > > http://rostedt.homelinux.com/private/portlist.c > > > > This found the port but no file was connected to it, and worse yet, > > when I first ran it without using probe_kernel_read(), it crashed my > > kernel, because sk->sk_socket pointed to a freed socket! > > > > Note, each boot, the hidden port is different. > > > > Finally, I decided to bring in the big guns, and inserted a > > trace_printk() into the bind logic, to see if I could find the culprit. > > After fiddling with it a few times, I found a suspect: > > > >kworker/3:1H-123 [003] ..s.96.696213: inet_bind_hash: add 946 > > > > Bah, it's a kernel thread doing it, via a work queue. I then added a > > trace_dump_stack() to find what was calling this, and here it is: > > > > kworker/3:1H-123 [003] ..s.96.696222: > > => inet_csk_get_port > > => inet_addr_type > > => inet_bind > > => xs_bind > > => sock_setsockopt > > => __sock_create > > => xs_create_sock.isra.18 > > => xs_tcp_setup_socket > > => process_one_work > > => worker_thread > > => worker_thread > > => kthread > > => kthread > > => ret_from_fork > > => kthread > > > > I rebooted, and examined what happens. I see the kworker binding that > > port, and all seems well: > > > > # netstat -tapn |grep 946 > > tcp0 0 192.168.23.9:946192.168.23.22:55201 > > ESTABLISHED - > > > > But waiting for a bit, the connection goes into a TIME_WAIT, and then > > it just disappears. But the bind to the port does not get released, and > > that port is from then on, taken. > > > > This never happened with my 3.19 kernels. I would bisect it but this is > > happening on my main server box which I usually only reboot every other > > month doing upgrades. It causes too much disturbance for myself (and my > > family) as when this box is offline, basically the rest of my machines > > are too. > > > > I figured this may be enough information to see if you can fix it. > > Otherwise I can try to do the bisect, but that's not going to happen > > any time soon. I may just go back to 3.19 for now, such that rkhunter > > stops complaining about the hidden port. > > > > The only new thing that we're doing with 4.0 is to set SO_REUSEPORT on > the socket before binding the port (commit 4dda9c8a5e34: "SUNRPC: Set > SO_REUSEPORT socket option for TCP connections"). Perhaps there is an > issue with that? Strange, because the usual way to not have time-wait is to use SO_LINGER with linger=0 And apparently xs_tcp_finish_connecting() has this : sock_reset_flag(sk, SOCK_LINGER); tcp_sk(sk)->linger2 = 0; Are you sure SO_REUSEADDR was not the thing you wanted ? Steven, have you tried kmemleak ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Thu, Jun 11, 2015 at 11:49 PM, Steven Rostedt wrote: > > I recently upgraded my main server to 4.0.4 from 3.19.5 and rkhunter > started reporting a hidden port on my box. > > Running unhide-tcp I see this: > > # unhide-tcp > Unhide-tcp 20121229 > Copyright © 2012 Yago Jesus & Patrick Gouin > License GPLv3+ : GNU GPL version 3 or later > http://www.unhide-forensics.info > Used options: > [*]Starting TCP checking > > Found Hidden port that not appears in ss: 946 > [*]Starting UDP checking > > This scared the hell out of me as I'm thinking that I have got some kind > of NSA backdoor hooked into my server and it is monitoring my plans to > smuggle Kinder Überraschung into the USA from Germany. I panicked! > > Well, I wasted the day writing modules to first look at all the sockets > opened by all processes (via their file descriptors) and posted their > port numbers. > > http://rostedt.homelinux.com/private/tasklist.c > > But this port wasn't there either. > > Then I decided to look at the ports in tcp_hashinfo. > > http://rostedt.homelinux.com/private/portlist.c > > This found the port but no file was connected to it, and worse yet, > when I first ran it without using probe_kernel_read(), it crashed my > kernel, because sk->sk_socket pointed to a freed socket! > > Note, each boot, the hidden port is different. > > Finally, I decided to bring in the big guns, and inserted a > trace_printk() into the bind logic, to see if I could find the culprit. > After fiddling with it a few times, I found a suspect: > >kworker/3:1H-123 [003] ..s.96.696213: inet_bind_hash: add 946 > > Bah, it's a kernel thread doing it, via a work queue. I then added a > trace_dump_stack() to find what was calling this, and here it is: > > kworker/3:1H-123 [003] ..s.96.696222: > => inet_csk_get_port > => inet_addr_type > => inet_bind > => xs_bind > => sock_setsockopt > => __sock_create > => xs_create_sock.isra.18 > => xs_tcp_setup_socket > => process_one_work > => worker_thread > => worker_thread > => kthread > => kthread > => ret_from_fork > => kthread > > I rebooted, and examined what happens. I see the kworker binding that > port, and all seems well: > > # netstat -tapn |grep 946 > tcp0 0 192.168.23.9:946192.168.23.22:55201 > ESTABLISHED - > > But waiting for a bit, the connection goes into a TIME_WAIT, and then > it just disappears. But the bind to the port does not get released, and > that port is from then on, taken. > > This never happened with my 3.19 kernels. I would bisect it but this is > happening on my main server box which I usually only reboot every other > month doing upgrades. It causes too much disturbance for myself (and my > family) as when this box is offline, basically the rest of my machines > are too. > > I figured this may be enough information to see if you can fix it. > Otherwise I can try to do the bisect, but that's not going to happen > any time soon. I may just go back to 3.19 for now, such that rkhunter > stops complaining about the hidden port. > The only new thing that we're doing with 4.0 is to set SO_REUSEPORT on the socket before binding the port (commit 4dda9c8a5e34: "SUNRPC: Set SO_REUSEPORT socket option for TCP connections"). Perhaps there is an issue with that? Cheers Trond -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html