Re: pflocal socket bug
Svante Signell, le Fri 11 Sep 2015 19:50:21 +0200, a écrit : > Well, the Linux manpage for connect says: The linux manpage is not supposed to be relevant. > While POSIX says: > http://pubs.opengroup.org/onlinepubs/9699919799/functions/connect.html > [EADDRNOTAVAIL] > The specified address is not available from the local machine. > [ECONNREFUSED] > The target address was not listening for connections or refused > the connection request. > > To me the best explanation is "No-one listening on the remote address." Take care that we may also be in the connection-less case, which does not have the notion of listening and connection. > and "The target address was not listening for connections or refused the > connection request." Additionally it is related to connect() by the > error name itself. In the connection-less case, connect is not meant to connect, but just to specify the default destination address only. > The Hurd patch is very simple: If you didn't mangle it by pasting it twice :) > Why not align to other OSes? "aligning" is not necessarily the right solution. But here it should be fine, yes. I have fixed both connect() and sendto(). Samuel
Re: pflocal socket bug
On Fri, 2015-09-11 at 17:43 +0200, Samuel Thibault wrote: > Svante Signell, le Fri 11 Sep 2015 17:42:19 +0200, a écrit : > > It seems like EADDRNOTAVAIL is an appropriate error, > > It is appropriate according to POSIX. > > > Otherwise I can create a patch for the socklog package. WDYT? > > That makes sense since POSIX allows it. Well, the Linux manpage for connect says: EADDRNOTAVAIL (Internet domain sockets) The socket referred to by sockfd had not previously been bound to an address and, upon attempting to bind it to an ephemeral port, it was determined that all port numbers in the ephemeral port range are currently in use. See the discussion of /proc/sys/net/ipv4/ip_local_port_range in ip(7). Note the comment: (Internet domain sockets), i.e. AF_INET not AF_UNIX. ECONNREFUSED No-one listening on the remote address. While POSIX says: http://pubs.opengroup.org/onlinepubs/9699919799/functions/connect.html [EADDRNOTAVAIL] The specified address is not available from the local machine. [ECONNREFUSED] The target address was not listening for connections or refused the connection request. To me the best explanation is "No-one listening on the remote address." and "The target address was not listening for connections or refused the connection request." Additionally it is related to connect() by the error name itself. The Hurd patch is very simple: --- pflocal/socket.c~2015-09-11 19:46:33.0 +0200 +++ pflocal/socket.c2015-09-11 19:44:33.0 +0200 @@ -118,6 +118,8 @@ return EOPNOTSUPP; err = addr_get_sock (addr, ); + if (err == EADDRNOTAVAIL) +err = ECONNREFUSED;--- pflocal/socket.c2015-09-11 19:46:33.0 +0200 +++ pflocal/socket.c2015-09-11 19:44:33.0 +0200 @@ -118,6 +118,8 @@ return EOPNOTSUPP; err = addr_get_sock (addr, ); + if (err == EADDRNOTAVAIL) +err = ECONNREFUSED; if (!err) { struct sock *sock = user->sock; if (!err) { struct sock *sock = user->sock; Additionally the following removes the declaration warning: --- pflocal/sock.h~ 2015-09-11 16:28:10.0 +0200 +++ pflocal/sock.h 2015-09-11 19:28:46.0 +0200 @@ -84,6 +84,9 @@ struct connq *connect_queue; }; +/* Bind SOCK to ADDR. */ +error_t sock_bind (struct sock *sock, struct addr *addr); + /* Socket flags */ #define PFLOCAL_SOCK_CONNECTED 0x1 /* A connected connection-oriented sock. */ #define PFLOCAL_SOCK_NONBLOCK 0x2 /* Don't block on I/O. */ @@ -154,9 +157,6 @@ /* Return a new user port on SOCK in PORT. */ error_t sock_create_port (struct sock *sock, mach_port_t *port); -/* Bind SOCK to ADDR. */ -error_t sock_bind (struct sock *sock, struct addr *addr); - /* Returns SOCK's address in ADDR, with an additional reference added. If SOCK doesn't currently have an address, one is fabricated first. */ error_t sock_get_addr (struct sock *sock, struct addr **addr); Why not align to other OSes?
Re: pflocal socket bug
On Fri, 2015-09-11 at 02:10 +0200, Samuel Thibault wrote: > Hello, > > Thanks for the detailed description and testcase! I could fix it, this > case was actually apparently completely untested: reference counting was > never reaching zero, and thus the sock never actually shut down. Nice to see such a fast solution! I think I even understood most of it :) So the problem was all about reference counting in socket.c (S_socket_connect). I searched for a solution in the earlier call to sock.c (sock_connect) in that function. Oh well! Maybe this also solves the ssh (intermittent) problems when unlinking/removing a named socket.
Re: pflocal socket bug
On Fri, 2015-09-11 at 10:34 +0200, Svante Signell wrote: > On Fri, 2015-09-11 at 02:10 +0200, Samuel Thibault wrote: > > Hello, > > > > Thanks for the detailed description and testcase! I could fix it, this > > case was actually apparently completely untested: reference counting was > > never reaching zero, and thus the sock never actually shut down. > > Nice to see such a fast solution! I think I even understood most of > it :) I've tested the patched pflocal and the broken socket is reported in the client connect() call :) One small problem though with the socklog package: It assumes ECONNREFUSED and now pflocal/connect reports EADDRNOTAVAIL. It seems like EADDRNOTAVAIL is an appropriate error, but since both Linux and pfinet reports ECONNREFUSED maybe Hurd should do that to. Otherwise I can create a patch for the socklog package. WDYT? > Maybe this also solves > the ssh (intermittent) problems when unlinking/removing a named socket. It seem like the ssh(d) delays are gone, and I'm able to log in immediately with no minute-long timeouts :) More testing is needed though.
Re: pflocal socket bug
Svante Signell, le Fri 11 Sep 2015 17:42:19 +0200, a écrit : > It seems like EADDRNOTAVAIL is an appropriate error, It is appropriate according to POSIX. > Otherwise I can create a patch for the socklog package. WDYT? That makes sense since POSIX allows it. Samuel
pflocal socket bug
Hi, Looking into the socklog package test errors a bug was found in the pflocal socket implementation, specifically in connect(): 1) PID1: create a local named socket using a server to receive data PID2: connect to that socket with a client to send data everything is fine :) Terminal1: ./test_socket_server.DGRAM test_socket_server:main(): Domain = AF_UNIX, Type = SOCK_DGRAM test_socket_server:main(): Removing old socket: 'test_socket' test_socket_server:socket_unix(): socket() socket_fd = 3 test_socket_server:read_socket(): Waiting for incoming connections... Terminal2: ./test_socket_client.DGRAM test_socket_client:main(): Domain = AF_UNIX, Type = SOCK_DGRAM test_socket_client:main(): socket_name = 'test_socket' test_socket_client:write_socket(): connect(): err = '(os/kern) successful' test_socket_client:write_socket(): send() err = '(os/kern) successful' Terminal1: test_socket_server:read_socket(): read() Message = 'Testing, testing, ...' 2) kill $PID1 to leave the socket behind PID2: connect to that socket with a client to send data On hurd this is not detected: /test_socket_client.DGRAM test_socket_client:main(): Domain = AF_UNIX, Type = SOCK_DGRAM test_socket_client:main(): socket_name = 'test_socket' test_socket_client:write_socket(): connect(): err = '(os/kern) successful' <-- BUG IS HERE test_socket_client:write_socket(): send() err = '(os/kern) successful' Expected output e.g. on Linux (or with AF_INET): ./test_socket_client.DGRAM test_socket_client:main(): Domain = AF_UNIX, Type = SOCK_DGRAM test_socket_client:main(): socket_name = 'test_socket' test_socket_client:write_socket(): connect(): err = 'Connection refused' <-- correct ECONNREFUSED is returned test_socket_client:write_socket(): connect(): Connection refused test_socket_client:main(): write_socket() SOCK_DGRAM: Connection refused Programs are attached to cover the following cases selected in defs.inc: AF_UNIX+SOCK_DGRAM, AF_UNIX+SOCK_STREAM (pflocal is buggy) AF_INET+SOCK_DGRAM, AF_INET+SOCK_STREAM (pfinet is fine) I've tried to find out how to detect that the socket server is gone but not found a solution yet. Functions tried in pflocal/sock.c: pipe_acquire_reader(pipe): modifies the reference count and locking problems pipe_acquire_writer(pipe): modifies the reference count and locking problems pipe_is_readable(pipe, 0/1): always returns false pipe_wait_readable(pipe, 1, 0/1): always returns EAGAIN, setting noblock to 0 makes the box freeze :( pipe_wait_writable(pipe, 0/1): Always returns 0 since pipe->readable(pipe, 1)=0 and pipe->write_limit=16384 (another bug?) The test calls were placed in the function: void connect() planned to return an error code instead. Hopefully the problem is described detailed enough. #include #include #include #include #include #include #include #include #include #include #include "defs.inc" #ifdef USE_SOCK_STREAM #define type SOCK_STREAM #else #define type SOCK_DGRAM #endif /* FIXME: Support AF_INET */ #ifdef USE_AF_UNIX #define domain AF_UNIX #else #define domain AF_INET #endif #define SOCKET_NAME "test_socket" #define DATA "Testing, testing, ..." int write_socket(const char* socket_name) { int err = 0, socket_fd, write_size; #ifdef USE_AF_UNIX struct sockaddr_un server; int c = sizeof(struct sockaddr_un); #else struct sockaddr_in server; int c = sizeof(struct sockaddr_in); #endif /* Create socket */ socket_fd = socket (domain, type, 0); if (socket_fd == -1) { perror("test_socket_client:write_socket(): socket()"); return -1; } #ifdef USE_AF_UNIX server.sun_family = domain; strncpy (server.sun_path, socket_name, sizeof(server.sun_path)); #else server.sin_family = domain; server.sin_addr.s_addr = inet_addr("127.0.0.1"); server.sin_port = htons( ); #endif /* Connect to socket_fd */ err = connect (socket_fd, (struct sockaddr *), c); printf("test_socket_client:write_socket(): connect(): err = '%s'\n", strerror(errno)); if (err) { close(socket_fd); #ifdef EDESTADDRREQ if (errno == EDESTADDRREQ) errno = ECONNREFUSED; #endif perror("test_socket_client:write_socket(): connect()"); return err; } #if 0 write_size = write (socket_fd, DATA, sizeof(DATA)); printf("test_socket_client:write_socket(): write(): err = '%s'\n", strerror(errno)); #else write_size = send (socket_fd, DATA, sizeof(DATA), 0); printf("test_socket_client:write_socket(): send() err = '%s'\n", strerror(errno)); #endif if (write_size == -1) { #ifdef USE_SOCK_STREAM perror("test_socket_client:write_socket(): write(): Sending on stream socket"); #else perror("test_socket_client:write_socket(): write(): Sending on datagram socket"); #endif close(socket_fd); return err; } close(socket_fd); return err; } int main(void) { const char* socket_name = SOCKET_NAME; int err = 0; printf ("test_socket_client:main(): Domain = %s, Type = %s\n", (domain == AF_UNIX) ? "AF_UNIX": "AF_INET", (type ==
Re: pflocal socket bug
Hello, Thanks for the detailed description and testcase! I could fix it, this case was actually apparently completely untested: reference counting was never reaching zero, and thus the sock never actually shut down. SAmuel