Re: pflocal socket bug

2015-09-12 Thread Samuel Thibault
Svante Signell, le Fri 11 Sep 2015 19:50:21 +0200, a écrit :
> Well, the Linux manpage for connect says:

The linux manpage is not supposed to be relevant.

> While POSIX says:
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/connect.html
> [EADDRNOTAVAIL]
> The specified address is not available from the local machine.
> [ECONNREFUSED]
> The target address was not listening for connections or refused
> the connection request.
> 
> To me the best explanation is "No-one listening on the remote address."

Take care that we may also be in the connection-less case, which does
not have the notion of listening and connection.

> and "The target address was not listening for connections or refused the
> connection request." Additionally it is related to connect() by the
> error name itself.

In the connection-less case, connect is not meant to connect, but just
to specify the default destination address only.

> The Hurd patch is very simple:

If you didn't mangle it by pasting it twice :)

> Why not align to other OSes?

"aligning" is not necessarily the right solution.  But here it should be
fine, yes.  I have fixed both connect() and sendto().

Samuel



Re: pflocal socket bug

2015-09-11 Thread Svante Signell
On Fri, 2015-09-11 at 17:43 +0200, Samuel Thibault wrote:
> Svante Signell, le Fri 11 Sep 2015 17:42:19 +0200, a écrit :
> > It seems like EADDRNOTAVAIL is an appropriate error,
> 
> It is appropriate according to POSIX.
> 
> > Otherwise I can create a patch for the socklog package. WDYT?
> 
> That makes sense since POSIX allows it.

Well, the Linux manpage for connect says:
EADDRNOTAVAIL
(Internet domain sockets) The socket referred to by sockfd had not
previously been bound to an address and, upon attempting to bind it to
an ephemeral port, it was determined that all port numbers in the
ephemeral port range are currently in use. See the discussion
of /proc/sys/net/ipv4/ip_local_port_range in ip(7).
Note the comment: (Internet domain sockets), i.e. AF_INET not AF_UNIX.
ECONNREFUSED
  No-one listening on the remote address.

While POSIX says:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/connect.html
[EADDRNOTAVAIL]
The specified address is not available from the local machine.
[ECONNREFUSED]
The target address was not listening for connections or refused
the connection request.

To me the best explanation is "No-one listening on the remote address."
and "The target address was not listening for connections or refused the
connection request." Additionally it is related to connect() by the
error name itself.

The Hurd patch is very simple:
--- pflocal/socket.c~2015-09-11 19:46:33.0 +0200
+++ pflocal/socket.c2015-09-11 19:44:33.0 +0200
@@ -118,6 +118,8 @@
 return EOPNOTSUPP;
 
   err = addr_get_sock (addr, );
+  if (err == EADDRNOTAVAIL)
+err = ECONNREFUSED;--- pflocal/socket.c2015-09-11
19:46:33.0 +0200
+++ pflocal/socket.c2015-09-11 19:44:33.0 +0200
@@ -118,6 +118,8 @@
 return EOPNOTSUPP;
 
   err = addr_get_sock (addr, );
+  if (err == EADDRNOTAVAIL)
+err = ECONNREFUSED;
   if (!err)
 {
   struct sock *sock = user->sock;

   if (!err)
 {
   struct sock *sock = user->sock;

Additionally the following removes the declaration warning:
--- pflocal/sock.h~ 2015-09-11 16:28:10.0 +0200
+++ pflocal/sock.h  2015-09-11 19:28:46.0 +0200
@@ -84,6 +84,9 @@
   struct connq *connect_queue;
 };
 
+/* Bind SOCK to ADDR.  */
+error_t sock_bind (struct sock *sock, struct addr *addr);
+
 /* Socket flags */
 #define PFLOCAL_SOCK_CONNECTED 0x1 /* A connected
connection-oriented sock. */
 #define PFLOCAL_SOCK_NONBLOCK  0x2 /* Don't block on I/O.  */
@@ -154,9 +157,6 @@
 /* Return a new user port on SOCK in PORT.  */
 error_t sock_create_port (struct sock *sock, mach_port_t *port);
 
-/* Bind SOCK to ADDR.  */
-error_t sock_bind (struct sock *sock, struct addr *addr);
-
 /* Returns SOCK's address in ADDR, with an additional reference added.
If
SOCK doesn't currently have an address, one is fabricated first.  */
 error_t sock_get_addr (struct sock *sock, struct addr **addr);

Why not align to other OSes?




Re: pflocal socket bug

2015-09-11 Thread Svante Signell
On Fri, 2015-09-11 at 02:10 +0200, Samuel Thibault wrote:
> Hello,
> 
> Thanks for the detailed description and testcase!  I could fix it, this
> case was actually apparently completely untested: reference counting was
> never reaching zero, and thus the sock never actually shut down.

Nice to see such a fast solution! I think I even understood most of
it :)

So the problem was all about reference counting in socket.c
(S_socket_connect). I searched for a solution in the earlier call to
sock.c (sock_connect) in that function. Oh well! Maybe this also solves
the ssh (intermittent) problems when unlinking/removing a named socket.




Re: pflocal socket bug

2015-09-11 Thread Svante Signell
On Fri, 2015-09-11 at 10:34 +0200, Svante Signell wrote:
> On Fri, 2015-09-11 at 02:10 +0200, Samuel Thibault wrote:
> > Hello,
> > 
> > Thanks for the detailed description and testcase!  I could fix it, this
> > case was actually apparently completely untested: reference counting was
> > never reaching zero, and thus the sock never actually shut down.
> 
> Nice to see such a fast solution! I think I even understood most of
> it :)

I've tested the patched pflocal and the broken socket is reported in the
client connect() call :)

One small problem though with the socklog package:
It assumes ECONNREFUSED and now pflocal/connect reports EADDRNOTAVAIL.
It seems like EADDRNOTAVAIL is an appropriate error, but since both
Linux and pfinet reports ECONNREFUSED maybe Hurd should do that to.
Otherwise I can create a patch for the socklog package. WDYT?

> Maybe this also solves
> the ssh (intermittent) problems when unlinking/removing a named socket.

It seem like the ssh(d) delays are gone, and I'm able to log in
immediately with no minute-long timeouts :) More testing is needed
though.





Re: pflocal socket bug

2015-09-11 Thread Samuel Thibault
Svante Signell, le Fri 11 Sep 2015 17:42:19 +0200, a écrit :
> It seems like EADDRNOTAVAIL is an appropriate error,

It is appropriate according to POSIX.

> Otherwise I can create a patch for the socklog package. WDYT?

That makes sense since POSIX allows it.

Samuel



pflocal socket bug

2015-09-10 Thread Svante Signell
Hi,

Looking into the socklog package test errors a bug was found in the
pflocal socket implementation, specifically in connect():
1)
PID1: create a local named socket using a server to receive data
PID2: connect to that socket with a client to send data
everything is fine :)

Terminal1:
./test_socket_server.DGRAM 
test_socket_server:main(): Domain = AF_UNIX, Type = SOCK_DGRAM
test_socket_server:main(): Removing old socket: 'test_socket'
test_socket_server:socket_unix(): socket() socket_fd = 3
test_socket_server:read_socket(): Waiting for incoming connections...

Terminal2:
./test_socket_client.DGRAM 
test_socket_client:main(): Domain = AF_UNIX, Type = SOCK_DGRAM
test_socket_client:main(): socket_name = 'test_socket'
test_socket_client:write_socket(): connect(): err = '(os/kern)
successful'
test_socket_client:write_socket(): send() err = '(os/kern) successful'

Terminal1:
test_socket_server:read_socket(): read() Message = 'Testing,
testing, ...'

2) kill $PID1 to leave the socket behind
PID2: connect to that socket with a client to send data
On hurd this is not detected:
/test_socket_client.DGRAM 
test_socket_client:main(): Domain = AF_UNIX, Type = SOCK_DGRAM
test_socket_client:main(): socket_name = 'test_socket'
test_socket_client:write_socket(): connect(): err = '(os/kern)
successful' <-- BUG IS HERE
test_socket_client:write_socket(): send() err = '(os/kern) successful'

Expected output e.g. on Linux (or with AF_INET):
./test_socket_client.DGRAM
test_socket_client:main(): Domain = AF_UNIX, Type = SOCK_DGRAM
test_socket_client:main(): socket_name = 'test_socket'
test_socket_client:write_socket(): connect(): err = 'Connection refused'
<-- correct ECONNREFUSED is returned
test_socket_client:write_socket(): connect(): Connection refused
test_socket_client:main(): write_socket() SOCK_DGRAM: Connection refused

Programs are attached to cover the following cases selected in defs.inc:
AF_UNIX+SOCK_DGRAM, AF_UNIX+SOCK_STREAM (pflocal is buggy)
AF_INET+SOCK_DGRAM, AF_INET+SOCK_STREAM (pfinet is fine)

I've tried to find out how to detect that the socket server is gone but
not found a solution yet.

Functions tried in pflocal/sock.c:
pipe_acquire_reader(pipe): modifies the reference count and locking
problems
pipe_acquire_writer(pipe): modifies the reference count and locking
problems
pipe_is_readable(pipe, 0/1): always returns false
pipe_wait_readable(pipe, 1, 0/1): always returns EAGAIN, setting noblock
to 0 makes the box freeze :(
pipe_wait_writable(pipe, 0/1): Always returns 0 since
pipe->readable(pipe, 1)=0 and pipe->write_limit=16384 (another bug?)

The test calls were placed in the function: void connect() planned to
return an error code instead.

Hopefully the problem is described detailed enough.

#include 
#include 
#include 
#include 
#include 
#include 
#include 

#include 
#include 
#include 

#include "defs.inc"

#ifdef USE_SOCK_STREAM
#define type SOCK_STREAM
#else
#define type SOCK_DGRAM
#endif

/* FIXME: Support AF_INET */
#ifdef USE_AF_UNIX
#define domain AF_UNIX
#else
#define domain AF_INET
#endif

#define SOCKET_NAME "test_socket"
#define DATA "Testing, testing, ..."

int write_socket(const char* socket_name) {
  int err = 0, socket_fd, write_size;
#ifdef USE_AF_UNIX
  struct sockaddr_un server;
  int c = sizeof(struct sockaddr_un);
#else
  struct sockaddr_in server;
  int c = sizeof(struct sockaddr_in);
#endif

  /* Create socket */
  socket_fd = socket (domain, type, 0);
  if (socket_fd == -1)
{
  perror("test_socket_client:write_socket(): socket()");
  return -1;
}
#ifdef USE_AF_UNIX
  server.sun_family = domain;
  strncpy (server.sun_path, socket_name, sizeof(server.sun_path));
#else
  server.sin_family = domain;
  server.sin_addr.s_addr = inet_addr("127.0.0.1");
  server.sin_port = htons(  );
#endif

  /* Connect to socket_fd */
  err = connect (socket_fd, (struct sockaddr *), c);
  printf("test_socket_client:write_socket(): connect(): err = '%s'\n", strerror(errno));
  if (err) {
close(socket_fd);
#ifdef EDESTADDRREQ
if (errno == EDESTADDRREQ) errno = ECONNREFUSED;
#endif
perror("test_socket_client:write_socket(): connect()");
return err;
  }
#if 0
  write_size = write (socket_fd, DATA, sizeof(DATA));
  printf("test_socket_client:write_socket(): write(): err = '%s'\n", strerror(errno));
#else
  write_size = send (socket_fd, DATA, sizeof(DATA), 0);
  printf("test_socket_client:write_socket(): send() err = '%s'\n", strerror(errno));
#endif
  if (write_size == -1)
{
#ifdef USE_SOCK_STREAM
  perror("test_socket_client:write_socket(): write(): Sending on stream socket");
#else
  perror("test_socket_client:write_socket(): write(): Sending on datagram socket");
#endif
  close(socket_fd);
  return err;
}

  close(socket_fd);
  return err;
}

int main(void) {
  const char* socket_name = SOCKET_NAME;
  int err = 0;

  printf ("test_socket_client:main(): Domain = %s, Type = %s\n", (domain == AF_UNIX) ? "AF_UNIX": "AF_INET", (type == 

Re: pflocal socket bug

2015-09-10 Thread Samuel Thibault
Hello,

Thanks for the detailed description and testcase!  I could fix it, this
case was actually apparently completely untested: reference counting was
never reaching zero, and thus the sock never actually shut down.

SAmuel