Hello, Changelog compared to the previous patch:
- split some code to patches 1-3/4 - use MPTCP with the backend if set in the backend config thanks to patches X/4 - remove ".receivers" in the new MPTCP protocol structures. - split too large lines - fix family for 'mptcp6@' address Another question: Is it still needed to duplicate the mptcp protocol structures from the tcp ones ? Could we not simply make a copy of the tcp ones and simply setting the name and sock_proto ? Doeraene Anthony On Mon, Aug 26, 2024 at 11:51 AM Aperence <[email protected]> wrote: > > Multipath TCP (MPTCP), standardized in RFC8684 [1], is a TCP extension > that enables a TCP connection to use different paths. > > Multipath TCP has been used for several use cases. On smartphones, MPTCP > enables seamless handovers between cellular and Wi-Fi networks while > preserving established connections. This use-case is what pushed Apple > to use MPTCP since 2013 in multiple applications [2]. On dual-stack > hosts, Multipath TCP enables the TCP connection to automatically use the > best performing path, either IPv4 or IPv6. If one path fails, MPTCP > automatically uses the other path. > > To benefit from MPTCP, both the client and the server have to support > it. Multipath TCP is a backward-compatible TCP extension that is enabled > by default on recent Linux distributions (Debian, Ubuntu, Redhat, ...). > Multipath TCP is included in the Linux kernel since version 5.6 [3]. To > use it on Linux, an application must explicitly enable it when creating > the socket. No need to change anything else in the application. > > This attached patch adds MPTCP per address support, to be used with: > > mptcp{,4,6}@<address>[:port1[-port2]] > > MPTCP v4 and v6 protocols have been added: they are mainly a copy of the > TCP ones, with small differences: names, proto, and receivers lists. > > These protocols are stored in __protocol_by_family, as an alternative to > TCP, similar to what has been done with QUIC. By doing that, the size of > __protocol_by_family has not been increased, and it behaves like TCP. > > MPTCP is both supported for the frontend and backend sides. > > Also added an example of configuration using mptcp along with a backend > allowing to experiment with it. > > Note that this is a re-implementation of Björn's work from 3 years ago > [4], when haproxy's internals were probably less ready to deal with > this, causing his work to be left pending for a while. > > Currently, the TCP_MAXSEG socket option doesn't seem to be supported > with MPTCP [5]. This results in a warning when trying to set the MSS of > sockets in proto_tcp:tcp_bind_listener. > > This can be resolved by adding two new variables: > sock_inet(6)_mptcp_maxseg_default that will hold the default > value of the TCP_MAXSEG option. Note that for the moment, this > will always be -1 as the option isn't supported. However, in the > future, when the support for this option will be added, it should > contain the correct value for the MSS, allowing to correctly > set the TCP_MAXSEG option. > > Link: https://www.rfc-editor.org/rfc/rfc8684.html [1] > Link: https://www.tessares.net/apples-mptcp-story-so-far/ [2] > Link: https://www.mptcp.dev [3] > Link: https://github.com/haproxy/haproxy/issues/1028 [4] > Link: https://github.com/multipath-tcp/mptcp_net-next/issues/515 [5] > > Co-authored-by: Dorian Craps <[email protected]> > Co-authored-by: Matthieu Baerts (NGI0) <[email protected]> > --- > doc/configuration.txt | 21 +++++++ > examples/mptcp-backend.py | 22 ++++++++ > examples/mptcp.cfg | 23 ++++++++ > include/haproxy/compat.h | 10 ++++ > include/haproxy/sock_inet.h | 8 +++ > src/backend.c | 3 +- > src/proto_tcp.c | 108 ++++++++++++++++++++++++++++++++++-- > src/protocol.c | 3 +- > src/sock.c | 5 +- > src/sock_inet.c | 30 ++++++++++ > src/tools.c | 21 +++++++ > 11 files changed, 246 insertions(+), 8 deletions(-) > create mode 100644 examples/mptcp-backend.py > create mode 100644 examples/mptcp.cfg > > diff --git a/doc/configuration.txt b/doc/configuration.txt > index 213febb76..802d5fe3f 100644 > --- a/doc/configuration.txt > +++ b/doc/configuration.txt > @@ -28255,6 +28255,27 @@ report this to the maintainers. > range can or must be specified. > It is considered as an alias of > 'stream+ipv4@'. > > +'mptcp@<address>[:port1[-port2]]' following <address> is considered as an > IPv4 > + or IPv6 address depending of the syntax but > + socket type and transport method is forced > to > + "stream", with the MPTCP protocol. > Depending > + on the statement using this address, a > port or > + a port range can or must be specified. > + > +'mptcp4@<address>[:port1[-port2]]' following <address> is always considered > as > + an IPv4 address but socket type and > transport > + method is forced to "stream", with the > MPTCP > + protocol. Depending on the statement using > + this address, a port or port range can or > + must be specified. > + > +'mptcp6@<address>[:port1[-port2]]' following <address> is always considered > as > + an IPv6 address but socket type and > transport > + method is forced to "stream", with the > MPTCP > + protocol. Depending on the statement using > + this address, a port or port range can or > + must be specified. > + > 'udp@<address>[:port1[-port2]]' following <address> is considered as an IPv4 > or IPv6 address depending of the syntax but > socket type and transport method is forced to > diff --git a/examples/mptcp-backend.py b/examples/mptcp-backend.py > new file mode 100644 > index 000000000..5237de542 > --- /dev/null > +++ b/examples/mptcp-backend.py > @@ -0,0 +1,22 @@ > +# > ============================================================================= > +# Example of a simple backend server using mptcp in python, used with > mptcp.cfg > +# > ============================================================================= > + > +import socket > + > +sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, > socket.IPPROTO_MPTCP) > +sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) > +# dual stack IPv4/IPv6 > +sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0) > + > +sock.bind(("::", 4331)) > +sock.listen() > + > +while True: > + (conn, address) = sock.accept() > + req = conn.recv(1024) > + print(F"Received request : {req}") > + conn.send(b"HTTP/1.0 200 OK\r\n\r\nHello\n") > + conn.close() > + > +sock.close() > diff --git a/examples/mptcp.cfg b/examples/mptcp.cfg > new file mode 100644 > index 000000000..d43483dfe > --- /dev/null > +++ b/examples/mptcp.cfg > @@ -0,0 +1,23 @@ > +# You can test this configuration by running the command: > +# > +# $ mptcpize run curl localhost:5000 > + > +global > + strict-limits # refuse to start if insufficient FDs/memory > + # add some process-wide tuning here if required > + > +defaults > + mode http > + balance roundrobin > + timeout client 60s > + timeout server 60s > + timeout connect 1s > + > +frontend main > + bind mptcp@[::]:5000 > + default_backend mptcp_backend > + > +# MPTCP is usually used on the frontend, but it is also possible > +# to enable it to communicate with the backend > +backend mptcp_backend > + server mptcp_server mptcp@[::]:4331 > diff --git a/include/haproxy/compat.h b/include/haproxy/compat.h > index 3829060b7..68474fe8e 100644 > --- a/include/haproxy/compat.h > +++ b/include/haproxy/compat.h > @@ -317,6 +317,16 @@ typedef struct { } empty_t; > #define queue _queue > #endif > > +/* Define a flag indicating if MPTCP is available */ > +#ifdef __linux__ > +#define HA_HAVE_MPTCP 1 > +#endif > + > +/* only Linux defines IPPROTO_MPTCP */ > +#ifndef IPPROTO_MPTCP > +#define IPPROTO_MPTCP 262 > +#endif > + > #endif /* _HAPROXY_COMPAT_H */ > > /* > diff --git a/include/haproxy/sock_inet.h b/include/haproxy/sock_inet.h > index 6f07e637a..1c3b7a303 100644 > --- a/include/haproxy/sock_inet.h > +++ b/include/haproxy/sock_inet.h > @@ -31,6 +31,14 @@ extern int sock_inet6_v6only_default; > extern int sock_inet_tcp_maxseg_default; > extern int sock_inet6_tcp_maxseg_default; > > +#ifdef HA_HAVE_MPTCP > +extern int sock_inet_mptcp_maxseg_default; > +extern int sock_inet6_mptcp_maxseg_default; > +#else > +#define sock_inet_mptcp_maxseg_default -1 > +#define sock_inet6_mptcp_maxseg_default -1 > +#endif > + > extern struct proto_fam proto_fam_inet4; > extern struct proto_fam proto_fam_inet6; > > diff --git a/src/backend.c b/src/backend.c > index 6956d9bfe..e4bd465e9 100644 > --- a/src/backend.c > +++ b/src/backend.c > @@ -1690,8 +1690,9 @@ int connect_server(struct stream *s) > > if (!srv_conn->xprt) { > /* set the correct protocol on the output stream connector */ > + > if (srv) { > - if (conn_prepare(srv_conn, > protocol_lookup(srv_conn->dst->ss_family, PROTO_TYPE_STREAM, 0), srv->xprt)) { > + if (conn_prepare(srv_conn, > protocol_lookup(srv_conn->dst->ss_family, PROTO_TYPE_STREAM, srv->alt_proto), > srv->xprt)) { > conn_free(srv_conn); > return SF_ERR_INTERNAL; > } > diff --git a/src/proto_tcp.c b/src/proto_tcp.c > index cf79ffbc5..39de465ef 100644 > --- a/src/proto_tcp.c > +++ b/src/proto_tcp.c > @@ -145,6 +145,98 @@ struct protocol proto_tcpv6 = { > > INITCALL1(STG_REGISTER, protocol_register, &proto_tcpv6); > > +#ifdef HA_HAVE_MPTCP > +/* Most fields are copied from proto_tcpv4 */ > +struct protocol proto_mptcpv4 = { > + .name = "mptcpv4", > + > + /* connection layer */ > + .xprt_type = PROTO_TYPE_STREAM, > + .listen = tcp_bind_listener, > + .enable = tcp_enable_listener, > + .disable = tcp_disable_listener, > + .add = default_add_listener, > + .unbind = default_unbind_listener, > + .suspend = default_suspend_listener, > + .resume = default_resume_listener, > + .accept_conn = sock_accept_conn, > + .ctrl_init = sock_conn_ctrl_init, > + .ctrl_close = sock_conn_ctrl_close, > + .connect = tcp_connect_server, > + .drain = sock_drain, > + .check_events = sock_check_events, > + .ignore_events = sock_ignore_events, > + .get_info = tcp_get_info, > + > + /* binding layer */ > + .rx_suspend = tcp_suspend_receiver, > + .rx_resume = tcp_resume_receiver, > + > + /* address family */ > + .fam = &proto_fam_inet4, > + > + /* socket layer */ > + .proto_type = PROTO_TYPE_STREAM, > + .sock_type = SOCK_STREAM, > + .sock_prot = IPPROTO_MPTCP, /* MPTCP specific */ > + .rx_enable = sock_enable, > + .rx_disable = sock_disable, > + .rx_unbind = sock_unbind, > + .rx_listening = sock_accepting_conn, > + .default_iocb = sock_accept_iocb, > +#ifdef SO_REUSEPORT > + .flags = PROTO_F_REUSEPORT_SUPPORTED, > +#endif > +}; > + > +INITCALL1(STG_REGISTER, protocol_register, &proto_mptcpv4); > + > +/* Most fields are copied from proto_tcpv6 */ > +struct protocol proto_mptcpv6 = { > + .name = "mptcpv6", > + > + /* connection layer */ > + .xprt_type = PROTO_TYPE_STREAM, > + .listen = tcp_bind_listener, > + .enable = tcp_enable_listener, > + .disable = tcp_disable_listener, > + .add = default_add_listener, > + .unbind = default_unbind_listener, > + .suspend = default_suspend_listener, > + .resume = default_resume_listener, > + .accept_conn = sock_accept_conn, > + .ctrl_init = sock_conn_ctrl_init, > + .ctrl_close = sock_conn_ctrl_close, > + .connect = tcp_connect_server, > + .drain = sock_drain, > + .check_events = sock_check_events, > + .ignore_events = sock_ignore_events, > + .get_info = tcp_get_info, > + > + /* binding layer */ > + .rx_suspend = tcp_suspend_receiver, > + .rx_resume = tcp_resume_receiver, > + > + /* address family */ > + .fam = &proto_fam_inet6, > + > + /* socket layer */ > + .proto_type = PROTO_TYPE_STREAM, > + .sock_type = SOCK_STREAM, > + .sock_prot = IPPROTO_MPTCP, /* MPTCP specific */ > + .rx_enable = sock_enable, > + .rx_disable = sock_disable, > + .rx_unbind = sock_unbind, > + .rx_listening = sock_accepting_conn, > + .default_iocb = sock_accept_iocb, > +#ifdef SO_REUSEPORT > + .flags = PROTO_F_REUSEPORT_SUPPORTED, > +#endif > +}; > + > +INITCALL1(STG_REGISTER, protocol_register, &proto_mptcpv6); > +#endif > + > /* Binds ipv4/ipv6 address <local> to socket <fd>, unless <flags> is set, in > which > * case we try to bind <remote>. <flags> is a 2-bit field consisting of : > * - 0 : ignore remote address (may even be a NULL pointer) > @@ -590,12 +682,20 @@ int tcp_bind_listener(struct listener *listener, char > *errmsg, int errlen) > /* we may want to try to restore the default MSS if the > socket was inherited */ > int tmpmaxseg = -1; > int defaultmss; > + int v4 = listener->rx.addr.ss_family == AF_INET; > socklen_t len = sizeof(tmpmaxseg); > > - if (listener->rx.addr.ss_family == AF_INET) > - defaultmss = sock_inet_tcp_maxseg_default; > - else > - defaultmss = sock_inet6_tcp_maxseg_default; > + if (listener->rx.proto->sock_prot == IPPROTO_MPTCP) { > + if (v4) > + defaultmss = sock_inet_mptcp_maxseg_default; > + else > + defaultmss = sock_inet6_mptcp_maxseg_default; > + } else { > + if (v4) > + defaultmss = sock_inet_tcp_maxseg_default; > + else > + defaultmss = sock_inet6_tcp_maxseg_default; > + } > > getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &tmpmaxseg, &len); > if (defaultmss > 0 && > diff --git a/src/protocol.c b/src/protocol.c > index f5f494068..edf1c22ad 100644 > --- a/src/protocol.c > +++ b/src/protocol.c > @@ -51,7 +51,8 @@ void protocol_register(struct protocol *proto) > LIST_APPEND(&protocols, &proto->list); > __protocol_by_family[sock_family] > [proto->proto_type] > - [proto->xprt_type == PROTO_TYPE_DGRAM] = proto; > + [proto->xprt_type == PROTO_TYPE_DGRAM || > + proto->sock_prot == IPPROTO_MPTCP] = proto; > __proto_fam_by_family[sock_family] = proto->fam; > HA_SPIN_UNLOCK(PROTO_LOCK, &proto_lock); > } > diff --git a/src/sock.c b/src/sock.c > index aa524d886..4b872d15e 100644 > --- a/src/sock.c > +++ b/src/sock.c > @@ -279,7 +279,7 @@ int sock_create_server_socket(struct connection *conn, > struct proxy *be, int *st > ns = __objt_server(conn->target)->netns; > } > #endif > - proto = protocol_lookup(conn->dst->ss_family, PROTO_TYPE_STREAM, 0); > + proto = protocol_lookup(conn->dst->ss_family, PROTO_TYPE_STREAM, > conn->ctrl->sock_prot == IPPROTO_MPTCP); > BUG_ON(!proto); > sock_fd = my_socketat(ns, proto->fam->sock_domain, SOCK_STREAM, > proto->sock_prot); > > @@ -306,7 +306,8 @@ int sock_create_server_socket(struct connection *conn, > struct proxy *be, int *st > } > > if (fd_set_nonblock(sock_fd) == -1 || > - ((conn->ctrl->sock_prot == IPPROTO_TCP) && > (setsockopt(sock_fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) == -1))) { > + ((conn->ctrl->sock_prot == IPPROTO_TCP || > conn->ctrl->sock_prot == IPPROTO_MPTCP) && > + (setsockopt(sock_fd, IPPROTO_TCP, TCP_NODELAY, &one, > sizeof(one)) == -1))) { > qfprintf(stderr,"Cannot set client socket to non blocking > mode.\n"); > send_log(be, LOG_EMERG, "Cannot set client socket to non > blocking mode.\n"); > close(sock_fd); > diff --git a/src/sock_inet.c b/src/sock_inet.c > index 07364f02a..20a9ab598 100644 > --- a/src/sock_inet.c > +++ b/src/sock_inet.c > @@ -79,6 +79,12 @@ int sock_inet6_v6only_default = 0; > int sock_inet_tcp_maxseg_default = -1; > int sock_inet6_tcp_maxseg_default = -1; > > +/* Default MPTCPv4/MPTCPv6 MSS settings. -1=unknown. */ > +#ifdef HA_HAVE_MPTCP > +int sock_inet_mptcp_maxseg_default = -1; > +int sock_inet6_mptcp_maxseg_default = -1; > +#endif > + > /* Compares two AF_INET sockaddr addresses. Returns 0 if they match or > non-zero > * if they do not match. > */ > @@ -496,6 +502,30 @@ static void sock_inet_prepare() > #endif > close(fd); > } > + > +#ifdef HA_HAVE_MPTCP > + fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); > + if (fd >= 0) { > +#ifdef TCP_MAXSEG > + /* retrieve the OS' default mss for MPTCPv4 */ > + len = sizeof(val); > + if (getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &val, &len) == 0) > + sock_inet_mptcp_maxseg_default = val; > +#endif > + close(fd); > + } > + > + fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_MPTCP); > + if (fd >= 0) { > +#ifdef TCP_MAXSEG > + /* retrieve the OS' default mss for MPTCPv6 */ > + len = sizeof(val); > + if (getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &val, &len) == 0) > + sock_inet6_mptcp_maxseg_default = val; > +#endif > + close(fd); > + } > +#endif > } > > INITCALL0(STG_PREPARE, sock_inet_prepare); > diff --git a/src/tools.c b/src/tools.c > index 662ba3a6e..0fc453a07 100644 > --- a/src/tools.c > +++ b/src/tools.c > @@ -1069,6 +1069,13 @@ struct sockaddr_storage *str2sa_range(const char *str, > int *port, int *low, int > proto_type = PROTO_TYPE_STREAM; > ctrl_type = SOCK_STREAM; > } > + else if (strncmp(str2, "mptcp4@", 7) == 0) { > + str2 += 7; > + ss.ss_family = AF_INET; > + proto_type = PROTO_TYPE_STREAM; > + ctrl_type = SOCK_STREAM; > + alt_proto = 1; > + } > else if (strncmp(str2, "udp4@", 5) == 0) { > str2 += 5; > ss.ss_family = AF_INET; > @@ -1082,6 +1089,13 @@ struct sockaddr_storage *str2sa_range(const char *str, > int *port, int *low, int > proto_type = PROTO_TYPE_STREAM; > ctrl_type = SOCK_STREAM; > } > + else if (strncmp(str2, "mptcp6@", 7) == 0) { > + str2 += 7; > + ss.ss_family = AF_INET6; > + proto_type = PROTO_TYPE_STREAM; > + ctrl_type = SOCK_STREAM; > + alt_proto = 1; > + } > else if (strncmp(str2, "udp6@", 5) == 0) { > str2 += 5; > ss.ss_family = AF_INET6; > @@ -1095,6 +1109,13 @@ struct sockaddr_storage *str2sa_range(const char *str, > int *port, int *low, int > proto_type = PROTO_TYPE_STREAM; > ctrl_type = SOCK_STREAM; > } > + else if (strncmp(str2, "mptcp@", 6) == 0) { > + str2 += 6; > + ss.ss_family = AF_UNSPEC; > + proto_type = PROTO_TYPE_STREAM; > + ctrl_type = SOCK_STREAM; > + alt_proto = 1; > + } > else if (strncmp(str2, "udp@", 4) == 0) { > str2 += 4; > ss.ss_family = AF_UNSPEC; > -- > 2.46.0 >

