RE: [PATCH v15 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com] > ... > Some comments below. The vast majority of them are really minor, the > only thing which bothers me a little bit is WARN() in hvsock_sendmsg() > which I think shouldn't be there. But I may have missed something. Thank you for the very detailed comments, Vitaly! Now I see I shouldn't put pr_err() in hvsock_sendmsg() and hvsock_recvmsg(), because IMO a malicious app can use this to generate lots of messages to slow down the system. I'll remove them. I'll reply your other comments bellow. > > +#define guid_t uuid_le > > +struct sockaddr_hv { > > + __kernel_sa_family_tshv_family; /* Address family */ > > + u16 reserved;/* Must be Zero*/ > > + guid_t shv_vm_id; /* VM ID */ > > + guid_t shv_service_id; /* Service ID */ > > +}; > > I'm not sure it is worth it to introduce a new 'guid_t' type here, we > may want to rename > > shv_vm_id -> shv_vm_guid > shv_service_id -> shv_service_guid > > and use uuid_le type. Ok. I'll make the change. > > +config HYPERV_SOCK > > + tristate "Hyper-V Sockets" > > + depends on HYPERV > > + default m if HYPERV > > + help > > + Hyper-V Sockets is somewhat like TCP over VMBus, allowing > > + communication between Linux guest and Hyper-V host without TCP/IP. > > + > > I know it's hard to come up with a simple description but I'd rather > describe is as "Socket interface for high speed communication between > Linux guest and Hyper-V host over VMBus." OK. > > +static bool uuid_equals(uuid_le u1, uuid_le u2) > > +{ > > + return !uuid_le_cmp(u1, u2); > > +} > > why not use uuid_le_cmp directly? OK. I will change to it. > > +static unsigned int hvsock_poll(struct file *file, struct socket *sock, > > + poll_table *wait) >> ... > > + if (channel) { > > + /* If there is something in the queue then we can read */ > > + get_ringbuffer_rw_status(channel, &can_read, &can_write); > > + > > + if (!can_read && hvsk->recv) > > + can_read = true; > > + > > + if (!(sk->sk_shutdown & RCV_SHUTDOWN) && can_read) > > + mask |= POLLIN | POLLRDNORM; > > + } else { > > + can_read = false; > > we don't use can_read below I'll remove the can_read assignment. > > + channel = hvsk->channel; > > + if (!channel) { > > + WARN_ONCE(1, "NULL channel! There is a programming > bug.\n"); > > BUG() then OK. > > +static int hvsock_open_connection(struct vmbus_channel *channel) > > +{ > > + .. > > + if (conn_from_host) { > > + if (sk->sk_ack_backlog >= sk->sk_max_ack_backlog) { > > + ret = -EMFILE; > > I'm not sure -EMFILE is appropriate, we don't really have "too many open > files". Here the ret value doesn't really matter, because the return value of the function is not really used at present. However, I agree with you that EMFILE is unsuitable. Let me change to ECONNREFUSED, which seems better to me. > > +static int hvsock_connect_wait(struct socket *sock, > > + int flags, int current_ret) > > +{ > > + struct sock *sk = sock->sk; > > + struct hvsock_sock *hvsk; > > + int ret = current_ret; > > + DEFINE_WAIT(wait); > > + long timeout; > > + > > + hvsk = sk_to_hvsock(sk); > > + timeout = 30 * HZ; > > We may want to introduce a define for this timeout. Does it actually > match host's timeout? I'll add HVSOCK_CONNECT_TIMEOUT for this. Yes, the value is from Hyper-V team. > > +static int hvsock_accept_wait(struct sock *listener, > > + .. > > + > > + if (ret) { > > + release_sock(connected); > > + sock_put(connected); > > + } else { > > + newsock->state = SS_CONNECTED; > > + sock_graft(connected, newsock); > > + release_sock(connected); > > + sock_put(connected); > > so we do release_sock()/sock_put() unconditionally and this piece could > be rewritten as > > if (!ret) { > newsock->state = SS_CONNECTED; > sock_graft(connected, newsock); > } > release_sock(connected); > sock_put(connected); Will do. > > +static int hvsock_listen(struct socket *sock, int backlog) > > +{ > > + .. > > + /* This is an artificial limit */ > > + if (backlog > 128) > > + backlog = 128; > > Let's do a define for it. Ok. > > +static int hvsock_sendmsg(struct socket *sock, struct msghdr *msg, > > + size_t len) > > +{ > > + struct hvsock_sock *hvsk; > > + struct sock *sk; > > + int ret; > > + > > + if (len == 0) > > + return -EINVAL; > > + > > + if (msg->msg_flags & ~MSG_DONTWAIT) { > > + pr_err("%s: unsupported flags=0x%x\n", __func__, > > + msg->msg_flags); > > I don't think
Re: [PATCH v15 net-next 1/1] hv_sock: introduce Hyper-V Sockets
Dexuan Cui writes: > Hyper-V Sockets (hv_sock) supplies a byte-stream based communication > mechanism between the host and the guest. It's somewhat like TCP over > VMBus, but the transportation layer (VMBus) is much simpler than IP. > > With Hyper-V Sockets, applications between the host and the guest can talk > to each other directly by the traditional BSD-style socket APIs. > > Hyper-V Sockets is only available on new Windows hosts, like Windows Server > 2016. More info is in this article "Make your own integration services": > https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service > > The patch implements the necessary support in the guest side by introducing > a new socket address family AF_HYPERV. > > Signed-off-by: Dexuan Cui > Cc: "K. Y. Srinivasan" > Cc: Haiyang Zhang > Cc: Vitaly Kuznetsov Some comments below. The vast majority of them are really minor, the only thing which bothers me a little bit is WARN() in hvsock_sendmsg() which I think shouldn't be there. But I may have missed something. I didn't do any tests for the code. > Cc: Cathy Avery > --- > > You can also get the patch here (2764221d): > https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160708_v15 > > For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31 > > In v12, the changes are mainly the following: > > 1) remove the module params as David suggested. > > 2) use 5 exact pages for VMBus send/recv rings, respectively. > The host side's design of the feature requires 5 exact pages for recv/send > rings respectively -- this is suboptimal considering memory consumption, > however unluckily we have to live with it, before the host comes up with > a new design in the future. :-( > > 3) remove the per-connection static send/recv buffers > Instead, we allocate and free the buffers dynamically only when we recv/send > data. This means: when a connection is idle, no memory is consumed as > recv/send buffers at all. > > In v13: > I return ENOMEM on buffer alllocation failure > >Actually "man read/write" says "Other errors may occur, depending on the > object connected to fd". "man send/recv" indeed lists ENOMEM. >Considering AF_HYPERV is a new socket type, ENOMEM seems OK here. >In the long run, I think we should add a new API in the VMBus driver, > allowing data copy from VMBus ringbuffer into user mode buffer directly. > This way, we can even eliminate this temporary buffer. > > In v14: > fix some coding style issues pointed out by David. > > In v15: > Just some stylistic changes addressing comments from Joe Perches and > Olaf Hering -- thank you! > - add a GPL blurb. > - define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE > - change sk_to_hvsock/hvsock_to_sk() from macros to inline functions > - remove a not-very-useful pr_err() > - fix some typos in comment and coding style issues. > > Looking forward to your comments! > > MAINTAINERS |2 + > include/linux/hyperv.h | 13 + > include/linux/socket.h |4 +- > include/net/af_hvsock.h | 78 +++ > include/uapi/linux/hyperv.h | 24 + > net/Kconfig |1 + > net/Makefile|1 + > net/hv_sock/Kconfig | 10 + > net/hv_sock/Makefile|3 + > net/hv_sock/af_hvsock.c | 1523 > +++ > 10 files changed, 1658 insertions(+), 1 deletion(-) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 50f69ba..6eaa26f 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -5514,7 +5514,9 @@ F: drivers/pci/host/pci-hyperv.c > F: drivers/net/hyperv/ > F: drivers/scsi/storvsc_drv.c > F: drivers/video/fbdev/hyperv_fb.c > +F: net/hv_sock/ > F: include/linux/hyperv.h > +F: include/net/af_hvsock.h > F: tools/hv/ > F: Documentation/ABI/stable/sysfs-bus-vmbus > > diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h > index 50f493e..1cda6ea5 100644 > --- a/include/linux/hyperv.h > +++ b/include/linux/hyperv.h > @@ -1508,5 +1508,18 @@ static inline void commit_rd_index(struct > vmbus_channel *channel) > vmbus_set_event(channel); > } > > +struct vmpipe_proto_header { > + u32 pkt_type; > + u32 data_size; > +}; > + > +#define HVSOCK_HEADER_LEN(sizeof(struct vmpacket_descriptor) + \ > + sizeof(struct vmpipe_proto_header)) > + > +/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */ > +#define PREV_INDICES_LEN (sizeof(u64)) > > +#define HVSOCK_PKT_LEN(payload_len) (HVSOCK_HEADER_LEN + \ > + ALIGN((payload_len), 8) + \ > + PREV_INDICES_LEN) > #endif /* _HYPERV_H */ > diff --git a/include/linux/socket.h b/include/linux/socket.h > index b5cc5a6..0b68b58 100644 > --- a/include/linux/socket.h > +++ b/include/linux/socket.h > @@ -202,8 +202,9 @@ struct ucred { > #define AF_VSOCK 40 /* vSockets
Re: [PATCH v15 net-next 1/1] hv_sock: introduce Hyper-V Sockets
It's too bad about having to allocate the send/receive rings on a per socket basis. Hopefully this will change in the future. I have built kernel 4.7.0-rc6 with this patch and did a quick test on the driver using rhel7 over hyperv Windows Server 2016 TP5. These were basic send and receive tests ( host to vm and vm to host ) using apps provided by Dexuan. Reviewed-by: Cathy Avery Tested-by: Cathy Avery On 07/08/2016 03:47 AM, Dexuan Cui wrote: Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transportation layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications between the host and the guest can talk to each other directly by the traditional BSD-style socket APIs. Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services": https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV. Signed-off-by: Dexuan Cui Cc: "K. Y. Srinivasan" Cc: Haiyang Zhang Cc: Vitaly Kuznetsov Cc: Cathy Avery --- You can also get the patch here (2764221d): https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160708_v15 For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31 In v12, the changes are mainly the following: 1) remove the module params as David suggested. 2) use 5 exact pages for VMBus send/recv rings, respectively. The host side's design of the feature requires 5 exact pages for recv/send rings respectively -- this is suboptimal considering memory consumption, however unluckily we have to live with it, before the host comes up with a new design in the future. :-( 3) remove the per-connection static send/recv buffers Instead, we allocate and free the buffers dynamically only when we recv/send data. This means: when a connection is idle, no memory is consumed as recv/send buffers at all. In v13: I return ENOMEM on buffer alllocation failure Actually "man read/write" says "Other errors may occur, depending on the object connected to fd". "man send/recv" indeed lists ENOMEM. Considering AF_HYPERV is a new socket type, ENOMEM seems OK here. In the long run, I think we should add a new API in the VMBus driver, allowing data copy from VMBus ringbuffer into user mode buffer directly. This way, we can even eliminate this temporary buffer. In v14: fix some coding style issues pointed out by David. In v15: Just some stylistic changes addressing comments from Joe Perches and Olaf Hering -- thank you! - add a GPL blurb. - define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE - change sk_to_hvsock/hvsock_to_sk() from macros to inline functions - remove a not-very-useful pr_err() - fix some typos in comment and coding style issues. Looking forward to your comments! MAINTAINERS |2 + include/linux/hyperv.h | 13 + include/linux/socket.h |4 +- include/net/af_hvsock.h | 78 +++ include/uapi/linux/hyperv.h | 24 + net/Kconfig |1 + net/Makefile|1 + net/hv_sock/Kconfig | 10 + net/hv_sock/Makefile|3 + net/hv_sock/af_hvsock.c | 1523 +++ 10 files changed, 1658 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 50f69ba..6eaa26f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5514,7 +5514,9 @@ F:drivers/pci/host/pci-hyperv.c F:drivers/net/hyperv/ F:drivers/scsi/storvsc_drv.c F:drivers/video/fbdev/hyperv_fb.c +F: net/hv_sock/ F:include/linux/hyperv.h +F: include/net/af_hvsock.h F:tools/hv/ F:Documentation/ABI/stable/sysfs-bus-vmbus diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index 50f493e..1cda6ea5 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1508,5 +1508,18 @@ static inline void commit_rd_index(struct vmbus_channel *channel) vmbus_set_event(channel); } +struct vmpipe_proto_header { + u32 pkt_type; + u32 data_size; +}; + +#define HVSOCK_HEADER_LEN (sizeof(struct vmpacket_descriptor) + \ +sizeof(struct vmpipe_proto_header)) + +/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */ +#define PREV_INDICES_LEN (sizeof(u64)) +#define HVSOCK_PKT_LEN(payload_len) (HVSOCK_HEADER_LEN + \ + ALIGN((payload_len), 8) + \ + PREV_INDICES_LEN) #endif /* _HYPERV_H */ diff --git a/include/linux/socket.h b/include/linux/socket.h index b5cc5a6..0b68b58 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -202,8 +202,9 @@ struct ucred { #define AF_VSOCK 4
RE: [PATCH v15 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: Dexuan Cui > Sent: Friday, July 8, 2016 15:47 > > You can also get the patch here (2764221d): > https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160708_v15 > > In v14: > fix some coding style issues pointed out by David. > > In v15: > Just some stylistic changes addressing comments from Joe Perches and > Olaf Hering -- thank you! > - add a GPL blurb. > - define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE > - change sk_to_hvsock/hvsock_to_sk() from macros to inline functions > - remove a not-very-useful pr_err() > - fix some typos in comment and coding style issues. FYI: the diff between v14 and v15 is attached: the diff is generated by git-diff-ing the 2 branches decui/hv_sock/net-next/20160629_v14 and decui/hv_sock/net-next/20160708_v15 in the above github repo. Thanks, -- Dexuan delta_v14_vs.v15.patch Description: delta_v14_vs.v15.patch
[PATCH v15 net-next 1/1] hv_sock: introduce Hyper-V Sockets
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transportation layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications between the host and the guest can talk to each other directly by the traditional BSD-style socket APIs. Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services": https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV. Signed-off-by: Dexuan Cui Cc: "K. Y. Srinivasan" Cc: Haiyang Zhang Cc: Vitaly Kuznetsov Cc: Cathy Avery --- You can also get the patch here (2764221d): https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160708_v15 For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31 In v12, the changes are mainly the following: 1) remove the module params as David suggested. 2) use 5 exact pages for VMBus send/recv rings, respectively. The host side's design of the feature requires 5 exact pages for recv/send rings respectively -- this is suboptimal considering memory consumption, however unluckily we have to live with it, before the host comes up with a new design in the future. :-( 3) remove the per-connection static send/recv buffers Instead, we allocate and free the buffers dynamically only when we recv/send data. This means: when a connection is idle, no memory is consumed as recv/send buffers at all. In v13: I return ENOMEM on buffer alllocation failure Actually "man read/write" says "Other errors may occur, depending on the object connected to fd". "man send/recv" indeed lists ENOMEM. Considering AF_HYPERV is a new socket type, ENOMEM seems OK here. In the long run, I think we should add a new API in the VMBus driver, allowing data copy from VMBus ringbuffer into user mode buffer directly. This way, we can even eliminate this temporary buffer. In v14: fix some coding style issues pointed out by David. In v15: Just some stylistic changes addressing comments from Joe Perches and Olaf Hering -- thank you! - add a GPL blurb. - define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE - change sk_to_hvsock/hvsock_to_sk() from macros to inline functions - remove a not-very-useful pr_err() - fix some typos in comment and coding style issues. Looking forward to your comments! MAINTAINERS |2 + include/linux/hyperv.h | 13 + include/linux/socket.h |4 +- include/net/af_hvsock.h | 78 +++ include/uapi/linux/hyperv.h | 24 + net/Kconfig |1 + net/Makefile|1 + net/hv_sock/Kconfig | 10 + net/hv_sock/Makefile|3 + net/hv_sock/af_hvsock.c | 1523 +++ 10 files changed, 1658 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 50f69ba..6eaa26f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5514,7 +5514,9 @@ F:drivers/pci/host/pci-hyperv.c F: drivers/net/hyperv/ F: drivers/scsi/storvsc_drv.c F: drivers/video/fbdev/hyperv_fb.c +F: net/hv_sock/ F: include/linux/hyperv.h +F: include/net/af_hvsock.h F: tools/hv/ F: Documentation/ABI/stable/sysfs-bus-vmbus diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index 50f493e..1cda6ea5 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1508,5 +1508,18 @@ static inline void commit_rd_index(struct vmbus_channel *channel) vmbus_set_event(channel); } +struct vmpipe_proto_header { + u32 pkt_type; + u32 data_size; +}; + +#define HVSOCK_HEADER_LEN (sizeof(struct vmpacket_descriptor) + \ +sizeof(struct vmpipe_proto_header)) + +/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */ +#define PREV_INDICES_LEN (sizeof(u64)) +#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \ + ALIGN((payload_len), 8) + \ + PREV_INDICES_LEN) #endif /* _HYPERV_H */ diff --git a/include/linux/socket.h b/include/linux/socket.h index b5cc5a6..0b68b58 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -202,8 +202,9 @@ struct ucred { #define AF_VSOCK 40 /* vSockets */ #define AF_KCM 41 /* Kernel Connection Multiplexor*/ #define AF_QIPCRTR 42 /* Qualcomm IPC Router */ +#define AF_HYPERV 43 /* Hyper-V Sockets */ -#define AF_MAX 43 /* For now.. */ +#define AF_MAX 44 /* For now.. */ /* Protocol families, same as address families. */ #define PF_UNSPEC AF_UNSPEC @@ -251,6 +252,7 @@ struct ucred { #define PF_VS