On 4/27/2021 9:23 AM, Chris Mi wrote:
On 3/24/2021 8:14 PM, Ilya Maximets wrote:
On 3/24/21 10:17 AM, Chris Mi wrote:
On 3/23/2021 10:24 PM, Ilya Maximets wrote:
On 3/5/21 4:27 AM, Chris Mi wrote:
Hi Ilya,
I think about your suggestion recently. But I'm still not very
clear about the design.
Please see my reply below:
On 3/1/2021 8:48 PM, Ilya Maximets wrote:
On 3/1/21 9:30 AM, Chris Mi wrote:
Hi Simon, Ilya,
Could I know what should we do to make progress for this patch set?
It has been posted in the community for a long time 😁
In general, the way to get your patches reviewed is to review other
patches. It's simply because we still have a huge review backlog
(214 patches right now in patchwork and most of them needs review)
and bugfixes usually has a bit higher priority. By reviewing other
patches you're reducing amount of work for maintainers so they can
get to your patches faster.
OK, I see.
For the series and comments from Eelco:
I didn't read the patches carefully, only a quick glance, but I
still
do not understand why we need a separate thread to poll psample
events.
Why can't we just allow usual handler threads to do that?
I'm not sure if you are aware of that the psample netlink is
different from the ovs
netlink. Without offload, kernel sends missed packet and sFlow
packet to userspace
using the same netlink 'ovs_packet_family'. So they can use the
same handler thread.
But in order to offload sFlow action, we use psample kernel module
to send sampled
packets from kernel to userspace. The format for ovs netlink
message and psample
netlink messages are different.
Hi. Sorry for late reply.
Yes, I understand that message format is different, but it doesn't
really
matter. All the message-specific handling and parsing should happen
inside the netdev_offload_tc_recv(). This function should have a
prototype
similar to dpif_netlink_recv(), i.e. it should receive a pointer to
the
struct dpif_upcall from the caller and fill it with data. Maybe other
type of a generic data structure if it's really not possible to
construct
struct dpif_upcall.
From the
architecture perspective it's not a good thing to call ofproto code
from the offload-provider. This introduces lots of complications
and might cause lots of issues in the future.
I'd say that design should look like this:
handler thread ->
dpif_recv() ->
dpif_netlink_recv() ->
netdev_offload_recv() ->
netdev_offload_tc_recv() ->
nl_sock_recv()
In order to use the handler thread, I'm not sure if we should add
psample socket
fd to every handler's epoll_fd. If we should do that, we should
consider how to
differentiate if the event comes from ovs or psample netlink.
Maybe we should
allocate a special port number for psample and assign it to
event.data.u32.
Anyway, that's the details. If this is the right direction, I'll
think about it.
Last three calls should be implemented.
The better version of this will be to throw away dpif part from
above call chain and make it:
handler thread ->
netdev_offload_recv() ->
netdev_offload_tc_recv() ->
nl_sock_recv()
If we throw away dpif part, maybe we have to write some duplicate
epoll code
for psample only. And we can't block in nl_sock_recv(). Maybe we
have to add
psample socket fd to every handler's epoll_fd. So we have to
change the dpif
somehow.
There is no need to implement any epoll_fd logic for psample.
You may only use handler with handler_id == 0. Only this handler
will receive
psample upcalls. netdev_offload_recv_wait() should be implemented
similar
to how dpif_netlink_recv_wait() implemented for windows case, i.e.
it will
call nl_sock_wait(nl_sock, POLLIN) for the psample netlink socket.
There is no need to block and you're never blocking anywhere
because your're
calling nl_sock_recv(..., wait = false) which will use MSG_DONTWAIT
while
actually receiving a message.
Hi Ilya,
With this new design, the code will be changed greatly. I'm not
unwilling to change it.
The effort is not small. So before I start, I want to make sure that
it is feasible.
Yes, we won't block in nl_sock_recv(), but the handler thread will
be blocked in poll_block().
If any of the vport netlink socket is readable , it will be waken up
by kernel. If we don't use
epoll_ctl to add the psample netlink socket to the handler's epoll
fd, we can't receive the
psample packet as soon as possible. That means we can only receive
the sampled packet
when there is a miss upcall.
I added some debug message in ovs epoll code and got above
conclusion. But I'm not
the expert of the epoll API, so I'm not sure if I missed anything.
Handler thread wakes up on POLLIN on epoll_fd itself, not on the
event on one
of the file descriptors added to epoll. epoll_fd is added to a
thread's poll
loop by
poll_fd_wait(handler->epoll_fd, POLLIN);
If you will call nl_sock_wait() on psample socket, this socket will
be added
to the same poll loop with:
nl_sock_wait(psample_sock, POLLIN) ->
poll_fd_wait(psample_sock->fd, POLLIN);
This way psample socket will become an *additional* source for waking up
for the thread that called nl_sock_wait(). So, this handler will be
waken
up if POLLIN happened on a psample socket even if there are no miss
upcalls.
The whole epoll infrastructure is local to lib/dpif-netlink.c and
handler
threads knows nothing about it. poll_loop() knows nothing about this
epoll
stuff too, it just adds epoll_fd itself to the list of fds for the usual
poll() because of the poll_fd_wait() call. psample socket will end
up in
the same usual poll().
Hi Ilya,
The code according to your suggestion is ready. But during the
internal code review,
Eli Britstein thought flow_api is netdev based, but the psample/sFlow
offload is
not netdev based, but dpif based. Maybe we should introduce
dpif-offload-provider
and have a dpf_offload_api, so it won't be related to a specific
netdev, but to a dpif type.
Could I know what's your opinion about it?
Hi Ilya,
Any comment about it?
Thanks,
Chris
Thanks,
Chris
Thanks,
Chris
So, there should be several call chains:
1. Init.
open_dpif_backer/type_run() ->
netdev_offload_recv_set() ->
netdev_offload_tc_recv_set(enabled) ->
if (enable)
psample_sock = nl_sock_create(...);
else
close(psample_sock);
2. Wait.
udpif_upcall_handler() ->
netdev_offload_recv_wait() ->
netdev_offload_tc_recv_wait() ->
if (handler_id == 0)
nl_sock_wait(psample_sock, POLLIN);
3. Receive.
udpif_upcall_handler() ->
recv_upcalls() ->
netdev_offload_recv() ->
| netdev_offload_tc_recv(..., struct dpif_upcall *upcall) ->
| if (handler_id == 0)
| nl_sock_recv(psample_socket, ..., wait = false);
| *upcall = <received data>;
upcall_receive()
process_upcall() ->
dpif_sflow_received()
4. Deinit.
close_dpif_backer() ->
netdev_offload_recv_set(enable = false);
Does that look more clear?
I don't know if I understand your suggestion correctly. Or I
missed anything
So if you have time, could you please elaborate?
Thanks,
Chris
This way we could avoid touching dpif-netdev and still have psample
offloading for the case where netdev-offload-tc is used from the
userspace datapath.
Above architecture also implies implementation of:
- netdev_offload_recv_wait()
- netdev_offload_recv_purge()
- and the netdev_offload_tc_* counterparts.
Best regards, Ilya Maximets.
Thanks,
Chris
On 2/23/2021 5:08 PM, Roi Dayan wrote:
On 2021-01-27 8:23 AM, Chris Mi wrote:
This patch set adds offload support for sFlow.
Psample is a genetlink channel for packet sampling. TC action
act_sample
uses psample to send sampled packets to userspace.
When offloading sample action to TC, userspace creates a
unique ID to
map sFlow action and tunnel info and passes this ID to kernel
instead
of the sFlow info. psample will send this ID and sampled
packet to
userspace. Using the ID, userspace can recover the sFlow info
and send
sampled packet to the right sFlow monitoring host.
v2-v1:
- Fix robot errors.
v3-v2:
- Remove Gerrit Change-Id.
- Add patch #9 to fix older kernels build issue.
- Add travis test result.
v4-v3:
- Fix offload issue when sampling rate is 1.
v5-v4:
- Move polling thread from ofproto to netdev-offload-tc.
v6-v5:
- Rebase.
- Add GitHub Actions test result.
v7-v6:
- Remove Gerrit Change-Id.
- Fix "ERROR: Inappropriate spacing around cast"
v8-v7
- Address Eelco Chaudron's comment for patch #11.
v9-v8
- Remove sflow_len from struct dpif_sflow_attr.
- Log a debug message for other userspace actions.
v10-v9
- Address Eelco Chaudron's comments on v9.
v11-v10
- Fix a bracing error.
v12-v11
- Add duplicate sample group id check.
Chris Mi (11):
compat: Add psample and tc sample action defines for
older kernels
ovs-kmod-ctl: Load kernel module psample
dpif: Introduce register sFlow upcall callback API
ofproto: Add upcall callback to process sFlow packet
netdev-offload: Introduce register sFlow upcall callback
API
netdev-offload-tc: Implement register sFlow upcall
callback API
dpif-netlink: Implement register sFlow upcall callback API
netdev-offload-tc: Introduce group ID management API
netdev-offload-tc: Create psample netlink socket
netdev-offload-tc: Add psample receive handler
netdev-offload-tc: Add offload support for sFlow
include/linux/automake.mk | 4 +-
include/linux/psample.h | 58 +++
include/linux/tc_act/tc_sample.h | 25 ++
lib/dpif-netdev.c | 1 +
lib/dpif-netlink.c | 27 ++
lib/dpif-netlink.h | 4 +
lib/dpif-provider.h | 10 +
lib/dpif.c | 8 +
lib/dpif.h | 23 ++
lib/netdev-offload-provider.h | 3 +
lib/netdev-offload-tc.c | 659
++++++++++++++++++++++++++++++-
lib/netdev-offload.c | 30 ++
lib/netdev-offload.h | 4 +
lib/tc.c | 61 ++-
lib/tc.h | 16 +-
ofproto/ofproto-dpif-upcall.c | 42 ++
utilities/ovs-kmod-ctl.in | 14 +
17 files changed, 973 insertions(+), 16 deletions(-)
create mode 100644 include/linux/psample.h
create mode 100644 include/linux/tc_act/tc_sample.h
Hi Simon, Ilya,
Can you help review for this series?
do you have any comments you want us to handle?
Thanks,
Roi
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev