-event notification based on the verdict from the filter.
The uspace component can use these perf-event notifications to either
read any state managed by the eBPF kernel module, or issue a TCP_INFO
netlink call if desired.
Patch 2 provides a simple example that shows how to use this infra
(and also p
-by: Sowmini Varadhan
---
V2: inline call to sys_perf_event_open() following the style of existing
code in kselftests/bpf
tools/testing/selftests/bpf/Makefile |4 +-
tools/testing/selftests/bpf/test_tcpnotify.h | 19 ++
tools/testing/selftests/bpf/test_tcpnotify_kern.c | 95
This patch allows eBPF programs that use sock_ops to send
perf-based event notifications using bpf_perf_event_output()
Signed-off-by: Sowmini Varadhan
---
net/core/filter.c | 19 +++
1 files changed, 19 insertions(+), 0 deletions(-)
diff --git a/net/core/filter.c b/net/core
tifications to either
read any state managed by the eBPF kernel module, or issue a TCP_INFO
netlink call if desired.
Patch 2 provides a simple example that shows how to use this infra
(and also provides a test case for it)
Sowmini Varadhan (2):
bpf: add perf-event notificaton support for sock_ops
-by: Sowmini Varadhan
---
tools/testing/selftests/bpf/Makefile |4 +-
tools/testing/selftests/bpf/perf-sys.h| 74
tools/testing/selftests/bpf/test_tcpnotify.h | 19 ++
tools/testing/selftests/bpf/test_tcpnotify_kern.c | 95 +++
tools/testing
Simple Proof-Of-Concept test program for BPF_TCP_INFO_NOTIFY
(will move this to testing/selftests/net later)
Signed-off-by: Sowmini Varadhan
---
samples/bpf/Makefile |1 +
samples/bpf/tcp_notify_kern.c | 73 +
2 files changed, 74 insertions
We want to use the inet_sock_diag_destroy code to send notifications
for more types of TCP events than just socket_close(), so refactor
the code to allow this.
Signed-off-by: Sowmini Varadhan
---
include/linux/sock_diag.h | 18 +-
include/uapi/linux/sock_diag.h |2
notification for an iperf connection if the number of
retransmits exceeds 16.
Sowmini Varadhan (3):
sock_diag: Refactor inet_sock_diag_destroy code
tcp: BPF_TCP_INFO_NOTIFY support
bpf: Added a sample for tcp_info_notify callback
include/linux/sock_diag.h | 18 +++---
includ
eturn status is used
by the caller to queue up a tcp_info notification for the application.
Signed-off-by: Sowmini Varadhan
---
include/net/tcp.h| 15 +--
include/uapi/linux/bpf.h |4
2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/include/net/tcp.h b/inc
t most things in BPF today only operate on sk_buffs. How should
we use *BPF on something other than an sk_buff?
--Sowmini
e, BPF hook can be an alternate parallel mechanism.
Sure, and that makes sense, though I hope we will explore those
alternate mechanisms too.
--Sowmini
table, and walk it, instead of holding up other VRFS
sorry, could not resist my i-told-you-so moment :-P
--Sowmini
On (10/11/18 08:26), Stephen Hemminger wrote:
> You can do the something like this already with BPF socket filters.
> But writing BPF for multi-part messages is hard.
Indeed. And I was just experimenting with this for ARP just last week.
So to handle the case of "ip neigh show a.b.c.d" without
you look at that first.
Meanwhile, how about waiting for Tushar's next patchset, where
you will have your selftests that are based on veth/netns
just like existing tests for XDP, vxlan etc. I strongly suggest
waiting for that.
And btw, it would have been very useful/courteous to help with
the RFC reviews to start with.
--Sowmini
t)
Does that address your concern?
--Sowmini
; exercise.. I suppose you can add example code in
selftests for this, but asking for a "proper test" may be
a little unrealistic here: a proper test needs proper hardware
in this case.
--Sowmini
On (09/10/18 17:16), Cong Wang wrote:
> >
> > On (09/10/18 16:51), Cong Wang wrote:
> > >
> > > __rds_create_bind_key(key, addr, port, scope_id);
> > > - rs = rhashtable_lookup_fast(_hash_table, key, ht_parms);
> > > + rcu_read_lock();
> > > + rs =
don't see any reason we should prefer synchronize_rcu() here.
Usually correctness (making sure all readers are done, before nuking a
data structure) is a little bit more important than performance, aka
"safety before speed" is what I've always been taught.
Clearly, your mileage varies. As you please.
--Sowmini
his. How do we ensure this with SOCK_RCU_FREE (or is the
intention to just reduce *some* of the syzbot failures)?
--Sowmini
going back
to rwlock, instead of rcu)
--Sowmini
e sure it is immune to these issues..
--Sowmini
e612a0 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Sowmini Varadhan
---
v2: added "Fixes" tag
net/xfrm/xfrm_input.c |1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index b89c9c7..be3
esp=aes_gcm_c-256-null.
Each patch has a technical description of the contents of the fix.
V2: added Fixes tag so that it can be backported to the stable trees.
Sowmini Varadhan (2):
xfrm: reset transport header back to network header after all input
transforms ahave been applied
xfrm
back to network header
only after the last transformation so that subsequent xfrms
can find the correct transport header.
Fixes: 7785bba299a8 ("esp: Add a software GRO codepath")
Suggested-by: Steffen Klassert
Signed-off-by: Sowmini Varadhan
---
v2: added "Fixes" tag
ne
ed-off-by: Sowmini Varadhan
---
net/xfrm/xfrm_input.c |1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index b89c9c7..be3520e 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -458,6 +458,7 @@ int xfrm_input(struct s
back to network header
only after the last transformation so that subsequent xfrms
can find the correct transport header.
Suggested-by: Steffen Klassert
Signed-off-by: Sowmini Varadhan
---
net/ipv4/xfrm4_input.c |1 +
net/ipv4/xfrm4_mode_transport.c |4 +---
net/ipv6/xfrm6_input.c
esp=aes_gcm_c-256-null.
Each patch has a technical description of the contents of the fix.
Sowmini Varadhan (2):
xfrm: reset transport header back to network header after all input
transforms ahave been applied
xfrm: reset crypto_done when iterating over multiple input xfrms
net/ipv4
On (08/21/18 14:05), Yue Haibing wrote:
> Remove duplicated include.
>
> Signed-off-by: Yue Haibing
Acked-by: Sowmini Varadhan
th the code as it exists today.
but there is a valid lock hierarchy violation here, and
imho it's a good idea to get that cleaned up.
It also avoids needlessly holding down the rs_recv_lock
when doing an rds_inc_put.
--Sowmini
the tmp_list (potentially resulting in rds_message_purge())
after dropping the rs_recv_lock.
The same lock hierarchy violation also exists in rds_still_queued()
and should be avoided in a similar manner
Signed-off-by: Sowmini Varadhan
Reported-by: syzbot+52140d69ac6dc6b92...@syzkaller.appspotmail.com
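
The "move to a tmp_list, then release after dropping the lock" idea described
above can be sketched in userspace as follows. This is only an illustration
under stated assumptions: struct msg, struct rx_queue, msg_put() and
queue_clear() are hypothetical stand-ins, a pthread mutex plays the role of
rs_recv_lock, and none of this is the actual RDS code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queued incoming message. */
struct msg {
	struct msg *next;
	int refs;
};

/* Hypothetical stand-in for the socket receive queue. */
struct rx_queue {
	pthread_mutex_t lock;	/* plays the role of rs_recv_lock */
	struct msg *head;
};

static int purged;	/* counts freed messages, for illustration only */

/* Final put; in real code this may need other locks, so it must not
 * run while the queue lock is held. */
static void msg_put(struct msg *m)
{
	assert(m->refs > 0);
	if (--m->refs == 0) {
		purged++;
		free(m);
	}
}

static void queue_init(struct rx_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = NULL;
}

static int queue_add(struct rx_queue *q)
{
	struct msg *m = malloc(sizeof(*m));

	if (!m)
		return -1;
	m->refs = 1;
	pthread_mutex_lock(&q->lock);
	m->next = q->head;
	q->head = m;
	pthread_mutex_unlock(&q->lock);
	return 0;
}

/* Detach the whole list under the lock, then drop the references
 * only after the lock has been released. Returns the number of
 * messages that were detached. */
static int queue_clear(struct rx_queue *q)
{
	struct msg *tmp_list, *m;
	int n = 0;

	pthread_mutex_lock(&q->lock);
	tmp_list = q->head;
	q->head = NULL;
	pthread_mutex_unlock(&q->lock);

	while ((m = tmp_list) != NULL) {
		tmp_list = m->next;
		msg_put(m);
		n++;
	}
	return n;
}
```

The point of the sketch is that the lock is held only for the pointer swap, so
whatever the final put needs to do never nests inside the queue lock.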
"Structurally dead code")
> Fixes: 1e2b44e78eea ("rds: Enable RDS IPv6 support")
> Signed-off-by: Gustavo A. R. Silva
Acked-by: Sowmini Varadhan
nge existing behavior. And doing what
> you mentioned will change existing behavior and break apps.
thank you.
--Sowmini
al that info via
the optlen. (And the reason for this inconsistency is that you dont
want to deal with the user->kernel copy in the same way?)
--Sowmini
ss odd (I've already explained
to you why RDS-over-UDP does not make much practical sense for the RDS
use-cases we anticipate). YMMV.
Thanks,
--Sowmini
On (07/06/18 23:08), Ka-Cheong Poon wrote:
>
> As mentioned in a previous mail, it is unclear why the
> port number is transport specific. Most Internet services
> use the same port number running over TCP/UDP as shown
> in the IANA database. And the IANA RDS registration is
> the same. What
e of the comment is interesting.
> > Also, while you are there, s/exisiting/existing, please.
>
>
> OK, with change that.
Wonderful.
For the rest, I repeat: Oracle Clusters are using UDP/IPV6 today
(with no RDS). You need feature compat with UDP for that reason.
--Sowmini
it in IB specific header files.
Santosh, David, I have to NACK this if it is not changed.
--Sowmini
cc me in follow-ups to this thread.
Thank you.
--Sowmini
ts absence is expected)
Please have a look, thanks.
--Sowmini
on how you set up your DNS.
It seems like this is all about "I dont want to deal with this
now", so I dont want to continue this discussion which is really
going nowhere.
Thanks
--Sowmini
On (06/26/18 10:53), Sowmini Varadhan wrote:
> Date: Tue, 26 Jun 2018 10:53:23 -0400
> From: Sowmini Varadhan
> To: David Miller
> Cc: netdev@vger.kernel.org, rds-de...@oss.oracle.com
> Subject: Re: [rds-devel] [PATCH net-next] rds: clean up loopback
>
> and just t
by email?
the last time I asked this question, the answer was a pointer to
https://groups.google.com/forum/#!msg/syzkaller-bugs/7ucgCkAJKSk/skZjgavRAQAJ
Thanks
--Sowmini
did not target net) is
official confirmation that the syzbot failures are root-caused to the
absence of this patch (since there is no reproducer for many of these,
and no crash dumps available from syzbot).
--Sowmini
y)
https://www.spinics.net/lists/linux-rdma/msg66020.html
as I understand it, if there is no reproducer, you cannot really
have a pass/fail test to confirm the fix.
--Sowmini
backport to earlier kernels (if needed)..
--Sowmini
und
to, and maybe create another socket, and bind it to link-local"
You're not doing this for IPv4 and RDS today (you dont have to do this
for UDP, afaik)
This is especially true if "X" is a hostname that got resolved using DNS
> BTW, if it is really > needed, it can be added in future.
shrug. You are introducing a new error return.
--Sowmini
On (06/26/18 13:30), Ka-Cheong Poon wrote:
>
> My answer to this is that if a socket is not bound to a link
> local address (meaning it is bound to a non-link local address)
> and it is used to send to a link local peer, I think it should
> fail.
Hmm, I'm not sure I agree. I dont think this is
v6 support in rds_connect?
>
>
> Oops, I missed this when I ported the internal version to the
> net-next version. Will add it back.
Ok
--Sowmini
dr's
scopeid")
Also, why is there no IPv6 support in rds_connect?
(still looking through the rds-tcp changes, but wanted to get these
questions clarified first).
--Sowmini
On (06/25/18 06:41), Sowmini Varadhan wrote:
:
> Add the changes aligned with the changes from
> commit ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize
> netns/module teardown and rds connection/workq management") for
> rds_loop_transport
with the changes from
commit ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize
netns/module teardown and rds connection/workq management") for
rds_loop_transport
Acked-by: Santosh Shilimkar
Signed-off-by: Sowmini Varadhan
---
net/rds/connection.c | 11 +-
net/
net/lists/netdev/msg475074.html for earlier
discussion thread)
--Sowmini
you need some type of synchronization (either
through mutex, or some atomic flag in the rs or similar) to make
sure rds_bind() and rds_ib_get_mr() are mutually exclusive.
--Sowmini
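
A minimal userspace sketch of the mutex option mentioned above, assuming a
single per-socket lock. The names struct sock_state, model_bind() and
model_get_mr() are hypothetical stand-ins for illustration; they are not the
RDS functions, and the error codes are chosen arbitrarily.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical transport handle chosen at bind time. */
struct transport {
	int id;
};

/* Hypothetical per-socket state shared by both paths. */
struct sock_state {
	pthread_mutex_t lock;
	struct transport *trans;	/* written by bind, read by get_mr */
	int mr_count;
};

static void sock_state_init(struct sock_state *s)
{
	pthread_mutex_init(&s->lock, NULL);
	s->trans = NULL;
	s->mr_count = 0;
}

/* Bind path: the transport pointer is only written under the lock. */
static int model_bind(struct sock_state *s, struct transport *t)
{
	int ret = 0;

	pthread_mutex_lock(&s->lock);
	if (s->trans)
		ret = -EINVAL;		/* already bound */
	else
		s->trans = t;
	pthread_mutex_unlock(&s->lock);
	return ret;
}

/* Memory-registration path: the transport pointer is checked and used
 * under the same lock, so it cannot observe a half-finished bind. */
static int model_get_mr(struct sock_state *s)
{
	int ret;

	pthread_mutex_lock(&s->lock);
	if (!s->trans) {
		ret = -ENOTCONN;	/* not bound yet */
	} else {
		s->mr_count++;
		ret = s->trans->id;
	}
	pthread_mutex_unlock(&s->lock);
	return ret;
}
```

Because both paths take the same lock, the check of the transport pointer and
its use happen atomically with respect to a concurrent bind.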
The content on the wire should be the same.
I'm sorry that's not how I interpret Willem's email below
(and maybe I misunderstood)
the following taken from https://www.spinics.net/lists/netdev/msg496150.html
Sowmini> If yes, how will the recvmsg differentiate between the case
Sowmini> (20
you may well end up just reinventing IP
frag/re-assembly when you are done (with just the slight improvement
that each "fragment" has a full UDP header, so it has a better shot
at ECMP and RSS).
--Sowmini
ifferentiate between the case
(2000 byte message followed by 512 byte message) and
(1472 byte message, 526 byte message, then 512 byte message),
in other words, how are UDP message boundary semantics preserved?
--Sowmini
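
The boundary question above can be illustrated with a small sketch. It assumes
a fixed 1472-byte payload limit (taken from the example above) and a sender
that simply cuts each message into chunks of at most that size; segment() and
same_wire() are hypothetical helpers, not part of any proposed API. With these
assumptions the second chunk of a 2000-byte message is 2000 - 1472 bytes long.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAYLOAD 1472	/* assumed per-datagram payload limit */

/* Cut each application message into wire chunks of at most
 * MAX_PAYLOAD bytes. Writes the chunk sizes to out[] and returns the
 * number of chunks, or -1 if out[] is too small. */
static int segment(const size_t *msgs, size_t nmsgs, size_t *out, size_t cap)
{
	size_t n = 0;

	for (size_t i = 0; i < nmsgs; i++) {
		size_t left = msgs[i];

		while (left > 0) {
			size_t chunk = left < MAX_PAYLOAD ? left : MAX_PAYLOAD;

			if (n == cap)
				return -1;
			out[n++] = chunk;
			left -= chunk;
		}
	}
	return (int)n;
}

/* Returns 1 if two chunk-size sequences look identical on the wire. */
static int same_wire(const size_t *a, int na, const size_t *b, int nb)
{
	if (na != nb)
		return 0;
	for (int i = 0; i < na; i++)
		if (a[i] != b[i])
			return 0;
	return 1;
}
```

Feeding segment() a two-message sequence (2000, 512) and a three-message
sequence whose sizes already match the chunks gives the same chunk sizes, so a
receiver that sees only chunk sizes cannot tell the two senders apart without
some extra marker.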
for the various L2/L3 etc headers)
--Sowmini
like
the WARN_ONs in that commit are not even being triggered).
We've not been able to reproduce this issue, and without
a crash dump (or some hint of other threads that were running
at the time of the problem) are working on figuring out
the root-cause by code-inspection.
--Sowmini
/
>
> Well, not part of your commit.
As above.
>
>
> > * function resets the RDS connections in that netns so that we can
>
> Two double spaces incidents above
>
> Not part of your commit
As above.
Thanks much.
--Sowmini
to netdevice notifiers and
refactors all the code needed to dismantle rds_tcp state
into a ->exit callback for the pernet_operations used with
register_pernet_device().
Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
net/rds/tcp
ace I missed.. no easy answer here, I am afraid.
--Sowmini
onal self-review/testing.
Please also take a look, if you can, to see if I missed something.
Thanks for the input,
--Sowmini
---patch follows
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 08ea9cd..87c2643 100644
--- a/net/rds/tcp.c
+++
On (03/17/18 10:15), Sowmini Varadhan wrote:
> To solve the scaling problem why not just have a well-defined
> callback to modules when devices are quiesced, instead of
> overloading the pernet_device registration in this obscure way?
I thought about this a bit, and maybe I missed your
I spent a long time staring at both v1 and v2 of your patch.
I understand the overall goal, but I am afraid to say that these
patches are complete hacks.
I was trying to understand why patchv1 blows with a null rtn in
rds_tcp_init_net, but v2 does not, and the analysis is ugly.
I'm going to
2 times, that needs some comments to
provide guidance for other subsystems. e.g., I found the
large block comment in net-namespace.h very helpful, so let's
please clearly document what and why and when this should
be used.
--Sowmini
you are saying there are scaling constraints on subsystems
that register for netdevice handlers. The disturbing part
of that is that it does not scale.
Thanks.
--Sowmini
k interfaces have been taken down (loopback is
the last one) we know there are no more packets coming in
and out, so it is safe to dismantle all kernel sockets
created by rds-tcp.
Hope that helps.
--Sowmini
do that.
Please share your patch, I can review it and maybe help to test
it..
As I was trying to say in my RFC, I am quite open to ways to make
this cleanup more obvious
--Sowmini
On (03/16/18 15:38), Kirill Tkhai wrote:
>
> 467fa15356acf by Sowmini Varadhan added NETDEV_UNREGISTER_FINAL dependence
> with the commentary:
>
> /* rds-tcp registers as a pernet subys, so the ->exit will only
>* get invoked after network acitivity
use rds_destroy_pending() correctly.
Reported-by: syzbot+c68e51bb5e699d3f8...@syzkaller.appspotmail.com
Fixes: ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize
netns/module teardown and rds connection/workq management")
Signed-off-by: Sowmini Varadhan <
fix it later.
> Hard to understand why RDS is messing with hard irqs really.
some of it comes from the rds_rdma history: some parts of
the common rds and rds_rdma module get called in various
driver contexts for infiniband.
--Sowmini
On (03/11/18 18:03), Colin King wrote:
> From: Colin Ian King
>
> Functions rds_info_from_znotifier and rds_message_zcopy_from_user are
> local to the source and do not need to be in global scope, so make them
> static.
the rds_message_zcopy_from_user warning was
On (03/11/18 17:27), Colin King wrote:
> Variable sg_off is assigned a value but it is never read, hence it is
> redundant and can be removed.
>
Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
On (03/08/18 18:56), kbuild test robot wrote:
>
> Fixes: d40a126b16ea ("rds: refactor zcopy code into
> rds_message_zcopy_from_user")
> Signed-off-by: Fengguang Wu <fengguang...@intel.com>
Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
(do I need to separately submit a non-RFC patch for this?)
On (03/07/18 09:40), Jesus Sanchez-Palencia wrote:
> Fix the SO_ZEROCOPY switch case on sock_setsockopt() avoiding the
> ret values to be overwritten by the one set on the default case.
Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
Move the large block of code predicated on zcopy from
rds_message_copy_from_user into a new function,
rds_message_zcopy_from_user()
Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
net/rds/message.c | 108 +---
1 files chang
)
Sowmini Varadhan (2):
rds: refactor zcopy code into rds_message_zcopy_from_user
rds: use list structure to track information for zerocopy completion
notification
okie_queue by
a simpler list that results in a smaller memory footprint as well
as more efficient memory_allocation time.
Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
net/rds/af_rds.c |6 ++--
net/rds/message.c | 77 +
figured as a kernel module.
Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
By moving the ops assignment after
> the ops->accept() call, we save increasing the refcnt in
> case the ops->accept() fails. Otherwise, the __module_get()
> needs to be moved before ops->accept() to handle this failure
> case.
I see, thanks for clarification.
It may be helpful to have some comment in there, in case some other
module trips on something similar in the future.
--Sowmini
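
As a rough userspace model of the ordering explained in the quoted text: the
reference is taken only after the accept callback has succeeded, so the
failure path has nothing to undo. struct model_ops, struct model_sock and
do_accept() are hypothetical names, and the refcnt field merely stands in for
the module reference count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table; refcnt models the module reference count. */
struct model_ops {
	int (*accept_cb)(void);
	int refcnt;
};

struct model_sock {
	struct model_ops *ops;
};

static int accept_ok(void)
{
	return 0;
}

static int accept_fail(void)
{
	return -1;
}

/* Assign ops and take the reference only once accept has succeeded.
 * On failure newsock->ops is left untouched and no reference is held. */
static int do_accept(struct model_sock *listener, struct model_sock *newsock)
{
	int err = listener->ops->accept_cb();

	if (err < 0)
		return err;

	newsock->ops = listener->ops;
	newsock->ops->refcnt++;		/* models __module_get() */
	return 0;
}
```

The alternative mentioned in the quoted text, taking the reference before the
callback, would need a matching release on the error path.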
goto out;
>
> + new_sock->ops = sock->ops;
How is this delta relevant to the commit comment? Seems unrelated?
--Sowmini
PF_RDS sockets pass up cookies for zerocopy completion as ancillary
data. Update msg_zerocopy to reap this information.
Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
Acked-by: Willem de Bruijn <will...@google.com>
Acked-by: Santosh Shilimkar <santosh.shilim...@ora
to remove the sk_error_queue related paths in
RDS.
Both of these goals are implemented in this series.
v2: removed sk_error_queue support
v3: incorporated additional code review comments (details in each patch)
Sowmini Varadhan (3):
selftests/net: revert the zerocopy Rx path for PF_RDS
rds: deliver
In preparation for optimized reception of zerocopy completion,
revert the Rx side changes introduced by Commit dfb8434b0a94
("selftests/net: add zerocopy support for PF_RDS test case")
Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
Acked-by: Willem de Bruijn <
es support for zerocopy completion notification on
MSG_ERRQUEUE for PF_RDS sockets.
Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
Acked-by: Willem de Bruijn <will...@google.com>
Acked-by: Santosh Shilimkar <santosh.shilim...@oracle.com>
---
v2: remove sk_error_q
ics.net/lists/netdev/msg485424.html
I resent my patch a few minutes ago, but I suspect I may
now be hitting this well-known patchwork bug:
https://www.spinics.net/lists/sparclinux/msg13787.html
Do I need to do something?
--Sowmini
On (02/27/18 11:49), David Miller wrote:
> > Do I need to resend?
>
> Yes, see my other email.
do we need to resend patches not showing up in patchwork?
I recall seeing email about picking things manually from inbox
but missed this.
--Sowmini
delivers notifications on sk_error_queue.
This patch series removes the sk_error_queue support to the
smatch warning is not applicable after this patch.
on a different note, for some odd reason I'm not seeing this patch series
on the patch queue, though it's showing up in the archives.
--Sowmini
In preparation for optimized reception of zerocopy completion,
revert the Rx side changes introduced by Commit dfb8434b0a94
("selftests/net: add zerocopy support for PF_RDS test case")
Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
v2: prepare to remove sk_e
PF_RDS sockets pass up cookies for zerocopy completion as ancillary
data. Update msg_zerocopy to reap this information.
Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
v2: receive zerocopy completion notification as POLLIN
v3: drop ncookies arg in do_process_zerocopy_c
es support for zerocopy completion notification on
MSG_ERRQUEUE for PF_RDS sockets.
Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
v2: remove sk_error_queue path; lot of cautionary checks rds_recvmsg_zcookie()
and callers to make sure we dont remove cookie
sk_error_queue support
v3: incorporated additional code review comments (details in each patch)
Sowmini Varadhan (3):
selftests/net: revert the zerocopy Rx path for PF_RDS
rds: deliver zerocopy completion notification with data
selftests/net: reap zerocopy completions passed up as ancillary data
On (02/25/18 10:56), Willem de Bruijn wrote:
> > @@ -91,22 +85,19 @@ static void rds_rm_zerocopy_callback(struct rds_sock
> > *rs,
> > spin_unlock_irqrestore(>lock, flags);
> > mm_unaccount_pinned_pages(>z_mmp);
> >
pointed out that socket functions block
if sk_err is non-zero, thus if the RDS code does not plan/need
to use sk_error_queue path for completion notification, it
is preferable to remove the sk_error_queue related paths in
RDS.
Both of these goals are implemented in this series.
Sowmini Varadhan (3
PF_RDS sockets pass up cookies for zerocopy completion as ancillary
data. Update msg_zerocopy to reap this information.
Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
v2: receive zerocopy completion notification as POLLIN
tools/testing/selftests/net/msg_zerocopy.c
bot+f893ae7bb2f6456df...@syzkaller.appspotmail.com
Fixes: 0cebaccef3ac ("rds: zerocopy Tx support.")
Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
---
net/rds/send.c |3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/net/rds/send.c b/net/rds/send.c
ind
On (02/21/18 19:39), Willem de Bruijn wrote:
> >> By the way, the put_cmsg is unconditional even if the caller did
> >> not supply msg_control. So it is basically no longer safe to ever
> >> call read, recv or recvfrom on a socket if zerocopy notifications
> >> are outstanding.
> >
> > Wait, I