We observed high 99th and 99.9th percentile latencies when doing RPCs with DCTCP. The
problem is triggered when the last packet of a request arrives CE
marked. The reply will carry the ECE mark causing TCP to shrink its cwnd
to 1 (because there are no packets in flight). When the 1st packet of
the next request
DCTCP depends on the CA_EVENT_NON_DELAYED_ACK and CA_EVENT_DELAYED_ACK
notifications to keep track of whether it needs to send an ACK for packets that
were received with a particular ECN state but whose ACK was delayed.
Under some circumstances, for example when a delayed ACK is sent with a
data packet,
We have observed high tail latencies when using DCTCP for RPCs as
compared to using Cubic. For example, in one setup there are 2 hosts
sending to a 3rd one, with each sender having 3 flows (1 stream,
1 flow of back-to-back 1MB RPCs and 1 flow of back-to-back 10KB RPCs). The following
table shows the 99% and
On 06/29/2018 01:27 AM, Daniel Borkmann wrote:
On 06/19/2018 08:00 PM, Tushar Dave wrote:
Add new eBPF prog type BPF_PROG_TYPE_SOCKET_SG_FILTER which uses the
existing socket filter infrastructure for bpf program attach and load.
SOCKET_SG_FILTER eBPF program receives struct scatterlist as
From: Yuval Mintz
Add a scale test capable of validating that offloaded network
functionality is indeed functional at scale when configured to
the different KVD profiles available.
Start by testing offloaded routes are functional at scale by
passing traffic on each one of them in turn.
Add a wrapper around mlxsw/mirror_gre_scale.sh that parameterizes the
number of offloadable mirrors on Spectrum machines.
Signed-off-by: Petr Machata
Reviewed-by: Jiri Pirko
---
.../drivers/net/mlxsw/spectrum/mirror_gre_scale.sh | 13 +
1 file changed, 13 insertions(+)
create
Test that it's possible to offload a given number of mirrors.
Signed-off-by: Petr Machata
Reviewed-by: Jiri Pirko
---
.../drivers/net/mlxsw/mirror_gre_scale.sh | 197 +
1 file changed, 197 insertions(+)
create mode 100644
Add a wrapper around mlxsw/tc_flower_scale.sh that parameterizes the
generic tc flower scale test template with Spectrum-specific target
values.
Signed-off-by: Petr Machata
Reviewed-by: Yuval Mintz
---
.../drivers/net/mlxsw/spectrum/tc_flower_scale.sh | 19 +++
1 file
Add test of capacity to offload flower.
This is a generic portion of the test that is meant to be called from a
driver that supplies a particular number of rules to be tested with.
Signed-off-by: Petr Machata
Reviewed-by: Yuval Mintz
---
.../selftests/drivers/net/mlxsw/tc_flower_scale.sh |
From: Yuval Mintz
IPv4 routes in Spectrum are based on the kvd single-hash, but as it's
a hash we need to assume we cannot reach 100% of its capacity.
Add a wrapper that provides us with good/bad target numbers for the
Spectrum ASIC.
Signed-off-by: Yuval Mintz
Reviewed-by: Petr Machata
From: Arkadi Sharshevsky
This test aims for both stand-alone and internal usage by the resource
infra. The test receives the number of routes to offload and checks:
- The routes were offloaded correctly
- Traffic for each route.
Signed-off-by: Arkadi Sharshevsky
Signed-off-by: Yuval Mintz
From: Yuval Mintz
Add a selftest that can be used to perform basic sanity of the devlink
resource API as well as test the behavior of KVD manipulation in the
driver.
This is the first case of a HW-only test - in order to test the devlink
resource a driver capable of exposing resources has to be
This library builds on top of devlink_lib.sh and contains functionality
specific to Spectrum ASICs, e.g., re-partitioning the various KVD
sub-parts.
Signed-off-by: Yuval Mintz
[pe...@mellanox.com: Split this out from another patch. Fix line length
in devlink_sp_read_kvd_defaults().]
This helper library contains wrappers to devlink functionality agnostic
to the underlying device.
Signed-off-by: Yuval Mintz
[pe...@mellanox.com: Split this out from another patch.]
Signed-off-by: Petr Machata
---
.../selftests/net/forwarding/devlink_lib.sh| 108 +
setup_wait() and tc_offload_check() both assume that all NUM_NETIFS
interfaces are relevant for a given test. However, the scale test script
acts as an umbrella for a number of sub-tests, some of which may not
require all the interfaces.
Thus it's suboptimal for tc_offload_check() to query all
From: Yuval Mintz
The devlink related scripts are mlxsw-specific. As a result, they'll
reside in a different directory - but would still need the common logic
implemented in lib.sh.
So as a preliminary step, allow lib.sh to be sourced from other
directories as well.
Signed-off-by: Yuval Mintz
In the scale testing scenarios, one usually has a condition that is
expected to either fail or pass, depending on which side of the scale
is being tested.
To capture this logic, add a function check_err_fail(), which dispatches
either to check_err() or check_fail(), depending on the value of the
There are a number of tests that check features of the Linux networking
stack. By running them on suitable interfaces, one can exercise the
mlxsw offloading code. However none of these tests attempts to push
mlxsw to the limits supported by the ASIC.
As an additional wrinkle, the "limits
On 06/29/2018 01:32 AM, Daniel Borkmann wrote:
On 06/19/2018 08:00 PM, Tushar Dave wrote:
[...]
+int sg_filter_run(struct sock *sk, struct scatterlist *sg)
+{
+ struct sk_filter *filter;
+ int err;
+
+ rcu_read_lock();
+ filter = rcu_dereference(sk->sk_filter);
+
On 06/29/2018 01:18 AM, Daniel Borkmann wrote:
On 06/19/2018 08:00 PM, Tushar Dave wrote:
When sg_filter_run() is invoked it runs the attached eBPF
SOCKET_SG_FILTER program which deals with struct scatterlist.
In addition, this patch also adds bpf_sg_next helper function that
allows users
On 06/29/2018 01:48 AM, Daniel Borkmann wrote:
On 06/29/2018 09:25 AM, Daniel Borkmann wrote:
On 06/19/2018 08:00 PM, Tushar Dave wrote:
Add new eBPF prog type BPF_PROG_TYPE_SOCKET_SG_FILTER which uses the
existing socket filter infrastructure for bpf program attach and load.
NFP NAPI handling will only complete the TXed packets when called
with a budget of 0; implement ndo_poll_controller by scheduling NAPI
on all TX queues.
Signed-off-by: Jakub Kicinski
---
.../ethernet/netronome/nfp/nfp_net_common.c| 18 ++
1 file changed, 18 insertions(+)
diff
Use napi_consume_skb() in nfp_net_tx_complete() to get bulk free.
Pass 0 as budget for ctrl queue completion since it runs from
a tasklet.
Signed-off-by: Jakub Kicinski
Reviewed-by: Dirk van der Merwe
---
drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 11 ++-
1 file changed, 6
On some platforms with broken ACPI tables we may not have access
to the Serial Number PCIe capability. This capability is crucial
for us for switchdev operation as we use serial number as switch ID,
and for communication with management FW where interface ID is used.
If we can't determine the
After the user changes the ring count, statistics for deactivated
rings disappear from ethtool -S output. This causes loss of
information to the user and means that ethtool stats may not
add up to interface stats. Always expose counters from all
the rings. Note that we allocate at most
Hi!
This set contains assorted updates to driver base and flower.
First patch is a follow up to a fix to calculating counters which
went into net. For ethtool counters we should also make sure
they are visible even after ring reconfiguration. Next patch
is a safety measure in case we are
We used to leave bus-info in ethtool driver info empty for
representors in case multi-PCIe-to-single-host cards make
the association between PCIe device and NFP many to one.
It seems these attempts are futile, we need to link the
representors to one PCIe device in sysfs to get consistent
naming,
From: Pieter Jansen van Vuuren
Hardware will automatically update the csum in headers when a set action
has been performed. This means the driver can ignore the explicit
checksum action when performing a set action.
Signed-off-by: Pieter Jansen van Vuuren
Reviewed-by: Jakub Kicinski
From: John Hurley
Currently the NFP fw only supports L3/L4 hashing so rejects the offload of
filters that output to LAG ports implementing other hash algorithms. Team,
however, uses a BPF function for the hash that is not defined. To support
Team offload, accept hashes that are defined as
From: John Hurley
Extract the tos and the tunnel flags from the tunnel key and offload these
action fields. Only the checksum and tunnel key flags are implemented in
fw so reject offloads of other flags. The tunnel key flag is always
considered set in the fw so enforce that it is set in the
From: John Hurley
Previously the ttl for ipv4 udp tunnels was set to the namespace default.
Modify this to attempt to extract the ttl from a full route lookup on the
tunnel destination. If this is not possible then resort to the default.
Signed-off-by: John Hurley
Reviewed-by: Jakub Kicinski
On 06/29/2018 08:42 PM, Kees Cook wrote:
> On Thu, Jun 28, 2018 at 2:34 PM, Daniel Borkmann wrote:
>> Kees suggested that if set_memory_*() can fail, we should annotate it with
>> __must_check, and all callers need to deal with it gracefully given those
>> set_memory_*() markings aren't
On Thu, Jun 28, 2018 at 8:20 AM, Yifeng Sun wrote:
> Add 'clone' action to kernel datapath by using existing functions.
> When actions within clone don't modify the current flow, the flow
> key is not cloned before executing clone actions.
>
> This is a follow up patch for this incomplete work:
>
Hi Edward,
I love your patch! Yet something to improve:
[auto build test ERROR on net-next/master]
url:
https://github.com/0day-ci/linux/commits/Edward-Cree/Handle-multiple-received-packets-at-each-stage/20180630-042204
config: x86_64-randconfig-x003-201825 (attached as .config)
compiler:
On Fri, Jun 29, 2018 at 2:24 AM Saeed Mahameed wrote:
>
> From: Boris Pismenny
>
> This patch enables UDP GSO support. We enable this by using two WQEs
> the first is a UDP LSO WQE for all segments with equal length, and the
> second is for the last segment in case it has different length.
> Due
On Fri, Jun 29, 2018 at 10:06 AM Samudrala, Sridhar
wrote:
>
> So instead of introducing 'chaintemplate' object in the kernel, can't we add
> 'chain'
> object in the kernel that takes the 'template' as an attribute?
This is exactly what I mean above. Making the chain a standalone object
in
Hello,
We're trying to create lots of strongswan VPN tunnels on network devices
bound to different VRFs. We are using Fedora-24 on the client side, with a
4.16.15+ kernel
and updated 'ip' package, etc.
So far, no luck getting it to work.
Any idea if this is supported or not?
Thanks,
Ben
--
Hi Edward,
I love your patch! Perhaps something to improve:
[auto build test WARNING on net-next/master]
url:
https://github.com/0day-ci/linux/commits/Edward-Cree/Handle-multiple-received-packets-at-each-stage/20180630-042204
config: i386-randconfig-a0-201825 (attached as .config)
compiler:
Allow UPDSA to change "set mark" to permit
policy separation of packet routing decisions from
SA keying in systems that use mark-based routing.
The set mark, used as a routing and firewall mark
for outbound packets, is made updatable, which
allows routing decisions to be handled independently
of
On 06/29/2018 01:42 PM, Cong Wang wrote:
> As noticed by Eric, we need to switch to the helper
> dev_change_tx_queue_len() for SIOCSIFTXQLEN call path too,
> otherwise we still miss dev_qdisc_change_tx_queue_len().
>
> Fixes: 6a643ddb5624 ("net: introduce helper dev_change_tx_queue_len()")
>
From: Roopa Prabhu
After commit f9d4b0c1e969 ("fib_rules: move common handling of newrule
delrule msgs into fib_nl2rule"), rule_exists got replaced by rule_find
for existing rule lookup in both the add and del paths. While this
is good for the delete path, it solves a few problems but opens up
a
As noticed by Eric, we need to switch to the helper
dev_change_tx_queue_len() for SIOCSIFTXQLEN call path too,
otherwise we still miss dev_qdisc_change_tx_queue_len().
Fixes: 6a643ddb5624 ("net: introduce helper dev_change_tx_queue_len()")
Reported-by: Eric Dumazet
Signed-off-by: Cong Wang
---
On 06/29/2018 11:49 AM, Willem de Bruijn wrote:
diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
+static void report_sock_error(struct sk_buff *skb, u32 err, u8 code)
+{
+ struct sock_exterr_skb *serr;
+ ktime_t txtime = skb->tstamp;
+
+
On Fri, 29 Jun 2018, Neal Cardwell wrote:
> On Fri, Jun 29, 2018 at 6:07 AM Ilpo Järvinen
> wrote:
> >
> > If SACK is not enabled and the first cumulative ACK after the RTO
> > retransmission covers more than the retransmitted skb, a spurious
> > FRTO undo will trigger (assuming FRTO is enabled
On Thu, Jun 28, 2018 at 9:59 PM, Roopa Prabhu wrote:
> On Wed, Jun 27, 2018 at 6:27 PM, Roopa Prabhu
> wrote:
>> From: Roopa Prabhu
>>
>> After commit f9d4b0c1e969 ("fib_rules: move common handling of newrule
>> delrule msgs into fib_nl2rule"), rule_find is strict about checking
>> for an
Generally the check should be very cheap, as the sk_buff_head is in cache.
Signed-off-by: Edward Cree
---
net/core/dev.c | 8 ++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 4c5ebfab9bc8..d6084b0cd9ce 100644
--- a/net/core/dev.c
+++
__netif_receive_skb_core() does a depressingly large amount of per-packet
work that can't easily be listified, because the another_round loop
makes it nontrivial to slice up into smaller functions.
Fortunately, most of that work disappears in the fast path:
* Hardware devices generally don't
Also involved adding a way to run a netfilter hook over a list of packets.
Rather than attempting to make netfilter know about lists (which would be
a major project in itself) we just let it call the regular okfn (in this
case ip_rcv_finish()) for any packets it steals, and have it give us back
ip_rcv_finish_core(), if it does not drop, sets skb->dst by either early
demux or route lookup. The last step, calling dst_input(skb), is left to
the caller; in the listified case, we split to form sublists with a common
dst, but then ip_sublist_rcv_finish() just calls dst_input(skb) in a
First example of a layer splitting the list (rather than merely taking
individual packets off it).
Involves new list.h function, list_cut_before(), like list_cut_position()
but cuts on the other side of the given entry.
Signed-off-by: Edward Cree
---
include/linux/list.h | 30
netif_receive_skb_list_internal() now processes a list and hands it
on to the next function.
Signed-off-by: Edward Cree
---
net/core/dev.c | 61 +-
1 file changed, 56 insertions(+), 5 deletions(-)
diff --git a/net/core/dev.c
Improves packet rate of 1-byte UDP receives by up to 10%.
Signed-off-by: Edward Cree
---
drivers/net/ethernet/sfc/efx.c| 12
drivers/net/ethernet/sfc/net_driver.h | 3 +++
drivers/net/ethernet/sfc/rx.c | 7 ++-
3 files changed, 21 insertions(+), 1 deletion(-)
Signed-off-by: Edward Cree
---
include/trace/events/net.h | 7 +++
net/core/dev.c | 4 +++-
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/include/trace/events/net.h b/include/trace/events/net.h
index 9c886739246a..00aa72ce0e7c 100644
---
Just calls netif_receive_skb() in a loop.
Signed-off-by: Edward Cree
---
include/linux/netdevice.h | 1 +
net/core/dev.c| 19 +++
2 files changed, 20 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c6b377a15869..e104b2e4a735
This patch series adds the capability for the network stack to receive a
list of packets and process them as a unit, rather than handling each
packet singly in sequence. This is done by factoring out the existing
datapath code at each layer and wrapping it in list handling code.
The
On 6/28/18, 1:48 PM, "netdev-ow...@vger.kernel.org on behalf of Neal Cardwell"
wrote:
On Thu, Jun 28, 2018 at 4:20 PM Lawrence Brakmo wrote:
>
> I just looked at 4.18 traces and the behavior is as follows:
>
>Host A sends the last packets of the request
>
>
> >> diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
> >> index 5514a8aa3bd5..166f4b72875b 100644
> >> --- a/net/sched/sch_etf.c
> >> +++ b/net/sched/sch_etf.c
> >> @@ -11,6 +11,7 @@
> >> #include
> >> #include
> >> #include
> >> +#include
> >> #include
> >> #include
> >>
On Thu, Jun 28, 2018 at 2:34 PM, Daniel Borkmann wrote:
> Kees suggested that if set_memory_*() can fail, we should annotate it with
> __must_check, and all callers need to deal with it gracefully given those
> set_memory_*() markings aren't "advisory", but they're expected to actually
> do what
On Thu, Jun 28, 2018 at 11:34:56PM +0200, Daniel Borkmann wrote:
> This set contains three fixes that are mostly JIT and set_memory_*()
> related. The third in the series in particular fixes the syzkaller
> bugs that were still pending; aside from local reproduction & testing,
> also 'syz test'
Hi Willem,
On 06/28/2018 07:27 AM, Willem de Bruijn wrote:
(...)
>
>> struct sock_txtime {
>> clockid_t clockid;/* reference clockid */
>> - u16 flags; /* bit 0: txtime in deadline_mode */
>> + u16 flags; /* bit 0:
Hi Dave,
please apply a few qeth fixes for -net and your 4.17 stable queue.
Patches 1-3 fix several issues wrt to MAC address management that were
introduced during the 4.17 cycle.
Patch 4 tackles a long-standing issue with busy multi-connection workloads
on devices in af_iucv mode.
Patch 5
This reverts commit b7493e91c11a757cf0f8ab26989642ee4bb2c642.
On its own, querying RDEV for a MAC address works fine. But when upgrading
from a qeth that previously queried DDEV on a z/VM NIC (ie. any kernel with
commit ec61bd2fd2a2), the RDEV query now returns a _different_ MAC address
than the
If qeth_qdio_output_handler() detects that a transmit requires async
completion, it replaces the pending buffer's metadata object
(qeth_qdio_out_buffer) so that this queue buffer can be re-used while
the data is pending completion.
Later when the CQ indicates async completion of such a metadata
commit e830baa9c3f0 ("qeth: restore device features after recovery") and
commit ce3443564145 ("s390/qeth: rely on kernel for feature recovery")
made sure that the HW functions for device features get re-programmed
after recovery.
But we missed that the same handling is also required when a card
When qeth_l2_set_mac_address() finds the card in a non-reachable state,
it merely copies the new MAC address into dev->dev_addr so that
__qeth_l2_set_online() can later register it with the HW.
But __qeth_l2_set_online() may very well be running concurrently, so we
can't trust the card state
From: Vasily Gorbik
*ether_addr*_64bits functions have been introduced to optimize
performance critical paths, which access 6-byte ethernet address as u64
value to get "nice" assembly. A harmless hack works nicely on ethernet
addresses shoved into a structure or a larger buffer, until busted by
Today, sunrpc lives in net/sunrpc. As far as I can tell, the primary
production consumer of it is NFS. The RPC clients have the concept of
being tied back to a network namespace. On the other hand, NFS has its
own superblock with its own user namespace.
When sunrpc converts kuids to UIDs to send
On 6/29/2018 6:05 AM, Jiri Pirko wrote:
Fri, Jun 29, 2018 at 02:54:36PM CEST, dsah...@gmail.com wrote:
On 6/29/18 6:48 AM, Jiri Pirko wrote:
Fri, Jun 29, 2018 at 02:12:21PM CEST, j...@mojatatu.com wrote:
On 29/06/18 04:39 AM, Jiri Pirko wrote:
Fri, Jun 29, 2018 at 12:25:53AM CEST,
On Fri, 29 Jun 2018 09:04:15 +0200, Daniel Borkmann wrote:
> On 06/28/2018 06:54 PM, Jakub Kicinski wrote:
> > On Thu, 28 Jun 2018 09:42:06 +0200, Jiri Benc wrote:
> >> On Wed, 27 Jun 2018 11:49:49 +0200, Daniel Borkmann wrote:
> >>> Looks good to me, and yes in BPF case a mask like
The __alx_open function can be called from ndo_open, which is called
under RTNL, or from alx_resume, which isn't. Since commit d768319cd427,
we're calling the netif_set_real_num_{tx,rx}_queues functions, which
need to be called under RTNL.
This is similar to commit 0c2cc02e571a ("igb: Move the
On Fri, 29 Jun 2018 17:34:41 +0200
Andrew Lunn wrote:
>> Wow indeed that will help a lot. Just so that we're in sync, do you
>> plan to add those helpers, or should I take this branch as a base for
>> the conversion and go on ?
>
>I'm still working on it. I can probably push again in the next
> Wow indeed that will help a lot. Just so that we're in sync, do you
> plan to add those helpers, or should I take this branch as a base for
> the conversion and go on ?
I'm still working on it. I can probably push again in the next few
minutes. But they won't be compile tested, i.e. broken...
Fixes: fc7a6c287ff3 ("sfc: use a semaphore to lock farch filters too")
Suggested-by: Joseph Korty
Signed-off-by: Bert Kenward
---
drivers/net/ethernet/sfc/farch.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/sfc/farch.c b/drivers/net/ethernet/sfc/farch.c
index
Hello Andrew,
On Fri, 29 Jun 2018 15:43:43 +0200
Andrew Lunn wrote:
>> Thanks for the suggestion. I see how this can be done with
>> phydrv->supported and phydev->lp_advertising, however I'm not sure how
>> we should deal with the fact that ethernet drivers directly access
>> fields such as
From: Saeed Mahameed
Date: Thu, 28 Jun 2018 14:50:51 -0700
> The following pull request includes updates for mlx5e netdevice driver.
> For more information please see tag log below.
>
> Please pull and let me know if there's any problem.
Pulled, thanks Saeed.
From: Jakub Kicinski
Date: Tue, 26 Jun 2018 21:39:33 -0700
> Simon & Pieter say:
>
> This set adds Geneve Options support to the TC tunnel key action.
> It provides the plumbing required to configure Geneve variable length
> options. The options can be configured in the form CLASS:TYPE:DATA,
>
Create unittests for the tc tunnel_key action.
v2:
For the tests expecting failures, added non-zero exit codes in the
teardowns. This prevents those tests from failing if the act_tunnel_key
module is unloaded.
Signed-off-by: Keara Leibovitz
---
.../tc-testing/tc-tests/actions/tunnel_key.json
From: Ganesh Goudar
Date: Tue, 26 Jun 2018 17:10:25 +0530
> From: Arjun Vynipadath
>
> The present TX workrequest (FW_ETH_TX_PKT_WR) can't be used for
> host->vf communication, since it doesn't loopback the outgoing
> packets to virtual interfaces on the same port. This can be done
> using
From: Ganesh Goudar
Date: Tue, 26 Jun 2018 17:10:50 +0530
> From: Arjun Vynipadath
>
> This is used to change TX workrequests, which helps in
> host->vf communication.
>
> Signed-off-by: Arjun Vynipadath
> Signed-off-by: Casey Leedom
> Signed-off-by: Ganesh Goudar
Applied.
Please disregard - wrong patch.
Keara
Andrew Lunn wrote on 06/29/2018 03:33 PM:
Kernel: 3.16.0-4-amd64 (Debian v8 stock kernel)
This is very old. Please at least try the current version of Debian,
"Stretch".
Andrew
I had the same idea and attached the card to another machine with a recent
kernel:
# uname -a
Linux c5
Create unittests for the tc tunnel_key action.
Signed-off-by: Keara Leibovitz
---
.../tc-testing/tc-tests/actions/tunnel_key.json| 676 +
1 file changed, 676 insertions(+)
create mode 100644
tools/testing/selftests/tc-testing/tc-tests/actions/tunnel_key.json
diff
On 06/29/2018 06:45 AM, Daniel Borkmann wrote:
> On 06/25/2018 05:34 PM, John Fastabend wrote:
> [...]
>> @@ -2302,9 +2347,12 @@ static int sock_hash_ctx_update_elem(struct
>> bpf_sock_ops_kern *skops,
>> goto bucket_err;
>> }
>>
>> -e->hash_link = l_new;
>> -e->htab =
On Fri, Jun 29, 2018 at 6:07 AM Ilpo Järvinen wrote:
>
> If SACK is not enabled and the first cumulative ACK after the RTO
> retransmission covers more than the retransmitted skb, a spurious
> FRTO undo will trigger (assuming FRTO is enabled for that RTO).
> The reason is that any
Calling skb_unclone() is expensive as it triggers a memcpy operation.
Instead of calling skb_unclone() unconditionally, call it only when skb
has a shared frag_list. This improves TLS rx throughput significantly.
Signed-off-by: Vakul Garg
Suggested-by: Boris Pismenny
---
On Fri, Jun 29, 2018 at 6:46 AM Linus Torvalds
wrote:
>
> Hmm. I'll have to think about it.
I took yours over the revert. It's just smaller and nicer, and the
conditional should predict fine for any case where it might matter.
Linus
On Fri, Jun 29, 2018 at 6:37 AM Christoph Hellwig wrote:
>
> The big aio poll revert broke various network protocols that don't
> implement ->poll as a patch in the aio poll series removed sock_no_poll
> and made the common code handle this case.
Hmm. I just did the revert of commit 984652dd8b1f
On 06/25/2018 05:34 PM, John Fastabend wrote:
[...]
> @@ -2302,9 +2347,12 @@ static int sock_hash_ctx_update_elem(struct
> bpf_sock_ops_kern *skops,
> goto bucket_err;
> }
>
> - e->hash_link = l_new;
> - e->htab = container_of(map, struct bpf_htab, map);
> +
> Thanks for the suggestion. I see how this can be done with
> phydrv->supported and phydev->lp_advertising, however I'm not sure how
> we should deal with the fact that ethernet drivers directly access
> fields such as "advertising" or "supported".
Hi Maxime
I started cleaning up some of the
On Fri, Jun 29, 2018 at 6:29 AM Christoph Hellwig wrote:
> No need for poll_table_entry, we just need a wait_queue_head.
> poll_table_entry is an select.c internal (except for two nasty driver) -
> neither epoll nor most in-kernel callers use it.
Well, you need the poll_table for the
From: Avi Kivity
io_pgetevents() will not change the signal mask. Mark it const
to make it clear and to reduce the need for casts in user code.
Reviewed-by: Christoph Hellwig
Signed-off-by: Avi Kivity
Signed-off-by: Al Viro
[hch: reapply the patch that got incorrectly reverted]
The big aio poll revert broke various network protocols that don't
implement ->poll as a patch in the aio poll series removed sock_no_poll
and made the common code handle this case.
Fixes: a11e1d43 ("Revert changes to convert to ->poll_mask() and aio
IOCB_CMD_POLL")
Signed-off-by: Christoph
Two fixup for incorrectly reverted bits.
From: David Ahern
Date: Fri, 29 Jun 2018 06:54:36 -0600
> The resolution of the syntax affects the uapi changes proposed. You are
> wanting to add new RTM commands which suggests new objects. If a
> template is an attribute of an existing object then the netlink API
> should indicate that.
+1
> Kernel: 3.16.0-4-amd64 (Debian v8 stock kernel)
This is very old. Please at least try the current version of Debian,
"Stretch".
Andrew
On Thu, Jun 28, 2018 at 10:30:27PM +0100, Al Viro wrote:
> > Because I think that what it can do is simply to do the ->poll() calls
> > outside the iocb locks, and then just attach the poll table to the
> > kioctx afterwards.
>
> I'd do a bit more - embed the first poll_table_entry into poll iocb
On Thu, Jun 28, 2018 at 02:11:17PM -0700, Linus Torvalds wrote:
> Christoph, do you have a test program for IOCB_CMD_POLL and what it's
> actually supposed to do?
https://pagure.io/libaio/c/9c6935e81854d1585bbfa48c35b185849d746864?branch=aio-poll
is the actual test in libaio. In addition to
Hello Andrew,
On Tue, 19 Jun 2018 17:21:55 +0200
Andrew Lunn wrote:
>> What I propose is that we add 3 link_mode fields in phy_device, and keep
>> the legacy fields for now. It would be up to the driver to fill the new
>> "supported" field in config_init, kind of like what's done in the
>>
From: Xin Long
Date: Thu, 28 Jun 2018 15:31:00 +0800
> This feature is actually already supported by sk->sk_reuse which can be
> set by socket level opt SO_REUSEADDR. But it's not working exactly as
> RFC6458 demands in section 8.1.27, like:
>
> - This option only supports one-to-one style
Decrement the number of elements in the map in case the allocation
of a new node fails.
Signed-off-by: Mauricio Vasquez B
---
kernel/bpf/hashtab.c | 16 +++-
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index