RE: [PATCH] ethtool: add one ethtool option to set relax ordering mode
> -Original Message- > From: Jeff Kirsher [mailto:jeffrey.t.kirs...@intel.com] > Sent: Friday, December 23, 2016 9:07 AM > To: maowenan; Alexander Duyck > Cc: Stephen Hemminger; netdev@vger.kernel.org; weiyongjun (A); > Dingtianhong > Subject: Re: [PATCH] ethtool: add one ethtool option to set relax ordering > mode > > On Fri, 2016-12-23 at 00:40 +, maowenan wrote: > > > -Original Message- > > > From: Alexander Duyck [mailto:alexander.du...@gmail.com] > > > Sent: Thursday, December 22, 2016 11:54 PM > > > To: maowenan > > > Cc: Stephen Hemminger; netdev@vger.kernel.org; jeffrey.t.kirsher@intel. > > > com; > > > weiyongjun (A); Dingtianhong > > > Subject: Re: [PATCH] ethtool: add one ethtool option to set relax > > > ordering mode > > > > > > On Wed, Dec 21, 2016 at 5:39 PM, maowenan > > > wrote: > > > > > > > > > > > > > -Original Message- > > > > > From: Stephen Hemminger [mailto:step...@networkplumber.org] > > > > > Sent: Thursday, December 22, 2016 9:28 AM > > > > > To: maowenan > > > > > Cc: netdev@vger.kernel.org; jeffrey.t.kirs...@intel.com > > > > > Subject: Re: [PATCH] ethtool: add one ethtool option to set > > > > > relax ordering mode > > > > > > > > > > On Thu, 8 Dec 2016 14:51:38 +0800 Mao Wenan > > > > > wrote: > > > > > > > > > > > This patch provides one way to set/unset IXGBE NIC TX and RX > > > > > > relax ordering mode, which can be set by ethtool. > > > > > > Relax ordering is one mode of 82599 NIC, to enable this mode > > > > > > can enhance the performance for some cpu architecure. > > > > > > > > > > Then it should be done by CPU architecture specific quirks > > > > > (preferably in PCI > > > > > layer) so that all users get the option without having to do > > > > > manual > > > > > > intervention. > > > > > > > > > > > example: > > > > > > ethtool -s enp1s0f0 relaxorder off ethtool -s enp1s0f0 > > > > > > relaxorder on > > > > > > > > > > Doing it via ethtool is a developer API (for testing) not > > > > > something that makes sense in production. > > > > > > > > > > > > This feature is not mandatory for all users, acturally relax > > > > ordering default configuration of 82599 is 'disable', So this > > > > patch gives one way to > > > > > > enable relax ordering to be selected in some performance condition. > > > > > > That isn't quite true. The default for Sparc systems is to have it > > > enabled. > > > > > > Really this is something that is platform specific. I agree with > > > Stephen that it would work better if this was handled as a series of > > > platform specific quirks handled at something like the PCI layer > > > rather than be a switch the user can toggle on and off. > > > > > > With that being said there are changes being made that should help > > > to improve the situation. Specifically I am looking at adding > > > support for the DMA_ATTR_WEAK_ORDERING which may also allow us to > > > identify cases where you might be able to specify the DMA behavior > > > via the DMA mapping instead of having to make the final decision in > > > the device itself. > > > > > > - Alex > > > > Yes, Sparc is a special case. From the NIC driver point of view, It is > > no need for some ARCHs to do particular operation and compiling > > branch, ethtool is a flexible method for user to make decision whether > > on|off this feature. > > I think Jeff as maintainer of 82599 has some comments about this. > > My original comment/objection was that you attempted to do this change as a > module parameter to the ixgbe driver, where I directed you to use ethtool so > that other drivers could benefit from the ability to enable/disable relaxed > ordering. As far as how it gets implemented in ethtool or PCI layer, makes > little difference to me, I only had issues with the driver specific module > parameter implementation, which is not acceptable. Thank you Jeff and Alex. And then I have gone through mail thread about "i40e: enable PCIe relax ordering for SPARC", It only works for SPARC, any other ARCH who wants to enable DMA_ATTR_WEAK_ORDERING feature, should define the new macro, recompile the driver module. Because of the above reasons, we implement in ethtool to give the final user a convenient way to on|off special feature, no need define new macro, easy to extend the new features, and also good benefit for other driver as Jeff referred.
Re: [PATCH v2] stmmac: CSR clock configuration fix
G'day Joao, On 23/12/2016 01:06, Joao Pinto wrote: Às 4:57 PM de 12/22/2016, Phil Reid escreveu: On 22/12/2016 23:47, Joao Pinto wrote: Hello Phil, Às 3:42 PM de 12/22/2016, Phil Reid escreveu: G'day Joao, On 22/12/2016 20:38, Joao Pinto wrote: When testing stmmac with my QoS reference design I checked a problem in the CSR clock configuration that was impossibilitating the phy discovery, since every read operation returned 0x. This patch fixes the issue. Signed-off-by: Joao Pinto --- changes v1->v2 (David Miller) - DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch to make sense drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +- drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c | 2 +- drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c| 8 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c index b21d03f..94223c8 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c @@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr, int mcbins, mac->mii.reg_shift = 6; mac->mii.reg_mask = 0x07C0; mac->mii.clk_csr_shift = 2; -mac->mii.clk_csr_mask = 0xF; +mac->mii.clk_csr_mask = GENMASK(4, 2); Should this not be GENMASK(5,2) According to Universal MAC databook (valid for MAC100 and MAC1000), we have: Bits: 4:2 000 60-100 MHz clk_csr_i/42 001 100-150 MHz clk_csr_i/62 010 20-35 MHz clk_csr_i/16 011 35-60 MHz clk_csr_i/26 100 150-250 MHz clk_csr_i/102 101 250-300 MHz clk_csr_i/124 110, 111 Reserved So only bits 2, 3 and 4 should be masked. Thanks. But this is a change in behaviour from what was there isn't. Previous mask was 4 bits. now it's 3. And for example the altera socfgpa implementation in the cyclone V has valid values for values 0x8-0xf, using bit 5:2. According to the databook, bit 5 is reserved and RO. When reserved, it is possible to customize. Is that the case? If there is hardware using the 5th bit we can put it GENMASK(5, 2) with no problem. I've also checked the Aria 10 documentation and bit 5 is also RW. The following options are documented and supported 1000: CSR clock/4 1001: CSR clock/6 1010: CSR clock/8 1011: CSR clock/10 1100: CSR clock/12 1101: CSR clock/14 1110: CSR clock/16 : CSR clock/18 They do mention that these values will probably be outside the IEEE 802.3 specified range. But there's at least a couple of cores out there where GENMASK(5,2) is valid. Can't say if anyone is using that setting thou. -- Regards Phil Reid
Re: [PATCH] ethtool: add one ethtool option to set relax ordering mode
On Fri, 2016-12-23 at 00:40 +, maowenan wrote: > > -Original Message- > > From: Alexander Duyck [mailto:alexander.du...@gmail.com] > > Sent: Thursday, December 22, 2016 11:54 PM > > To: maowenan > > Cc: Stephen Hemminger; netdev@vger.kernel.org; jeffrey.t.kirsher@intel. > > com; > > weiyongjun (A); Dingtianhong > > Subject: Re: [PATCH] ethtool: add one ethtool option to set relax > > ordering mode > > > > On Wed, Dec 21, 2016 at 5:39 PM, maowenan > > wrote: > > > > > > > > > > -Original Message- > > > > From: Stephen Hemminger [mailto:step...@networkplumber.org] > > > > Sent: Thursday, December 22, 2016 9:28 AM > > > > To: maowenan > > > > Cc: netdev@vger.kernel.org; jeffrey.t.kirs...@intel.com > > > > Subject: Re: [PATCH] ethtool: add one ethtool option to set relax > > > > ordering mode > > > > > > > > On Thu, 8 Dec 2016 14:51:38 +0800 > > > > Mao Wenan wrote: > > > > > > > > > This patch provides one way to set/unset IXGBE NIC TX and RX > > > > > relax > > > > > ordering mode, which can be set by ethtool. > > > > > Relax ordering is one mode of 82599 NIC, to enable this mode can > > > > > enhance the performance for some cpu architecure. > > > > > > > > Then it should be done by CPU architecture specific quirks > > > > (preferably in PCI > > > > layer) so that all users get the option without having to do manual > > > > intervention. > > > > > > > > > example: > > > > > ethtool -s enp1s0f0 relaxorder off > > > > > ethtool -s enp1s0f0 relaxorder on > > > > > > > > Doing it via ethtool is a developer API (for testing) not something > > > > that makes sense in production. > > > > > > > > > This feature is not mandatory for all users, acturally relax ordering > > > default configuration of 82599 is 'disable', So this patch gives one > > > way to > > > > enable relax ordering to be selected in some performance condition. > > > > That isn't quite true. The default for Sparc systems is to have it > > enabled. > > > > Really this is something that is platform specific. I agree with > > Stephen that it > > would work better if this was handled as a series of platform specific > > quirks > > handled at something like the PCI layer rather than be a switch the > > user can > > toggle on and off. > > > > With that being said there are changes being made that should help to > > improve > > the situation. Specifically I am looking at adding support for the > > DMA_ATTR_WEAK_ORDERING which may also allow us to identify cases where > > you might be able to specify the DMA behavior via the DMA mapping > > instead of > > having to make the final decision in the device itself. > > > > - Alex > > Yes, Sparc is a special case. From the NIC driver point of view, It is no > need for > some ARCHs to do particular operation and compiling branch, ethtool is a > flexible > method for user to make decision whether on|off this feature. > I think Jeff as maintainer of 82599 has some comments about this. My original comment/objection was that you attempted to do this change as a module parameter to the ixgbe driver, where I directed you to use ethtool so that other drivers could benefit from the ability to enable/disable relaxed ordering. As far as how it gets implemented in ethtool or PCI layer, makes little difference to me, I only had issues with the driver specific module parameter implementation, which is not acceptable. signature.asc Description: This is a digitally signed message part
[PATCH] brcmfmac: fix spelling mistakes on "Ivalid"
From: Colin Ian King Trivial fixes to spelling mistake "Ivalid" to "Invalid" in brcmf_err error messages. Signed-off-by: Colin Ian King --- drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c index 7ffc4ab..15eaf72 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c @@ -3971,7 +3971,7 @@ brcmf_configure_wpaie(struct brcmf_if *ifp, pval |= AES_ENABLED; break; default: - brcmf_err("Ivalid unicast security info\n"); + brcmf_err("Invalid unicast security info\n"); } offset++; } @@ -4015,7 +4015,7 @@ brcmf_configure_wpaie(struct brcmf_if *ifp, wpa_auth |= WPA2_AUTH_1X_SHA256; break; default: - brcmf_err("Ivalid key mgmt info\n"); + brcmf_err("Invalid key mgmt info\n"); } offset++; } -- 2.10.2
RE: [PATCH] ethtool: add one ethtool option to set relax ordering mode
> -Original Message- > From: Alexander Duyck [mailto:alexander.du...@gmail.com] > Sent: Thursday, December 22, 2016 11:54 PM > To: maowenan > Cc: Stephen Hemminger; netdev@vger.kernel.org; jeffrey.t.kirs...@intel.com; > weiyongjun (A); Dingtianhong > Subject: Re: [PATCH] ethtool: add one ethtool option to set relax ordering > mode > > On Wed, Dec 21, 2016 at 5:39 PM, maowenan > wrote: > > > > > >> -Original Message- > >> From: Stephen Hemminger [mailto:step...@networkplumber.org] > >> Sent: Thursday, December 22, 2016 9:28 AM > >> To: maowenan > >> Cc: netdev@vger.kernel.org; jeffrey.t.kirs...@intel.com > >> Subject: Re: [PATCH] ethtool: add one ethtool option to set relax > >> ordering mode > >> > >> On Thu, 8 Dec 2016 14:51:38 +0800 > >> Mao Wenan wrote: > >> > >> > This patch provides one way to set/unset IXGBE NIC TX and RX relax > >> > ordering mode, which can be set by ethtool. > >> > Relax ordering is one mode of 82599 NIC, to enable this mode can > >> > enhance the performance for some cpu architecure. > >> > >> Then it should be done by CPU architecture specific quirks > >> (preferably in PCI > >> layer) so that all users get the option without having to do manual > intervention. > >> > >> > example: > >> > ethtool -s enp1s0f0 relaxorder off > >> > ethtool -s enp1s0f0 relaxorder on > >> > >> Doing it via ethtool is a developer API (for testing) not something > >> that makes sense in production. > > > > > > This feature is not mandatory for all users, acturally relax ordering > > default configuration of 82599 is 'disable', So this patch gives one way to > enable relax ordering to be selected in some performance condition. > > That isn't quite true. The default for Sparc systems is to have it enabled. > > Really this is something that is platform specific. I agree with Stephen > that it > would work better if this was handled as a series of platform specific quirks > handled at something like the PCI layer rather than be a switch the user can > toggle on and off. > > With that being said there are changes being made that should help to improve > the situation. Specifically I am looking at adding support for the > DMA_ATTR_WEAK_ORDERING which may also allow us to identify cases where > you might be able to specify the DMA behavior via the DMA mapping instead of > having to make the final decision in the device itself. > > - Alex Yes, Sparc is a special case. From the NIC driver point of view, It is no need for some ARCHs to do particular operation and compiling branch, ethtool is a flexible method for user to make decision whether on|off this feature. I think Jeff as maintainer of 82599 has some comments about this. -Wenan
Re: [PATCH net] net, sched: fix soft lockup in tc_classify
On 12/22/2016 08:05 PM, Cong Wang wrote: On Wed, Dec 21, 2016 at 1:07 PM, Daniel Borkmann wrote: Ok, you mean for net. In that case I prefer the smaller sized fix to be honest. It also covers everything from the point where we fetch the chain via cops->tcf_chain() to the end of the function, which is where most of the complexity resides, and only the two mentioned commits do the relock, I really wish the problem is only about relocking, but look at the code, the deeper reason why we have this bug is the complexity of the logic inside tc_ctl_tfilter(): 1) the replay logic is hard, we have to make it idempotent; 2) the request logic itself is hard, because of tc filter design and implementation. This is why I worry more than just relocking. But do you have a concrete 2nd issue/bug you're seeing? It rather sounds to me your argument is more about fear of complexity on tc framework itself. I agree it's complex, and tc_ctl_tfilter() is quite big in itself, where it would be good to reduce it's complexity into smaller pieces. But it's not really related to the fix itself, reducing complexity requires significantly more and deeper work on the code. We can rework tc_ctl_tfilter() in net-next to try to simplify it, sure, but I don't get why we have to discuss so much on this matter in this context, really. so as a fix I think it's fine as-is. As mentioned, if there's need to refactor tc_ctl_tfilter() net-next would be better, imho. Refactor is a too strong word, just moving the replay out is not a refactor. The end result will be still smaller than your commit d936377414fadbafb4, which is already backported to stable. Commit d936377414fa is a whole different thing, and not related to the topic at all. The two-line changes with the initialization fix is quite straight forward and fixes a bug in the logic. It's also less complex in terms of lines but also in terms of complexity than to refactor or move the replay out. Moving it out contextually means that you also wouldn't rule out that things like nlmsg_parse(), or in-fact *any* of the __tc_ctl_tfilter() return paths could potentially return an error that suddenly would require a replay of the request. So, while moving it out fixes the bug, too, it also adds more potential points where you would go and replay the request where you didn't do so before and therefore also adds more complexity to the code that needs review/audit when fixing bugs since you don't have these hard/direct return paths anymore. Therefore I don't think it's better to go that way for the fix. Thanks, Daniel
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
Hannes Frederic Sowa wrote: > A lockdep test should still be done. ;) Adding might_lock() annotations will improve coverage a lot. > Yes, that does look nice indeed. Accounting for bits instead of bytes > shouldn't be a huge problem either. Maybe it gets a bit more verbose in > case you can't satisfy a request with one batched entropy block and have > to consume randomness from two. The bit granularity is also for the callers' convenience, so they don't have to mask again. Whether get_random_bits rounds up to byte boundaries internally or not is something else. When the current batch runs low, I was actually thinking of throwing away the remaining bits and computing a new batch of 512. But it's whatever works best at implementation time. >>> It could only mix the output back in every two calls, in which case >>> you can backtrack up to one call but you need to do 2^128 work to >>> backtrack farther. But yes, this is getting excessively complicated. > >> No, if you're willing to accept limited backtrack, this is a perfectly >> acceptable solution, and not too complicated. You could do it phase-less >> if you like; store the previous output, then after generating the new >> one, mix in both. Then overwrite the previous output. (But doing two >> rounds of a crypto primtive to avoid one conditional jump is stupid, >> so forget that.) > Can you quickly explain why we lose the backtracking capability? Sure. An RNG is (state[i], output[i]) = f(state[i-1]). The goal of backtracking is to compute output[i], or better yet state[i-1], given state[i]. For example, consider an OFB or CTR mode generator. The state is a key and and IV, and you encrypt the IV with the key to produce output, then either replace the IV with the output, or increment it. Either way, since you still have the key, you can invert the transformation and recover the previous IV. The standard way around this is to use the Davies-Meyer construction: IV[i] = IV[i-1] + E(IV[i-1], key) This is the standard way to make a non-invertible random function out of an invertible random permutation. >From the sum, there's no easy way to find the ciphertext *or* the plaintext that was encrypted. Assuming the encryption is secure, the only way to reverse it is brute force: guess IV[i-1] and run the operation forward to see if the resultant IV[i] matches. There are a variety of ways to organize this computation, since the guess gives toy both IV[i-1] and E(IV[i-1], key) = IV[i] - IV[i-1], including running E forward, backward, or starting from both ends to see if you meet in the middle. The way you add the encryption output to the IV is not very important. It can be addition, xor, or some more complex invertible transformation. In the case of SipHash, the "encryption" output is smaller than the input, so we have to get a bit more creative, but it's still basically the same thing. The problem is that the output which is combined with the IV is too small. With only 64 bits, trying all possible values is practical. (The world's Bitcoin miners are collectively computing SHA-256(SHA-256(input)) 1.7 * 2^64 times per second.) By basically doing two iterations at once and mixing in 128 bits of output, the guessing attack is rendered impractical. The only downside is that you need to remember and store one result between when it's computed and last used. This is part of the state, so an attack can find output[i-1], but not anything farther back. > ChaCha as a block cipher gives a "perfect" permutation from the output > of either the CRNG or the CPRNG, which actually itself has backtracking > protection. I'm not quite understanding. The /dev/random implementation uses some of the ChaCha output as a new ChaCha key (that's another way to mix output back into the state) to prevent backtracking. But this slows it down, and again if you want to be efficient, you're generating and storing large batches of entropy and storing it in the RNG state.
Re: [PATCH net] net, sched: fix soft lockup in tc_classify
On 12/22/2016 06:50 PM, John Fastabend wrote: On 16-12-22 08:53 AM, David Miller wrote: From: Daniel Borkmann Date: Wed, 21 Dec 2016 22:07:48 +0100 Ok, you mean for net. In that case I prefer the smaller sized fix to be honest. It also covers everything from the point where we fetch the chain via cops->tcf_chain() to the end of the function, which is where most of the complexity resides, and only the two mentioned commits do the relock, so as a fix I think it's fine as-is. As mentioned, if there's need to refactor tc_ctl_tfilter() net-next would be better, imho. Please, can you two work towards an agreement as to what fix should go in at this time? Thanks. Thanks for fixing this! I have a slight preference to push this patch into net as its been tested already by Shahar and is not particularly invasive. Then for net-next do a larger cleanup to get rid of some of the complexity per Cong's suggestion. My preference as well, thanks.
Re: [PATCH net] net, sched: fix soft lockup in tc_classify
On 12/22/2016 02:16 PM, Shahar Klein wrote: On 12/21/2016 7:04 PM, Daniel Borkmann wrote: Shahar reported a soft lockup in tc_classify(), where we run into an endless loop when walking the classifier chain due to tp->next == tp which is a state we should never run into. The issue only seems to trigger under load in the tc control path. What happens is that in tc_ctl_tfilter(), thread A allocates a new tp, initializes it, sets tp_created to 1, and calls into tp->ops->change() with it. In that classifier callback we had to unlock/lock the rtnl mutex and returned with -EAGAIN. One reason why we need to drop there is, for example, that we need to request an action module to be loaded. This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning after we loaded and found the requested action, we need to redo the whole request so we don't race against others. While we had to unlock rtnl in that time, thread B's request was processed next on that CPU. Thread B added a new tp instance successfully to the classifier chain. When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN and destroying its tp instance which never got linked, we goto replay and redo A's request. This time when walking the classifier chain in tc_ctl_tfilter() for checking for existing tp instances we had a priority match and found the tp instance that was created and linked by thread B. Now calling again into tp->ops->change() with that tp was successful and returned without error. tp_created was never cleared in the second round, thus kernel thinks that we need to link it into the classifier chain (once again). tp and *back point to the same object due to the match we had earlier on. Thus for thread B's already public tp, we reset tp->next to tp itself and link it into the chain, which eventually causes the mentioned endless loop in tc_classify() once a packet hits the data path. Fix is to clear tp_created at the beginning of each request, also when we replay it. On the paths that can cause -EAGAIN we already destroy the original tp instance we had and on replay we really need to start from scratch. It seems that this issue was first introduced in commit 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup"). Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup") Reported-by: Shahar Klein Signed-off-by: Daniel Borkmann Cc: Cong Wang [...] I've tested this specific patch (without cong's patch and without the many prints) many times and in many test permutations today and it didn't lock Thanks again Daniel! Just catching up with email (traveling whole day), thanks a bunch for your effort, Shahar! In that case lets then add: Tested-by: Shahar Klein
[PATCH net] inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets
From: Willem de Bruijn Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within the packet. For sockets that have transport headers pulled, transport offset can be negative. Use signed comparison to avoid overflow. Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") Reported-by: Nisar Jagabar Signed-off-by: Willem de Bruijn --- net/ipv4/ip_sockglue.c | 2 +- net/ipv6/datagram.c| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index 8b13881..9760734 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -148,7 +148,7 @@ static void ip_cmsg_recv_dstaddr(struct msghdr *msg, struct sk_buff *skb) const struct iphdr *iph = ip_hdr(skb); __be16 *ports = (__be16 *)skb_transport_header(skb); - if (skb_transport_offset(skb) + 4 > skb->len) + if (skb_transport_offset(skb) + 4 > (int)skb->len) return; /* All current transport protocols have the port numbers in the diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c index 0489e19..1407426 100644 --- a/net/ipv6/datagram.c +++ b/net/ipv6/datagram.c @@ -701,7 +701,7 @@ void ip6_datagram_recv_specific_ctl(struct sock *sk, struct msghdr *msg, struct sockaddr_in6 sin6; __be16 *ports = (__be16 *) skb_transport_header(skb); - if (skb_transport_offset(skb) + 4 <= skb->len) { + if (skb_transport_offset(skb) + 4 <= (int)skb->len) { /* All current transport protocols have the port numbers in the * first four bytes of the transport header and this function is * written with this assumption in mind. -- 2.8.0.rc3.226.g39d4020
Re: [PATCH] Fixed status entry in m_can documentation
On Thu, Dec 22, 2016 at 11:45:21AM +0100, Vyacheslav V. Yurkov wrote: > Use valid value for 'enabled' in status field > > Signed-off-by: Vyacheslav V. Yurkov > --- > Documentation/devicetree/bindings/net/can/m_can.txt | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/Documentation/devicetree/bindings/net/can/m_can.txt > b/Documentation/devicetree/bindings/net/can/m_can.txt > index 9e33177..5facaf5 100644 > --- a/Documentation/devicetree/bindings/net/can/m_can.txt > +++ b/Documentation/devicetree/bindings/net/can/m_can.txt > @@ -63,5 +63,5 @@ Board dts: > &m_can1 { > pinctrl-names = "default"; > pinctrl-0 = <&pinctrl_m_can1>; > - status = "enabled"; > + status = "okay"; Examples don't need to have status prop. Just remove. Rob
Re: [PATCH v3 2/3] NFC: trf7970a: Add device tree option of 1.8 Volt IO voltage
On Wed, Dec 21, 2016 at 11:18:33PM -0500, Geoff Lansberry wrote: > The TRF7970A has configuration options for supporting hardware designs > with 1.8 Volt or 3.3 Volt IO. This commit adds a device tree option, > using a fixed regulator binding, for setting the io voltage to match > the hardware configuration. If no option is supplied it defaults to > 3.3 volt configuration. > > Signed-off-by: Geoff Lansberry > --- > .../devicetree/bindings/net/nfc/trf7970a.txt | 2 ++ Acked-by: Rob Herring > drivers/nfc/trf7970a.c | 26 > +- > 2 files changed, 27 insertions(+), 1 deletion(-)
Re: [PATCH v3 1/3] NFC: trf7970a: add device tree option for 27MHz clock
On Wed, Dec 21, 2016 at 11:18:32PM -0500, Geoff Lansberry wrote: > The TRF7970A has configuration options to support hardware designs > which use a 27.12MHz clock. This commit adds a device tree option > 'clock-frequency' to support configuring the this chip for default > 13.56MHz clock or the optional 27.12MHz clock. > > Signed-off-by: Geoff Lansberry > --- > .../devicetree/bindings/net/nfc/trf7970a.txt | 2 + Acked-by: Rob Herring > drivers/nfc/trf7970a.c | 50 > +- > 2 files changed, 41 insertions(+), 11 deletions(-)
Re: [PATCH 6/6 net-next] inet: reset tb->fastreuseport when adding a reuseport sk
On Thu, 2016-12-22 at 16:26 -0500, Josef Bacik wrote: > If we have non reuseport sockets on a tb we will set tb->fastreuseport to 0 > and > never set it again. Which means that in the future if we end up adding a > bunch > of reuseport sk's to that tb we'll have to do the expensive scan every time. > Instead add a sock_common to the tb so we know what reuseport sk succeeded > last. > Once one sk has made it onto the list we know that there are no potential bind > conflicts on the owners list that match that sk's rcv_addr. So copy the sk's > common into our tb->fastsock and set tb->fastruseport to FASTREUSESOCK_STRICT > so > we know we have to do an extra check for subsequent reuseport sockets and skip > the expensive bind conflict check. > > Signed-off-by: Josef Bacik > --- > include/net/inet_hashtables.h | 4 > net/ipv4/inet_connection_sock.c | 53 > + > 2 files changed, 53 insertions(+), 4 deletions(-) > > diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h > index 756ed16..4ccc18f 100644 > --- a/include/net/inet_hashtables.h > +++ b/include/net/inet_hashtables.h > @@ -74,12 +74,16 @@ struct inet_ehash_bucket { > * users logged onto your box, isn't it nice to know that new data > * ports are created in O(1) time? I thought so. ;-)-DaveM > */ > +#define FASTREUSEPORT_ANY1 > +#define FASTREUSEPORT_STRICT 2 > + > struct inet_bind_bucket { > possible_net_t ib_net; > unsigned short port; > signed char fastreuse; > signed char fastreuseport; > kuid_t fastuid; > + struct sock_common fastsock; > int num_owners; > struct hlist_node node; > struct hlist_head owners; Please place this fat field at the end of inet_bind_bucket Many sockets do not use SO_REUSEPORT and should not use this field, while tb->owners need to be touched.
Re: [PATCH 3/6 net-next] inet: kill smallest_size and smallest_port
On Thu, 2016-12-22 at 16:26 -0500, Josef Bacik wrote: > In inet_csk_get_port we seem to be using smallest_port to figure out where the > best place to look for a SO_REUSEPORT sk that matches with an existing set of > SO_REUSEPORT's. However if we get to the logic > > if (smallest_size != -1) { > port = smallest_port; > goto have_port; > } > > we will do a useless search, because we would have already done the > inet_csk_bind_conflict for that port and it would have returned 1, otherwise > we > would have gone to found_tb and succeeded. Since this logic makes us do yet > another trip through inet_csk_bind_conflict for a port we know won't work just > delete this code and save us the time. > > Signed-off-by: Josef Bacik Please remove tb->need_owners ;)
Re: [PATCH v4 2/3] Bluetooth: btusb: Add out-of-band wakeup support
On Wed, Dec 21, 2016 at 12:33:51PM -0800, Rajat Jain wrote: > Some onboard BT chips (e.g. Marvell 8997) contain a wakeup pin that > can be connected to a gpio on the CPU side, and can be used to wakeup > the host out-of-band. This can be useful in situations where the > in-band wakeup is not possible or not preferable (e.g. the in-band > wakeup may require the USB host controller to remain active, and > hence consuming more system power during system sleep). > > The oob gpio interrupt to be used for wakeup on the CPU side, is > read from the device tree node, (using standard interrupt descriptors). > A devcie tree binding document is also added for the driver. The > compatible string is in compliance with > Documentation/devicetree/bindings/usb/usb-device.txt > > Signed-off-by: Rajat Jain > Reviewed-by: Brian Norris > --- > v4: Move the set_bit(BTUSB_OOB_WAKE_DISABLED,..) call to the beginning of > btusb_config_oob_wake() - caught by Brian. > v3: Add Brian's "Reviewed-by" > v2: * Use interrupt-names ("wakeup") instead of assuming first interrupt. > * Leave it on device tree to specify IRQ flags (level /edge triggered) > * Mark the device as non wakeable on exit. > > Documentation/devicetree/bindings/net/btusb.txt | 40 Acked-by: Rob Herring > drivers/bluetooth/btusb.c | 84 > + > 2 files changed, 124 insertions(+) > create mode 100644 Documentation/devicetree/bindings/net/btusb.txt
Re: [PATCH net-next 02/10] net: netcp: ethss: add support of 10gbe pcsr link status
On Tue, Dec 20, 2016 at 05:09:45PM -0500, Murali Karicheri wrote: > From: WingMan Kwok > > The 10GBASE-R Physical Coding Sublayer (PCS-R) module provides > functionality of a physical coding sublayer (PCS) on data being > transferred between a demuxed XGMII and SerDes supporting a 16 > or 32 bit interface. From the driver point of view, whether > a ethernet link is up or not depends also on the status of the > block-lock bit of the PCSR. This patch adds the checking of that > bit in order to determine the link status. I would think this would be a common thing and the phy driver should provide the status, rather than trying to give the ethernet driver direct access to the phy registers. Is the PCSR the serdes phy or registers in addition to that? Rob
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
On 22.12.2016 22:11, George Spelvin wrote: >> I do tend to like Ted's version in which we use batched >> get_random_bytes() output. If it's fast enough, it's simpler and lets >> us get the full strength of a CSPRNG. > > With the ChaCha20 generator, that's fine, although note that this abandons > anti-backtracking entirely. > > It also takes locks, something the previous get_random_int code > path avoided. Do we need to audit the call sites to ensure that's safe? We have spin_lock_irq* locks on the way. Of course they can hurt in when contended. The situation should be the same as with get_random_bytes, callable from every possible situation in the kernel without fear, so this should also work for get_random_int. A lockdep test should still be done. ;) > And there is the issue that the existing callers assume that there's a > fixed cost per word. A good half of get_random_long calls are followed by > "& ~PAGE_MASK" to extract the low 12 bits. Or "& ((1ul << mmap_rnd_bits) > - 1)" to extract the low 28. If we have a buffer we're going to have to > pay to refill, it would be nice to use less than 8 bytes to satisfy those. > > But that can be a followup patch. I'm thinking > > unsigned long get_random_bits(unsigned bits) > E.g. get_random_bits(PAGE_SHIFT), >get_random_bits(mmap_rnd_bits), > u32 imm_rnd = get_random_bits(32) > > unsigned get_random_mod(unsigned modulus) > E.g. get_random_mod(hole) & ~(alignment - 1); >get_random_mod(port_scan_backoff) > (Althogh probably drivers/s390/scsi/zfcp_fc.c should be changed > to prandom.) > > with, until the audit is completed: > #define get_random_int() get_random_bits(32) > #define get_random_long() get_random_bits(BITS_PER_LONG) Yes, that does look nice indeed. Accounting for bits instead of bytes shouldn't be a huge problem either. Maybe it gets a bit more verbose in case you can't satisfy a request with one batched entropy block and have to consume randomness from two. >> It could only mix the output back in every two calls, in which case >> you can backtrack up to one call but you need to do 2^128 work to >> backtrack farther. But yes, this is getting excessively complicated. > > No, if you're willing to accept limited backtrack, this is a perfectly > acceptable solution, and not too complicated. You could do it phase-less > if you like; store the previous output, then after generating the new > one, mix in both. Then overwrite the previous output. (But doing two > rounds of a crypto primtive to avoid one conditional jump is stupid, > so forget that.) Can you quickly explain why we lose the backtracking capability? ChaCha as a block cipher gives a "perfect" permutation from the output of either the CRNG or the CPRNG, which actually itself has backtracking protection. Thanks for explaining, Hannes
[PATCH 4/6 net-next] inet: don't check for bind conflicts twice when searching for a port
This is just wasted time, we've already found a tb that doesn't have a bind conflict, and we don't drop the head lock so scanning again isn't going to give us a different answer. Instead move the tb->reuse setting logic outside of the found_tb path and put it in the success: path. Then make it so that we don't goto again if we find a bind conflict in the found_tb path as we won't reach this anymore when we are scanning for an ephemeral port. Signed-off-by: Josef Bacik --- net/ipv4/inet_connection_sock.c | 33 + 1 file changed, 13 insertions(+), 20 deletions(-) diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index d352366..f6a34bc 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -164,7 +164,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) { bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN; struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo; - int ret = 1, attempts = 5, port = snum; + int ret = 1, port = snum; struct inet_bind_hashbucket *head; struct net *net = sock_net(sk); int i, low, high, attempt_half; @@ -183,7 +183,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) goto tb_not_found; } -again: attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0; other_half_scan: inet_get_local_port_range(net, &low, &high); @@ -220,8 +219,10 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) spin_lock_bh(&head->lock); inet_bind_bucket_for_each(tb, &head->chain) if (net_eq(ib_net(tb), net) && tb->port == port) { + if (hlist_empty(&tb->owners)) + goto success; if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok)) - goto tb_found; + goto success; goto next_port; } goto tb_not_found; @@ -256,23 +257,11 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) goto success; - if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) { - if ((reuse || -(tb->fastreuseport > 0 && - sk->sk_reuseport && - !rcu_access_pointer(sk->sk_reuseport_cb) && - uid_eq(tb->fastuid, uid))) && !snum && - --attempts >= 0) { - spin_unlock_bh(&head->lock); - goto again; - } + if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) goto fail_unlock; - } - if (!reuse) - tb->fastreuse = 0; - if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid)) - tb->fastreuseport = 0; - } else { + } +success: + if (!hlist_empty(&tb->owners)) { tb->fastreuse = reuse; if (sk->sk_reuseport) { tb->fastreuseport = 1; @@ -280,8 +269,12 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) } else { tb->fastreuseport = 0; } + } else { + if (!reuse) + tb->fastreuse = 0; + if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid)) + tb->fastreuseport = 0; } -success: if (!inet_csk(sk)->icsk_bind_hash) inet_bind_hash(sk, tb, port); WARN_ON(inet_csk(sk)->icsk_bind_hash != tb); -- 2.5.5
[PATCH 3/6 net-next] inet: kill smallest_size and smallest_port
In inet_csk_get_port we seem to be using smallest_port to figure out where the best place to look for a SO_REUSEPORT sk that matches with an existing set of SO_REUSEPORT's. However if we get to the logic if (smallest_size != -1) { port = smallest_port; goto have_port; } we will do a useless search, because we would have already done the inet_csk_bind_conflict for that port and it would have returned 1, otherwise we would have gone to found_tb and succeeded. Since this logic makes us do yet another trip through inet_csk_bind_conflict for a port we know won't work just delete this code and save us the time. Signed-off-by: Josef Bacik --- net/ipv4/inet_connection_sock.c | 26 -- 1 file changed, 4 insertions(+), 22 deletions(-) diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index a1c9055..d352366 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -165,7 +165,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN; struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo; int ret = 1, attempts = 5, port = snum; - int smallest_size = -1, smallest_port; struct inet_bind_hashbucket *head; struct net *net = sock_net(sk); int i, low, high, attempt_half; @@ -175,7 +174,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) bool reuseport_ok = !!snum; if (port) { -have_port: head = &hinfo->bhash[inet_bhashfn(net, port, hinfo->bhash_size)]; spin_lock_bh(&head->lock); @@ -209,8 +207,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) * We do the opposite to not pollute connect() users. */ offset |= 1U; - smallest_size = -1; - smallest_port = low; /* avoid compiler warning */ other_parity_scan: port = low + offset; @@ -224,15 +220,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) spin_lock_bh(&head->lock); inet_bind_bucket_for_each(tb, &head->chain) if (net_eq(ib_net(tb), net) && tb->port == port) { - if (((tb->fastreuse > 0 && reuse) || -(tb->fastreuseport > 0 && - sk->sk_reuseport && - !rcu_access_pointer(sk->sk_reuseport_cb) && - uid_eq(tb->fastuid, uid))) && - (tb->num_owners < smallest_size || smallest_size == -1)) { - smallest_size = tb->num_owners; - smallest_port = port; - } if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok)) goto tb_found; goto next_port; @@ -243,10 +230,6 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) cond_resched(); } - if (smallest_size != -1) { - port = smallest_port; - goto have_port; - } offset--; if (!(offset & 1)) goto other_parity_scan; @@ -268,19 +251,18 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) if (sk->sk_reuse == SK_FORCE_REUSE) goto success; - if (((tb->fastreuse > 0 && reuse) || + if ((tb->fastreuse > 0 && reuse) || (tb->fastreuseport > 0 && !rcu_access_pointer(sk->sk_reuseport_cb) && - sk->sk_reuseport && uid_eq(tb->fastuid, uid))) && - smallest_size == -1) + sk->sk_reuseport && uid_eq(tb->fastuid, uid))) goto success; if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) { if ((reuse || (tb->fastreuseport > 0 && sk->sk_reuseport && !rcu_access_pointer(sk->sk_reuseport_cb) && - uid_eq(tb->fastuid, uid))) && - !snum && smallest_size != -1 && --attempts >= 0) { + uid_eq(tb->fastuid, uid))) && !snum && + --attempts >= 0) { spin_unlock_bh(&head->lock); goto again; } -- 2.5.5
[PATCH 1/6 net-next] inet: collapse ipv4/v6 rcv_saddr_equal functions into one
We pass these per-protocol equal functions around in various places, but we can just have one function that checks the sk->sk_family and then do the right comparison function. I've also changed the ipv4 version to not cast to inet_sock since it is unneeded. Signed-off-by: Josef Bacik --- include/net/addrconf.h | 4 +-- include/net/inet_hashtables.h| 5 +-- include/net/udp.h| 1 - net/ipv4/inet_connection_sock.c | 72 net/ipv4/inet_hashtables.c | 16 +++-- net/ipv4/udp.c | 58 +++- net/ipv6/inet6_connection_sock.c | 4 +-- net/ipv6/inet6_hashtables.c | 46 + net/ipv6/udp.c | 2 +- 9 files changed, 95 insertions(+), 113 deletions(-) diff --git a/include/net/addrconf.h b/include/net/addrconf.h index 8f998af..17c6fd8 100644 --- a/include/net/addrconf.h +++ b/include/net/addrconf.h @@ -88,9 +88,7 @@ int __ipv6_get_lladdr(struct inet6_dev *idev, struct in6_addr *addr, u32 banned_flags); int ipv6_get_lladdr(struct net_device *dev, struct in6_addr *addr, u32 banned_flags); -int ipv4_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, -bool match_wildcard); -int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, +int inet_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, bool match_wildcard); void addrconf_join_solict(struct net_device *dev, const struct in6_addr *addr); void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr); diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 0574493..756ed16 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -203,10 +203,7 @@ void inet_hashinfo_init(struct inet_hashinfo *h); bool inet_ehash_insert(struct sock *sk, struct sock *osk); bool inet_ehash_nolisten(struct sock *sk, struct sock *osk); -int __inet_hash(struct sock *sk, struct sock *osk, - int (*saddr_same)(const struct sock *sk1, - const struct sock *sk2, - bool match_wildcard)); +int __inet_hash(struct sock *sk, struct sock *osk); int inet_hash(struct sock *sk); void inet_unhash(struct sock *sk); diff --git a/include/net/udp.h b/include/net/udp.h index 1661791..c9d8b8e 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -204,7 +204,6 @@ static inline void udp_lib_close(struct sock *sk, long timeout) } int udp_lib_get_port(struct sock *sk, unsigned short snum, -int (*)(const struct sock *, const struct sock *, bool), unsigned int hash2_nulladdr); u32 udp_flow_hashrnd(void); diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 19ea045..ba597cb 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -31,6 +31,78 @@ const char inet_csk_timer_bug_msg[] = "inet_csk BUG: unknown timer value\n"; EXPORT_SYMBOL(inet_csk_timer_bug_msg); #endif +#if IS_ENABLED(CONFIG_IPV6) +/* match_wildcard == true: IPV6_ADDR_ANY equals to any IPv6 addresses if IPv6 + * only, and any IPv4 addresses if not IPv6 only + * match_wildcard == false: addresses must be exactly the same, i.e. + * IPV6_ADDR_ANY only equals to IPV6_ADDR_ANY, + * and 0.0.0.0 equals to 0.0.0.0 only + */ +static int ipv6_rcv_saddr_equal(const struct sock *sk, const struct sock *sk2, + bool match_wildcard) +{ + const struct in6_addr *sk2_rcv_saddr6 = inet6_rcv_saddr(sk2); + int sk2_ipv6only = inet_v6_ipv6only(sk2); + int addr_type = ipv6_addr_type(&sk->sk_v6_rcv_saddr); + int addr_type2 = sk2_rcv_saddr6 ? ipv6_addr_type(sk2_rcv_saddr6) : IPV6_ADDR_MAPPED; + + /* if both are mapped, treat as IPv4 */ + if (addr_type == IPV6_ADDR_MAPPED && addr_type2 == IPV6_ADDR_MAPPED) { + if (!sk2_ipv6only) { + if (sk->sk_rcv_saddr == sk2->sk_rcv_saddr) + return 1; + if (!sk->sk_rcv_saddr || !sk2->sk_rcv_saddr) + return match_wildcard; + } + return 0; + } + + if (addr_type == IPV6_ADDR_ANY && addr_type2 == IPV6_ADDR_ANY) + return 1; + + if (addr_type2 == IPV6_ADDR_ANY && match_wildcard && + !(sk2_ipv6only && addr_type == IPV6_ADDR_MAPPED)) + return 1; + + if (addr_type == IPV6_ADDR_ANY && match_wildcard && + !(ipv6_only_sock(sk) && addr_type2 == IPV6_ADDR_MAPPED)) + return 1; + + if (sk2_rcv_saddr6 && + ipv6_addr_equal(&sk->sk_v6_rcv_saddr, sk2_rcv_saddr6)) + return 1; + + return 0; +} +#end
[PATCH 6/6 net-next] inet: reset tb->fastreuseport when adding a reuseport sk
If we have non reuseport sockets on a tb we will set tb->fastreuseport to 0 and never set it again. Which means that in the future if we end up adding a bunch of reuseport sk's to that tb we'll have to do the expensive scan every time. Instead add a sock_common to the tb so we know what reuseport sk succeeded last. Once one sk has made it onto the list we know that there are no potential bind conflicts on the owners list that match that sk's rcv_addr. So copy the sk's common into our tb->fastsock and set tb->fastruseport to FASTREUSESOCK_STRICT so we know we have to do an extra check for subsequent reuseport sockets and skip the expensive bind conflict check. Signed-off-by: Josef Bacik --- include/net/inet_hashtables.h | 4 net/ipv4/inet_connection_sock.c | 53 + 2 files changed, 53 insertions(+), 4 deletions(-) diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h index 756ed16..4ccc18f 100644 --- a/include/net/inet_hashtables.h +++ b/include/net/inet_hashtables.h @@ -74,12 +74,16 @@ struct inet_ehash_bucket { * users logged onto your box, isn't it nice to know that new data * ports are created in O(1) time? I thought so. ;-) -DaveM */ +#define FASTREUSEPORT_ANY 1 +#define FASTREUSEPORT_STRICT 2 + struct inet_bind_bucket { possible_net_t ib_net; unsigned short port; signed char fastreuse; signed char fastreuseport; kuid_t fastuid; + struct sock_common fastsock; int num_owners; struct hlist_node node; struct hlist_head owners; diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 923bd60..814382f 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -236,6 +236,32 @@ inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int * return head; } +static inline int sk_reuseport_match(struct inet_bind_bucket *tb, +struct sock *sk) +{ + struct sock *sk2 = (struct sock *)&tb->fastsock; + kuid_t uid = sock_i_uid(sk); + + if (tb->fastreuseport <= 0) + return 0; + if (!sk->sk_reuseport) + return 0; + if (rcu_access_pointer(sk->sk_reuseport_cb)) + return 0; + if (!uid_eq(tb->fastuid, uid)) + return 0; + /* We only need to check the rcv_saddr if this tb was once marked +* without fastreuseport and then was reset, as we can only know that +* the fastsock has no potential bind conflicts with the rest of the +* possible socks on the owners list. +*/ + if (tb->fastreuseport == FASTREUSEPORT_ANY) + return 1; + if (!inet_rcv_saddr_equal(sk, sk2, true)) + return 0; + return 1; +} + /* Obtain a reference to a local port for the given sock, * if snum is zero it means select any available local port. * We try to allocate an odd port (and leave even ports for connect()) @@ -275,9 +301,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) goto success; if ((tb->fastreuse > 0 && reuse) || -(tb->fastreuseport > 0 && - !rcu_access_pointer(sk->sk_reuseport_cb) && - sk->sk_reuseport && uid_eq(tb->fastuid, uid))) + sk_reuseport_match(tb, sk)) goto success; if (inet_csk_bind_conflict(sk, tb, true, true)) goto fail_unlock; @@ -288,14 +312,35 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) if (sk->sk_reuseport) { tb->fastreuseport = 1; tb->fastuid = uid; + memcpy(&tb->fastsock, &sk->__sk_common, + sizeof(struct sock_common)); } else { tb->fastreuseport = 0; } } else { if (!reuse) tb->fastreuse = 0; - if (!sk->sk_reuseport || !uid_eq(tb->fastuid, uid)) + if (sk->sk_reuseport) { + /* We didn't match or we don't have fastreuseport set on +* the tb, but we have sk_reuseport set on this socket +* and we know that there are no bind conflicts with +* this socket in this tb, so reset our tb's reuseport +* settings so that any subsequent sockets that match +* our current socket will be put on the fast path. +* +* If we reset we need to set FASTREUSEPORT_STRICT so we +* do extra checking for all subsequent sk_reuseport +
[PATCH 2/6 net-next] inet: drop ->bind_conflict
The only difference between inet6_csk_bind_conflict and inet_csk_bind_conflict is how they check the rcv_saddr, so delete this call back and simply change inet_csk_bind_conflict to call inet_rcv_saddr_equal. Signed-off-by: Josef Bacik --- include/net/inet6_connection_sock.h | 5 - include/net/inet_connection_sock.h | 6 -- net/dccp/ipv4.c | 2 +- net/dccp/ipv6.c | 2 -- net/ipv4/inet_connection_sock.c | 22 +++- net/ipv4/tcp_ipv4.c | 2 +- net/ipv6/inet6_connection_sock.c| 40 - net/ipv6/tcp_ipv6.c | 2 -- 8 files changed, 9 insertions(+), 72 deletions(-) diff --git a/include/net/inet6_connection_sock.h b/include/net/inet6_connection_sock.h index 3212b39..8ec87b6 100644 --- a/include/net/inet6_connection_sock.h +++ b/include/net/inet6_connection_sock.h @@ -15,16 +15,11 @@ #include -struct inet_bind_bucket; struct request_sock; struct sk_buff; struct sock; struct sockaddr; -int inet6_csk_bind_conflict(const struct sock *sk, - const struct inet_bind_bucket *tb, bool relax, - bool soreuseport_ok); - struct dst_entry *inet6_csk_route_req(const struct sock *sk, struct flowi6 *fl6, const struct request_sock *req, u8 proto); diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index 85ee387..add75c7 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -62,9 +62,6 @@ struct inet_connection_sock_af_ops { char __user *optval, int __user *optlen); #endif void(*addr2sockaddr)(struct sock *sk, struct sockaddr *); - int (*bind_conflict)(const struct sock *sk, -const struct inet_bind_bucket *tb, -bool relax, bool soreuseport_ok); void(*mtu_reduced)(struct sock *sk); }; @@ -261,9 +258,6 @@ inet_csk_rto_backoff(const struct inet_connection_sock *icsk, struct sock *inet_csk_accept(struct sock *sk, int flags, int *err); -int inet_csk_bind_conflict(const struct sock *sk, - const struct inet_bind_bucket *tb, bool relax, - bool soreuseport_ok); int inet_csk_get_port(struct sock *sk, unsigned short snum); struct dst_entry *inet_csk_route_req(const struct sock *sk, struct flowi4 *fl4, diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c index d859a5c..ed6f99b 100644 --- a/net/dccp/ipv4.c +++ b/net/dccp/ipv4.c @@ -17,6 +17,7 @@ #include #include +#include #include #include #include @@ -904,7 +905,6 @@ static const struct inet_connection_sock_af_ops dccp_ipv4_af_ops = { .getsockopt= ip_getsockopt, .addr2sockaddr = inet_csk_addr2sockaddr, .sockaddr_len = sizeof(struct sockaddr_in), - .bind_conflict = inet_csk_bind_conflict, #ifdef CONFIG_COMPAT .compat_setsockopt = compat_ip_setsockopt, .compat_getsockopt = compat_ip_getsockopt, diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c index adfc790..08bcdc3 100644 --- a/net/dccp/ipv6.c +++ b/net/dccp/ipv6.c @@ -937,7 +937,6 @@ static const struct inet_connection_sock_af_ops dccp_ipv6_af_ops = { .getsockopt= ipv6_getsockopt, .addr2sockaddr = inet6_csk_addr2sockaddr, .sockaddr_len = sizeof(struct sockaddr_in6), - .bind_conflict = inet6_csk_bind_conflict, #ifdef CONFIG_COMPAT .compat_setsockopt = compat_ipv6_setsockopt, .compat_getsockopt = compat_ipv6_getsockopt, @@ -958,7 +957,6 @@ static const struct inet_connection_sock_af_ops dccp_ipv6_mapped = { .getsockopt= ipv6_getsockopt, .addr2sockaddr = inet6_csk_addr2sockaddr, .sockaddr_len = sizeof(struct sockaddr_in6), - .bind_conflict = inet6_csk_bind_conflict, #ifdef CONFIG_COMPAT .compat_setsockopt = compat_ipv6_setsockopt, .compat_getsockopt = compat_ipv6_getsockopt, diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index ba597cb..a1c9055 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -116,9 +116,9 @@ void inet_get_local_port_range(struct net *net, int *low, int *high) } EXPORT_SYMBOL(inet_get_local_port_range); -int inet_csk_bind_conflict(const struct sock *sk, - const struct inet_bind_bucket *tb, bool relax, - bool reuseport_ok) +static int inet_csk_bind_conflict(const struct sock *sk, + const struct inet_bind_bucket *tb, + bool relax, bool reuseport_ok) { struct sock *sk2; bool reuse = sk->sk_reuse; @@ -134,7 +134,6 @@ int inet_csk_bind_conflict(const struct sock *sk, sk_for_each_bound(sk
[PATCH 5/6 net-next] inet: split inet_csk_get_port into two functions
inet_csk_get_port does two different things, it either scans for an open port, or it tries to see if the specified port is available for use. Since these two operations have different rules and are basically independent lets split them into two different functions to make them both more readable. Signed-off-by: Josef Bacik --- net/ipv4/inet_connection_sock.c | 66 +++-- 1 file changed, 44 insertions(+), 22 deletions(-) diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index f6a34bc..923bd60 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -156,33 +156,21 @@ static int inet_csk_bind_conflict(const struct sock *sk, return sk2 != NULL; } -/* Obtain a reference to a local port for the given sock, - * if snum is zero it means select any available local port. - * We try to allocate an odd port (and leave even ports for connect()) +/* + * Find an open port number for the socket. Returns with the + * inet_bind_hashbucket lock held. */ -int inet_csk_get_port(struct sock *sk, unsigned short snum) +static struct inet_bind_hashbucket * +inet_csk_find_open_port(struct sock *sk, struct inet_bind_bucket **tb_ret, int *port_ret) { - bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN; struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo; - int ret = 1, port = snum; + int port = 0; struct inet_bind_hashbucket *head; struct net *net = sock_net(sk); int i, low, high, attempt_half; struct inet_bind_bucket *tb; - kuid_t uid = sock_i_uid(sk); u32 remaining, offset; - bool reuseport_ok = !!snum; - if (port) { - head = &hinfo->bhash[inet_bhashfn(net, port, - hinfo->bhash_size)]; - spin_lock_bh(&head->lock); - inet_bind_bucket_for_each(tb, &head->chain) - if (net_eq(ib_net(tb), net) && tb->port == port) - goto tb_found; - - goto tb_not_found; - } attempt_half = (sk->sk_reuse == SK_CAN_REUSE) ? 1 : 0; other_half_scan: inet_get_local_port_range(net, &low, &high); @@ -221,11 +209,12 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) if (net_eq(ib_net(tb), net) && tb->port == port) { if (hlist_empty(&tb->owners)) goto success; - if (!inet_csk_bind_conflict(sk, tb, false, reuseport_ok)) + if (!inet_csk_bind_conflict(sk, tb, false, false)) goto success; goto next_port; } - goto tb_not_found; + tb = NULL; + goto success; next_port: spin_unlock_bh(&head->lock); cond_resched(); @@ -240,8 +229,41 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) attempt_half = 2; goto other_half_scan; } - return ret; + return NULL; +success: + *port_ret = port; + *tb_ret = tb; + return head; +} + +/* Obtain a reference to a local port for the given sock, + * if snum is zero it means select any available local port. + * We try to allocate an odd port (and leave even ports for connect()) + */ +int inet_csk_get_port(struct sock *sk, unsigned short snum) +{ + bool reuse = sk->sk_reuse && sk->sk_state != TCP_LISTEN; + struct inet_hashinfo *hinfo = sk->sk_prot->h.hashinfo; + int ret = 1, port = snum; + struct inet_bind_hashbucket *head; + struct net *net = sock_net(sk); + struct inet_bind_bucket *tb = NULL; + kuid_t uid = sock_i_uid(sk); + if (!port) { + head = inet_csk_find_open_port(sk, &tb, &port); + if (!head) + return ret; + if (!tb) + goto tb_not_found; + goto success; + } + head = &hinfo->bhash[inet_bhashfn(net, port, + hinfo->bhash_size)]; + spin_lock_bh(&head->lock); + inet_bind_bucket_for_each(tb, &head->chain) + if (net_eq(ib_net(tb), net) && tb->port == port) + goto tb_found; tb_not_found: tb = inet_bind_bucket_create(hinfo->bind_bucket_cachep, net, head, port); @@ -257,7 +279,7 @@ int inet_csk_get_port(struct sock *sk, unsigned short snum) !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport && uid_eq(tb->fastuid, uid))) goto success; - if (inet_csk_bind_conflict(sk, tb, true, reuseport_ok)) + if (inet_csk_bind_conflict(sk, tb, true, true))
[PATCH 0/6 net-next][V2] Rework inet_csk_get_port
V1->V2: -Added a new patch 'inet: collapse ipv4/v6 rcv_saddr_equal functions into one' at Hannes' suggestion. -Dropped ->bind_conflict and just use the new helper. -Fixed a compile bug from the original ->bind_conflict patch. The original description of the series follows = At some point recently the guys working on our load balancer added the ability to use SO_REUSEPORT. When they restarted their app with this option enabled they immediately hit a softlockup on what appeared to be the inet_bind_bucket->lock. Eventually what all of our debugging and discussion led us to was the fact that the application comes up without SO_REUSEPORT, shuts down which creates around 100k twsk's, and then comes up and tries to open a bunch of sockets using SO_REUSEPORT, which meant traversing the inet_bind_bucket owners list under the lock. Since this lock is needed for dealing with the twsk's and basically anything else related to connections we would softlockup, and sometimes not ever recover. To solve this problem I did what you see in Path 5/5. Once we have a SO_REUSEPORT socket on the tb->owners list we know that the socket has no conflicts with any of the other sockets on that list. So we can add a copy of the sock_common (really all we need is the recv_saddr but it seemed ugly to copy just the ipv6, ipv4, and flag to indicate if we were ipv6 only in there so I've copied the whole common) in order to check subsequent SO_REUSEPORT sockets. If they match the previous one then we can skip the expensive inet_csk_bind_conflict check. This is what eliminated the soft lockup that we were seeing. Patches 1-4 are cleanups and re-workings. For instance when we specify port == 0 we need to find an open port, but we would do two passes through inet_csk_bind_conflict every time we found a possible port. We would also keep track of the smallest_port value in order to try and use it if we found no port our first run through. This however made no sense as it would have had to fail the first pass through inet_csk_bind_conflict, so would not actually pass the second pass through either. Finally I split the function into two functions in order to make it easier to read and to distinguish between the two behaviors. I have tested this on one of our load balancing boxes during peak traffic and it hasn't fallen over. But this is not my area, so obviously feel free to point out where I'm being stupid and I'll get it fixed up and retested. Thanks, Josef
Re: [PATCH net-next 01/10] net: netcp: ethss: add support of subsystem register region regmap
On Tue, Dec 20, 2016 at 05:09:44PM -0500, Murali Karicheri wrote: > From: WingMan Kwok > > 10gbe phy driver needs to access the 10gbe subsystem control > register during phy initialization. To facilitate the shared > access of the subsystem register region between the 10gbe Ethernet > driver and the phy driver, this patch adds support of the > subsystem register region defined by a syscon node in the dts. > > Although there is no shared access to the gbe subsystem register > region, using syscon for that is for the sake of consistency. > > This change is backward compatible with previously released gbe > devicetree bindings. > > Signed-off-by: WingMan Kwok > Signed-off-by: Murali Karicheri > Signed-off-by: Sekhar Nori > --- > .../devicetree/bindings/net/keystone-netcp.txt | 16 ++- > drivers/net/ethernet/ti/netcp_ethss.c | 140 > + > 2 files changed, 127 insertions(+), 29 deletions(-) > > diff --git a/Documentation/devicetree/bindings/net/keystone-netcp.txt > b/Documentation/devicetree/bindings/net/keystone-netcp.txt > index 04ba1dc..0854a73 100644 > --- a/Documentation/devicetree/bindings/net/keystone-netcp.txt > +++ b/Documentation/devicetree/bindings/net/keystone-netcp.txt > @@ -72,20 +72,24 @@ Required properties: > "ti,netcp-gbe-2" for 1GbE N NetCP 1.5 (N=2) > "ti,netcp-xgbe" for 10 GbE > > +- syscon-subsys: phandle to syscon node of the switch > + subsystem registers. > + > - reg: register location and the size for the following > register > regions in the specified order. > - switch subsystem registers > + - sgmii module registers This needs to go on the end of the list. Otherwise, it is not backwards compatible. > - sgmii port3/4 module registers (only for NetCP 1.4) > - switch module registers > - serdes registers (only for 10G) > > NetCP 1.4 ethss, here is the order > - index #0 - switch subsystem registers > + index #0 - sgmii module registers > index #1 - sgmii port3/4 module registers > index #2 - switch module registers > > NetCP 1.5 ethss 9 port, 5 port and 2 port > - index #0 - switch subsystem registers > + index #0 - sgmii module registers > index #1 - switch module registers > index #2 - serdes registers > > @@ -145,6 +149,11 @@ Optional properties: > > Example binding: > > +gbe_subsys: subsys@209 { > + compatible = "syscon"; > + reg = <0x0209 0x100>; > +}; > + > netcp: netcp@200 { > reg = <0x2620110 0x8>; > reg-names = "efuse"; > @@ -163,7 +172,8 @@ netcp: netcp@200 { > ranges; > gbe@9 { > label = "netcp-gbe"; > - reg = <0x9 0x300>, <0x90400 0x400>, <0x90800 0x700>; > + syscon-subsys = <&gbe_subsys>; > + reg = <0x90100 0x200>, <0x90400 0x200>, <0x90800 0x700>; > /* enable-ale; */ > tx-queue = <648>; > tx-channel = <8>;
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
> I do tend to like Ted's version in which we use batched > get_random_bytes() output. If it's fast enough, it's simpler and lets > us get the full strength of a CSPRNG. With the ChaCha20 generator, that's fine, although note that this abandons anti-backtracking entirely. It also takes locks, something the previous get_random_int code path avoided. Do we need to audit the call sites to ensure that's safe? And there is the issue that the existing callers assume that there's a fixed cost per word. A good half of get_random_long calls are followed by "& ~PAGE_MASK" to extract the low 12 bits. Or "& ((1ul << mmap_rnd_bits) - 1)" to extract the low 28. If we have a buffer we're going to have to pay to refill, it would be nice to use less than 8 bytes to satisfy those. But that can be a followup patch. I'm thinking unsigned long get_random_bits(unsigned bits) E.g. get_random_bits(PAGE_SHIFT), get_random_bits(mmap_rnd_bits), u32 imm_rnd = get_random_bits(32) unsigned get_random_mod(unsigned modulus) E.g. get_random_mod(hole) & ~(alignment - 1); get_random_mod(port_scan_backoff) (Althogh probably drivers/s390/scsi/zfcp_fc.c should be changed to prandom.) with, until the audit is completed: #define get_random_int() get_random_bits(32) #define get_random_long() get_random_bits(BITS_PER_LONG) > It could only mix the output back in every two calls, in which case > you can backtrack up to one call but you need to do 2^128 work to > backtrack farther. But yes, this is getting excessively complicated. No, if you're willing to accept limited backtrack, this is a perfectly acceptable solution, and not too complicated. You could do it phase-less if you like; store the previous output, then after generating the new one, mix in both. Then overwrite the previous output. (But doing two rounds of a crypto primtive to avoid one conditional jump is stupid, so forget that.) >> Hmm, interesting. Although, for ASLR, we could use get_random_bytes() >> directly and be done with it. It won't be a bottleneck. Isn't that what you already suggested? I don't mind fewer primtives; I got a bit fixated on "Replace MD5 with SipHash". It's just the locking that I want to check isn't a problem.
Re: [PATCH iproute2 v3 4/4] ifstat: Add "sw only" extended statistics to ifstat
On 12/22/16, 8:23 AM, Nogah Frankel wrote: > Add support for extended statistics of SW only type, for counting only the > packets that went via the cpu. (useful for systems with forward > offloading). It reads it from filter type IFLA_STATS_LINK_OFFLOAD_XSTATS > and sub type IFLA_OFFLOAD_XSTATS_CPU_HIT. > > It is under the name 'software' > (or any shorten of it as 'soft' or simply 's') > > For example: > ifstat -x s > > Nogah, can we keep the option names closer to the attribute names ? That would avoid some confusion and help with the follow-up stats. ifstat -x offload cpu or ifstat -x cpu for others it would be: ifstat -x link [vlan|igmp] ifstat -x vlan ifstat -x igmp ifstat -x lacp and so on... thanks!
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On 22.12.2016 20:56, Andy Lutomirski wrote: > It's also not quite clear to me why userspace needs to be able to > calculate the digest on its own. A bpf(BPF_CALC_PROGRAM_DIGEST) > command that takes a BPF program as input and hashes it would seem to > serve the same purpose, and that would allow the kernel to key the > digest and change the algorithm down the road without breaking things. I think that people expect digests of BPF programs to be stable over time and reboots.
Re: [PATCH 1/5 net-next] inet: replace ->bind_conflict with ->rcv_saddr_equal
On 21.12.2016 16:16, Josef Bacik wrote: > On Wed, Dec 21, 2016 at 10:06 AM, Hannes Frederic Sowa > wrote: >> On Tue, 2016-12-20 at 15:07 -0500, Josef Bacik wrote: >>> The only difference between inet6_csk_bind_conflict and >>> inet_csk_bind_conflict >>> is how they check the rcv_saddr. Since we want to be able to check >>> the saddr in >>> other places just drop the protocol specific ->bind_conflict and >>> replace it with >>> ->rcv_saddr_equal, then make inet_csk_bind_conflict the one true >>> bind conflict >>> function. >>> >>> Signed-off-by: Josef Bacik >>> >> >> >> >>> --- >>> include/net/inet6_connection_sock.h | 5 - >>> include/net/inet_connection_sock.h | 9 +++-- >>> net/dccp/ipv4.c | 3 ++- >>> net/dccp/ipv6.c | 2 +- >>> net/ipv4/inet_connection_sock.c | 22 +++- >>> net/ipv4/tcp_ipv4.c | 3 ++- >>> net/ipv4/udp.c | 1 + >>> net/ipv6/inet6_connection_sock.c| 40 >>> - >>> net/ipv6/tcp_ipv6.c | 4 ++-- >>> 9 files changed, 18 insertions(+), 71 deletions(-) >>> >>> diff --git a/include/net/inet6_connection_sock.h >>> b/include/net/inet6_connection_sock.h >>> index 3212b39..8ec87b6 100644 >>> --- a/include/net/inet6_connection_sock.h >>> +++ b/include/net/inet6_connection_sock.h >>> @@ -15,16 +15,11 @@ >>> >>> #include >>> >>> -struct inet_bind_bucket; >>> struct request_sock; >>> struct sk_buff; >>> struct sock; >>> struct sockaddr; >>> >>> -int inet6_csk_bind_conflict(const struct sock *sk, >>> -const struct inet_bind_bucket *tb, bool relax, >>> -bool soreuseport_ok); >>> - >>> struct dst_entry *inet6_csk_route_req(const struct sock *sk, struct >>> flowi6 *fl6, >>> const struct request_sock *req, u8 proto); >>> >>> diff --git a/include/net/inet_connection_sock.h >>> b/include/net/inet_connection_sock.h >>> index ec0479a..9cd43c5 100644 >>> --- a/include/net/inet_connection_sock.h >>> +++ b/include/net/inet_connection_sock.h >>> @@ -62,9 +62,9 @@ struct inet_connection_sock_af_ops { >>> char __user *optval, int __user *optlen); >>> #endif >>> void(*addr2sockaddr)(struct sock *sk, struct sockaddr *); >>> -int(*bind_conflict)(const struct sock *sk, >>> - const struct inet_bind_bucket *tb, >>> - bool relax, bool soreuseport_ok); >>> +int (*rcv_saddr_equal)(const struct sock *sk1, >>> + const struct sock *sk2, >>> + bool match_wildcard); >>> void(*mtu_reduced)(struct sock *sk); >>> }; >>> >>> >> >> The patch looks as a nice code cleanup already! >> >> Have you looked if we can simply have one rcv_saddr_equal for both ipv4 >> and ipv6 that e.g. uses sk->sk_family instead of function pointers? >> This could give us even more possibilities to remove some indirect >> functions calls and thus might relieve some cycles? > > I was going to do that but I'm not familiar enough with how sockets work > to be comfortable. My main concern is we have the ipv6_only() check on > a socket, which seems to indicate to me that you can have a socket that > can do both ipv4/ipv6, so what if we're specifically going through the > ipv6 code, but we aren't ipv6_only() and we end up doing the ipv4 > address compare when we really need to do the ipv6 address compare? If > this can't happen (and honestly as I type it out it sounds crazier than > it did in my head) then yeah I'll totally do that as well and we can > just have a global function without the protocol specific callbacks, but > I need you or somebody to tell me I'm crazy and that it would be ok to > have it all in one function. Thanks, IPv6 sockets can do IPv4 via mapped addresses. The other way around doesn't work, there are no IPv4 sockets that can speak IPv6. The ipv6_only flags in IPv4 sockets should always stay 0 but they need to be evaluated from there side for possible port conflicts. My idea is to use sk->sk_family, which is in sync with icsk->icsk_af_ops, which you use for the function pointer lookup, to switch between those functions. Looking through a lot of callback, I don't see this assumption violated so far. This would also solve the problem which David described, your search would be free of those indirect jumps. Bye, Hannes
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Thu, Dec 22, 2016 at 11:34 AM, Alexei Starovoitov wrote: > On Thu, Dec 22, 2016 at 9:25 AM, Andy Lutomirski wrote: >> On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa >> wrote: >>> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote: >>> >>> We don't prevent ebpf programs being loaded based on the digest but >>> just to uniquely identify loaded programs from user space and match up >>> with their source. >> >> The commit log talks about using the hash to see if the program has >> already been compiled and JITted. If that's done, then a collision >> will directly cause the kernel to malfunction. > > Andy, please read the code. > we could have used jhash there just as well. > Collisions are fine. There's relevant in the code to read yet AFAICS. The code exports it via fdinfo, and userspace is expected to do something with it. The commit message says: When programs are pinned and retrieved by an ELF loader, the loader can check the program's digest through fdinfo and compare it against one that was generated over the ELF file's program section to see if the program needs to be reloaded. I assume this means that a userspace component is expected to compare the digest of a loaded program to a digest of a program it wants to load and to use the result of the comparison to decide whether the programs are the same. If that's indeed the case (and it sure sounds like it, and I fully expect CRIU to do very similar things when support is added), then malicious collisions do matter. It's also not quite clear to me why userspace needs to be able to calculate the digest on its own. A bpf(BPF_CALC_PROGRAM_DIGEST) command that takes a BPF program as input and hashes it would seem to serve the same purpose, and that would allow the kernel to key the digest and change the algorithm down the road without breaking things. Regardless, adding a new hash algorithm that is almost-but-not-quite SHA-1 and making it a stable interface to userspace is not a good thing. --Andy
Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
On Thu, Dec 22, 2016 at 07:08:37PM +0100, Hannes Frederic Sowa wrote: > I wasn't concerned about performance but more about DoS resilience. I > wonder how safe half md4 actually is in terms of allowing users to > generate long hash chains in the filesystem (in terms of length > extension attacks against half_md4). > > In ext4, is it actually possible that a "disrupter" learns about the > hashing secret in the way how the inodes are returned during getdents? They'd have to be a local user, who can execute telldir(3) --- in which case there are plenty of other denial of service attacks one could carry out that would be far more devastating. It might also be an issue if the file system is exposed via NFS, but again, there are so many other ways an attacker could DoS a NFS server that I don't think of it as a much of a concern. Keep in mind that worst someone can do is cause directory inserts to fail with an ENOSPC, and there are plenty of other ways of doing that --- such as consuming all of the blocks and inodes in the file system, for example. So it's a threat, but not a high priority one as far as I'm concerned. And if this was a problem in actual practice, users could switch to the TEA based hash, which should be far harder to attack, and available today. - Ted
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
On Thu, Dec 22, 2016 at 11:24 AM, George Spelvin wrote: >> Having slept on this, I like it less. The problem is that a >> backtracking attacker doesn't just learn H(random seed || entropy_0 || >> secret || ...) -- they learn the internal state of the hash function >> that generates that value. This probably breaks any attempt to apply >> security properties of the hash function. For example, the internal >> state could easily contain a whole bunch of prior outputs it in >> verbatim. > > The problem is, anti-backtracing is in severe tension with your desire > to use unmodified SipHash. > > First of all, I'd like to repeat that it isn't a design goal of the current > generator and isn't necessary. Agreed. > Now, the main point: it's not likely to be solvable. > > The problem with unmodified SipHash is that is has only 64 bits of > output. No mix-back mechanism can get around the fundamental problem > that that's too small to prevent a brute-force guessing attack. You need > wider mix-back. And getting more output from unmodified SipHash requires > more finalization rounds, which is expensive. It could only mix the output back in every two calls, in which case you can backtrack up to one call but you need to do 2^128 work to backtrack farther. But yes, this is getting excessively complicated. > Finally, your discomfort about an attacker learning the internal state... > if an attacker knows the key and the input, they can construct the > internal state. Yes, we could discard the internal state and construct > a fresh one on the next call to get_random_int, but what are you going > to key it with? What are you going to feed it? What keeps *that* > internal state any more secret from an attacker than one that's explicitly > stored? I do tend to like Ted's version in which we use batched get_random_bytes() output. If it's fast enough, it's simpler and lets us get the full strength of a CSPRNG. (Aside: some day I want to move all that code from drivers/ to lib/ and teach it to be buildable in userspace, too, so it's easy to play with it, feed it test vectors, confirm that it's equivalent to a reference implementation, write up the actual design and try to get real cryptographers to analyze it, etc.) > For example, clearly stating the concern over starting new processes > with predictable layout, and the limits on the fresh entropy supply, > has made me realize that there *is* a possible source: each exec() > is passed 128 bits from get_random_bytes in the AT_RANDOM element of > its auxv. Since get_random_int() accesses "current" anyway, we could > store some seed material there rather than using "pid". While this is > not fresh for each call to get_random_int, it *is* fresh for each new > address space whose layout is being randomized. Hmm, interesting. Although, for ASLR, we could use get_random_bytes() directly and be done with it. It won't be a bottleneck.
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Thu, Dec 22, 2016 at 9:25 AM, Andy Lutomirski wrote: > On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa > wrote: >> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote: >> >> We don't prevent ebpf programs being loaded based on the digest but >> just to uniquely identify loaded programs from user space and match up >> with their source. > > The commit log talks about using the hash to see if the program has > already been compiled and JITted. If that's done, then a collision > will directly cause the kernel to malfunction. Andy, please read the code. we could have used jhash there just as well. Collisions are fine.
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
> Having slept on this, I like it less. The problem is that a > backtracking attacker doesn't just learn H(random seed || entropy_0 || > secret || ...) -- they learn the internal state of the hash function > that generates that value. This probably breaks any attempt to apply > security properties of the hash function. For example, the internal > state could easily contain a whole bunch of prior outputs it in > verbatim. The problem is, anti-backtracing is in severe tension with your desire to use unmodified SipHash. First of all, I'd like to repeat that it isn't a design goal of the current generator and isn't necessary. The current generator just returns hash[0] from the MD5 state, then leaves the state stored. The fact that it conceals earlier outputs is an accident of the Davies-Meyer structure of md5_transform. It isn't necessary, because every secret generated is stored unencrypted for as long as it's of value. A few values are used for retransmit backoffs and random MAC addresses. Those are revealed to the world as soon as they're used. Most values are used for ASLR. These address are of interest to an attacker trying to mount a buffer-overflow attack, but that only lasts as long as the process is running to receive buffers. After the process exits, knowledge of its layout is worthless. And this is stored as long as it's running in kernel-accessible data, so storing a *second* copy in less conveniently kernel-accessible data (the RNG state) doesn't make the situation any worse. In addition to the above, if you're assuming a state capture, then since we have (for necessary efficieny reasons) a negligible about of fresh entropy, an attacker has the secret key and can predict *future* outputs very easily. Given that information, an attacker doesn't need to learn the layout of vulnerable server X. If they have a buffer overflow, they can crash the current instance and wait for a fresh image to be started (with a known address space) to launch their attack at. Kernel state capture attacks are a very unlikely attack, mostly because it's a narrow target a hair's breadth away from the much more interesting outright kernel compromise (attacker gains write access as well as read) which renders all this fancy cryptanaysis moot. Now, the main point: it's not likely to be solvable. The problem with unmodified SipHash is that is has only 64 bits of output. No mix-back mechanism can get around the fundamental problem that that's too small to prevent a brute-force guessing attack. You need wider mix-back. And getting more output from unmodified SipHash requires more finalization rounds, which is expensive. (Aside: 64 bits does have the advantage that it can't be brute-forced on the attacked machine. It must be exfiltrated to the attacker, and the solution returned to the attack code. But doing this is getting easier all the time.) Adding antibacktracking to SipHash is trivial: just add a Davies-Meyer structure around its internal state. Remember the internal state before hashing in the entropy and secret, generate the output, then add the previous and final states together for storage. This is a standard textbook construction, very cheap, and doesn't touch the compression function which is the target of analysis and attacks, but it requires poking around in SipHash's internal state. (And people who read the textbooks without understanding them will get upset because the textbooks all talk about using this construction with block ciphers, and SipHash's compression function is not advertised as a block cipher.) Alternative designs exist; you could extract additional output from earlier rounds of SipHash, using the duplex sponge construction you mentioned earlier. That output would be used for mixback purposes *only*, so wouldn't affect the security proof of the "primary" output. But this is also getting creative with SipHash's internals. Now, you could use a completely *different* cryptographic primitive to enforce one-way-ness, and apply SipHash as a strong output transform, but that doesn't feel like good design, and is probably more expensive. Finally, your discomfort about an attacker learning the internal state... if an attacker knows the key and the input, they can construct the internal state. Yes, we could discard the internal state and construct a fresh one on the next call to get_random_int, but what are you going to key it with? What are you going to feed it? What keeps *that* internal state any more secret from an attacker than one that's explicitly stored? Keeping the internal state around is a cacheing optimization, that's all. *If* you're assuming a state capture, the only thing secret from the attacker is any fresh entropy collected between the time of capture and the time of generation. Due to mandatory efficiency requirements, this is very small. I really think you're wishing for the impossible here. A final note: although I'm disagreeing with you,
Re: [PATCH iproute2 0/2] Fix the usage of TC tunnel key
On Thu, 22 Dec 2016 10:14:39 +0200 Hadar Hen Zion wrote: > Add dest UDP port parameter to the usage of tc tunnle key action and > classifcation. > > > Hadar Hen Zion (2): > tc/cls_flower: Add to the usage encapsulation dest UDP port > tc/m_tunnel_key: Add to the usage encapsulation dest UDP port > > tc/f_flower.c | 5 +++-- > tc/m_tunnel_key.c | 2 +- > 2 files changed, 4 insertions(+), 3 deletions(-) > Looks good, both applied
Re: [PATCH net] net, sched: fix soft lockup in tc_classify
On Wed, Dec 21, 2016 at 1:07 PM, Daniel Borkmann wrote: > > Ok, you mean for net. In that case I prefer the smaller sized fix to be > honest. It also covers everything from the point where we fetch the chain > via cops->tcf_chain() to the end of the function, which is where most of > the complexity resides, and only the two mentioned commits do the relock, I really wish the problem is only about relocking, but look at the code, the deeper reason why we have this bug is the complexity of the logic inside tc_ctl_tfilter(): 1) the replay logic is hard, we have to make it idempotent; 2) the request logic itself is hard, because of tc filter design and implementation. This is why I worry more than just relocking. > so as a fix I think it's fine as-is. As mentioned, if there's need to > refactor tc_ctl_tfilter() net-next would be better, imho. Refactor is a too strong word, just moving the replay out is not a refactor. The end result will be still smaller than your commit d936377414fadbafb4, which is already backported to stable.
Re: [PATCH iproute2 v3 2/4] ifstat: Add extended statistics to ifstat
On Thu, 22 Dec 2016 18:23:13 +0200 Nogah Frankel wrote: On Thu, 22 Dec 2016 18:23:13 +0200 Nogah Frankel wrote: > } > @@ -691,18 +804,22 @@ static const struct option longopts[] = { > { "interval", 1, 0, 't' }, > { "version", 0, 0, 'V' }, > { "zeros", 0, 0, 'z' }, > + { "extended", 1, 0, 'x'}, > { 0 } > }; > > + > int main(int argc, char *argv[]) You let extra whitespace changes creep in. > + case 'x': > + is_extended = true; > + memset(stats_type, 0, 64); > + strncpy(stats_type, optarg, 63); > + break; This seems like doing this either the paranoid or hard way. Why not: const char *stats_type = NULL; ... case 'x': stats_type = optarg; break; ... if (stats_type) snprintf(hist_name, sizeof(hist_name), "%s/.%s_ifstat.u%d", P_tmpdir, stats_type, getuid()); else snprintf(hist_name, sizeof(hist_name), "%s/.ifstat.u%d", P_tmpdir, getuid()); Since: 1) optarg points to area in argv that is persistent (avoid copy) 2) don't need is_extended flag value then Please cleanup and resubmit.
[PATCH] tc: add missing limits.h header
This fixes under musl build issues like: f_matchall.c: In function ‘matchall_parse_opt’: f_matchall.c:48:12: error: ‘LONG_MIN’ undeclared (first use in this function) if (h == LONG_MIN || h == LONG_MAX) { ^ f_matchall.c:48:12: note: each undeclared identifier is reported only once for each function it appears in f_matchall.c:48:29: error: ‘LONG_MAX’ undeclared (first use in this function) if (h == LONG_MIN || h == LONG_MAX) { ^ Signed-off-by: Baruch Siach --- tc/tc_util.h | 1 + 1 file changed, 1 insertion(+) diff --git a/tc/tc_util.h b/tc/tc_util.h index f198a4ad5554..4db26c6d5e25 100644 --- a/tc/tc_util.h +++ b/tc/tc_util.h @@ -2,6 +2,7 @@ #define _TC_UTIL_H_ 1 #define MAX_MSG 16384 +#include #include #include #include -- 2.11.0
Re: [PATCH 1/3] NFC: trf7970a: add device tree option for 27MHz clock
On Mon, Dec 19, 2016 at 5:23 PM, Geoff Lansberry wrote: > I can make that change, however, I worry that it may be a bit > misleading, since there are only two supported clock frequencies, but > a number like that to me implies that it could be set to any number > you want. I'm new at this, and so I'll go ahead and change it as you > request, but I'd like to hear your thoughts on my concern. Then the binding doc just needs to state what are the 2 valid frequencies. Rob
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Thu, Dec 22, 2016 at 5:59 PM, Hannes Frederic Sowa wrote: > We don't prevent ebpf programs being loaded based on the digest but > just to uniquely identify loaded programs from user space and match up > with their source. Okay, so in that case, a weak hashing function like SHA1 could result in a real vulnerability. Therefore, this SHA1 stuff needs to be reverted immediately, pending a different implementation. If this has ever shipped in a kernel version, it could even deserve a CVE. No SHA1! > The hashing is not a proper sha1 neither, unfortunately. I think that > is why it will have a custom implementation in iproute2? Jeepers creepers. So for some ungodly reason, LKML has invented yet another homebrewed crypto primitive. This story really gets more horrifying every day. No bueno. So yea, let's revert and re-commit (repeal and replace? just kidding...). Out with SHA-1, in with Blake2 or SHA2. Jason
Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
On Thu, Dec 22, 2016 at 7:08 PM, Hannes Frederic Sowa wrote: > I wasn't concerned about performance but more about DoS resilience. I > wonder how safe half md4 actually is in terms of allowing users to > generate long hash chains in the filesystem (in terms of length > extension attacks against half_md4). AFAIK, this is a real vulnerability that needs to be addressed. Judging by Ted's inquiry about my siphash testing suite, I assume he's probably tinkering around with it as we speak. :) Meanwhile I've separated things into several trees: 1. chacha20 rng, already submitted: https://git.zx2c4.com/linux-dev/log/?h=random-next 2. md5 cleanup, not yet submitted: https://git.zx2c4.com/linux-dev/log/?h=md5-cleanup 3. md4 cleanup, already submitted: https://git.zx2c4.com/linux-dev/log/?h=ext4-next-md4-cleanup 4. siphash and networking, not yet submitted as a x/4 series: https://git.zx2c4.com/linux-dev/log/?h=net-next-siphash I'll submit (4) in a couple of days, waiting for any comments on the existing patch-set. Jason
Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
On 22.12.2016 16:54, Theodore Ts'o wrote: > On Thu, Dec 22, 2016 at 02:10:33PM +0100, Jason A. Donenfeld wrote: >> On Thu, Dec 22, 2016 at 1:47 PM, Hannes Frederic Sowa >> wrote: >>> following up on what appears to be a random subject: ;) >>> >>> IIRC, ext4 code by default still uses half_md4 for hashing of filenames >>> in the htree. siphash seems to fit this use case pretty good. >> >> I saw this too. I'll try to address it in v8 of this series. > > This is a separate issue, and this series is getting a bit too > complex. So I'd suggest pushing this off to a separate change. > > Changing the htree hash algorithm is an on-disk format change, and so > we couldn't roll it out until e2fsprogs gets updated and rolled out > pretty broadley. In fact George sent me patches to add siphash as a > hash algorithm for htree a while back (for both the kernel and > e2fsprogs), but I never got around to testing and applying them, > mainly because while it's technically faster, I had other higher > priority issues to work on --- and see previous comments regarding > pixel peeping. Improving the hash algorithm by tens or even hundreds > of nanoseconds isn't really going to matter since we only do a htree > lookup on a file creation or cold cache lookup, and the SSD or HDD I/O > times will dominate. And from the power perspective, saving > microwatts of CPU power isn't going to matter if you're going to be > spinning up the storage device I wasn't concerned about performance but more about DoS resilience. I wonder how safe half md4 actually is in terms of allowing users to generate long hash chains in the filesystem (in terms of length extension attacks against half_md4). In ext4, is it actually possible that a "disrupter" learns about the hashing secret in the way how the inodes are returned during getdents? Thanks, Hannes
Re: [PATCH net-next] ixgbevf: fix 'Etherleak' in ixgbevf
Yes that is much more helpful. So looking at it things are being padded but the last 4 bytes always have this extra data in them. I've been trying to recreate the issue on an 82599 with an SR-IOV VF and I haven't been having much luck reproducing the problem. In your test environment is the 82599 connected directly to the Windows machine or are there any switches/routers/gateways/tunnels/vlans in between? I've tried several iterations but with the 82599 connected directly to another NIC I have I am not able to get it to produce the garbage padding you are seeing. It makes me wonder if there might be something that is manipulating the packets in between the two systems. For example if there was a VLAN being associated with the VF that is later stripped and then the packet handed raw to the test system it might explain what is introducing the extra padding and reason for pulling in the CRC, and your patch would mask the issue since it would push the minimum frame size with a VLAN to 68 instead of 64. - Alex On Wed, Dec 21, 2016 at 6:00 PM, Kefeng Wang wrote: > > > On 2016/12/21 10:20, Alexander Duyck wrote: >> I find it curious that only the last 4 bytes have data in them. I'm >> wondering if the NIC/driver in the Windows/Nessus system is >> interpreting the 4 byte CRC on the end of the frame as padding instead >> of stripping it. >> >> Is there any chance you could capture the entire frame instead of just >> the padding? Maybe you could run something like wireshark without >> enabling promiscuous mode on the VF and capture the frames it is >> trying to send and receive. What I want to verify is what the actual >> amount of padding is that is needed to get to 60 bytes and where the >> CRC should start. >> >> - Alex > > Here is the verbose output, is this useful? > Or we will try according to your advice, thanks, > > D:\Program Files\Tenable\Nessus>nasl.exe -aX -t 192.169.0.151 etherleak.nasl > -- > ---[ ICMP ]--- > 0x00: 45 00 00 1D 20 81 00 00 40 01 D7 F3 C0 A9 00 97E... ...@... > 0x10: C0 A9 00 82 00 00 87 FD 00 01 00 01 78 00 00 00x... > 0x20: 00 00 00 00 00 00 00 00 00 00 98 E4 75 DF u. > -- > ---[ ICMP ]--- > 0x00: 45 00 00 1D 20 85 00 00 40 01 D7 EF C0 A9 00 97E... ...@... > 0x10: C0 A9 00 82 00 00 87 FD 00 01 00 01 78 00 00 00x... > 0x20: 00 00 00 00 00 00 00 00 00 00 FB DA F8 13 .. > ---[ ether1 ]--- > 0x00: 00 00 00 00 00 00 00 00 00 00 00 00 00 98 E4 75...u > 0x10: DF . > ---[ ether2 ]--- > 0x00: 00 00 00 00 00 00 00 00 00 00 00 00 00 FB DA F8 > 0x10: 13 . > > Padding observed in one frame : > > 0x00: 00 00 00 00 00 00 00 00 00 00 00 00 00 98 E4 75...u > 0x10: DF . > > Padding observed in another frame : > > 0x00: 00 00 00 00 00 00 00 00 00 00 00 00 00 FB DA F8 > 0x10: 13 > >
Re: [PATCH net] net, sched: fix soft lockup in tc_classify
On 16-12-22 08:53 AM, David Miller wrote: > From: Daniel Borkmann > Date: Wed, 21 Dec 2016 22:07:48 +0100 > >> Ok, you mean for net. In that case I prefer the smaller sized fix to >> be honest. It also covers everything from the point where we fetch >> the chain via cops->tcf_chain() to the end of the function, which is >> where most of the complexity resides, and only the two mentioned >> commits do the relock, so as a fix I think it's fine as-is. As >> mentioned, if there's need to refactor tc_ctl_tfilter() net-next >> would be better, imho. > > Please, can you two work towards an agreement as to what fix should > go in at this time? > > Thanks. > Thanks for fixing this! I have a slight preference to push this patch into net as its been tested already by Shahar and is not particularly invasive. Then for net-next do a larger cleanup to get rid of some of the complexity per Cong's suggestion. .John
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Thu, 2016-12-22 at 09:25 -0800, Andy Lutomirski wrote: > On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa > wrote: > > On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote: > > > > > > You mean: > > > > > > commit 7bd509e311f408f7a5132fcdde2069af65fa05ae > > > Author: Daniel Borkmann > > > Date: Sun Dec 4 23:19:41 2016 +0100 > > > > > > bpf: add prog_digest and expose it via fdinfo/netlink > > > > > > > > > Yes, please! This actually matters for security -- imagine a > > > malicious program brute-forcing a collision so that it gets loaded > > > wrong. And this is IMO a use case for SHA-256 or SHA-512/256 > > > (preferably the latter). Speed basically doesn't matter here and > > > Blake2 is both less stable (didn't they slightly change it recently?) > > > and much less well studied. > > > > We don't prevent ebpf programs being loaded based on the digest but > > just to uniquely identify loaded programs from user space and match up > > with their source. > > The commit log talks about using the hash to see if the program has > already been compiled and JITted. If that's done, then a collision > will directly cause the kernel to malfunction. Yeah, it still shouldn't crash the kernel but it could cause malfunctions because assumptions are not met from user space thus it could act in a strange way: My personal biggest concern is that users of this API will at some point in time assume this digist is unique (as a key itself for a hashtables f.e.), while it is actually not (and not enforced so by the kernel). If you can get an unpriv ebpf program inserted to the kernel with the same weak hash, a controller daemon could pick it up and bind it to another ebpf hook, probably outside of the unpriv realm the user was in before. Only the sorting matters, which might be unstable and is not guaranteed by anything in most hash table based data structures. The API seems flawed to me. > > > My inclination would have been to seed them with something that isn't > > > exposed to userspace for the precise reason that it would prevent user > > > code from making assumptions about what's in the hash. But if there's > > > a use case for why user code needs to be able to calculate the hash on > > > its own, then that's fine. But perhaps the actual fdinfo string > > > should be "sha256:abcd1234..." to give some flexibility down the road. To add to this, I am very much in favor of that. Right now it doesn't have a name because it is a custom algorithm. ;) > > > > > > Also: > > > > > > + result = (__force __be32 *)fp->digest; > > > + for (i = 0; i < SHA_DIGEST_WORDS; i++) > > > + result[i] = cpu_to_be32(fp->digest[i]); > > > > > > Everyone, please, please, please don't open-code crypto primitives. > > > Is this and the code above it even correct? It might be but on a very > > > brief glance it looks wrong to me. If you're doing this to avoid > > > depending on crypto, then fix crypto so you can pull in the algorithm > > > without pulling in the whole crypto core. > > > > The hashing is not a proper sha1 neither, unfortunately. I think that > > is why it will have a custom implementation in iproute2? > > Putting on crypto hat: > > NAK NAK NAK NAK NAK. "The Linux kernel invented a new primitive in > 2016 when people know better and is going to handle it by porting that > new primitive to userspace" is not a particularly good argument. > > Okay, crypto hack back off. > > > > > I wondered if bpf program loading should have used the module loading > > infrastructure from the beginning... > > That would be way too complicated and would be nasty for the unprivileged > cases. I was more or less just thinking about using the syscalls and user space representation not the generic infrastructure, as it is anyway too much concerned with real kernel modules (would probably also solve your cgroup-ebpf thread, as it can be passed by unique name or name and hash ;) ). Anyway... > > > At the very least, there should be a separate function that calculates > > > the hash of a buffer and that function should explicitly run itself > > > against test vectors of various lengths to make sure that it's > > > calculating what it claims to be calculating. And it doesn't look > > > like the userspace code has landed, so, if this thing isn't > > > calculating SHA1 correctly, it's plausible that no one has noticed. > > > > I hope this was known from the beginning, this is not sha1 unfortunately. > > > > But ebpf elf programs also need preprocessing to get rid of some > > embedded load-depending data, so maybe it was considered to be just > > enough? > > I suspect it was actually an accident. Maybe, I don't know. Bye, Hannes
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Thu, Dec 22, 2016 at 8:59 AM, Hannes Frederic Sowa wrote: > On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote: >> On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa >> wrote: >> > On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote: >> > > Hi Hannes, >> > > >> > > On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa >> > > wrote: >> > > > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI. >> > > > You don't want to give people new IPv6 addresses with the same stable >> > > > secret (across reboots) after a kernel upgrade. Maybe they lose >> > > > connectivity then and it is extra work? >> > > >> > > Ahh, too bad. So it goes. >> > >> > If no other users survive we can put it into the ipv6 module. >> > >> > > > The bpf hash stuff can be changed during this merge window, as it is >> > > > not yet in a released kernel. Albeit I would probably have preferred >> > > > something like sha256 here, which can be easily replicated by user >> > > > space tools (minus the problem of patching out references to not >> > > > hashable data, which must be zeroed). >> > > >> > > Oh, interesting, so time is of the essence then. Do you want to handle >> > > changing the new eBPF code to something not-SHA1 before it's too late, >> > > as part of a ne >> >> w patchset that can fast track itself to David? And >> > > then I can preserve my large series for the next merge window. >> > >> > This algorithm should be a non-seeded algorithm, because the hashes >> > should be stable and verifiable by user space tooling. Thus this would >> > need a hashing algorithm that is hardened against pre-image >> > attacks/collision resistance, which siphash is not. I would prefer some >> > higher order SHA algorithm for that actually. >> > >> >> You mean: >> >> commit 7bd509e311f408f7a5132fcdde2069af65fa05ae >> Author: Daniel Borkmann >> Date: Sun Dec 4 23:19:41 2016 +0100 >> >> bpf: add prog_digest and expose it via fdinfo/netlink >> >> >> Yes, please! This actually matters for security -- imagine a >> malicious program brute-forcing a collision so that it gets loaded >> wrong. And this is IMO a use case for SHA-256 or SHA-512/256 >> (preferably the latter). Speed basically doesn't matter here and >> Blake2 is both less stable (didn't they slightly change it recently?) >> and much less well studied. > > We don't prevent ebpf programs being loaded based on the digest but > just to uniquely identify loaded programs from user space and match up > with their source. The commit log talks about using the hash to see if the program has already been compiled and JITted. If that's done, then a collision will directly cause the kernel to malfunction. >> My inclination would have been to seed them with something that isn't >> exposed to userspace for the precise reason that it would prevent user >> code from making assumptions about what's in the hash. But if there's >> a use case for why user code needs to be able to calculate the hash on >> its own, then that's fine. But perhaps the actual fdinfo string >> should be "sha256:abcd1234..." to give some flexibility down the road. >> >> Also: >> >> + result = (__force __be32 *)fp->digest; >> + for (i = 0; i < SHA_DIGEST_WORDS; i++) >> + result[i] = cpu_to_be32(fp->digest[i]); >> >> Everyone, please, please, please don't open-code crypto primitives. >> Is this and the code above it even correct? It might be but on a very >> brief glance it looks wrong to me. If you're doing this to avoid >> depending on crypto, then fix crypto so you can pull in the algorithm >> without pulling in the whole crypto core. > > The hashing is not a proper sha1 neither, unfortunately. I think that > is why it will have a custom implementation in iproute2? Putting on crypto hat: NAK NAK NAK NAK NAK. "The Linux kernel invented a new primitive in 2016 when people know better and is going to handle it by porting that new primitive to userspace" is not a particularly good argument. Okay, crypto hack back off. > > I wondered if bpf program loading should have used the module loading > infrastructure from the beginning... That would be way too complicated and would be nasty for the unprivileged cases. > >> At the very least, there should be a separate function that calculates >> the hash of a buffer and that function should explicitly run itself >> against test vectors of various lengths to make sure that it's >> calculating what it claims to be calculating. And it doesn't look >> like the userspace code has landed, so, if this thing isn't >> calculating SHA1 correctly, it's plausible that no one has noticed. > > I hope this was known from the beginning, this is not sha1 unfortunately. > > But ebpf elf programs also need preprocessing to get rid of some > embedded load-depending data, so maybe it was considered to be just > enough? I suspect it was actually an accident.
Re: [PATCH v2] stmmac: CSR clock configuration fix
Às 4:57 PM de 12/22/2016, Phil Reid escreveu: > On 22/12/2016 23:47, Joao Pinto wrote: >> >> Hello Phil, >> >> Às 3:42 PM de 12/22/2016, Phil Reid escreveu: >>> G'day Joao, >>> >>> On 22/12/2016 20:38, Joao Pinto wrote: When testing stmmac with my QoS reference design I checked a problem in the CSR clock configuration that was impossibilitating the phy discovery, since every read operation returned 0x. This patch fixes the issue. Signed-off-by: Joao Pinto --- changes v1->v2 (David Miller) - DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch to make sense drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +- drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c | 2 +- drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c| 8 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c index b21d03f..94223c8 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c @@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr, int mcbins, mac->mii.reg_shift = 6; mac->mii.reg_mask = 0x07C0; mac->mii.clk_csr_shift = 2; -mac->mii.clk_csr_mask = 0xF; +mac->mii.clk_csr_mask = GENMASK(4, 2); >>> >>> Should this not be GENMASK(5,2) >> >> According to Universal MAC databook (valid for MAC100 and MAC1000), we have: >> >> Bits: 4:2 >> 000 60-100 MHz clk_csr_i/42 >> 001 100-150 MHz clk_csr_i/62 >> 010 20-35 MHz clk_csr_i/16 >> 011 35-60 MHz clk_csr_i/26 >> 100 150-250 MHz clk_csr_i/102 >> 101 250-300 MHz clk_csr_i/124 >> 110, 111 Reserved >> >> So only bits 2, 3 and 4 should be masked. >> >> Thanks. >> > But this is a change in behaviour from what was there isn't. > Previous mask was 4 bits. now it's 3. > > And for example the altera socfgpa implementation in the cyclone V has valid > values > for values 0x8-0xf, using bit 5:2. According to the databook, bit 5 is reserved and RO. When reserved, it is possible to customize. Is that the case? If there is hardware using the 5th bit we can put it GENMASK(5, 2) with no problem. > > > > >
Re: [PATCH net] net, sched: fix soft lockup in tc_classify
From: Daniel Borkmann Date: Wed, 21 Dec 2016 22:07:48 +0100 > Ok, you mean for net. In that case I prefer the smaller sized fix to > be honest. It also covers everything from the point where we fetch > the chain via cops->tcf_chain() to the end of the function, which is > where most of the complexity resides, and only the two mentioned > commits do the relock, so as a fix I think it's fine as-is. As > mentioned, if there's need to refactor tc_ctl_tfilter() net-next > would be better, imho. Please, can you two work towards an agreement as to what fix should go in at this time? Thanks.
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote: > On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa > wrote: > > On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote: > > > Hi Hannes, > > > > > > On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa > > > wrote: > > > > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI. > > > > You don't want to give people new IPv6 addresses with the same stable > > > > secret (across reboots) after a kernel upgrade. Maybe they lose > > > > connectivity then and it is extra work? > > > > > > Ahh, too bad. So it goes. > > > > If no other users survive we can put it into the ipv6 module. > > > > > > The bpf hash stuff can be changed during this merge window, as it is > > > > not yet in a released kernel. Albeit I would probably have preferred > > > > something like sha256 here, which can be easily replicated by user > > > > space tools (minus the problem of patching out references to not > > > > hashable data, which must be zeroed). > > > > > > Oh, interesting, so time is of the essence then. Do you want to handle > > > changing the new eBPF code to something not-SHA1 before it's too late, > > > as part of a ne > > w patchset that can fast track itself to David? And > > > then I can preserve my large series for the next merge window. > > > > This algorithm should be a non-seeded algorithm, because the hashes > > should be stable and verifiable by user space tooling. Thus this would > > need a hashing algorithm that is hardened against pre-image > > attacks/collision resistance, which siphash is not. I would prefer some > > higher order SHA algorithm for that actually. > > > > You mean: > > commit 7bd509e311f408f7a5132fcdde2069af65fa05ae > Author: Daniel Borkmann > Date: Sun Dec 4 23:19:41 2016 +0100 > > bpf: add prog_digest and expose it via fdinfo/netlink > > > Yes, please! This actually matters for security -- imagine a > malicious program brute-forcing a collision so that it gets loaded > wrong. And this is IMO a use case for SHA-256 or SHA-512/256 > (preferably the latter). Speed basically doesn't matter here and > Blake2 is both less stable (didn't they slightly change it recently?) > and much less well studied. We don't prevent ebpf programs being loaded based on the digest but just to uniquely identify loaded programs from user space and match up with their source. There have been talks about signing bpf programs, thus this would probably need another digest algorithm additionally to that one, wasting probably instructions. Probably going somewhere in direction of PKCS#7 might be the thing to do (which leads to the problem to make PKCS#7 attackable by ordinary unpriv users, hmpf). > My inclination would have been to seed them with something that isn't > exposed to userspace for the precise reason that it would prevent user > code from making assumptions about what's in the hash. But if there's > a use case for why user code needs to be able to calculate the hash on > its own, then that's fine. But perhaps the actual fdinfo string > should be "sha256:abcd1234..." to give some flexibility down the road. > > Also: > > + result = (__force __be32 *)fp->digest; > + for (i = 0; i < SHA_DIGEST_WORDS; i++) > + result[i] = cpu_to_be32(fp->digest[i]); > > Everyone, please, please, please don't open-code crypto primitives. > Is this and the code above it even correct? It might be but on a very > brief glance it looks wrong to me. If you're doing this to avoid > depending on crypto, then fix crypto so you can pull in the algorithm > without pulling in the whole crypto core. The hashing is not a proper sha1 neither, unfortunately. I think that is why it will have a custom implementation in iproute2? I wondered if bpf program loading should have used the module loading infrastructure from the beginning... > At the very least, there should be a separate function that calculates > the hash of a buffer and that function should explicitly run itself > against test vectors of various lengths to make sure that it's > calculating what it claims to be calculating. And it doesn't look > like the userspace code has landed, so, if this thing isn't > calculating SHA1 correctly, it's plausible that no one has noticed. I hope this was known from the beginning, this is not sha1 unfortunately. But ebpf elf programs also need preprocessing to get rid of some embedded load-depending data, so maybe it was considered to be just enough? Bye, Hannes
Re: [PATCH v2] stmmac: CSR clock configuration fix
On 22/12/2016 23:47, Joao Pinto wrote: Hello Phil, Às 3:42 PM de 12/22/2016, Phil Reid escreveu: G'day Joao, On 22/12/2016 20:38, Joao Pinto wrote: When testing stmmac with my QoS reference design I checked a problem in the CSR clock configuration that was impossibilitating the phy discovery, since every read operation returned 0x. This patch fixes the issue. Signed-off-by: Joao Pinto --- changes v1->v2 (David Miller) - DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch to make sense drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +- drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c | 2 +- drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c| 8 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c index b21d03f..94223c8 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c @@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr, int mcbins, mac->mii.reg_shift = 6; mac->mii.reg_mask = 0x07C0; mac->mii.clk_csr_shift = 2; -mac->mii.clk_csr_mask = 0xF; +mac->mii.clk_csr_mask = GENMASK(4, 2); Should this not be GENMASK(5,2) According to Universal MAC databook (valid for MAC100 and MAC1000), we have: Bits: 4:2 000 60-100 MHz clk_csr_i/42 001 100-150 MHz clk_csr_i/62 010 20-35 MHz clk_csr_i/16 011 35-60 MHz clk_csr_i/26 100 150-250 MHz clk_csr_i/102 101 250-300 MHz clk_csr_i/124 110, 111 Reserved So only bits 2, 3 and 4 should be masked. Thanks. But this is a change in behaviour from what was there isn't. Previous mask was 4 bits. now it's 3. And for example the altera socfgpa implementation in the cyclone V has valid values for values 0x8-0xf, using bit 5:2. -- Regards Phil Reid
VERY IMPORTANT
We have an inheritance of a deceased client with your surname. Kindly contact me via email:( gertjan.vlie...@yandex.com ) with your full names for info Best Regards. Dr, Gertjan Vlieghe. -- Correo Corporativo Hospital Universitario del Valle E.S.E *** "Estamos re-dimensionandonos para crecer!" **
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Thu, Dec 22, 2016 at 8:28 AM, Jason A. Donenfeld wrote: > Hi all, > > I don't know what your design requirements are for this. It looks like > you're generating some kind of crypto digest of a program, and you > need to avoid collisions. If you'd like to go with a PRF (keyed hash > function) that uses some kernel secret key, then I'd strongly suggest > using Keyed-Blake2. Alternatively, if you need for userspace to be > able to calculate the same hash, and don't want to use some kernel > secret, then I'd still suggest using Blake2, which will be fast and > secure. > > If you can wait until January, I'll work on a commit adding the > primitive to the tree. I've already written it and I just need to get > things cleaned up. > >> Blake2 is both less stable (didn't they slightly change it recently?) > > No, Blake2 is very stable. It's also extremely secure and has been > extensively studied. Not to mention it's faster than SHA2. And if you > want to use it as a PRF, it's obvious better suited and faster to use > Blake2's keyed PRF mode than HMAC-SHA2. > > If you don't care about performance, and you don't want to use a PRF, > then just use SHA2-256. If you're particularly concerned about certain > types of attacks, you could go with SHA2-512 truncated to 256 bytes, > but somehow I doubt you need this. I don't think this cares about performance. (Well, it cares about performance, but the verifier will likely dominiate the cost by such a large margin that the hash algo doesn't matter.) And staying FIPS-compliant-ish is worth a little bit, so I'd advocate for something in the SHA2 family. > If userspace hasn't landed, can we get away with changing this code > after 4.10? Or should we just fix it before 4.10? Or should we revert > it before 4.10? Development-policy-things like this I have zero clue > about, so I heed to your guidance. I think it should be fixed or reverted before 4.10. --Andy
Re: [PATCH net] net, sched: fix soft lockup in tc_classify
On 12/21/2016 7:04 PM, Daniel Borkmann wrote: Shahar reported a soft lockup in tc_classify(), where we run into an endless loop when walking the classifier chain due to tp->next == tp which is a state we should never run into. The issue only seems to trigger under load in the tc control path. What happens is that in tc_ctl_tfilter(), thread A allocates a new tp, initializes it, sets tp_created to 1, and calls into tp->ops->change() with it. In that classifier callback we had to unlock/lock the rtnl mutex and returned with -EAGAIN. One reason why we need to drop there is, for example, that we need to request an action module to be loaded. This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning after we loaded and found the requested action, we need to redo the whole request so we don't race against others. While we had to unlock rtnl in that time, thread B's request was processed next on that CPU. Thread B added a new tp instance successfully to the classifier chain. When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN and destroying its tp instance which never got linked, we goto replay and redo A's request. This time when walking the classifier chain in tc_ctl_tfilter() for checking for existing tp instances we had a priority match and found the tp instance that was created and linked by thread B. Now calling again into tp->ops->change() with that tp was successful and returned without error. tp_created was never cleared in the second round, thus kernel thinks that we need to link it into the classifier chain (once again). tp and *back point to the same object due to the match we had earlier on. Thus for thread B's already public tp, we reset tp->next to tp itself and link it into the chain, which eventually causes the mentioned endless loop in tc_classify() once a packet hits the data path. Fix is to clear tp_created at the beginning of each request, also when we replay it. On the paths that can cause -EAGAIN we already destroy the original tp instance we had and on replay we really need to start from scratch. It seems that this issue was first introduced in commit 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup"). Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup") Reported-by: Shahar Klein Signed-off-by: Daniel Borkmann Cc: Cong Wang --- Shahar, you mentioned you wanted to run again later without the debug printk's. Once you do that and come to the same result again, please feel free to reply with a Tested-by or such. Thanks everyone! net/sched/cls_api.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index 3fbba79..1ecdf80 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -148,13 +148,15 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n) unsigned long cl; unsigned long fh; int err; - int tp_created = 0; + int tp_created; if ((n->nlmsg_type != RTM_GETTFILTER) && !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) return -EPERM; replay: + tp_created = 0; + err = nlmsg_parse(n, sizeof(*t), tca, TCA_MAX, NULL); if (err < 0) return err; I've tested this specific patch (without cong's patch and without the many prints) many times and in many test permutations today and it didn't lock Thanks again Daniel! Shahar
Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
On Thu, Dec 22, 2016 at 5:30 PM, Theodore Ts'o wrote: > I'd do this first, as one set. Adding a new file to crypto is > unlikely to cause merge conflicts. Ack. > >> 2. convert char/random to use siphash. to: ted ts'o's random-next > > I'm confused, I thought you had agreed to the batched chacha20 > approach? Sorry, I meant to write this. Long day, little sleep. Yes, of course. Batched entropy. >> 3. move lib/md5.c to static function in crypto/md5.c, remove entry >> inside of linux/cryptohash.h. to: ??'s ??-next > > This is cleanup, so it doesn't matter that much when it happens. md5 > changes to crypto is also unlikely to cause conflicts, so we could do > this at the same time as (2), if Herbert (the crypto maintainer) agrees. Alright, sure. > >> 4. move lib/halfmd4.c to static function in fs/ext/hash.c, remove >> entry inside of linux/cryptohash.c. to: td ts'o's ext-next > > This is definitely separate. Okay, I'll submit it to you separately. > One more thing. Can you add some test cases to lib/siphash.h? > Triggered off of a CONFIG_SIPHASH_REGRESSION_TEST config flag, with > some test inputs and known outputs? I'm going to need to add a > version of siphash to e2fsprogs, and I want to make sure the userspace > version is implementing the same algorithm as the kernel siphash. I've already written these. They're behind TEST_HASH. They currently test every single line of code of all implementations of siphash. I spent a long time on these. The test vectors themselves were taken from the SipHash creators' reference publication. Check out lib/test_siphash.c in my tree. Jason On Thu, Dec 22, 2016 at 5:30 PM, Theodore Ts'o wrote: > On Thu, Dec 22, 2016 at 05:16:47PM +0100, Jason A. Donenfeld wrote: >> Could you offer a bit of advice on how to manage dependencies between >> patchsets during merge windows? I'm a bit new to the process. >> >> Specifically, we how have 4 parts: >> 1. add siphash, and use it for some networking code. to: david miller's >> net-next > > I'd do this first, as one set. Adding a new file to crypto is > unlikely to cause merge conflicts. > >> 2. convert char/random to use siphash. to: ted ts'o's random-next > > I'm confused, I thought you had agreed to the batched chacha20 > approach? > >> 3. move lib/md5.c to static function in crypto/md5.c, remove entry >> inside of linux/cryptohash.h. to: ??'s ??-next > > This is cleanup, so it doesn't matter that much when it happens. md5 > changes to crypto is also unlikely to cause conflicts, so we could do > this at the same time as (2), if Herbert (the crypto maintainer) agrees. > >> 4. move lib/halfmd4.c to static function in fs/ext/hash.c, remove >> entry inside of linux/cryptohash.c. to: td ts'o's ext-next > > This is definitely separate. > > One more thing. Can you add some test cases to lib/siphash.h? > Triggered off of a CONFIG_SIPHASH_REGRESSION_TEST config flag, with > some test inputs and known outputs? I'm going to need to add a > version of siphash to e2fsprogs, and I want to make sure the userspace > version is implementing the same algorithm as the kernel siphash. > > - Ted -- Jason A. Donenfeld Deep Space Explorer fr: +33 6 51 90 82 66 us: +1 513 476 1200 www.jasondonenfeld.com www.zx2c4.com zx2c4.com/keys/AB9942E6D4A4CFC3412620A749FC7012A5DE03AE.asc
Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
On Thu, Dec 22, 2016 at 05:16:47PM +0100, Jason A. Donenfeld wrote: > Could you offer a bit of advice on how to manage dependencies between > patchsets during merge windows? I'm a bit new to the process. > > Specifically, we how have 4 parts: > 1. add siphash, and use it for some networking code. to: david miller's > net-next I'd do this first, as one set. Adding a new file to crypto is unlikely to cause merge conflicts. > 2. convert char/random to use siphash. to: ted ts'o's random-next I'm confused, I thought you had agreed to the batched chacha20 approach? > 3. move lib/md5.c to static function in crypto/md5.c, remove entry > inside of linux/cryptohash.h. to: ??'s ??-next This is cleanup, so it doesn't matter that much when it happens. md5 changes to crypto is also unlikely to cause conflicts, so we could do this at the same time as (2), if Herbert (the crypto maintainer) agrees. > 4. move lib/halfmd4.c to static function in fs/ext/hash.c, remove > entry inside of linux/cryptohash.c. to: td ts'o's ext-next This is definitely separate. One more thing. Can you add some test cases to lib/siphash.h? Triggered off of a CONFIG_SIPHASH_REGRESSION_TEST config flag, with some test inputs and known outputs? I'm going to need to add a version of siphash to e2fsprogs, and I want to make sure the userspace version is implementing the same algorithm as the kernel siphash. - Ted
Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
Hi all, I don't know what your design requirements are for this. It looks like you're generating some kind of crypto digest of a program, and you need to avoid collisions. If you'd like to go with a PRF (keyed hash function) that uses some kernel secret key, then I'd strongly suggest using Keyed-Blake2. Alternatively, if you need for userspace to be able to calculate the same hash, and don't want to use some kernel secret, then I'd still suggest using Blake2, which will be fast and secure. If you can wait until January, I'll work on a commit adding the primitive to the tree. I've already written it and I just need to get things cleaned up. > Blake2 is both less stable (didn't they slightly change it recently?) No, Blake2 is very stable. It's also extremely secure and has been extensively studied. Not to mention it's faster than SHA2. And if you want to use it as a PRF, it's obvious better suited and faster to use Blake2's keyed PRF mode than HMAC-SHA2. If you don't care about performance, and you don't want to use a PRF, then just use SHA2-256. If you're particularly concerned about certain types of attacks, you could go with SHA2-512 truncated to 256 bytes, but somehow I doubt you need this. Anyway, the take away is: stop using SHA1. It's time has come. > Everyone, please, please, please don't open-code crypto primitives. > Is this and the code above it even correct? It might be but on a very I shuttered looking at this too. How did this possibly make it past review? The attitude toward crypto here is generally *terrifying*. Unless you're a professional cryptographer, the only correct attitude to have is a careful and conservative one. > At the very least, there should be a separate function that calculates > the hash of a buffer and that function should explicitly run itself > against test vectors of various lengths to make sure that it's > calculating what it claims to be calculating. And it doesn't look > like the userspace code has landed, so, if this thing isn't > calculating SHA1 correctly, it's plausible that no one has noticed. If userspace hasn't landed, can we get away with changing this code after 4.10? Or should we just fix it before 4.10? Or should we revert it before 4.10? Development-policy-things like this I have zero clue about, so I heed to your guidance. Jason
[PATCH iproute2 v3 2/4] ifstat: Add extended statistics to ifstat
Extended stats are part of the RTM_GETSTATS method. This patch adds them to ifstat. While extended stats can come in many forms, we support only the rtnl_link_stats64 struct for them (which is the 64 bits version of struct rtnl_link_stats). We support stats in the main nesting level, or one lower. The extension can be called by its name or any shorten of it. If there is more than one matched, the first one will be picked. To get the extended stats the flag -x is used. Signed-off-by: Nogah Frankel Reviewed-by: Jiri Pirko --- misc/ifstat.c | 161 -- 1 file changed, 146 insertions(+), 15 deletions(-) diff --git a/misc/ifstat.c b/misc/ifstat.c index 5bcbcc8..ce666b3 100644 --- a/misc/ifstat.c +++ b/misc/ifstat.c @@ -34,6 +34,7 @@ #include "libnetlink.h" #include "json_writer.h" #include "SNAPSHOT.h" +#include "utils.h" int dump_zeros; int reset_history; @@ -48,17 +49,21 @@ int pretty; double W; char **patterns; int npatterns; +bool is_extended; +int filter_type; +int sub_type; char info_source[128]; int source_mismatch; #define MAXS (sizeof(struct rtnl_link_stats)/sizeof(__u32)) +#define NO_SUB_TYPE 0x struct ifstat_ent { struct ifstat_ent *next; char*name; int ifindex; - unsigned long long val[MAXS]; + __u64 val[MAXS]; double rate[MAXS]; __u32 ival[MAXS]; }; @@ -106,6 +111,48 @@ static int match(const char *id) return 0; } +static int get_nlmsg_extended(const struct sockaddr_nl *who, + struct nlmsghdr *m, void *arg) +{ + struct if_stats_msg *ifsm = NLMSG_DATA(m); + struct rtattr *tb[IFLA_STATS_MAX+1]; + int len = m->nlmsg_len; + struct ifstat_ent *n; + + if (m->nlmsg_type != RTM_NEWSTATS) + return 0; + + len -= NLMSG_LENGTH(sizeof(*ifsm)); + if (len < 0) + return -1; + + parse_rtattr(tb, IFLA_STATS_MAX, IFLA_STATS_RTA(ifsm), len); + if (tb[filter_type] == NULL) + return 0; + + n = malloc(sizeof(*n)); + if (!n) + abort(); + + n->ifindex = ifsm->ifindex; + n->name = strdup(ll_index_to_name(ifsm->ifindex)); + + if (sub_type == NO_SUB_TYPE) { + memcpy(&n->val, RTA_DATA(tb[filter_type]), sizeof(n->val)); + } else { + struct rtattr *attr; + + attr = parse_rtattr_one_nested(sub_type, tb[filter_type]); + if (attr == NULL) + return 0; + memcpy(&n->val, RTA_DATA(attr), sizeof(n->val)); + } + memset(&n->rate, 0, sizeof(n->rate)); + n->next = kern_db; + kern_db = n; + return 0; +} + static int get_nlmsg(const struct sockaddr_nl *who, struct nlmsghdr *m, void *arg) { @@ -147,18 +194,34 @@ static void load_info(void) { struct ifstat_ent *db, *n; struct rtnl_handle rth; + __u32 filter_mask; if (rtnl_open(&rth, 0) < 0) exit(1); - if (rtnl_wilddump_request(&rth, AF_INET, RTM_GETLINK) < 0) { - perror("Cannot send dump request"); - exit(1); - } + if (is_extended) { + ll_init_map(&rth); + filter_mask = IFLA_STATS_FILTER_BIT(filter_type); + if (rtnl_wilddump_stats_req_filter(&rth, AF_UNSPEC, RTM_GETSTATS, + filter_mask) < 0) { + perror("Cannot send dump request"); + exit(1); + } - if (rtnl_dump_filter(&rth, get_nlmsg, NULL) < 0) { - fprintf(stderr, "Dump terminated\n"); - exit(1); + if (rtnl_dump_filter(&rth, get_nlmsg_extended, NULL) < 0) { + fprintf(stderr, "Dump terminated\n"); + exit(1); + } + } else { + if (rtnl_wilddump_request(&rth, AF_INET, RTM_GETLINK) < 0) { + perror("Cannot send dump request"); + exit(1); + } + + if (rtnl_dump_filter(&rth, get_nlmsg, NULL) < 0) { + fprintf(stderr, "Dump terminated\n"); + exit(1); + } } rtnl_close(&rth); @@ -553,10 +616,17 @@ static void update_db(int interval) } for (i = 0; i < MAXS; i++) { double sample; - unsigned long incr = h1->ival[i] - n->ival[i]; + __u64 incr; + + if (is_extended) { + incr = h1->val[i] - n->val[i]; +
[PATCH iproute2 v3 3/4] ifstat: Add 64 bits based stats to extended statistics
The default stats for ifstat are 32 bits based. The kernel supports 64 bits based stats. (They are returned in struct rtnl_link_stats64 which is an exact copy of struct rtnl_link_stats, in which the "normal" stats are returned, but with fields of u64 instead of u32). This patch adds them as an extended stats. It is read with filter type IFLA_STATS_LINK_64 and no sub type. It is under the name 64bits (or any shorten of it as "64") For example: ifstat -x 64bit Signed-off-by: Nogah Frankel Reviewed-by: Jiri Pirko --- misc/ifstat.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/misc/ifstat.c b/misc/ifstat.c index ce666b3..8325ac7 100644 --- a/misc/ifstat.c +++ b/misc/ifstat.c @@ -729,7 +729,8 @@ static int verify_forging(int fd) static void xstat_usage(void) { fprintf(stderr, -"Usage: ifstat supported xstats:\n"); +"Usage: ifstat supported xstats:\n" +" 64bits default stats, with 64 bits support\n"); } struct extended_stats_options_t { @@ -743,6 +744,7 @@ struct extended_stats_options_t { * Name length must be under 64 chars. */ static const struct extended_stats_options_t extended_stats_options[] = { + {"64bits", IFLA_STATS_LINK_64, NO_SUB_TYPE}, }; static bool get_filter_type(char *name) -- 2.4.3
Re: [PATCH net] ipvlan: fix multicast processing
From: Mahesh Bandewar Date: Wed, 21 Dec 2016 17:30:16 -0800 > From: Mahesh Bandewar > > In an IPvlan setup when master is set in loopback mode e.g. > > ethtool -K eth0 set loopback on > > where eth0 is master device for IPvlan setup. > > The failure is caused by the faulty logic that determines if the > packet is from TX-path vs. RX-path by just looking at the mac- > addresses on the packet while processing multicast packets. > > In the loopback-mode where this crash was happening, the packets > that are sent out are reflected by the NIC and are processed on > the RX path, but mac-address check tricks into thinking this > packet is from TX path and falsely uses dev_forward_skb() to pass > packets to the slave (virtual) devices. > > This patch records the path while queueing packets and eliminates > logic of looking at mac-addresses for the same decision. ... > Fixes: ba35f8588f47 ("ipvlan: Defer multicast / broadcast processing to a > work-queue") > Signed-off-by: Mahesh Bandewar This looks a lot better, applied, thanks.
Re: [PATCH net 1/1] tipc: don't send FIN message from connectionless socket
From: Jon Maloy Date: Thu, 22 Dec 2016 07:22:29 -0500 > In commit 6f00089c7372 ("tipc: remove SS_DISCONNECTING state") the > check for socket type is in the wrong place, causing a closing socket > to always send out a FIN message even when the socket was never > connected. This is normally harmless, since the destination node for > such messages most often is zero, and the message will be dropped, but > it is still a wrong and confusing behavior. > > We fix this in this commit. > > Reviewed-by: Parthasarathy Bhuvaragan > Signed-off-by: Jon Maloy Applied.
[PATCH iproute2 v3 1/4] ifstat: Includes reorder
Reorder the includes order in misc/ifstat.c to match convention. Signed-off-by: Nogah Frankel Reviewed-by: Jiri Pirko --- misc/ifstat.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/misc/ifstat.c b/misc/ifstat.c index 92d67b0..5bcbcc8 100644 --- a/misc/ifstat.c +++ b/misc/ifstat.c @@ -28,12 +28,12 @@ #include #include -#include -#include #include #include -#include +#include "libnetlink.h" +#include "json_writer.h" +#include "SNAPSHOT.h" int dump_zeros; int reset_history; -- 2.4.3
[PATCH iproute2 v3 0/4] update ifstat for new stats
Previously stats were gotten by RTM_GETLINK which returns 32 bits based statistics. It supports only one type of stats. Lately, a new method to get stats was added - RTM_GETSTATS. It supports ability to choose stats type. The basic stats were changed from 32 bits based to 64 bits based. This patchset adds ifstat the ability to get extended stats by this method. Its adds two types of extended stats: 64bits - the same as the "normal" stats but get the stats from the cpu in 64 bits based struct. SW - for packets that hit cpu. --- v2->v3: - patch 1/4: - add a new patch to reorder includes in misc/ifstat.c - patch 2/4: (previously 1/3) - fix typos. - change error print to use fprintf. v1->v2: - change from using RTM_GETSTATS always to using it only for extended stats. - Add 64bits extended stats type. Nogah Frankel (4): ifstat: Includes reorder ifstat: Add extended statistics to ifstat ifstat: Add 64 bits based stats to extended statistics ifstat: Add "sw only" extended statistics to ifstat misc/ifstat.c | 171 +++--- 1 file changed, 153 insertions(+), 18 deletions(-) -- 2.4.3
[PATCH iproute2 v3 4/4] ifstat: Add "sw only" extended statistics to ifstat
Add support for extended statistics of SW only type, for counting only the packets that went via the cpu. (useful for systems with forward offloading). It reads it from filter type IFLA_STATS_LINK_OFFLOAD_XSTATS and sub type IFLA_OFFLOAD_XSTATS_CPU_HIT. It is under the name 'software' (or any shorten of it as 'soft' or simply 's') For example: ifstat -x s Signed-off-by: Nogah Frankel Reviewed-by: Jiri Pirko --- misc/ifstat.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/misc/ifstat.c b/misc/ifstat.c index 8325ac7..79c0dba 100644 --- a/misc/ifstat.c +++ b/misc/ifstat.c @@ -730,7 +730,8 @@ static void xstat_usage(void) { fprintf(stderr, "Usage: ifstat supported xstats:\n" -" 64bits default stats, with 64 bits support\n"); +" 64bits default stats, with 64 bits support\n" +" softwareSW stats. Counts only packets that went via the CPU\n"); } struct extended_stats_options_t { @@ -745,6 +746,7 @@ struct extended_stats_options_t { */ static const struct extended_stats_options_t extended_stats_options[] = { {"64bits", IFLA_STATS_LINK_64, NO_SUB_TYPE}, + {"software", IFLA_STATS_LINK_OFFLOAD_XSTATS, IFLA_OFFLOAD_XSTATS_CPU_HIT}, }; static bool get_filter_type(char *name) -- 2.4.3
Re: [PATCH net] net/mlx4_en: Fix user prio field in XDP forward
From: Tariq Toukan Date: Thu, 22 Dec 2016 14:32:58 +0200 > The user prio field is wrong (and overflows) in the XDP forward > flow. > This is a result of a bad value for num_tx_rings_p_up, which should > account all XDP TX rings, as they operate for the same user prio. > > Signed-off-by: Tariq Toukan > Reported-by: Martin KaFai Lau Applied.
Re: [PATCH v2] stmmac: CSR clock configuration fix
From: Joao Pinto Date: Thu, 22 Dec 2016 12:38:00 + > When testing stmmac with my QoS reference design I checked a problem in the > CSR clock configuration that was impossibilitating the phy discovery, since > every read operation returned 0x. This patch fixes the issue. > > Signed-off-by: Joao Pinto Applied.
Re: [PATCH v2 net] ipvlan: fix various issues in ipvlan_process_multicast()
From: Eric Dumazet Date: Wed, 21 Dec 2016 18:00:24 -0800 > From: Eric Dumazet > > 1) netif_rx() / dev_forward_skb() should not be called from process > context. > > 2) ipvlan_count_rx() should be called with preemption disabled. > > 3) We should check if ipvlan->dev is up before feeding packets > to netif_rx() > > 4) We need to prevent device from disappearing if some packets > are in the multicast backlog. > > 5) One kfree_skb() should be a consume_skb() eventually > > Fixes: ba35f8588f47 ("ipvlan: Defer multicast / broadcast processing to > a work-queue") > Signed-off-by: Eric Dumazet > Cc: Mahesh Bandewar Applied.
ipv6: handle -EFAULT from skb_copy_bits
By setting certain socket options on ipv6 raw sockets, we can confuse the length calculation in rawv6_push_pending_frames triggering a BUG_ON. RIP: 0010:[] [] rawv6_sendmsg+0xc30/0xc40 RSP: 0018:881f6c4a7c18 EFLAGS: 00010282 RAX: fff2 RBX: 881f6c681680 RCX: 0002 RDX: 881f6c4a7cf8 RSI: 0030 RDI: 881fed0f6a00 RBP: 881f6c4a7da8 R08: R09: 0009 R10: 881fed0f6a00 R11: 0009 R12: 0030 R13: 881fed0f6a00 R14: 881fee39ba00 R15: 881fefa93a80 Call Trace: [] ? unmap_page_range+0x693/0x830 [] inet_sendmsg+0x67/0xa0 [] sock_sendmsg+0x38/0x50 [] SYSC_sendto+0xef/0x170 [] SyS_sendto+0xe/0x10 [] do_syscall_64+0x50/0xa0 [] entry_SYSCALL64_slow_path+0x25/0x25 Handle by jumping to the failure path if skb_copy_bits gets an EFAULT. Reproducer: #include #include #include #include #include #include #include #define LEN 504 int main(int argc, char* argv[]) { int fd; int zero = 0; char buf[LEN]; memset(buf, 0, LEN); fd = socket(AF_INET6, SOCK_RAW, 7); setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4); setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN); sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110); } Signed-off-by: Dave Jones diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 291ebc260e70..ea89073c8247 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -591,7 +591,11 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6, } offset += skb_transport_offset(skb); - BUG_ON(skb_copy_bits(skb, offset, &csum, 2)); + err = skb_copy_bits(skb, offset, &csum, 2); + if (err < 0) { + ip6_flush_pending_frames(sk); + goto out; + } /* in case cksum was not initialized */ if (unlikely(csum))
Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
Hi Ted, On Thu, Dec 22, 2016 at 4:58 PM, Theodore Ts'o wrote: > Can we do this as a separate series, please? At this point, it's a > completely separate change from a logical perspective, and we can take > in the change through the random.git tree. > > Changes that touch files that are normally changed in several > different git trees leads to the potential for merge conflicts during > the linux-next integration and merge window processes. Which is why > it's generally best to try to isolate changes as much as possible. Sure, I can separate things out. Could you offer a bit of advice on how to manage dependencies between patchsets during merge windows? I'm a bit new to the process. Specifically, we how have 4 parts: 1. add siphash, and use it for some networking code. to: david miller's net-next 2. convert char/random to use siphash. to: ted ts'o's random-next 3. move lib/md5.c to static function in crypto/md5.c, remove entry inside of linux/cryptohash.h. to: ??'s ??-next 4. move lib/halfmd4.c to static function in fs/ext/hash.c, remove entry inside of linux/cryptohash.c. to: td ts'o's ext-next Problem: 2 depends on 1, 3 depends on 1 & 2. But this can be simplified into 3 parts: 1. add siphash, and use it for some networking code. to: david miller's net-next 2. convert char/random to use siphash, move lib/md5.c to static function in crypto/md5.c, remove entry inside of linux/cryptohash.h. to: ted ts'o's random-next 3. move lib/halfmd4.c to static function in fs/ext/hash.c, remove entry inside of linux/cryptohash.c. to: td ts'o's ext-next Problem: 2 depends on 1. Is that okay with you? Also, would you like me to merge (3) and (2) of the second list into one series for you? Jason
Re: pull-request: wireless-drivers 2016-12-22
From: Kalle Valo Date: Thu, 22 Dec 2016 17:11:27 +0200 > before the holidays a really small pull request for 4.10. I just want to > have the regressions in rtlwifi and ath9k fixed quickly so I send this > earlier than I normally would. > > Please let me know if there are any problems. Pulled, thanks.
Re: [PATCH net] net: ipv4: Don't crash if passing a null sk to ip_do_redirect.
From: Lorenzo Colitti Date: Fri, 23 Dec 2016 00:33:57 +0900 > Commit e2d118a1cb5e ("net: inet: Support UID-based routing in IP > protocols.") made ip_do_redirect call sock_net(sk) to determine > the network namespace of the passed-in socket. This crashes if sk > is NULL. > > Fix this by getting the network namespace from the skb instead. > > Fixes: e2d118a1cb5e ("net: inet: Support UID-based routing in IP protocols.") > Signed-off-by: Lorenzo Colitti Applied, thanks Lorenzo.
Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
On Wed, Dec 21, 2016 at 6:07 PM, Andy Lutomirski wrote: > On Wed, Dec 21, 2016 at 5:13 PM, George Spelvin > wrote: >> As a separate message, to disentangle the threads, I'd like to >> talk about get_random_long(). >> >> After some thinking, I still like the "state-preserving" construct >> that's equivalent to the current MD5 code. Yes, we could just do >> siphash(current_cpu || per_cpu_counter, global_key), but it's nice to >> preserve a bit more. >> >> It requires library support from the SipHash code to return the full >> SipHash state, but I hope that's a fair thing to ask for. > > I don't even think it needs that. This is just adding a > non-destructive final operation, right? > >> >> Here's my current straw man design for comment. It's very similar to >> the current MD5-based design, but feeds all the seed material in the >> "correct" way, as opposed to Xring directly into the MD5 state. >> >> * Each CPU has a (Half)SipHash state vector, >> "unsigned long get_random_int_hash[4]". Unlike the current >> MD5 code, we take care to initialize it to an asymmetric state. >> >> * There's a global 256-bit random_int_secret (which we could >> reseed periodically). >> >> To generate a random number: >> * If get_random_int_hash is all-zero, seed it with fresh a half-sized >> SipHash key and the appropriate XOR constants. >> * Generate three words of random_get_entropy(), jiffies, and current->pid. >> (This is arbitary seed material, copied from the current code.) >> * Crank through that with (Half)SipHash-1-0. >> * Crank through the random_int_secret with (Half)SipHash-1-0. >> * Return v1 ^ v3. > > Just to clarify, if we replace SipHash with a black box, I think this > effectively means, where "entropy" is random_get_entropy() || jiffies > || current->pid: > > The first call returns H(random seed || entropy_0 || secret). The > second call returns H(random seed || entropy_0 || secret || entropy_1 > || secret). Etc. Having slept on this, I like it less. The problem is that a backtracking attacker doesn't just learn H(random seed || entropy_0 || secret || ...) -- they learn the internal state of the hash function that generates that value. This probably breaks any attempt to apply security properties of the hash function. For example, the internal state could easily contain a whole bunch of prior outputs it in verbatim. --Andy
BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
On Thu, Dec 22, 2016 at 7:51 AM, Hannes Frederic Sowa wrote: > On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote: >> Hi Hannes, >> >> On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa >> wrote: >> > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI. >> > You don't want to give people new IPv6 addresses with the same stable >> > secret (across reboots) after a kernel upgrade. Maybe they lose >> > connectivity then and it is extra work? >> >> Ahh, too bad. So it goes. > > If no other users survive we can put it into the ipv6 module. > >> > The bpf hash stuff can be changed during this merge window, as it is >> > not yet in a released kernel. Albeit I would probably have preferred >> > something like sha256 here, which can be easily replicated by user >> > space tools (minus the problem of patching out references to not >> > hashable data, which must be zeroed). >> >> Oh, interesting, so time is of the essence then. Do you want to handle >> changing the new eBPF code to something not-SHA1 before it's too late, >> as part of a ne w patchset that can fast track itself to David? And >> then I can preserve my large series for the next merge window. > > This algorithm should be a non-seeded algorithm, because the hashes > should be stable and verifiable by user space tooling. Thus this would > need a hashing algorithm that is hardened against pre-image > attacks/collision resistance, which siphash is not. I would prefer some > higher order SHA algorithm for that actually. > You mean: commit 7bd509e311f408f7a5132fcdde2069af65fa05ae Author: Daniel Borkmann Date: Sun Dec 4 23:19:41 2016 +0100 bpf: add prog_digest and expose it via fdinfo/netlink Yes, please! This actually matters for security -- imagine a malicious program brute-forcing a collision so that it gets loaded wrong. And this is IMO a use case for SHA-256 or SHA-512/256 (preferably the latter). Speed basically doesn't matter here and Blake2 is both less stable (didn't they slightly change it recently?) and much less well studied. My inclination would have been to seed them with something that isn't exposed to userspace for the precise reason that it would prevent user code from making assumptions about what's in the hash. But if there's a use case for why user code needs to be able to calculate the hash on its own, then that's fine. But perhaps the actual fdinfo string should be "sha256:abcd1234..." to give some flexibility down the road. Also: + result = (__force __be32 *)fp->digest; + for (i = 0; i < SHA_DIGEST_WORDS; i++) + result[i] = cpu_to_be32(fp->digest[i]); Everyone, please, please, please don't open-code crypto primitives. Is this and the code above it even correct? It might be but on a very brief glance it looks wrong to me. If you're doing this to avoid depending on crypto, then fix crypto so you can pull in the algorithm without pulling in the whole crypto core. At the very least, there should be a separate function that calculates the hash of a buffer and that function should explicitly run itself against test vectors of various lengths to make sure that it's calculating what it claims to be calculating. And it doesn't look like the userspace code has landed, so, if this thing isn't calculating SHA1 correctly, it's plausible that no one has noticed. --Andy
Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
On Thu, Dec 22, 2016 at 07:03:29AM +0100, Jason A. Donenfeld wrote: > I find this compelling. We'll have one csprng for both > get_random_int/long and for /dev/urandom, and we'll be able to update > that silly warning on the comment of get_random_int/long to read > "gives output of either rdrand quality or of /dev/urandom quality", > which makes it more useful for other things. It introduces less error > prone code, and it lets the RNG analysis be spent on just one RNG, not > two. > > So, with your blessing, I'm going to move ahead with implementing a > pretty version of this for v8. Can we do this as a separate series, please? At this point, it's a completely separate change from a logical perspective, and we can take in the change through the random.git tree. Changes that touch files that are normally changed in several different git trees leads to the potential for merge conflicts during the linux-next integration and merge window processes. Which is why it's generally best to try to isolate changes as much as possible. Cheers, - Ted
Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
On Thu, Dec 22, 2016 at 02:10:33PM +0100, Jason A. Donenfeld wrote: > On Thu, Dec 22, 2016 at 1:47 PM, Hannes Frederic Sowa > wrote: > > following up on what appears to be a random subject: ;) > > > > IIRC, ext4 code by default still uses half_md4 for hashing of filenames > > in the htree. siphash seems to fit this use case pretty good. > > I saw this too. I'll try to address it in v8 of this series. This is a separate issue, and this series is getting a bit too complex. So I'd suggest pushing this off to a separate change. Changing the htree hash algorithm is an on-disk format change, and so we couldn't roll it out until e2fsprogs gets updated and rolled out pretty broadley. In fact George sent me patches to add siphash as a hash algorithm for htree a while back (for both the kernel and e2fsprogs), but I never got around to testing and applying them, mainly because while it's technically faster, I had other higher priority issues to work on --- and see previous comments regarding pixel peeping. Improving the hash algorithm by tens or even hundreds of nanoseconds isn't really going to matter since we only do a htree lookup on a file creation or cold cache lookup, and the SSD or HDD I/O times will dominate. And from the power perspective, saving microwatts of CPU power isn't going to matter if you're going to be spinning up the storage device - Ted
Re: [PATCH] ethtool: add one ethtool option to set relax ordering mode
On Wed, Dec 21, 2016 at 5:39 PM, maowenan wrote: > > >> -Original Message- >> From: Stephen Hemminger [mailto:step...@networkplumber.org] >> Sent: Thursday, December 22, 2016 9:28 AM >> To: maowenan >> Cc: netdev@vger.kernel.org; jeffrey.t.kirs...@intel.com >> Subject: Re: [PATCH] ethtool: add one ethtool option to set relax ordering >> mode >> >> On Thu, 8 Dec 2016 14:51:38 +0800 >> Mao Wenan wrote: >> >> > This patch provides one way to set/unset IXGBE NIC TX and RX relax >> > ordering mode, which can be set by ethtool. >> > Relax ordering is one mode of 82599 NIC, to enable this mode can >> > enhance the performance for some cpu architecure. >> >> Then it should be done by CPU architecture specific quirks (preferably in PCI >> layer) so that all users get the option without having to do manual >> intervention. >> >> > example: >> > ethtool -s enp1s0f0 relaxorder off >> > ethtool -s enp1s0f0 relaxorder on >> >> Doing it via ethtool is a developer API (for testing) not something that >> makes >> sense in production. > > > This feature is not mandatory for all users, acturally relax ordering default > configuration of 82599 is 'disable', > So this patch gives one way to enable relax ordering to be selected in some > performance condition. That isn't quite true. The default for Sparc systems is to have it enabled. Really this is something that is platform specific. I agree with Stephen that it would work better if this was handled as a series of platform specific quirks handled at something like the PCI layer rather than be a switch the user can toggle on and off. With that being said there are changes being made that should help to improve the situation. Specifically I am looking at adding support for the DMA_ATTR_WEAK_ORDERING which may also allow us to identify cases where you might be able to specify the DMA behavior via the DMA mapping instead of having to make the final decision in the device itself. - Alex
Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
On Thu, Dec 22, 2016 at 4:51 PM, Hannes Frederic Sowa wrote: > This algorithm should be a non-seeded algorithm, because the hashes > should be stable and verifiable by user space tooling. Thus this would > need a hashing algorithm that is hardened against pre-image > attacks/collision resistance, which siphash is not. I would prefer some > higher order SHA algorithm for that actually. Right. SHA-256, SHA-512/256, Blake2s, or Blake2b would probably be good candidates for this.
Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
On Thu, 2016-12-22 at 16:41 +0100, Jason A. Donenfeld wrote: > Hi Hannes, > > On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa > wrote: > > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI. > > You don't want to give people new IPv6 addresses with the same stable > > secret (across reboots) after a kernel upgrade. Maybe they lose > > connectivity then and it is extra work? > > Ahh, too bad. So it goes. If no other users survive we can put it into the ipv6 module. > > The bpf hash stuff can be changed during this merge window, as it is > > not yet in a released kernel. Albeit I would probably have preferred > > something like sha256 here, which can be easily replicated by user > > space tools (minus the problem of patching out references to not > > hashable data, which must be zeroed). > > Oh, interesting, so time is of the essence then. Do you want to handle > changing the new eBPF code to something not-SHA1 before it's too late, > as part of a new patchset that can fast track itself to David? And > then I can preserve my large series for the next merge window. This algorithm should be a non-seeded algorithm, because the hashes should be stable and verifiable by user space tooling. Thus this would need a hashing algorithm that is hardened against pre-image attacks/collision resistance, which siphash is not. I would prefer some higher order SHA algorithm for that actually. Bye, Hannes
Re: [PATCH v2] stmmac: CSR clock configuration fix
G'day Joao, On 22/12/2016 20:38, Joao Pinto wrote: When testing stmmac with my QoS reference design I checked a problem in the CSR clock configuration that was impossibilitating the phy discovery, since every read operation returned 0x. This patch fixes the issue. Signed-off-by: Joao Pinto --- changes v1->v2 (David Miller) - DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch to make sense drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +- drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c | 2 +- drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c| 8 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c index b21d03f..94223c8 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c @@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr, int mcbins, mac->mii.reg_shift = 6; mac->mii.reg_mask = 0x07C0; mac->mii.clk_csr_shift = 2; - mac->mii.clk_csr_mask = 0xF; + mac->mii.clk_csr_mask = GENMASK(4, 2); Should this not be GENMASK(5,2) /* Get and dump the chip ID */ *synopsys_id = stmmac_get_synopsys_id(hwid); diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c index a1d582f..8a40e69 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c @@ -197,7 +197,7 @@ struct mac_device_info *dwmac100_setup(void __iomem *ioaddr, int *synopsys_id) mac->mii.reg_shift = 6; mac->mii.reg_mask = 0x07C0; mac->mii.clk_csr_shift = 2; - mac->mii.clk_csr_mask = 0xF; + mac->mii.clk_csr_mask = GENMASK(4, 2); same as above? /* Synopsys Id is not available on old chips */ *synopsys_id = 0; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c index 23322fd..fda01f7 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c @@ -81,8 +81,8 @@ static int stmmac_mdio_read(struct mii_bus *bus, int phyaddr, int phyreg) value |= (phyaddr << priv->hw->mii.addr_shift) & priv->hw->mii.addr_mask; value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask; - value |= (priv->clk_csr & priv->hw->mii.clk_csr_mask) - << priv->hw->mii.clk_csr_shift; + value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift) + & priv->hw->mii.clk_csr_mask; if (priv->plat->has_gmac4) value |= MII_GMAC4_READ; @@ -122,8 +122,8 @@ static int stmmac_mdio_write(struct mii_bus *bus, int phyaddr, int phyreg, & priv->hw->mii.addr_mask; value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask; - value |= ((priv->clk_csr & priv->hw->mii.clk_csr_mask) - << priv->hw->mii.clk_csr_shift); + value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift) + & priv->hw->mii.clk_csr_mask; if (priv->plat->has_gmac4) value |= MII_GMAC4_WRITE; -- Regards Phil Reid
Re: [PATCH v2] stmmac: CSR clock configuration fix
Hello Phil, Às 3:42 PM de 12/22/2016, Phil Reid escreveu: > G'day Joao, > > On 22/12/2016 20:38, Joao Pinto wrote: >> When testing stmmac with my QoS reference design I checked a problem in the >> CSR clock configuration that was impossibilitating the phy discovery, since >> every read operation returned 0x. This patch fixes the issue. >> >> Signed-off-by: Joao Pinto >> --- >> changes v1->v2 (David Miller) >> - DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch >> to make sense >> >> drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +- >> drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c | 2 +- >> drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c| 8 >> 3 files changed, 6 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c >> b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c >> index b21d03f..94223c8 100644 >> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c >> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c >> @@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem >> *ioaddr, int mcbins, >> mac->mii.reg_shift = 6; >> mac->mii.reg_mask = 0x07C0; >> mac->mii.clk_csr_shift = 2; >> -mac->mii.clk_csr_mask = 0xF; >> +mac->mii.clk_csr_mask = GENMASK(4, 2); > > Should this not be GENMASK(5,2) According to Universal MAC databook (valid for MAC100 and MAC1000), we have: Bits: 4:2 000 60-100 MHz clk_csr_i/42 001 100-150 MHz clk_csr_i/62 010 20-35 MHz clk_csr_i/16 011 35-60 MHz clk_csr_i/26 100 150-250 MHz clk_csr_i/102 101 250-300 MHz clk_csr_i/124 110, 111 Reserved So only bits 2, 3 and 4 should be masked. Thanks. > >> >> /* Get and dump the chip ID */ >> *synopsys_id = stmmac_get_synopsys_id(hwid); >> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c >> b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c >> index a1d582f..8a40e69 100644 >> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c >> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c >> @@ -197,7 +197,7 @@ struct mac_device_info *dwmac100_setup(void __iomem >> *ioaddr, int *synopsys_id) >> mac->mii.reg_shift = 6; >> mac->mii.reg_mask = 0x07C0; >> mac->mii.clk_csr_shift = 2; >> -mac->mii.clk_csr_mask = 0xF; >> +mac->mii.clk_csr_mask = GENMASK(4, 2); > same as above? > >> >> /* Synopsys Id is not available on old chips */ >> *synopsys_id = 0; >> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c >> b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c >> index 23322fd..fda01f7 100644 >> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c >> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c >> @@ -81,8 +81,8 @@ static int stmmac_mdio_read(struct mii_bus *bus, int >> phyaddr, int phyreg) >> value |= (phyaddr << priv->hw->mii.addr_shift) >> & priv->hw->mii.addr_mask; >> value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask; >> -value |= (priv->clk_csr & priv->hw->mii.clk_csr_mask) >> -<< priv->hw->mii.clk_csr_shift; >> +value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift) >> +& priv->hw->mii.clk_csr_mask; >> if (priv->plat->has_gmac4) >> value |= MII_GMAC4_READ; >> >> @@ -122,8 +122,8 @@ static int stmmac_mdio_write(struct mii_bus *bus, int >> phyaddr, int phyreg, >> & priv->hw->mii.addr_mask; >> value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask; >> >> -value |= ((priv->clk_csr & priv->hw->mii.clk_csr_mask) >> -<< priv->hw->mii.clk_csr_shift); >> +value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift) >> +& priv->hw->mii.clk_csr_mask; >> if (priv->plat->has_gmac4) >> value |= MII_GMAC4_WRITE; >> >> > >
Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
Hi Hannes, On Thu, Dec 22, 2016 at 4:33 PM, Hannes Frederic Sowa wrote: > IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI. > You don't want to give people new IPv6 addresses with the same stable > secret (across reboots) after a kernel upgrade. Maybe they lose > connectivity then and it is extra work? Ahh, too bad. So it goes. > The bpf hash stuff can be changed during this merge window, as it is > not yet in a released kernel. Albeit I would probably have preferred > something like sha256 here, which can be easily replicated by user > space tools (minus the problem of patching out references to not > hashable data, which must be zeroed). Oh, interesting, so time is of the essence then. Do you want to handle changing the new eBPF code to something not-SHA1 before it's too late, as part of a new patchset that can fast track itself to David? And then I can preserve my large series for the next merge window. Jason
Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
On Thu, 2016-12-22 at 16:29 +0100, Jason A. Donenfeld wrote: > On Thu, Dec 22, 2016 at 4:12 PM, Jason A. Donenfeld wrote: > > As a first step, I'm considering adding a patch to move halfmd4.c > > inside the ext4 domain, or at the very least, simply remove it from > > linux/cryptohash.h. That'll then leave the handful of bizarre sha1 > > usages to consider. > > Specifically something like this: > > https://git.zx2c4.com/linux-dev/commit/?h=siphash&id=978213351f9633bd1e3d1fdc3f19d28e36eeac90 > > That only leaves two more uses of "cryptohash" to consider, but they > require a bit of help. First, sha_transform in net/ipv6/addrconf.c. > That might be a straight-forward conversion to SipHash, but perhaps > not; I need to look closely and think about it. The next is > sha_transform in kernel/bpf/core.c. I really have no idea what's going > on with the eBPF stuff, so that will take a bit longer to study. Maybe > sha1 is fine in the end there? I'm not sure yet. IPv6 you cannot touch anymore. The hashing algorithm is part of uAPI. You don't want to give people new IPv6 addresses with the same stable secret (across reboots) after a kernel upgrade. Maybe they lose connectivity then and it is extra work? The bpf hash stuff can be changed during this merge window, as it is not yet in a released kernel. Albeit I would probably have preferred something like sha256 here, which can be easily replicated by user space tools (minus the problem of patching out references to not hashable data, which must be zeroed). Bye, Hannes
[PATCH net] net: ipv4: Don't crash if passing a null sk to ip_do_redirect.
Commit e2d118a1cb5e ("net: inet: Support UID-based routing in IP protocols.") made ip_do_redirect call sock_net(sk) to determine the network namespace of the passed-in socket. This crashes if sk is NULL. Fix this by getting the network namespace from the skb instead. Fixes: e2d118a1cb5e ("net: inet: Support UID-based routing in IP protocols.") Signed-off-by: Lorenzo Colitti --- net/ipv4/route.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index fa5c037227..9eabf49013 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -798,6 +798,7 @@ static void ip_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_buf struct rtable *rt; struct flowi4 fl4; const struct iphdr *iph = (const struct iphdr *) skb->data; + struct net *net = dev_net(skb->dev); int oif = skb->dev->ifindex; u8 tos = RT_TOS(iph->tos); u8 prot = iph->protocol; @@ -805,7 +806,7 @@ static void ip_do_redirect(struct dst_entry *dst, struct sock *sk, struct sk_buf rt = (struct rtable *) dst; - __build_flow_key(sock_net(sk), &fl4, sk, iph, oif, tos, prot, mark, 0); + __build_flow_key(net, &fl4, sk, iph, oif, tos, prot, mark, 0); __ip_do_redirect(rt, skb, &fl4, true); } -- 2.8.0.rc3.226.g39d4020
Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
On Thu, Dec 22, 2016 at 4:12 PM, Jason A. Donenfeld wrote: > As a first step, I'm considering adding a patch to move halfmd4.c > inside the ext4 domain, or at the very least, simply remove it from > linux/cryptohash.h. That'll then leave the handful of bizarre sha1 > usages to consider. Specifically something like this: https://git.zx2c4.com/linux-dev/commit/?h=siphash&id=978213351f9633bd1e3d1fdc3f19d28e36eeac90 That only leaves two more uses of "cryptohash" to consider, but they require a bit of help. First, sha_transform in net/ipv6/addrconf.c. That might be a straight-forward conversion to SipHash, but perhaps not; I need to look closely and think about it. The next is sha_transform in kernel/bpf/core.c. I really have no idea what's going on with the eBPF stuff, so that will take a bit longer to study. Maybe sha1 is fine in the end there? I'm not sure yet.
Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
On Thu, Dec 22, 2016 at 4:05 PM, Hannes Frederic Sowa wrote: > This change would need a new version of the ext4 super block, because > you should not change existing file systems. Right. As a first step, I'm considering adding a patch to move halfmd4.c inside the ext4 domain, or at the very least, simply remove it from linux/cryptohash.h. That'll then leave the handful of bizarre sha1 usages to consider.
pull-request: wireless-drivers 2016-12-22
Hi Dave, before the holidays a really small pull request for 4.10. I just want to have the regressions in rtlwifi and ath9k fixed quickly so I send this earlier than I normally would. Please let me know if there are any problems. Kalle The following changes since commit ad688cdbb076833ba17fc65591cd0fe01900a5cf: stmmac: fix memory barriers (2016-12-19 11:05:02 -0500) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git tags/wireless-drivers-for-davem-2016-12-22 for you to fetch changes up to 22b68b93ae2506bd56ee3bf232a51bc8ab955b56: rtlwifi: Fix kernel oops introduced with commit e49656147359 (2016-12-21 16:34:16 +0200) wireless-drivers fixes for 4.10 All small fixes this time, especially important are the regression fixes for rtlwifi and ath9k. Arend Van Spriel (2): brcmfmac: fix memory leak in brcmf_cfg80211_attach() brcmfmac: fix uninitialized field in scheduled scan ssid configuration Ben Greear (1): ath10k: free host-mem with DMA_BIRECTIONAL flag Larry Finger (1): rtlwifi: Fix kernel oops introduced with commit e49656147359 Tobias Klausmann (1): ath9k: do not return early to fix rcu unlocking drivers/net/wireless/ath/ath10k/wmi.c |2 +- drivers/net/wireless/ath/ath9k/xmit.c |2 +- .../broadcom/brcm80211/brcmfmac/cfg80211.c |7 +-- .../net/wireless/broadcom/brcm80211/brcmfmac/pno.c |1 + drivers/net/wireless/realtek/rtlwifi/core.c|3 ++- 5 files changed, 10 insertions(+), 5 deletions(-)
Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
On 22.12.2016 14:10, Jason A. Donenfeld wrote: > On Thu, Dec 22, 2016 at 1:47 PM, Hannes Frederic Sowa > wrote: >> following up on what appears to be a random subject: ;) >> >> IIRC, ext4 code by default still uses half_md4 for hashing of filenames >> in the htree. siphash seems to fit this use case pretty good. > > I saw this too. I'll try to address it in v8 of this series. This change would need a new version of the ext4 super block, because you should not change existing file systems.
Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
On Thu, Dec 22, 2016 at 1:47 PM, Hannes Frederic Sowa wrote: > following up on what appears to be a random subject: ;) > > IIRC, ext4 code by default still uses half_md4 for hashing of filenames > in the htree. siphash seems to fit this use case pretty good. I saw this too. I'll try to address it in v8 of this series.
RE: [PATCH net 1/1] tipc: revert use of copy_from_iter_full()
> -Original Message- > From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] > On Behalf Of Al Viro > Sent: Wednesday, 21 December, 2016 22:08 > To: Jon Maloy > Cc: da...@davemloft.net; netdev@vger.kernel.org; Parthasarathy Bhuvaragan > ; Ying Xue > ; ma...@donjonn.com; tipc- > discuss...@lists.sourceforge.net > Subject: Re: [PATCH net 1/1] tipc: revert use of copy_from_iter_full() > > On Thu, Dec 22, 2016 at 02:47:09AM +, Jon Maloy wrote: > > > > OK, I see what's going on there - unlike iterate_and_advance(), which > > > explicitly skips any work in case of empty iterator, iterate_all_kind() > > > does not. Could you check if the following fixes your problem? > > > > Confirmed. The crash disappeared and everything works fine. > > OK... This is the somewhat cleaned version of the same that will go to Linus, > assuming it survives the local beating. Hopefully, cleanups hadn't broken it > in your case... I tested it, and it still works with me. ///jon > > [iov_iter] fix iterate_all_kinds() on empty iterators > > Problem similar to ones dealt with in "fold checks into iterate_and_advance()" > and followups, except that in this case we really want to do nothing when > asked for zero-length operation - unlike zero-length iterate_and_advance(), > zero-length iterate_all_kinds() has no side effects, and callers are simpler > that way. > > That got exposed when copy_from_iter_full() had been used by tipc, which > builds an msghdr with zero payload and (now) feeds it to a primitive > based on iterate_all_kinds() instead of iterate_and_advance(). > > Reported-by: Jon Maloy > Signed-off-by: Al Viro > > diff --git a/lib/iov_iter.c b/lib/iov_iter.c > index 228892dabba6..25f572303801 100644 > --- a/lib/iov_iter.c > +++ b/lib/iov_iter.c > @@ -73,19 +73,21 @@ > } > > #define iterate_all_kinds(i, n, v, I, B, K) {\ > - size_t skip = i->iov_offset;\ > - if (unlikely(i->type & ITER_BVEC)) {\ > - struct bio_vec v; \ > - struct bvec_iter __bi; \ > - iterate_bvec(i, n, v, __bi, skip, (B)) \ > - } else if (unlikely(i->type & ITER_KVEC)) { \ > - const struct kvec *kvec;\ > - struct kvec v; \ > - iterate_kvec(i, n, v, kvec, skip, (K)) \ > - } else {\ > - const struct iovec *iov;\ > - struct iovec v; \ > - iterate_iovec(i, n, v, iov, skip, (I)) \ > + if (likely(n)) {\ > + size_t skip = i->iov_offset;\ > + if (unlikely(i->type & ITER_BVEC)) {\ > + struct bio_vec v; \ > + struct bvec_iter __bi; \ > + iterate_bvec(i, n, v, __bi, skip, (B)) \ > + } else if (unlikely(i->type & ITER_KVEC)) { \ > + const struct kvec *kvec;\ > + struct kvec v; \ > + iterate_kvec(i, n, v, kvec, skip, (K)) \ > + } else {\ > + const struct iovec *iov;\ > + struct iovec v; \ > + iterate_iovec(i, n, v, iov, skip, (I)) \ > + } \ > } \ > } > > @@ -576,7 +578,7 @@ bool copy_from_iter_full(void *addr, size_t bytes, struct > iov_iter *i) > WARN_ON(1); > return false; > } > - if (unlikely(i->count < bytes)) \ > + if (unlikely(i->count < bytes)) > return false; > > iterate_all_kinds(i, bytes, v, ({ > @@ -620,7 +622,7 @@ bool copy_from_iter_full_nocache(void *addr, size_t > bytes, struct iov_iter *i) > WARN_ON(1); > return false; > } > - if (unlikely(i->count < bytes)) \ > + if (unlikely(i->count < bytes)) > return false; > iterate_all_kinds(i, bytes, v, ({ > if (__copy_from_user_nocache((to += v.iov_len) - v.iov_len, > @@ -837,11 +839,8 @@ unsigned long iov_iter_alignment(const struct iov_iter > *i) > unsigned long res = 0; > size_t size = i->count; > > - if (!size) > - return 0; > - > if (unlikely(i->type & ITER_PIPE)) { > - if (i->iov_offset && allocated(&i->pipe->bufs[i->idx])) > + if (size && i->iov_offset && allocated(&i->pipe->bufs[i->idx])) >
Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
Hi Ted, On Thu, 2016-12-22 at 00:41 -0500, Theodore Ts'o wrote: > On Thu, Dec 22, 2016 at 03:49:39AM +0100, Jason A. Donenfeld wrote: > > > > Funny -- while you guys were sending this back & forth, I was writing > > my reply to Andy which essentially arrives at the same conclusion. > > Given that we're all arriving to the same thing, and that Ted shot in > > this direction long before we all did, I'm leaning toward abandoning > > SipHash for the de-MD5-ification of get_random_int/long, and working > > on polishing Ted's idea into something shiny for this patchset. > > here are my numbers comparing siphash (using the first three patches > of the v7 siphash patches) with my batched chacha20 implementation. > The results are taken by running get_random_* 1 times, and then > dividing the numbers by 1 to get the average number of cycles for > the call. I compiled 32-bit and 64-bit kernels, and ran the results > using kvm: > >siphashbatched chacha20 > get_random_int get_random_long get_random_int get_random_long > > 32-bit270 278 114146 > 64-bit 75 75 106186 > > > I did have two objections to this. The first was that my SipHash > > construction is faster. > > Well, it's faster on everything except 32-bit x86. :-P > > > The second, and the more > > important one, was that batching entropy up like this means that 32 > > calls will be really fast, and then the 33rd will be slow, since it > > has to do a whole ChaCha round, because get_random_bytes must be > > called to refill the batch. > > ... and this will take 2121 cycles on 64-bit x86, and 2315 cycles on a > 32-bit x86. Which on a 2.3 GHz processor, is just under a > microsecond. As far as being inconsistent on process startup, I very > much doubt a microsecond is really going to be visible to the user. :-) > > The bottom line is that I think we're really "pixel peeping" at this > point --- which is what obsessed digital photographers will do when > debating the quality of a Canon vs Nikon DSLR by blowing up a photo by > a thousand times, and then trying to claim that this is visible to the > human eye. Or people who obsessing over the frequency response curves > of TH-X00 headphones with Mahogony vs Purpleheart wood, when it's > likely that in a blind head-to-head comparison, most people wouldn't > be able to tell the difference > > I think the main argument for using the batched getrandom approach is > that it, I would argue, simpler than introducing siphash into the > picture. On 64-bit platforms it is faster and more consistent, so > it's basically that versus complexity of having to adding siphash to > the things that people have to analyze when considering random number > security on Linux. But it's a close call either way, I think. following up on what appears to be a random subject: ;) IIRC, ext4 code by default still uses half_md4 for hashing of filenames in the htree. siphash seems to fit this use case pretty good. xfs could also need an update, as they don't seed the directory hash tables at all (but not sure if they are vulnerable). I should improve [1] a bit. [1] http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfstests.git;a=blo b;f=src/dirhash_collide.c;h=55cec872d5061ac2ca0f56d1f11e6bf349d5bb97;hb =HEAD Bye, Hannes
[PATCH v2] stmmac: CSR clock configuration fix
When testing stmmac with my QoS reference design I checked a problem in the CSR clock configuration that was impossibilitating the phy discovery, since every read operation returned 0x. This patch fixes the issue. Signed-off-by: Joao Pinto --- changes v1->v2 (David Miller) - DWMAC100 and DWMAC1000 csr clocks masks should also be fixed for the patch to make sense drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c | 2 +- drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c | 2 +- drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c| 8 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c index b21d03f..94223c8 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac1000_core.c @@ -539,7 +539,7 @@ struct mac_device_info *dwmac1000_setup(void __iomem *ioaddr, int mcbins, mac->mii.reg_shift = 6; mac->mii.reg_mask = 0x07C0; mac->mii.clk_csr_shift = 2; - mac->mii.clk_csr_mask = 0xF; + mac->mii.clk_csr_mask = GENMASK(4, 2); /* Get and dump the chip ID */ *synopsys_id = stmmac_get_synopsys_id(hwid); diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c index a1d582f..8a40e69 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac100_core.c @@ -197,7 +197,7 @@ struct mac_device_info *dwmac100_setup(void __iomem *ioaddr, int *synopsys_id) mac->mii.reg_shift = 6; mac->mii.reg_mask = 0x07C0; mac->mii.clk_csr_shift = 2; - mac->mii.clk_csr_mask = 0xF; + mac->mii.clk_csr_mask = GENMASK(4, 2); /* Synopsys Id is not available on old chips */ *synopsys_id = 0; diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c index 23322fd..fda01f7 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c @@ -81,8 +81,8 @@ static int stmmac_mdio_read(struct mii_bus *bus, int phyaddr, int phyreg) value |= (phyaddr << priv->hw->mii.addr_shift) & priv->hw->mii.addr_mask; value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask; - value |= (priv->clk_csr & priv->hw->mii.clk_csr_mask) - << priv->hw->mii.clk_csr_shift; + value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift) + & priv->hw->mii.clk_csr_mask; if (priv->plat->has_gmac4) value |= MII_GMAC4_READ; @@ -122,8 +122,8 @@ static int stmmac_mdio_write(struct mii_bus *bus, int phyaddr, int phyreg, & priv->hw->mii.addr_mask; value |= (phyreg << priv->hw->mii.reg_shift) & priv->hw->mii.reg_mask; - value |= ((priv->clk_csr & priv->hw->mii.clk_csr_mask) - << priv->hw->mii.clk_csr_shift); + value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift) + & priv->hw->mii.clk_csr_mask; if (priv->plat->has_gmac4) value |= MII_GMAC4_WRITE; -- 2.9.3
[PATCH net] net/mlx4_en: Fix user prio field in XDP forward
The user prio field is wrong (and overflows) in the XDP forward flow. This is a result of a bad value for num_tx_rings_p_up, which should account all XDP TX rings, as they operate for the same user prio. Signed-off-by: Tariq Toukan Reported-by: Martin KaFai Lau --- drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c index bcd955339058..edbe200ac2fa 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c @@ -1638,7 +1638,8 @@ int mlx4_en_start_port(struct net_device *dev) /* Configure tx cq's and rings */ for (t = 0 ; t < MLX4_EN_NUM_TX_TYPES; t++) { - u8 num_tx_rings_p_up = t == TX ? priv->num_tx_rings_p_up : 1; + u8 num_tx_rings_p_up = t == TX ? + priv->num_tx_rings_p_up : priv->tx_ring_num[t]; for (i = 0; i < priv->tx_ring_num[t]; i++) { /* Configure cq */ -- 1.8.3.1
Re: [PATCH] stmmac: CSR clock configuration fix
Hi David, Às 10:15 AM de 12/22/2016, Joao Pinto escreveu: > Às 6:21 PM de 12/21/2016, David Miller escreveu: >> From: Joao Pinto >> Date: Tue, 20 Dec 2016 11:21:47 + >> >>> When testing stmmac with my QoS reference design I checked a problem in the >>> CSR clock configuration that was impossibilitating the phy discovery, since >>> every read operation returned 0x. This patch fixes the issue. >>> >>> Signed-off-by: Joao Pinto >> >> This isn't enough. >> >> It looks like various parts of this driver set the mask field >> differently. >> >> dwmac1000_core.c and dwmac100_core.c set the mask to be the low bits. >> >> But dwmac4_core.c uses GENMASK(11, 8) which means the mask is a value >> which is shifted up already. >> >> So your patch will break chips driven by dwmac4_core.c. > > I am using a GMAC4 reference design to test the patches. The clock > configuration > as is, does not work, resulting in the phy discovery failure. By applying this > patch I am able to set the clock value properly. > > I am going to check in the Databook of GMAC4 and older versions in order to > justify better. So from the 4.20 Databook: Bits 11:8 CR parameter : CSR clock = 60-100 MHz; MDC clock = CSR 0001: CSR clock = 100-150 MHz; MDC clock = CSR 0010: CSR clock = 20-35 MHz; MDC clock = CSR 0011: CSR clock = 35-60 MHz; MDC clock = CSR 0100: CSR clock = 150-250 MHz; MDC clock = CSR 0101: CSR clock = 250-300 MHz; MDC clock = CSR 0110, 0111: Reserved For example, if you want configure the CSR clock to 250-300MHZ (my case), you will set the parameter CR to 0x5. The current mechanism simply messes the value. With this fix, the 0x5 is shifted to 11:8 like the databook specifies and applies a GENMASK(11:8) on top, causing the reset of every bit outside the 11:8 which is an assurance. For older cores like 4.10 and 4.00 the logic is the same. The information was confirmed by R&D. Thanks. > >> >> In order for your change to be correct you must consolidate all of >> these various pieces to use the same convention. >> > > Thanks. >
Re: [PATCH] stmmac: CSR clock configuration fix
Às 12:23 PM de 12/22/2016, Joao Pinto escreveu: > > Hi David, > > Às 10:15 AM de 12/22/2016, Joao Pinto escreveu: >> Às 6:21 PM de 12/21/2016, David Miller escreveu: >>> From: Joao Pinto >>> Date: Tue, 20 Dec 2016 11:21:47 + >>> When testing stmmac with my QoS reference design I checked a problem in the CSR clock configuration that was impossibilitating the phy discovery, since every read operation returned 0x. This patch fixes the issue. Signed-off-by: Joao Pinto >>> >>> This isn't enough. >>> >>> It looks like various parts of this driver set the mask field >>> differently. >>> >>> dwmac1000_core.c and dwmac100_core.c set the mask to be the low bits. >>> >>> But dwmac4_core.c uses GENMASK(11, 8) which means the mask is a value >>> which is shifted up already. >>> >>> So your patch will break chips driven by dwmac4_core.c. >> >> I am using a GMAC4 reference design to test the patches. The clock >> configuration >> as is, does not work, resulting in the phy discovery failure. By applying >> this >> patch I am able to set the clock value properly. >> >> I am going to check in the Databook of GMAC4 and older versions in order to >> justify better. > > So from the 4.20 Databook: > > Bits > 11:8 CR parameter > : CSR clock = 60-100 MHz; MDC clock = CSR > 0001: CSR clock = 100-150 MHz; MDC clock = CSR > 0010: CSR clock = 20-35 MHz; MDC clock = CSR > 0011: CSR clock = 35-60 MHz; MDC clock = CSR > 0100: CSR clock = 150-250 MHz; MDC clock = CSR > 0101: CSR clock = 250-300 MHz; MDC clock = CSR > 0110, 0111: Reserved > > For example, if you want configure the CSR clock to 250-300MHZ (my case), you > will set the parameter CR to 0x5. The current mechanism simply messes the > value. > With this fix, the 0x5 is shifted to 11:8 like the databook specifies and > applies a GENMASK(11:8) on top, causing the reset of every bit outside the > 11:8 > which is an assurance. > > For older cores like 4.10 and 4.00 the logic is the same. The information was > confirmed by R&D. > > Thanks. Bu checking DWMAC100 and DWMAC1000 I understand your concern now. I am going to change their masks also in order to put it right. V2 comming soon. Thanks. > >> >>> >>> In order for your change to be correct you must consolidate all of >>> these various pieces to use the same convention. >>> >> >> Thanks. >> >
[PATCH net 0/2] net/sched fixes for cls_flower and act_tunnel_key
Hi Dave, This small series contain a fix to the matching flags support in flower and to the tunnel key action MD prep for IPv6. On a non related note, wishing everyone around here a happy new year, celebrate and take a rest so we can do lots of good patch work(s) next. Or. Or Gerlitz (2): net/sched: act_tunnel_key: Fix setting UDP dst port in metadata under IPv6 net/sched: cls_flower: Mandate mask when matching on flags net/sched/act_tunnel_key.c | 4 ++-- net/sched/cls_flower.c | 23 --- 2 files changed, 14 insertions(+), 13 deletions(-) -- 2.3.7
[PATCH net 2/2] net/sched: cls_flower: Mandate mask when matching on flags
When matching on flags, we should require the user to provide the mask and avoid using an all-ones mask. Not doing so causes matching on flags provided w.o mask to hit on the value being unset for all flags, which may not what the user wanted to happen. Fixes: faa3ffce7829 ('net/sched: cls_flower: Add support for matching on flags') Signed-off-by: Or Gerlitz Reported-by: Paul Blakey Acked-by: Jiri Pirko --- net/sched/cls_flower.c | 23 --- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index 35ac28d..333f8e2 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -442,32 +442,32 @@ static void fl_set_key_flag(u32 flower_key, u32 flower_mask, } } -static void fl_set_key_flags(struct nlattr **tb, -u32 *flags_key, u32 *flags_mask) +static int fl_set_key_flags(struct nlattr **tb, + u32 *flags_key, u32 *flags_mask) { u32 key, mask; - if (!tb[TCA_FLOWER_KEY_FLAGS]) - return; + /* mask is mandatory for flags */ + if (!tb[TCA_FLOWER_KEY_FLAGS_MASK]) + return -EINVAL; key = be32_to_cpu(nla_get_u32(tb[TCA_FLOWER_KEY_FLAGS])); - - if (!tb[TCA_FLOWER_KEY_FLAGS_MASK]) - mask = ~0; - else - mask = be32_to_cpu(nla_get_u32(tb[TCA_FLOWER_KEY_FLAGS_MASK])); + mask = be32_to_cpu(nla_get_u32(tb[TCA_FLOWER_KEY_FLAGS_MASK])); *flags_key = 0; *flags_mask = 0; fl_set_key_flag(key, mask, flags_key, flags_mask, TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT, FLOW_DIS_IS_FRAGMENT); + + return 0; } static int fl_set_key(struct net *net, struct nlattr **tb, struct fl_flow_key *key, struct fl_flow_key *mask) { __be16 ethertype; + int ret = 0; #ifdef CONFIG_NET_CLS_IND if (tb[TCA_FLOWER_INDEV]) { int err = tcf_change_indev(net, tb[TCA_FLOWER_INDEV]); @@ -614,9 +614,10 @@ static int fl_set_key(struct net *net, struct nlattr **tb, &mask->enc_tp.dst, TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK, sizeof(key->enc_tp.dst)); - fl_set_key_flags(tb, &key->control.flags, &mask->control.flags); + if (tb[TCA_FLOWER_KEY_FLAGS]) + ret = fl_set_key_flags(tb, &key->control.flags, &mask->control.flags); - return 0; + return ret; } static bool fl_mask_eq(struct fl_flow_mask *mask1, -- 2.3.7
[PATCH net 1/2] net/sched: act_tunnel_key: Fix setting UDP dst port in metadata under IPv6
The UDP dst port was provided to the helper function which sets the IPv6 IP tunnel meta-data under a wrong param order, fix that. Fixes: 75bfbca01e48 ('net/sched: act_tunnel_key: Add UDP dst port option') Signed-off-by: Or Gerlitz Reviewed-by: Hadar Hen Zion --- net/sched/act_tunnel_key.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c index 7af7125..e3a58e0 100644 --- a/net/sched/act_tunnel_key.c +++ b/net/sched/act_tunnel_key.c @@ -134,8 +134,8 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla, saddr = nla_get_in6_addr(tb[TCA_TUNNEL_KEY_ENC_IPV6_SRC]); daddr = nla_get_in6_addr(tb[TCA_TUNNEL_KEY_ENC_IPV6_DST]); - metadata = __ipv6_tun_set_dst(&saddr, &daddr, 0, 0, 0, - dst_port, TUNNEL_KEY, + metadata = __ipv6_tun_set_dst(&saddr, &daddr, 0, 0, dst_port, + 0, TUNNEL_KEY, key_id, 0); } -- 2.3.7
[PATCH net 1/1] tipc: don't send FIN message from connectionless socket
In commit 6f00089c7372 ("tipc: remove SS_DISCONNECTING state") the check for socket type is in the wrong place, causing a closing socket to always send out a FIN message even when the socket was never connected. This is normally harmless, since the destination node for such messages most often is zero, and the message will be dropped, but it is still a wrong and confusing behavior. We fix this in this commit. Reviewed-by: Parthasarathy Bhuvaragan Signed-off-by: Jon Maloy --- net/tipc/socket.c | 24 +--- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 333c5da..800caaa 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -441,15 +441,19 @@ static void __tipc_shutdown(struct socket *sock, int error) while ((skb = __skb_dequeue(&sk->sk_receive_queue)) != NULL) { if (TIPC_SKB_CB(skb)->bytes_read) { kfree_skb(skb); - } else { - if (!tipc_sk_type_connectionless(sk) && - sk->sk_state != TIPC_DISCONNECTING) { - tipc_set_sk_state(sk, TIPC_DISCONNECTING); - tipc_node_remove_conn(net, dnode, tsk->portid); - } - tipc_sk_respond(sk, skb, error); + continue; + } + if (!tipc_sk_type_connectionless(sk) && + sk->sk_state != TIPC_DISCONNECTING) { + tipc_set_sk_state(sk, TIPC_DISCONNECTING); + tipc_node_remove_conn(net, dnode, tsk->portid); } + tipc_sk_respond(sk, skb, error); } + + if (tipc_sk_type_connectionless(sk)) + return; + if (sk->sk_state != TIPC_DISCONNECTING) { skb = tipc_msg_create(TIPC_CRITICAL_IMPORTANCE, TIPC_CONN_MSG, SHORT_H_SIZE, 0, dnode, @@ -457,10 +461,8 @@ static void __tipc_shutdown(struct socket *sock, int error) tsk->portid, error); if (skb) tipc_node_xmit_skb(net, skb, dnode, tsk->portid); - if (!tipc_sk_type_connectionless(sk)) { - tipc_node_remove_conn(net, dnode, tsk->portid); - tipc_set_sk_state(sk, TIPC_DISCONNECTING); - } + tipc_node_remove_conn(net, dnode, tsk->portid); + tipc_set_sk_state(sk, TIPC_DISCONNECTING); } } -- 2.7.4
[PATCH] Fixed status entry in m_can documentation
Use valid value for 'enabled' in status field Signed-off-by: Vyacheslav V. Yurkov --- Documentation/devicetree/bindings/net/can/m_can.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/net/can/m_can.txt b/Documentation/devicetree/bindings/net/can/m_can.txt index 9e33177..5facaf5 100644 --- a/Documentation/devicetree/bindings/net/can/m_can.txt +++ b/Documentation/devicetree/bindings/net/can/m_can.txt @@ -63,5 +63,5 @@ Board dts: &m_can1 { pinctrl-names = "default"; pinctrl-0 = <&pinctrl_m_can1>; - status = "enabled"; + status = "okay"; }; -- 2.9.0
Re: [PATCH] stmmac: CSR clock configuration fix
Às 6:21 PM de 12/21/2016, David Miller escreveu: > From: Joao Pinto > Date: Tue, 20 Dec 2016 11:21:47 + > >> When testing stmmac with my QoS reference design I checked a problem in the >> CSR clock configuration that was impossibilitating the phy discovery, since >> every read operation returned 0x. This patch fixes the issue. >> >> Signed-off-by: Joao Pinto > > This isn't enough. > > It looks like various parts of this driver set the mask field > differently. > > dwmac1000_core.c and dwmac100_core.c set the mask to be the low bits. > > But dwmac4_core.c uses GENMASK(11, 8) which means the mask is a value > which is shifted up already. > > So your patch will break chips driven by dwmac4_core.c. I am using a GMAC4 reference design to test the patches. The clock configuration as is, does not work, resulting in the phy discovery failure. By applying this patch I am able to set the clock value properly. I am going to check in the Databook of GMAC4 and older versions in order to justify better. > > In order for your change to be correct you must consolidate all of > these various pieces to use the same convention. > Thanks.